Jan 1, 1995

T. G. Dietterich, G. Bakiri

## Introduction

The task of learning from examples is to find an approximate definition for an unknown function f(x) given training examples of the form ⟨x_i, f(x_i)⟩. For cases in which f takes only the values {0, 1} (binary functions), there are many algorithms available. For example, the decision-tree methods, such as C4.5 (Quinlan, 1993) and CART (Breiman, Friedman, Olshen, & Stone, 1984), can construct trees whose leaves are labeled with binary values.

Most artificial neural network algorithms, such as the perceptron algorithm (Rosenblatt, 1958) and the error backpropagation (BP) algorithm (Rumelhart, Hinton, & Williams, 1986), are best suited to learning binary functions. Theoretical studies of learning have focused almost entirely on learning binary functions (Valiant, 1984; Natarajan, 1991).

## Real-World Learning Tasks

In many real-world learning tasks, however, the unknown function f often takes values from a discrete set of "classes": {c_1, …, c_k}. For example, in medical diagnosis, the function might map a description of a patient to one of k possible diseases. In digit recognition (e.g., LeCun, Boser, Denker, Henderson, Howard, Hubbard, & Jackel, 1989), the function maps each hand-printed digit to one of k = 10 classes. Phoneme recognition systems (e.g., Waibel, Hanazawa, Hinton, Shikano, & Lang, 1989) typically classify a speech segment into one of 50 to 60 phonemes.

## The Direct Multiclass Approach

Decision-tree algorithms can be easily generalized to handle these "multiclass" learning tasks. Each leaf of the decision tree can be labeled with one of the k classes, and internal nodes can be selected to discriminate among these classes. We will call this the direct multiclass approach.
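The direct multiclass approach can be sketched with a toy decision-tree learner whose leaves are labeled directly with one of the k classes. This is only an illustrative stand-in (greedy axis-aligned splits, majority-vote leaves); C4.5 and CART use far more sophisticated splitting and pruning criteria.

```python
# Toy multiclass decision tree: leaves carry class labels directly.
# Illustrative sketch only, not C4.5/CART.
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, depth=0, max_depth=3):
    if depth == max_depth or len(set(y)) == 1:
        return ("leaf", majority(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i in range(len(X)) if row_not_in(i, left)]
            if not left or not right:
                continue
            # Errors if each side predicts its own majority class.
            err = sum(y[i] != majority([y[j] for j in left]) for i in left) \
                + sum(y[i] != majority([y[j] for j in right]) for i in right)
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:
        return ("leaf", majority(y))
    _, f, t, left, right = best
    return ("node", f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def row_not_in(i, left):
    return i not in set(left)

def predict(tree, x):
    while tree[0] == "node":
        _, f, t, lo, hi = tree
        tree = lo if x[f] <= t else hi
    return tree[1]

# Three classes separated along a single feature.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = ["a", "a", "b", "b", "c", "c"]
tree = build_tree(X, y)
print([predict(tree, x) for x in X])  # ['a', 'a', 'b', 'b', 'c', 'c']
```

Note that no decomposition into binary subproblems is needed: each internal node may discriminate among all k classes at once.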

## Other Approaches

Connectionist algorithms are more difficult to apply to multiclass problems. The standard approach is to learn k individual binary functions f_1, …, f_k, one for each class. To assign a new case x to one of these classes, each of the f_i is evaluated on x, and x is assigned the class j of the function f_j that returns the highest activation (Nilsson, 1965). We will call this the one-per-class approach, since one binary function is learned for each class.
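The one-per-class decision rule above can be sketched directly: given k discriminant functions, a new case is assigned the class whose function returns the highest activation. The linear scorers and toy data below are illustrative assumptions, not taken from the paper.

```python
# One-per-class classification: argmax over k binary discriminants.
def one_per_class_predict(scorers, x):
    """Return the index of the scorer with the highest activation on x."""
    activations = [f(x) for f in scorers]
    return max(range(len(scorers)), key=lambda i: activations[i])

# Toy: three hand-built linear "binary functions", one per class
# (in practice each would be trained, e.g., by backpropagation).
scorers = [
    lambda x: x[0] - x[1],          # class 0: feature 0 dominates
    lambda x: x[1] - x[0],          # class 1: feature 1 dominates
    lambda x: 1.0 - abs(x[0] - x[1]),  # class 2: features nearly equal
]

print(one_per_class_predict(scorers, (5.0, 1.0)))  # 0
print(one_per_class_predict(scorers, (3.0, 3.1)))  # 2
```

A weakness of this scheme, implicit in the argmax rule, is that a single poorly trained f_i can claim examples belonging to other classes.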

## Alternatives

An alternative approach explored by some researchers is to employ a distributed output code. This approach was pioneered by Sejnowski and Rosenberg (1987) in their widely-known NETtalk system. Each class is assigned a unique binary string of length n; we will refer to these strings as "code words." Then n binary functions are learned, one for each bit position in these binary strings. During training for an example from class i, the desired outputs of these n binary functions are specified by the code word for class i.
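At classification time, the n predicted bits are compared against each class's code word, and the class with the nearest code word (e.g., by Hamming distance) is chosen. The 6-bit code words below are made up for illustration; they are not from NETtalk.

```python
# Distributed output code: n-bit code words, one bit per learned
# binary function; decode by nearest code word in Hamming distance.
# The code words here are invented for the example.
CODE_WORDS = {
    "class_A": (0, 0, 1, 1, 0, 1),
    "class_B": (1, 0, 0, 1, 1, 0),
    "class_C": (0, 1, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of bit positions where a and b disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(predicted_bits):
    """Map an n-bit prediction to the class with the nearest code word."""
    return min(CODE_WORDS, key=lambda c: hamming(CODE_WORDS[c], predicted_bits))

# Even if one of the n binary learners errs, decoding can still
# recover the intended class.
print(decode((0, 0, 1, 1, 0, 0)))  # class_A despite one flipped bit
```

This robustness to individual bit errors is what well-separated code words buy: if every pair of code words differs in at least d bits, up to ⌊(d−1)/2⌋ single-bit errors can be corrected.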

## Conclusion

In this paper, we compare the performance of the error-correcting code approach to the three existing approaches: the direct multiclass method (using decision trees), the one-per-class method, and (in the NETtalk task only) the meaningful distributed output representation approach. We show that error-correcting codes produce uniformly better generalization performance across a variety of multiclass domains for both the C4.5 decision-tree learning algorithm and the backpropagation neural network learning algorithm.
