1) Definition

  • Multiclass classification = predicting one label out of 3 or more possible classes for each input.
  • Each sample belongs to exactly one class (unlike multilabel classification, where samples can have multiple labels).

Examples:

  • Handwritten digit recognition (0–9).
  • Animal classification (cat, dog, horse).

2) Formal Setup

  • Input space: $X \subseteq \mathbb{R}^d$, with individual inputs $x \in X$.
  • Label space: $Y = \{1, 2, \dots, K\}$, where $K > 2$.
  • Goal: learn a function $f: X \to Y$.

3) Common Approaches

a) One-vs-Rest (OvR)

  • Train $K$ binary classifiers, one per class (“is it class i or not?”).
  • Predict the class whose classifier reports the highest confidence score.
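
A minimal sketch of the OvR decision rule, assuming each of the $K$ binary classifiers has already produced a confidence score for one input. The function name `ovr_predict` and the score values are hypothetical; classes are numbered 1..K to match the formal setup.

```python
def ovr_predict(scores):
    """Return the 1-based class whose binary classifier is most confident.

    scores[i] is the confidence of the "class i+1 vs. rest" classifier.
    """
    if not scores:
        raise ValueError("need at least one score")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1


# Hypothetical confidences for K = 3 classes (cat, dog, horse).
scores = [0.20, 0.70, 0.35]
predicted = ovr_predict(scores)
print(predicted)  # 2, i.e. "dog"
```

Because the classifiers are trained independently, the scores need not sum to 1; only their ranking matters for the prediction.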

b) One-vs-One (OvO)

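  • Train one binary classifier per pair of classes, $K(K-1)/2$ in total.
  • Predict the class that wins the most pairwise votes (majority voting).
  • Common with SVMs: each pairwise problem uses only the data from its two classes, which keeps the individual training sets small.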
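
The voting step can be sketched as follows. This is an illustration only: `pairwise_winner` stands in for the trained pairwise classifiers, and `toy_winner` with its made-up scores is a hypothetical placeholder, not a real model.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(k):
    """Number of class pairs, K(K-1)/2."""
    return k * (k - 1) // 2


def ovo_predict(classes, pairwise_winner):
    """Majority vote over all class pairs.

    pairwise_winner(a, b) returns a or b: the class preferred by the
    binary classifier trained on classes a and b only.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    # Ties are broken in favor of the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained pairwise classifiers:
# prefer whichever class has the larger made-up score.
toy_scores = {"cat": 0.2, "dog": 0.7, "horse": 0.35}


def toy_winner(a, b):
    return a if toy_scores[a] >= toy_scores[b] else b


print(num_pairwise_classifiers(3))   # 3
print(num_pairwise_classifiers(10))  # 45
print(ovo_predict(["cat", "dog", "horse"], toy_winner))  # dog
```

Real implementations differ in how they break ties; the first-listed rule here is just one simple choice.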

c) Softmax classifiers (direct multiclass)

  • A single model outputs a probability distribution over all $K$ classes.
  • Examples: multinomial (softmax) logistic regression, neural networks with a softmax output layer.
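
For reference, softmax turns raw scores $z_1, \dots, z_K$ into probabilities $p_k = e^{z_k} / \sum_j e^{z_j}$. A minimal sketch in plain Python, with hypothetical scores (`logits`) for three classes:

```python
import math


def softmax(logits):
    """Map real-valued scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 3.0, 0.5]  # hypothetical raw scores for 3 classes
probs = softmax(logits)
predicted = probs.index(max(probs)) + 1  # 1-based class label
print([round(p, 3) for p in probs])
print(predicted)  # 2
```

Subtracting the maximum score before exponentiating does not change the result, but it keeps `math.exp` from overflowing on large inputs.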

4) Evaluation Metrics

  • Accuracy: proportion of correct predictions.
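  • Precision, Recall, F1: computed per class, then combined by macro, micro, or weighted averaging.
  • ROC-AUC: extended via macro/micro averaging or one-vs-rest curves.
  • Confusion matrix: a $K \times K$ table of true versus predicted classes; shows which classes get confused with each other.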
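
To make the averaging schemes concrete, here is a small sketch that builds a confusion matrix and computes macro and micro F1 from it. The labels are made up, and the helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, k):
    """cm[i][j] = number of samples with true class i+1 predicted as j+1."""
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[t - 1][p - 1] += 1
    return cm


def f1_scores(cm):
    """Return (macro_f1, micro_f1) computed from a confusion matrix."""
    k = len(cm)
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # column total minus hits
        fn = sum(cm[c]) - tp                       # row total minus hits
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class) / k  # every class counts equally
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0  # every sample counts equally
    return macro, micro


# Made-up labels for K = 3 classes.
y_true = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred, k=3)
macro_f1, micro_f1 = f1_scores(cm)
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
print(cm)  # [[3, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy, micro_f1, round(macro_f1, 3))
```

In the single-label setting, micro-averaged F1 equals accuracy, since every misclassification counts as one false positive and one false negative. Macro F1 differs because it weights each class equally regardless of size.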

5) Example

Digit classification (0–9):

  • Model outputs probability vector: [0.01, 0.03, …, 0.92 (class 7), …, 0.01].
  • Prediction = class 7.

6) Challenges

  • Imbalanced classes: some classes much rarer than others → accuracy misleading.
  • Overlapping classes: harder to separate if features aren’t distinctive.
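  • Evaluation: a single aggregate score can hide weak classes; report per-class metrics and state the averaging used (macro weights classes equally, micro weights samples equally).
  • Scalability: OvR needs $K$ classifiers and OvO needs $K(K-1)/2$, which becomes costly when $K$ is large (e.g., thousands of classes).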
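
The imbalance problem is easy to demonstrate with made-up data. In this sketch, a hypothetical model always predicts the majority class; the helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Average of per-class recall, with each class weighted equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        support = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / support)
    return sum(recalls) / len(classes)


# Made-up imbalanced data: 90 samples of class 1, 5 each of classes 2 and 3.
y_true = [1] * 90 + [2] * 5 + [3] * 5
y_pred = [1] * 100  # a model that always predicts the majority class

print(accuracy(y_true, y_pred))      # 0.9
print(macro_recall(y_true, y_pred))  # about 0.33
```

Accuracy looks strong at 0.9, yet the model never identifies classes 2 or 3, which macro recall exposes.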

7) Applications

  • Digit recognition (MNIST).
  • ImageNet object classification (1000 classes).
  • Document categorization (topics).
  • Sentiment classification (positive/neutral/negative).
  • Medical diagnosis (disease type prediction).

Summary

  • Multiclass classification = predicting exactly one class out of more than two.
  • Approaches: OvR, OvO, softmax.
  • Metrics: accuracy, precision/recall/F1 (macro/micro/weighted), AUC variants.
  • Applications span vision, NLP, and healthcare.