1. Definition

  • Multi-label classification = each instance (sample) can be assigned multiple labels simultaneously.
  • Unlike multi-class classification (where exactly one label is chosen among many), here labels are not mutually exclusive.

$f: X \;\; \rightarrow \;\; \{0,1\}^K$

  • For $K$ classes, each class has a binary decision (0 or 1) whether the instance belongs to it.
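The mapping $f: X \rightarrow \{0,1\}^K$ can be made concrete with a small sketch. The label vocabulary and helper name below are illustrative, not from any library:

```python
# Illustrative label vocabulary; K = 4 classes.
CLASSES = ["Action", "Comedy", "Romance", "Horror"]

def encode_labels(labels):
    """Encode a set of label names as a K-dimensional binary vector."""
    return [1 if c in labels else 0 for c in CLASSES]

print(encode_labels({"Action", "Romance"}))  # -> [1, 0, 1, 0]
```

Each position in the output vector is an independent yes/no decision for one class, which is exactly the $\{0,1\}^K$ target space.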

2. Examples

  • Movies: A film can belong to multiple genres → Action + Comedy + Romance.
  • News articles: An article can be tagged as Politics + Economy + International.
  • Medical diagnosis: A patient may have multiple conditions (e.g., Diabetes + Hypertension).
  • Image recognition: A picture may contain dog + car + tree.

3. How It Differs from Multi-class

| Feature | Multi-class | Multi-label |
|---|---|---|
| Labels per sample | Exactly 1 | One or more |
| Class exclusivity | Mutually exclusive | Not mutually exclusive |
| Typical output | Softmax (probabilities sum to 1) | Sigmoid (independent probability per class) |
| Example | Animal = {Cat, Dog, Horse} (one choice) | Tags = {Cat, Dog, Horse} (any combination) |

4. Modeling

  • Output layer:
    • Multi-class → Softmax activation (chooses one)
    • Multi-label → Sigmoid activation (thresholded independently for each label)
  • Loss function:
    • Multi-class → Categorical cross-entropy
    • Multi-label → Binary cross-entropy (per class, summed/averaged)
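A minimal sketch of the multi-label output path (sigmoid, independent 0.5 threshold, per-class binary cross-entropy averaged over the K labels). This is plain Python for clarity; real frameworks compute the same loss from raw logits in a more numerically stable way:

```python
import math

def sigmoid(z):
    """Independent probability for one class (no softmax normalization)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy per class, averaged over the K labels."""
    losses = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for t, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)

logits = [2.0, -1.0, 0.5]                      # raw scores for K = 3 classes
probs = [sigmoid(z) for z in logits]
preds = [1 if p >= 0.5 else 0 for p in probs]  # thresholded independently
loss = binary_cross_entropy([1, 0, 1], probs)
```

Note that, unlike categorical cross-entropy, the loss treats each label as its own binary problem, so a sample can contribute error on several labels at once.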

5. Evaluation Metrics

Since predictions are multiple binary decisions per sample, standard metrics differ from multi-class:

  • Per-label Precision, Recall, F1 (then averaged: micro, macro, weighted)
  • Hamming loss (fraction of misclassified labels)
  • Subset accuracy (strict: all labels correct = 1, else 0)
  • Jaccard similarity (intersection over union of predicted vs true labels)
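The multi-label metrics above can be sketched on binary label vectors (a minimal illustration, not a library API):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label positions that disagree."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def subset_accuracy(y_true, y_pred):
    """Strict match: 1 only if every label is correct, else 0."""
    return 1.0 if y_true == y_pred else 0.0

def jaccard(y_true, y_pred):
    """Intersection over union of the positive labels."""
    inter = sum(t and p for t, p in zip(y_true, y_pred))
    union = sum(t or p for t, p in zip(y_true, y_pred))
    return inter / union if union else 1.0

y_true = [1, 1, 0]  # e.g. {Cat, Dog}
y_pred = [1, 0, 1]  # e.g. {Cat, Horse}
# hamming_loss -> 2/3, subset_accuracy -> 0.0, jaccard -> 1/3
```

Subset accuracy is the harshest of the three: one wrong label zeroes out the whole sample, whereas Hamming loss and Jaccard give partial credit.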

6. Example

Suppose true labels for an image: {Cat, Dog}
Model prediction: {Cat, Horse}

  • Precision = 1 / (1+1) = 0.5 (only Cat correct, Horse is FP)
  • Recall = 1 / (1+1) = 0.5 (missed Dog, so 1 FN)
  • Jaccard = |{Cat, Dog} ∩ {Cat, Horse}| / |{Cat, Dog} ∪ {Cat, Horse}| = 1/3 ≈ 0.33
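The arithmetic above can be checked directly with Python's set operations:

```python
true_labels = {"Cat", "Dog"}
pred_labels = {"Cat", "Horse"}

tp = len(true_labels & pred_labels)  # {Cat} -> 1 true positive
fp = len(pred_labels - true_labels)  # {Horse} -> 1 false positive
fn = len(true_labels - pred_labels)  # {Dog} -> 1 false negative

precision = tp / (tp + fp)  # 1 / 2 = 0.5
recall = tp / (tp + fn)     # 1 / 2 = 0.5
iou = len(true_labels & pred_labels) / len(true_labels | pred_labels)  # 1/3
```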

Summary

  • Multi-label classification allows assigning multiple labels to each sample.
  • Labels are not mutually exclusive; each label gets its own independent binary decision.
  • Requires sigmoid outputs and metrics like Hamming loss, Jaccard, and micro/macro F1.