1. Definition
- Multi-label classification = each instance (sample) can be assigned multiple labels simultaneously.
- Unlike multi-class classification (where exactly one label is chosen among many), here labels are not mutually exclusive.
$f: X \;\; \rightarrow \;\; \{0,1\}^K$
- For $K$ classes, each class has a binary decision (0 or 1) whether the instance belongs to it.
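The mapping to $\{0,1\}^K$ can be sketched as a binary indicator vector. A minimal example, using a hypothetical label set of three movie genres:

```python
# Multi-label targets as binary indicator vectors over K classes.
# CLASSES is a hypothetical label set chosen for illustration.
CLASSES = ["Action", "Comedy", "Romance"]  # K = 3

def encode(labels):
    """Map a set of label names to a {0,1}^K indicator vector."""
    return [1 if c in labels else 0 for c in CLASSES]

print(encode({"Action", "Romance"}))  # [1, 0, 1] — two labels at once
```

Each position in the vector is an independent yes/no decision, which is exactly what distinguishes this setup from one-hot multi-class targets.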
2. Examples
- Movies: A film can belong to multiple genres → Action + Comedy + Romance.
- News articles: An article can be tagged as Politics + Economy + International.
- Medical diagnosis: A patient may have multiple conditions (e.g., Diabetes + Hypertension).
- Image recognition: A picture may contain dog + car + tree.
3. How It Differs from Multi-class
| Feature | Multi-class | Multi-label |
|---|---|---|
| Labels per sample | Exactly 1 | One or more |
| Class exclusivity | Mutually exclusive | Not mutually exclusive |
| Typical output | Softmax (probabilities sum to 1) | Sigmoid (independent probability per class) |
| Example | Animal = {Cat, Dog, Horse} (one choice) | Tags = {Cat, Dog, Horse} (any combination) |
4. Modeling
- Output layer:
- Multi-class → Softmax activation (chooses one)
- Multi-label → Sigmoid activation (thresholded independently for each label)
- Loss function:
- Multi-class → Categorical cross-entropy
- Multi-label → Binary cross-entropy (per class, summed/averaged)
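The multi-label output path (sigmoid per label, independent thresholding, binary cross-entropy averaged over labels) can be sketched in plain NumPy; the logits and targets below are made-up illustration values:

```python
import numpy as np

def sigmoid(z):
    # Independent probability per label; no normalization across classes.
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-label BCE, averaged over all labels and samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

logits = np.array([[2.0, -1.0, 0.5]])   # raw scores for K = 3 labels
probs = sigmoid(logits)                 # e.g. [[0.88, 0.27, 0.62]]
preds = (probs >= 0.5).astype(int)      # threshold each label independently
y_true = np.array([[1, 0, 1]])
loss = binary_cross_entropy(y_true, probs)
```

Note that `probs` need not sum to 1, unlike a softmax output; any subset of labels can be active at once.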
5. Evaluation Metrics
Since predictions are multiple binary decisions per sample, standard metrics differ from multi-class:
- Per-label Precision, Recall, F1 (then averaged: micro, macro, weighted)
- Hamming loss (fraction of misclassified labels)
- Subset accuracy (strict: all labels correct = 1, else 0)
- Jaccard similarity (intersection over union of predicted vs true labels)
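Two of the metrics above, written out as a minimal sketch (the label matrices are illustrative):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots that disagree, over all samples and classes."""
    total = sum(len(t) for t in y_true)
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred)
                for ti, pi in zip(t, p))
    return wrong / total

def subset_accuracy(y_true, y_pred):
    """Strict: a sample counts only if every label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [[1, 1, 0], [0, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 0]]
# 2 of 6 label slots disagree → Hamming loss 1/3;
# only the second sample is fully correct → subset accuracy 1/2.
```

In practice `sklearn.metrics` provides `hamming_loss`, `jaccard_score`, and `f1_score` (with `average='micro'`/`'macro'`/`'weighted'`) for binary indicator matrices like these.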
6. Example
Suppose true labels for an image: {Cat, Dog}
Model prediction: {Cat, Horse}
- Precision = 1 / (1+1) = 0.5 (only Cat correct, Horse is FP)
- Recall = 1 / (1+1) = 0.5 (missed Dog, so 1 FN)
- Jaccard = |{Cat, Dog} ∩ {Cat, Horse}| / |{Cat, Dog} ∪ {Cat, Horse}| = 1/3 ≈ 0.33
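The same numbers fall out of a few lines of set arithmetic:

```python
# Worked example from above: true vs predicted label sets for one image.
true_labels = {"Cat", "Dog"}
pred_labels = {"Cat", "Horse"}

tp = len(true_labels & pred_labels)              # 1 (only Cat)
precision = tp / len(pred_labels)                # 1/2 — Horse is a false positive
recall = tp / len(true_labels)                   # 1/2 — Dog is a false negative
jaccard = tp / len(true_labels | pred_labels)    # 1/3 — intersection over union
```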
7. Summary
- Multi-label classification allows assigning multiple labels to each sample.
- Labels are independent, not mutually exclusive.
- Typically modeled with sigmoid outputs plus binary cross-entropy, and evaluated with metrics like Hamming loss, Jaccard, and micro/macro F1.
