1) Definition
- In multilabel classification, each sample can have multiple labels at once.
- Example: an image may be tagged {dog, outdoor, animal}.
- Precision measures the fraction of predicted labels that are correct:
  - $\text{Precision} = \frac{\text{True Positive Labels (TP)}}{\text{Predicted Labels (TP + FP)}}$
  - In words: of all the labels the model predicted, what fraction was actually correct?
2) Single-sample Example
- True labels = {dog, animal}
- Predicted labels = {dog, outdoor}
- TP = {dog} → 1
- FP = {outdoor} → 1
- Precision = 1 / (1 + 1) = 0.5
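
The single-sample calculation above can be reproduced with plain Python sets. This is a minimal sketch: the helper name `sample_precision` is hypothetical, and returning 0.0 when nothing is predicted is an assumed convention, since precision is undefined in that case.

```python
def sample_precision(true_labels: set, predicted_labels: set) -> float:
    """Precision for one sample: correct predictions / all predictions."""
    if not predicted_labels:
        # Assumption: an empty prediction set is scored as 0.0 (precision is undefined here).
        return 0.0
    true_positives = true_labels & predicted_labels  # labels that were predicted and are correct
    return len(true_positives) / len(predicted_labels)


precision = sample_precision({"dog", "animal"}, {"dog", "outdoor"})
print(precision)  # 0.5
```

Note that the missed label `animal` does not affect the result: precision only looks at what was predicted, not at what was missed.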
3) Averaging across the dataset
To summarize the whole dataset with a single number, several averaging strategies are available:
- Micro Precision
  - Pool all TP and FP across all samples and labels, then compute a single ratio.
  - Formula: $\text{Precision}_{micro} = \frac{\sum TP}{\sum (TP+FP)}$
  - Treats each predicted label occurrence equally, so frequently predicted labels dominate.
- Macro Precision
  - Compute precision per label (one-vs-rest), then take the unweighted mean.
  - Every label counts equally, even rare ones.
- Weighted Precision
  - Like macro, but each label’s precision is weighted by its support (the number of samples that truly carry that label).
  - Reflects label frequency when classes are imbalanced; labels with zero support contribute nothing.
- Samples Precision
  - Compute precision per sample, then average over all samples.
  - Focuses on “how well the model performs on a typical instance.”
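
The four strategies above can be sketched from scratch on label sets. All function names here are hypothetical, the toy data is invented for illustration, and scoring an undefined ratio (nothing predicted) as 0.0 is an assumption, not the only possible convention.

```python
def _ratio(tp, fp):
    """TP / (TP + FP); assumed to be 0.0 when nothing was predicted."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def label_counts(y_true, y_pred, label):
    """Return (TP, FP, support) for one label across all samples."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label in t)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label not in t)
    support = sum(1 for t in y_true if label in t)
    return tp, fp, support


def micro_precision(y_true, y_pred, labels):
    # Pool TP and FP over every label, then take one ratio.
    counts = [label_counts(y_true, y_pred, label) for label in labels]
    return _ratio(sum(c[0] for c in counts), sum(c[1] for c in counts))


def macro_precision(y_true, y_pred, labels):
    # Unweighted mean of the per-label precisions.
    counts = [label_counts(y_true, y_pred, label) for label in labels]
    return sum(_ratio(tp, fp) for tp, fp, _ in counts) / len(labels)


def weighted_precision(y_true, y_pred, labels):
    # Mean of the per-label precisions, weighted by support.
    counts = [label_counts(y_true, y_pred, label) for label in labels]
    total_support = sum(s for _, _, s in counts)
    if total_support == 0:
        return 0.0
    return sum(_ratio(tp, fp) * s for tp, fp, s in counts) / total_support


def samples_precision(y_true, y_pred):
    # Precision per sample, then the mean over samples.
    per_sample = [_ratio(len(t & p), len(p - t)) for t, p in zip(y_true, y_pred)]
    return sum(per_sample) / len(per_sample)


# Invented toy data: three samples, three labels.
labels = ["cat", "pet", "bird"]
y_true = [{"cat"}, {"cat", "pet"}, {"bird"}]
y_pred = [{"cat", "pet"}, {"cat"}, {"cat"}]

print(round(micro_precision(y_true, y_pred, labels), 3))     # 0.5
print(round(macro_precision(y_true, y_pred, labels), 3))     # 0.222
print(round(weighted_precision(y_true, y_pred, labels), 3))  # 0.333
print(round(samples_precision(y_true, y_pred), 3))           # 0.5
```

On this toy data the four averages disagree noticeably: `cat` is predicted often and mostly correctly, which lifts the micro and samples averages, while the never-correct `pet` and never-predicted `bird` drag the macro and weighted averages down.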
4) Multi-sample Example
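Suppose there are two samples:
- Sample 1:
  - True = {dog, animal}
  - Pred = {dog, outdoor} → Precision = 1/2 = 0.5
- Sample 2:
  - True = {car}
  - Pred = {car, truck} → Precision = 1/2 = 0.5
- Samples precision = (0.5 + 0.5) / 2 = 0.5
- Micro precision = ΣTP / Σ(TP + FP) = 2 / 4 = 0.5
- Macro precision = (1 + 0 + 0 + 1 + 0) / 5 = 0.4. Per label: dog = 1, animal = 0 (never predicted, so scored as 0), outdoor = 0, car = 1, truck = 0. Every label counts equally, so the three zero-precision labels pull the average below the micro value.
- Weighted precision = (1·1 + 0·1 + 1·1) / 3 ≈ 0.67, because only dog, animal, and car have nonzero support.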
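
To check the macro figure by hand, it helps to count TP and FP per label for the two samples above. The sketch below does that with plain sets; treating a never-predicted label as precision 0.0 is an assumption that mirrors scikit-learn's default behaviour, which scores such labels as 0 and emits a warning.

```python
y_true = [{"dog", "animal"}, {"car"}]
y_pred = [{"dog", "outdoor"}, {"car", "truck"}]
labels = ["dog", "animal", "outdoor", "car", "truck"]

per_label = {}
for label in labels:
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label in t)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label not in t)
    # Assumption: a label that is never predicted (TP + FP = 0) is scored as 0.0.
    per_label[label] = tp / (tp + fp) if (tp + fp) else 0.0
    print(f"{label:8s} TP={tp} FP={fp} precision={per_label[label]:.1f}")

macro = sum(per_label.values()) / len(labels)
print("macro:", macro)  # macro: 0.4
```

Only `dog` and `car` are ever predicted correctly, so two of the five per-label precisions are 1.0 and the rest are 0.0, giving a macro average of 2/5.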
5) Python Example
import numpy as np
from sklearn.metrics import precision_score

# Columns: dog, animal, outdoor, car, truck
y_true = np.array([[1, 1, 0, 0, 0],   # {dog, animal}
                   [0, 0, 0, 1, 0]])  # {car}
y_pred = np.array([[1, 0, 1, 0, 0],   # {dog, outdoor}
                   [0, 0, 0, 1, 1]])  # {car, truck}

# zero_division=0 scores labels that are never predicted (here: animal) as 0 without a warning
for average in ("micro", "macro", "weighted", "samples"):
    score = precision_score(y_true, y_pred, average=average, zero_division=0)
    print(f"{average.capitalize():<8}: {score:.3f}")
Expected output:
Micro   : 0.500
Macro   : 0.400
Weighted: 0.667
Samples : 0.500
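
The 0/1 rows passed to `precision_score` above depend on a fixed column order for the labels. A minimal sketch of that conversion from label sets follows; the helper `to_indicator` is hypothetical and written in plain Python for illustration (scikit-learn's `MultiLabelBinarizer` serves the same purpose in practice).

```python
def to_indicator(label_sets, labels):
    """Turn a list of label sets into a 0/1 matrix with one column per label."""
    return [[1 if label in label_set else 0 for label in labels]
            for label_set in label_sets]


# Column order must be identical for y_true and y_pred.
labels = ["dog", "animal", "outdoor", "car", "truck"]
y_true = to_indicator([{"dog", "animal"}, {"car"}], labels)
y_pred = to_indicator([{"dog", "outdoor"}, {"car", "truck"}], labels)

print(y_true)  # [[1, 1, 0, 0, 0], [0, 0, 0, 1, 0]]
print(y_pred)  # [[1, 0, 1, 0, 0], [0, 0, 0, 1, 1]]
```

Passing the full label list explicitly keeps labels that appear only in predictions (such as `outdoor` and `truck`) as columns, which matters for macro averaging because every column counts.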
Summary
- Multilabel Precision = fraction of predicted labels that are correct.
- Different averaging methods give different perspectives:
  - Micro: overall correctness across all labels
  - Macro: fairness across labels (treats rare ones equally)
  - Weighted: adjusts for label frequency
  - Samples: how well the model performs on each instance
