1) Definition

  • In multilabel classification, each sample can have multiple labels at once.
    • Example: an image may be tagged {dog, outdoor, animal}.
  • Precision measures the fraction of predicted labels that are correct:
    • $\text{Precision} = \frac{\text{True Positive Labels (TP)}}{\text{Predicted Labels (TP + FP)}}$
  • In words: Of all labels the model predicted, how many were actually correct?

2) Single-sample Example

True labels = {dog, animal}
Predicted labels = {dog, outdoor}

  • TP = {dog} → 1
  • FP = {outdoor} → 1
  • Precision = 1 / (1 + 1) = 0.5
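
The counting above maps directly onto set operations. The following is a minimal sketch, assuming labels are stored as Python sets. `sample_precision` is a hypothetical helper name, and returning 0.0 when nothing is predicted is a convention chosen for this sketch, not part of the definition.

```python
def sample_precision(true_labels, pred_labels):
    """Precision for one sample: |true ∩ pred| / |pred|."""
    if not pred_labels:
        return 0.0  # assumed convention: no predictions -> precision 0
    tp = len(true_labels & pred_labels)   # predicted and correct
    fp = len(pred_labels - true_labels)   # predicted but wrong
    return tp / (tp + fp)


true_labels = {"dog", "animal"}
pred_labels = {"dog", "outdoor"}
print(sample_precision(true_labels, pred_labels))  # 0.5
```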

3) Averaging across the dataset

Since we want one number for the entire dataset, different averaging strategies exist:

  1. Micro Precision
    • Pool all TP and FP across all samples.
    • Formula:
      • $\text{Precision}_{\text{micro}} = \frac{\sum TP}{\sum (TP+FP)}$
    • Treats every individual label prediction equally, so frequently predicted labels dominate the score.
  2. Macro Precision
    • Compute precision per label (one-vs-rest), then average.
    • Every label counts equally, even rare ones.
  3. Weighted Precision
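    • Like macro, but each label’s precision is weighted by its support (the number of samples whose true labels include it).
    • Reflects label frequency when classes are imbalanced; labels that never occur in the ground truth get zero weight.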
  4. Samples Precision
    • Compute precision per sample, then average over all samples.
    • Focuses on “how well the model performs on a typical instance.”
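
The four strategies can be sketched in plain Python to make the differences concrete. This is an illustrative sketch, not a reference implementation: `safe_div` and `precision_averages` are hypothetical names, the label set is assumed to be the union of all true and predicted labels, and a zero denominator is scored as 0.0 by assumption.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def precision_averages(y_true, y_pred):
    """y_true, y_pred: lists of label sets, one set per sample."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set().union(*y_true, *y_pred))

    # Per-label counts (one-vs-rest).
    tp = {l: sum(l in t and l in p for t, p in pairs) for l in labels}
    fp = {l: sum(l not in t and l in p for t, p in pairs) for l in labels}
    support = {l: sum(l in t for t in y_true) for l in labels}
    per_label = {l: safe_div(tp[l], tp[l] + fp[l]) for l in labels}

    total_tp, total_fp = sum(tp.values()), sum(fp.values())
    return {
        # Pool all TP and FP, then divide once.
        "micro": safe_div(total_tp, total_tp + total_fp),
        # Unweighted mean of per-label precision.
        "macro": safe_div(sum(per_label.values()), len(labels)),
        # Mean of per-label precision weighted by support.
        "weighted": safe_div(sum(per_label[l] * support[l] for l in labels),
                             sum(support.values())),
        # Mean of per-sample precision.
        "samples": safe_div(sum(safe_div(len(t & p), len(p)) for t, p in pairs),
                            len(pairs)),
    }


scores = precision_averages([{"a", "b"}, {"a"}], [{"a"}, {"a", "b"}])
for name, value in scores.items():
    print(f"{name:9s}{value:.3f}")
```

On this toy input, label `a` is always predicted correctly and label `b` is predicted once, wrongly, so macro (0.5) sits below micro (2/3), while the samples average (0.75) reflects that one of the two samples is predicted without any false positive.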

4) Multi-sample Example

Suppose 2 samples:

  • Sample 1:
    • True = {dog, animal}
    • Pred = {dog, outdoor} → Precision = 1/2 = 0.5
  • Sample 2:
    • True = {car}
    • Pred = {car, truck} → Precision = 1/2 = 0.5
  • Samples averaging = (0.5 + 0.5)/2 = 0.5
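  • Micro precision = ΣTP / Σ(TP + FP) = 2 / 4 = 0.5
  • Macro precision = (1 + 0 + 0 + 1 + 0) / 5 = 0.4. Per label: dog = 1, car = 1, outdoor = 0, truck = 0, and animal = 0 (never predicted, so its precision is undefined and counted as 0). Every label counts equally.
  • Weighted precision = (1·1 + 0·1 + 1·1) / 3 ≈ 0.67. Only dog, animal, and car have nonzero support, so outdoor and truck drop out.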

5) Python Example

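from sklearn.metrics import precision_score

# Columns: dog, animal, outdoor, car, truck
y_true = [[1,1,0,0,0],   # {dog, animal}
          [0,0,0,1,0]]   # {car}
y_pred = [[1,0,1,0,0],   # {dog, outdoor}
          [0,0,0,1,1]]   # {car, truck}

# "animal" is never predicted, so its precision is undefined;
# zero_division=0 scores it as 0 without emitting a warning.
print("Micro   :", precision_score(y_true, y_pred, average="micro", zero_division=0))
print("Macro   :", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Weighted:", precision_score(y_true, y_pred, average="weighted", zero_division=0))
print("Samples :", precision_score(y_true, y_pred, average="samples", zero_division=0))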

Expected output:

Micro   : 0.5
Macro   : 0.4
Weighted: 0.6666666666666666
Samples : 0.5
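
To see where the macro and weighted values come from, it can help to look at precision label by label. The sketch below is one possible way to do that, assuming scikit-learn is installed: it builds the indicator matrices from the label sets with `MultiLabelBinarizer` and requests per-label scores with `average=None`. The column order in `labels` is an assumption made for this example.

```python
from sklearn.metrics import precision_score
from sklearn.preprocessing import MultiLabelBinarizer

labels = ["dog", "animal", "outdoor", "car", "truck"]  # assumed column order
true_sets = [{"dog", "animal"}, {"car"}]
pred_sets = [{"dog", "outdoor"}, {"car", "truck"}]

# Fixing `classes` keeps the columns in the same order for both matrices.
mlb = MultiLabelBinarizer(classes=labels)
y_true = mlb.fit_transform(true_sets)
y_pred = mlb.transform(pred_sets)

# average=None returns one precision per label; zero_division=0 scores a
# label that is never predicted (here: animal) as 0 without a warning.
per_label = precision_score(y_true, y_pred, average=None, zero_division=0)
for name, p in zip(mlb.classes_, per_label):
    print(f"{name:8s} {p:.1f}")
```

Only dog and car reach a precision of 1; the other three labels score 0, which is what pulls the macro average below the micro average.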

Summary

  • Multilabel Precision = fraction of predicted labels that are correct.
  • Different averaging methods give different perspectives:
    • Micro: overall correctness across all labels
    • Macro: fairness across labels (treats rare ones equally)
    • Weighted: adjusts for label frequency
    • Samples: how well the model performs on each instance