1. F1 Score Recap (Binary Case)
- F1 score = harmonic mean of precision and recall:
$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$
- Balances both precision (low false positives) and recall (low false negatives).
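As a quick illustration, here is a minimal Python sketch of binary F1 computed from raw counts (the function name and the counts are illustrative, not from any particular library):

```python
def f1_score_binary(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from raw TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score_binary(tp=40, fp=10, fn=10))  # precision = recall = 0.8, so F1 = 0.8
```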
2. Multiclass Extension
For K classes, there are two main averaging strategies: macro and micro.
- Macro F1: Compute F1 per class (one-vs-rest), then average the per-class scores equally.
- Micro F1: Pool all predictions across classes → compute global TP, FP, FN → then calculate F1.
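Both strategies correspond to the `average` parameter of scikit-learn's `f1_score`; a minimal sketch with toy labels:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

# Macro: F1 per class, then an unweighted mean over classes.
print(f1_score(y_true, y_pred, average="macro"))  # ~0.656
# Micro: pool TP/FP/FN over all classes, then one global F1.
print(f1_score(y_true, y_pred, average="micro"))  # ~0.667
```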
3. Definition of Micro F1
Step 1: Compute global precision and global recall:
$Precision_{micro} = \frac{\sum TP_i}{\sum (TP_i + FP_i)}$
$Recall_{micro} = \frac{\sum TP_i}{\sum (TP_i + FN_i)}$
Step 2: Compute F1 using those:
$F1_{micro} = \frac{2 \times Precision_{micro} \times Recall_{micro}}{Precision_{micro} + Recall_{micro}}$
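The two steps translate directly into code; a minimal sketch assuming per-class TP/FP/FN counts are already available (the helper name is illustrative):

```python
def micro_f1(tp, fp, fn):
    """Micro F1 from per-class TP/FP/FN count lists."""
    tp_sum, fp_sum, fn_sum = sum(tp), sum(fp), sum(fn)
    precision = tp_sum / (tp_sum + fp_sum)  # Step 1: global precision
    recall = tp_sum / (tp_sum + fn_sum)     # Step 1: global recall
    return 2 * precision * recall / (precision + recall)  # Step 2
```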
4. Key Property
- In multiclass (single-label) classification, Micro Precision = Micro Recall = Micro F1.
- Reason: When each sample has exactly one true label and exactly one predicted label, every misclassified sample counts as one FP (for the predicted class) and one FN (for the true class), so $\sum FP_i = \sum FN_i$. The micro precision and micro recall denominators are therefore equal, which makes Micro Precision, Micro Recall, and Micro F1 all collapse to the same value, equal to overall accuracy.
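A quick check of this property with scikit-learn (labels are made up; all four calls print the same number):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]  # exactly one predicted label per sample

# All four coincide in single-label multiclass: 0.625 here.
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="micro"))
print(recall_score(y_true, y_pred, average="micro"))
print(f1_score(y_true, y_pred, average="micro"))
```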
5. Example
Suppose we have 3 classes (A, B, C):
- TP (true positives): A=40, B=30, C=10
- FP (false positives): A=10, B=20, C=20
- FN (false negatives): A=10, B=20, C=30

(Note: $\sum FP_i = 50 \neq \sum FN_i = 60$ here, so these counts cannot come from the single-label case of Section 4; think of them as coming from a multilabel or per-class OvR evaluation, which is why micro precision and micro recall differ below.)
Micro Precision:
$\frac{40+30+10}{(40+10)+(30+20)+(10+20)} = \frac{80}{130} \approx 0.615$
Micro Recall:
$\frac{40+30+10}{(40+10)+(30+20)+(10+30)} = \frac{80}{140} \approx 0.571$
Micro F1:
$\frac{2 \times 0.615 \times 0.571}{0.615 + 0.571} \approx 0.592$
So here: Micro F1 ≈ 0.59.
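Plugging the counts above into the formulas reproduces these numbers (the exact Micro F1 is $160/270 \approx 0.593$; the 0.592 above comes from rounding precision and recall before combining them):

```python
tp = [40, 30, 10]  # classes A, B, C
fp = [10, 20, 20]
fn = [10, 20, 30]

precision = sum(tp) / (sum(tp) + sum(fp))  # 80 / 130 ≈ 0.615
recall = sum(tp) / (sum(tp) + sum(fn))     # 80 / 140 ≈ 0.571
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # ≈ 0.615, 0.571, 0.593
```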
6. When to Use
- Micro F1 is useful when you want to measure overall per-sample performance; because every prediction counts equally, majority classes carry more weight (see the sketch after this list).
- Macro F1 is better if you want fairness across classes, including minority ones.
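A small sketch of that trade-off on an imbalanced toy dataset, where a classifier that ignores the minority class still scores high on Micro F1:

```python
from sklearn.metrics import f1_score

y_true = [0] * 8 + [1] * 2  # 80% class 0, 20% class 1
y_pred = [0] * 10           # classifier always predicts the majority class

# Micro F1 equals accuracy here and looks fine: 0.8.
print(f1_score(y_true, y_pred, average="micro"))
# Macro F1 exposes the minority-class failure: (0.889 + 0.0) / 2 ≈ 0.444.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
```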
Summary
- Micro F1 = F1 score computed from global TP, FP, FN across all classes.
- In multiclass (single-label), Micro Precision = Micro Recall = Micro F1.
- On imbalanced datasets, Micro F1 is dominated by majority-class performance; choose it when overall per-sample performance is what matters.
