1) General Idea
- In multi-class classification, we often need one overall metric.
- Weighted averaging = compute the metric for each class separately, then take a weighted average based on class frequency (support).
Formula:
$M_{\text{weighted}} = \frac{\sum_{i=1}^{C} n_i \cdot M_i}{\sum_{i=1}^{C} n_i}$
Where:
- $C$ = number of classes
- $n_i$ = number of samples whose true label is class $i$ (the support of class $i$)
- $M_i$ = metric for class $i$ (precision, recall, F1, etc.)
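The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `weighted_average` and the two-class numbers in the usage lines are made up for the example.

```python
def weighted_average(metrics, supports):
    """Support-weighted mean of per-class metric values.

    metrics  -- per-class scores M_i (precision, recall, F1, ...)
    supports -- per-class true sample counts n_i, in the same order
    """
    if len(metrics) != len(supports):
        raise ValueError("metrics and supports must have the same length")
    total = sum(supports)
    if total == 0:
        raise ValueError("total support must be positive")
    # Numerator: sum of n_i * M_i; denominator: sum of n_i
    return sum(n * m for n, m in zip(supports, metrics)) / total


# Hypothetical data: 90 samples with recall 0.8, 10 samples with recall 0.5
score = weighted_average([0.8, 0.5], [90, 10])
print(round(score, 3))  # 0.77
```

The larger class pulls the result toward its own score: (90 · 0.8 + 10 · 0.5) / 100 = 0.77, compared with an unweighted mean of 0.65.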
2) Why Weighted Averaging
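- Macro averaging: the unweighted mean of per-class scores, so every class counts equally, even rare ones.
- Micro averaging: pools true positives, false positives, and false negatives over all samples, so every sample counts equally and large classes dominate.
- Weighted averaging: a middle ground → per-class scores are computed as in macro averaging, but each class contributes in proportion to its size.
It is useful when classes are imbalanced and you do not want tiny classes to have an outsized effect on the score; the trade-off is that weak performance on rare classes has little effect on it.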
3) Example
Suppose 3 classes (A, B, C):
- Support (true samples):
  - A: 1000
  - B: 100
  - C: 10
- F1-scores per class:
  - A: 0.95
  - B: 0.60
  - C: 0.20
- Macro F1 = (0.95 + 0.60 + 0.20) / 3 = 0.583
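- Micro F1 = accuracy (in single-label multi-class classification). It cannot be derived from the per-class F1 scores alone, but it is dominated by A.
- Weighted F1 =
$\frac{1000 \cdot 0.95 + 100 \cdot 0.60 + 10 \cdot 0.20}{1110} = \frac{1012}{1110} \approx 0.912$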
Weighted F1 is closer to class A’s score because A is the majority.
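To see how differently the two averages react to a minority class, the sketch below reuses the supports and F1 scores from this example and then sets class C's F1 to 0. That second scenario is a hypothetical added for illustration; the helper names `macro` and `weighted` are likewise just for this sketch.

```python
# Supports and per-class F1 scores from the example above
support = {"A": 1000, "B": 100, "C": 10}
f1 = {"A": 0.95, "B": 0.60, "C": 0.20}


def macro(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


def weighted(scores, support):
    """Mean of the per-class scores, weighted by class support."""
    total = sum(support.values())
    return sum(support[c] * scores[c] for c in scores) / total


baseline = (macro(f1), weighted(f1, support))

# Hypothetical scenario: the model fails completely on class C
f1_c_zero = dict(f1, C=0.0)
degraded = (macro(f1_c_zero), weighted(f1_c_zero, support))

print(f"macro drop:    {baseline[0] - degraded[0]:.4f}")
print(f"weighted drop: {baseline[1] - degraded[1]:.4f}")
```

Macro F1 falls by 0.20 / 3 ≈ 0.067, while weighted F1 falls by only 10 · 0.20 / 1110 ≈ 0.002, so a complete failure on the rarest class is barely visible in the weighted score.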
4) When to Use
- Macro averaging → fairness: each class matters equally (good for minority monitoring).
- Micro averaging → overall effectiveness: majority classes dominate.
- Weighted averaging → practical balance: reflects the dataset's class distribution; minority classes still count, but only in proportion to their support.
5) In scikit-learn
from sklearn.metrics import f1_score
# y_true, y_pred: 1-D sequences of true and predicted class labels
f1_score(y_true, y_pred, average="macro")     # equal weight per class
f1_score(y_true, y_pred, average="micro")     # equal weight per sample
f1_score(y_true, y_pred, average="weighted")  # weight by support
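To make the three averaging modes concrete, here is a plain-Python sketch that computes them from label lists by hand. It is not scikit-learn's implementation and ignores its edge-case handling; the function names `per_class_f1` and `averaged_f1` and the toy labels are assumptions for illustration, and the micro shortcut assumes single-label multi-class data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} computed from true and predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # a true sample of class t was missed
            fp[p] += 1  # class p was predicted wrongly
    support = Counter(y_true)
    result = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1 = 2 * tp[c] / denom if denom else 0.0
        result[c] = (f1, support[c])
    return result


def averaged_f1(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label predictions."""
    stats = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in stats.values())
    macro = sum(f1 for f1, _ in stats.values()) / len(stats)
    weighted = sum(f1 * n for f1, n in stats.values()) / total
    # With exactly one label per sample, micro F1 reduces to accuracy
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return macro, micro, weighted


# Toy data: 8 samples of class A, 2 of class B; one B is misclassified as A
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
macro, micro, weighted = averaged_f1(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
```

On this toy data the per-class F1 scores are 16/17 for A and 2/3 for B, so the macro average is pulled down most by the weak minority class, while the weighted and micro values stay close to the majority-class score.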
Summary
- Weighted averaging = per-class metrics averaged according to class size.
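- It sits between macro and micro in spirit: built from per-class scores like macro, but weighted by sample counts, so large classes dominate as they do in micro.
- Useful for imbalanced datasets when you want the final metric to reflect the true class distribution; pair it with per-class or macro scores if minority-class performance matters.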
