1) General Idea

  • In multi-class classification, we often need one overall metric.
  • Weighted averaging = compute the metric for each class separately, then take a weighted average based on class frequency (support).

Formula:

$M_{weighted} = \frac{\sum_{i=1}^{C} (n_i \cdot M_i)}{\sum_{i=1}^{C} n_i}$

Where:

  • $C$ = number of classes
  • $n_i$ = number of true samples in class $i$
  • $M_i$ = metric for class $i$ (precision, recall, F1, etc.)
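
The formula translates almost directly into code. A minimal sketch, assuming the per-class metric values and supports are already known; the function name `weighted_average` and the numbers in the usage line are hypothetical and chosen only for illustration:

```python
def weighted_average(metrics, supports):
    """Support-weighted mean of per-class metric values.

    metrics[i]  -> M_i, the metric for class i
    supports[i] -> n_i, the number of true samples in class i
    """
    if len(metrics) != len(supports):
        raise ValueError("metrics and supports must have the same length")
    total = sum(supports)
    if total == 0:
        raise ValueError("total support must be positive")
    # Numerator: sum of n_i * M_i; denominator: sum of n_i.
    return sum(n * m for n, m in zip(supports, metrics)) / total


# Illustrative values only: two classes with supports 30 and 10.
score = weighted_average([0.9, 0.5], [30, 10])
print(f"{score:.3f}")  # 0.800
```

With equal supports the result reduces to the plain (macro) mean of the per-class values.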

2) Why Weighted Averaging

  • Macro averaging: treats all classes equally, even rare ones.
  • Micro averaging: treats all individual samples equally, dominated by large classes.
  • Weighted averaging: a middle ground → per-class scores are computed as in macro averaging, but each class contributes in proportion to its support.

It is useful when classes are imbalanced and you don’t want a few tiny classes to swing the overall score as much as they would under macro averaging.
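
The contrast between the three schemes can be sketched in plain Python. The function name `f1_averages` and the toy labels are assumptions made for illustration, and the code assumes single-label classification (each sample has exactly one true and one predicted class):

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the sample is not p
            fn[t] += 1  # the true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    support = Counter(y_true)  # n_i: true samples per class

    macro = sum(per_class.values()) / len(classes)                 # equal weight per class
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # pooled counts
    weighted = sum(support[c] * per_class[c] for c in classes) / len(y_true)
    return macro, micro, weighted


# Toy data: 8 samples of class A (all correct), 2 of class B (one correct).
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
macro, micro, weighted = f1_averages(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
# macro=0.804 micro=0.900 weighted=0.886
```

On this toy data the weighted score falls between macro and micro, but that ordering is not guaranteed in general.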


3) Example

Suppose 3 classes (A, B, C):

  • Support (true samples):
    • A: 1000
    • B: 100
    • C: 10
  • F1-scores per class:
    • A: 0.95
    • B: 0.60
    • C: 0.20
  • Macro F1 = (0.95 + 0.60 + 0.20) / 3 = 0.583
  • Micro F1 = accuracy (in the single-label setting), so it is dominated by A.
  • Weighted F1 =

$\frac{1000 \cdot 0.95 + 100 \cdot 0.60 + 10 \cdot 0.20}{1110} = \frac{1012}{1110} \approx 0.912$

Weighted F1 is closer to class A’s score because A is the majority.
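
Splitting the weighted sum into per-class terms makes the majority effect explicit. With weights $w_i = n_i / \sum_j n_j$ (the symbol $w_i$ is introduced here only for illustration):

$w_A = \frac{1000}{1110} \approx 0.901, \quad w_B = \frac{100}{1110} \approx 0.090, \quad w_C = \frac{10}{1110} \approx 0.009$

$\underbrace{\frac{950}{1110}}_{A \,\approx\, 0.856} + \underbrace{\frac{60}{1110}}_{B \,\approx\, 0.054} + \underbrace{\frac{2}{1110}}_{C \,\approx\, 0.002} = \frac{1012}{1110} \approx 0.912$

Even if class C's F1 rose from 0.20 to 1.00, the weighted F1 would increase by only $\frac{10 \cdot 0.80}{1110} \approx 0.007$, whereas the macro F1 would increase by $\frac{0.80}{3} \approx 0.267$.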


4) When to Use

  • Macro averaging → fairness: each class matters equally (good for minority monitoring).
  • Micro averaging → overall effectiveness: majority classes dominate.
  • Weighted averaging → practical balance: reflects the dataset's class distribution; minority classes still count, but only in proportion to their support.

5) In scikit-learn

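from sklearn.metrics import f1_score

# Toy labels for illustration only; substitute your own y_true / y_pred.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2]

f1_score(y_true, y_pred, average="macro")    # equal weight per class
f1_score(y_true, y_pred, average="micro")    # equal weight per sample
f1_score(y_true, y_pred, average="weighted") # weight by support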
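
The `average="weighted"` result can be reproduced by hand from the per-class scores, which is a quick way to confirm what the option does. A minimal sketch, assuming integer class labels and toy data invented for illustration; it also checks that micro F1 matches accuracy in this single-label setting:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical toy labels, chosen only for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]
labels = [0, 1, 2]

# average=None returns one F1 score per class, in the order given by `labels`.
per_class = f1_score(y_true, y_pred, labels=labels, average=None)

# Support n_i: number of true samples in each class.
support = np.bincount(y_true, minlength=len(labels))

manual = np.average(per_class, weights=support)
builtin = f1_score(y_true, y_pred, labels=labels, average="weighted")
print(np.isclose(manual, builtin))  # True

micro = f1_score(y_true, y_pred, average="micro")
print(np.isclose(micro, accuracy_score(y_true, y_pred)))  # True
```

Passing `average=None` is also a practical way to inspect per-class scores before deciding which averaging scheme to report.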

Summary

  • Weighted averaging = per-class metrics averaged according to class size.
  • It’s a compromise between macro (equal class weight) and micro (equal sample weight).
  • Useful for imbalanced datasets when you want the final metric to reflect the true class distribution.