Definition
In machine learning and statistics, reweighting means assigning different weights to samples, features, or loss terms so that training (or evaluation) better reflects each item’s intended importance, a fairness goal, or the target data distribution.
- It doesn’t change the raw data, but it changes how much influence each part has during training or evaluation.
Where It’s Used
- Class Imbalance
- If positives are rare (fraud, disease detection), reweight positives higher so the model doesn’t ignore them.
- Example: Give fraud cases weight = 10, normal cases = 1.
- Domain Adaptation / Covariate Shift
- Training data distribution ≠ production distribution.
- Reweight samples to match the target distribution.
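- Example: Old dataset has 80% desktop users, but production is 50% mobile → weight each group by (production share / training share). Assuming only desktop and mobile users, that is 0.5 / 0.8 = 0.625 for desktop and 0.5 / 0.2 = 2.5 for mobile.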
- Fairness
- Some subgroups may be underrepresented.
- Reweight to ensure equal contribution from protected groups (gender, race, region).
- Loss Reweighting
- Adjust contribution of loss terms (e.g., in multi-task learning, adversarial training).
- Importance Sampling
- When samples are drawn from a biased sampling distribution $q$ instead of the target distribution $p$, weight each sample by $p(x)/q(x)$ so that estimates are unbiased for $p$.
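The desktop/mobile example above can be sketched in a few lines. This is a minimal illustration, assuming the shift is fully described by a single categorical attribute (device type) and that the production mix is known; the function name `group_importance_weights` and the toy data are hypothetical.

```python
from collections import Counter

def group_importance_weights(source_groups, target_share):
    """Return one weight per sample: target share / source share of its group."""
    n = len(source_groups)
    counts = Counter(source_groups)
    source_share = {g: c / n for g, c in counts.items()}
    return [target_share[g] / source_share[g] for g in source_groups]

# Hypothetical training sample: 8 desktop users, 2 mobile users (80% / 20%).
groups = ["desktop"] * 8 + ["mobile"] * 2
# Assumed production mix: 50% desktop, 50% mobile.
target = {"desktop": 0.5, "mobile": 0.5}

weights = group_importance_weights(groups, target)
# Desktop samples get 0.5 / 0.8, mobile samples get 0.5 / 0.2.

# After reweighting, the weighted share of mobile users matches the target mix.
mobile_share = sum(w for w, g in zip(weights, groups) if g == "mobile") / sum(weights)
```

The same ratio (target probability over source probability) is what importance sampling uses; here it is simply estimated from group frequencies.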
Mathematical View
The standard (unweighted) loss is:
$L = \frac{1}{N} \sum_{i=1}^N \ell(f(x_i), y_i)$
With reweighting:
$L = \frac{1}{N} \sum_{i=1}^N w_i \, \ell(f(x_i), y_i)$
- $w_i$ = weight assigned to sample $i$.
- Larger $w_i$ = more influence in training; $w_i = 1$ for all $i$ recovers the unweighted loss.
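As a rough sketch of the weighted formula above, the snippet below computes it for binary cross-entropy. The choice of loss, the helper names, and the toy predictions are assumptions for illustration. It also includes a common variant that divides by $\sum_i w_i$ instead of $N$, which keeps the loss scale independent of the weight scale.

```python
import math

def log_loss(p, y):
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def weighted_loss(probs, labels, weights, normalize_by_weight_sum=False):
    """Weighted average of per-sample losses.

    Default: (1/N) * sum_i w_i * loss_i, matching the formula above.
    With normalize_by_weight_sum=True, divide by sum_i w_i instead of N.
    """
    total = sum(w * log_loss(p, y) for p, y, w in zip(probs, labels, weights))
    denom = sum(weights) if normalize_by_weight_sum else len(labels)
    return total / denom

# Hypothetical predictions: the single positive (last sample) is scored poorly.
probs = [0.1, 0.2, 0.1, 0.3]
labels = [0, 0, 0, 1]

unweighted = weighted_loss(probs, labels, [1, 1, 1, 1])
# Upweighting the positive makes its error dominate the loss.
reweighted = weighted_loss(probs, labels, [1, 1, 1, 10])
```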
Examples
- Imbalanced Fraud Dataset
- Dataset: 1% fraud, 99% normal.
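- Without reweighting → the model can reach 99% accuracy by always predicting “not fraud”, so it has little incentive to detect fraud.
- With reweighting (fraud weight = 99, normal weight = 1) → both classes carry equal total weight, so fraud errors count more. This typically improves recall, often at some cost in precision.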
- Fairness Reweighting
- Loan training data is 80% male, 20% female applicants.
- Reweight female applicants higher (e.g., weight 4 vs. 1) so both groups contribute equally to the training loss.
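The fraud weights above (99 vs. 1) are inverse class frequencies scaled so the majority class has weight 1. A minimal sketch of that calculation follows, on a hypothetical 1,000-row label list; the function name is illustrative, not a library API.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by (count of largest class) / (count of that class)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / c for cls, c in counts.items()}

# Hypothetical dataset matching the example: 1% fraud (1), 99% normal (0).
labels = [1] * 10 + [0] * 990

class_weight = inverse_frequency_weights(labels)
sample_weight = [class_weight[y] for y in labels]

# Total weight carried by each class is now equal.
fraud_total = sum(w for w, y in zip(sample_weight, labels) if y == 1)
normal_total = sum(w for w, y in zip(sample_weight, labels) if y == 0)
```

The same function works for the fairness example if group membership is passed in place of class labels. Many libraries accept such weights directly, for instance through a per-sample weight argument at fit time.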
Benefits
- Handles imbalance and distribution mismatch.
- Can improve fairness and robustness.
- Keeps data intact (no need to oversample/undersample).
Challenges
- If weights are extreme, the model may overfit to the few heavily weighted (minority) samples.
- Choosing weights is tricky: treat them as hyperparameters and check them on a validation set.
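One way to gauge how extreme a set of weights is, offered here as an assumption rather than something the notes above prescribe, is Kish's effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$. The sketch below applies it to the 99-vs-1 fraud weights and shows clipping as one possible mitigation; the cap of 10 is arbitrary.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

def clip_weights(weights, max_weight):
    """Cap each weight at max_weight to limit the influence of any one sample."""
    return [min(w, max_weight) for w in weights]

uniform = [1.0] * 1000
# Hypothetical fraud weighting: 10 fraud rows at weight 99, 990 normal rows at weight 1.
skewed = [99.0] * 10 + [1.0] * 990

ess_uniform = effective_sample_size(uniform)   # equals the sample count
ess_skewed = effective_sample_size(skewed)     # far smaller than 1000
ess_clipped = effective_sample_size(clip_weights(skewed, 10.0))
```

A low effective sample size suggests that a handful of samples dominate the loss, which is the overfitting risk noted above. Clipping trades some of the intended correction for stability.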
Summary
Reweighting = adjusting the influence of samples, features, or losses using weights to handle imbalance, drift, or fairness issues.
It helps the model reflect the intended importance of data points rather than just their raw counts.
