1) Definition

  • Label drift occurs when the distribution of the target variable (labels) changes between training and production (or across time).
  • Formally:
    • $p_{\text{train}}(y) \;\neq\; p_{\text{prod}}(y)$
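  • Importantly, the class-conditional distribution $p(x \mid y)$ is usually assumed to stay the same (the “label shift” or “prior probability shift” setting); it’s the marginal distribution of $y$ that has shifted. Through Bayes’ rule, that shift also changes $p(y \mid x)$.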

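In contrast:

  • Covariate drift: the input distribution $p(x)$ changes while $p(y \mid x)$ stays the same.
  • Concept drift: the relationship $p(y \mid x)$ itself changes, so the same inputs now map to different labels.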

2) Why it matters

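  • Model evaluation bias: Offline test metrics (computed on the training distribution) may not reflect deployed performance, because metrics such as accuracy and precision depend on the class mix.
  • Thresholds / calibration: Class priors change → predicted probabilities become miscalibrated, and thresholds tuned on the old priors are no longer optimal.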
  • Resource planning: In ops, drifted labels mean reality is changing (e.g., more fraud, higher churn).
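
The calibration point can be made precise under the usual label-shift assumption that $p(x \mid y)$ is the same in training and production. Substituting $p(x \mid y) = p_{\text{train}}(y \mid x)\, p_{\text{train}}(x) / p_{\text{train}}(y)$ into Bayes’ rule for the production setting and cancelling $p_{\text{train}}(x)$ gives:

$$
p_{\text{prod}}(y \mid x)
= \frac{p(x \mid y)\, p_{\text{prod}}(y)}{\sum_{y'} p(x \mid y')\, p_{\text{prod}}(y')}
= \frac{p_{\text{train}}(y \mid x)\, \dfrac{p_{\text{prod}}(y)}{p_{\text{train}}(y)}}{\sum_{y'} p_{\text{train}}(y' \mid x)\, \dfrac{p_{\text{prod}}(y')}{p_{\text{train}}(y')}}
$$

So whenever the priors differ, probabilities from a model calibrated on training data are mis-weighted in production by the factor $p_{\text{prod}}(y)/p_{\text{train}}(y)$ (before renormalization).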

3) Examples

Binary classification

  • Fraud detection: In training, 2% of transactions are fraud. In production, suddenly 5% are fraud.
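  • If your decision threshold was tuned for a 2% base rate, the model’s scores are now miscalibrated and the precision/recall trade-off you chose no longer holds: alert volume, precision, and the number of missed fraud cases all change.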
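
To see why, here is a minimal Python sketch, assuming label shift so that the true-positive rate (TPR) and false-positive rate (FPR) at a fixed threshold stay the same while only the fraud rate moves. The operating point (90% TPR, 5% FPR) and the function name `precision_at_base_rate` are hypothetical.

```python
def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of a fixed-threshold classifier at a given positive (fraud) rate.

    Assumes label shift: TPR and FPR measured on training data still hold,
    because the score distribution within each class is unchanged.
    """
    true_positives = tpr * base_rate            # share of all traffic correctly flagged
    false_positives = fpr * (1.0 - base_rate)   # share of all traffic wrongly flagged
    return true_positives / (true_positives + false_positives)


# Hypothetical operating point tuned on training data (2% fraud).
TPR, FPR = 0.90, 0.05
for rate in (0.02, 0.05):
    print(f"fraud rate {rate:.0%}: precision {precision_at_base_rate(TPR, FPR, rate):.2f}")
# fraud rate 2%: precision 0.27
# fraud rate 5%: precision 0.49
```

With these assumed numbers the share of flagged traffic grows (6.7% → 9.25%), precision rises, and missed fraud grows too (10% of a larger fraud population), so the offline trade-off no longer describes production.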

Multi-class classification

  • Customer support tickets: Distribution shifts from 70% “billing issues” to 40% billing + 60% “technical issues”.
  • A model optimized on past priors underperforms because it “expects” more billing cases.
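
A minimal Python sketch of why the old priors matter, assuming label shift and simplifying the example to two classes, with the remaining 30% of past tickets treated as “technical issues” (an assumption; the text gives only the past billing share). The function name `adjust_to_new_priors` is hypothetical; it applies the Bayes-rule prior correction $p_{\text{new}}(y \mid x) \propto p_{\text{train}}(y \mid x)\, p_{\text{new}}(y) / p_{\text{train}}(y)$.

```python
def adjust_to_new_priors(posteriors, train_priors, new_priors):
    """Re-weight class probabilities from a model trained under old priors.

    Assumes label shift: p(x | y) is unchanged, so each class probability is
    scaled by p_new(y) / p_train(y) and the result is renormalized to sum to 1.
    """
    scaled = [p * new / old for p, old, new in zip(posteriors, train_priors, new_priors)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical two-class version of the ticket example: [billing, technical].
train_priors = [0.7, 0.3]
new_priors = [0.4, 0.6]

ticket = [0.6, 0.4]  # the trained model leans toward "billing"
adjusted = adjust_to_new_priors(ticket, train_priors, new_priors)
print([round(p, 3) for p in adjusted])  # [0.3, 0.7]
```

A ticket the trained model scores as 60% billing becomes 30% billing once the new priors are applied, so the predicted class flips; a model left on the old priors keeps making the billing call.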

Regression

  • Energy demand forecasting: Average demand rises by 20% due to a cold winter.
  • Because the model’s predictions are anchored to the old demand distribution, it will systematically underpredict even if its features are still informative.
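
One simple way to surface this kind of systematic underprediction, sketched in Python with hypothetical numbers: compare the mean of recent actuals with the mean of the model’s forecasts over the same window. The function name `underprediction_ratio` and the demand values are illustrative only.

```python
from statistics import mean


def underprediction_ratio(actuals, predictions):
    """Return mean(actuals) / mean(predictions) - 1 over a monitoring window.

    Positive values mean the model systematically underpredicts; 0.2 means
    actual demand is running 20% above what the model forecast.
    """
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("need equally sized, non-empty windows")
    return mean(actuals) / mean(predictions) - 1.0


# Hypothetical window: forecasts anchored to last year's level, actual demand 20% higher.
forecasts = [100.0, 110.0, 120.0]
actual_demand = [120.0, 132.0, 144.0]
print(f"{underprediction_ratio(actual_demand, forecasts):+.1%}")  # +20.0%
```

In practice this would be tracked over a rolling window with an alert when it stays above a tolerance; both the window length and the tolerance are use-case choices.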

4) How to detect label drift

In real-time production, you often don’t know labels immediately → label drift monitoring usually lags (delayed ground truth).
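
Because labels lag, one option is to estimate the production priors from the model’s own predictions on unlabeled traffic. Below is a minimal Python sketch of the EM-based prior estimation described by Saerens et al. (2002), assuming label shift ($p(x \mid y)$ unchanged) and a model that was well calibrated on training data. The function name `estimate_priors_em` and the one-feature toy setup are hypothetical.

```python
def estimate_priors_em(posteriors, train_priors, tol=1e-8, max_iter=1000):
    """Estimate production class priors from unlabeled model outputs via EM.

    posteriors: one list per production example, holding p_train(y | x) from the
    deployed model. Assumes label shift and a model calibrated on training data.
    """
    priors = list(train_priors)
    n = len(posteriors)
    for _ in range(max_iter):
        # E-step: re-weight each posterior by current_prior / train_prior, renormalize.
        adjusted = []
        for post in posteriors:
            scaled = [p * q / t for p, q, t in zip(post, priors, train_priors)]
            total = sum(scaled)
            adjusted.append([s / total for s in scaled])
        # M-step: the new prior estimate is the average adjusted posterior.
        new_priors = [sum(a[k] for a in adjusted) / n for k in range(len(priors))]
        if max(abs(a - b) for a, b in zip(new_priors, priors)) < tol:
            return new_priors
        priors = new_priors
    return priors


# Hypothetical toy setup: a single binary feature x in {"a", "b"} with fixed p(x | y).
P_X_GIVEN_Y = {"a": [0.9, 0.3], "b": [0.1, 0.7]}  # [non-churn, churn]
TRAIN_PRIORS = [0.8, 0.2]


def train_posterior(x):
    """p_train(y | x) by Bayes' rule under the training priors."""
    joint = [lik * prior for lik, prior in zip(P_X_GIVEN_Y[x], TRAIN_PRIORS)]
    return [j / sum(joint) for j in joint]


# Unlabeled production batch generated with 30% churn: p(a) = 0.72, p(b) = 0.28.
batch = [train_posterior("a")] * 72 + [train_posterior("b")] * 28
estimate = estimate_priors_em(batch, TRAIN_PRIORS)
print([round(p, 3) for p in estimate])  # [0.7, 0.3]
```

Simply averaging the unadjusted predictions would give about 23% churn here, underestimating the shift, because the model’s outputs are pulled toward the training prior. The EM estimate is only as good as the label-shift and calibration assumptions, so it complements rather than replaces checks on delayed labels.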


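5) Mitigation strategies

  • Recalibration: adjust predicted probabilities and re-tune decision thresholds to the new class priors.
  • Reweighting: weight training or evaluation examples by $p_{\text{prod}}(y) / p_{\text{train}}(y)$ so they reflect the production class mix.
  • Retraining: retrain on recent labeled data once enough delayed labels have arrived.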
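
A minimal Python sketch of reweighting under the label-shift assumption, using the churn priors from the example calculation (20% → 30%). The per-class recall values and the function names are hypothetical. The same per-class weights can be mapped onto examples and used as training weights, e.g. through the `sample_weight` argument that many scikit-learn estimators accept in `fit`.

```python
def class_weights(train_priors, prod_priors):
    """Importance weight per class: w(y) = p_prod(y) / p_train(y)."""
    return {y: prod_priors[y] / train_priors[y] for y in train_priors}


def expected_accuracy(per_class_recall, priors):
    """Accuracy implied by per-class recall under a given class mix.

    Under label shift, recall within each class is unchanged, so once the new
    priors are known or estimated, accuracy is sum_y p(y) * recall(y).
    """
    return sum(priors[y] * per_class_recall[y] for y in priors)


train = {"churn": 0.20, "non_churn": 0.80}
prod = {"churn": 0.30, "non_churn": 0.70}
recall = {"churn": 0.60, "non_churn": 0.95}  # hypothetical offline per-class recall

weights = class_weights(train, prod)
print({y: round(w, 3) for y, w in weights.items()})  # {'churn': 1.5, 'non_churn': 0.875}
print(f"offline accuracy: {expected_accuracy(recall, train):.3f}")  # 0.880
print(f"expected in prod: {expected_accuracy(recall, prod):.3f}")   # 0.845
```

Here the weak class (churn) becomes more common, so accuracy drops even though the model itself has not changed.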

6) Example Calculation

Training distribution (churn model):

  • Churn = 20%
  • Non-churn = 80%

Production distribution (last month):

  • Churn = 30%
  • Non-churn = 70%

Label drift (absolute change in priors):

$|0.30 - 0.20| = 0.10 \quad \Rightarrow \quad 10\text{ percentage-point shift}$ (a 50% relative increase in the churn rate)

That shift alone can cause model miscalibration and deteriorating metrics.
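
The same numbers can be turned into standard drift statistics. A minimal Python sketch, assuming a hypothetical sample of 1,000 production customers for the counts (the text gives only proportions): a Pearson chi-square goodness-of-fit statistic against the training proportions, and the population stability index (PSI). The function names are illustrative.

```python
import math


def chi_square_statistic(observed_counts, expected_props):
    """Pearson chi-square goodness-of-fit statistic against reference proportions."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, expected_props))


def population_stability_index(reference_props, current_props):
    """PSI = sum over classes of (current - reference) * ln(current / reference)."""
    return sum((c - r) * math.log(c / r) for r, c in zip(reference_props, current_props))


train_props = [0.20, 0.80]   # [churn, non-churn]
prod_counts = [300, 700]     # hypothetical: 1,000 customers last month
prod_props = [c / sum(prod_counts) for c in prod_counts]

chi2 = chi_square_statistic(prod_counts, train_props)
psi = population_stability_index(train_props, prod_props)
print(f"chi-square = {chi2:.1f} (5% critical value for 1 degree of freedom is about 3.84)")
print(f"PSI = {psi:.3f}")  # PSI = 0.054
```

With 1,000 customers the chi-square statistic (62.5) far exceeds the critical value, so the shift is very unlikely to be sampling noise. The chi-square statistic grows with sample size, whereas PSI depends only on the proportions; PSI alert thresholds are conventions and worth setting per use case.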


Summary

  • Label drift = shift in target variable distribution.
  • Impacts calibration, metrics, business decisions.
  • Must be detected via delayed labels and mitigated via recalibration, reweighting, retraining.