Definition
Drift detection is the process of monitoring and identifying changes in data distributions or model behavior over time that can harm model performance.
- Real-world data often evolves (seasonality, new users, business changes).
- Drift detection ensures the model stays valid and triggers retraining or recalibration when needed.
Types of Drift
- Covariate Drift (Feature Drift)
- Input feature distributions change.
- Example: previously 80% desktop users, now 60% mobile.
- Label Drift (Prior Probability Shift)
- Distribution of target labels changes.
- Example: fraud rate rises from 1% → 3%.
- Concept Drift
- Relationship between features and labels changes.
- Example: same features no longer predict fraud well because fraud tactics evolved.
How Drift is Detected
- Statistical Tests
- Kolmogorov–Smirnov (KS test), Chi-Square test → compare old vs. new distributions.
- Cramér’s V, PSI (Population Stability Index).
- Distance Measures
- KL divergence, Jensen–Shannon divergence, Wasserstein distance, MMD (Maximum Mean Discrepancy), Energy distance.
- Classifier Two-Sample Tests
- Train a classifier to distinguish “old vs. new” samples.
- If it performs well → drift detected.
- Monitoring Model Performance (Lagging Indicator)
- Track loss, AUC, accuracy over time.
- A sudden drop signals possible drift.
Practical Approaches
- Set baselines → distributions from training data.
- Compare rolling windows → current vs. historical data (last 24h vs. last 4 weeks).
- Alert thresholds → drift > 0.2 PSI → trigger investigation.
- Integrate with pipelines → retrain or recalibrate automatically.
Example
- Credit scoring model trained on 2022 data.
- In 2025, income distribution shifts (more gig economy workers).
- Drift detection system (using PSI) signals covariate drift.
- Model monitor flags possible concept drift → retraining triggered.
Why Drift Detection Matters
- Prevents silent model degradation.
- Ensures fairness (drift can affect subgroups disproportionately).
- Supports compliance (financial, medical AI regulations require monitoring).
Summary
Drift detection = monitoring changes in data or model behavior that can degrade performance.
- Types: covariate drift, label drift, concept drift.
- Techniques: statistical tests, distance measures, classifier tests, performance monitoring.
- Goal: ensure models stay accurate, fair, and reliable in production.
