Definition
Leading indicators are early signals that provide advance warning about potential future outcomes.
- They predict what might happen, rather than confirming what already happened.
- In machine learning, they often relate to input data quality and distribution.
Characteristics
- Proactive → you can act before a failure or performance drop occurs.
- Indirect → they don't measure end results directly, but the conditions that affect them.
- Short-term sensitivity → they respond to changes quickly.
Examples in Machine Learning
- Data Drift
  - Feature distribution changes (e.g., income values skew higher than in training).
  - Category frequency changes (e.g., new device types).
- Input Data Quality
  - Missing values rate suddenly increases.
  - Unexpected data types or schema changes.
- Operational Metrics
  - Latency spikes in feature pipelines.
  - Higher error rates in upstream data sources.
- Representation Shift
  - Embeddings of user behavior look different from historical patterns.
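Two of the checks above (feature distribution drift and missing-value rate) can be sketched in a few lines. This is a minimal illustration, not a production monitor: it assumes numeric features held in plain Python lists, uses the Population Stability Index (PSI) as one possible drift score, and the names `psi` and `missing_rate` are hypothetical helpers introduced here.

```python
import math
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges come from quantiles of the reference sample, so each
    reference bin holds roughly the same share of the data.
    """
    ref_sorted = sorted(reference)
    # Interior cut points at the 1/n_bins, 2/n_bins, ... quantiles.
    edges = [ref_sorted[(len(ref_sorted) * i) // n_bins] for i in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = bin_shares(reference)
    cur_p = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def missing_rate(values):
    """Fraction of entries that are None or NaN."""
    missing = sum(
        1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


# Illustrative data only: training-time incomes vs. a shifted serving batch.
train_income = [float(x) for x in range(1000)]
serving_income = [x + 500.0 for x in train_income]

drift_score = psi(train_income, serving_income)
null_share = missing_rate([1.0, None, float("nan"), 4.0])
```

A common rule of thumb treats PSI above roughly 0.25 as a significant shift, but the cutoff is a convention and should be tuned per feature. Both scores can be computed as soon as a batch of inputs arrives, without waiting for labels.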
Why They Matter
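- They serve as an early warning system before lagging indicators (e.g., AUC, loss, accuracy) show degradation. Lagging indicators need ground-truth labels, which often arrive late; leading indicators can be computed from inputs alone.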
- They allow proactive retraining, data pipeline fixes, or alerts.
Example
- Fraud detection model:
  - Leading indicator: The volume of transactions from new countries suddenly rises.
  - Lagging indicator: The AUC of the fraud classifier drops one week later.
Here, the leading indicator gave an early signal before the lagging metric confirmed performance loss.
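
The leading indicator in this example could be monitored with a check along the following lines. It is a sketch under stated assumptions: each transaction carries a country code, the reference window is whatever period the model was trained on, and `unseen_category_share`, `should_alert`, and the 5% threshold are hypothetical choices made for illustration.

```python
def unseen_category_share(reference, current):
    """Share of current records whose category never appeared in the reference window."""
    known = set(reference)
    if not current:
        return 0.0
    return sum(1 for c in current if c not in known) / len(current)


def should_alert(reference, current, threshold=0.05):
    """Flag the batch when too many records come from unseen categories.

    The 0.05 default is an assumed placeholder, not a recommended value.
    """
    return unseen_category_share(reference, current) > threshold


# Illustrative data only: country codes seen in training vs. today's traffic.
training_countries = ["US", "GB", "US", "DE"]
todays_countries = ["US", "BR", "NG", "GB"]

share = unseen_category_share(training_countries, todays_countries)
alert = should_alert(training_countries, todays_countries)
```

Because the check needs only the incoming country codes, it can fire on the day traffic changes, well before confirmed fraud labels make an AUC drop measurable.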
Summary
Leading indicators = early warning metrics (like drift, data quality checks, representation shift).
They don’t prove model failure, but they signal risks that future KPIs (loss, AUC, calibration) may degrade.
