Definition
Population Stability Index (PSI) is a metric used to measure how much a variable’s distribution (often model scores or feature values) has shifted between two populations:
- Baseline (expected) population → often the training data or last month’s distribution.
- Current (observed, or actual) population → often the production data being scored today or this month.
It helps detect data drift and model stability issues.
Formula
$\text{PSI} = \sum_{i=1}^N \left( (p_i - q_i) \times \ln\frac{p_i}{q_i} \right)$
Where:
- $p_i$ = proportion in bin $i$ for the expected population (baseline).
- $q_i$ = proportion in bin $i$ for the actual population (current).
- $N$ = number of bins (usually 10 or 20).
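The formula maps directly onto a loop over bins. The following is a minimal Python sketch that assumes both inputs are already per-bin proportions of equal length. The function name `psi` and the `eps` floor are illustrative choices and are not part of the definition above. The floor is a common workaround because $\ln(p_i / q_i)$ is undefined when a bin is empty in either population.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index from per-bin proportions.

    expected: baseline proportions p_i; actual: current proportions q_i.
    eps (a hypothetical default) replaces zero proportions so the log stays finite.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for p, q in zip(expected, actual):
        p, q = max(p, eps), max(q, eps)  # guard against empty bins
        total += (p - q) * math.log(p / q)
    return total
```

With the example proportions later in this section, `psi([0.20, 0.50, 0.30], [0.10, 0.60, 0.30])` returns about 0.0875. Swapping the two arguments gives the same value, because each term $(p_i - q_i)\ln(p_i/q_i)$ is unchanged when $p_i$ and $q_i$ trade places.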
Interpretation
| PSI Value | Interpretation |
|---|---|
| < 0.1 | No significant change (distribution stable). |
| 0.1 – 0.25 | Moderate shift (monitor closely, may signal drift). |
| > 0.25 | Large shift (serious data drift, model may need retraining). |
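These bands are rules of thumb, and they are easy to encode for automated alerts. Here is a small Python sketch. The table leaves the boundary at exactly 0.25 ambiguous, so this version assumes that 0.25 still counts as a moderate shift. The function name `interpret_psi` is hypothetical.

```python
def interpret_psi(value: float) -> str:
    """Map a PSI value to the rule-of-thumb bands in the interpretation table."""
    if value < 0.1:
        return "stable"          # no significant change
    if value <= 0.25:            # assumption: 0.25 itself is treated as moderate
        return "moderate shift"  # monitor closely
    return "large shift"         # serious drift, consider retraining
```

Teams sometimes tune these cut-offs for their own models, so passing the thresholds in as parameters is a reasonable extension.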
Example
Suppose you split credit score predictions into 3 bins:
| Bin (Score Range) | % in Baseline ($p_i$) | % in Current ($q_i$) |
|---|---|---|
| Low (0–300) | 20% (0.20) | 10% (0.10) |
| Medium (301–600) | 50% (0.50) | 60% (0.60) |
| High (601–900) | 30% (0.30) | 30% (0.30) |
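The table gives the bin shares directly. In practice, you get them by counting raw scores per bin. The following is a minimal Python sketch that assumes integer scores between 0 and 900 and uses the inclusive ranges from the table. `bin_proportions` and `UPPER_BOUNDS` are hypothetical names.

```python
from bisect import bisect_left
from typing import Iterable, List, Sequence

# Inclusive upper bounds of the Low (0-300), Medium (301-600), High (601-900) bins.
UPPER_BOUNDS = (300, 600, 900)


def bin_proportions(scores: Iterable[int], upper_bounds: Sequence[int] = UPPER_BOUNDS) -> List[float]:
    """Share of scores falling in each bin; a score equal to a bound stays in the lower bin."""
    counts = [0] * len(upper_bounds)
    n = 0
    for s in scores:
        if s < 0 or s > upper_bounds[-1]:
            raise ValueError(f"score {s} outside 0..{upper_bounds[-1]}")
        counts[bisect_left(upper_bounds, s)] += 1  # index of first bound >= s
        n += 1
    if n == 0:
        raise ValueError("no scores given")
    return [c / n for c in counts]
```

For example, the ten made-up scores `[100, 250, 400, 450, 500, 550, 580, 700, 750, 800]` give `[0.2, 0.5, 0.3]`, which matches the baseline column of the table. Compute the baseline and current columns with the same bin edges, or the two distributions are not comparable.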
Now compute each bin's contribution to PSI:
- Low: $(0.20 - 0.10) \times \ln(0.20 / 0.10) = 0.10 \times \ln(2) = 0.0693$
- Medium: $(0.50 - 0.60) \times \ln(0.50 / 0.60) = -0.10 \times \ln(0.833) \approx -0.10 \times (-0.1823) = 0.0182$ (both factors are negative, so the term is positive)
- High: $(0.30 - 0.30) \times \ln(0.30 / 0.30) = 0 \times \ln(1) = 0$
PSI = 0.0693 + 0.0182 + 0 = 0.0875 < 0.1 → stable.
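The hand calculation can be checked in a few lines of Python. This sketch hard-codes the three bins from the example table and prints each bin's contribution and the total. Every contribution is non-negative because $p_i - q_i$ and $\ln(p_i / q_i)$ always have the same sign, so PSI itself can never be negative.

```python
import math

# (label, baseline p_i, current q_i) from the example table.
bins = [("Low", 0.20, 0.10), ("Medium", 0.50, 0.60), ("High", 0.30, 0.30)]

# Per-bin term (p_i - q_i) * ln(p_i / q_i), then the sum over bins.
contributions = {label: (p - q) * math.log(p / q) for label, p, q in bins}
total = sum(contributions.values())

for label, c in contributions.items():
    print(f"{label:<7} {c:.4f}")
print(f"PSI     {total:.4f}")
# Low     0.0693
# Medium  0.0182
# High    0.0000
# PSI     0.0875
```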
Key Use Cases
- Credit Risk Models → detect shifts in score distribution.
- Fraud Detection → monitor transaction patterns.
- Machine Learning Monitoring → detect data drift between training and production.
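For production monitoring of a numeric feature, one common approach is to derive bin edges once from the baseline (for example, deciles of the training data) and reuse them for every production batch. The following Python sketch takes that approach. It uses the standard library's `statistics.quantiles` for the cut points. The helper names `bin_shares` and `feature_psi`, the `eps` floor, and the synthetic Gaussian samples are all illustrative assumptions, not a prescribed setup.

```python
import math
import random
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def bin_shares(values: Sequence[float], cuts: Sequence[float]) -> List[float]:
    """Proportion of values in each of len(cuts) + 1 bins split at sorted cut points."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1  # a value equal to a cut point goes to the upper bin
    return [c / len(values) for c in counts]


def feature_psi(baseline: Sequence[float], current: Sequence[float],
                n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI with bin edges fixed from the baseline's quantiles (deciles by default)."""
    cuts = quantiles(baseline, n=n_bins)  # n_bins - 1 cut points, from the baseline only
    total = 0.0
    for p, q in zip(bin_shares(baseline, cuts), bin_shares(current, cuts)):
        p, q = max(p, eps), max(q, eps)  # floor empty bins so the log stays finite
        total += (p - q) * math.log(p / q)
    return total


# Synthetic data for illustration only: a "training" sample and two "production" samples.
rng = random.Random(42)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]

print(f"same distribution:   {feature_psi(train, prod_same):.4f}")     # should be small
print(f"mean shifted by 0.5: {feature_psi(train, prod_shifted):.4f}")  # should be clearly larger
```

Quantile edges give each baseline bin roughly equal weight, so no bin is nearly empty at training time. Features with many tied values can still produce duplicate cut points and empty bins, which is why the `eps` floor remains useful.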
In short: PSI is a drift detection metric comparing two distributions. A high PSI means the population has changed significantly → model retraining may be needed.
