Definition

Population Stability Index (PSI) is a metric used to measure how much a variable’s distribution (often model scores or feature values) has shifted between two populations:

  • Baseline (expected) population → often training data or last month’s distribution.
  • Current (actual/observed) population → often recent production scoring data (e.g., today's or this month's).

It helps detect data drift and model stability issues.


Formula

$\text{PSI} = \sum_{i=1}^{N} (p_i - q_i) \times \ln\frac{p_i}{q_i}$

Where:

  • $p_i$ = proportion of the baseline (expected) population falling in bin $i$.
  • $q_i$ = proportion of the current (actual) population falling in bin $i$.
  • $N$ = number of bins (usually 10 or 20).
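A minimal Python sketch of the formula, assuming both populations have already been binned into the same $N$ bins and converted to proportions. The function name `psi` and the `eps` floor are illustrative choices, not part of the definition: the formula is undefined when a bin is empty (division by zero or $\ln 0$), and flooring empty bins at a small constant is one common workaround.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    expected -- baseline proportions p_i, one per bin
    actual   -- current proportions q_i, same bins in the same order
    eps      -- assumed floor for empty bins so ln(p_i / q_i) stays finite
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for p, q in zip(expected, actual):
        p, q = max(p, eps), max(q, eps)  # avoid log(0) and division by zero
        total += (p - q) * math.log(p / q)  # one term of the sum
    return total


print(round(psi([0.20, 0.50, 0.30], [0.10, 0.60, 0.30]), 4))  # 0.0875
```

Each term is non-negative, because $p_i - q_i$ and $\ln(p_i/q_i)$ always have the same sign, so the result is 0 only when the two distributions match bin for bin.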

Interpretation

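Common rule-of-thumb thresholds (conventions, not formal significance tests):

| PSI value | Interpretation |
|---|---|
| < 0.1 | No significant change (distribution stable). |
| 0.1 – 0.25 | Moderate shift (monitor closely; may signal drift). |
| > 0.25 | Large shift (serious data drift; model may need retraining). |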
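For automated monitoring, these bands can be encoded in a small helper. This is a sketch: the function name `interpret_psi` and its label strings are hypothetical, and the cut-offs are the rule-of-thumb values from the interpretation table, which you may want to adjust for your own data.

```python
def interpret_psi(value: float) -> str:
    """Map a PSI value to the rule-of-thumb bands from the interpretation table.

    Boundary handling is an assumption: 0.1 and 0.25 are both treated as
    'moderate shift', matching the table's inclusive 0.1 - 0.25 range.
    """
    if value < 0.1:
        return "stable"
    if value <= 0.25:
        return "moderate shift"
    return "large shift"


for v in (0.05, 0.18, 0.40):
    print(v, interpret_psi(v))
```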

Example

Suppose you split credit score predictions into 3 bins:

| Bin (score range) | Baseline share ($p_i$) | Current share ($q_i$) |
|---|---|---|
| Low (0–300) | 20% (0.20) | 10% (0.10) |
| Medium (301–600) | 50% (0.50) | 60% (0.60) |
| High (601–900) | 30% (0.30) | 30% (0.30) |

Now compute each bin's contribution to PSI:

  • Low:
    • $(0.20 - 0.10) \times \ln(0.20 / 0.10) = 0.10 \times \ln 2 \approx 0.0693$
  • Medium:
    • $(0.50 - 0.60) \times \ln(0.50 / 0.60) = -0.10 \times \ln(0.833) \approx -0.10 \times (-0.1823) = 0.0182$
  • High:
    • $(0.30 - 0.30) \times \ln(0.30 / 0.30) = 0 \times \ln 1 = 0$

PSI ≈ 0.0693 + 0.0182 + 0 = 0.0875. Since 0.0875 < 0.1, the score distribution is considered stable.
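The hand calculation can be checked with a few lines of Python. The bin labels and proportions are copied from the example table; the script prints each bin's contribution and then the total.

```python
import math

# (bin label, baseline share p_i, current share q_i) from the example table
bins = [
    ("Low (0-300)", 0.20, 0.10),
    ("Medium (301-600)", 0.50, 0.60),
    ("High (601-900)", 0.30, 0.30),
]

# Per-bin term (p_i - q_i) * ln(p_i / q_i); no bin is empty, so no eps floor is needed
contributions = {label: (p - q) * math.log(p / q) for label, p, q in bins}

for label, term in contributions.items():
    print(f"{label:<18} {term:.4f}")

total = sum(contributions.values())
print(f"{'PSI':<18} {total:.4f}")  # below 0.1, so "stable" under the rule of thumb
```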


Key Use Cases

  • Credit Risk Models → detect shifts in score distribution.
  • Fraud Detection → monitor transaction patterns.
  • Machine Learning Monitoring → detect data drift between training and production.
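In monitoring pipelines, PSI is often computed from raw values rather than ready-made proportions. The sketch below assumes one common binning choice (equal-width bins are another): bin edges come from the baseline's quantiles (deciles when `n_bins=10`), and the current data is bucketed with those same edges. The helper names and the synthetic Gaussian scores are illustrative only, not real data.

```python
import bisect
import math
import random
import statistics
from typing import List, Sequence


def baseline_edges(baseline: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior cut points at the baseline's quantiles (n_bins - 1 values)."""
    return statistics.quantiles(baseline, n=n_bins)


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each of the len(edges) + 1 bins defined by edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1  # index of the bin containing v
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Sum of (p_i - q_i) * ln(p_i / q_i), with empty bins floored at eps."""
    total = 0.0
    for p, q in zip(expected, actual):
        p, q = max(p, eps), max(q, eps)
        total += (p - q) * math.log(p / q)
    return total


def psi_from_samples(baseline: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    edges = baseline_edges(baseline, n_bins)  # bins are fixed by the baseline only
    return psi(bin_shares(baseline, edges), bin_shares(current, edges))


# Synthetic scores for illustration only (not real data).
random.seed(0)
train = [random.gauss(600, 100) for _ in range(10_000)]
same = [random.gauss(600, 100) for _ in range(10_000)]
shifted = [random.gauss(700, 100) for _ in range(10_000)]

print(f"no drift:   {psi_from_samples(train, same):.4f}")
print(f"with drift: {psi_from_samples(train, shifted):.4f}")
```

With samples this large, the "no drift" value reflects only sampling noise and lands well under 0.1, while the one-standard-deviation mean shift pushes PSI well above 0.25. Fixing the edges from the baseline matters: re-deriving bins from the current data would hide part of the shift.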

In short: PSI is a drift detection metric comparing two distributions. A high PSI means the population has changed significantly → model retraining may be needed.