Definition
KS shift (Kolmogorov–Smirnov shift) measures how much a variable’s distribution has changed between two populations (e.g., training vs. production).
It is based on the KS statistic, which quantifies the maximum difference between the two cumulative distribution functions (CDFs).
Formally:
$D = \sup_x \, \big| F_{\text{baseline}}(x) - F_{\text{current}}(x) \big|$
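- $F_{\text{baseline}}(x)$: cumulative distribution of the baseline data (e.g., training data or the past month).
- $F_{\text{current}}(x)$: cumulative distribution of the current data (e.g., today’s scoring data).
- $D$: maximum vertical distance between the two curves → the KS shift value. It ranges from 0 (identical curves) to 1 (the two samples do not overlap at all).

In practice both are empirical CDFs: $F(x)$ is the fraction of observations less than or equal to $x$.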
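A minimal sketch of computing $D$ from two raw samples in plain Python, assuming each population is given as a list of numbers. The names `ecdf` and `ks_shift` are illustrative helpers, not part of any library. Both empirical CDFs are step functions that only jump at observed values, so the gap only needs to be checked at the pooled sample points.

```python
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_values: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of values <= x (sorted_values must be sorted)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_shift(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic D = sup_x |F_baseline(x) - F_current(x)|.

    Between consecutive observed values both ECDFs are flat, so the
    supremum is attained at one of the pooled sample points.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    base_sorted = sorted(baseline)
    curr_sorted = sorted(current)
    return max(
        abs(ecdf(base_sorted, x) - ecdf(curr_sorted, x))
        for x in set(base_sorted) | set(curr_sorted)
    )


print(ks_shift([0.2, 0.4, 0.6, 0.8], [0.5, 0.6, 0.7, 0.9]))  # 0.5
```

If SciPy is available, `scipy.stats.ks_2samp(baseline, current).statistic` should return the same $D$, together with a p-value for the two-sample test.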
Interpretation
| KS Shift Value | Interpretation |
|---|---|
| < 0.1 | No meaningful drift (distributions are very similar). |
| 0.1 – 0.2 | Moderate shift (keep an eye on it). |
| > 0.2 | Significant drift (model performance may degrade). |
Thresholds vary by industry. In credit risk, a KS shift above 0.1 is often treated as a warning sign.
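The bands in the table can be encoded as a small lookup. This is a sketch: `interpret_ks_shift` is a hypothetical helper, the default cut-offs mirror the table, and placing exactly 0.1 and 0.2 in the "moderate" band is an assumption, since the table does not say which band the boundaries belong to. The cut-offs are parameters because thresholds vary by industry.

```python
def interpret_ks_shift(d: float, moderate: float = 0.1, significant: float = 0.2) -> str:
    """Map a KS shift value to the interpretation bands (cut-offs are configurable)."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("a KS shift must lie in [0, 1]")
    if d < moderate:
        return "no meaningful drift"
    if d <= significant:  # assumption: both boundaries count as moderate
        return "moderate shift"
    return "significant drift"


for d in (0.05, 0.15, 0.25):
    print(d, interpret_ks_shift(d))
```

A stricter policy, such as the credit-risk practice mentioned above, can simply pass lower values for `moderate` and `significant`.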
Example
Suppose we have model scores for baseline (training) vs. current (production).
- At some score threshold (say 0.6):
  - 70% of baseline scores are ≤ 0.6 ($F_{\text{baseline}}(0.6) = 0.70$)
  - 55% of current scores are ≤ 0.6 ($F_{\text{current}}(0.6) = 0.55$)

Difference = $|0.70 - 0.55| = 0.15$.
If this is the maximum gap across all thresholds, then:
$\text{KS shift} = 0.15 \quad \Rightarrow \quad \text{moderate drift.}$
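The worked example can be reproduced with a small hypothetical data set. The 20 scores per population below are invented purely so that 70% of baseline and 55% of current scores fall at or below 0.6; they are not real model output. The final scan over all observed scores checks the "if this is the maximum gap" condition.

```python
# Hypothetical scores (20 per population), invented to match the example:
# 14/20 = 70% of baseline and 11/20 = 55% of current scores are <= 0.6.
baseline = [0.30, 0.35, 0.40, 0.42, 0.45, 0.48, 0.50, 0.52, 0.54, 0.55,
            0.56, 0.58, 0.59, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]
current = [0.30, 0.35, 0.40, 0.42, 0.45, 0.48, 0.50, 0.52, 0.54, 0.55,
           0.56, 0.62, 0.63, 0.64, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]


def share_at_or_below(scores, threshold):
    """Empirical CDF F(threshold): fraction of scores <= threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


f_base = share_at_or_below(baseline, 0.6)  # 0.70
f_curr = share_at_or_below(current, 0.6)   # 0.55
print(f"gap at 0.6: {abs(f_base - f_curr):.2f}")

# Scan every observed score to confirm that 0.6 is where the gap peaks.
gaps = {x: abs(share_at_or_below(baseline, x) - share_at_or_below(current, x))
        for x in sorted(set(baseline) | set(current))}
worst_x = max(gaps, key=gaps.get)
print(f"KS shift: {gaps[worst_x]:.2f} at score {worst_x}")
```

Here the gap rises to 0.15 at 0.6 and shrinks again above it, so the KS shift is 0.15, matching the moderate-drift reading.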
Why It’s Useful
- Model monitoring → check if score distributions drift.
- Regulatory credit models → the KS statistic is already used to measure discriminatory power (how well scores separate good from bad accounts); KS shift applies the same statistic to population stability monitoring.
- Complement to PSI:
  - PSI → bin-based, measures overall distribution change.
  - KS shift → bin-free, evaluates every threshold and captures the worst-case deviation between CDFs.
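To make the PSI/KS contrast concrete, here is a sketch that computes both on the same simulated scores. Assumptions: scores lie in [0, 1], PSI uses equal-width bins with a small `eps` floor for empty bins, and PSI is taken in its common form $\sum_i (c_i - b_i)\ln(c_i / b_i)$ over the bin shares $b_i$ (baseline) and $c_i$ (current). The data come from `random.gauss` with a fixed seed and are purely illustrative. Changing `n_bins` typically changes the PSI value, while the KS shift involves no binning choice.

```python
import math
import random
from bisect import bisect_right


def ks_shift(baseline, current):
    """Bin-free: largest gap between the two empirical CDFs."""
    b, c = sorted(baseline), sorted(current)
    return max(abs(bisect_right(b, x) / len(b) - bisect_right(c, x) / len(c))
               for x in set(b) | set(c))


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Bin-based PSI over equal-width bins on [0, 1].

    Bin shares are floored at eps so empty bins do not cause log(0)
    or division by zero (assumption: no renormalization afterwards).
    """
    def shares(scores):
        counts = [0] * n_bins
        for s in scores:
            if not 0.0 <= s <= 1.0:
                raise ValueError("this sketch assumes scores in [0, 1]")
            # A score of exactly 1.0 goes into the last bin.
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        return [max(k / len(scores), eps) for k in counts]

    return sum((cur - base) * math.log(cur / base)
               for base, cur in zip(shares(baseline), shares(current)))


# Simulated scores (fixed seed): the current mean is shifted up by 0.05.
rng = random.Random(42)
baseline = [min(max(rng.gauss(0.50, 0.15), 0.0), 1.0) for _ in range(5000)]
current = [min(max(rng.gauss(0.55, 0.15), 0.0), 1.0) for _ in range(5000)]

print(f"KS shift: {ks_shift(baseline, current):.3f}")
for bins in (2, 5, 10, 20):
    print(f"PSI, {bins:>2} bins: {psi(baseline, current, n_bins=bins):.3f}")
```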
In short: KS shift = maximum difference between baseline and current cumulative distributions. If the KS shift is large, your model is likely seeing data drift.
