KS Statistic (Kolmogorov–Smirnov Statistic)

1) Meaning

The KS statistic is a measure of the maximum difference between two cumulative distribution functions (CDFs).

In statistics: used to test if two samples come from the same distribution (KS test).
In machine learning (especially credit scoring / binary classification): the KS statistic measures how well a model separates positive vs. negative classes.

Intuition:

If the two distributions are very different → KS is large.
If they overlap heavily → KS is small.

2) Formula

For binary classification (positive = 1, negative = 0):

Compute CDF of positives: proportion of actual positives up to a given score threshold.
Compute CDF of negatives: proportion of actual negatives up to the same threshold.
KS statistic = maximum vertical distance between the two CDFs:

$KS = \max_x \; \left| F_{\text{positive}}(x) - F_{\text{negative}}(x) \right|$

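Where $F_{\text{positive}}(x)$ and $F_{\text{negative}}(x)$ are the empirical cumulative distributions of the scores in each class (the share of that class with a score at or below $x$), and the maximum is taken over all score thresholds $x$.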
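The formula can be turned into a few lines of code. The sketch below is a minimal illustration using only the Python standard library; the function name `ks_statistic` and the toy data are assumptions made for this example, not part of any particular library.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Return (KS, threshold): the largest gap between the empirical
    score CDFs of the two classes and the score where it first occurs."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")

    best_gap, best_threshold = 0.0, None
    # Empirical CDFs only change at observed scores, so those are
    # the only thresholds that need to be checked.
    for x in sorted(set(scores)):
        f_pos = bisect_right(pos, x) / len(pos)  # share of positives with score <= x
        f_neg = bisect_right(neg, x) / len(neg)  # share of negatives with score <= x
        gap = abs(f_pos - f_neg)
        if gap > best_gap:
            best_gap, best_threshold = gap, x
    return best_gap, best_threshold


# Toy data, made up for illustration only.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
ks, threshold = ks_statistic(scores, labels)
print(f"KS = {ks:.2f} at score {threshold:.2f}")
```

Because the formula takes an absolute value, the result is the same whether high scores indicate the positive or the negative class.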

3) Example

Suppose we score 1,000 customers with a credit risk model:

500 are “good” (non-default), 500 are “bad” (default).
Sort customers by predicted score.
At each threshold, treating "bad" as the positive class, compute:
- % of bads captured (True Positive Rate)
- % of goods wrongly flagged (False Positive Rate)
KS = largest gap between these two curves.

Suppose the gap is widest at score = 0.65:

CDF bads = 0.70 (70% of bads identified)
CDF goods = 0.30 (30% of goods misclassified)
Gap = 0.70 - 0.30 = 0.40 → KS = 0.40 (40%)
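To see why the threshold matters, it helps to lay the gaps out as a small table. In the sketch below only the 0.65 row reflects the example above (350 of 500 bads, 150 of 500 goods); every other row is invented purely to show how the maximum is picked, and should not be read as real model output.

```python
TOTAL_BADS = 500
TOTAL_GOODS = 500

# (threshold, cumulative bads, cumulative goods).
# Only the 0.65 row comes from the example; the other rows are hypothetical.
cumulative_counts = [
    (0.25, 100, 20),
    (0.45, 225, 75),
    (0.65, 350, 150),
    (0.85, 450, 350),
    (1.00, 500, 500),
]

rows = []
for threshold, bads, goods in cumulative_counts:
    cdf_bad = bads / TOTAL_BADS
    cdf_good = goods / TOTAL_GOODS
    rows.append((threshold, cdf_bad, cdf_good, abs(cdf_bad - cdf_good)))

# KS is the largest gap over all thresholds, not the gap at any one of them.
ks_threshold, _, _, ks = max(rows, key=lambda row: row[3])

for threshold, cdf_bad, cdf_good, gap in rows:
    print(f"{threshold:.2f}  bads={cdf_bad:.2f}  goods={cdf_good:.2f}  gap={gap:.2f}")
print(f"KS = {ks:.2f} at score {ks_threshold:.2f}")
```

The gap of 0.40 counts as the KS statistic only because no other threshold produces a wider one.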

4) Interpretation

KS ranges from 0 to 1 (or 0% to 100%).
- Higher KS = better separation.
- KS = 0 → model has no power (both distributions identical).
- KS = 1 → perfect separation (the score distributions of the two classes do not overlap).
Common rules of thumb in practice (exact cutoffs vary by domain):
- KS < 0.2 → weak model.
- KS ~ 0.3–0.4 → moderate.
- KS > 0.4 → strong (values of roughly 0.4–0.6 are common in credit risk models).

5) Applications

Credit Scoring: KS is one of the most widely used model evaluation metrics.
Hypothesis Testing: the KS test compares a sample against a reference distribution (one-sample test) or two samples against each other (two-sample test).
Model Validation: a large difference in KS between the training and test sets signals overfitting or poor generalization.
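For the hypothesis-testing use, the statistic is paired with a p-value. The sketch below is one way to do this for two samples using only the standard library: it computes the statistic D and an approximate p-value from Kolmogorov's limiting distribution. The function name `ks_two_sample`, the `terms` parameter, and the cutoff of 0.2 are choices made for this illustration, and the approximation is only reasonable for fairly large samples. In practice a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be used instead.

```python
import math
import random
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b, terms=100):
    """Return (D, p): the two-sample KS statistic and an asymptotic p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs; they only change
    # at observed values, so those are the only points worth checking.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)

    # Kolmogorov's limiting distribution (large-sample approximation).
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here, and the p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return d, min(1.0, max(0.0, p))


# Synthetic samples, generated only to exercise the function.
rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]

print(ks_two_sample(same_a, same_b))   # same distribution: expect a small D
print(ks_two_sample(same_a, shifted))  # shifted mean: expect a larger D, small p
```

A small p-value is evidence that the two samples were not drawn from the same distribution. A large one only means the test found no evidence of a difference.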

6) Relation to Other Metrics

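Similar to AUC (ROC curve): both measure separability of classes.
AUC integrates the whole ROC curve, while KS focuses on the single point of maximum separation: on the ROC curve, KS is the largest vertical distance between the curve and the diagonal, max |TPR - FPR|.
Neither metric requires choosing a threshold in advance, although the KS value is attained at one specific threshold.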
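Both metrics can be read off the same ROC curve, which makes the relationship concrete. The sketch below builds the ROC points by hand with the standard library, then takes the trapezoidal area for AUC and the largest |TPR - FPR| for KS. The helper names `roc_points` and `auc_and_ks` and the toy data are assumptions for illustration, and labels are assumed to be the integers 0 and 1.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR), sweeping the threshold from high to low scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc_and_ks(scores, labels):
    points = roc_points(scores, labels)
    # AUC: trapezoidal area under the ROC curve.
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    # KS: largest vertical distance between the ROC curve and the diagonal.
    ks = max(abs(tpr - fpr) for fpr, tpr in points)
    return auc, ks


# Toy data, made up for illustration only.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc, ks = auc_and_ks(scores, labels)
print(f"AUC = {auc:.4f}, KS = {ks:.2f}")
```

Two models with the same KS can have different AUCs, because KS looks only at the widest point while AUC averages over the whole curve.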

Bottom line:
The KS statistic measures the maximum difference between cumulative distributions of two groups (positives vs. negatives). In ML, a higher KS means better class separation, making it a key metric in credit scoring and binary classification evaluation.
