Definition
Platt Scaling is a probability calibration technique that takes the raw outputs (scores or logits) from a classifier and converts them into well-calibrated probabilities using a logistic regression model.
- Originally introduced for Support Vector Machines (SVMs), since their outputs are not naturally probabilistic.
- Now widely used for other models (e.g., boosting, neural nets).
How It Works
- A classifier produces raw scores $f(x)$ (e.g., SVM decision function, logit values).
- Fit a logistic regression on a held-out validation set:
$P(y=1 \mid f(x)) = \frac{1}{1 + \exp(A f(x) + B)}$
- $A, B$ are parameters learned by minimizing the negative log-likelihood on the calibration data ($A$ typically comes out negative, so that larger scores map to higher probabilities).
- The mapping is sigmoid-shaped, squashing scores into $[0,1]$.
- Use this mapping to output calibrated probabilities for new data.
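The steps above can be sketched directly: treat $A$ and $B$ as free parameters and minimize the negative log-likelihood on calibration data. This is a minimal NumPy/SciPy sketch; Platt's original method also smooths the 0/1 targets to reduce overfitting, which is omitted here for clarity.

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, labels):
    """Fit A, B in P(y=1|f) = 1 / (1 + exp(A*f + B)) by minimizing NLL."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    def nll(params):
        A, B = params
        z = A * scores + B
        # log(1 + exp(z)) computed stably via logaddexp
        softplus = np.logaddexp(0.0, z)
        # y=1 terms: -log P        = log(1 + exp(z))
        # y=0 terms: -log(1 - P)   = log(1 + exp(z)) - z
        return np.sum(labels * softplus + (1.0 - labels) * (softplus - z))

    result = minimize(nll, x0=[-1.0, 0.0], method="Nelder-Mead")
    return result.x  # fitted (A, B)

def platt_predict(scores, A, B):
    """Map raw scores to calibrated probabilities."""
    z = A * np.asarray(scores, dtype=float) + B
    return 1.0 / (1.0 + np.exp(z))

# Toy calibration set: raw classifier scores plus true labels
A, B = fit_platt([2.5, -1.2, 0.8, -0.5], [1, 0, 1, 0])
probs = platt_predict([2.5, 0.8, -0.5, -1.2], A, B)
```

Note that this tiny example is linearly separable, so the unregularized fit pushes probabilities toward the extremes; on realistic calibration sets the sigmoid settles at intermediate values.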
Example
Suppose an SVM outputs decision scores:
| Score $f(x)$ | True Label |
|---|---|
| 2.5 | 1 |
| -1.2 | 0 |
| 0.8 | 1 |
| -0.5 | 0 |
Platt scaling fits a sigmoid curve that best maps these scores to probabilities, e.g.:
- $f(x)=2.5 \to P=0.92$
- $f(x)=-1.2 \to P=0.09$
- $f(x)=0.8 \to P=0.65$
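As a sketch, fitting a one-feature logistic regression to the four table rows reproduces this kind of sigmoid mapping. The probability values above are illustrative; the exact fitted values depend on the solver and regularization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Decision scores and true labels from the table above
scores = np.array([[2.5], [-1.2], [0.8], [-0.5]])
labels = np.array([1, 0, 1, 0])

# A logistic regression on the raw scores is exactly Platt's sigmoid fit
# (sklearn applies L2 regularization by default, which shrinks the fit slightly)
calibrator = LogisticRegression()
calibrator.fit(scores, labels)

probs = calibrator.predict_proba(scores)[:, 1]
for s, p in zip(scores.ravel(), probs):
    print(f"f(x) = {s:5.1f}  ->  P = {p:.2f}")
```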
Applications
- SVMs: Decision function values → probabilities.
- Boosting models (e.g., XGBoost, LightGBM): Often produce overconfident probabilities → Platt scaling improves calibration.
- General ML: Any classifier with scores can use Platt scaling.
Advantages
- Simple (just logistic regression on top of scores).
- Works well with reasonably large calibration sets.
- Outputs smooth probability estimates.
Limitations
- Assumes a sigmoid relationship between scores and true probabilities → may not hold in all cases.
- Needs a separate validation set for calibration (risk of overfitting if too small).
- For more flexible calibration, Isotonic Regression (non-parametric) is often preferred.
Platt Scaling vs Isotonic Regression
- Platt Scaling → Parametric, assumes sigmoid shape, less prone to overfitting on small data.
- Isotonic Regression → Non-parametric, more flexible, but may overfit if little data.
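Both options are exposed in scikit-learn through `CalibratedClassifierCV`: `method="sigmoid"` is Platt scaling and `method="isotonic"` is isotonic regression. A minimal sketch on synthetic data (the dataset and split sizes are arbitrary):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data standing in for a real task
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba of its own; calibration supplies one.
# cv=5 fits the calibrator on held-out folds to avoid overfitting.
probs = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    clf.fit(X_train, y_train)
    probs[method] = clf.predict_proba(X_test)[:, 1]
```

On small calibration sets the sigmoid variant is usually the safer choice, matching the trade-off described above.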
In short:
Platt Scaling = fit a logistic regression on raw model scores → transform them into calibrated probabilities.
It’s simple, effective, and widely used, especially for SVMs.
