Definition

Platt Scaling is a probability calibration technique that takes the raw outputs (scores or logits) from a classifier and converts them into well-calibrated probabilities using a logistic regression model.

  • Originally introduced for Support Vector Machines (SVMs), since their outputs are not naturally probabilistic.
  • Now widely used for other models (e.g., boosting, neural nets).

How It Works

  1. A classifier produces raw scores $f(x)$ (e.g., SVM decision function, logit values).
  2. Fit a logistic regression on a held-out validation set:

$P(y=1 \mid f(x)) = \frac{1}{1 + \exp(A f(x) + B)}$

  • $A, B$ are parameters learned by minimizing negative log-likelihood on calibration data.
  • The mapping is sigmoid-shaped, squashing scores into $[0,1]$.
  3. Use this mapping to output calibrated probabilities for new data.
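The steps above can be sketched with scikit-learn by fitting a logistic regression directly on held-out scores. The scores and labels below are illustrative; note that Platt's original method fits an unregularized sigmoid (scikit-learn's `LogisticRegression` applies L2 regularization by default, so `C` is set large here to approximate it):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Held-out calibration data: raw classifier scores and true labels (illustrative)
scores = np.array([2.5, -1.2, 0.8, -0.5, 1.9, -2.1])
labels = np.array([1, 0, 1, 0, 1, 0])

# Fit the sigmoid mapping P(y=1 | f(x)); large C approximates Platt's
# unregularized maximum-likelihood fit
platt = LogisticRegression(C=1e6)
platt.fit(scores.reshape(-1, 1), labels)

# Calibrated probabilities for the same scores
probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]
for f, p in zip(scores, probs):
    print(f"f(x) = {f:+.1f}  ->  P = {p:.3f}")
```

Since the fitted mapping is monotone, higher scores always map to higher calibrated probabilities.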

Example

Suppose an SVM outputs decision scores:

| Score $f(x)$ | True Label |
|--------------|------------|
| 2.5          | 1          |
| -1.2         | 0          |
| 0.8          | 1          |
| -0.5         | 0          |

Platt scaling fits a sigmoid curve that best maps these scores to probabilities, e.g.:

  • $f(x)=2.5 \to P=0.92$
  • $f(x)=-1.2 \to P=0.09$
  • $f(x)=0.8 \to P=0.65$
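Applying the fitted sigmoid is just a direct evaluation of the formula. The parameters $A = -1.6$ and $B = 0.3$ below are hypothetical values chosen only to illustrate the mapping; a real fit would learn them from the calibration data:

```python
import math

# Hypothetical Platt parameters; in practice these are learned by
# minimizing negative log-likelihood on a held-out calibration set
A, B = -1.6, 0.3

def platt_probability(score: float) -> float:
    """Map a raw classifier score to a probability via P = 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(A * score + B))

for f in (2.5, -1.2, 0.8, -0.5):
    print(f"f(x) = {f:+.1f}  ->  P = {platt_probability(f):.2f}")
```

Note that $A$ must be negative for larger scores to map to larger probabilities.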

Applications

  • SVMs: Decision function values → probabilities.
  • Boosting models (e.g., XGBoost, LightGBM): Often produce overconfident probabilities → Platt scaling improves calibration.
  • General ML: Any classifier with scores can use Platt scaling.
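In practice, scikit-learn wraps this workflow in `CalibratedClassifierCV` with `method="sigmoid"`, which applies Platt scaling to any classifier exposing a decision function. A minimal sketch on synthetic data:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data
X, y = make_classification(n_samples=300, random_state=0)

# SVC has no native predict_proba by default; method="sigmoid" fits
# Platt scaling on the SVM's decision_function values via cross-validation
calibrated = CalibratedClassifierCV(SVC(), method="sigmoid", cv=3)
calibrated.fit(X, y)

probs = calibrated.predict_proba(X)[:, 1]
print(probs[:5])
```

Using `cv=3` means the calibration sigmoid is fit on held-out folds, so the same data is never used both to train the SVM and to calibrate it.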

Advantages

  • Simple (just logistic regression on top of scores).
  • Works well with reasonably large calibration sets.
  • Outputs smooth probability estimates.


Limitations

  • Assumes a sigmoid relationship between scores and true probabilities → may not hold in all cases.
  • Needs a separate validation set for calibration (risk of overfitting if too small).
  • For more flexible calibration, Isotonic Regression is often preferred (non-parametric).


Platt Scaling vs Isotonic Regression

  • Platt Scaling → Parametric, assumes sigmoid shape, less prone to overfitting on small data.
  • Isotonic Regression → Non-parametric, more flexible, but may overfit if little data.
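The two methods can be compared side by side with `CalibratedClassifierCV`, which accepts `method="sigmoid"` (Platt) or `method="isotonic"`. A sketch on synthetic data, scored with the Brier score (lower is better-calibrated); which method wins depends on the data and calibration-set size:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for method in ("sigmoid", "isotonic"):
    # "sigmoid" = Platt scaling (parametric); "isotonic" = non-parametric
    model = CalibratedClassifierCV(SVC(), method=method, cv=3)
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]
    results[method] = brier_score_loss(y_test, probs)
    print(f"{method}: Brier score = {results[method]:.4f}")
```

With only a small calibration set, isotonic regression's extra flexibility tends to overfit, which is when Platt scaling's sigmoid assumption pays off.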

In short:
Platt Scaling = fit a logistic regression on raw model scores → transform them into calibrated probabilities.
It’s simple, effective, and widely used, especially for SVMs.