Definition
Platt Scaling is a probability calibration technique that takes the raw outputs (scores or logits) from a classifier and converts them into well-calibrated probabilities using a logistic regression model.
- Originally introduced for Support Vector Machines (SVMs), since their outputs are not naturally probabilistic.
- Now widely used for other models (e.g., boosting, neural nets).
How It Works
- A classifier produces raw scores $f(x)$ (e.g., SVM decision function, logit values).
- Fit a logistic regression on a held-out validation set:
$P(y=1 \mid f(x)) = \frac{1}{1 + \exp(A f(x) + B)}$
- $A, B$ are parameters learned by minimizing the negative log-likelihood on the calibration data ($A$ typically comes out negative, so that larger scores map to higher probabilities).
- The mapping is sigmoid-shaped, squashing scores into $[0,1]$.
- Use this mapping to output calibrated probabilities for new data.
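The steps above can be sketched directly: treat $A$ and $B$ as free parameters and minimize the negative log-likelihood on calibration data. This is a minimal NumPy/SciPy sketch; Platt's original method also smooths the 0/1 targets to reduce overfitting, which is omitted here for clarity.

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, labels):
    """Fit A, B in P(y=1|f) = 1 / (1 + exp(A*f + B)) by minimizing NLL."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    def nll(params):
        A, B = params
        z = A * scores + B
        # log(1 + exp(z)) computed stably via logaddexp
        softplus = np.logaddexp(0.0, z)
        # y=1 terms: -log P        = log(1 + exp(z))
        # y=0 terms: -log(1 - P)   = log(1 + exp(z)) - z
        return np.sum(labels * softplus + (1.0 - labels) * (softplus - z))

    result = minimize(nll, x0=[-1.0, 0.0], method="Nelder-Mead")
    return result.x  # fitted (A, B)

def platt_predict(scores, A, B):
    """Map raw scores to calibrated probabilities."""
    z = A * np.asarray(scores, dtype=float) + B
    return 1.0 / (1.0 + np.exp(z))

# Toy calibration set: raw classifier scores plus true labels
A, B = fit_platt([2.5, -1.2, 0.8, -0.5], [1, 0, 1, 0])
probs = platt_predict([2.5, 0.8, -0.5, -1.2], A, B)
```

Note that this tiny example is linearly separable, so the unregularized fit pushes probabilities toward the extremes; on realistic calibration sets the sigmoid settles at intermediate values.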
Example
Suppose an SVM outputs decision scores:
| Score $f(x)$ | True Label |
|---|---|
| 2.5 | 1 |
| -1.2 | 0 |
| 0.8 | 1 |
| -0.5 | 0 |
Platt scaling fits a sigmoid curve that best maps these scores to probabilities, e.g.:
- $f(x)=2.5 \to P=0.92$
- $f(x)=-1.2 \to P=0.09$
- $f(x)=0.8 \to P=0.65$
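As a sketch, fitting a one-feature logistic regression to the four table rows reproduces this kind of sigmoid mapping. The probability values above are illustrative; the exact fitted values depend on the solver and regularization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Decision scores and true labels from the table above
scores = np.array([[2.5], [-1.2], [0.8], [-0.5]])
labels = np.array([1, 0, 1, 0])

# A logistic regression on the raw scores is exactly Platt's sigmoid fit
# (sklearn applies L2 regularization by default, which shrinks the fit slightly)
calibrator = LogisticRegression()
calibrator.fit(scores, labels)

probs = calibrator.predict_proba(scores)[:, 1]
for s, p in zip(scores.ravel(), probs):
    print(f"f(x) = {s:5.1f}  ->  P = {p:.2f}")
```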
Applications
- SVMs: Decision function values → probabilities.
- Boosting models (e.g., XGBoost, LightGBM): Often produce overconfident probabilities → Platt scaling improves calibration.
- General ML: Any classifier with scores can use Platt scaling.
Advantages
- Simple (just logistic regression on top of scores).
- Works well with reasonably large calibration sets.
- Outputs smooth probability estimates.
Limitations
- Assumes a sigmoid relationship between scores and true probabilities → may not hold in all cases.
- Needs a separate validation set for calibration (risk of overfitting if too small).
- For more flexible calibration, Isotonic Regression (non-parametric) is often preferred.
Platt Scaling vs Isotonic Regression
- Platt Scaling → Parametric, assumes sigmoid shape, less prone to overfitting on small data.
- Isotonic Regression → Non-parametric, more flexible, but may overfit if little data.
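Both options are exposed in scikit-learn through `CalibratedClassifierCV`: `method="sigmoid"` is Platt scaling and `method="isotonic"` is isotonic regression. A minimal sketch on synthetic data (the dataset and split sizes are arbitrary):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data standing in for a real task
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba of its own; calibration supplies one.
# cv=5 fits the calibrator on held-out folds to avoid overfitting.
probs = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    clf.fit(X_train, y_train)
    probs[method] = clf.predict_proba(X_test)[:, 1]
```

On small calibration sets the sigmoid variant is usually the safer choice, matching the trade-off described above.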
In short:
Platt Scaling = fit a logistic regression on raw model scores → transform them into calibrated probabilities.
It’s simple, effective, and widely used, especially for SVMs.
