Temperature Scaling

Definition

Temperature Scaling is a probability calibration technique that adjusts the “confidence” of a model’s predicted probabilities by dividing logits by a single parameter called the temperature ($T > 0$).

It is a post-processing step: it is applied to the logits after training and leaves the predicted class, and therefore the classifier’s accuracy, unchanged.

How It Works

Start with logits (raw outputs before softmax) from a classifier:
- $z = (z_1, z_2, \dots, z_K)$
Apply Temperature Scaling:
- $\text{Softmax}_i(z, T) = \frac{\exp(z_i / T)}{\sum_{j=1}^K \exp(z_j / T)}$
  - If $T = 1$: standard softmax (no change).
  - If $T > 1$: probabilities become softer / less confident.
  - If $T < 1$: probabilities become sharper / more confident.
Choose $T$:
- Fit $T$ on a held-out validation set by minimizing the negative log-likelihood (NLL) of the true labels, keeping the model weights fixed.
- Only one parameter ($T$) needs to be learned.
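
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names, the log-spaced grid search (a gradient-based optimizer is the more common choice in practice), and the toy validation set are all assumptions made for the example.

```python
import math

def softmax_with_temperature(logits, T):
    """Return softmax(logits / T) as a list of probabilities."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, T):
    """Average negative log-likelihood of the true labels at temperature T."""
    losses = []
    for logits, y in zip(logit_rows, labels):
        p = softmax_with_temperature(logits, T)
        losses.append(-math.log(p[y]))
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, t_min=0.05, t_max=20.0, steps=400):
    """Grid search over log-spaced temperatures; returns the T with lowest NLL."""
    best_T, best_loss = 1.0, mean_nll(logit_rows, labels, 1.0)
    for k in range(steps + 1):
        T = t_min * (t_max / t_min) ** (k / steps)
        loss = mean_nll(logit_rows, labels, T)
        if loss < best_loss:
            best_T, best_loss = T, loss
    return best_T

# Hypothetical validation set: very confident logits, one of four predictions wrong.
val_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0], [4.0, 0.0, 0.0]]
val_labels = [0, 1, 2, 1]

T_hat = fit_temperature(val_logits, val_labels)
print(f"fitted T = {T_hat:.2f}")
print(f"NLL at T=1: {mean_nll(val_logits, val_labels, 1.0):.3f}")
print(f"NLL at fitted T: {mean_nll(val_logits, val_labels, T_hat):.3f}")
```

On this toy set the model is overconfident (it is right three times out of four but assigns its top class a probability well above 0.75), so the fitted temperature comes out above 1, at roughly 2.2.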

Intuition

Deep networks are often overconfident (e.g., they assign probability 0.99 to predictions that are correct only 80% of the time).
Raising the temperature ($T > 1$) flattens the distribution, which pulls those confidences down toward the observed accuracy.
It does not change the predicted class, only the confidence attached to it.
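
A short derivation, using only the softmax definition from "How It Works", may help show why the predicted class is unchanged and what $T$ does to the probabilities. For any two classes $i$ and $j$, the shared denominator cancels:

$$\frac{\text{Softmax}_i(z, T)}{\text{Softmax}_j(z, T)} = \exp\left(\frac{z_i - z_j}{T}\right)$$

Since $T > 0$, this ratio exceeds 1 exactly when $z_i > z_j$, so the ranking of the classes, and in particular $\arg\max_i \text{Softmax}_i(z, T) = \arg\max_i z_i$, does not depend on $T$. Taking logs shows that the log-odds between any two classes are simply divided by $T$:

$$\log \frac{\text{Softmax}_i(z, T)}{\text{Softmax}_j(z, T)} = \frac{z_i - z_j}{T}$$

As $T \to \infty$ every log-odds tends to 0 and the distribution approaches uniform ($1/K$ per class); as $T \to 0^+$ the probability mass concentrates on the largest logit.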

Example

Suppose a model outputs logits for 3 classes: $z = [2.0, 1.0, 0.1]$

With $T = 1$: $\text{Softmax} \approx [0.66, 0.24, 0.10]$
With $T = 2$: $\text{Softmax} \approx [0.50, 0.30, 0.19]$

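The prediction is still class 1, but the probabilities are less extreme. Whether they are also closer to reality depends on $T$ having been fitted on validation data rather than picked by hand.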
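
The calculation in this example can be reproduced with a short standard-library Python sketch. The helper name `softmax_with_temperature` is an assumption made for illustration, not part of any particular library.

```python
import math

def softmax_with_temperature(logits, T):
    """Softmax of logits / T, computed stably by subtracting the max."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]  # logits from the example above
for T in (1.0, 2.0):
    probs = softmax_with_temperature(z, T)
    top = probs.index(max(probs)) + 1  # 1-based class index, as in the text
    formatted = ", ".join(f"{p:.2f}" for p in probs)
    print(f"T = {T}: {formatted} (predicted class {top})")
```

Running it prints the probabilities for both temperatures along with the predicted class, which stays the same in both cases.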

Applications

- Deep learning calibration (CNNs, Transformers, etc.).
- Safety-critical ML (medical, autonomous driving, finance), where reliable probabilities matter.
- Uncertainty estimation in classification tasks.

Advantages

- Simple (only one parameter).
- Often effective for modern neural networks, whose miscalibration is frequently plain overconfidence.
- Doesn’t hurt classification accuracy (decision boundaries unchanged).

Limitations

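- Applies the same scalar to every class and every input, so it cannot fix miscalibration that varies by class or by region of the input space.
- Requires a held-out validation set that is representative of the data the model will see in use.
- Less flexible than isotonic regression, which can learn an arbitrary monotonic mapping.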

Comparison to Other Methods

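- Platt Scaling: fits a logistic regression on the classifier’s scores; assumes a sigmoid relationship between score and probability.
- Isotonic Regression: non-parametric and more flexible, but at greater risk of overfitting, especially on small validation sets.
- Temperature Scaling: parametric with a single parameter; simple, and works especially well in deep learning.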

In short:
Temperature Scaling is a calibration method that adjusts softmax probabilities by dividing logits by a learned temperature parameter $T$. It softens or sharpens predicted probabilities, making them better aligned with reality without changing accuracy.
