Definition

Recalibration is the process of adjusting a model’s predicted probabilities so that they better reflect the true likelihood of outcomes.

  • A model may predict confidence scores (e.g., 0.9 probability of fraud), but those probabilities may be overconfident or underconfident.
  • Recalibration corrects this mismatch.

Why It’s Needed

  • Many ML models (SVMs, neural networks, boosted trees) are poorly calibrated out of the box; their raw outputs are scores, not reliable probabilities.
  • In high-stakes domains (medicine, finance, fraud detection), decision-making requires well-calibrated probabilities, not just ranking.
  • Example: If the model predicts 0.7, we want ~70% of those cases to truly be positive.

How It’s Done

  1. Fit a calibration model on a validation set
    • Compare predicted scores vs. true outcomes.
    • Adjust predictions using a calibration technique.
  2. Common Methods
    • Platt Scaling → logistic regression on raw scores.
    • Isotonic Regression → non-parametric monotonic mapping.
    • Temperature Scaling → used in deep learning; divides the logits by a single learned parameter (the temperature) before the softmax.
    • Bayesian Recalibration → update predictions with prior distributions.
  3. Calibration Curve
    • Plot predicted probability vs. actual observed frequency.
    • If the curve deviates noticeably from the diagonal, recalibration is likely needed.
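
The calibration curve described in step 3 can be computed by binning predictions and comparing each bin's mean predicted probability with its observed positive rate. The following is a minimal sketch in plain Python; the function name `reliability_curve`, the equal-width binning, and the toy data are assumptions made for illustration, not part of any particular library.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    sums = [0.0] * n_bins      # sum of predicted probabilities per bin
    positives = [0] * n_bins   # number of positive labels per bin
    counts = [0] * n_bins      # number of samples per bin
    for p, y in zip(probs, labels):
        # Equal-width bins over [0, 1]; p == 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]

# Toy validation data (made up): ten cases scored 0.9, of which 7 are positive,
# and ten cases scored 0.2, of which 2 are positive.
probs = [0.9] * 10 + [0.2] * 10
labels = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

curve = reliability_curve(probs, labels)
for mean_pred, observed, count in curve:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

On this toy data the bin around 0.2 sits on the diagonal, while the bin around 0.9 has an observed rate of 0.70, which is the kind of gap that suggests recalibration.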

Examples

1. Classification Model

  • Spam filter outputs probability: Email A = 0.9 spam.
  • But in reality, only 70% of emails with score 0.9 are spam.
  • After recalibration (Platt scaling), the corrected probability might be 0.72.
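
The spam example can be sketched with a small Platt-style fit: a logistic regression with two parameters, `a` and `b`, trained on the model's raw scores. Everything below is an illustrative assumption: the validation data is invented, the log-odds of the uncalibrated probability stand in for the raw score, a mirror group scored 0.1 is added so the fit has two points to work with, and `fit_platt` is a hypothetical helper using plain gradient descent. It also omits the label smoothing used in Platt's original formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Invented validation set: emails scored 0.9 are spam 70% of the time,
# emails scored 0.1 are spam 30% of the time (an overconfident model).
raw_probs = [0.9] * 10 + [0.1] * 10
labels = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
scores = [logit(p) for p in raw_probs]

a, b = fit_platt(scores, labels)
calibrated = sigmoid(a * logit(0.9) + b)
print(f"a={a:.3f}, b={b:.3f}, calibrated P(spam | score 0.9) = {calibrated:.2f}")
```

With this toy data the corrected probability for a 0.9 score lands near 0.70; on real data the fitted curve will not pass exactly through every group's observed rate, which is why a figure such as 0.72 is plausible.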

2. Risk Model in Healthcare

  • A mortality prediction model predicts 0.2 (20% risk).
  • If actual mortality for that group is 30%, the model underestimates the risk.
  • Recalibration moves the score closer to 0.3.
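
A correction like this one could also be learned with isotonic regression. The sketch below uses the pool-adjacent-violators algorithm in plain Python; the helper name `fit_isotonic` and the patient data are invented for illustration, and the result is only a lookup table for scores seen in the calibration set (a real implementation would also interpolate for unseen scores).

```python
def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: return (score, calibrated value) pairs sorted by score."""
    # Aggregate outcomes for each distinct score.
    totals = {}
    for s, y in zip(scores, labels):
        pos, n = totals.get(s, (0, 0))
        totals[s] = (pos + y, n + 1)

    # Each block holds [positives, count, scores covered by the block].
    blocks = []
    for s in sorted(totals):
        pos, n = totals[s]
        blocks.append([pos, n, [s]])
        # Merge backwards while a block's rate is lower than its predecessor's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            pos2, n2, covered2 = blocks.pop()
            blocks[-1][0] += pos2
            blocks[-1][1] += n2
            blocks[-1][2].extend(covered2)
    return [(s, pos / n) for pos, n, covered in blocks for s in covered]

# Invented calibration data: predicted risk -> observed deaths per 10 patients.
scores = [0.1] * 10 + [0.2] * 10 + [0.5] * 10 + [0.6] * 10
labels = (
    [1] * 1 + [0] * 9      # score 0.1: 10% observed mortality
    + [1] * 3 + [0] * 7    # score 0.2: 30% observed mortality
    + [1] * 6 + [0] * 4    # score 0.5: 60% observed mortality
    + [1] * 5 + [0] * 5    # score 0.6: 50% observed (violates monotonicity)
)

mapping = dict(fit_isotonic(scores, labels))
print(mapping)
```

Here the score 0.2 maps to 0.3, matching the observed mortality, while the groups scored 0.5 and 0.6 are pooled to a shared value of 0.55 so that the mapping never decreases.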

Difference: Recalibration vs. Threshold Tuning

  • Threshold tuning → change the cutoff (e.g., from 0.5 to 0.3) to balance precision and recall; the probabilities themselves stay the same.
  • Recalibration → adjust the probabilities themselves so they represent true likelihoods.
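
The contrast can be made concrete with a short sketch. The probabilities are made up, and `soften` is a hypothetical calibration map (it divides the log-odds by an assumed temperature of 2.0) chosen only to show that recalibration changes the numbers, whereas threshold tuning changes the decisions.

```python
import math

def decide(probs, threshold):
    """Turn probabilities into 0/1 decisions using a cutoff."""
    return [int(p >= threshold) for p in probs]

def soften(p, temperature=2.0):
    """Hypothetical calibration map: divide the log-odds by a temperature."""
    z = math.log(p / (1.0 - p)) / temperature
    return 1.0 / (1.0 + math.exp(-z))

probs = [0.15, 0.35, 0.45, 0.80]

# Threshold tuning: same probabilities, different decisions.
at_default = decide(probs, 0.5)
at_lowered = decide(probs, 0.3)

# Recalibration: the probabilities themselves are remapped.
recalibrated = [soften(p) for p in probs]

print("decisions at 0.5:", at_default)
print("decisions at 0.3:", at_lowered)
print("recalibrated probabilities:", [round(p, 3) for p in recalibrated])
```

In this particular example the remapped probabilities all move toward 0.5 but stay on the same side of it, so decisions at the 0.5 cutoff are unchanged. In general the two steps are complementary: recalibrate first, then choose a threshold on the calibrated probabilities.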

Benefits

  • Improves decision quality in risk-sensitive domains.
  • Makes probabilities trustworthy.
  • Works as a post-processing step (no need to retrain the base model).

Challenges

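  • Needs a reliable calibration dataset: a held-out validation set that was not used to train the base model and is large enough to estimate outcome frequencies.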
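  • The effect on ranking depends on the method: strictly increasing maps (Platt scaling, temperature scaling) keep the ordering of scores, so AUC is unchanged while the probabilities become more accurate; isotonic regression can introduce ties, which may shift AUC slightly.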
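
How a calibration map interacts with ranking can be checked directly. The sketch below computes AUC by pair counting, then applies two hypothetical maps to made-up scores: a strictly increasing one (squaring) and a step-like one that collapses scores into two values, similar to the ties a piecewise-constant calibrator can produce.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up scores and outcomes.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

# Strictly increasing map: ordering is preserved.
squared = [s ** 2 for s in scores]

# Step-like map: distinct scores collapse into ties.
stepped = [0.9 if s >= 0.5 else 0.1 for s in scores]

base_auc = auc(scores, labels)
monotone_auc = auc(squared, labels)
stepped_auc = auc(stepped, labels)
print(f"original {base_auc:.3f}, increasing map {monotone_auc:.3f}, step map {stepped_auc:.3f}")
```

The strictly increasing map leaves AUC exactly as it was, while the step map lowers it on this data because three positives now tie with one negative.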

Summary
Recalibration = correcting a model’s predicted probabilities so they match actual observed outcomes.

  • Methods: Platt scaling, isotonic regression, temperature scaling.
  • Goal: ensure probabilities are meaningful and usable in real-world decisions.