Definition
Recalibration is the process of adjusting a model’s predicted probabilities so that they better reflect the true likelihood of outcomes.
- A model may predict confidence scores (e.g., 0.9 probability of fraud), but those probabilities may be overconfident or underconfident.
- Recalibration corrects this mismatch.
Why It’s Needed
- Many ML models (SVMs, neural networks, boosted trees) are often poorly calibrated: their raw outputs are scores, not reliable probabilities.
- In high-stakes domains (medicine, finance, fraud detection), decision-making requires well-calibrated probabilities, not just ranking.
- Example: If the model predicts 0.7, we want ~70% of those cases to truly be positive.
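The 0.7 example can be checked directly on labeled data. Below is a minimal sketch, assuming a handful of made-up scores and labels; the helper name `observed_rate` is illustrative and not taken from any library.

```python
def observed_rate(scores, labels, lo, hi):
    """Fraction of positives among cases whose score falls in [lo, hi)."""
    picked = [y for s, y in zip(scores, labels) if lo <= s < hi]
    if not picked:
        return None  # no cases in this score range
    return sum(picked) / len(picked)

# Made-up data: ten cases that all received a score near 0.7.
scores = [0.71, 0.69, 0.70, 0.72, 0.68, 0.70, 0.71, 0.69, 0.70, 0.70]
labels = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 of the 10 are truly positive

rate = observed_rate(scores, labels, 0.65, 0.75)
print(f"observed positive rate near 0.7: {rate:.2f}")
```

Here the observed rate matches the predicted score, so this score range would count as well calibrated. In practice far more than ten cases per range are needed for a stable estimate.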
How It’s Done
- Fit a calibration model on a held-out validation set (data not used to train the base model).
- Compare predicted scores vs. true outcomes.
- Adjust predictions using a calibration technique.
- Common Methods
  - Platt Scaling → logistic regression on raw scores.
  - Isotonic Regression → non-parametric monotonic mapping.
  - Temperature Scaling → used in deep learning; divides logits by a single learned temperature parameter.
  - Bayesian Recalibration → update predictions with prior distributions.
- Calibration Curve
  - Plot predicted probability vs. actual observed frequency.
  - If the curve deviates noticeably from the diagonal, the model is miscalibrated and recalibration is needed.
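The fit, compare, and adjust steps can be sketched with Platt scaling and a binned calibration curve. This is a rough illustration in plain Python, not a production implementation: the synthetic data, the function names (`fit_platt`, `reliability_bins`, `expected_calibration_error`), the learning rate, and the bin count are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.1, steps=1000):
    """Platt scaling: fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def reliability_bins(probs, labels, n_bins=5):
    """Calibration curve points: (mean predicted, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            rows.append((mean_p, freq, len(members)))
    return rows

def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted average gap between predicted and observed frequency."""
    n = len(probs)
    return sum(c / n * abs(mp - fr) for mp, fr, c in reliability_bins(probs, labels, n_bins))

# Synthetic validation set (an assumption for illustration): the true logit is z,
# but the model reports 2.5 * z, so its probabilities are overconfident.
rng = random.Random(0)
val_scores, val_labels = [], []
for _ in range(1000):
    z = rng.gauss(0.0, 1.0)
    val_labels.append(1 if rng.random() < sigmoid(z) else 0)
    val_scores.append(2.5 * z)

raw_probs = [sigmoid(s) for s in val_scores]            # before recalibration
a, b = fit_platt(val_scores, val_labels)                # fit the calibration model
cal_probs = [sigmoid(a * s + b) for s in val_scores]    # adjusted predictions

ece_before = expected_calibration_error(raw_probs, val_labels)
ece_after = expected_calibration_error(cal_probs, val_labels)
print(f"a={a:.2f}, b={b:.2f}, ECE before={ece_before:.3f}, after={ece_after:.3f}")
```

For a binary model whose scores are logits, temperature scaling corresponds to the special case `b = 0` with `a = 1 / T`. The sketch measures calibration error on the same data used to fit `a` and `b` for brevity; a separate held-out test set gives a more honest estimate.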
Examples
1. Classification Model
- Spam filter outputs probability: Email A = 0.9 spam.
- But in reality, only 70% of emails with score 0.9 are spam.
- After recalibration (Platt scaling), the corrected probability might be 0.72.
2. Risk Model in Healthcare
- A mortality prediction model predicts 0.2 (20% risk).
- If actual mortality for that group is 30%, the model underestimates the risk.
- Recalibration corrects the score closer to 0.3.
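The healthcare example can be reproduced with isotonic regression. The following is a small sketch of the pool-adjacent-violators idea in plain Python, assuming invented patient counts; `fit_isotonic`, `predict_isotonic`, and `make_group` are hypothetical helper names, and the step-function lookup is a simplification (library implementations often interpolate between steps).

```python
from collections import defaultdict

def fit_isotonic(scores, labels):
    """Pool adjacent violators; returns [(upper score bound, calibrated value), ...]."""
    grouped = defaultdict(lambda: [0.0, 0])  # score -> [sum of labels, count]
    for s, y in zip(scores, labels):
        grouped[s][0] += y
        grouped[s][1] += 1
    blocks = []  # each block: [sum of labels, count, highest score in block]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while the observed rate would otherwise decrease.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            t, c, hi = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = hi
    return [(hi, total / count) for total, count, hi in blocks]

def predict_isotonic(steps, score):
    """Look up the calibrated value for a raw score (step function)."""
    for hi, value in steps:
        if score <= hi:
            return value
    return steps[-1][1]

def make_group(score, n, deaths):
    """n patients sharing one predicted score, of whom `deaths` died."""
    return [score] * n, [1] * deaths + [0] * (n - deaths)

# Invented validation data: (predicted risk, patients, deaths).
scores, labels = [], []
for score, n, deaths in [(0.05, 20, 1), (0.10, 10, 0), (0.20, 10, 3), (0.50, 10, 6)]:
    s, y = make_group(score, n, deaths)
    scores += s
    labels += y

steps = fit_isotonic(scores, labels)
corrected = predict_isotonic(steps, 0.2)
print(f"raw 0.20 -> recalibrated {corrected:.2f}")
```

The 0.05 and 0.10 groups are pooled because their observed rates (5% and 0%) would otherwise break monotonicity, while the 0.20 group maps to its observed 30% mortality.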
Difference: Recalibration vs. Threshold Tuning
- Threshold tuning → change the cutoff (e.g., from 0.5 to 0.3) to balance precision/recall; the probabilities themselves stay the same.
- Recalibration → adjust the probabilities themselves so they represent true likelihoods.
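The contrast can be made concrete with a few lines of Python. This is only a sketch: the probabilities are made up, and the recalibration map (halving the logit) is a hypothetical stand-in for whatever a fitted calibrator would produce.

```python
import math

def classify(probs, threshold):
    """Turn probabilities into 0/1 decisions at a given cutoff."""
    return [int(p >= threshold) for p in probs]

def recalibrate(p, a=0.5, b=0.0):
    """Hypothetical fitted map: sigmoid(a * logit(p) + b)."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-(a * logit + b)))

probs = [0.1, 0.35, 0.45, 0.6, 0.9]

# Threshold tuning: decisions change, probabilities do not.
at_half = classify(probs, 0.5)
at_third = classify(probs, 0.3)

# Recalibration: probabilities change; the cutoff is a separate choice.
recal = [recalibrate(p) for p in probs]

print("decisions at 0.5:", at_half)
print("decisions at 0.3:", at_third)
print("recalibrated:", [round(p, 2) for p in recal])
```

With this particular map the decisions at a 0.5 cutoff are identical before and after recalibration, which illustrates that the two adjustments address different questions: where to act versus how much to believe the score.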
Benefits
- Improves decision quality in risk-sensitive domains.
- Makes probabilities trustworthy.
- Works as a post-processing step (no need to retrain the base model).
Challenges
- Needs a reliable calibration dataset (validation set).
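- Recalibration does not improve ranking performance: strictly monotonic methods such as Platt scaling leave AUC unchanged while making probabilities more accurate, and step-function methods such as isotonic regression can create ties that change AUC slightly.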
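The relationship between recalibration and ranking can be checked numerically. In the sketch below (made-up scores and labels, hypothetical helper names), a strictly increasing map preserves the ordering of scores and therefore AUC, while a coarse step-function map introduces ties and changes it.

```python
import math

def auc(scores, labels):
    """AUC by pair counting: P(positive scored above negative), ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def smooth_map(p):
    """Strictly increasing recalibration map (hypothetical): halve the logit."""
    return 1.0 / (1.0 + math.exp(-0.5 * math.log(p / (1.0 - p))))

def step_map(p):
    """Coarse step function (hypothetical): only two output values, so many ties."""
    return 0.25 if p < 0.5 else 0.75

scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc_raw = auc(scores, labels)
auc_smooth = auc([smooth_map(p) for p in scores], labels)
auc_step = auc([step_map(p) for p in scores], labels)

print(f"raw={auc_raw}, strictly monotonic={auc_smooth}, step function={auc_step}")
```

The two-level step map is deliberately extreme; a fitted isotonic regression typically has many more steps, so any change in AUC tends to be much smaller.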
Summary
Recalibration = correcting a model’s predicted probabilities so they match actual observed outcomes.
- Methods: Platt scaling, isotonic regression, temperature scaling.
- Goal: ensure probabilities are meaningful and usable in real-world decisions.
