1. Definition
A reliability curve compares a model’s predicted probabilities with the actual observed frequencies of outcomes.
- x-axis = predicted probability (binned, e.g., 0.0–0.1, 0.1–0.2, …)
- y-axis = actual fraction of positives in each bin
- Perfect calibration → points lie on the diagonal line $y = x$.
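To make this construction concrete, here is a minimal NumPy sketch (the probabilities and labels are made up for illustration) that bins predictions and computes the observed positive rate per bin, exactly as the axes above describe:

```python
import numpy as np

# Hypothetical predictions and true labels (illustration only)
y_prob = np.array([0.05, 0.15, 0.22, 0.38, 0.41, 0.55, 0.68, 0.74, 0.81, 0.93])
y_true = np.array([0,    0,    0,    1,    0,    1,    1,    1,    1,    1])

bin_edges = np.linspace(0.0, 1.0, 6)              # 5 equal-width bins: 0.0-0.2, ..., 0.8-1.0
bin_ids = np.digitize(y_prob, bin_edges[1:-1])    # assign each prediction to a bin

for b in range(5):
    mask = bin_ids == b
    if mask.any():
        print(f"bin {b}: mean predicted = {y_prob[mask].mean():.2f}, "
              f"observed positive rate = {y_true[mask].mean():.2f}")
```

Each `(mean predicted, observed positive rate)` pair is one point on the reliability curve.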
2. Why It’s Useful
- Evaluates whether probabilities can be trusted as real-world likelihoods.
- Identifies whether a model is:
  - Overconfident: predicted probabilities are too high (curve falls below the diagonal).
  - Underconfident: predicted probabilities are too low (curve rises above the diagonal).
- Goes beyond accuracy and ROC AUC: it tells you whether the probabilities themselves are meaningful (a quick way to check this in code is sketched below).
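As a toy illustration (the per-bin numbers below are invented), the over/underconfident diagnosis reduces to comparing each bin's mean predicted probability with its observed positive rate:

```python
# Hypothetical per-bin values: (mean predicted probability, observed positive rate)
bins = [(0.15, 0.25), (0.45, 0.45), (0.80, 0.65)]

for predicted, observed in bins:
    if predicted > observed:
        verdict = "overconfident (point falls below the diagonal)"
    elif predicted < observed:
        verdict = "underconfident (point rises above the diagonal)"
    else:
        verdict = "well calibrated (point lies on the diagonal)"
    print(f"predicted {predicted:.2f} vs observed {observed:.2f}: {verdict}")
```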
3. Example
Suppose a fraud detection model predicts probabilities for 1000 transactions.
| Predicted Prob. Bin | Avg. Predicted | Actual Fraud Rate |
|---|---|---|
| 0.1–0.2 | 0.15 | 0.05 |
| 0.3–0.4 | 0.35 | 0.20 |
| 0.7–0.8 | 0.75 | 0.60 |
- In the 0.7–0.8 bin, the model predicted roughly a 75% chance of fraud, but only 60% of those transactions were actually fraudulent → the model is overconfident.
Graphically: the curve falls below the diagonal.
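The table also makes it easy to summarize miscalibration as one number. The sketch below computes an unweighted mean absolute gap per bin; a true expected calibration error would weight each bin by its transaction count, which the table above does not give, so we assume equal counts for illustration:

```python
# Per-bin values taken from the table above
avg_predicted = [0.15, 0.35, 0.75]
actual_rate = [0.05, 0.20, 0.60]

# Positive gap = the model predicted more fraud than actually occurred (overconfidence)
gaps = [p - a for p, a in zip(avg_predicted, actual_rate)]
mean_gap = sum(abs(g) for g in gaps) / len(gaps)
print(f"per-bin gaps: {[round(g, 2) for g in gaps]}")  # [0.1, 0.15, 0.15]
print(f"mean absolute gap: {mean_gap:.3f}")            # 0.133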
4. How to Plot in Python
```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Example data: true labels and the model's predicted probabilities
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9, 0.6, 0.05]

# Bin the predictions and compute the observed fraction of positives per bin
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=5)

plt.plot(prob_pred, prob_true, marker='o', label="Model")
plt.plot([0, 1], [0, 1], linestyle="--", label="Perfectly Calibrated")
plt.xlabel("Predicted probability")
plt.ylabel("Observed frequency")
plt.title("Reliability Curve (Calibration Curve)")
plt.legend()
plt.show()
```
This will produce a plot where:
- Dashed line (y=x): Perfect calibration.
- Curve above line: Model underconfident.
- Curve below line: Model overconfident.
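For a more realistic picture than ten hand-typed probabilities, the same plot can be built from a trained model. A minimal sketch using scikit-learn's `make_classification` and a logistic regression (the dataset size, feature count, and random seeds are arbitrary choices for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (parameters chosen arbitrarily)
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Evaluate calibration on held-out data, not the training set
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

plt.plot(prob_pred, prob_true, marker='o', label="Logistic Regression")
plt.plot([0, 1], [0, 1], linestyle="--", label="Perfectly Calibrated")
plt.xlabel("Predicted probability")
plt.ylabel("Observed frequency")
plt.legend()
plt.show()
```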
5. Use Cases
- Credit Fraud Detection: Ensure fraud probabilities match reality for decision thresholds.
- Medical Diagnosis: Patients must be told realistic risks (e.g., “20% chance of disease”).
- Weather Forecasting: Classic example → if “30% rain” is forecast 100 times, it should rain about 30 times.
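The weather claim is easy to sanity-check in code. A minimal simulation, assuming rain genuinely occurs with probability 0.3 on each forecast day:

```python
import numpy as np

rng = np.random.default_rng(0)
forecast = 0.3                        # "30% chance of rain", issued 100 times
rained = rng.random(100) < forecast   # simulate whether it actually rained each day
print(rained.sum())                   # close to 30 for a calibrated forecast
```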
Summary
- Reliability curve = a plot of predicted probability vs. observed frequency.
- Checks calibration quality of probability estimates.
- Perfect calibration → diagonal line.
- Helps detect overconfidence or underconfidence in model predictions.
