1. Definition
The Brier Score (BS) measures the mean squared error between predicted probabilities and actual outcomes.
- For binary classification (outcome = 0 or 1):
$BS = \frac{1}{N} \sum_{i=1}^N ( \hat{p}_i – y_i )^2$
where:
- $N$ = number of predictions
- $\hat{p}_i$ = predicted probability for instance $i$
- $y_i$ = true outcome (0 or 1)
2. Interpretation
- Range: 0 to 1
- 0 = perfect predictions (probabilities exactly match outcomes)
- 1 = worst possible predictions (completely wrong and overconfident)
- Lower is better (unlike AUC which is “higher is better”).
Example:
- If a model predicts 0.8 probability for fraud, and the actual label is 1 → error = (0.8 – 1)² = 0.04.
- If the actual label was 0 instead → error = (0.8 – 0)² = 0.64 (a much larger penalty).
3. Why It’s Useful
- Calibration + Accuracy: Brier Score incorporates both the correctness of the classification and the quality of the probability estimates.
- Better than Accuracy for Probabilities:
- Accuracy only considers the final decision (0/1).
- Brier Score penalizes overconfident wrong predictions more heavily.
4. Variants
- Decomposition (Murphy’s decomposition):
The Brier Score can be broken into three parts:- Reliability (Calibration): How close predicted probabilities are to true frequencies.
- Resolution: How well the predictions separate different outcome groups.
- Uncertainty: Inherent difficulty of the prediction problem.
5. Use Cases
- Credit Fraud Detection:
- Brier Score helps assess whether fraud probabilities (0.01, 0.3, 0.9, etc.) are realistic.
- Medical Diagnosis:
- Important because doctors rely on calibrated risk scores (e.g., “30% chance of disease”).
- Weather Forecasting:
- Classic use case: a forecast of “70% chance of rain” should be correct about 70% of the time.
6. Example in Python
from sklearn.metrics import brier_score_loss
# true labels (0 = no fraud, 1 = fraud)
y_true = [0, 0, 1, 1]
# predicted probabilities for the positive class
y_prob = [0.1, 0.4, 0.8, 0.9]
bs = brier_score_loss(y_true, y_prob)
print("Brier Score:", bs)
Output:
Brier Score: 0.055→ The model is well calibrated (low score).
Summary:
- Brier Score = mean squared error of probability predictions.
- Range = [0, 1], lower is better.
- Captures both accuracy and calibration of probabilities.
- Common in fraud detection, healthcare, weather forecasting.
