Brier Score

1. Definition

The Brier Score (BS) measures the mean squared error between predicted probabilities and actual outcomes.

For binary classification (outcome = 0 or 1):

$BS = \frac{1}{N} \sum_{i=1}^N ( \hat{p}_i – y_i )^2$

where:

$N$ = number of predictions
$\hat{p}_i$ = predicted probability for instance $i$
$y_i$ = true outcome (0 or 1)

2. Interpretation

Range: 0 to 1
- 0 = perfect predictions (probabilities exactly match outcomes)
- 1 = worst possible predictions (completely wrong and overconfident)
Lower is better (unlike AUC which is “higher is better”).

Example:

If a model predicts 0.8 probability for fraud, and the actual label is 1 → error = (0.8 – 1)² = 0.04.
If the actual label was 0 instead → error = (0.8 – 0)² = 0.64 (a much larger penalty).

3. Why It’s Useful

Calibration + Accuracy: Brier Score incorporates both the correctness of the classification and the quality of the probability estimates.
Better than Accuracy for Probabilities:
- Accuracy only considers the final decision (0/1).
- Brier Score penalizes overconfident wrong predictions more heavily.

4. Variants

Decomposition (Murphy’s decomposition):
The Brier Score can be broken into three parts:
- Reliability (Calibration): How close predicted probabilities are to true frequencies.
- Resolution: How well the predictions separate different outcome groups.
- Uncertainty: Inherent difficulty of the prediction problem.

5. Use Cases

Credit Fraud Detection:
- Brier Score helps assess whether fraud probabilities (0.01, 0.3, 0.9, etc.) are realistic.
Medical Diagnosis:
- Important because doctors rely on calibrated risk scores (e.g., “30% chance of disease”).
Weather Forecasting:
- Classic use case: a forecast of “70% chance of rain” should be correct about 70% of the time.

6. Example in Python

from sklearn.metrics import brier_score_loss

# true labels (0 = no fraud, 1 = fraud)
y_true = [0, 0, 1, 1]

# predicted probabilities for the positive class
y_prob = [0.1, 0.4, 0.8, 0.9]

bs = brier_score_loss(y_true, y_prob)
print("Brier Score:", bs)

from sklearn.metrics import brier_score_loss

# true labels (0 = no fraud, 1 = fraud)
y_true = [0, 0, 1, 1]

# predicted probabilities for the positive class
y_prob = [0.1, 0.4, 0.8, 0.9]

bs = brier_score_loss(y_true, y_prob)
print("Brier Score:", bs)

Output:

Brier Score: 0.055

Brier Score: 0.055

→ The model is well calibrated (low score).

Summary:

Brier Score = mean squared error of probability predictions.
Range = [0, 1], lower is better.
Captures both accuracy and calibration of probabilities.
Common in fraud detection, healthcare, weather forecasting.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.

Brier Score

1. Definition

2. Interpretation

3. Why It’s Useful

4. Variants

5. Use Cases

6. Example in Python

Like this:

Related

Leave a ReplyCancel reply

1. Definition

2. Interpretation

3. Why It’s Useful

4. Variants

5. Use Cases

6. Example in Python

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from Your Gateway to Data Mastery