Definition

Maximum Calibration Error (MCE) is a metric used to evaluate probability calibration of a predictive model (often a classifier).

  • It measures the worst-case deviation between the predicted probabilities and the actual observed frequencies.
  • Unlike Expected Calibration Error (ECE), which averages errors across bins, MCE focuses on the largest single discrepancy.

Formal Setup

  1. Suppose your classifier outputs predicted probabilities $\hat{p}_i \in [0,1]$ for each instance.
  2. Partition predictions into bins $B_1, B_2, \ldots, B_M$ based on their predicted probability (e.g., [0–0.1], [0.1–0.2], …).
  3. For each bin $B_m$:
    • Confidence (average predicted probability in bin):
      • $\text{conf}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \hat{p}_i$
    • Accuracy (fraction of correct predictions in bin):
      • $\text{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \mathbf{1}(\hat{y}_i = y_i)$
  4. Then: $\text{MCE} = \max_{m=1,\dots,M} \; \big| \text{acc}(B_m) - \text{conf}(B_m) \big|$
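The steps above can be sketched as a short Python function (equal-width bins; the function name and argument layout are illustrative, not from any particular library):

```python
import numpy as np

def max_calibration_error(probs, preds, labels, n_bins=5):
    """MCE over equal-width confidence bins.

    probs  : confidence of the predicted class for each instance, in [0, 1]
    preds  : predicted labels
    labels : true labels
    """
    probs = np.asarray(probs, dtype=float)
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    # Assign each prediction to a bin; a confidence of exactly 1.0
    # is clipped into the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    mce = 0.0
    for m in range(n_bins):
        mask = bins == m
        if not mask.any():
            continue  # empty bins contribute nothing
        conf = probs[mask].mean()                   # conf(B_m)
        acc = (preds[mask] == labels[mask]).mean()  # acc(B_m)
        mce = max(mce, abs(acc - conf))
    return mce
```

For example, four predictions at 90% confidence of which only two are correct give conf = 0.9, acc = 0.5, so MCE = 0.4.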

Intuition

  • ECE tells you on average how far off your probabilities are from reality.
  • MCE tells you about the single worst bin, where your predicted probability was most misleading.
  • Example: If a model predicts 90% confidence but the true frequency is only 60% → calibration error = 0.30. If this is the largest deviation among bins, then MCE = 0.30.

Properties

  • Range: $0 \leq \text{MCE} \leq 1$.
  • Perfect calibration: MCE = 0 (confidence always equals accuracy).
  • Sensitive to binning: The choice of the number of bins $M$ matters. Too few bins → underestimation of error; too many → a noisy estimate.
  • Interpretation: A high MCE means at least one probability bucket is very misleading, which could be dangerous in high-stakes settings (medicine, finance, risk estimation).
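The binning sensitivity can be demonstrated with a toy simulation of a perfectly calibrated model (a sketch with made-up data, not a benchmark): with many bins, each bin holds few samples, so the maximum gap is increasingly dominated by sampling noise.

```python
import numpy as np

def mce(confidences, correct, n_bins):
    # Equal-width bins over [0, 1]; skip empty bins
    bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    gaps = [abs(correct[bins == m].mean() - confidences[bins == m].mean())
            for m in range(n_bins) if (bins == m).any()]
    return max(gaps)

rng = np.random.default_rng(0)
n = 2000
conf = rng.uniform(0.5, 1.0, n)   # confidences of the predicted class
correct = rng.random(n) < conf    # simulate perfect calibration

for b in (5, 15, 50):
    print(b, "bins -> MCE =", round(mce(conf, correct, b), 3))
```

Even though the simulated model is calibrated by construction, the reported MCE tends to grow as the bin count rises, purely from noise.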

Example

Suppose you divide predictions into 5 bins:

| Bin | Confidence | Accuracy | \|Accuracy – Confidence\| |
| --- | --- | --- | --- |
| [0.0–0.2] | 0.10 | 0.12 | 0.02 |
| [0.2–0.4] | 0.30 | 0.28 | 0.02 |
| [0.4–0.6] | 0.50 | 0.52 | 0.02 |
| [0.6–0.8] | 0.70 | 0.60 | 0.10 |
| [0.8–1.0] | 0.90 | 0.75 | 0.15 |

  • ECE = average of these errors, weighted by the number of samples in each bin.
  • MCE = max = 0.15.

So the model’s worst calibration gap is 15%.
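Carrying the table through in code (assuming equal bin populations for the ECE weighting, which the table does not specify):

```python
# Per-bin gaps |accuracy - confidence| from the table above
gaps = [0.02, 0.02, 0.02, 0.10, 0.15]

# Illustrative only: with equal samples per bin, ECE's weighted
# average reduces to a plain mean.
ece = sum(gaps) / len(gaps)
mce = max(gaps)
print(round(ece, 3), mce)  # 0.062 0.15
```

Note how a single bad bin (0.15) dominates MCE while barely moving ECE.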


Use Cases

  • Risk assessment (e.g., medical diagnosis, fraud detection).
  • Any domain where confidence reliability matters.
  • Often reported alongside ECE and Brier score.

Summary:
MCE = the largest absolute difference between predicted confidence and actual accuracy across probability bins. It highlights the worst-case calibration failure of your model, unlike ECE, which averages across bins.