1. Definition

Log Loss (also called cross-entropy loss) measures the performance of a classification model whose output is a probability between 0 and 1.
It penalizes confident wrong predictions far more heavily than mild mistakes.

For binary classification:

$LogLoss = -\frac{1}{N} \sum_{i=1}^N \Big[ y_i \cdot \log(\hat{p}_i) + (1 - y_i) \cdot \log(1 - \hat{p}_i) \Big]$

where:

  • $N$ = number of samples
  • $y_i$ = true label (0 or 1)
  • $\hat{p}_i$ = predicted probability for the positive class
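The formula translates directly into a few lines of NumPy. This is a minimal sketch; `binary_log_loss` is an illustrative name, not a library function:

```python
import numpy as np

def binary_log_loss(y_true, p_pred):
    """Average negative log-likelihood of the true class for binary labels."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    # Each sample contributes -log of the probability assigned to its true class
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

print(binary_log_loss([1, 0, 1], [0.9, 0.2, 0.1]))  # ≈ 0.877
```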

2. Interpretation

  • Range: [0, ∞)
    • 0 = perfect predictions (probabilities exactly match outcomes)
    • Higher values = worse predictions
  • Strong penalty for being confident and wrong:
    • Predicting 0.99 when actual = 0 → heavy penalty
    • Predicting 0.6 when actual = 0 → smaller penalty
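Those two cases are easy to check by hand: with a true label of 0, the per-sample loss is $-\log(1 - \hat{p})$.

```python
import math

# True label is 0, so the per-sample loss is -log(1 - p)
confident_wrong = -math.log(1 - 0.99)  # predicted 0.99 -> about 4.61
mildly_wrong = -math.log(1 - 0.6)      # predicted 0.6  -> about 0.92

print(confident_wrong, mildly_wrong)
```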

3. Example

Suppose we have 3 predictions:

| True Label | Predicted Prob ($\hat{p}$) | Contribution to Log Loss |
|---|---|---|
| 1 | 0.9 | $-\log(0.9) = 0.105$ |
| 0 | 0.2 | $-\log(1 - 0.2) = -\log(0.8) = 0.223$ |
| 1 | 0.1 | $-\log(0.1) = 2.303$ |

Average Log Loss = (0.105 + 0.223 + 2.303) / 3 = 0.877

Notice how the last prediction (confident but wrong) dominates the loss.


4. Why It’s Useful

  • It is a proper scoring rule: expected loss is minimized by reporting the true probabilities, so it rewards well-calibrated models, not just correct labels.
  • It is smooth and differentiable, so it can be optimized directly with gradient-based methods; it is the training objective of logistic regression and most neural network classifiers.
  • It evaluates the full predicted probability, unlike accuracy, which only looks at the hard 0/1 decision.

5. Comparison with Brier Score

  • Brier Score: Mean squared error of predicted probabilities. Penalties grow only quadratically, so it treats 0.9 vs. 1.0 and 0.6 vs. 1.0 relatively evenly.
  • Log Loss: The penalty grows without bound as the predicted probability of the true class approaches 0, making it much harsher on overconfident wrong predictions.

Example: Predicting 0.01 when the true label is 1 →

  • Brier error = $(0.01 - 1)^2 = 0.9801$
  • Log Loss error = $-\log(0.01) = 4.605$ (much harsher penalty)
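Both numbers can be reproduced in a couple of lines:

```python
import math

p, y = 0.01, 1  # confident prediction of 0.01 when the true label is 1

brier = (p - y) ** 2    # squared error: 0.9801
logloss = -math.log(p)  # -log(0.01), about 4.605

print(brier, logloss)
```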

6. Example in Python

from sklearn.metrics import log_loss

y_true = [1, 0, 1]        # true binary labels
y_prob = [0.9, 0.2, 0.1]  # predicted probabilities for the positive class

loss = log_loss(y_true, y_prob)
print(f"Log Loss: {loss:.3f}")

Output:

Log Loss: 0.877
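One practical detail: a predicted probability of exactly 0 for the true class would make the loss infinite, so scikit-learn clips probabilities slightly away from 0 and 1 before taking logs (the exact clipping threshold depends on the library version):

```python
import math
from sklearn.metrics import log_loss

# Predicting probability 0.0 for a positive example would give -log(0) = inf,
# but sklearn clips the probability internally, so the result stays finite.
loss = log_loss([1, 0], [0.0, 0.0])
print(math.isfinite(loss), loss)  # finite, but very large
```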

7. Summary

  • Log Loss = average negative log likelihood of the true class.
  • Range = [0, ∞), lower is better.
  • Strongly penalizes overconfident wrong predictions.
  • Common in classification tasks and directly optimized by many ML algorithms.