1. Definition
Log Loss measures the performance of a classification model where the output is a probability between 0 and 1.
It penalizes confidently wrong predictions far more heavily than mildly wrong ones.
$\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^N \Big[ y_i \cdot \log(\hat{p}_i) + (1 - y_i) \cdot \log(1 - \hat{p}_i) \Big]$
where:
- $N$ = number of samples
- $y_i$ = true label (0 or 1)
- $\hat{p}_i$ = predicted probability for the positive class
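The formula translates almost directly into NumPy. A minimal sketch (the helper name `manual_log_loss` and the `eps` clipping constant are illustrative choices, not a standard API):

```python
import numpy as np

def manual_log_loss(y_true, p_hat, eps=1e-15):
    """Log loss per the formula above.

    Probabilities are clipped away from 0 and 1 so the
    logarithm stays finite (a common implementation detail).
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(round(manual_log_loss([1, 0, 1], [0.9, 0.2, 0.1]), 3))  # 0.877
```

The clipping step matters in practice: a predicted probability of exactly 0 or 1 on the wrong class would otherwise make the loss infinite.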
2. Interpretation
- Range: $[0, \infty)$
  - 0 = perfect predictions (probabilities exactly match outcomes)
  - Higher values = worse predictions
- Strong penalty for being confident and wrong:
  - Predicting 0.99 when actual = 0 → heavy penalty
  - Predicting 0.6 when actual = 0 → smaller penalty
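A two-line check with the standard library: when the actual label is 0, the per-sample loss reduces to $-\log(1 - \hat{p})$.

```python
import math

# Actual label is 0, so the per-sample loss is -log(1 - p_hat).
for p_hat in (0.6, 0.99):
    print(f"predicted {p_hat} when actual = 0 -> loss {-math.log(1 - p_hat):.3f}")
# predicted 0.6 when actual = 0 -> loss 0.916
# predicted 0.99 when actual = 0 -> loss 4.605
```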
3. Example
Suppose we have 3 predictions:
| True Label | Predicted Prob ($\hat{p}$) | Contribution to Log Loss |
|---|---|---|
| 1 | 0.9 | $-\log(0.9) = 0.105$ |
| 0 | 0.2 | $-\log(0.8) = 0.223$ |
| 1 | 0.1 | $-\log(0.1) = 2.303$ |
Average Log Loss = (0.105 + 0.223 + 2.303) / 3 = 0.877
Notice how the last prediction (confident but wrong) dominates the loss.
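The three contributions in the table can be reproduced in a few lines (a sketch using the same values):

```python
import math

# (true label, predicted probability) pairs from the table
samples = [(1, 0.9), (0, 0.2), (1, 0.1)]

# Per-sample loss: -log(p) if y = 1, else -log(1 - p)
contributions = [-math.log(p if y == 1 else 1 - p) for y, p in samples]

for (y, p), c in zip(samples, contributions):
    print(f"y={y}, p_hat={p}: {c:.3f}")
print("Average Log Loss:", round(sum(contributions) / len(contributions), 3))
# y=1, p_hat=0.9: 0.105
# y=0, p_hat=0.2: 0.223
# y=1, p_hat=0.1: 2.303
# Average Log Loss: 0.877
```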
4. Why It’s Useful
- Captures Confidence: Unlike accuracy, log loss punishes “confident but wrong” predictions.
- Probabilistic Evaluation: Essential for tasks where calibrated probabilities matter (fraud, medical diagnosis, weather).
- Model Training: Many algorithms (e.g., logistic regression, neural networks) optimize log loss directly.
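The "captures confidence" point can be made concrete with a small sketch (the two models and their probabilities are invented for illustration): both models misclassify the same single sample at a 0.5 threshold, so their accuracy is identical, but only one is confidently wrong.

```python
from sklearn.metrics import accuracy_score, log_loss

y_true = [1, 1, 0, 0, 1]
model_a = [0.8, 0.8, 0.2, 0.2, 0.4]   # misses the last sample, mildly
model_b = [0.8, 0.8, 0.2, 0.2, 0.01]  # misses the last sample, confidently

for name, probs in [("A (hedged)", model_a), ("B (overconfident)", model_b)]:
    hard = [int(p >= 0.5) for p in probs]  # threshold at 0.5 for accuracy
    print(name, "accuracy:", accuracy_score(y_true, hard),
          "log loss:", round(log_loss(y_true, probs), 3))
```

Both models score 0.8 accuracy, but model B's single overconfident mistake roughly triples its log loss (≈1.100 vs. ≈0.362).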
5. Comparison with Brier Score
- Brier Score: Mean squared error of predicted probabilities. Its penalty is quadratic and bounded by 1, so an error like 0.9 vs. 1.0 and a larger one like 0.6 vs. 1.0 are treated relatively evenly.
- Log Loss: Unbounded penalty that grows rapidly as the predicted probability of the true class approaches 0, making it far harsher on overconfident wrong predictions.
Example: Predicting 0.01 when the true label is 1 →
- Brier error = $(0.01 - 1)^2 = 0.9801$
- Log Loss error = $-\log(0.01) = 4.605$ (much harsher penalty)
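Both numbers can be verified with the standard library (a minimal check of the single-sample case above):

```python
import math

y_true, p_hat = 1, 0.01  # confident prediction for the wrong class

brier = (p_hat - y_true) ** 2  # squared error: bounded by 1
logloss = -math.log(p_hat)     # unbounded as p_hat -> 0

print(f"Brier error: {brier:.4f}")       # 0.9801
print(f"Log Loss error: {logloss:.3f}")  # 4.605
```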
6. Example in Python
```python
from sklearn.metrics import log_loss

y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.1]  # predicted probability of the positive class

loss = log_loss(y_true, y_prob)
print("Log Loss:", round(loss, 3))
```
Output:
```
Log Loss: 0.877
```
7. Summary
- Log Loss = average negative log likelihood of the true class.
- Range = [0, ∞), lower is better.
- Strongly penalizes overconfident wrong predictions.
- Common in classification tasks and directly optimized by many ML algorithms.
