1. Definition
RMSLE (Root Mean Squared Logarithmic Error) is the square root of the mean squared difference between the logarithms of the predicted and actual values, each shifted by 1:
$RMSLE = \sqrt{\frac{1}{N} \sum_{i=1}^N \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2 }$
where:
- $N$ = number of samples
- $y_i$ = actual value
- $\hat{y}_i$ = predicted value
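- $\log$ is the natural logarithm. Adding 1 keeps it defined when $y_i = 0$ or $\hat{y}_i = 0$, since $\log(0)$ is undefined. Inputs must still be greater than $-1$ (in practice, non-negative).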
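To make the role of the +1 shift concrete, here is a minimal sketch using only Python's standard `math` module. The variable names (`plain_log_ok`, `shifted_zero`, `term`) are illustrative and not part of any library.

```python
import math

# Plain log is undefined at zero: math.log(0) raises ValueError.
try:
    math.log(0)
    plain_log_ok = True
except ValueError:
    plain_log_ok = False

# log1p(x) computes log(1 + x), so zero maps cleanly to log(1) = 0.
shifted_zero = math.log1p(0)

# One term of the RMSLE sum when the actual value is 0 and the prediction is 3:
# (log(3 + 1) - log(0 + 1))^2
term = (math.log1p(3) - math.log1p(0)) ** 2

print("plain log works at zero:", plain_log_ok)
print("log1p(0):", shifted_zero)
print("squared log error for y=0, y_hat=3:", term)
```

Without the shift, a single zero in either array would make the whole metric undefined.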
2. Intuition
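- Instead of penalizing absolute differences, it penalizes relative differences.
- Because $\log(\hat{y}+1) - \log(y+1) = \log\frac{\hat{y}+1}{y+1}$, each term depends on the ratio between predicted and actual, not on the raw scale.
- A given percentage error contributes roughly the same amount whether the target is 10 or 10,000; the penalty grows only as the ratio moves away from 1.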
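The ratio-based behavior can be checked numerically. The sketch below uses a hypothetical helper, `log_error`, that computes a single RMSLE term before squaring, and applies it to the same 10% over-prediction at two very different scales. The numbers are made up for illustration.

```python
import math

def log_error(actual, predicted):
    """One RMSLE term before squaring: |log(predicted + 1) - log(actual + 1)|."""
    return abs(math.log1p(predicted) - math.log1p(actual))

# Both predictions overshoot by 10%, at very different scales.
small = log_error(100, 110)
large = log_error(100_000, 110_000)

# The absolute errors differ by a factor of 1000.
abs_small = abs(110 - 100)
abs_large = abs(110_000 - 100_000)

print(f"scale 100:     absolute error {abs_small}, log error {small:.4f}")
print(f"scale 100000:  absolute error {abs_large}, log error {large:.4f}")
```

Both log errors land close to $\log(1.1)$. They are not exactly equal because the +1 shift matters slightly more for smaller values.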
3. Example
Suppose actual vs predicted sales:
| Observation | Actual ($y$) | Predicted ($\hat{y}$) | $\log(y+1)$ | $\log(\hat{y}+1)$ | Squared difference |
|---|---|---|---|---|---|
| 1 | 100 | 90 | 4.615 | 4.511 | 0.011 |
| 2 | 200 | 220 | 5.303 | 5.398 | 0.009 |
| 3 | 400 | 360 | 5.994 | 5.889 | 0.011 |
$RMSLE = \sqrt{\frac{0.011 + 0.009 + 0.011}{3}} \approx 0.102$
→ Each prediction is off by about 10%, so the RMSLE is about 0.1 even though the raw errors range from 10 to 40 units.
4. Why Use RMSLE?
- Useful when:
  - Target values span several orders of magnitude (e.g., house prices, population counts).
  - Relative accuracy is more important than absolute accuracy.
  - Over- and under-predictions by the same ratio should be penalized about equally. Note that for the same absolute error, RMSLE penalizes under-prediction more than over-prediction.
- Less sensitive to large absolute errors on very high values.
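The difference between "equal ratio" and "equal absolute error" is easy to miss, so here is a small sketch of it. The helper `log_error` and the value `actual = 100` are assumptions chosen for illustration.

```python
import math

def log_error(actual, predicted):
    """One RMSLE term before squaring: |log(predicted + 1) - log(actual + 1)|."""
    return abs(math.log1p(predicted) - math.log1p(actual))

actual = 100  # illustrative value

# Same ratio: predicting double vs. half the actual value.
double = log_error(actual, 200)
half = log_error(actual, 50)

# Same absolute error: 50 units too high vs. 50 units too low.
over = log_error(actual, 150)
under = log_error(actual, 50)

print(f"2x: {double:.3f}   0.5x: {half:.3f}")
print(f"+50: {over:.3f}   -50: {under:.3f}")
```

Doubling and halving give nearly the same log error, while a shortfall of 50 units costs noticeably more than an excess of 50 units. This matters when under-prediction is the more expensive mistake, for example when forecasting demand.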
5. Comparison with Other Metrics
- MAE/RMSE: Penalize absolute errors (scale-dependent).
- MAPE: Penalizes percentage error (scale-free, but unstable when $y \approx 0$).
- RMSLE: Penalizes logarithmic difference → more robust when actual values vary widely.
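As a rough side-by-side, the sketch below implements the four metrics in plain Python and applies them to made-up data where every prediction is 10% too high. The function names and the data are assumptions for illustration, not taken from any particular library.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction.

    Divides by the actual value, so it fails when any actual value is 0.
    """
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error with the +1 shift."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative data: every prediction is 10% too high, across four orders of magnitude.
y_true = [10, 100, 1_000, 10_000]
y_pred = [11, 110, 1_100, 11_000]

results = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "RMSLE": rmsle(y_true, y_pred),
}
for name, value in results.items():
    print(f"{name:>5}: {value:.4f}")
```

On this data, MAE and RMSE are dominated by the largest target, while MAPE and RMSLE both reflect the uniform 10% error. Unlike MAPE, RMSLE stays defined if an actual value is 0.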
6. Python Example
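import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be greater than -1."""
    # log1p(x) computes log(1 + x), matching the +1 shift in the formula
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y_true = np.array([100, 200, 400])
y_pred = np.array([90, 220, 360])
print(f"RMSLE: {rmsle(y_true, y_pred):.3f}")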
Output:
RMSLE: 0.102
Summary
- RMSLE = RMSE applied in log-space (with +1 shift).
- Measures relative error rather than absolute error.
- Best for problems with wide target value ranges (sales, prices, population, counts).
- Lower RMSLE = better predictions.
