1. Definition

RMSLE (Root Mean Squared Logarithmic Error) is the square root of the mean squared difference between the logarithms of the predicted and actual values:

$RMSLE = \sqrt{\frac{1}{N} \sum_{i=1}^N \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2 }$

where:

  • $N$ = number of samples
  • $y_i$ = actual value
  • $\hat{y}_i$ = predicted value
  • $\log$ = natural logarithm
  • Adding 1 to both values keeps the logarithm defined when $y_i = 0$ or $\hat{y}_i = 0$, since $\log(0)$ is undefined.
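
The effect of the +1 shift can be seen in a minimal NumPy sketch. The input values are made up for illustration and are not taken from any dataset.

```python
import numpy as np

# Illustrative values: the actual value is 0, the prediction is 5
y_true = np.array([0.0])
y_pred = np.array([5.0])

# Without the shift, log(0) evaluates to -inf (NumPy would also emit a
# RuntimeWarning, silenced here), so the squared difference is infinite.
with np.errstate(divide="ignore"):
    unshifted = (np.log(y_pred) - np.log(y_true)) ** 2

# With the shift, log1p(x) computes log(1 + x), which is 0 at x = 0.
shifted = (np.log1p(y_pred) - np.log1p(y_true)) ** 2

print(unshifted)  # [inf]
print(shifted)    # finite: (log 6)^2
```

The shift only helps for values greater than -1; negative targets below that still have no defined logarithm.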

2. Intuition

  • RMSLE penalizes relative differences rather than absolute ones.
  • It depends on the ratio $(\hat{y}_i + 1)/(y_i + 1)$, not on the raw scale of the values.
  • Because the log differences are squared, large relative errors are penalized disproportionately more than small ones.
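
A short sketch of this behavior, assuming two made-up predictions that are both 20% too high at very different scales. The helper name `log_error` is hypothetical and used only for illustration.

```python
import numpy as np

def log_error(y_true, y_pred):
    """Per-sample log difference: log(y_pred + 1) - log(y_true + 1)."""
    return np.log1p(y_pred) - np.log1p(y_true)

# Both predictions overshoot by 20%, but the targets differ by a factor of 100
y_true = np.array([100.0, 10_000.0])
y_pred = np.array([120.0, 12_000.0])

abs_err = np.abs(y_pred - y_true)    # 20 vs 2000
log_err = log_error(y_true, y_pred)  # both close to log(1.2) ≈ 0.182

print(abs_err)
print(log_err)
```

The absolute errors differ by a factor of 100, while the two log errors are nearly identical. They are not exactly equal because the +1 shift matters slightly more for the smaller target.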

3. Example

Suppose actual vs predicted sales:

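| Observation | Actual ($y$) | Predicted ($\hat{y}$) | $\log(y+1)$ | $\log(\hat{y}+1)$ | Difference² |
|---|---|---|---|---|---|
| 1 | 100 | 90 | 4.615 | 4.511 | 0.011 |
| 2 | 200 | 220 | 5.303 | 5.398 | 0.009 |
| 3 | 400 | 360 | 5.994 | 5.889 | 0.011 |

$RMSLE = \sqrt{\frac{0.011 + 0.009 + 0.011}{3}} \approx 0.102$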

→ Every prediction is off by 10%, so each log difference is about 0.1 and the RMSLE stays small even though the raw errors range from 10 to 40 units.


4. Why Use RMSLE?

  • Useful when:
    • Target values span several orders of magnitude (e.g., house prices, population counts).
    • Relative accuracy is more important than absolute accuracy.
    • Over- and underestimates by the same factor should be penalized equally.
  • Less sensitive than RMSE to large absolute errors on very large target values.
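
The symmetry in log-space holds for equal ratios of the shifted values. For the same absolute error, however, an underestimate is penalized more than an overestimate, because the logarithm is concave. The sketch below illustrates both cases with made-up numbers; `squared_log_error` is a hypothetical helper name.

```python
import numpy as np

def squared_log_error(y_true, y_pred):
    """Squared log difference for a single (actual, predicted) pair."""
    return (np.log1p(y_pred) - np.log1p(y_true)) ** 2

y_true = 999.0  # chosen so that y_true + 1 = 1000

# Same ratio: the shifted prediction is double or half the shifted actual
over_ratio = squared_log_error(y_true, 1999.0)  # (log 2)^2
under_ratio = squared_log_error(y_true, 499.0)  # (log 0.5)^2, the same value

# Same absolute error: 500 units too high vs 500 units too low
over_abs = squared_log_error(y_true, 1499.0)    # (log 1.5)^2
under_abs = squared_log_error(y_true, 499.0)    # (log 0.5)^2, larger

print(over_ratio, under_ratio)
print(over_abs, under_abs)
```

This asymmetry can be useful when underestimating is the costlier mistake, for example when forecasting demand.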

5. Comparison with Other Metrics

  • MAE/RMSE: Penalize absolute errors (scale-dependent).
  • MAPE: Penalizes percentage error (scale-free, but unstable when $y \approx 0$).
  • RMSLE: Penalizes differences in log-space (approximately scale-free, and still defined at $y = 0$ because of the +1 shift) → well suited to targets that vary widely.
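
The comparison can be sketched on made-up targets that span three orders of magnitude, with every prediction 10% too high. The metric functions below are minimal illustrative implementations, not a library API.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_true, y_pred):
    # Undefined when any y_true is 0 and unstable when y_true is near 0
    return np.mean(np.abs((y_pred - y_true) / y_true))

def rmsle(y_true, y_pred):
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Illustrative targets from 10 to 10,000; each prediction is 10% too high
y_true = np.array([10.0, 100.0, 1_000.0, 10_000.0])
y_pred = y_true * 1.1

for name, fn in [("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("RMSLE", rmsle)]:
    print(f"{name}: {fn(y_true, y_pred):.4f}")
```

MAE and RMSE come out in the hundreds because the largest target dominates them, while MAPE and RMSLE both stay near 0.1 and reflect the uniform 10% relative error.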

6. Python Example

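import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error; inputs must be greater than -1."""
    # log1p(x) computes log(1 + x) accurately, including for small x
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y_true = np.array([100, 200, 400])
y_pred = np.array([90, 220, 360])

# Round to three decimals to match the worked example
print(f"RMSLE: {rmsle(y_true, y_pred):.3f}")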

Output:

RMSLE: 0.102

7. Summary

  • RMSLE = RMSE applied in log-space (with +1 shift).
  • Measures relative error rather than absolute error.
  • Best for problems with wide target value ranges (sales, prices, population, counts).
  • Lower RMSLE = better predictions.