RMSLE (Root Mean Squared Logarithmic Error)

1. Definition

RMSLE is the square root of the mean squared difference between the logarithms of the predicted and actual values, each shifted by 1:

$\mathrm{RMSLE} = \sqrt{\frac{1}{N} \sum_{i=1}^N \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2 }$

where:

$N$ = number of samples
$y_i$ = actual value
$\hat{y}_i$ = predicted value
Adding 1 inside the logarithm keeps it defined when $y=0$ or $\hat{y}=0$, since $\log(0)$ is undefined. RMSLE therefore assumes non-negative values.
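
To make the role of the +1 shift concrete, here is a minimal sketch using NumPy. The array values are hypothetical and chosen only so that one target is zero.

```python
import numpy as np

# Hypothetical targets; the first one is zero on purpose.
y_true = np.array([0.0, 5.0])

# Plain log: log(0) evaluates to -inf (NumPy would also emit a warning,
# which is silenced here so the example prints cleanly).
with np.errstate(divide="ignore"):
    plain_logs = np.log(y_true)

# Shifted log: log1p(x) computes log(1 + x), so log1p(0) is exactly 0.
shifted_logs = np.log1p(y_true)

print("log(y):    ", plain_logs)
print("log(y + 1):", shifted_logs)
```

With the plain logarithm, a single zero target would make the whole metric infinite; with the shift, every term stays finite.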

2. Intuition

RMSLE penalizes relative differences rather than absolute ones.
Because $\log(\hat{y}_i + 1) - \log(y_i + 1) = \log\frac{\hat{y}_i + 1}{y_i + 1}$, each term depends on the ratio between predicted and actual values, not on their raw scale.
Squaring this log-ratio keeps the penalty small for small relative errors and makes it grow quickly for large ones.
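
The following sketch illustrates this scale behaviour. It compares two predictions that are both 10% too low, one at a small scale and one at a large scale; the numbers and the helper name `squared_log_error` are assumptions made for illustration.

```python
import numpy as np

def squared_log_error(y_true, y_pred):
    """Per-sample squared difference of the log1p-transformed values."""
    return (np.log1p(y_pred) - np.log1p(y_true)) ** 2

# Both predictions miss by 10%, at very different scales (hypothetical values).
small = squared_log_error(100.0, 90.0)
large = squared_log_error(100_000.0, 90_000.0)

print(f"absolute errors:    {100 - 90} vs {100_000 - 90_000}")
print(f"squared log errors: {small:.4f} vs {large:.4f}")
```

The absolute errors differ by a factor of 1000, while the two squared log errors are nearly identical.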

3. Example

Suppose actual vs predicted sales:

Observation	Actual ($y$)	Predicted ($\hat{y}$)	log(y+1)	log(ŷ+1)	Squared difference
1	100	90	4.615	4.511	0.011
2	200	220	5.303	5.398	0.009
3	400	360	5.994	5.889	0.011

$\mathrm{RMSLE} = \sqrt{\frac{0.011 + 0.009 + 0.011}{3}} \approx 0.102$

→ Every prediction is within 10% of the actual value, so the RMSLE stays small even though the raw errors are as large as 40 units.
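
As a sketch, the intermediate columns of the table can be recomputed step by step with NumPy. The variable names are illustrative, and natural logarithms are assumed.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([90.0, 220.0, 360.0])

# Intermediate columns of the table
log_true = np.log1p(y_true)            # log(y + 1)
log_pred = np.log1p(y_pred)            # log(y_hat + 1)
sq_diff = (log_pred - log_true) ** 2   # squared difference

for yt, yp, lt, lp, d in zip(y_true, y_pred, log_true, log_pred, sq_diff):
    print(f"{yt:6.0f} {yp:6.0f} {lt:.3f} {lp:.3f} {d:.3f}")

# Final step: square root of the mean squared difference
rmsle_value = np.sqrt(sq_diff.mean())
print(f"RMSLE = {rmsle_value:.3f}")
```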

4. Why Use RMSLE?

Useful when:
- Target values span several orders of magnitude (e.g., house prices, population counts).
- Relative accuracy is more important than absolute accuracy.
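- Overestimates and underestimates by the same ratio should be penalized (approximately) equally, since the error is symmetric in log-space. Note that for the same absolute miss, an underestimate is penalized more than an overestimate.
RMSLE is also less sensitive than RMSE to large absolute errors on very high values.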
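
The log-space symmetry can be checked with a small experiment. This is only a sketch: the actual value of 100 and the chosen predictions are hypothetical, and `squared_log_error` is a helper defined here for illustration.

```python
import numpy as np

def squared_log_error(y_true, y_pred):
    """Per-sample squared difference of the log1p-transformed values."""
    return (np.log1p(y_pred) - np.log1p(y_true)) ** 2

actual = 100.0  # hypothetical actual value

# Same absolute miss (50 units) in each direction
under_abs = squared_log_error(actual, 50.0)
over_abs = squared_log_error(actual, 150.0)

# Same ratio in each direction: half vs. double the actual value
under_ratio = squared_log_error(actual, 50.0)
over_ratio = squared_log_error(actual, 200.0)

print(f"equal absolute miss: under={under_abs:.3f}, over={over_abs:.3f}")
print(f"equal ratio:         under={under_ratio:.3f}, over={over_ratio:.3f}")
```

Predicting half and predicting double give almost the same penalty (the +1 shift prevents an exact match), whereas missing by 50 units costs noticeably more when the prediction is too low.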

5. Comparison with Other Metrics

MAE/RMSE: Penalize absolute errors (scale-dependent).
MAPE: Penalizes percentage error (scale-free, but unstable when $y \approx 0$).
RMSLE: Penalizes logarithmic difference → more robust when actual values vary widely.
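
To see the scale dependence side by side, the sketch below evaluates all four metrics on the example data and on the same data multiplied by 1000. The metric functions are plain NumPy implementations written for this illustration, and the factor 1000 is arbitrary.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_true, y_pred):
    # Divides by y_true, so it breaks down when actual values are zero.
    return np.mean(np.abs((y_pred - y_true) / y_true))

def rmsle(y_true, y_pred):
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([90.0, 220.0, 360.0])

for scale in (1, 1000):  # arbitrary rescaling for illustration
    t, p = y_true * scale, y_pred * scale
    print(f"scale={scale}: MAE={mae(t, p):.1f} RMSE={rmse(t, p):.1f} "
          f"MAPE={mape(t, p):.3f} RMSLE={rmsle(t, p):.3f}")
```

MAE and RMSE grow with the scale of the data, while MAPE and RMSLE stay essentially unchanged.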

6. Python Example

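import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(1 + x)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

y_true = np.array([100, 200, 400])
y_pred = np.array([90, 220, 360])

# Round to three decimals for display
print(f"RMSLE: {rmsle(y_true, y_pred):.3f}")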

Output:

RMSLE: 0.102

Summary

RMSLE = RMSE applied in log-space (with +1 shift).
Measures relative error rather than absolute error.
Best for problems with wide target value ranges (sales, prices, population, counts).
Lower RMSLE = better predictions.
