1) Definition
RMSE is a standard metric for evaluating regression models. It measures the average magnitude of error between predicted values and actual values, with stronger penalties for large errors.
Mathematically:
$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i – \hat{y}_i)^2}$
- $y_i$ = actual value
- $\hat{y}_i$ = predicted value
- $n$ = number of observations
2) Interpretation
- Unit: Same unit as the target variable (unlike MSE which is squared units).
- Lower is better: Smaller RMSE means predictions are closer to the actual values.
- Sensitive to outliers: Because errors are squared, a few large mistakes can increase RMSE a lot.
3) Example
Suppose actual sales vs predicted sales:
| Observation | Actual ($y$) | Predicted ($\hat{y}$) | Error ($y – \hat{y}$) | Squared Error |
|---|---|---|---|---|
| 1 | 100 | 90 | 10 | 100 |
| 2 | 200 | 220 | -20 | 400 |
| 3 | 300 | 310 | -10 | 100 |
- Mean Squared Error (MSE) = (100 + 400 + 100)/3 = 200
- RMSE = $\sqrt{200} \approx 14.14$
Interpretation: On average, the model’s predictions are off by about 14 units of sales.
4) Comparison with Other Metrics
- MAE (Mean Absolute Error): Average of absolute errors, less sensitive to outliers.
- RMSE: Penalizes larger errors more heavily due to squaring.
- MAPE (Mean Absolute Percentage Error): Expresses error as a percentage, scale-free.
5) When to Use RMSE
- When large errors are especially costly (e.g., under-predicting demand by a lot causes stockouts).
- When you want a metric that emphasizes model accuracy in the presence of variability.
- When the data scale is consistent and interpretability in natural units matters.
6) Limitations
- Not scale-independent: RMSE depends on the scale of the target variable, making it hard to compare across datasets.
- Outlier-sensitive: A single extreme prediction error can dominate RMSE.
- Not percentage-based: Harder to communicate to business stakeholders compared to MAPE.
Summary:
RMSE is the square root of the average squared error, giving a measure of how far predictions are from actuals in the same unit as the target. It’s widely used, especially when large errors must be penalized more severely than small ones.
