1. What They Are
- Benchmarks in forecasting are simple, well-understood methods that serve as a baseline standard of performance.
- Any advanced forecasting method (ARIMA, Prophet, LSTM, Transformer) should outperform these benchmarks on held-out data; if it does not, its added complexity is not justified.
2. Common Forecasting Benchmarks
Naïve Method (Random Walk)
$\hat{y}_{t+h} = y_t$
- Forecast = last observed value, repeated for every horizon $h$.
- Surprisingly competitive, especially in finance (stock prices).
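A minimal sketch of the naïve method in plain Python. The function name `naive_forecast` and the sample prices are illustrative assumptions, not taken from any library or dataset.

```python
def naive_forecast(history, horizon=1):
    """Repeat the last observed value for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Hypothetical closing prices, for illustration only.
prices = [101.2, 100.8, 102.5, 103.1]
print(naive_forecast(prices, horizon=3))  # [103.1, 103.1, 103.1]
```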
Seasonal Naïve Method
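$\hat{y}_{t+h} = y_{t+h-m}$ for $h \le m$, where $m$ is the seasonal period (e.g., $m = 12$ for monthly data with a yearly cycle). For longer horizons: $\hat{y}_{t+h} = y_{t+h-m(k+1)}$ with $k = \lfloor (h-1)/m \rfloor$.
- Forecast = value from the same season in the last observed cycle (e.g., last year’s same month).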
- Works well for strongly seasonal data (retail, energy, tourism).
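The seasonal naïve rule can be sketched as follows. The function name `seasonal_naive_forecast`, its parameters, and the quarterly figures are assumptions made for illustration; horizons longer than one season simply cycle through the last observed season again.

```python
def seasonal_naive_forecast(history, season_length, horizon=1):
    """Forecast each step with the value from the same season in the last observed cycle."""
    if season_length < 1:
        raise ValueError("season_length must be a positive integer")
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_cycle = history[-season_length:]
    # step 0 corresponds to h = 1, i.e. y_{t+1-m}, the first value of the last cycle.
    return [last_cycle[step % season_length] for step in range(horizon)]


# Hypothetical quarterly sales over two years (m = 4).
sales = [10, 20, 30, 40, 12, 22, 33, 44]
print(seasonal_naive_forecast(sales, season_length=4, horizon=6))
# [12, 22, 33, 44, 12, 22]
```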
Mean Method
$\hat{y}_{t+h} = \bar{y} = \frac{1}{t}\sum_{i=1}^{t} y_i$
- Forecast = average of all past observations, the same for every horizon $h$.
- Best when series is stationary with no trend/seasonality.
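As a rough illustration, the mean method needs only the standard library. The name `mean_forecast` and the sample values are hypothetical.

```python
from statistics import fmean


def mean_forecast(history, horizon=1):
    """Forecast every future step with the average of all past observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [fmean(history)] * horizon


# Hypothetical stationary series, for illustration only.
print(mean_forecast([4, 6, 5, 5], horizon=2))  # [5.0, 5.0]
```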
Drift Method (Trend Naïve)
$\hat{y}_{t+h} = y_t + h \cdot \frac{y_t - y_1}{t-1}$
- Extends a line from the first to the last observation into the future.
- Captures simple linear trend.
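The drift formula above translates directly into code. This is a sketch under the assumption that the series is a plain list of numbers; `drift_forecast` is an illustrative name.

```python
def drift_forecast(history, horizon=1):
    """Extend the line through the first and last observations into the future."""
    if len(history) < 2:
        raise ValueError("drift needs at least two observations")
    # Average change per step: (y_t - y_1) / (t - 1).
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + h * slope for h in range(1, horizon + 1)]


# Hypothetical trending series: slope = (16 - 10) / 3 = 2.0.
print(drift_forecast([10, 12, 13, 16], horizon=3))  # [18.0, 20.0, 22.0]
```

Note that the intermediate observations do not affect the slope; only the first and last values do.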
Exponential Smoothing (Simple, Holt, Holt-Winters)
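- Weighted averages of past data, with weights that decay exponentially so recent values count most.
- Simple smoothing models the level only; Holt adds a trend; Holt-Winters adds seasonality.
- Still considered a benchmark in competitions (ETS family).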
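A minimal sketch of simple exponential smoothing (level only; Holt and Holt-Winters are not shown). It assumes the level is initialised to the first observation and that the smoothing parameter `alpha` is fixed by hand rather than estimated, which is a simplification of what fitted ETS models do. The function name is illustrative.

```python
def simple_exp_smoothing_forecast(history, alpha=0.5, horizon=1):
    """Flat forecast equal to the last smoothed level.

    Level update: level = alpha * y + (1 - alpha) * previous level.
    Assumption: the initial level is the first observation.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# Levels: 10 -> 11.0 -> 12.5, so the forecast is 12.5 at every horizon.
print(simple_exp_smoothing_forecast([10, 12, 14], alpha=0.5, horizon=2))  # [12.5, 12.5]
```

With `alpha=1` the method reduces to the naïve forecast, since the level always equals the latest observation.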
3. Benchmarks in Competitions
- M-Competitions (M1–M5): Always compare advanced models against simple baselines (Naïve, Seasonal Naïve, Exponential Smoothing).
- Findings: Naïve and exponential smoothing methods often perform surprisingly well, and more complex methods do not reliably beat them.
4. Metrics Used for Benchmarking
- MAE (Mean Absolute Error), RMSE (Root Mean Squared Error)
- MAPE (Mean Absolute Percentage Error)
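- MASE (Mean Absolute Scaled Error): scales the model's error by the in-sample one-step error of the naïve forecast (seasonal naïve for seasonal data) → $MASE = \frac{\text{MAE(model)}}{\text{MAE(naïve, in-sample)}}$. If MASE < 1 → the model's average error is smaller than the naïve method's in-sample one-step error.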
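The metrics above can be sketched in a few lines of plain Python. The function names, argument order, and sample numbers are assumptions for illustration. MASE here uses the in-sample one-step naïve error as its scale (or the seasonal naïve error when `season_length` is greater than 1).

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(train, actual, forecast, season_length=1):
    """Out-of-sample MAE divided by the in-sample MAE of the (seasonal) naive forecast."""
    naive_errors = [
        abs(train[i] - train[i - season_length])
        for i in range(season_length, len(train))
    ]
    if not naive_errors:
        raise ValueError("train must be longer than season_length")
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    return mae(actual, forecast) / scale


# Hypothetical data: in-sample naive errors are all 2, model errors are all 1.
train = [10, 12, 14, 16]
actual = [18, 20]
forecast = [17, 19]
print(mae(actual, forecast))          # 1.0
print(rmse(actual, forecast))         # 1.0
print(mase(train, actual, forecast))  # 0.5
```

A MASE of 0.5 in this toy case means the model's average error is half the naïve method's in-sample one-step error.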
5. Why Benchmarks Matter
- They provide a sanity check.
- Prevent researchers from overclaiming improvement.
- Show whether added complexity is worth it.
Summary:
Forecasting benchmarks are simple, transparent methods like naïve, seasonal naïve, mean, and drift that serve as minimum performance standards. Competitions like the M-series showed that even simple benchmarks can be strong, which is why advanced models must always be evaluated against them using error metrics (MAE, RMSE, MASE, etc.).
