1. What They Are

  • Benchmarks in forecasting are simple, well-understood methods that serve as a baseline standard of performance.
  • Any advanced forecasting method (ARIMA, Prophet, LSTM, Transformer) should outperform these benchmarks on held-out data; if it does not, its added complexity is not justified.

2. Common Forecasting Benchmarks

Naïve Method (Random Walk)

$\hat{y}_{t+h} = y_t$

  • Forecast = last observed value.
  • Surprisingly hard to beat for series that behave like a random walk, such as stock prices in finance.

Seasonal Naïve Method

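$\hat{y}_{t+h} = y_{t+h-m(k+1)}, \quad k = \lfloor (h-1)/m \rfloor$

  • Forecast = value from the same season in the last observed cycle (e.g., last year’s same month); $m$ is the seasonal period, and for $h \le m$ the formula reduces to $y_{t+h-m}$.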
  • Works well for strongly seasonal data (retail, energy, tourism).
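
The two naïve rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the `horizon` and `m` parameters, and the quarterly `sales` numbers are all made up for the example.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, m):
    """Repeat the last full seasonal cycle of length m across the horizon."""
    if m < 1 or len(history) < m:
        raise ValueError("need a positive period and one full season of history")
    last_cycle = history[-m:]
    # Step h (1-based) reuses position (h - 1) mod m of the last cycle,
    # so the forecast stays defined even when horizon > m.
    return [last_cycle[(h - 1) % m] for h in range(1, horizon + 1)]


# Hypothetical quarterly data (m = 4), invented for illustration only.
sales = [10, 20, 30, 40, 12, 22, 32, 42]
print(naive_forecast(sales, 3))              # [42, 42, 42]
print(seasonal_naive_forecast(sales, 6, 4))  # [12, 22, 32, 42, 12, 22]
```

Note that steps 5 and 6 of the seasonal forecast wrap around to the start of the last cycle, which is what happens when the horizon exceeds one season.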

Mean Method

$\hat{y}_{t+h} = \bar{y}$

  • Forecast = average of all past observations.
  • Best when the series is stationary, with no trend or seasonality.

Drift Method (Trend Naïve)

$\hat{y}_{t+h} = y_t + h \cdot \frac{y_t - y_1}{t-1}$

  • Extends a line from the first to the last observation into the future.
  • Captures simple linear trend.
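
As a rough sketch of the mean and drift rules, assuming the history is a plain list of numbers (the function names and the sample `series` are hypothetical, chosen only for this example):

```python
def mean_forecast(history, horizon):
    """Forecast every future step as the average of all past observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    average = sum(history) / len(history)
    return [average] * horizon


def drift_forecast(history, horizon):
    """Extend the line through the first and last observations."""
    if len(history) < 2:
        raise ValueError("drift needs at least two observations")
    # Slope of the line joining y_1 and y_t: (y_t - y_1) / (t - 1).
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + h * slope for h in range(1, horizon + 1)]


# Invented sample series for illustration only.
series = [2.0, 4.0, 3.0, 5.0, 8.0]
print(mean_forecast(series, 2))   # [4.4, 4.4]
print(drift_forecast(series, 3))  # [9.5, 11.0, 12.5]
```

The drift forecast depends only on the first and last points, so a single unusual value at either end changes the whole projected line.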

Exponential Smoothing (Simple, Holt, Holt-Winters)

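  • Weighted averages of past data, with weights that decay exponentially so recent values count most; Simple models the level only, Holt adds a trend, and Holt-Winters adds seasonality.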
  • Still considered a benchmark in competitions (ETS family).
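
To make the weighting idea concrete, here is a minimal sketch of simple exponential smoothing only (no trend or seasonal terms). It assumes the level is initialised with the first observation, which is one common choice among several, and a fixed smoothing parameter `alpha` rather than a fitted one; the function name and data are hypothetical.

```python
def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing with a fixed, user-supplied alpha."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Assumption: start the level at the first observation.
    level = history[0]
    for y in history[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    # SES produces a flat forecast: the final level at every horizon.
    return [level] * horizon


# Invented data for illustration only.
demand = [10, 12, 11, 13]
print(ses_forecast(demand, 2, alpha=0.5))  # [12.0, 12.0]
```

With `alpha=1.0` the level always equals the latest observation, so the method collapses to the naïve forecast; smaller values of `alpha` spread the weight over more of the past.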

3. Benchmarks in Competitions

  • M-Competitions (M1–M5): Always compare advanced models against simple baselines (Naïve, Seasonal Naïve, Exponential Smoothing).
  • Findings: simple methods such as naïve and exponential smoothing were often as accurate as far more complex ones, particularly in the earlier competitions.

4. Metrics Used for Benchmarking

  • MAE (Mean Absolute Error), RMSE (Root Mean Squared Error)
  • MAPE (Mean Absolute Percentage Error)
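  • MASE (Mean Absolute Scaled Error): scales the model’s error by the in-sample error of the one-step naïve forecast (seasonal naïve for seasonal data) → $\text{MASE} = \frac{\text{MAE(model)}}{\text{MAE(naïve, in-sample)}}$. If MASE < 1 → the model’s average error is smaller than the naïve method’s in-sample one-step error.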
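
The metrics listed above can be sketched in plain Python as follows. The function names, the `train` argument, and the numbers are assumptions made for this example; the MASE scale here is the in-sample MAE of the one-step naïve forecast (or the seasonal naïve forecast when `m > 1`).

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, m=1):
    """Test-set MAE divided by the in-sample MAE of the (seasonal) naive forecast."""
    if len(train) <= m:
        raise ValueError("training series must be longer than the period m")
    # In-sample naive errors: each value compared with the value m steps earlier.
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train))) / (len(train) - m)
    if scale == 0:
        raise ValueError("naive in-sample error is zero, so MASE is undefined")
    return mae(actual, forecast) / scale


# Invented numbers for illustration only.
train = [10, 12, 11, 13, 12]
actual = [14, 13]
forecast = [13, 13]
print(mae(actual, forecast))          # 0.5
print(mase(actual, forecast, train))  # about 0.33, i.e. below 1
```

Because the scale comes from the training data, MASE can be compared across series with different units, which MAE and RMSE cannot.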

5. Why Benchmarks Matter

  • They provide a sanity check.
  • Prevent researchers from overclaiming improvement.
  • Show whether added complexity is worth it.

Summary:
Forecasting benchmarks are simple, transparent methods like naïve, seasonal naïve, mean, and drift that serve as minimum performance standards. Competitions like the M-series showed that even simple benchmarks can be strong, which is why advanced models must always be evaluated against them using error metrics (MAE, RMSE, MASE, etc.).