1. What They Are

  • A series of large-scale forecasting competitions organized by Spyros Makridakis and colleagues.
  • Purpose: provide empirical evidence about which forecasting methods work best in practice.
  • Each competition released large sets of real-world time series, required participants to generate forecasts, and then compared results using common error metrics.
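The "common error metrics" were deliberately simple. As one illustration, here is a minimal sketch of symmetric MAPE (sMAPE), the headline accuracy metric in M3 and M4; the function name and numbers are illustrative, and this is one common variant of the formula.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error (one common variant):
    the mean of 2|f - a| / (|a| + |f|), expressed as a percentage."""
    return 100.0 / len(actual) * sum(
        2.0 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast))

actual   = [100.0, 110.0, 120.0]
forecast = [ 90.0, 110.0, 130.0]
print(round(smape(actual, forecast), 2))  # 6.18
```

Because the denominator uses both the actual and the forecast, sMAPE is scale-free, which is what makes comparisons across thousands of unrelated series possible.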

2. Timeline of M-Competitions

M1 (1982)

  • Data: 1001 time series (monthly, quarterly, annual, others).
  • Findings:
    • No single method dominated.
    • Simple methods (Naïve, Exponential Smoothing) performed about as well as complex statistical methods.
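The "simple methods" above really are simple. A minimal sketch of both baselines, with illustrative data and function names (not taken from any library):

```python
# Sketch of the two simple M1 baselines; names and data are illustrative.

def naive_forecast(series, horizon):
    """Naive forecast: repeat the last observed value for every step."""
    return [series[-1]] * horizon

def ses_forecast(series, horizon, alpha=0.3):
    """Simple exponential smoothing: the level is a weighted blend of
    each new observation and the previous level; the forecast is flat
    at the final level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

sales = [112, 118, 132, 129, 121, 135, 148, 148]
print(naive_forecast(sales, 3))                       # [148, 148, 148]
print([round(f, 1) for f in ses_forecast(sales, 3)])  # [137.1, 137.1, 137.1]
```

That a few lines like these held their own against far more elaborate models was M1's most provocative result.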

M2 (1993)

  • A smaller, real-time study (29 series from collaborating organizations) that incorporated judgmental forecasts (human expert input).
  • Findings:
    • Combining statistical + judgmental forecasts improved accuracy.
    • Once again, simple statistical models were competitive.

M3 (2000)

  • Data: 3003 time series.
  • Findings:
    • Exponential smoothing (ETS) and ARIMA remained strong.
    • Combining forecasts from multiple methods consistently improved performance.
    • Highlighted the importance of measuring forecast accuracy consistently across diverse data.
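Forecast combination, the recurring M3 lesson, can be as plain as an equal-weight average. A minimal sketch, where the input vectors are placeholders standing in for ETS, ARIMA, and naive outputs:

```python
# Hedged sketch of the simplest forecast combination: an equal-weight
# average across methods. The forecast vectors are illustrative.

def combine(forecasts):
    """Element-wise mean across a list of equal-length forecast vectors."""
    n = len(forecasts)
    return [sum(f[h] for f in forecasts) / n
            for h in range(len(forecasts[0]))]

ets_fc   = [105.0, 107.0, 109.0]  # placeholder for an ETS forecast
arima_fc = [102.0, 103.0, 104.0]  # placeholder for an ARIMA forecast
naive_fc = [100.0, 100.0, 100.0]  # placeholder for a naive forecast

print([round(v, 2) for v in combine([ets_fc, arima_fc, naive_fc])])
# [102.33, 103.33, 104.33]
```

Averaging works because different methods make partially uncorrelated errors, so the mean cancels some of each method's mistakes.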

M4 (2018)

  • Data: 100,000 time series (largest so far).
  • Findings:
    • Hybrid methods (statistical + machine learning) outperformed pure ML or pure statistical methods.
    • Forecast combinations again proved superior.
    • Showed the growing role of ensembles.
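The hybrid idea can be sketched in a few lines: a statistical component models the level, and a learned component corrects its residuals. The M4 winner paired exponential smoothing with a recurrent network; as a dependency-free stand-in, this sketch fits a least-squares trend to the smoothing residuals. All names and data are illustrative.

```python
# Hedged sketch of the M4 hybrid pattern: statistical level model +
# learned residual correction. Names and data are illustrative.

def smooth_levels(series, alpha=0.3):
    """Exponentially smoothed level at each time step."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

def fit_line(ys):
    """Ordinary least squares of ys against their index: (slope, intercept)."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in enumerate(ys))
             / sum((x - mx) ** 2 for x in range(n)))
    return slope, my - slope * mx

series = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]
levels = smooth_levels(series)
residuals = [y - l for y, l in zip(series, levels)]  # what smoothing missed
slope, intercept = fit_line(residuals)

# Forecast = final smoothed level + extrapolated residual trend.
forecast = [levels[-1] + slope * t + intercept
            for t in range(len(series), len(series) + 3)]
print([round(f, 2) for f in forecast])
```

The smoothing component alone would forecast flat; the residual model restores the trend that smoothing systematically under-shoots on a growing series.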

M5 (2020, hosted on Kaggle)

  • Data: Walmart’s hierarchical sales (tens of thousands of SKUs).
  • Focus: probabilistic forecasting (a dedicated Uncertainty track scored quantile forecasts alongside the point-forecast Accuracy track).
  • Findings:
    • Machine learning (especially Gradient Boosting: LightGBM, XGBoost) performed very well.
    • Best methods combined ML + time series reconciliation (ensuring consistency across product hierarchy).
    • Probabilistic forecasts (quantiles, prediction intervals) were scored directly via a scaled pinball loss, setting a new standard.
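Quantile forecasts are scored with the pinball loss, the building block of M5's uncertainty metric. A minimal sketch with illustrative values:

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss at quantile q: under-prediction is
    penalized by q, over-prediction by (1 - q)."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

# At q = 0.9, under-forecasting costs 9x more than over-forecasting by
# the same amount, pushing the optimal prediction toward the 90th
# percentile of the outcome distribution.
print(round(pinball_loss(100, 90, 0.9), 6))   # 9.0
print(round(pinball_loss(100, 110, 0.9), 6))  # 1.0
```

Minimizing this loss at a grid of q values (say 0.005 to 0.995) yields a full predictive distribution rather than a single point forecast.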

3. Key Lessons Across M-Competitions

  1. No universal winner — model performance depends on data characteristics.
  2. Simple benchmarks (Naïve, ETS, ARIMA) are strong baselines and must always be included.
  3. Combining forecasts (ensembles) nearly always improves performance.
  4. Hybrid approaches (statistical + ML) tend to outperform purely statistical or purely ML methods.
  5. Increasing focus on probabilistic forecasting (not just point predictions).
  6. Competitions provide real-world validation for methods, beyond theory.

4. Impact

  • Reshaped forecasting research by emphasizing empirical evidence over theory alone.
  • Encouraged adoption of simple methods and forecast combinations in industry.
  • Inspired modern ML competitions (like Kaggle’s forecasting challenges).

Summary:
The M-competitions (M1–M5) are landmark forecasting competitions that showed:

  • Classical statistical methods (ETS, ARIMA) are tough baselines.
  • Combining forecasts improves accuracy.
  • Hybrid statistical + ML methods dominate modern forecasting.
  • Probabilistic forecasting is the new standard.