1. What They Are
- A series of large-scale forecasting competitions organized by Spyros Makridakis and colleagues.
- Purpose: provide empirical evidence about which forecasting methods work best in practice.
- Each competition released large sets of real-world time series, required participants to generate forecasts, and then compared results using common error metrics.
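To make "common error metrics" concrete, here is a minimal sketch (my own function names, not any competition's official code) of two metrics associated with the later competitions: sMAPE, the headline metric of M3, and MASE, which M4 combined with sMAPE in its overall score.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE (percent scale): absolute error relative to the
    average magnitude of actual and forecast. Undefined when both are zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(forecast - actual)
                           / (np.abs(actual) + np.abs(forecast)))

def mase(actual, forecast, insample, m=1):
    """Mean Absolute Scaled Error: out-of-sample MAE scaled by the
    in-sample MAE of the seasonal-naive method with period m."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    insample = np.asarray(insample, dtype=float)
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale
```

A MASE below 1 means the method beats the naive benchmark on average, which is why scale-free metrics like this make results comparable across thousands of heterogeneous series.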
2. Timeline of M-Competitions
M1 (1982)
- Data: 1001 time series (monthly, quarterly, annual, others).
- Findings:
  - No single method dominated.
  - Simple methods (Naïve, Exponential Smoothing) performed about as well as complex statistical methods.
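The two "simple methods" named above can be sketched in a few lines (illustrative code and names of my own, not the M1 implementations): the naive method repeats the last observation, and simple exponential smoothing forecasts a flat line at a recursively updated level.

```python
import numpy as np

def naive_forecast(y, h):
    """Naive benchmark: repeat the last observed value for h steps ahead."""
    return np.full(h, y[-1], dtype=float)

def ses_forecast(y, h, alpha=0.3):
    """Simple exponential smoothing: update the level with weight alpha
    on each new observation, then forecast flat at the final level."""
    level = float(y[0])
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level, dtype=float)
```

Methods this simple have essentially no parameters to overfit, which is one common explanation for their strong showing against more elaborate models.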
M2 (1993)
- A much smaller, real-time exercise (29 series) that explicitly incorporated judgmental forecasts (human expert input).
- Findings:
  - Combining statistical and judgmental forecasts improved accuracy.
  - Once again, simple statistical models were competitive.
M3 (2000)
- Data: 3003 time series.
- Findings:
  - Exponential smoothing (ETS) and ARIMA remained strong; the simple Theta method performed best overall.
  - Combining forecasts from multiple methods consistently improved performance.
  - Highlighted the importance of measuring forecast accuracy across diverse data.
M4 (2018)
- Data: 100,000 time series (largest so far).
- Findings:
  - Hybrid methods (statistical + machine learning) outperformed pure ML or pure statistical methods; the winning entry combined exponential smoothing with a recurrent neural network.
  - Forecast combinations again proved superior.
  - Showed the growing role of ensembles.
M5 (2020, hosted on Kaggle)
- Data: Walmart’s hierarchical sales (tens of thousands of SKUs).
- Focus: two tracks, point-forecast accuracy and probabilistic forecasting (quantiles, not just point forecasts).
- Findings:
  - Machine learning (especially gradient boosting: LightGBM, XGBoost) performed very well.
  - Best methods combined ML with hierarchical reconciliation (ensuring forecasts are consistent across the product hierarchy).
  - Probabilistic forecasts (quantiles, intervals) became the new standard.
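Two M5 ideas can be made concrete with short sketches (illustrative names of my own; M5's official scoring used a weighted, scaled variant of the pinball loss): scoring a quantile forecast, and the simplest form of hierarchical reconciliation, bottom-up aggregation.

```python
import numpy as np

def pinball_loss(actual, q_forecast, q):
    """Pinball (quantile) loss at level q: penalizes under-forecasts
    with weight q and over-forecasts with weight 1 - q."""
    actual = np.asarray(actual, dtype=float)
    q_forecast = np.asarray(q_forecast, dtype=float)
    diff = actual - q_forecast
    return np.mean(np.maximum(q * diff, (q - 1.0) * diff))

def bottom_up(bottom_forecasts):
    """Bottom-up reconciliation: forecast each bottom-level series
    and obtain the aggregate as their sum, so levels are consistent."""
    return np.sum(np.asarray(bottom_forecasts, dtype=float), axis=0)
```

The asymmetric penalty is what makes the pinball loss a proper score for quantiles: minimizing it pushes the forecast toward the true q-th quantile rather than the mean.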
3. Key Lessons Across M-Competitions
- No universal winner — model performance depends on data characteristics.
- Simple benchmarks (Naïve, ETS, ARIMA) are strong baselines and must always be included.
- Combining forecasts (ensembles) nearly always improves performance.
- Hybrid approaches (statistical + ML) outperform pure methods.
- Increasing focus on probabilistic forecasting (not just point predictions).
- Competitions provide real-world validation for methods, beyond theory.
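The combination lesson above is mechanically simple. A minimal sketch (my own function, assuming each model contributes an h-step-ahead point forecast):

```python
import numpy as np

def combine_forecasts(forecasts, weights=None):
    """Combine h-step forecasts from several models into one forecast.
    Equal weights by default, matching the simple average that has
    performed well across the M-competitions."""
    forecasts = np.asarray(forecasts, dtype=float)  # shape: (n_models, h)
    if weights is None:
        weights = np.full(forecasts.shape[0], 1.0 / forecasts.shape[0])
    return np.average(forecasts, axis=0, weights=weights)
```

Averaging works because individual models' errors are partly uncorrelated, so they tend to cancel; the unweighted mean is a surprisingly hard baseline for more elaborate weighting schemes to beat.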
4. Impact
- Reshaped forecasting research by emphasizing empirical evidence over theory alone.
- Encouraged adoption of simple methods and forecast combinations in industry.
- Inspired modern ML competitions (like Kaggle’s forecasting challenges).
Summary:
The M-competitions (M1–M5) are landmark forecasting competitions that showed:
- Simple methods (ETS, ARIMA) are tough baselines.
- Combining forecasts improves accuracy.
- Hybrid statistical + ML methods dominate modern forecasting.
- Probabilistic forecasting is the new standard.
