1. What They Are
- Forecasting competitions are organized challenges where participants compete to predict future values of time series or other real-world data.
- They provide a benchmark: everyone gets the same historical data, then submits forecasts for unseen future periods.
- Organizers score the submissions against the held-out actuals using metrics such as MASE, sMAPE, and RMSE.
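The protocol described above (shared history, hidden future, common scoring) can be sketched in a few lines. This is only an illustration under stated assumptions: the series values are made up, and `split_holdout`, `naive_forecast`, and `mae` are hypothetical helper names, not part of any competition toolkit.

```python
def split_holdout(series, horizon):
    """Split a series into the history given to participants and the hidden future."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


def naive_forecast(history, horizon):
    """Benchmark submission: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mae(actual, forecast):
    """Mean absolute error between the hidden actuals and a submission."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up series, for illustration only.
series = [10, 12, 13, 12, 15, 16, 18, 17]

# Organizer side: hide the last two observations.
history, future = split_holdout(series, horizon=2)

# Participant side: forecast using the history only.
submission = naive_forecast(history, horizon=2)

# Organizer side: score the submission once the future is revealed.
score = mae(future, submission)
print(f"MAE of the naive benchmark: {score}")
```

Every participant is scored the same way on the same hidden values, which is what makes results comparable across methods.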
2. Why They Matter
- Stimulate development of new forecasting methods.
- Provide large-scale empirical evidence about which methods work best in practice.
- Reveal strengths and weaknesses of statistical vs machine learning approaches.
- Set standards for reproducibility and evaluation in forecasting research.
3. Major Forecasting Competitions
M-Competitions (Makridakis Competitions)
- M1 (1982): First large-scale empirical comparison of forecasting methods.
  - Found that simple methods (exponential smoothing, naive) often rivaled complex ones.
- M2 (1993): Smaller study run in real time, with forecasters able to use additional information; confirmed the earlier findings.
- M3 (2000): 3,003 time series; compared the accuracy of many methods across yearly, quarterly, monthly, and other data.
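- M4 (2018): 100,000 series; compared statistical, ML, and hybrid methods.
  - The winning entry was a hybrid of exponential smoothing and a recurrent neural network; combinations of statistical methods also ranked highly, while pure ML entries did comparatively poorly.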
- M5 (2020, Kaggle): Walmart retail sales; hierarchical demand forecasting (item, store, and higher aggregation levels), with separate tracks for point accuracy and uncertainty.
  - Gradient boosting (mainly LightGBM) that exploited the hierarchy performed well.
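In a hierarchy like M5's, forecasts at different levels should add up: item forecasts within a store should sum to the store forecast, and stores to the total. A minimal sketch of the simplest way to guarantee this, bottom-up aggregation, follows. It is an illustration rather than a description of what any M5 entry did; the item and store names, the numbers, and the `bottom_up` helper are all hypothetical.

```python
from collections import defaultdict


def bottom_up(item_forecasts, item_to_store):
    """Aggregate item-level forecasts to store level and to a grand total.

    item_forecasts: dict mapping item id -> list of forecasts (same horizon for all)
    item_to_store:  dict mapping item id -> store id
    """
    horizon = len(next(iter(item_forecasts.values())))
    store_totals = defaultdict(lambda: [0.0] * horizon)
    for item, forecast in item_forecasts.items():
        store = item_to_store[item]
        store_totals[store] = [s + f for s, f in zip(store_totals[store], forecast)]
    # The grand total is the step-by-step sum over all stores.
    total = [sum(values) for values in zip(*store_totals.values())]
    return dict(store_totals), total


# Hypothetical two-step-ahead forecasts for three items in two stores.
item_forecasts = {"A1": [2.0, 3.0], "A2": [1.0, 1.0], "B1": [4.0, 0.0]}
item_to_store = {"A1": "store_A", "A2": "store_A", "B1": "store_B"}

stores, total = bottom_up(item_forecasts, item_to_store)
print(stores)
print(total)
```

Bottom-up forecasts are coherent by construction, but they inherit the noise of the lowest level; other reconciliation schemes trade this off differently.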
Kaggle Forecasting Competitions
- Examples: Rossmann Store Sales, M5 (Walmart), Web Traffic Time Series Forecasting.
- Highlight practical business use cases, often combining tabular ML models with time series features.
Tourism Forecasting Competitions
- Predict international tourism demand; highlight seasonality/trends.
Other Domains
- Energy forecasting (load, price, renewable generation).
- Epidemiology (COVID-19 case forecasts).
- Financial forecasting competitions.
4. Key Lessons from Competitions
- No single method always wins; performance depends on the characteristics of the data.
- Simple benchmarks (e.g., Naïve, ETS, ARIMA) are hard to beat and should always be included for comparison.
- Combining forecasts (ensembles) usually outperforms single methods.
- Hybrid approaches (statistical + ML) can outperform both pure ML and pure classical methods, as in M4, although ML methods led in M5.
- Probabilistic forecasting (prediction intervals, quantiles) is increasingly valued, not just point forecasts.
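The lesson about combining forecasts can be illustrated with an equal-weight average of two models. The numbers below are invented and deliberately chosen so that the two models' errors partly cancel; this shows the mechanism only and is not evidence about real methods. `combine` and `mae` are hypothetical helper names.

```python
def combine(forecasts, weights=None):
    """Weighted average of several forecasts for the same horizon.

    With weights=None every model gets equal weight, the simplest
    form of forecast combination.
    """
    n = len(forecasts)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per forecast, summing to 1")
    horizon = len(forecasts[0])
    return [sum(w * f[h] for w, f in zip(weights, forecasts)) for h in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented values: the two models err in opposite directions at each step.
actual = [100.0, 110.0, 120.0]
model_a = [90.0, 115.0, 130.0]
model_b = [108.0, 101.0, 116.0]

ensemble = combine([model_a, model_b])

for name, forecast in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, round(mae(actual, forecast), 3))
```

When the individual errors are less than perfectly correlated, averaging reduces them; when the models make the same mistakes, the combination gains little.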
5. Evaluation Metrics Used
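- MASE (Mean Absolute Scaled Error): scale-free; the forecast's absolute error divided by the in-sample error of a naive forecast.
- sMAPE (Symmetric Mean Absolute Percentage Error): percentage error measured relative to the average of the actual and forecast values.
- RMSE, MAE: scale-dependent, so comparable only across series measured in the same units.
- Pinball Loss (for quantile forecasts): penalizes under- and over-prediction asymmetrically, according to the target quantile.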
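The sketch below implements three of these metrics from their standard definitions. A few choices are assumptions rather than something the notes specify: sMAPE is reported on the 0 to 200 percent scale, a term with a zero denominator is counted as zero error, and MASE scales by the in-sample one-step (or seasonal) naive error. The function names and sample values are illustrative.

```python
def mase(actual, forecast, history, season=1):
    """Mean Absolute Scaled Error.

    Out-of-sample MAE divided by the in-sample MAE of the naive forecast
    that repeats the value observed `season` steps earlier.
    """
    if len(history) <= season:
        raise ValueError("history must be longer than the seasonal period")
    scale = sum(
        abs(history[t] - history[t - season]) for t in range(season, len(history))
    ) / (len(history) - season)
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    error = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return error / scale


def smape(actual, forecast):
    """Symmetric MAPE in percent, on the 0-200 scale."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = abs(a) + abs(f)
        # Assumed convention: a perfect forecast of zero contributes no error.
        total += 2.0 * abs(a - f) / denominator if denominator > 0 else 0.0
    return 100.0 * total / len(actual)


def pinball_loss(actual, quantile_forecast, q):
    """Average pinball (quantile) loss for a forecast of quantile q, 0 < q < 1."""
    total = 0.0
    for a, f in zip(actual, quantile_forecast):
        diff = a - f
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actual)


# Made-up values, for illustration only.
history = [10, 12, 13, 12, 15, 16]
actual = [18, 17]
forecast = [16, 16]

print("MASE:", mase(actual, forecast, history))
print("sMAPE:", smape(actual, forecast))
print("Pinball (q=0.9):", pinball_loss(actual, forecast, 0.9))
```

A MASE below 1 means the forecast beat the naive in-sample benchmark on average. For the pinball loss, the same under-prediction costs more at q = 0.9 than at q = 0.1, which is what pushes a high-quantile forecast upward.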
Summary:
Forecasting competitions (like the M-series and Kaggle’s Walmart M5) provide common datasets and evaluation metrics for researchers to test forecasting methods. They show that no single approach dominates, but simple methods, ensembles, and hybrids often perform best. They have shaped the evolution of forecasting practice and the balance between statistics and machine learning.
