1. Why Bayesian for time series?
Classical time series models (e.g., ARIMA, exponential smoothing) assume fixed parameters estimated by maximum likelihood.
But real-world forecasting often requires:
- Uncertainty quantification (probability distribution of future values, not just a point forecast)
- Flexibility (incorporating prior knowledge, structural assumptions, missing data)
- Online updating (re-forecasting as new data arrives)
Bayesian methods naturally provide these through posterior distributions.
2. General Bayesian framework
We model time series $y_{1:T}$ with latent parameters $\theta$:
$p(\theta \mid y_{1:T}) \propto p(y_{1:T} \mid \theta) \, p(\theta)$
- $p(\theta)$: prior beliefs (e.g., smoothness, trend, seasonal structure)
- $p(y_{1:T} \mid \theta)$: likelihood (often Gaussian or Poisson)
- Posterior distributions describe parameter and prediction uncertainty.
For forecasting:
$p(y_{T+h} \mid y_{1:T}) = \int p(y_{T+h} \mid \theta, y_{1:T}) \, p(\theta \mid y_{1:T}) \, d\theta$
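This integral is rarely available in closed form; in practice it is approximated by averaging over posterior draws of $\theta$. A minimal sketch, assuming we already have (hypothetical) posterior samples for an AR(1) model's coefficient and noise scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for an AR(1) model: phi (persistence)
# and sigma (noise sd). In practice these come from MCMC or VI.
phi_draws = rng.normal(0.8, 0.05, size=2000)
sigma_draws = np.abs(rng.normal(0.5, 0.05, size=2000))

y_T = 1.2   # last observed value (illustrative)
h = 5       # forecast horizon

# For each posterior draw, simulate the series h steps ahead.
paths = np.empty((len(phi_draws), h))
for i, (phi, sigma) in enumerate(zip(phi_draws, sigma_draws)):
    y = y_T
    for t in range(h):
        y = phi * y + rng.normal(0.0, sigma)
        paths[i, t] = y

# Column h-1 of `paths` holds draws from p(y_{T+h} | y_{1:T});
# summarize with a mean and a 90% predictive interval.
mean_h = paths[:, -1].mean()
lo, hi = np.quantile(paths[:, -1], [0.05, 0.95])
```

The resulting interval reflects both parameter uncertainty (variation across draws) and future-noise uncertainty (the simulated innovations), which is exactly what the integral above marginalizes over.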
3. Common Bayesian time series models
(a) Bayesian ARIMA / ARMA
- Same as ARIMA but with priors on parameters ($\phi, \theta, \sigma^2$).
- Inference via MCMC or Variational Inference.
- Produces posterior distributions for parameters and forecasts.
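For the simplest case, the posterior is available in closed form: an AR(1) with known noise variance is just Bayesian linear regression, so a Gaussian prior on the coefficient is conjugate. A minimal sketch (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) series with known noise sd.
true_phi, sigma, T = 0.7, 1.0, 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = true_phi * y[t - 1] + rng.normal(0.0, sigma)

# With sigma known, y_t = phi * y_{t-1} + eps is linear regression:
# a N(m0, s0^2) prior on phi yields a Gaussian posterior.
m0, s0 = 0.0, 1.0
x, target = y[:-1], y[1:]

# Standard Gaussian conjugate update for the regression coefficient.
s_post2 = 1.0 / (1.0 / s0**2 + (x @ x) / sigma**2)
m_post = s_post2 * (m0 / s0**2 + (x @ target) / sigma**2)
```

With unknown variance the same logic extends to the Normal–Inverse-Gamma prior mentioned in Section 4; higher-order AR/ARMA terms typically require MCMC.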
(b) State Space Models (SSM) / Dynamic Linear Models (DLMs)
- The time series is expressed in terms of hidden states:
- $x_t = F x_{t-1} + w_t, \quad y_t = H x_t + v_t$ where $w_t, v_t$ are noise terms.
- Examples: local level models, trend + seasonality decomposition.
- Bayesian filtering methods:
- Kalman filter (conjugate Gaussian case)
- Particle filter (nonlinear/non-Gaussian case)
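In the conjugate Gaussian case the Kalman filter computes the exact posterior $p(x_t \mid y_{1:t})$ recursively. A minimal sketch for the local level model ($F = H = 1$; noise variances chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Local level model: x_t = x_{t-1} + w_t,  y_t = x_t + v_t.
q, r, T = 0.1, 1.0, 200                       # state / observation noise variances
x = np.cumsum(rng.normal(0, np.sqrt(q), T))   # latent level (random walk)
y = x + rng.normal(0, np.sqrt(r), T)          # noisy observations

m, P = 0.0, 10.0                 # prior mean and variance for x_0
means = np.empty(T)
for t in range(T):
    # Predict: the level is a random walk, so variance grows by q.
    P = P + q
    # Update: scalar Kalman gain, then condition on y_t.
    K = P / (P + r)
    m = m + K * (y[t] - m)
    P = (1 - K) * P
    means[t] = m
```

The filtered means track the latent level with lower error than the raw observations, since each step optimally weights the prediction against the new data point.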
(c) Bayesian Structural Time Series (BSTS)
- Decompose data into trend + seasonality + regressors + holiday effects.
- Used in Google’s CausalImpact library (for estimating effect of interventions).
- Great for causal inference in time series (A/B tests, policy evaluation).
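Each BSTS component is itself a small state-space block, and the full model stacks them block-diagonally. A sketch of the standard construction for a local linear trend plus a seasonal-dummy component (assuming weekly data, so period 7):

```python
import numpy as np

# Local linear trend + seasonal component written as one state space
# model x_t = F x_{t-1} + w_t, y_t = H x_t + v_t.
s = 7  # seasonal period (assumption: weekly data)

# Trend block: level + slope.
F_trend = np.array([[1.0, 1.0],
                    [0.0, 1.0]])

# Seasonal block: s-1 states whose values sum to (noisy) zero
# over one full period.
F_seas = np.zeros((s - 1, s - 1))
F_seas[0, :] = -1.0
F_seas[1:, :-1] = np.eye(s - 2)

# Block-diagonal transition matrix; state = [level, slope, seas_1..seas_{s-1}].
F = np.block([[F_trend, np.zeros((2, s - 1))],
              [np.zeros((s - 1, 2)), F_seas]])
H = np.zeros(s + 1)
H[0] = 1.0   # observe the level...
H[2] = 1.0   # ...plus the current seasonal state
```

Regressors and holiday effects enter the same way, as additional state dimensions; inference then reduces to Kalman filtering plus sampling the component variances.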
(d) Gaussian Process Time Series
- Place a GP prior on the latent function $f(t)$:
- $y_t \sim \mathcal{N}(f(t), \sigma^2), \quad f \sim \mathcal{GP}(0, k(t,t'))$
- Kernel $k$ captures smoothness, periodicity, etc.
- Flexible but computationally heavy for long series.
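For Gaussian noise, the GP posterior at new time points is available in closed form. A numpy-only sketch with a squared-exponential kernel (kernel choice and all hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(t1, t2, ell=1.0):
    """Squared-exponential kernel: smooth functions, length scale ell."""
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / ell**2)

# Noisy observations of a smooth function at irregular times.
t_obs = np.sort(rng.uniform(0, 6, 30))
sigma2 = 0.05
y_obs = np.sin(t_obs) + rng.normal(0, np.sqrt(sigma2), t_obs.size)

# Standard GP regression: posterior mean and variance at new points.
t_new = np.linspace(0, 6, 50)
K = rbf(t_obs, t_obs) + sigma2 * np.eye(t_obs.size)
K_s = rbf(t_new, t_obs)
mean_new = K_s @ np.linalg.solve(K, y_obs)
var_new = rbf(t_new, t_new).diagonal() - np.einsum(
    "ij,ij->i", K_s, np.linalg.solve(K, K_s.T).T
)
```

The $O(T^3)$ solve against `K` is the cost referred to above; for long series one typically switches to sparse or state-space approximations of the GP.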
(e) Bayesian VAR (Vector Autoregression)
- For multiple time series:
- $y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t$
- Priors on $A_i$ help with overfitting (shrinkage priors, Minnesota prior).
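With a Gaussian prior on the coefficients and Gaussian errors, the posterior mean has a ridge-style closed form. A sketch of Minnesota-style shrinkage toward a random walk for a two-variable VAR(1) (prior strength `lam` is an illustrative choice, not a recommended value):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a two-variable VAR(1): y_t = A y_{t-1} + eps_t.
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.8]])
T = 300
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(0, 0.5, 2)

X, Y = y[:-1], y[1:]

# Minnesota-style prior: shrink each equation toward a random walk
# (own lag -> 1, cross lags -> 0) with prior precision lam.
lam = 10.0
prior_mean = np.eye(2)            # random-walk prior center

# Posterior mean = ridge formula, one column per equation.
A_post = np.linalg.solve(X.T @ X + lam * np.eye(2),
                         X.T @ Y + lam * prior_mean).T
```

With many variables and lags the coefficient count grows quadratically, which is exactly where this shrinkage keeps the estimates stable.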
4. Inference methods
- Conjugate Gibbs sampling (e.g., Normal–Inverse-Gamma for AR models)
- Hamiltonian Monte Carlo (HMC) in Stan/PyMC
- Variational inference for faster approximation
- Particle MCMC for nonlinear state-space models
5. Advantages
- Full posterior predictive distributions (not just point forecasts).
- Uncertainty in both parameters and future values.
- Natural way to incorporate domain knowledge as priors.
- Flexible: handles missing data, regime shifts, interventions.
6. Challenges
- Computational cost (MCMC for large data).
- Requires careful prior specification.
- Not always easy to scale (e.g., GP on long series).
Simple summary:
Bayesian time series models put probability distributions over parameters (and sometimes over functions), producing full predictive distributions with uncertainty. They extend classical models like ARIMA into a Bayesian framework, and also include more advanced methods like state-space models, structural models, Gaussian processes, and Bayesian VARs.
