1. Why Bayesian for time series?

Classical time series models (e.g., ARIMA, exponential smoothing) assume fixed parameters estimated by maximum likelihood.
But real-world forecasting often requires:

  • Uncertainty quantification (probability distribution of future values, not just a point forecast)
  • Flexibility (incorporating prior knowledge, structural assumptions, missing data)
  • Online updating (re-forecast as new data arrives)

Bayesian methods naturally provide these through posterior distributions.


2. General Bayesian framework

We model a time series $y_{1:T}$ with latent parameters $\theta$:

$p(\theta \mid y_{1:T}) \propto p(y_{1:T} \mid \theta) \, p(\theta)$

  • $p(\theta)$: prior beliefs (e.g., smoothness, trend, seasonal structure)
  • $p(y_{1:T} \mid \theta)$: likelihood (often Gaussian or Poisson)
  • Posterior distributions describe parameter and prediction uncertainty.

For forecasting:

$p(y_{T+h} \mid y_{1:T}) = \int p(y_{T+h} \mid \theta, y_{1:T}) \, p(\theta \mid y_{1:T}) \, d\theta$
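In practice this integral is usually intractable, so it is approximated by Monte Carlo: draw parameters from the posterior, simulate one future path per draw, and summarize the resulting sample. A minimal sketch for an AR(1) model, using fake posterior draws in place of real MCMC output (the values of `phi_draws`, `sigma_draws`, `y_T`, and `h` are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for an AR(1) model: theta = (phi, sigma).
# In a real workflow these would come from MCMC; here we fake them.
phi_draws = rng.normal(0.7, 0.05, size=1000)
sigma_draws = np.abs(rng.normal(1.0, 0.1, size=1000))

y_T = 2.0   # last observed value
h = 5       # forecast horizon

# Approximate p(y_{T+h} | y_{1:T}) by simulating one future path per draw.
paths = np.empty((len(phi_draws), h))
for i, (phi, sigma) in enumerate(zip(phi_draws, sigma_draws)):
    y = y_T
    for t in range(h):
        y = phi * y + rng.normal(0.0, sigma)
        paths[i, t] = y

# Predictive mean and 90% interval at each horizon.
mean = paths.mean(axis=0)
lo, hi = np.percentile(paths, [5, 95], axis=0)
```

Because each path uses a different $(\phi, \sigma)$ draw, the intervals reflect both parameter uncertainty and future noise, which is exactly what the integral above averages over.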


3. Common Bayesian time series models

(a) Bayesian ARIMA / ARMA

  • Same as ARIMA but with priors on the parameters (AR coefficients $\phi$, MA coefficients $\theta$, noise variance $\sigma^2$).
  • Inference via MCMC or Variational Inference.
  • Produces posterior distributions for parameters and forecasts.
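For the simplest case, the posterior is available in closed form. A sketch for an AR(1) coefficient with a Gaussian prior and known noise variance (the simulated data and prior values are illustrative assumptions): since the likelihood is Gaussian in $\phi$, the Gaussian prior is conjugate and the update is one line of algebra.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from a known AR(1) process (phi = 0.6, sigma = 1).
true_phi, sigma = 0.6, 1.0
y = np.zeros(500)
for t in range(1, len(y)):
    y[t] = true_phi * y[t - 1] + rng.normal(0.0, sigma)

# Conjugate update for phi with a N(0, 1) prior and known sigma:
# the likelihood is Gaussian in phi, so the posterior is Gaussian too.
x, target = y[:-1], y[1:]
prior_mean, prior_var = 0.0, 1.0
post_prec = 1.0 / prior_var + (x @ x) / sigma**2
post_var = 1.0 / post_prec
post_mean = post_var * (prior_mean / prior_var + (x @ target) / sigma**2)
```

With unknown $\sigma^2$ the same structure goes through with a Normal–Inverse-Gamma prior; beyond that (e.g., full ARMA), one falls back to MCMC or variational inference.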

(b) State Space Models (SSM) / Dynamic Linear Models (DLMs)

  • Time series is expressed with hidden states:
    • $x_t = F x_{t-1} + w_t, \quad y_t = H x_t + v_t$, where $w_t, v_t$ are noise terms.
  • Examples: local level models, trend + seasonality decomposition.
  • Bayesian filtering methods:
    • Kalman filter (conjugate Gaussian case)
    • Particle filter (nonlinear/non-Gaussian case)
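In the linear-Gaussian case the Kalman filter computes the exact posterior $p(x_t \mid y_{1:t})$ recursively. A minimal sketch for the local level model ($F = H = 1$; the noise variances and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a local level model: hidden random walk observed with noise.
q, r = 0.1, 1.0           # state and observation noise variances
x = np.cumsum(rng.normal(0, np.sqrt(q), 200))
y = x + rng.normal(0, np.sqrt(r), 200)

# Kalman filter: exact Bayesian updating in the linear-Gaussian case.
m, P = 0.0, 10.0          # prior mean and variance for the initial state
means, variances = [], []
for obs in y:
    # Predict step: the state evolves as a random walk.
    P = P + q
    # Update step: condition on the new observation.
    K = P / (P + r)       # Kalman gain
    m = m + K * (obs - m)
    P = (1 - K) * P
    means.append(m)
    variances.append(P)
```

Note how the filtered variance shrinks from the vague prior and settles at a steady state: the filter is doing exact sequential Bayesian inference, one observation at a time, which is also why these models support online updating.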

(c) Bayesian Structural Time Series (BSTS)

  • Decompose data into trend + seasonality + regressors + holiday effects.
  • Used in Google’s CausalImpact library (for estimating effect of interventions).
  • Great for causal inference in time series (A/B tests, policy evaluation).

(d) Gaussian Process Time Series

  • Place a GP prior on the latent function $f(t)$:
    • $y_t \sim \mathcal{N}(f(t), \sigma^2), \quad f \sim \mathcal{GP}(0, k(t,t'))$
  • Kernel $k$ captures smoothness, periodicity, etc.
  • Flexible but computationally heavy for long series (exact inference scales as $O(T^3)$ in the number of observations).
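Because the likelihood is Gaussian, the GP posterior at new time points is available in closed form. A sketch with a squared-exponential kernel on toy data (the kernel hyperparameters, observation times, and noise level are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(t1, t2, length=2.0, amp=1.0):
    """Squared-exponential kernel: nearby time points are correlated."""
    d = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)

# Observed times and values (toy data), plus test times to predict at.
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.sin(t_obs)
t_new = np.array([2.5, 5.0])   # one interpolation point, one extrapolation
sigma2 = 0.01                  # observation noise variance

# Standard GP regression: posterior mean and covariance of f at t_new.
K = rbf_kernel(t_obs, t_obs) + sigma2 * np.eye(len(t_obs))
K_s = rbf_kernel(t_new, t_obs)
K_ss = rbf_kernel(t_new, t_new)
alpha = np.linalg.solve(K, y_obs)
post_mean = K_s @ alpha
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
```

The posterior variance is small between observations and grows when extrapolating past the data, so the GP's uncertainty widens exactly where the forecast is least supported. The $O(T^3)$ cost comes from the linear solves against the $T \times T$ matrix $K$.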

(e) Bayesian VAR (Vector Autoregression)

  • For multiple time series:
    • $y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t$
  • Priors on $A_i$ help with overfitting (shrinkage priors, Minnesota prior).
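A simple way to see the shrinkage effect: under an independent Gaussian prior on each coefficient with known noise variance, the posterior mean of $A$ is ridge regression, which is a stripped-down analogue of the Minnesota prior (the real Minnesota prior uses different shrinkage per lag and per variable). A sketch on a simulated 2-variable VAR(1) (all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a 2-variable VAR(1): y_t = A y_{t-1} + eps_t.
A_true = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = A_true @ y[t - 1] + rng.normal(0, 0.5, size=2)

# Posterior mean of A under an independent N(0, tau^2) prior on each
# coefficient and known noise variance: this is ridge regression.
X, Y = y[:-1], y[1:]
tau2, sigma2 = 0.25, 0.25
lam = sigma2 / tau2
A_post = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ Y).T

# Plain OLS for comparison: the prior pulls the estimate toward zero.
A_ols = np.linalg.solve(X.T @ X, X.T @ Y).T
```

With $p$ lags and $k$ series a VAR has $p k^2$ coefficients, which is why this shrinkage matters: it is what keeps large Bayesian VARs from overfitting.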

4. Inference methods

  • Conjugate Gibbs sampling (e.g., Normal–Inverse-Gamma for AR models)
  • Hamiltonian Monte Carlo (HMC) in Stan/PyMC
  • Variational inference for faster approximation
  • Particle MCMC for nonlinear state-space models
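When no conjugate update exists, generic MCMC still applies. A minimal sketch of random-walk Metropolis (a simpler relative of the HMC samplers in Stan/PyMC) targeting the posterior of an AR(1) coefficient, with known noise variance; the data, prior, and proposal scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate data from an AR(1) with phi = 0.6 and unit noise variance.
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.6 * y[t - 1] + rng.normal()

def log_post(phi):
    """Log posterior of phi: Gaussian AR(1) likelihood + N(0, 1) prior."""
    resid = y[1:] - phi * y[:-1]
    return -0.5 * resid @ resid - 0.5 * phi**2

# Random-walk Metropolis: propose phi' ~ N(phi, 0.1^2) and accept with
# probability min(1, exp(log_post(phi') - log_post(phi))).
phi, samples = 0.0, []
lp = log_post(phi)
for _ in range(5000):
    prop = phi + 0.1 * rng.normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        phi, lp = prop, lp_prop
    samples.append(phi)

# Discard burn-in, then summarize the posterior from the draws.
post_mean = np.mean(samples[1000:])
```

HMC replaces the blind random-walk proposal with gradient-guided moves, which is what makes it practical for the higher-dimensional posteriors in real state-space and structural models.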

5. Advantages

  • Full posterior predictive distributions (not just point forecasts).
  • Uncertainty in both parameters and future values.
  • Natural way to incorporate domain knowledge as priors.
  • Flexible: handles missing data, regime shifts, interventions.

6. Challenges

  • Computational cost (MCMC for large data).
  • Requires careful prior specification.
  • Not always easy to scale (e.g., GP on long series).

Simple summary:
Bayesian time series models put probability distributions over parameters (and sometimes over functions), producing full predictive distributions with uncertainty. They extend classical models like ARIMA into a Bayesian framework, and also include more advanced methods like state-space models, structural models, Gaussian processes, and Bayesian VARs.