1) What is a “linear process”?

A linear process is a way to build a time-ordered random sequence $X_t$ by taking white noise (pure random shocks) $W_t$ and mixing many time-shifted copies of it with fixed weights.

The general form is:

$$X_t = \sum_{j=-\infty}^{\infty} \psi_j\, W_{t-j}$$

  • $W_t$ is white noise: random values with mean 0, constant variance, and no correlation across time.
  • $\psi_j$ are numbers (weights) that determine how strongly each shock influences $X_t$.
  • The sum uses shocks from many time positions relative to $t$.

Why “linear”?

Because $X_t$ is a linear combination (weighted sum) of the noise terms $W_{t-j}$.

Why do we need a condition on $\psi_j$?

To make this infinite sum mathematically meaningful and stable, we require:

$$\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$$

This “absolute summability” condition ensures:

  • The infinite sum behaves nicely (does not depend on how you group/reorder terms),
  • $X_t$ has finite variance (it does not “blow up”).
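As a sanity check, both points are easy to verify numerically for a concrete weight sequence. A minimal Python sketch, using the illustrative choice $\psi_j = \phi^j$ with $\phi = 0.6$ (nothing in the definition forces this choice):

```python
# Minimal sketch: check absolute summability and the implied finite
# variance for the illustrative weight choice psi_j = phi**j, |phi| < 1.
phi, sigma2 = 0.6, 1.0
J = 200                                   # truncation point for the infinite sum

psi = [phi**j for j in range(J)]
abs_sum = sum(abs(p) for p in psi)        # approaches 1 / (1 - |phi|) = 2.5
var_x = sigma2 * sum(p * p for p in psi)  # Var(X_t) = sigma^2 * sum_j psi_j^2

print(round(abs_sum, 4), round(var_x, 4))
```

Here `abs_sum` approaches $1/(1-|\phi|)$ and `var_x` approaches $\sigma^2/(1-\phi^2)$, both finite.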

2) Why can the formula involve “past, present, and future”?

Look at the term $W_{t-j}$:

  • If $j>0$, then $t-j < t$: this uses past shocks (normal and natural).
  • If $j=0$, it uses the current shock $W_t$.
  • If $j<0$, then $t-j = t+|j|$: this uses future shocks relative to time $t$.

Using future information to define $X_t$ is usually not physically realistic for forecasting or real-time systems. So people often focus on causal models.


3) Causality: “Depends only on present and past”

A linear process is called causal if it uses only $W_t, W_{t-1}, W_{t-2}, \dots$, meaning:

$$X_t = \sum_{j=0}^{\infty} \psi_j\, W_{t-j}$$

This is also called an MA($\infty$) form (moving average of infinite order).

Important: despite the name “moving average,” the weights $\psi_j$ do not have to:

  • sum to 1,
  • be nonnegative.

Those restrictions are used in smoothing averages for data visualization, not in general stochastic modeling.
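To make the causal form concrete, here is a minimal simulation sketch. The truncation length `J` and the weights $\psi_j = 0.5^j$ are illustrative assumptions, not part of the definition:

```python
import numpy as np

# Minimal sketch: simulate a causal linear process by truncating the
# MA(infinity) sum at J terms. The weights psi_j = 0.5**j are an
# illustrative choice, not part of the definition.
rng = np.random.default_rng(0)
n, J = 500, 50
psi = 0.5 ** np.arange(J)

w = rng.standard_normal(n + J)       # white-noise shocks, extra J for burn-in
x = np.convolve(w, psi)[J:J + n]     # x_t = sum_{j < J} psi_j * w_{t-j}

# Spot-check one index against the defining sum.
t = J + 10
manual = sum(psi[j] * w[t - j] for j in range(J))
print(abs(x[10] - manual) < 1e-12)   # True
```

`np.convolve` is used here simply because convolution is exactly the weighted sum of shifted shocks in the definition.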


4) A quick practical intuition for linear processes

Think of $W_t$ as “new random news” or “shocks” arriving at time $t$.

Then a linear process says:

  • Today’s value $X_t$ is the result of today’s shock plus echoes of earlier shocks (and possibly, in non-causal forms, future shocks).
  • The weights $\psi_0, \psi_1, \psi_2, \dots$ control how long shocks persist and how large their influence is.

If the weights decay quickly, older shocks matter less.


5) Numerical series: why absolute convergence matters

For ordinary sums like $\sum_{j=0}^{\infty} a_j$, “converges” means partial sums approach a finite value.

“Converges absolutely” means $\sum |a_j|$ converges.

Absolute convergence is important because it guarantees:

  • convergence,
  • and that rearranging terms doesn’t change the sum.

For two-sided infinite sums $\sum_{j=-\infty}^{\infty} \psi_j$, absolute convergence makes the sum unambiguous.

That same stability idea is used when summing $\sum_j \psi_j W_{t-j}$ in linear processes.
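The standard counterexample showing why absolute convergence matters is the alternating harmonic series $\sum (-1)^{k+1}/k$: it converges (to $\ln 2$) but not absolutely, and rearranging its terms changes the limit. A small numerical sketch:

```python
import math

# Numeric illustration: the alternating harmonic series converges (to ln 2)
# but not absolutely, so reordering its terms changes the limit. The
# "one positive, two negative" rearrangement converges to (ln 2) / 2.
N = 200000
usual = sum((-1) ** (k + 1) / k for k in range(1, N + 1))

rearranged, p, q = 0.0, 1, 2
for _ in range(N // 3):
    rearranged += 1.0 / p; p += 2     # one positive term: 1, 1/3, 1/5, ...
    rearranged -= 1.0 / q; q += 2     # two negative terms: 1/2, 1/4, ...
    rearranged -= 1.0 / q; q += 2

print(round(usual, 4), round(rearranged, 4))   # ~0.6931 and ~0.3466
```

With absolutely convergent weights, no such reordering ambiguity can occur.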


6) The geometric series: the key tool behind inverses

A classic fact is:

$$\sum_{j=0}^{\infty} z^j = \frac{1}{1-z} \quad \text{if } |z|<1$$

This is the mathematical reason you can “invert” certain operators like $(1-\phi B)$ when $|\phi|<1$.
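A quick numerical sanity check of the identity (the value $z = 0.4$ is arbitrary):

```python
# Quick numerical check of the geometric-series identity (z = 0.4 is arbitrary).
z = 0.4
partial = sum(z**j for j in range(100))          # truncated sum of z^j
print(abs(partial - 1.0 / (1.0 - z)) < 1e-12)    # True
```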


7) The backshift operator $B$: a compact way to talk about time shifts

Define the backshift operator $B$ by:

$$B X_t = X_{t-1}$$

So:

  • $B^2 X_t = X_{t-2}$
  • $B^{-1} X_t = X_{t+1}$ (a forward shift)

Differencing in this language

The first difference is:

$$\nabla X_t = X_t - X_{t-1} = (1-B)X_t$$

Seasonal (lag-$d$) differencing is:

$$\nabla_d X_t = X_t - X_{t-d} = (1-B^d)X_t$$

So $B$ lets you write time-series operations as algebra.
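In code, both differencing operations are one-line array expressions. A small sketch with a toy series (the values and the lag $d = 2$ are illustrative):

```python
import numpy as np

# Sketch: first and lag-d differencing written as array operations,
# matching (1-B)x_t and (1-B^d)x_t. The series and d = 2 are toy choices.
x = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 9.0])
d = 2

first_diff = x[1:] - x[:-1]        # (1 - B) x_t   = x_t - x_{t-1}
lag_d_diff = x[d:] - x[:-d]        # (1 - B^d) x_t = x_t - x_{t-d}

print(first_diff.tolist())         # [2.0, -1.0, 4.0, -1.0, 2.0]
print(lag_d_diff.tolist())         # [1.0, 3.0, 3.0, 1.0]
```

`first_diff` matches what `np.diff(x)` returns; each differenced series is shorter than `x` because the first value(s) have no predecessor.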


8) Linear filters: “operators that transform a series”

A linear filter is an operator:

$$\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$$

Applying it to a time series $x_t$ produces:

$$y_t = \psi(B)x_t = \sum_{j=-\infty}^{\infty} \psi_j x_{t-j}$$

This is exactly the same pattern as a linear process, except:

  • a linear process applies the filter to white noise $W_t$,
  • a general filter can apply to any series $x_t$.

Example: smoothing (two-sided moving average)

A simple smoothing filter replaces each value by the average of itself and nearby values:

$$y_t = \frac{x_{t-2}+x_{t-1}+x_t+x_{t+1}+x_{t+2}}{5}$$

This uses both past and future points of the observed data. That is fine for smoothing a recorded dataset, even though it would be unsuitable for real-time forecasting.
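A sketch of this 5-point smoother using `np.convolve`; mode `"valid"` simply drops the endpoints where the two-sided window is incomplete:

```python
import numpy as np

# Sketch of the 5-point two-sided smoother y_t = (x_{t-2} + ... + x_{t+2}) / 5.
# np.convolve with mode "valid" drops the ends where the window is incomplete.
x = np.arange(10, dtype=float) ** 2         # toy series: 0, 1, 4, 9, 16, ...
weights = np.full(5, 1 / 5)
y = np.convolve(x, weights, mode="valid")   # y_t for t = 2, ..., 7

# y at t = 2 averages x_0..x_4 = (0 + 1 + 4 + 9 + 16) / 5 = 6.
print(abs(y[0] - x[:5].mean()) < 1e-12)     # True
```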


9) Inverse filters: undoing a transformation

Consider:

$$Y_t = (1-\phi B)X_t = X_t - \phi X_{t-1}$$

Question: can we recover $X_t$ from $Y_t$?

If $|\phi|<1$: causal inverse exists

Using the geometric series idea, the inverse is:

$$(1-\phi B)^{-1} = \sum_{j=0}^{\infty} \phi^j B^j$$

So:

$$X_t = \sum_{j=0}^{\infty} \phi^j Y_{t-j}$$

This uses only present and past $Y$ values (causal).
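This inversion can be checked numerically: filter a toy series with $(1-\phi B)$, then apply the truncated geometric-series inverse and confirm the original series comes back. The value $\phi = 0.5$ and the truncation order are illustrative choices:

```python
import numpy as np

# Sketch: filter a toy series with (1 - phi*B), then undo it with the
# truncated geometric-series inverse sum_j phi^j B^j. phi = 0.5 and the
# truncation order J are illustrative choices.
rng = np.random.default_rng(1)
phi, n, J = 0.5, 300, 60
x = rng.standard_normal(n)

y = x.copy()
y[1:] -= phi * x[:-1]                 # y_t = x_t - phi * x_{t-1}

coef = phi ** np.arange(J)            # inverse weights: 1, phi, phi^2, ...
x_rec = np.convolve(y, coef)[:n]      # x_t ~= sum_{j < J} phi^j * y_{t-j}

print(np.max(np.abs(x_rec - x)) < 1e-8)   # True
```

The truncation error is of order $\phi^J$, which is negligible here because $|\phi| < 1$.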

If $|\phi|>1$: inverse exists but is non-causal

You can still invert, but the inverse involves negative powers of $B$, meaning it uses future values.

If $\phi=\pm 1$: no inverse

This is the “boundary case” where the geometric-series approach fails, and the operator cannot be undone in a stable way.


10) A central result: linear processes are weakly stationary

If:

  • $W_t$ is mean-zero white noise with variance $\sigma^2$,
  • $\sum |\psi_j|<\infty$,

then the linear process:

$$X_t=\sum_{j=-\infty}^{\infty}\psi_j W_{t-j}$$

is weakly stationary, meaning:

  • constant mean over time,
  • autocovariance depends only on lag $h$, not on $t$.

Autocovariance formula

The autocovariance function is:

$$\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}$$

Interpretation:

  • Correlation at lag $h$ comes from overlap between the weight sequence and a shifted copy of itself.
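For weights with a known closed form, this overlap formula can be checked directly. A sketch using the illustrative geometric weights $\psi_j = \phi^j$, for which $\gamma_X(h) = \sigma^2 \phi^{h}/(1-\phi^2)$ for $h \ge 0$:

```python
import numpy as np

# Check gamma(h) = sigma^2 * sum_j psi_j * psi_{j+h} against the closed
# form sigma^2 * phi^h / (1 - phi^2) for the illustrative geometric
# weights psi_j = phi**j (h >= 0).
phi, sigma2, J = 0.7, 2.0, 400
psi = phi ** np.arange(J)

def gamma(h):
    return sigma2 * np.sum(psi[:J - h] * psi[h:])

checks = [abs(gamma(h) - sigma2 * phi**h / (1 - phi**2)) < 1e-8
          for h in range(4)]
print(checks)   # [True, True, True, True]
```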

11) MA($q$) processes: finite moving averages

An MA($q$) process is:

$$X_t = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$$

This is causal because it uses only present/past shocks.

Key feature:

  • Its autocovariance is zero beyond lag $q$.

Formally:

  • If $|h|>q$, then $\gamma_X(h)=0$.

So MA models create short memory (dependence only up to a fixed lag).
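A sketch computing the MA(2) autocovariances from the overlap formula $\gamma_X(h) = \sigma^2 \sum_j \theta_j \theta_{j+h}$ (with $\theta_0 = 1$); the coefficient values are illustrative:

```python
import numpy as np

# Sketch: autocovariance of an MA(2) with illustrative coefficients,
# via gamma(h) = sigma^2 * sum_j theta_j * theta_{j+h}, where theta_0 = 1.
theta = np.array([1.0, 0.5, -0.3])    # theta_0, theta_1, theta_2 (so q = 2)
sigma2 = 1.0

def gamma(h):
    h = abs(h)
    if h > 2:
        return 0.0                    # cut-off: zero beyond lag q = 2
    return sigma2 * float(np.sum(theta[:theta.size - h] * theta[h:]))

print([round(gamma(h), 4) for h in range(5)])
# [1.34, 0.35, -0.3, 0.0, 0.0]
```

The abrupt cut-off after lag $q$ is exactly the “short memory” signature of MA models.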


12) AR(1) as a linear process via inversion

An AR(1) model satisfies:

$$X_t - \phi X_{t-1} = W_t$$

Equivalently:

$$(1-\phi B)X_t = W_t$$

If $|\phi|<1$: causal AR(1) exists and becomes MA($\infty$)

Invert the operator:

$$X_t = (1-\phi B)^{-1}W_t = \sum_{j=0}^{\infty} \phi^j W_{t-j}$$

So AR(1) can be viewed as a linear process with weights $\psi_j=\phi^j$ for $j\ge 0$.
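This equivalence can be verified numerically: generate $X_t$ from the AR(1) recursion, then rebuild it from the truncated MA($\infty$) sum and confirm the two agree. The parameter $\phi = 0.8$ and the truncation order are illustrative:

```python
import numpy as np

# Sketch: generate AR(1) from its recursion, then rebuild it from the
# truncated MA(infinity) sum and confirm agreement. phi = 0.8 and the
# truncation order J are illustrative choices.
rng = np.random.default_rng(42)
phi, n, J = 0.8, 400, 200
w = rng.standard_normal(n)

x = np.empty(n)                       # recursion x_t = phi*x_{t-1} + w_t
x[0] = w[0]                           # (zero pre-sample value assumed)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

x_ma = np.convolve(w, phi ** np.arange(J))[:n]   # sum_{j < J} phi^j w_{t-j}

print(np.max(np.abs(x - x_ma)) < 1e-8)   # True
```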

Autocovariance and autocorrelation

For $|\phi|<1$:

$$\gamma_X(h)=\frac{\sigma^2}{1-\phi^2}\,\phi^{|h|}, \qquad \rho_X(h)=\phi^{|h|}$$

Meaning:

  • correlation decays geometrically with lag.

If $|\phi|>1$: stationary solution is non-causal

A stationary solution exists, but it depends on future noise terms, which is generally undesirable for forecasting.

If $\phi=\pm 1$: no stationary AR(1)

When $\phi=1$, you get a random walk, which is not stationary.


13) ARMA(1,1): combining AR and MA

An ARMA(1,1) satisfies:

$$X_t - \phi X_{t-1} = W_t + \theta W_{t-1}$$

Operator form:

$$(1-\phi B)X_t = (1+\theta B)W_t$$

If $|\phi|<1$, the model is causal and can be written as:

$$X_t = (1-\phi B)^{-1}(1+\theta B)W_t$$

This yields an MA($\infty$) representation where the effect of a shock decays over time like $\phi^j$, but with a modified first step due to $\theta$.
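Expanding $(1+\theta B)/(1-\phi B)$ gives the MA($\infty$) weights explicitly: $\psi_0 = 1$ and $\psi_j = (\phi+\theta)\phi^{j-1}$ for $j \ge 1$. A sketch that computes these weights and cross-checks them by multiplying back by $(1-\phi B)$ (parameter values are illustrative):

```python
# Sketch: MA(infinity) weights of a causal ARMA(1,1),
#   psi_0 = 1,  psi_j = (phi + theta) * phi**(j-1)  for j >= 1,
# from expanding (1 + theta*B) / (1 - phi*B). Parameters are illustrative.
phi, theta, J = 0.6, 0.4, 10

psi = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, J)]

# Cross-check: multiplying back by (1 - phi*B) must return the MA
# polynomial coefficients (1, theta, 0, 0, ...).
recovered = [psi[0]] + [psi[j] - phi * psi[j - 1] for j in range(1, J)]
print([round(c, 10) for c in recovered[:4]])   # [1.0, 0.4, 0.0, 0.0]
```

Note that $\psi_1 = \phi + \theta$ (the “modified first step”), after which the weights decay geometrically at rate $\phi$.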

Invertibility (separate concept)

  • “Causal” means $X_t$ can be written using past $W$’s.
  • “Invertible” means $W_t$ can be written using past $X$’s.

For ARMA(1,1), invertibility holds when $|\theta|<1$.

Invertibility matters because it makes the model identifiable and estimation more stable in practice.
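Invertibility can also be demonstrated numerically. When $|\theta| < 1$, expanding $(1-\phi B)/(1+\theta B)$ gives weights $\pi_0 = 1$ and $\pi_j = (-\theta)^{j-1}(-\theta-\phi)$ for $j \ge 1$, and applying them to the observed series recovers the shocks. A sketch with illustrative parameters:

```python
import numpy as np

# Sketch: for an invertible ARMA(1,1) (|theta| < 1), recover the shocks
# from the observed series using pi_0 = 1 and
# pi_j = (-theta)**(j-1) * (-theta - phi) for j >= 1, from expanding
# (1 - phi*B) / (1 + theta*B). Parameter values are illustrative.
rng = np.random.default_rng(7)
phi, theta, n, J = 0.6, 0.4, 300, 100
w = rng.standard_normal(n)

x = np.empty(n)                       # ARMA(1,1) recursion, zero pre-sample
x[0] = w[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t] + theta * w[t - 1]

pi = np.empty(J)
pi[0] = 1.0
pi[1:] = (-theta) ** np.arange(J - 1) * (-theta - phi)

w_rec = np.convolve(x, pi)[:n]        # w_t ~= sum_{j < J} pi_j * x_{t-j}
print(np.max(np.abs(w_rec - w)) < 1e-8)   # True
```

This is the practical content of invertibility: the residuals (estimated shocks) are computable from observed data alone.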


Summary (plain-English takeaway)

  • A linear process builds a time series by adding up many time-shifted “random shocks” with weights.
  • The backshift operator $B$ is a clean algebraic way to describe time shifts and filters.
  • Linear filters transform a series via weighted combinations of shifted values.
  • Some filters have stable inverses, strongly tied to the geometric series.
  • Under a mild condition on weights ($\sum |\psi_j|<\infty$), linear processes are weakly stationary.
  • MA, AR, and ARMA models can all be understood inside this “linear process + filter” framework.
  • Causality and invertibility tell you whether a model depends only on past information and whether you can recover shocks from observed data.