1) What is a “linear process”?

A linear process is a way to build a time-ordered random sequence $X_t$ by taking white noise (pure random shocks) $W_t$ and mixing many time-shifted copies of it with fixed weights.

The general form is:

$$X_t = \sum_{j=-\infty}^{\infty} \psi_j\, W_{t-j}$$

  • $W_t$ is white noise: random values with mean 0, constant variance, and no correlation across time.
  • $\psi_j$ are numbers (weights) that determine how strongly each shock influences $X_t$.
  • The sum uses shocks from many time positions relative to $t$.

Why “linear”?

Because $X_t$ is a linear combination (weighted sum) of the noise terms $W_{t-j}$.

Why do we need a condition on $\psi_j$?

To make this infinite sum mathematically meaningful and stable, we require:

$$\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$$

This “absolute summability” condition ensures:

  • The infinite sum behaves nicely (does not depend on how you group/reorder terms),
  • $X_t$ has finite variance (it does not “blow up”).
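As a sanity check, both points are easy to verify numerically for a concrete weight sequence. A minimal Python sketch, using the illustrative choice $\psi_j = \phi^j$ with $\phi = 0.6$ (nothing in the definition forces this choice):

```python
# Minimal sketch: check absolute summability and the implied finite
# variance for the illustrative weight choice psi_j = phi**j, |phi| < 1.
phi, sigma2 = 0.6, 1.0
J = 200                                   # truncation point for the infinite sum

psi = [phi**j for j in range(J)]
abs_sum = sum(abs(p) for p in psi)        # approaches 1 / (1 - |phi|) = 2.5
var_x = sigma2 * sum(p * p for p in psi)  # Var(X_t) = sigma^2 * sum_j psi_j^2

print(round(abs_sum, 4), round(var_x, 4))
```

Here `abs_sum` approaches $1/(1-|\phi|)$ and `var_x` approaches $\sigma^2/(1-\phi^2)$, both finite.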

2) Why can the formula involve “past, present, and future”?

Look at the term $W_{t-j}$:

  • If $j>0$, then $t-j < t$: this uses past shocks (normal and natural).
  • If $j=0$, it uses the current shock $W_t$.
  • If $j<0$, then $t-j = t+|j|$: this uses future shocks relative to time $t$.

Using future information to define $X_t$ is usually not physically realistic for forecasting or real-time systems. So people often focus on causal models.


3) Causality: “Depends only on present and past”

A linear process is called causal if it uses only $W_t, W_{t-1}, W_{t-2}, \dots$, meaning:

$$X_t = \sum_{j=0}^{\infty} \psi_j\, W_{t-j}$$

This is also called an MA($\infty$) form (moving average of infinite order).

Important: despite the name “moving average,” the weights $\psi_j$ do not have to:

  • sum to 1,
  • be nonnegative.

Those restrictions are used in smoothing averages for data visualization, not in general stochastic modeling.
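To make the causal form concrete, here is a minimal simulation sketch. The truncation length `J` and the weights $\psi_j = 0.5^j$ are illustrative assumptions, not part of the definition:

```python
import numpy as np

# Minimal sketch: simulate a causal linear process by truncating the
# MA(infinity) sum at J terms. The weights psi_j = 0.5**j are an
# illustrative choice, not part of the definition.
rng = np.random.default_rng(0)
n, J = 500, 50
psi = 0.5 ** np.arange(J)

w = rng.standard_normal(n + J)       # white-noise shocks, extra J for burn-in
x = np.convolve(w, psi)[J:J + n]     # x_t = sum_{j < J} psi_j * w_{t-j}

# Spot-check one index against the defining sum.
t = J + 10
manual = sum(psi[j] * w[t - j] for j in range(J))
print(abs(x[10] - manual) < 1e-12)   # True
```

`np.convolve` is used here simply because convolution is exactly the weighted sum of shifted shocks in the definition.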


4) A quick practical intuition for linear processes

Think of $W_t$ as “new random news” or “shocks” arriving at time $t$.

Then a linear process says:

  • Today’s value $X_t$ is the result of today’s shock plus echoes of earlier shocks (and possibly, in non-causal forms, future shocks).
  • The weights $\psi_0, \psi_1, \psi_2, \dots$ control how long shocks persist and how large their influence is.

If the weights decay quickly, older shocks matter less.


5) Numerical series: why absolute convergence matters

For ordinary sums like $\sum_{j=0}^{\infty} a_j$, “converges” means partial sums approach a finite value.

“Converges absolutely” means $\sum |a_j|$ converges.

Absolute convergence is important because it guarantees:

  • convergence,
  • and that rearranging terms doesn’t change the sum.

For two-sided infinite sums $\sum_{j=-\infty}^{\infty} \psi_j$, absolute convergence makes the sum unambiguous.

That same stability idea is used when summing $\sum_j \psi_j W_{t-j}$ in linear processes.
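The standard counterexample showing why absolute convergence matters is the alternating harmonic series $\sum (-1)^{k+1}/k$: it converges (to $\ln 2$) but not absolutely, and rearranging its terms changes the limit. A small numerical sketch:

```python
import math

# Numeric illustration: the alternating harmonic series converges (to ln 2)
# but not absolutely, so reordering its terms changes the limit. The
# "one positive, two negative" rearrangement converges to (ln 2) / 2.
N = 200000
usual = sum((-1) ** (k + 1) / k for k in range(1, N + 1))

rearranged, p, q = 0.0, 1, 2
for _ in range(N // 3):
    rearranged += 1.0 / p; p += 2     # one positive term: 1, 1/3, 1/5, ...
    rearranged -= 1.0 / q; q += 2     # two negative terms: 1/2, 1/4, ...
    rearranged -= 1.0 / q; q += 2

print(round(usual, 4), round(rearranged, 4))   # ~0.6931 and ~0.3466
```

With absolutely convergent weights, no such reordering ambiguity can occur.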


6) The geometric series: the key tool behind inverses

A classic fact is:

$$\sum_{j=0}^{\infty} z^j = \frac{1}{1-z} \quad \text{if } |z|<1$$

This is the mathematical reason you can “invert” certain operators like $(1-\phi B)$ when $|\phi|<1$.
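A quick numerical sanity check of the identity (the value $z = 0.4$ is arbitrary):

```python
# Quick numerical check of the geometric-series identity (z = 0.4 is arbitrary).
z = 0.4
partial = sum(z**j for j in range(100))          # truncated sum of z^j
print(abs(partial - 1.0 / (1.0 - z)) < 1e-12)    # True
```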


7) The backshift operator $B$: a compact way to talk about time shifts

Define the backshift operator $B$ by:

$$B X_t = X_{t-1}$$

So:

  • $B^2 X_t = X_{t-2}$
  • $B^{-1} X_t = X_{t+1}$ (a forward shift)

Differencing in this language

The first difference is:

$$\nabla X_t = X_t - X_{t-1} = (1-B)X_t$$

Seasonal (lag-$d$) differencing is:

$$\nabla_d X_t = X_t - X_{t-d} = (1-B^d)X_t$$

So $B$ lets you write time-series operations as algebra.
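In code, both differencing operations are one-line array expressions. A small sketch with a toy series (the values and the lag $d = 2$ are illustrative):

```python
import numpy as np

# Sketch: first and lag-d differencing written as array operations,
# matching (1-B)x_t and (1-B^d)x_t. The series and d = 2 are toy choices.
x = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 9.0])
d = 2

first_diff = x[1:] - x[:-1]        # (1 - B) x_t   = x_t - x_{t-1}
lag_d_diff = x[d:] - x[:-d]        # (1 - B^d) x_t = x_t - x_{t-d}

print(first_diff.tolist())         # [2.0, -1.0, 4.0, -1.0, 2.0]
print(lag_d_diff.tolist())         # [1.0, 3.0, 3.0, 1.0]
```

`first_diff` matches what `np.diff(x)` returns; each differenced series is shorter than `x` because the first value(s) have no predecessor.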


8) Linear filters: “operators that transform a series”

A linear filter is an operator:

$$\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$$

Applying it to a time series $x_t$ produces:

$$y_t = \psi(B)x_t = \sum_{j=-\infty}^{\infty} \psi_j x_{t-j}$$

This is exactly the same pattern as a linear process, except:

  • a linear process applies the filter to white noise $W_t$,
  • a general filter can apply to any series $x_t$.

Example: smoothing (two-sided moving average)

A simple smoothing filter replaces each value by the average of itself and nearby values:

$$y_t = \frac{x_{t-2}+x_{t-1}+x_t+x_{t+1}+x_{t+2}}{5}$$

This uses both past and future points of the observed data. That is fine for smoothing a recorded dataset, even though it would be unsuitable for real-time forecasting.
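A sketch of this 5-point smoother using `np.convolve`; mode `"valid"` simply drops the endpoints where the two-sided window is incomplete:

```python
import numpy as np

# Sketch of the 5-point two-sided smoother y_t = (x_{t-2} + ... + x_{t+2}) / 5.
# np.convolve with mode "valid" drops the ends where the window is incomplete.
x = np.arange(10, dtype=float) ** 2         # toy series: 0, 1, 4, 9, 16, ...
weights = np.full(5, 1 / 5)
y = np.convolve(x, weights, mode="valid")   # y_t for t = 2, ..., 7

# y at t = 2 averages x_0..x_4 = (0 + 1 + 4 + 9 + 16) / 5 = 6.
print(abs(y[0] - x[:5].mean()) < 1e-12)     # True
```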


9) Inverse filters: undoing a transformation

Consider:

$$Y_t = (1-\phi B)X_t = X_t - \phi X_{t-1}$$

Question: can we recover $X_t$ from $Y_t$?

If $|\phi|<1$: causal inverse exists

Using the geometric series idea, the inverse is:

$$(1-\phi B)^{-1} = \sum_{j=0}^{\infty} \phi^j B^j$$

So:

$$X_t = \sum_{j=0}^{\infty} \phi^j Y_{t-j}$$

This uses only present and past $Y$ values (causal).
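This inversion can be checked numerically: filter a toy series with $(1-\phi B)$, then apply the truncated geometric-series inverse and confirm the original series comes back. The value $\phi = 0.5$ and the truncation order are illustrative choices:

```python
import numpy as np

# Sketch: filter a toy series with (1 - phi*B), then undo it with the
# truncated geometric-series inverse sum_j phi^j B^j. phi = 0.5 and the
# truncation order J are illustrative choices.
rng = np.random.default_rng(1)
phi, n, J = 0.5, 300, 60
x = rng.standard_normal(n)

y = x.copy()
y[1:] -= phi * x[:-1]                 # y_t = x_t - phi * x_{t-1}

coef = phi ** np.arange(J)            # inverse weights: 1, phi, phi^2, ...
x_rec = np.convolve(y, coef)[:n]      # x_t ~= sum_{j < J} phi^j * y_{t-j}

print(np.max(np.abs(x_rec - x)) < 1e-8)   # True
```

The truncation error is of order $\phi^J$, which is negligible here because $|\phi| < 1$.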

If $|\phi|>1$: inverse exists but is non-causal

You can still invert, but the inverse involves negative powers of $B$, meaning it uses future values.

If $\phi=\pm 1$: no inverse

This is the “boundary case” where the geometric-series approach fails, and the operator cannot be undone in a stable way.


10) A central result: linear processes are weakly stationary

If:

  • $W_t$ is mean-zero white noise with variance $\sigma^2$,
  • $\sum |\psi_j|<\infty$,

then the linear process:

$$X_t=\sum_{j=-\infty}^{\infty}\psi_j W_{t-j}$$

is weakly stationary, meaning:

  • constant mean over time,
  • autocovariance depends only on lag $h$, not on $t$.

Autocovariance formula

The autocovariance function is:

$$\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}$$

Interpretation:

  • Correlation at lag $h$ comes from overlap between the weight sequence and a shifted copy of itself.
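For weights with a known closed form, this overlap formula can be checked directly. A sketch using the illustrative geometric weights $\psi_j = \phi^j$, for which $\gamma_X(h) = \sigma^2 \phi^{h}/(1-\phi^2)$ for $h \ge 0$:

```python
import numpy as np

# Check gamma(h) = sigma^2 * sum_j psi_j * psi_{j+h} against the closed
# form sigma^2 * phi^h / (1 - phi^2) for the illustrative geometric
# weights psi_j = phi**j (h >= 0).
phi, sigma2, J = 0.7, 2.0, 400
psi = phi ** np.arange(J)

def gamma(h):
    return sigma2 * np.sum(psi[:J - h] * psi[h:])

checks = [abs(gamma(h) - sigma2 * phi**h / (1 - phi**2)) < 1e-8
          for h in range(4)]
print(checks)   # [True, True, True, True]
```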

11) MA($q$) processes: finite moving averages

An MA($q$) process is:

$$X_t = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$$

This is causal because it uses only present/past shocks.

Key feature:

  • Its autocovariance is zero beyond lag $q$.

Formally:

  • If $|h|>q$, then $\gamma_X(h)=0$.

So MA models create short memory (dependence only up to a fixed lag).
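A sketch computing the MA(2) autocovariances from the overlap formula $\gamma_X(h) = \sigma^2 \sum_j \theta_j \theta_{j+h}$ (with $\theta_0 = 1$); the coefficient values are illustrative:

```python
import numpy as np

# Sketch: autocovariance of an MA(2) with illustrative coefficients,
# via gamma(h) = sigma^2 * sum_j theta_j * theta_{j+h}, where theta_0 = 1.
theta = np.array([1.0, 0.5, -0.3])    # theta_0, theta_1, theta_2 (so q = 2)
sigma2 = 1.0

def gamma(h):
    h = abs(h)
    if h > 2:
        return 0.0                    # cut-off: zero beyond lag q = 2
    return sigma2 * float(np.sum(theta[:theta.size - h] * theta[h:]))

print([round(gamma(h), 4) for h in range(5)])
# [1.34, 0.35, -0.3, 0.0, 0.0]
```

The abrupt cut-off after lag $q$ is exactly the “short memory” signature of MA models.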


12) AR(1) as a linear process via inversion

An AR(1) model satisfies:

$$X_t - \phi X_{t-1} = W_t$$

Equivalently:

$$(1-\phi B)X_t = W_t$$

If $|\phi|<1$: causal AR(1) exists and becomes MA($\infty$)

Invert the operator:

$$X_t = (1-\phi B)^{-1}W_t = \sum_{j=0}^{\infty} \phi^j W_{t-j}$$

So AR(1) can be viewed as a linear process with weights $\psi_j=\phi^j$ for $j\ge 0$.
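This equivalence can be verified numerically: generate $X_t$ from the AR(1) recursion, then rebuild it from the truncated MA($\infty$) sum and confirm the two agree. The parameter $\phi = 0.8$ and the truncation order are illustrative:

```python
import numpy as np

# Sketch: generate AR(1) from its recursion, then rebuild it from the
# truncated MA(infinity) sum and confirm agreement. phi = 0.8 and the
# truncation order J are illustrative choices.
rng = np.random.default_rng(42)
phi, n, J = 0.8, 400, 200
w = rng.standard_normal(n)

x = np.empty(n)                       # recursion x_t = phi*x_{t-1} + w_t
x[0] = w[0]                           # (zero pre-sample value assumed)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

x_ma = np.convolve(w, phi ** np.arange(J))[:n]   # sum_{j < J} phi^j w_{t-j}

print(np.max(np.abs(x - x_ma)) < 1e-8)   # True
```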

Autocovariance and autocorrelation

For $|\phi|<1$:

$$\gamma_X(h)=\frac{\sigma^2}{1-\phi^2}\,\phi^{|h|}, \qquad \rho_X(h)=\phi^{|h|}$$

Meaning:

  • correlation decays geometrically with lag.

If $|\phi|>1$: stationary solution is non-causal

A stationary solution exists, but it depends on future noise terms, which is generally undesirable for forecasting.

If $\phi=\pm 1$: no stationary AR(1)

When $\phi=1$, you get a random walk, which is not stationary.


13) ARMA(1,1): combining AR and MA

An ARMA(1,1) satisfies:

$$X_t - \phi X_{t-1} = W_t + \theta W_{t-1}$$

Operator form:

$$(1-\phi B)X_t = (1+\theta B)W_t$$

If $|\phi|<1$, the model is causal and can be written as:

$$X_t = (1-\phi B)^{-1}(1+\theta B)W_t$$

This yields an MA($\infty$) representation where the effect of a shock decays over time like $\phi^j$, but with a modified first step due to $\theta$.
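Expanding $(1+\theta B)/(1-\phi B)$ gives the MA($\infty$) weights explicitly: $\psi_0 = 1$ and $\psi_j = (\phi+\theta)\phi^{j-1}$ for $j \ge 1$. A sketch that computes these weights and cross-checks them by multiplying back by $(1-\phi B)$ (parameter values are illustrative):

```python
# Sketch: MA(infinity) weights of a causal ARMA(1,1),
#   psi_0 = 1,  psi_j = (phi + theta) * phi**(j-1)  for j >= 1,
# from expanding (1 + theta*B) / (1 - phi*B). Parameters are illustrative.
phi, theta, J = 0.6, 0.4, 10

psi = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, J)]

# Cross-check: multiplying back by (1 - phi*B) must return the MA
# polynomial coefficients (1, theta, 0, 0, ...).
recovered = [psi[0]] + [psi[j] - phi * psi[j - 1] for j in range(1, J)]
print([round(c, 10) for c in recovered[:4]])   # [1.0, 0.4, 0.0, 0.0]
```

Note that $\psi_1 = \phi + \theta$ (the “modified first step”), after which the weights decay geometrically at rate $\phi$.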

Invertibility (separate concept)

  • “Causal” means $X_t$ can be written using past $W$’s.
  • “Invertible” means $W_t$ can be written using past $X$’s.

For ARMA(1,1), invertibility holds when $|\theta|<1$.

Invertibility matters because it makes the model identifiable and estimation more stable in practice.
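Invertibility can also be demonstrated numerically. When $|\theta| < 1$, expanding $(1-\phi B)/(1+\theta B)$ gives weights $\pi_0 = 1$ and $\pi_j = (-\theta)^{j-1}(-\theta-\phi)$ for $j \ge 1$, and applying them to the observed series recovers the shocks. A sketch with illustrative parameters:

```python
import numpy as np

# Sketch: for an invertible ARMA(1,1) (|theta| < 1), recover the shocks
# from the observed series using pi_0 = 1 and
# pi_j = (-theta)**(j-1) * (-theta - phi) for j >= 1, from expanding
# (1 - phi*B) / (1 + theta*B). Parameter values are illustrative.
rng = np.random.default_rng(7)
phi, theta, n, J = 0.6, 0.4, 300, 100
w = rng.standard_normal(n)

x = np.empty(n)                       # ARMA(1,1) recursion, zero pre-sample
x[0] = w[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t] + theta * w[t - 1]

pi = np.empty(J)
pi[0] = 1.0
pi[1:] = (-theta) ** np.arange(J - 1) * (-theta - phi)

w_rec = np.convolve(x, pi)[:n]        # w_t ~= sum_{j < J} pi_j * x_{t-j}
print(np.max(np.abs(w_rec - w)) < 1e-8)   # True
```

This is the practical content of invertibility: the residuals (estimated shocks) are computable from observed data alone.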


Summary (plain-English takeaway)

  • A linear process builds a time series by adding up many time-shifted “random shocks” with weights.
  • The backshift operator $B$ is a clean algebraic way to describe time shifts and filters.
  • Linear filters transform a series via weighted combinations of shifted values.
  • Some filters have stable inverses, strongly tied to the geometric series.
  • Under a mild condition on weights ($\sum |\psi_j|<\infty$), linear processes are weakly stationary.
  • MA, AR, and ARMA models can all be understood inside this “linear process + filter” framework.
  • Causality and invertibility tell you whether a model depends only on past information and whether you can recover shocks from observed data.