1) What an ARMA model is trying to do

A time series is a sequence of observations over time (sales per month, temperature per day, etc.).
An ARMA model is a mathematical way to describe how today’s value relates to past values and past random shocks, so that we can:

  • explain the “memory” or dependence in the data, and
  • forecast future values in a principled way.

ARMA is short for:

  • AR = AutoRegressive: today depends on past values of the series itself
  • MA = Moving Average: today depends on past random shocks (“innovations”)

2) The ARMA(p, q) definition, in simple terms

An ARMA(p, q) process $(X_t)$ is a stationary time series that satisfies:

$\phi(B)X_t = \theta(B)W_t$

Key objects

(a) White noise $W_t$

Think of $W_t$ as the “new surprise” at time $t$:

  • mean 0
  • constant variance
  • no correlation across time

In applied terms: $W_t$ is the unpredictable part.

(b) The backshift operator $B$

$B$ is a compact way to denote “one step back in time”:

$BX_t = X_{t-1},\quad B^2X_t = X_{t-2},\ \dots$

(c) The AR polynomial $\phi(z)$

$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$

When you replace $z$ with $B$, this becomes an operator:

$\phi(B)X_t = X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p}$

So the AR part says: “today’s value minus a weighted sum of the last $p$ values…”

(d) The MA polynomial $\theta(z)$

$\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$

Similarly,

$\theta(B)W_t = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$

So the MA part says: “…equals today’s new shock plus a weighted sum of the last $q$ shocks.”

Expanded form (what it really means)

$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$

Interpretation:

  • The left side captures “predictable structure” based on past $X$’s.
  • The right side captures “random disturbances” and how they may persist for a short time.
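
The expanded form doubles as a simulation recipe. Here is a minimal Python sketch (the ARMA(1,1) coefficients at the end are illustrative choices, not values from the text):

```python
import random

def simulate_arma(phi, theta, n, seed=0):
    """Simulate ARMA(p, q) from the expanded form:
    X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p}
        + W_t + theta_1 W_{t-1} + ... + theta_q W_{t-q},
    treating X and W as 0 before time 0."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]  # white noise shocks W_t
    x = []
    for t in range(n):
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        ma = w[t] + sum(c * w[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + ma)
    return x

# Illustrative ARMA(1, 1): phi_1 = 0.7, theta_1 = 0.5
series = simulate_arma([0.7], [0.5], n=200)
```

The two `sum(...)` terms are exactly the left- and right-hand sides of the expanded equation, rearranged to solve for $X_t$.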

3) Why “no common factors / no common roots” matters

The definition requires that $\phi(z)$ and $\theta(z)$ have no common roots (equivalently, no common factor).

Why this is necessary

If they share a factor, then part of the model is redundant and can be cancelled out, meaning:

  • you might write the model as an ARMA(p+r, q+r),
  • but after cancellation it is really a smaller ARMA(p, q).

This condition is about identifying the true orders $p$ and $q$.
Without this condition, many different-looking equations could describe the same process.

Example idea (conceptual)

If both sides contain a factor like $(1 - aB)$, you can divide both sides by $(1 - aB)$ and get a simpler, equivalent model.


4) Worked example: why an apparent ARMA(2,2) becomes ARMA(1,1)

You are given:

$X_t - 1.5X_{t-1} + 0.56X_{t-2} = W_t - 0.3W_{t-1} - 0.4W_{t-2}$

This corresponds to:

  • AR polynomial: $\phi(z) = 1 - 1.5z + 0.56z^2$
  • MA polynomial: $\theta(z) = 1 - 0.3z - 0.4z^2$

Both can be factored:

  • $\phi(z) = (1 - 0.8z)(1 - 0.7z)$
  • $\theta(z) = (1 - 0.8z)(1 + 0.5z)$

They share the common factor $(1 - 0.8z)$, so cancel it:

$(1 - 0.7B)X_t = (1 + 0.5B)W_t$

That is ARMA(1,1).

Meaning

Even though the original equation included terms up to lag 2 on both sides, one “chunk” was duplicated on both sides. After removing that duplication, the true memory lengths are 1 and 1.
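
The equivalence can also be seen by simulation: feeding both recursions the same shocks (with zero pre-sample values) produces identical paths. A Python sketch:

```python
import random

rng = random.Random(1)
n = 100
w = [rng.gauss(0, 1) for _ in range(n)]  # shared white noise shocks

def val(seq, t):
    return seq[t] if t >= 0 else 0.0  # X and W are taken as 0 before time 0

x_arma11, x_arma22 = [], []
for t in range(n):
    # Reduced ARMA(1,1):  X_t = 0.7 X_{t-1} + W_t + 0.5 W_{t-1}
    x_arma11.append(0.7 * val(x_arma11, t - 1) + w[t] + 0.5 * val(w, t - 1))
    # Apparent ARMA(2,2): X_t = 1.5 X_{t-1} - 0.56 X_{t-2} + W_t - 0.3 W_{t-1} - 0.4 W_{t-2}
    x_arma22.append(1.5 * val(x_arma22, t - 1) - 0.56 * val(x_arma22, t - 2)
                    + w[t] - 0.3 * val(w, t - 1) - 0.4 * val(w, t - 2))

gap = max(abs(a - b) for a, b in zip(x_arma11, x_arma22))
# gap is numerically zero: the two equations describe the same process
```

Because the ARMA(2,2) equation is just the ARMA(1,1) equation with $(1 - 0.8B)$ applied to both sides, any series satisfying one satisfies the other.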


5) Factoring polynomials in practice (why software is used)

To check redundancy and to test causality/invertibility, you need the roots of $\phi(z)$ and $\theta(z)$.

High-order polynomials are tedious to factor by hand, so in practice you compute roots numerically (e.g., with a function like polyroot in R).

If the same root appears in both $\phi(z)$ and $\theta(z)$, you have a common factor.
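
As a dependency-free sketch for the degree-2 case (in practice you would call a general numerical root finder such as R's polyroot or numpy.roots), applied to the polynomials from the ARMA(2,2) example above:

```python
import cmath

def quad_roots(c0, c1, c2):
    """Roots of c0 + c1*z + c2*z^2 -- a tiny stand-in for a general
    numerical root finder, limited to degree 2."""
    d = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]

phi_roots = quad_roots(1, -1.5, 0.56)    # phi(z) = 1 - 1.5z + 0.56z^2
theta_roots = quad_roots(1, -0.3, -0.4)  # theta(z) = 1 - 0.3z - 0.4z^2

# A root shared by both polynomials (up to numerical tolerance) signals a common factor.
shared = [r for r in phi_roots if any(abs(r - s) < 1e-9 for s in theta_roots)]
# shared holds 1.25 = 1/0.8, i.e. the common factor (1 - 0.8z)
```

A root $z = 1/a$ corresponds to a factor $(1 - az)$, which is why the shared root $1.25$ identifies $(1 - 0.8z)$.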


6) How the sign of parameters affects “shape” of the series (MA(1) and AR(1))

MA(1): $X_t = W_t + \theta W_{t-1}$

  • If $\theta > 0$: consecutive values tend to move in the same direction more often (the series looks smoother).
  • If $\theta < 0$: consecutive values tend to alternate direction (the series looks “jumpy” or zig-zag).

This connects to the lag-1 autocorrelation:

$\rho(1) = \frac{\theta}{1+\theta^2}$

So the sign of $\theta$ directly controls whether the lag-1 correlation is positive or negative.
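
This sign effect shows up directly in simulated data. A quick Monte Carlo sketch (the value $\theta = \pm 0.6$ is an illustrative choice) comparing the sample lag-1 autocorrelation with $\theta/(1+\theta^2)$:

```python
import random

def ma1(theta, n, seed=42):
    """Simulate MA(1): X_t = W_t + theta * W_{t-1}."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [w[t + 1] + theta * w[t] for t in range(n)]

def lag1_corr(x):
    """Sample lag-1 autocorrelation."""
    n, m = len(x), sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / var

pos = lag1_corr(ma1(+0.6, n=20000))  # close to +0.6/1.36, i.e. about +0.44
neg = lag1_corr(ma1(-0.6, n=20000))  # close to -0.6/1.36, i.e. about -0.44
```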

AR(1): $X_t = \phi X_{t-1} + W_t$

  • If $\phi > 0$ and close to 1: strong persistence; values drift slowly (high momentum).
  • If $\phi < 0$: alternation; values tend to flip sign/direction (more jagged).

The lag-$h$ autocorrelation behaves like:

$\rho(h) = \phi^{|h|}$
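
The geometric decay can likewise be checked against simulated data; a sketch with the illustrative value $\phi = 0.8$ (seed and burn-in length are arbitrary choices):

```python
import random

def ar1(phi, n, seed=7, burn=200):
    """Simulate AR(1): X_t = phi * X_{t-1} + W_t, discarding a burn-in."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for t in range(n + burn):
        prev = phi * prev + rng.gauss(0, 1)
        if t >= burn:
            x.append(prev)
    return x

def acf(x, h):
    """Sample autocorrelation at lag h."""
    n, m = len(x), sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / var

x = ar1(0.8, n=20000)
# acf(x, h) is close to 0.8**h for h = 1, 2, 3, matching rho(h) = phi^|h|
```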


7) Causality and invertibility for ARMA(p, q)

These two properties sound abstract but are extremely practical.

7.1 Causality (for forecasting usefulness)

An ARMA process is causal if it can be written as:

$X_t = \sum_{j=0}^{\infty}\psi_j W_{t-j}$

Meaning: $X_t$ can be expressed using only current and past shocks, with no future information.

This is what you want for real forecasting: you can generate today’s value using past surprises.

7.2 Invertibility (for interpreting shocks)

It is invertible if you can rewrite the shocks in terms of observed $X$’s:

$W_t = \sum_{j=0}^{\infty}\pi_j X_{t-j}$

Meaning: from the observed series, you can recover the underlying “innovation shocks” in a stable way.

Invertibility is important because many different MA representations can generate the same autocorrelation structure. Invertibility picks a standard, well-behaved representation.
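
In practice the $\psi_j$ weights are computed by matching coefficients in $\phi(z)\,\psi(z) = \theta(z)$. A sketch of that recursion, checked against the known ARMA(1,1) closed form $\psi_j = \phi^{j-1}(\phi+\theta)$ for the illustrative values $\phi = 0.5$, $\theta = 0.4$:

```python
def psi_weights(phi, theta, m):
    """First m+1 causal weights psi_0..psi_m, from matching coefficients in
    phi(z) * psi(z) = theta(z):  psi_j = theta_j + sum_i phi_i * psi_{j-i}."""
    psi = [1.0]
    for j in range(1, m + 1):
        th = theta[j - 1] if j - 1 < len(theta) else 0.0
        psi.append(th + sum(phi[i] * psi[j - 1 - i] for i in range(min(j, len(phi)))))
    return psi

# Illustrative causal ARMA(1,1): phi_1 = 0.5, theta_1 = 0.4
weights = psi_weights([0.5], [0.4], 4)
# weights ~ [1.0, 0.9, 0.45, 0.225, 0.1125], i.e. psi_j = 0.5**(j-1) * 0.9 for j >= 1
```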


8) The “unit circle” test (the practical criterion)

This is the main operational result:

  • Causal if and only if all roots of $\phi(z)$ lie outside the unit circle: $|r| > 1$.
  • Invertible if and only if all roots of $\theta(z)$ lie outside the unit circle: $|s| > 1$.

Why “outside the unit circle”?

Because when all roots lie outside $|z| = 1$, certain infinite series expansions converge, which makes:

  • the causal expansion (in past shocks) stable, and
  • the invertible expansion (recovering shocks from data) stable.

For example, in the AR(1) case the expansion $1/(1 - \phi z) = \sum_{j \ge 0} \phi^j z^j$ converges on the unit circle exactly when $|\phi| < 1$, i.e., when the root $z = 1/\phi$ lies outside it. If a root is inside the unit circle, the corresponding infinite expansion “blows up” or becomes unstable.
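
As a concrete check, here is a degree-limited sketch of the criterion (the AR(1)/MA(1) polynomials are illustrative):

```python
import cmath

def all_roots_outside(coeffs):
    """True if every root of c0 + c1*z (+ c2*z^2) has modulus > 1.
    A degree<=2 sketch; real software finds roots of any degree numerically."""
    if len(coeffs) == 2:
        c0, c1 = coeffs
        roots = [complex(-c0 / c1)]
    else:
        c0, c1, c2 = coeffs
        d = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
        roots = [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]
    return all(abs(r) > 1 for r in roots)

# Illustrative checks:
causal = all_roots_outside([1, -0.7])     # root of 1 - 0.7z is ~1.43 (outside) -> True
invertible = all_roots_outside([1, 1.2])  # root of 1 + 1.2z is ~-0.83 (inside) -> False
```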


9) Example: determine (p, q), causality, invertibility

Given:

$X_t = 1.1X_{t-1} - 0.07X_{t-2} - 0.147X_{t-3} + W_t + 0.5W_{t-1} - 0.84W_{t-2}$

Candidate polynomials:

  • $\phi(z) = 1 - 1.1z + 0.07z^2 + 0.147z^3$ (degree 3)
  • $\theta(z) = 1 + 0.5z - 0.84z^2$ (degree 2)

The computed roots show that one root, $z \approx 1.428571$ (exactly $10/7$), appears in both polynomials, so there is redundancy. After removing the corresponding common factor $(1 - 0.7z)$, the true degrees become:

  • AR degree 2
  • MA degree 1

So the process is ARMA(2,1).

Causality check

Remaining AR roots: $1.428571$ and $-3.333333$

  • $|1.428571| > 1$
  • $|-3.333333| > 1$

Therefore causal.

Invertibility check

Remaining MA root: $-0.8333333$

  • $|-0.8333333| < 1$

Therefore not invertible.

Simplified final equation

After cancellation, the model can be written without redundancy as:

$X_t = 0.4X_{t-1} + 0.21X_{t-2} + W_t + 1.2W_{t-1}$
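
One way to double-check the cancellation is to multiply the factor $(1 - 0.7z)$ back into the reduced polynomials and confirm the original coefficients reappear:

```python
def polymul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, ...] for c0 + c1*z + ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

common = [1, -0.7]              # cancelled factor (1 - 0.7z), root 10/7 = 1.428571...
phi_reduced = [1, -0.4, -0.21]  # 1 - 0.4z - 0.21z^2  (AR side of the ARMA(2,1))
theta_reduced = [1, 1.2]        # 1 + 1.2z            (MA side of the ARMA(2,1))

phi_full = polymul(common, phi_reduced)      # ~ [1, -1.1, 0.07, 0.147]
theta_full = polymul(common, theta_reduced)  # ~ [1, 0.5, -0.84]
```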


10) Practical intuition: what ARMA(p, q) “means” in one sentence

An ARMA(p, q) model says:

  • Today’s value is partly explained by a weighted combination of the last $p$ values (AR),
  • plus a weighted combination of the last $q$ surprises (MA),
  • and the model is set up so this structure is stable over time (stationary),
  • and, ideally, usable in the real world (causal) and uniquely interpretable (invertible).