1) What an ARMA model is trying to do

A time series is a sequence of observations over time (sales per month, temperature per day, etc.).
An ARMA model is a mathematical way to describe how today’s value relates to past values and past random shocks, so that we can:

  • explain the “memory” or dependence in the data, and
  • forecast future values in a principled way.

ARMA is short for:

  • AR = AutoRegressive: today depends on past values of the series itself
  • MA = Moving Average: today depends on past random shocks (“innovations”)

2) The ARMA(p, q) definition, in simple terms

An ARMA(p, q) process $(X_t)$ is a stationary time series that satisfies:

$\phi(B)X_t = \theta(B)W_t$

Key objects

(a) White noise $W_t$

Think of $W_t$ as the “new surprise” at time $t$:

  • mean 0
  • constant variance
  • no correlation across time

In applied terms: $W_t$ is the unpredictable part.

(b) The backshift operator $B$

$B$ is a compact way to denote “one step back in time”:

$BX_t = X_{t-1},\quad B^2X_t = X_{t-2},\ \dots$

(c) The AR polynomial $\phi(z)$

$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$

When you replace $z$ with $B$, this becomes an operator:

$\phi(B)X_t = X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p}$

So the AR part says: “today’s value minus a weighted sum of the last $p$ values…”

(d) The MA polynomial $\theta(z)$

$\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$

Similarly,

$\theta(B)W_t = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$

So the MA part says: “…equals today’s new shock plus a weighted sum of the last $q$ shocks.”

Expanded form (what it really means)

$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = W_t + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q}$

Interpretation:

  • The left side captures “predictable structure” based on past $X$’s.
  • The right side captures “random disturbances” and how they may persist for a short time.
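
The expanded form doubles as a simulation recipe. Here is a minimal Python sketch (the ARMA(1,1) coefficients at the end are illustrative choices, not values from the text):

```python
import random

def simulate_arma(phi, theta, n, seed=0):
    """Simulate ARMA(p, q) from the expanded form:
    X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p}
        + W_t + theta_1 W_{t-1} + ... + theta_q W_{t-q},
    treating X and W as 0 before time 0."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]  # white noise shocks W_t
    x = []
    for t in range(n):
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        ma = w[t] + sum(c * w[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + ma)
    return x

# Illustrative ARMA(1, 1): phi_1 = 0.7, theta_1 = 0.5
series = simulate_arma([0.7], [0.5], n=200)
```

The two `sum(...)` terms are exactly the left- and right-hand sides of the expanded equation, rearranged to solve for $X_t$.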

3) Why “no common factors / no common roots” matters

The definition requires that $\phi(z)$ and $\theta(z)$ have no common roots (equivalently, no common factor).

Why this is necessary

If they share a factor, then part of the model is redundant and can be cancelled out, meaning:

  • you might write the model as an ARMA(p+r, q+r),
  • but after cancellation it is really a smaller ARMA(p, q).

This condition is about identifying the true orders $p$ and $q$.
Without this condition, many different-looking equations could describe the same process.

Example idea (conceptual)

If both sides contain a factor like $(1 - aB)$, you can divide both sides by $(1 - aB)$ and get a simpler, equivalent model.


4) Worked example: why an apparent ARMA(2,2) becomes ARMA(1,1)

You are given:

$X_t - 1.5X_{t-1} + 0.56X_{t-2} = W_t - 0.3W_{t-1} - 0.4W_{t-2}$

This corresponds to:

  • AR polynomial: $\phi(z) = 1 - 1.5z + 0.56z^2$
  • MA polynomial: $\theta(z) = 1 - 0.3z - 0.4z^2$

Both can be factored:

  • $\phi(z) = (1 - 0.8z)(1 - 0.7z)$
  • $\theta(z) = (1 - 0.8z)(1 + 0.5z)$

They share the common factor $(1 - 0.8z)$, so cancel it:

$(1 - 0.7B)X_t = (1 + 0.5B)W_t$

That is ARMA(1,1).

Meaning

Even though the original equation included terms up to lag 2 on both sides, one “chunk” was duplicated on both sides. After removing that duplication, the true memory lengths are 1 and 1.
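
The equivalence can also be seen by simulation: feeding both recursions the same shocks (with zero pre-sample values) produces identical paths. A Python sketch:

```python
import random

rng = random.Random(1)
n = 100
w = [rng.gauss(0, 1) for _ in range(n)]  # shared white noise shocks

def val(seq, t):
    return seq[t] if t >= 0 else 0.0  # X and W are taken as 0 before time 0

x_arma11, x_arma22 = [], []
for t in range(n):
    # Reduced ARMA(1,1):  X_t = 0.7 X_{t-1} + W_t + 0.5 W_{t-1}
    x_arma11.append(0.7 * val(x_arma11, t - 1) + w[t] + 0.5 * val(w, t - 1))
    # Apparent ARMA(2,2): X_t = 1.5 X_{t-1} - 0.56 X_{t-2} + W_t - 0.3 W_{t-1} - 0.4 W_{t-2}
    x_arma22.append(1.5 * val(x_arma22, t - 1) - 0.56 * val(x_arma22, t - 2)
                    + w[t] - 0.3 * val(w, t - 1) - 0.4 * val(w, t - 2))

gap = max(abs(a - b) for a, b in zip(x_arma11, x_arma22))
# gap is numerically zero: the two equations describe the same process
```

Because the ARMA(2,2) equation is just the ARMA(1,1) equation with $(1 - 0.8B)$ applied to both sides, any series satisfying one satisfies the other.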


5) Factoring polynomials in practice (why software is used)

To check redundancy and to test causality/invertibility, you need the roots of $\phi(z)$ and $\theta(z)$.

High-order polynomials are tedious to factor by hand, so in practice you compute roots numerically (e.g., with a function like polyroot in R).

If the same root appears in both $\phi(z)$ and $\theta(z)$, you have a common factor.
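
As a dependency-free sketch for the degree-2 case (in practice you would call a general numerical root finder such as R's polyroot or numpy.roots), applied to the polynomials from the ARMA(2,2) example above:

```python
import cmath

def quad_roots(c0, c1, c2):
    """Roots of c0 + c1*z + c2*z^2 -- a tiny stand-in for a general
    numerical root finder, limited to degree 2."""
    d = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]

phi_roots = quad_roots(1, -1.5, 0.56)    # phi(z) = 1 - 1.5z + 0.56z^2
theta_roots = quad_roots(1, -0.3, -0.4)  # theta(z) = 1 - 0.3z - 0.4z^2

# A root shared by both polynomials (up to numerical tolerance) signals a common factor.
shared = [r for r in phi_roots if any(abs(r - s) < 1e-9 for s in theta_roots)]
# shared holds 1.25 = 1/0.8, i.e. the common factor (1 - 0.8z)
```

A root $z = 1/a$ corresponds to a factor $(1 - az)$, which is why the shared root $1.25$ identifies $(1 - 0.8z)$.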


6) How the sign of parameters affects “shape” of the series (MA(1) and AR(1))

MA(1): $X_t = W_t + \theta W_{t-1}$

  • If $\theta > 0$: consecutive values tend to move in the same direction more often (the series looks smoother).
  • If $\theta < 0$: consecutive values tend to alternate direction (the series looks “jumpy” or zig-zag).

This connects to the lag-1 autocorrelation:

$\rho(1) = \frac{\theta}{1+\theta^2}$

So the sign of $\theta$ directly controls whether the lag-1 correlation is positive or negative.
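
This sign effect shows up directly in simulated data. A quick Monte Carlo sketch (the value $\theta = \pm 0.6$ is an illustrative choice) comparing the sample lag-1 autocorrelation with $\theta/(1+\theta^2)$:

```python
import random

def ma1(theta, n, seed=42):
    """Simulate MA(1): X_t = W_t + theta * W_{t-1}."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [w[t + 1] + theta * w[t] for t in range(n)]

def lag1_corr(x):
    """Sample lag-1 autocorrelation."""
    n, m = len(x), sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / var

pos = lag1_corr(ma1(+0.6, n=20000))  # close to +0.6/1.36, i.e. about +0.44
neg = lag1_corr(ma1(-0.6, n=20000))  # close to -0.6/1.36, i.e. about -0.44
```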

AR(1): $X_t = \phi X_{t-1} + W_t$

  • If $\phi > 0$ and close to 1: strong persistence; values drift slowly (high momentum).
  • If $\phi < 0$: alternation; values tend to flip sign/direction (more jagged).

The lag-$h$ autocorrelation behaves like:

$\rho(h) = \phi^{|h|}$
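
The geometric decay can likewise be checked against simulated data; a sketch with the illustrative value $\phi = 0.8$ (seed and burn-in length are arbitrary choices):

```python
import random

def ar1(phi, n, seed=7, burn=200):
    """Simulate AR(1): X_t = phi * X_{t-1} + W_t, discarding a burn-in."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for t in range(n + burn):
        prev = phi * prev + rng.gauss(0, 1)
        if t >= burn:
            x.append(prev)
    return x

def acf(x, h):
    """Sample autocorrelation at lag h."""
    n, m = len(x), sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / var

x = ar1(0.8, n=20000)
# acf(x, h) is close to 0.8**h for h = 1, 2, 3, matching rho(h) = phi^|h|
```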


7) Causality and invertibility for ARMA(p, q)

These two properties sound abstract but are extremely practical.

7.1 Causality (for forecasting usefulness)

An ARMA process is causal if it can be written as:

$X_t = \sum_{j=0}^{\infty}\psi_j W_{t-j}$

Meaning: $X_t$ can be expressed using only current and past shocks, with no future information.

This is what you want for real forecasting: you can generate today’s value using past surprises.

7.2 Invertibility (for interpreting shocks)

It is invertible if you can rewrite the shocks in terms of observed $X$’s:

$W_t = \sum_{j=0}^{\infty}\pi_j X_{t-j}$

Meaning: from the observed series, you can recover the underlying “innovation shocks” in a stable way.

Invertibility is important because many different MA representations can generate the same autocorrelation structure. Invertibility picks a standard, well-behaved representation.
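
In practice the $\psi_j$ weights are computed by matching coefficients in $\phi(z)\,\psi(z) = \theta(z)$. A sketch of that recursion, checked against the known ARMA(1,1) closed form $\psi_j = \phi^{j-1}(\phi+\theta)$ for the illustrative values $\phi = 0.5$, $\theta = 0.4$:

```python
def psi_weights(phi, theta, m):
    """First m+1 causal weights psi_0..psi_m, from matching coefficients in
    phi(z) * psi(z) = theta(z):  psi_j = theta_j + sum_i phi_i * psi_{j-i}."""
    psi = [1.0]
    for j in range(1, m + 1):
        th = theta[j - 1] if j - 1 < len(theta) else 0.0
        psi.append(th + sum(phi[i] * psi[j - 1 - i] for i in range(min(j, len(phi)))))
    return psi

# Illustrative causal ARMA(1,1): phi_1 = 0.5, theta_1 = 0.4
weights = psi_weights([0.5], [0.4], 4)
# weights ~ [1.0, 0.9, 0.45, 0.225, 0.1125], i.e. psi_j = 0.5**(j-1) * 0.9 for j >= 1
```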


8) The “unit circle” test (the practical criterion)

This is the main operational result:

  • Causal if and only if all roots of $\phi(z)$ lie outside the unit circle: $|r| > 1$.
  • Invertible if and only if all roots of $\theta(z)$ lie outside the unit circle: $|s| > 1$.

Why “outside the unit circle”?

Because when all roots lie outside $|z| = 1$, certain infinite series expansions converge, which makes:

  • the causal expansion (in past shocks) stable, and
  • the invertible expansion (recovering shocks from data) stable.

For example, in the AR(1) case the expansion $1/(1 - \phi z) = \sum_{j \ge 0} \phi^j z^j$ converges on the unit circle exactly when $|\phi| < 1$, i.e., when the root $z = 1/\phi$ lies outside it. If a root is inside the unit circle, the corresponding infinite expansion “blows up” or becomes unstable.
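
As a concrete check, here is a degree-limited sketch of the criterion (the AR(1)/MA(1) polynomials are illustrative):

```python
import cmath

def all_roots_outside(coeffs):
    """True if every root of c0 + c1*z (+ c2*z^2) has modulus > 1.
    A degree<=2 sketch; real software finds roots of any degree numerically."""
    if len(coeffs) == 2:
        c0, c1 = coeffs
        roots = [complex(-c0 / c1)]
    else:
        c0, c1, c2 = coeffs
        d = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
        roots = [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]
    return all(abs(r) > 1 for r in roots)

# Illustrative checks:
causal = all_roots_outside([1, -0.7])     # root of 1 - 0.7z is ~1.43 (outside) -> True
invertible = all_roots_outside([1, 1.2])  # root of 1 + 1.2z is ~-0.83 (inside) -> False
```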


9) Example: determine (p, q), causality, invertibility

Given:

$X_t = 1.1X_{t-1} - 0.07X_{t-2} - 0.147X_{t-3} + W_t + 0.5W_{t-1} - 0.84W_{t-2}$

Candidate polynomials:

  • $\phi(z) = 1 - 1.1z + 0.07z^2 + 0.147z^3$ (degree 3)
  • $\theta(z) = 1 + 0.5z - 0.84z^2$ (degree 2)

The computed roots show that one root, $z \approx 1.428571$ (exactly $10/7$), appears in both polynomials, so there is redundancy. After removing the corresponding common factor $(1 - 0.7z)$, the true degrees become:

  • AR degree 2
  • MA degree 1

So the process is ARMA(2,1).

Causality check

Remaining AR roots: $1.428571$ and $-3.333333$

  • $|1.428571| > 1$
  • $|-3.333333| > 1$

Therefore causal.

Invertibility check

Remaining MA root: $-0.8333333$

  • $|-0.8333333| < 1$

Therefore not invertible.

Simplified final equation

After cancellation, the model can be written without redundancy as:

$X_t = 0.4X_{t-1} + 0.21X_{t-2} + W_t + 1.2W_{t-1}$
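
One way to double-check the cancellation is to multiply the factor $(1 - 0.7z)$ back into the reduced polynomials and confirm the original coefficients reappear:

```python
def polymul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, ...] for c0 + c1*z + ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

common = [1, -0.7]              # cancelled factor (1 - 0.7z), root 10/7 = 1.428571...
phi_reduced = [1, -0.4, -0.21]  # 1 - 0.4z - 0.21z^2  (AR side of the ARMA(2,1))
theta_reduced = [1, 1.2]        # 1 + 1.2z            (MA side of the ARMA(2,1))

phi_full = polymul(common, phi_reduced)      # ~ [1, -1.1, 0.07, 0.147]
theta_full = polymul(common, theta_reduced)  # ~ [1, 0.5, -0.84]
```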


10) Practical intuition: what ARMA(p, q) “means” in one sentence

An ARMA(p, q) model says:

  • Today’s value is partly explained by a weighted combination of the last $p$ values (AR),
  • plus a weighted combination of the last $q$ surprises (MA),
  • and the model is set up so this structure is stable over time (stationary),
  • and, ideally, usable in the real world (causal) and uniquely interpretable (invertible).