1) The practical problem being solved
After looking at a time series and deciding that an AR model might be a reasonable approximation, you still need to choose the numerical coefficients of that model.
For example, if you decide to try an AR(2) model, you are proposing that the value today depends linearly on the previous two values, plus an unpredictable shock:
- $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$: the underlying time series model you want to build
- $\phi_1, \phi_2$: the coefficients you must estimate from data
- $Z_t$: “new information” or shock at time $t$ (often called white noise), with variance $\sigma^2$
The key question is: How can we estimate $\phi_1,\dots,\phi_p$ (and also $\sigma^2$) using only the observed data $x_1,\dots,x_n$?
The method described here produces initial estimates that are often good enough to get started and are also useful as inputs to more refined estimation procedures.
2) The method-of-moments idea (why “moments” show up)
A “moment” is a statistical quantity like a mean, variance, or covariance. Covariances are sometimes called “mixed moments” because they involve products of two variables.
The core strategy is:
- Compute sample autocovariances $\hat\gamma(h)$ (or sample autocorrelations $\hat\rho(h)$) from the observed dataset.
- Pretend—temporarily—that these sample quantities are equal to the model’s true autocovariances (or autocorrelations) for the first few lags.
- Use the mathematical relationships that must hold in an AR(p) model to solve for the unknown coefficients.
This is reasonable because, when the dataset is long enough, sample autocovariances tend to be close to the true autocovariances of the underlying process with high probability.
3) Where the Yule–Walker equations come from (the basic derivation idea)
Assume the series follows an AR(p) model:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$$
To connect the unknown coefficients to observable correlation structure:
- Multiply both sides by $X_{t-k}$ for $k = 1, 2, \dots, p$
- Take expectations (long-run averages)
Because $Z_t$ is an unpredictable shock, it is uncorrelated with past values $X_{t-k}$ for $k \ge 1$. That makes the equations clean: every cross term $E[Z_t X_{t-k}]$ vanishes.
This produces a system of equations that ties the autocovariances $\gamma(k)$ of the process to the AR coefficients:

For $k = 1$:
$$\gamma(1) = \phi_1 \gamma(0) + \phi_2 \gamma(1) + \cdots + \phi_p \gamma(p-1)$$

For $k = 2$:
$$\gamma(2) = \phi_1 \gamma(1) + \phi_2 \gamma(0) + \cdots + \phi_p \gamma(p-2)$$

…

For $k = p$:
$$\gamma(p) = \phi_1 \gamma(p-1) + \phi_2 \gamma(p-2) + \cdots + \phi_p \gamma(0)$$

This family of equations is called the Yule–Walker equations.
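As a quick numerical sanity check, the Yule–Walker relations can be verified for a hypothetical stationary AR(2). The coefficient values $\phi_1 = 0.5$, $\phi_2 = 0.25$ below are assumed for illustration only:

```python
# Check the Yule-Walker relations numerically for a hypothetical
# stationary AR(2) with assumed coefficients phi1 = 0.5, phi2 = 0.25.
phi1, phi2 = 0.5, 0.25

# Theoretical autocorrelations: rho(0) = 1, rho(1) = phi1 / (1 - phi2),
# then the recursion rho(k) = phi1*rho(k-1) + phi2*rho(k-2) for k >= 2.
rho = [1.0, phi1 / (1.0 - phi2)]
for k in range(2, 6):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

# The k = 1 and k = 2 Yule-Walker equations hold to machine precision.
check1 = abs(rho[1] - (phi1 * rho[0] + phi2 * rho[1]))
check2 = abs(rho[2] - (phi1 * rho[1] + phi2 * rho[0]))
```

The recursion used for $\rho(k)$ is itself just the Yule–Walker relation read forward, which is why the checks come out exact.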
4) The matrix form (why it matters)
Instead of writing many equations line by line, the system can be written compactly:

$$\Gamma_p \boldsymbol{\phi} = \boldsymbol{\gamma}_p$$

Where:
- $\boldsymbol{\phi} = (\phi_1, \dots, \phi_p)^\top$ is the coefficient vector.
- $\boldsymbol{\gamma}_p = (\gamma(1), \dots, \gamma(p))^\top$ collects the first $p$ autocovariances.
- $\Gamma_p$ is a $p \times p$ matrix built from autocovariances:

$$\Gamma_p = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{pmatrix}$$

This matrix has a special “lag structure” (constant diagonals, i.e. it is a Toeplitz matrix), which makes it a standard object in time series.
You can also divide everything by $\gamma(0)$ and express the same relationship in terms of correlations:

$$R_p \boldsymbol{\phi} = \boldsymbol{\rho}_p$$

Where $R_p$ is the same kind of matrix but using autocorrelations $\rho(k)$ instead of autocovariances $\gamma(k)$.
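The compact matrix form translates directly into code. Here is a minimal sketch that builds the constant-diagonal correlation matrix and solves for the coefficients; the autocorrelation values are made up for an assumed AR(3):

```python
import numpy as np

# Build the Toeplitz correlation matrix R_p = [rho(|i-j|)] and solve
# R_p * phi = rho_p for a hypothetical AR(3); the rho values are made up.
rho = np.array([1.0, 0.8, 0.5, 0.3])       # rho(0), rho(1), rho(2), rho(3)
p = 3

# Constant-diagonal ("lag") structure: entry (i, j) depends only on |i - j|.
R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
phi = np.linalg.solve(R, rho[1 : p + 1])   # coefficient vector (phi1, phi2, phi3)
```

Because the matrix depends only on the lag $|i - j|$, only $p$ distinct correlation values are needed to fill all $p^2$ entries.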
5) Solving “backwards”: from ACF to AR coefficients
Often you already have sample autocorrelations estimated from data, and you want the AR coefficients.
AR(2) example logic
For an AR(2) model, the correlation-based system is:

$$\rho(1) = \phi_1 + \phi_2 \rho(1)$$
$$\rho(2) = \phi_1 \rho(1) + \phi_2$$

So if someone tells you $\rho(1)$ and $\rho(2)$, you solve this 2×2 linear system for $\phi_1$ and $\phi_2$. In closed form:

$$\phi_1 = \frac{\rho(1)\,(1 - \rho(2))}{1 - \rho(1)^2}, \qquad \phi_2 = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2}$$
This demonstrates the key point:
- Knowing a few early-lag correlations is enough to determine AR coefficients, assuming the AR order is known and the model assumptions apply.
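The two-equation AR(2) system can be solved in a couple of lines. The sketch below does a round-trip check with assumed coefficients $\phi_1 = 0.5$, $\phi_2 = 0.25$ (hypothetical values, chosen only so the answer is known in advance):

```python
def ar2_from_acf(rho1, rho2):
    """Closed-form solution of the AR(2) Yule-Walker system
       rho1 = phi1 + phi2*rho1,  rho2 = phi1*rho1 + phi2
       (valid whenever rho1**2 != 1)."""
    denom = 1.0 - rho1 ** 2
    phi1 = rho1 * (1.0 - rho2) / denom
    phi2 = (rho2 - rho1 ** 2) / denom
    return phi1, phi2

# Round-trip check with assumed coefficients phi1 = 0.5, phi2 = 0.25:
# their implied correlations are rho1 = phi1/(1 - phi2), rho2 = phi1*rho1 + phi2.
r1 = 0.5 / (1.0 - 0.25)
r2 = 0.5 * r1 + 0.25
phi1_hat, phi2_hat = ar2_from_acf(r1, r2)
```

Feeding the implied correlations back through the solver recovers the original coefficients exactly, which is the "few early-lag correlations determine the coefficients" point in miniature.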
6) Estimating the shock variance
An AR model does not just have coefficients $\phi_1, \dots, \phi_p$; it also needs the variance $\sigma^2$ of the noise term $Z_t$.
A standard identity for an AR(p) model is:

$$\sigma^2 = \gamma(0)\left(1 - \phi_1 \rho(1) - \phi_2 \rho(2) - \cdots - \phi_p \rho(p)\right)$$
This is extremely useful because it converts:
- the series variance $\gamma(0)$,
- the correlations $\rho(1), \dots, \rho(p)$,
- and the AR coefficients $\phi_1, \dots, \phi_p$
into the noise variance $\sigma^2$.
So once you estimate $\gamma(0)$ and $\rho(k)$ from data, you can estimate $\sigma^2$ as well.
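A tiny numeric illustration of the identity, with all inputs being hypothetical stand-ins rather than estimates from this text:

```python
# Numeric illustration of sigma^2 = gamma(0) * (1 - sum_k phi_k * rho(k)).
# All values below are hypothetical stand-ins, not estimates from the text.
gamma0 = 2.0                    # assumed series variance gamma(0)
phi = [0.5, 0.25]               # assumed AR(2) coefficients
rho = [2.0 / 3.0, 7.0 / 12.0]   # matching theoretical rho(1), rho(2)

sigma2 = gamma0 * (1.0 - sum(p_k * r_k for p_k, r_k in zip(phi, rho)))
```

Note that $\sigma^2$ comes out smaller than $\gamma(0)$: the AR structure explains part of the series variance, and the identity quantifies the unexplained remainder.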
7) Turning the theory into a data-based estimator (Yule–Walker estimators)
In real data, you do not know $\gamma(k)$ or $\rho(k)$. You estimate them from the sample:
- sample autocovariance: $\hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-h} (x_{t+h} - \bar x)(x_t - \bar x)$
- sample autocorrelation: $\hat\rho(h) = \hat\gamma(h) / \hat\gamma(0)$
Then you mimic the theoretical equations:

$$\hat R_p \hat{\boldsymbol{\phi}} = \hat{\boldsymbol{\rho}}_p$$

Solving gives the Yule–Walker estimator:

$$\hat{\boldsymbol{\phi}} = \hat R_p^{-1} \hat{\boldsymbol{\rho}}_p$$

Then the noise variance estimate is:

$$\hat\sigma^2 = \hat\gamma(0)\left(1 - \hat\phi_1 \hat\rho(1) - \cdots - \hat\phi_p \hat\rho(p)\right)$$
These are “method-of-moments” estimators because they match a finite number of covariance/correlation moments.
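The whole data-based recipe fits in one short function. This is a minimal sketch (biased divide-by-$n$ autocovariances, plain linear solve), not a production routine:

```python
import numpy as np

def yule_walker(x, p):
    """Method-of-moments (Yule-Walker) AR(p) fit: a minimal sketch,
    not a production routine. Returns (phi_hat, sigma2_hat)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()                      # mean-adjust first
    # Biased sample autocovariances gamma_hat(h) for h = 0, ..., p.
    gamma = np.array([xc[h:] @ xc[:n - h] / n for h in range(p + 1)])
    rho = gamma / gamma[0]                 # sample autocorrelations
    # Solve R_p * phi = rho_p with the Toeplitz correlation matrix.
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, rho[1:])
    sigma2 = gamma[0] * (1.0 - phi @ rho[1:])
    return phi, sigma2
```

The divide-by-$n$ (rather than $n-h$) convention is deliberate: it guarantees the Toeplitz matrix is positive semi-definite, so the linear solve is well behaved.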
8) Example 1: Recovering coefficients from a simulated AR(2)
A simulated series is generated from an AR(2) model with known coefficients $\phi_1$ and $\phi_2$ and unit-variance white noise. A long length $n$ is used so that sample correlations are stable.
From the data, the sample autocorrelations $\hat\rho(1)$ and $\hat\rho(2)$ are computed. Then the AR(2) Yule–Walker system is built:

$$\hat\rho(1) = \hat\phi_1 + \hat\phi_2 \hat\rho(1), \qquad \hat\rho(2) = \hat\phi_1 \hat\rho(1) + \hat\phi_2$$

Solving yields estimates $\hat\phi_1$ and $\hat\phi_2$ close to the true values $\phi_1$ and $\phi_2$. The difference is normal: even with large $n$, estimates are not exact.
Then the variance of the series is estimated via $\hat\gamma(0)$, and the noise variance estimate is computed:

$$\hat\sigma^2 = \hat\gamma(0)\left(1 - \hat\phi_1 \hat\rho(1) - \hat\phi_2 \hat\rho(2)\right)$$

This is close to 1, which was the simulation default.
What this demonstrates:
- With enough data, the Yule–Walker method can recover AR coefficients reasonably well.
- The noise variance can also be estimated from the same ingredients.
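A self-contained simulation in the same spirit can be written as follows. The true values $\phi_1 = 0.5$, $\phi_2 = 0.25$ are assumed for this sketch; they are not the values used in the example above:

```python
import numpy as np

# Simulate a long AR(2) and recover its coefficients via Yule-Walker.
# The "true" values phi1 = 0.5, phi2 = 0.25 are assumed for this sketch.
rng = np.random.default_rng(42)
n = 100_000
phi1_true, phi2_true = 0.5, 0.25
z = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1_true * x[t - 1] + phi2_true * x[t - 2] + z[t]

# Sample autocovariances at lags 0..2 (mean-adjusted, divide-by-n version).
xc = x - x.mean()
gamma = np.array([xc[h:] @ xc[:n - h] / n for h in range(3)])
r1, r2 = gamma[1] / gamma[0], gamma[2] / gamma[0]

# Closed-form AR(2) Yule-Walker solution, then the noise-variance identity.
denom = 1.0 - r1 ** 2
phi1_hat = r1 * (1.0 - r2) / denom
phi2_hat = (r2 - r1 ** 2) / denom
sigma2_hat = gamma[0] * (1.0 - phi1_hat * r1 - phi2_hat * r2)
```

With $n = 100{,}000$ the estimates typically land within a few thousandths of the true values, illustrating both takeaways above at once.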
9) Example 2: Fitting an AR(2) model to a real dataset (LakeHuron)
A real dataset, the LakeHuron series of annual lake-level measurements, is treated as a candidate for an AR(2) approximation.
From the data, the sample autocorrelations $\hat\rho(1)$ and $\hat\rho(2)$ are computed.
Solving the same AR(2) Yule–Walker system gives estimates $\hat\phi_1$ and $\hat\phi_2$, and the estimated series variance $\hat\gamma(0)$.
Then the noise variance estimate follows from the same identity:

$$\hat\sigma^2 = \hat\gamma(0)\left(1 - \hat\phi_1 \hat\rho(1) - \hat\phi_2 \hat\rho(2)\right)$$
A critical practical correction: the mean is not zero
An AR(2) model written as:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$$

is naturally a mean-zero model in this form (if the noise has mean zero and the process is stationary). But the dataset consists of large positive numbers, with a large positive sample mean $\bar x$.
So it is not reasonable to model the raw series as mean-zero.
Why the coefficient estimation still works
Autocovariance and autocorrelation are computed after subtracting the mean (explicitly or implicitly). Adding a constant shifts the entire series but does not change how values co-move around the mean.
So the estimated $\hat\phi_1, \hat\phi_2$ remain valid for the mean-adjusted series:

$$Y_t = X_t - \mu$$

where the mean $\mu$ is estimated by the sample mean $\bar x$.
Meaning: the AR(2) structure is being fit to the fluctuations around the mean, not to the absolute level.
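This shift-invariance is easy to demonstrate: adding a constant to every observation leaves the sample ACF unchanged, because the mean is subtracted before any products are taken. The data and the shift value below are arbitrary:

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag), mean-adjusted."""
    s = np.asarray(series, dtype=float) - np.mean(series)
    n = len(s)
    g = np.array([s[h:] @ s[:n - h] / n for h in range(max_lag + 1)])
    return g / g[0]

# Shifting the whole series by a constant level (arbitrary value here)
# changes the sample mean but not how values co-move around it.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
shifted = x + 100.0
```

Calling `sample_acf(x, 5)` and `sample_acf(shifted, 5)` returns the same values up to floating-point rounding.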
Converting back to a model for the original series
Define a model for the original scale by:

$$X_t = \mu + Y_t$$

where $Y_t$ is the mean-zero AR(2) process fitted above. Then $E[X_t] = \mu$, as desired.
Algebraically, substituting $Y_t = X_t - \mu$ into the AR(2) equation leads to an AR(2) model with an intercept (constant term):

$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$$

where the intercept is computed by:

$$c = \mu\,(1 - \phi_1 - \phi_2)$$

Using the estimated coefficients and the sample mean $\bar x$ in place of $\mu$, the final fitted model is:

$$X_t = \hat c + \hat\phi_1 X_{t-1} + \hat\phi_2 X_{t-2} + Z_t$$

with noise variance $\hat\sigma^2$.
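The intercept conversion is a one-liner. The numbers below are hypothetical stand-ins for the fitted values (which are not reproduced here), chosen only to show the arithmetic:

```python
# Convert a mean-adjusted AR(2) fit to intercept form using
# c = mu * (1 - phi1 - phi2). All numbers are hypothetical stand-ins.
mu = 579.0                 # assumed sample mean of the series
phi1, phi2 = 1.0, -0.3     # assumed AR(2) coefficient estimates

c = mu * (1.0 - phi1 - phi2)

# Sanity check: the stationary mean implied by the intercept form,
# c / (1 - phi1 - phi2), recovers mu.
implied_mean = c / (1.0 - phi1 - phi2)
```

The sanity check makes the direction of the identity explicit: intercept form and mean-adjusted form describe the same process, just parameterized differently.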
Interpretation of this fitted model:
- The series has strong persistence because the coefficient $\hat\phi_1$ on $X_{t-1}$ is large.
- The negative coefficient $\hat\phi_2$ on $X_{t-2}$ introduces a corrective effect that can produce oscillation or damping behavior, depending on the combination of coefficients.
- The noise variance $\hat\sigma^2$ indicates how much unpredictable variation remains after accounting for the AR structure.
10) What “preliminary” means here
This approach is intentionally positioned as an initial, fast way to obtain parameters because:
- it relies only on estimated correlations,
- it is computationally simple,
- it often provides reasonable starting values,
but it is not always the best final estimator. More systematic methods can refine parameter estimates and compare competing models more rigorously.
11) Key takeaways
- Once you pick an AR order $p$, the early autocovariances/correlations determine the AR coefficients via a linear system.
- Replacing unknown true correlations with sample correlations produces the Yule–Walker estimators.
- You can estimate the noise variance from the same ingredients using a standard identity.
- If the observed series has a nonzero mean, you should include an intercept or model mean-adjusted data, because AR equations in their simplest form are centered around zero.
- These estimates are often used as a first approximation and as inputs to more advanced model-fitting procedures.
