Sample ACF and Sample PACF

1) What “correlation over time” means for a time series

When you observe a sequence over time—such as monthly sales, daily temperatures, or yearly lake levels—you often want to know whether today’s value is related to past values.

Two closely related tools help quantify this:

Autocorrelation (ACF): how strongly the series is correlated with itself after shifting by $h$ time steps (called lag $h$).
Partial autocorrelation (PACF): how strongly the series at time $t$ is related to the series at time $t-h$ after removing the influence of the intermediate lags $t-1, t-2, \dots, t-h+1$.

In practice, you do not know the true (population) ACF/PACF of the data-generating mechanism, so you compute sample versions from the finite dataset you have.

2) Sample ACF: what it is and how it is computed

Assume you observe data $x_1, x_2, \dots, x_n$.

Step A: compute the sample mean

$\bar{x}=\frac{1}{n}\sum_{t=1}^{n} x_t$

Step B: compute the sample autocovariance at lag $h$

For a lag $h$ (positive or negative, though we usually focus on $h\ge 0$): $\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|} (x_t-\bar{x})(x_{t+|h|}-\bar{x})$, so that $\hat{\gamma}(-h)=\hat{\gamma}(h)$.

This measures how values $h$ steps apart “move together,” centered around the mean.
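The sum ends at $n-|h|$ so that $t+|h|$ stays within the observed data range. The divisor is $n$, not the number of terms $n-|h|$; this standard convention guarantees that the sample autocovariance matrices used for the PACF are nonnegative definite.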
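A minimal pure-Python sketch of Steps A and B, assuming the data sit in an ordinary list. The function names are illustrative, not from any library. It uses the $1/n$ divisor and the $|h|$ indexing of the formula above, so $\hat{\gamma}(-h)$ and $\hat{\gamma}(h)$ come out identical.

```python
def sample_mean(x):
    """Step A: the ordinary sample mean."""
    return sum(x) / len(x)


def sample_autocovariance(x, h):
    """Step B: (1/n) * sum of (x_t - xbar) * (x_{t+|h|} - xbar) for t = 1..n-|h|."""
    n = len(x)
    h = abs(h)  # the formula depends only on |h|, so gamma_hat(-h) == gamma_hat(h)
    if h >= n:
        raise ValueError("lag must satisfy |h| < n")
    xbar = sample_mean(x)
    # Python lists are 0-based, so t runs over 0 .. n-h-1 here.
    total = sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(n - h))
    return total / n  # divide by n, not by the number of terms n - h


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy series that is easy to check by hand (mean 3)
print(sample_autocovariance(x, 0))   # 2.0
print(sample_autocovariance(x, 1))   # 0.8
print(sample_autocovariance(x, -1))  # 0.8, same as lag +1
```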

Step C: convert autocovariance to autocorrelation

$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad |h|<n$

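$\hat{\gamma}(0)=\frac{1}{n}\sum_{t=1}^{n}(x_t-\bar{x})^2$ is the sample variance (with divisor $n$ rather than $n-1$), so dividing by it produces a dimensionless correlation with $\hat{\rho}(0)=1$. Because of the $1/n$ divisor, $|\hat{\rho}(h)|\le 1$ holds exactly at every lag, not just approximately.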
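Step C in the same hypothetical pure-Python style: `sample_acf` (an illustrative name) returns $\hat{\rho}(0),\dots,\hat{\rho}(H)$ for a chosen maximum lag $H$. The five-point series $1,2,3,4,5$ is a made-up input whose values can be checked by hand ($\bar{x}=3$, $\hat{\gamma}(0)=2$, $\hat{\gamma}(1)=0.8$).

```python
def sample_acf(x, max_lag):
    """Return [rho_hat(0), ..., rho_hat(max_lag)] using the 1/n autocovariance."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("need 0 <= max_lag < n")
    xbar = sum(x) / n
    d = [v - xbar for v in x]  # centered data (Step A)
    # Step B for every lag at once: sum over the n - h overlapping pairs, divide by n.
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    # Step C: normalize by gamma_hat(0), the divisor-n sample variance.
    return [g / gamma[0] for g in gamma]


acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 4)
print([round(r, 3) for r in acf])  # [1.0, 0.4, -0.1, -0.4, -0.4]

# Every value stays inside [-1, 1] with the 1/n divisor.
print(all(abs(r) <= 1.0 for r in acf))  # True
```

Even a perfectly straight line gives only $\hat{\rho}(1)=0.4$ here. For $x_t=t$ the lag-1 value works out to exactly $1-3/n$, so it approaches 1 only as the series gets long: short series pull the sample ACF toward 0.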

Interpretation:

If $\hat{\rho}(h)$ is large and positive, values $h$ steps apart tend to be similar.
If it is large and negative, high values tend to be followed $h$ steps later by low values (and vice versa).
If it is near 0, the series shows little linear dependence at that lag.
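To see the three interpretation cases side by side, the sketch below computes $\hat{\rho}(1)$ for three synthetic series chosen purely for illustration: an alternating $+1,-1$ pattern, a slowly varying sine wave, and pseudo-random Gaussian noise from Python's `random` module (the seed is arbitrary).

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample ACF with the 1/n autocovariance (Steps A to C)."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


alternating = [1.0, -1.0] * 3  # a high value is always followed by a low one
smooth = [math.sin(2 * math.pi * t / 20) for t in range(200)]  # neighbors are similar
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # no built-in dependence

for name, series in [("alternating", alternating), ("smooth", smooth), ("noise", noise)]:
    print(name, round(sample_acf(series, 1)[1], 3))
```

The alternating series gives exactly $-5/6\approx-0.833$ (large and negative), the sine wave gives a value close to 1, and the noise gives a value near 0.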

3) Sample PACF: what it is conceptually

Autocorrelation at lag $h$ can be “indirect.”

Example: if $x_t$ depends strongly on $x_{t-1}$, and $x_{t-1}$ depends on $x_{t-2}$, then $x_t$ and $x_{t-2}$ may look correlated even if there is no direct relationship beyond the chain through lag 1.

Partial autocorrelation at lag $h$ aims to measure the direct relationship between $x_t$ and $x_{t-h}$ after accounting for lags 1 through $h-1$.
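The lag-2 chain in the example can be checked by simulation. This sketch assumes an AR(1) process $x_t=0.8\,x_{t-1}+\varepsilon_t$ with standard normal noise (the coefficient, length, seed, and burn-in are illustrative choices). Its lag-2 autocorrelation is $0.8^2=0.64$, yet there is no direct link from $x_{t-2}$ to $x_t$. For lag 2, solving the two-lag regression in terms of sample autocorrelations gives the closed form $\hat{\phi}_{2,2}=(\hat{\rho}(2)-\hat{\rho}(1)^2)/(1-\hat{\rho}(1)^2)$, which should land near 0.

```python
import random


def simulate_ar1(phi, n, seed, burn_in=500):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1); drop a burn-in so the start value is forgotten."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def sample_acf(x, max_lag):
    """Sample ACF with the 1/n autocovariance."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


x = simulate_ar1(0.8, 5000, seed=1)
rho = sample_acf(x, 2)
# Lag-2 partial autocorrelation: solution of the 2x2 system in terms of rho_hat(1), rho_hat(2).
pacf2 = (rho[2] - rho[1] ** 2) / (1 - rho[1] ** 2)
print(f"sample ACF at lag 2:  {rho[2]:.3f}  (true value 0.64)")
print(f"sample PACF at lag 2: {pacf2:.3f}  (true value 0)")
```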

4) Sample PACF: how it is computed

To compute the sample PACF at lag $h$, you fit a linear regression-style relationship:

Predict $x_{t}$ (after subtracting the mean) using the previous $h$ values: $x_t \approx \phi_{h,1}x_{t-1}+\phi_{h,2}x_{t-2}+\cdots+\phi_{h,h}x_{t-h}$
The sample partial autocorrelation at lag $h$ is $\hat{\phi}_{h,h}$, the estimated coefficient on the farthest lag $x_{t-h}$ when all intermediate lags are also included as predictors.
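One way to read this definition literally is to run the regression: build the lagged predictors, solve the least-squares normal equations, and keep the last coefficient. The sketch below is a hypothetical pure-Python version; centering the series and omitting an intercept are simplifications assumed here, and the small hand-written Gaussian-elimination solver stands in for a numerical library. In finite samples this regression estimate is close to, but not identical to, the autocovariance-based version described next.

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * m
    for i in reversed(range(m)):
        z[i] = (M[i][m] - sum(M[i][j] * z[j] for j in range(i + 1, m))) / M[i][i]
    return z


def pacf_by_regression(x, h):
    """Regress centered x_t on x_{t-1}, ..., x_{t-h} by least squares; return the coefficient on x_{t-h}."""
    xbar = sum(x) / len(x)
    d = [v - xbar for v in x]
    rows = [[d[t - k] for k in range(1, h + 1)] for t in range(h, len(d))]  # lagged predictors
    y = d[h:]  # matching responses
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(h)] for i in range(h)]
    Xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(h)]
    return solve(XtX, Xty)[-1]  # phi_hat_{h,h}


# Illustrative data: an AR(1) with phi = 0.6, so the PACF should be near 0.6 at lag 1 and near 0 beyond.
rng = random.Random(7)
x, series = 0.0, []
for t in range(3500):
    x = 0.6 * x + rng.gauss(0.0, 1.0)
    if t >= 500:  # discard a burn-in period
        series.append(x)

print([round(pacf_by_regression(series, h), 3) for h in (1, 2, 3)])
```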

The computation uses a system of equations built from sample autocovariances

The coefficients $\hat{\phi}_{h,1},\dots,\hat{\phi}_{h,h}$ solve a Toeplitz linear system: its coefficient matrix is symmetric, and each entry depends only on the lag difference $i-j$ (these are the sample Yule-Walker equations). In compact matrix form: $\hat{\phi}_h = \hat{\Gamma}_h^{-1}\hat{\gamma}_h = \hat{R}_h^{-1}\hat{\rho}_h$

Where:

$\hat{\phi}_h = [\hat{\phi}_{h,1},\dots,\hat{\phi}_{h,h}]^\top$
$\hat{\gamma}_h = [\hat{\gamma}(1),\dots,\hat{\gamma}(h)]^\top$
$\hat{\rho}_h = [\hat{\rho}(1),\dots,\hat{\rho}(h)]^\top$
$\hat{\Gamma}_h$ is the $h\times h$ matrix with entries $\hat{\gamma}(i-j)$
$\hat{R}_h$ is the corresponding $h\times h$ matrix with entries $\hat{\rho}(i-j)$
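The matrix form translates directly into code: for each $h$, fill $\hat{R}_h$ with $\hat{\rho}(|i-j|)$, solve $\hat{R}_h\hat{\phi}_h=\hat{\rho}_h$, and keep the last entry. This is a hypothetical pure-Python sketch with a naive solver; a real analysis would call a numerical library. On the hand-checkable series $1,2,3,4,5$ (where $\hat{\rho}(1)=0.4$ and $\hat{\rho}(2)=-0.1$) it gives $\hat{\phi}_{1,1}=0.4$ and $\hat{\phi}_{2,2}=(-0.1-0.4^2)/(1-0.4^2)\approx-0.3095$.

```python
def sample_acf(x, max_lag):
    """Sample ACF with the 1/n autocovariance."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * m
    for i in reversed(range(m)):
        z[i] = (M[i][m] - sum(M[i][j] * z[j] for j in range(i + 1, m))) / M[i][i]
    return z


def sample_pacf(x, max_lag):
    """phi_hat_{h,h} for h = 1..max_lag from R_hat_h phi_hat_h = rho_hat_h."""
    rho = sample_acf(x, max_lag)
    pacf = []
    for h in range(1, max_lag + 1):
        R = [[rho[abs(i - j)] for j in range(h)] for i in range(h)]  # Toeplitz: entries rho_hat(i - j)
        phi = solve(R, rho[1:h + 1])  # right-hand side rho_hat(1), ..., rho_hat(h)
        pacf.append(phi[-1])          # coefficient on the farthest lag
    return pacf


print([round(v, 4) for v in sample_pacf([1.0, 2.0, 3.0, 4.0, 5.0], 2)])  # [0.4, -0.3095]
```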

Practical meaning:
PACF is not computed by a simple single formula at each lag; it is derived from solving a linear algebra problem that “removes” the effects of intermediate lags.
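Re-solving a fresh $h\times h$ system for every lag is wasteful. A standard alternative, swapped in here rather than taken from the text, is the Durbin-Levinson recursion: it reuses the lag-$(h-1)$ coefficients to obtain the lag-$h$ ones and yields the same $\hat{\phi}_{h,h}$ values. The sketch takes the autocorrelations as input; `durbin_levinson` is an illustrative name.

```python
def durbin_levinson(rho, max_lag):
    """Return [phi_11, ..., phi_{max_lag,max_lag}] from rho = [rho(0), rho(1), ..., rho(max_lag)].

    Recursion (h >= 1, empty sums are 0):
      phi_hh    = (rho(h) - sum_j phi_{h-1,j} rho(h-j)) / (1 - sum_j phi_{h-1,j} rho(j))
      phi_{h,j} = phi_{h-1,j} - phi_hh * phi_{h-1,h-j},  j = 1..h-1
    """
    pacf, prev = [], []  # prev holds phi_{h-1,1}, ..., phi_{h-1,h-1}
    for h in range(1, max_lag + 1):
        num = rho[h] - sum(prev[j] * rho[h - 1 - j] for j in range(h - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(h - 1))
        phi_hh = num / den
        prev = [prev[j] - phi_hh * prev[h - 2 - j] for j in range(h - 1)] + [phi_hh]
        pacf.append(phi_hh)
    return pacf


# Theoretical ACF of an AR(1) with phi = 0.5 is rho(h) = 0.5**h, so the PACF is 0.5 and then zeros.
print(durbin_levinson([1.0, 0.5, 0.25, 0.125], 3))  # [0.5, 0.0, 0.0]
# Sample ACF of the toy series 1, 2, 3, 4, 5 (rho_hat(1) = 0.4, rho_hat(2) = -0.1).
print([round(v, 4) for v in durbin_levinson([1.0, 0.4, -0.1], 2)])  # [0.4, -0.3095]
```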

5) Why these sample quantities are treated as random

Even if the underlying process (the “true mechanism”) is fixed, your observed dataset is only one realization—one random outcome—from that mechanism.

So $\hat{\gamma}(h)$, $\hat{\rho}(h)$, and $\hat{\phi}_{h,h}$ vary from sample to sample. If you simulated the same process many times, you would get many slightly different sample ACF/PACF curves.
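A quick way to see this randomness is to simulate the same process several times and compute $\hat{\rho}(1)$ for each run. The sketch assumes an AR(1) with $\phi=0.7$ (so the true $\rho(1)=0.7$), $n=100$, and arbitrary seeds; only the spread matters, not the particular numbers.

```python
import random


def simulate_ar1(phi, n, seed, burn_in=500):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1), discarding a burn-in."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def lag1_acf(x):
    """rho_hat(1) = gamma_hat(1) / gamma_hat(0); the common 1/n factor cancels."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)


# Same process (AR(1), phi = 0.7), same length, five different random seeds.
estimates = [lag1_acf(simulate_ar1(0.7, 100, seed)) for seed in range(5)]
print([round(r, 3) for r in estimates])  # five different values scattered near 0.7
```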

6) Large-sample behavior: what happens when the dataset is long

Assume the data were generated by a standard time-series model such as a stationary ARMA(p, q) process (a broad family that includes AR and MA models).

If:

the length $n$ is very large, and
the lag $h$ you care about is much smaller than $n$ (written $h \ll n$ ),

then:

Sample autocovariance and autocorrelation stabilize

$\hat{\gamma}(h)$ tends to be close to the true autocovariance $\gamma(h)$
$\hat{\rho}(h)$ tends to be close to the true autocorrelation $\rho(h)$

Sample PACF stabilizes for pure AR models

If the true process is a causal AR(p) process, then $\hat{\phi}_{h,h}$ tends to be close to the true PACF value $\phi_{h,h}$. In particular, $\phi_{h,h}=0$ for $h>p$, and at those lags $\hat{\phi}_{h,h}$ fluctuates around 0 with variance roughly $1/n$.

Approximate normality and shrinking variability

A more refined statement is that the vector $(\hat{\rho}(1),\dots,\hat{\rho}(h))$ is approximately multivariate normal around $(\rho(1),\dots,\rho(h))$, with a covariance matrix that shrinks roughly like $1/n$.
So as $n$ grows, the sample ACF/PACF plots become more stable and less noisy.

Intuition: longer datasets give more repeated evidence of how the process behaves, so the estimated correlations become more reliable.
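The $1/n$ rate can be checked empirically. The sketch below assumes Gaussian white noise (true $\rho(1)=0$), draws 300 independent series at each of two lengths, and reports the spread of $\hat{\rho}(1)$ across replications. If the variance scales like $1/n$, the standard deviation scales like $1/\sqrt{n}$, so $\sqrt{n}$ times it should stay roughly constant (close to 1 for white noise). Replication counts and seeds are arbitrary.

```python
import math
import random
import statistics


def lag1_acf(x):
    """rho_hat(1) = gamma_hat(1) / gamma_hat(0); the common 1/n factor cancels."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)


def spread_of_lag1(n, reps=300, seed=0):
    """Standard deviation of rho_hat(1) over `reps` independent white-noise series of length n."""
    rng = random.Random(seed)
    estimates = [lag1_acf([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(estimates)


for n in (50, 400):
    sd = spread_of_lag1(n)
    print(f"n={n:4d}  sd={sd:.3f}  sqrt(n)*sd={math.sqrt(n) * sd:.2f}")
```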

7) How ACF and PACF help choose a model form (AR vs MA vs ARMA)

A common practical use of ACF and PACF plots is preliminary order selection—forming an initial guess for what kind of ARMA-type model might be reasonable.

The classic heuristics are:

A) MA(q) signature (moving average)

ACF cuts off after lag $q$ (drops to near 0 beyond $q$)
PACF tails off gradually

So if the sample ACF becomes negligible after lag 2, an MA(2) is a plausible candidate.
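A simulated MA(2) shows the cutoff. The sketch assumes $x_t=\varepsilon_t+0.6\,\varepsilon_{t-1}+0.5\,\varepsilon_{t-2}$ with standard normal noise, whose true ACF is $\rho(1)=0.9/1.61\approx0.56$, $\rho(2)=0.5/1.61\approx0.31$, and exactly 0 beyond lag 2. It flags lags outside $\pm1.96/\sqrt{n}$, the white-noise band commonly drawn on ACF plots. That band is an added convention, not something stated above; under an MA(q) model the appropriate band beyond lag $q$ is somewhat wider (Bartlett's formula), so an occasional flag past lag 2 is not surprising.

```python
import math
import random


def simulate_ma2(theta1, theta2, n, seed):
    """x_t = e_t + theta1 * e_{t-1} + theta2 * e_{t-2}, with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
    return [e[t] + theta1 * e[t - 1] + theta2 * e[t - 2] for t in range(2, n + 2)]


def sample_acf(x, max_lag):
    """Sample ACF with the 1/n autocovariance."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


n = 2000
acf = sample_acf(simulate_ma2(0.6, 0.5, n, seed=3), 6)
band = 1.96 / math.sqrt(n)  # approximate 95% white-noise band (assumed convention)
for h in range(1, 7):
    flag = "outside band" if abs(acf[h]) > band else ""
    print(f"lag {h}: {acf[h]:+.3f} {flag}")
```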

B) AR(p) signature (autoregressive)

PACF cuts off after lag $p$
ACF tails off gradually

So if the sample PACF becomes negligible after lag 3 while the ACF decays slowly, an AR(3) is a plausible candidate.
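The mirror-image check for an AR model: simulate an AR(3), then compute the sample PACF from the sample ACF. The sketch assumes $x_t=0.7\,x_{t-1}+0.38\,x_{t-2}-0.24\,x_{t-3}+\varepsilon_t$, whose autoregressive polynomial factors as $(1-0.8B)(1-0.5B)(1+0.6B)$, so the process is causal; its true PACF is $-0.24$ at lag 3 and 0 beyond, while its ACF decays slowly. The PACF step uses the Durbin-Levinson recursion, an implementation choice for solving the Toeplitz system rather than something the text prescribes.

```python
import random


def simulate_ar3(phi, n, seed, burn_in=500):
    """x_t = phi[0] x_{t-1} + phi[1] x_{t-2} + phi[2] x_{t-3} + e_t, e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    hist = [0.0, 0.0, 0.0]  # x_{t-1}, x_{t-2}, x_{t-3}
    out = []
    for t in range(n + burn_in):
        x = phi[0] * hist[0] + phi[1] * hist[1] + phi[2] * hist[2] + rng.gauss(0.0, 1.0)
        hist = [x, hist[0], hist[1]]
        if t >= burn_in:
            out.append(x)
    return out


def sample_acf(x, max_lag):
    """Sample ACF with the 1/n autocovariance."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]


def durbin_levinson(rho, max_lag):
    """PACF values phi_11, ..., phi_{max_lag,max_lag} from autocorrelations rho[0..max_lag]."""
    pacf, prev = [], []
    for h in range(1, max_lag + 1):
        num = rho[h] - sum(prev[j] * rho[h - 1 - j] for j in range(h - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(h - 1))
        phi_hh = num / den
        prev = [prev[j] - phi_hh * prev[h - 2 - j] for j in range(h - 1)] + [phi_hh]
        pacf.append(phi_hh)
    return pacf


n = 3000
x = simulate_ar3([0.7, 0.38, -0.24], n, seed=11)
acf = sample_acf(x, 6)
pacf = durbin_levinson(acf, 6)
for h in range(1, 7):
    print(f"lag {h}: ACF {acf[h]:+.3f}   PACF {pacf[h - 1]:+.3f}")
```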

C) Mixed ARMA signature

Both ACF and PACF tail off
That suggests a mixed ARMA(p, q), but choosing $p$ and $q$ from plots alone is harder and typically requires more systematic selection methods later.

Important caution: These are heuristics, not guarantees—real data is messy, and finite samples can make “cutoffs” look imperfect.
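The three rules can be encoded as a crude first pass, which also makes their limits visible. The sketch is hypothetical throughout: `crude_signature`, the $\pm1.96/\sqrt{n}$ band, and the rule that a "cutoff" means nothing significant beyond `max_order` are assumptions introduced here, and the input lists are made-up numbers rather than results from real data. A single spurious spike beyond `max_order` flips the verdict, which is exactly the fragility the caution describes.

```python
import math


def last_significant_lag(values, n, z=1.96):
    """values[k] is the estimate at lag k + 1; return the largest lag with |value| > z / sqrt(n), or 0."""
    bound = z / math.sqrt(n)
    return max((lag for lag, v in enumerate(values, start=1) if abs(v) > bound), default=0)


def crude_signature(acf_vals, pacf_vals, n, max_order=5):
    """Apply the MA / AR / mixed heuristics; acf_vals and pacf_vals start at lag 1."""
    q = last_significant_lag(acf_vals, n)
    p = last_significant_lag(pacf_vals, n)
    acf_cuts, pacf_cuts = q <= max_order, p <= max_order
    if p == 0 and q == 0:
        return "no clear dependence (white-noise-like)"
    if acf_cuts and not pacf_cuts:
        return f"MA({q}) candidate"
    if pacf_cuts and not acf_cuts:
        return f"AR({p}) candidate"
    if not acf_cuts and not pacf_cuts:
        return "mixed ARMA candidate; choose p and q by other methods"
    return "ambiguous: both appear to cut off"


# Made-up lag 1..10 values for a series of length n = 400 (band is about +/-0.098).
acf_vals = [0.55, 0.30, 0.04, -0.03, 0.02, 0.01, -0.05, 0.03, 0.02, 0.00]
pacf_vals = [0.55, -0.21, 0.15, -0.12, 0.11, -0.10, 0.12, -0.04, 0.03, -0.02]
print(crude_signature(acf_vals, pacf_vals, 400))  # MA(2) candidate
print(crude_signature(pacf_vals, acf_vals, 400))  # AR(2) candidate
```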

8) Why real data is different from simulated data

Simulated AR/MA/ARMA examples are “clean” because they match the theoretical assumptions exactly.

Real datasets:

may include trends, seasonality, structural breaks, outliers, and changing variance,
and are rarely generated by a perfect ARMA model.

Still, ARMA models are often useful approximations within a practical tolerance. The guiding mindset is: a model can be imperfect and still be valuable for understanding and forecasting.

9) Concrete example idea: LakeHuron

A real dataset (annual lake levels) is used to illustrate that:

the ACF/PACF may suggest something like an AR(2),
but other nearby choices (like AR(1)) might be competitive and simpler.

This highlights a common modeling tradeoff:

fit versus complexity
A simpler model may be preferred if it performs nearly as well.

10) Practical takeaway

Sample ACF summarizes “overall” correlation patterns across lags.
Sample PACF isolates “direct” lag relationships after accounting for intermediates.
With a sufficiently long dataset, these estimates become more stable and closer to the underlying truth (when an ARMA-style approximation is reasonable).
The shapes of ACF/PACF plots provide a strong initial signal for whether an AR, MA, or mixed ARMA model is a reasonable starting point, and for guessing the order in the pure AR or pure MA cases.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.
