1) What “correlation over time” means for a time series

When you observe a sequence over time—such as monthly sales, daily temperatures, or yearly lake levels—you often want to know whether today’s value is related to past values.

Two closely related tools help quantify this:

  • Autocorrelation (ACF): how strongly the series is correlated with itself after shifting by $h$ time steps (called lag $h$).
  • Partial autocorrelation (PACF): how strongly the series at time $t$ is related to the series at time $t-h$ after removing the influence of the intermediate time points $t-1, t-2, \dots, t-h+1$.

In practice, you do not know the true (population) ACF/PACF of the data-generating mechanism, so you compute sample versions from the finite dataset you have.


2) Sample ACF: what it is and how it is computed

Assume you observe data $x_1, x_2, \dots, x_n$.

Step A: compute the sample mean

$$\bar{x}=\frac{1}{n}\sum_{t=1}^{n} x_t$$

Step B: compute the sample autocovariance at lag $h$

For a lag $h$ (positive or negative, but usually we focus on $h\ge 0$):

$$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|} (x_t-\bar{x})(x_{t+|h|}-\bar{x})$$

  • This measures how values $|h|$ steps apart “move together,” centered around the mean.
  • The sum ends at $n-|h|$ so that $t+|h|$ stays within the observed data range. Using $|h|$ in the index also gives $\hat{\gamma}(-h)=\hat{\gamma}(h)$.

Step C: convert autocovariance to autocorrelation

$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad |h|<n$$

  • $\hat{\gamma}(0)$ is the sample variance (computed with divisor $n$ rather than $n-1$), so dividing by it produces a dimensionless correlation. With the $1/n$ divisor used in Step B, $\hat{\rho}(h)$ always lies between $-1$ and $1$.
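Steps A to C can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `sample_acf` and the toy `series` are hypothetical, and only non-negative lags are computed.

```python
def sample_acf(x, max_lag):
    """Return [rho_hat(0), ..., rho_hat(max_lag)] using the 1/n autocovariance."""
    n = len(x)
    mean = sum(x) / n                    # Step A: sample mean
    d = [v - mean for v in x]            # centered values

    def gamma_hat(h):                    # Step B: sample autocovariance, h >= 0
        return sum(d[t] * d[t + h] for t in range(n - h)) / n

    g0 = gamma_hat(0)
    return [gamma_hat(h) / g0 for h in range(max_lag + 1)]   # Step C


# Hypothetical toy series, chosen only for illustration.
series = [2.0, 4.0, 6.0, 8.0, 10.0, 8.0, 6.0, 4.0, 2.0, 4.0]
acf = sample_acf(series, 3)
print([round(r, 3) for r in acf])
```

The first entry is always 1, because lag 0 compares the series with itself.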

Interpretation:

  • If $\hat{\rho}(h)$ is large and positive, values $h$ steps apart tend to be similar.
  • If it is large and negative, high values tend to be followed $h$ steps later by low values (and vice versa).
  • If it is near 0, the series shows little linear dependence at that lag.

3) Sample PACF: what it is conceptually

Autocorrelation at lag $h$ can be “indirect.”

Example: If $x_t$ depends strongly on $x_{t-1}$, and $x_{t-1}$ depends on $x_{t-2}$, then $x_t$ and $x_{t-2}$ may look correlated even if there is no direct relationship beyond the chain through lag 1.

Partial autocorrelation at lag $h$ aims to measure the direct relationship between $x_t$ and $x_{t-h}$ after accounting for lags 1 through $h-1$.


4) Sample PACF: how it is computed

To compute the sample PACF at lag $h$, you fit a linear regression-style relationship:

  • Predict $x_t$ using the previous $h$ values: $x_t \approx \phi_{h,1}x_{t-1}+\phi_{h,2}x_{t-2}+\cdots+\phi_{h,h}x_{t-h}$
  • The sample partial autocorrelation at lag $h$ is $\hat{\phi}_{h,h}$, meaning the coefficient on the farthest lag $x_{t-h}$ when you allow all intermediate lags into the predictor set.

The computation uses a system of equations built from sample autocovariances

The coefficients $\hat{\phi}_{h,1},\dots,\hat{\phi}_{h,h}$ solve a linear system whose matrix is symmetric and Toeplitz (its entries depend only on lag differences). In compact matrix form:

$$\hat{\phi}_h = \hat{\Gamma}_h^{-1}\hat{\gamma}_h = \hat{R}_h^{-1}\hat{\rho}_h$$

Where:

  • $\hat{\phi}_h = [\hat{\phi}_{h,1},\dots,\hat{\phi}_{h,h}]^\top$
  • $\hat{\gamma}_h = [\hat{\gamma}(1),\dots,\hat{\gamma}(h)]^\top$
  • $\hat{\rho}_h = [\hat{\rho}(1),\dots,\hat{\rho}(h)]^\top$
  • $\hat{\Gamma}_h$ is the $h\times h$ matrix with entries $\hat{\gamma}(i-j)$
  • $\hat{R}_h$ is the corresponding $h\times h$ matrix with entries $\hat{\rho}(i-j)$
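
As a worked illustration (added here as a sketch, using only the definitions above), take $h=2$. The system $\hat{R}_2\hat{\phi}_2=\hat{\rho}_2$ can be solved by hand:

$$
\begin{pmatrix} 1 & \hat{\rho}(1) \\ \hat{\rho}(1) & 1 \end{pmatrix}
\begin{pmatrix} \hat{\phi}_{2,1} \\ \hat{\phi}_{2,2} \end{pmatrix}
=
\begin{pmatrix} \hat{\rho}(1) \\ \hat{\rho}(2) \end{pmatrix}
\quad\Longrightarrow\quad
\hat{\phi}_{2,2} = \frac{\hat{\rho}(2)-\hat{\rho}(1)^2}{1-\hat{\rho}(1)^2}
$$

If $\hat{\rho}(2)$ equals $\hat{\rho}(1)^2$, which is the pattern produced by a pure lag-1 chain, the numerator vanishes and the lag-2 partial autocorrelation is zero, even though $\hat{\rho}(2)$ itself is not.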

Practical meaning:
PACF is not computed by a simple single formula at each lag; it is derived from solving a linear algebra problem that “removes” the effects of intermediate lags.
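
The linear-algebra view can be sketched as follows, assuming plain Python with no external libraries. For each lag $h$ the code builds $\hat{R}_h$, solves $\hat{R}_h\hat{\phi}_h=\hat{\rho}_h$ by ordinary Gaussian elimination, and keeps the last coefficient. The names `sample_acf`, `solve`, and `sample_pacf` and the toy `series` are hypothetical; statistical libraries typically use faster recursions that give the same result.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag) with the 1/n divisor."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    g = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [gh / g[0] for gh in g]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]     # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):                         # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x


def sample_pacf(x, max_lag):
    """Return [phi_hat(1,1), ..., phi_hat(max_lag,max_lag)]."""
    rho = sample_acf(x, max_lag)
    out = []
    for h in range(1, max_lag + 1):
        # h x h matrix with entries rho_hat(|i - j|)
        R = [[rho[abs(i - j)] for j in range(h)] for i in range(h)]
        phi = solve(R, rho[1:h + 1])
        out.append(phi[-1])        # coefficient on the farthest lag
    return out


# Hypothetical toy series, chosen only for illustration.
series = [2.0, 4.0, 6.0, 8.0, 10.0, 8.0, 6.0, 4.0, 2.0, 4.0]
pacf = sample_pacf(series, 3)
print([round(p, 3) for p in pacf])
```

At lag 1 there are no intermediate lags to remove, so the first PACF value coincides with $\hat{\rho}(1)$.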


5) Why these sample quantities are treated as random

Even if the underlying process (the “true mechanism”) is fixed, your observed dataset is only one realization (one random outcome) of that mechanism.

So $\hat{\gamma}(h)$, $\hat{\rho}(h)$, and $\hat{\phi}_{h,h}$ vary from sample to sample. If you simulated the same process many times, you would get many slightly different sample ACF/PACF curves.


6) Large-sample behavior: what happens when the dataset is long

Assume the data were generated by a standard time-series model such as an ARMA(p, q) process (a broad family that includes AR and MA models).

If:

  • the length $n$ is very large, and
  • the lag $h$ you care about is much smaller than $n$ (written $h \ll n$),

then:

  1. Sample autocovariance and autocorrelation stabilize
    • $\hat{\gamma}(h)$ tends to be close to the true autocovariance $\gamma(h)$
    • $\hat{\rho}(h)$ tends to be close to the true autocorrelation $\rho(h)$
  2. Sample PACF stabilizes for pure AR models
    If the true process is a causal AR(p) process, then $\hat{\phi}_{h,h}$ tends to be close to the true PACF value $\phi_{h,h}$.
  3. Approximate normality and shrinking variability
    A more refined statement is that the vector $(\hat{\rho}(1),\dots,\hat{\rho}(h))$ is approximately multivariate normal around $(\rho(1),\dots,\rho(h))$, with variances and covariances that shrink roughly like $1/n$.
    So as $n$ grows, the sample ACF/PACF plots become more stable and less noisy.

Intuition: longer datasets give more repeated evidence of how the process behaves, so the estimated correlations become more reliable.
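
A small simulation can make the sample-to-sample variability and its shrinkage visible. The sketch below is only an illustration under stated assumptions: it simulates an AR(1) process with Gaussian noise and a hypothetical coefficient `phi = 0.5`, repeats this 200 times at two series lengths, and compares the spread of $\hat{\rho}(1)$. All function names, the seed, and the burn-in length are choices made here. For AR(1) the true lag-1 autocorrelation equals `phi`.

```python
import random
import statistics


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 with the 1/n divisor."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    g0 = sum(v * v for v in d) / n
    g1 = sum(d[t] * d[t + 1] for t in range(n - 1)) / n
    return g1 / g0


def simulate_ar1(n, phi, rng, burn_in=100):
    """Simulate x_t = phi * x_{t-1} + e_t, discarding an initial burn-in."""
    x, prev = [], 0.0
    for t in range(n + burn_in):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            x.append(prev)
    return x


rng = random.Random(0)        # fixed seed so the run is reproducible
phi, reps = 0.5, 200
center, spread = {}, {}
for n in (100, 1600):
    estimates = [lag1_autocorr(simulate_ar1(n, phi, rng)) for _ in range(reps)]
    center[n] = statistics.mean(estimates)
    spread[n] = statistics.stdev(estimates)
    print(n, round(center[n], 3), round(spread[n], 3))
```

Because the variance scales roughly like $1/n$, making the series 16 times longer should cut the standard deviation of the estimates by a factor of about 4.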


7) How ACF and PACF help choose a model form (AR vs MA vs ARMA)

A common practical use of ACF and PACF plots is preliminary order selection—forming an initial guess for what kind of ARMA-type model might be reasonable.

The classic heuristics are:

A) MA(q) signature (moving average)

  • ACF cuts off after lag $q$ (drops to near 0 beyond $q$)
  • PACF tails off gradually

So if the sample ACF becomes negligible after lag 2, an MA(2) is a plausible candidate.

B) AR(p) signature (autoregressive)

  • PACF cuts off after lag $p$
  • ACF tails off gradually

So if the sample PACF becomes negligible after lag 3 while the ACF decays slowly, an AR(3) is a plausible candidate.

C) Mixed ARMA signature

  • Both ACF and PACF tail off
  • That suggests a mixed ARMA(p, q), but choosing $p$ and $q$ from plots alone is harder and typically needs more systematic selection methods later.

Important caution: These are heuristics, not guarantees—real data is messy, and finite samples can make “cutoffs” look imperfect.
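
The AR and MA signatures can be checked on simulated data. The sketch below is an illustration under assumptions made here, not a procedure from the text: it simulates an AR(1) with coefficient 0.6 and an MA(1) with coefficient 0.8 (both hypothetical values), computes the sample ACF, and obtains the PACF with the Durbin-Levinson recursion, which solves the same Toeplitz system one lag at a time. The band `1.96 / sqrt(n)` is a common rule of thumb for judging whether a value is "near 0" and is also an assumption here.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag) with the 1/n divisor."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    g = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]
    return [gh / g[0] for gh in g]


def sample_pacf(rho, max_lag):
    """PACF at lags 1..max_lag via the Durbin-Levinson recursion (rho[0] == 1)."""
    pacf, prev = [], []                  # prev holds phi_{h-1,1}, ..., phi_{h-1,h-1}
    for h in range(1, max_lag + 1):
        num = rho[h] - sum(prev[k] * rho[h - 1 - k] for k in range(h - 1))
        den = 1.0 - sum(prev[k] * rho[k + 1] for k in range(h - 1))
        phi_hh = num / den
        prev = [prev[k] - phi_hh * prev[h - 2 - k] for k in range(h - 1)] + [phi_hh]
        pacf.append(phi_hh)
    return pacf


def simulate_ar1(n, phi, rng, burn_in=100):
    """Simulate x_t = phi * x_{t-1} + e_t, discarding an initial burn-in."""
    x, prev = [], 0.0
    for t in range(n + burn_in):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            x.append(prev)
    return x


def simulate_ma1(n, theta, rng):
    """Simulate x_t = e_t + theta * e_{t-1}."""
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]


rng = random.Random(1)                   # fixed seed so the run is reproducible
n, max_lag = 5000, 6
band = 1.96 / math.sqrt(n)               # rule-of-thumb "near zero" threshold

results = {}
for name, x in (("AR(1)", simulate_ar1(n, 0.6, rng)),
                ("MA(1)", simulate_ma1(n, 0.8, rng))):
    acf = sample_acf(x, max_lag)
    pacf = sample_pacf(acf, max_lag)
    results[name] = (acf, pacf)
    print(name, "band:", round(band, 3))
    print("  ACF  lags 1..6:", [round(r, 2) for r in acf[1:]])
    print("  PACF lags 1..6:", [round(p, 2) for p in pacf])
```

In a run like this, the AR(1) series should show a PACF that is large at lag 1 and close to zero afterwards while its ACF decays gradually, and the MA(1) series should show the reverse pattern.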


8) Why real data is different from simulated data

Simulated AR/MA/ARMA examples are “clean” because they match the theoretical assumptions exactly.

Real datasets:

  • may include trends, seasonality, structural breaks, outliers, and changing variance,
  • and are rarely generated by a perfect ARMA model.

Still, ARMA models are often useful approximations within a practical tolerance. The guiding mindset is: a model can be imperfect and still be valuable for understanding and forecasting.


9) Concrete example idea: LakeHuron

A real dataset (annual lake levels) is used to illustrate that:

  • the ACF/PACF may suggest something like an AR(2),
  • but other nearby choices (like AR(1)) might be competitive and simpler.

This highlights a common modeling tradeoff:

  • fit versus complexity
    A simpler model may be preferred if it performs nearly as well.

10) Practical takeaway

  • Sample ACF summarizes “overall” correlation patterns across lags.
  • Sample PACF isolates “direct” lag relationships after accounting for intermediates.
  • With a sufficiently long dataset, these estimates become more stable and closer to the underlying truth (when an ARMA-style approximation is reasonable).
  • The shapes of ACF/PACF plots provide a useful initial signal for whether an AR, MA, or mixed ARMA model is a reasonable starting point, and for guessing the order in the pure AR or pure MA cases.