Example: birthdays and birthdates

1. Data and goal

The dataset contains daily counts of births in the United States from 1969 to 1988. For each calendar day in this 20-year period, there is the total number of births.

The goal is to decompose the observed time series into interpretable components, such as:

slow long-term trends over decades,
shorter-term non-periodic variation,
weekly (day-of-week) patterns,
yearly seasonal patterns,
special-day effects (e.g., Halloween, Christmas, Valentine’s Day),
unstructured noise.

Gaussian processes are used as flexible building blocks to represent these components at different time scales.

2. First additive Gaussian process model

Let $t$ index days, starting from $t = 1$ on 1 January 1969. Let $y_t$ be the normalized number of births on day $t$ (mean 0, standard deviation 1).

The model decomposes $y_t$ as: $y_t = f_1(t) + f_2(t) + f_3(t) + f_4(t) + f_5(t) + \varepsilon_t,$

where each $f_k$ targets variation at a specific time scale or structure, and $\varepsilon_t$ is residual noise.
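As a minimal sketch of this setup, the day index and the normalization could be written as follows. The `births` array is a synthetic placeholder (the real counts are not reproduced here), and all variable names are illustrative assumptions.

```python
import numpy as np
from datetime import date

# Day index: t = 1 on 1 January 1969, one entry per calendar day through 31 December 1988.
start, end = date(1969, 1, 1), date(1988, 12, 31)
n = (end - start).days + 1
t = np.arange(1, n + 1)

# Synthetic placeholder for the daily birth counts (NOT the real data).
rng = np.random.default_rng(0)
births = 9000 + 500 * rng.standard_normal(n)

# Normalize to mean 0 and standard deviation 1.
y = (births - births.mean()) / births.std()

print(n, abs(y.mean()) < 1e-9, abs(y.std() - 1.0) < 1e-9)
```

Counting the days this way also gives the series length used later for the GP computations.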

2.1 Long-term trend: $f_1(t)$

Purpose: capture slow, multi-year trends in birth counts.
Model: $f_1(t) \sim \text{GP}\big(0,\ k_1\big), \quad k_1(t, t') = \sigma_1^2 \exp\left(-\frac{|t - t'|^2}{2 l_1^2}\right).$
Interpretation:
- $\sigma_1^2$: variance (amplitude) of long-term fluctuations.
- $l_1$: length-scale controlling how quickly the long-term trend can change over time (large $l_1$ = very smooth).

2.2 Faster non-periodic variation: $f_2(t)$

Purpose: capture medium-scale, non-periodic fluctuations, faster than the long-term trend but still smooth.
Model: $f_2(t) \sim \text{GP}\big(0,\ k_2\big), \quad k_2(t, t') = \sigma_2^2 \exp\left(-\frac{|t - t'|^2}{2 l_2^2}\right).$
Interpretation:
- Same kernel form as $f_1$, but with different amplitude $\sigma_2$ and length-scale $l_2$, allowing a shorter time scale of variation.
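The squared exponential kernel shared by $f_1$ and $f_2$ can be sketched in a few lines of NumPy. The function name `se_kernel` and the two length-scale values are assumptions chosen only to show how the length-scale controls smoothness; they are not the fitted values.

```python
import numpy as np

def se_kernel(t1, t2, sigma, length):
    """Squared exponential covariance between two 1-D arrays of day indices."""
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * length**2))

t = np.arange(1, 8)  # seven consecutive days
K_slow = se_kernel(t, t, sigma=1.0, length=1000.0)  # hypothetical long length-scale
K_fast = se_kernel(t, t, sigma=1.0, length=2.0)     # hypothetical short length-scale

# Covariance between day 1 and day 7 under each length-scale.
print(K_slow[0, -1], K_fast[0, -1])
```

With the long length-scale, days a week apart remain almost perfectly correlated, while the short length-scale lets the function change substantially within a few days.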

2.3 Weekly pattern that can change over time: $f_3(t)$

Purpose: represent day-of-week effects (Monday vs Tuesday vs weekend) that can slowly evolve over the years.
Model: $f_3(t) \sim \text{GP}\big(0,\ k_3\big),$ with kernel $k_3(t, t') = \sigma_3^2 \exp\left( - \frac{2 \sin^2\big(\pi (t - t')/7\big)}{l_{3,1}^2} \right) \exp\left( - \frac{|t - t'|^2}{2 l_{3,2}^2} \right).$
Interpretation:
- The periodic term with period 7 days captures weekly patterns.
- The squared exponential term in time allows the weekly pattern to change slowly over years (for example, day-of-week scheduling practices changing over the decades).
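A small sketch of this quasi-periodic kernel, under assumed hyperparameter values, shows how the two factors interact. The name `quasi_periodic_kernel` and the numbers passed to it are hypothetical.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, sigma, l_per, l_se, period=7.0):
    """Periodic kernel times squared exponential kernel, following the form of k_3."""
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    periodic = np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / l_per**2)
    envelope = np.exp(-d**2 / (2.0 * l_se**2))
    return sigma**2 * periodic * envelope

t = np.arange(1, 3651)  # roughly ten years of days
# Covariance of day 1 with every other day, for illustrative hyperparameters.
K_row = quasi_periodic_kernel(t[:1], t, sigma=1.0, l_per=1.0, l_se=1000.0)[0]

print(K_row[7])     # same weekday, one week later
print(K_row[3])     # a different weekday
print(K_row[3640])  # same weekday, about ten years later
```

Days that share a weekday stay highly correlated over short spans, but the squared exponential envelope damps that correlation across years, which is what lets the weekly pattern drift.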

2.4 Yearly seasonal pattern: $f_4(t)$

Purpose: model within-year seasonal structure (e.g., fewer births around certain times of year, more in others).
To align with the calendar, define $s(t) = t \bmod 365.25,$ representing day-of-year on a continuous scale.
Model: $f_4(t) \sim \text{GP}\big(0,\ k_4\big),$ with kernel $k_4(t, t') = \sigma_4^2 \exp\left( - \frac{2 \sin^2\big(\pi (s - s')/365.25\big)}{l_{4,1}^2} \right) \exp\left( - \frac{|s - s'|^2}{2 l_{4,2}^2} \right),$ where $s = s(t)$ and $s' = s(t')$.
Interpretation:
- The periodic term with period 365.25 days captures repeating annual patterns.
- The additional squared exponential term allows slow evolution of the seasonal pattern over years.
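The day-of-year index and the yearly kernel can be sketched as below. This follows the formula exactly as written in this section, with both factors depending on $s$; the function names and hyperparameter values are assumptions for illustration.

```python
import numpy as np

def day_of_year(t, year_length=365.25):
    """Continuous day-of-year index s(t) = t mod 365.25."""
    return np.mod(np.asarray(t, dtype=float), year_length)

def yearly_kernel(t1, t2, sigma, l_per, l_se, year_length=365.25):
    """k_4 as written above: both factors use the day-of-year index s."""
    s1, s2 = day_of_year(t1, year_length), day_of_year(t2, year_length)
    d = s1[:, None] - s2[None, :]
    periodic = np.exp(-2.0 * np.sin(np.pi * d / year_length) ** 2 / l_per**2)
    envelope = np.exp(-d**2 / (2.0 * l_se**2))
    return sigma**2 * periodic * envelope

# Days 1 and 1462 are exactly four years (1461 days) apart, so they share the same s.
t = np.array([1, 183, 1462])
K = yearly_kernel(t, t, sigma=1.0, l_per=1.0, l_se=100.0)  # hypothetical values
print(day_of_year(t))
print(K)
```

Days with the same day-of-year index get the maximal covariance, whereas days half a year apart are only weakly related.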

2.5 Special-day effects and weekend interactions: $f_5(t)$

Purpose: capture specific effects on certain holidays and special dates, possibly different on weekdays vs weekends.
Define:
- $I_{\text{special day}}(t)$: a row vector of 13 indicator variables, one for each chosen special day or range of days:
  - New Year’s Day, Valentine’s Day, Leap Day, April 1st, Independence Day, Halloween, Christmas, the days between Christmas and New Year’s, etc.
- $I_{\text{weekend}}(t)$: indicator equal to 1 if day $t$ is Saturday or Sunday, and 0 otherwise.
Model: $f_5(t) = I_{\text{special day}}(t)\, \beta_a + I_{\text{weekend}}(t)\, I_{\text{special day}}(t)\, \beta_b,$ where
- $\beta_a$ is a length-13 vector for special-day effects on weekdays,
- $\beta_b$ is a length-13 vector for additional shift when the same special days fall on weekends.
Interpretation:
- Allows, for example, “Halloween on a weekday” vs “Halloween on a Saturday” to have different impacts.
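The indicator construction can be sketched with the standard library and NumPy. Only three of the special days are included to keep the example short, and the coefficient values in `beta_a` and `beta_b` are made up for illustration; none of this is the original implementation.

```python
import numpy as np
from datetime import date, timedelta

# Three of the special days, for illustration only (the model above uses 13).
SPECIAL_DAYS = {"New Year's Day": (1, 1), "Halloween": (10, 31), "Christmas": (12, 25)}

def special_day_features(dates):
    """Return (I_special, I_weekend): an n x m indicator matrix and an n-vector."""
    I_special = np.array(
        [[float((d.month, d.day) == md) for md in SPECIAL_DAYS.values()] for d in dates]
    )
    I_weekend = np.array([float(d.weekday() >= 5) for d in dates])  # Sat=5, Sun=6
    return I_special, I_weekend

def f5(dates, beta_a, beta_b):
    """Weekday special-day effect plus an extra shift when the day falls on a weekend."""
    I_special, I_weekend = special_day_features(dates)
    return I_special @ beta_a + I_weekend * (I_special @ beta_b)

dates = [date(1969, 1, 1) + timedelta(days=i) for i in range(2 * 365)]
beta_a = np.array([-1.0, -0.5, -1.5])  # hypothetical weekday effects
beta_b = np.array([0.3, 0.2, 0.4])     # hypothetical additional weekend shifts
effect = f5(dates, beta_a, beta_b)

i_1969 = dates.index(date(1969, 10, 31))  # Halloween on a Friday
i_1970 = dates.index(date(1970, 10, 31))  # Halloween on a Saturday
print(effect[i_1969], effect[i_1970])
```

The weekend Halloween picks up both the base coefficient and the weekend shift, while the weekday Halloween gets the base coefficient alone.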

2.6 Residual noise

$\varepsilon_t \sim N(0, \sigma^2)$ represents remaining unstructured variation not captured by the smoother components.

2.7 Priors and inference

Time-scale parameters $l$ receive weakly informative log–t priors to help identifiability (encouraging reasonable scales and avoiding pathological extremes).
Other hyperparameters (variances like $\sigma_k^2$ and observation noise σ) receive log-uniform priors.
The daily birth counts $y_t$ are normalized (mean 0, SD 1) to improve numerical stability and prior specification.

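Because the sum $f_1 + f_2 + f_3 + f_4 + f_5$ of Gaussian processes and a linear term is still Gaussian, the overall latent function is a single GP with covariance $k(t, t') = k_1(t, t') + k_2(t, t') + k_3(t, t') + k_4(t, t') + k_5(t, t'),$ where $k_5$ denotes the covariance contributed by the linear special-day term. The covariance of $y$ adds the noise variance $\sigma^2$ on the diagonal.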

For fixed hyperparameters $\theta$, the Gaussian marginal likelihood of $y$ has the standard GP form, and its gradients with respect to $\theta$ can be computed analytically. The hyperparameters are then estimated at the posterior mode, that is, the maximizer of marginal likelihood × prior. With $n \approx 20 \times 365.25 \approx 7300$ days, exact GP computation is still manageable, but each marginal likelihood evaluation scales as $O(n^3)$, so full MCMC over the hyperparameters would be slow and a point estimate at the mode is a pragmatic choice.

Central composite design (CCD) integration over the hyperparameters around the mode produced predictive results essentially indistinguishable from those based on the mode alone, which supports the mode-based approach.
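The marginal likelihood that is maximized (together with the log prior) can be sketched as follows. This is a generic Cholesky-based evaluation on synthetic data, not the code used for the analysis; `se_kernel`, `log_marginal_likelihood`, and all numeric values are assumptions.

```python
import numpy as np

def se_kernel(t1, t2, sigma, length):
    """Squared exponential covariance between two 1-D arrays of inputs."""
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * length**2))

def log_marginal_likelihood(y, K, noise_sd):
    """log N(y | 0, K + noise_sd^2 I), computed with a Cholesky factorization."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_sd**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise_sd^2 I)^{-1} y
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

# Small synthetic example (NOT the births data).
rng = np.random.default_rng(1)
t = np.arange(1.0, 101.0)
y = np.sin(t / 15.0) + 0.3 * rng.standard_normal(t.size)

# Evaluate two hypothetical length-scales. A posterior-mode fit would instead
# numerically maximize this quantity plus the log prior over all hyperparameters.
for length in (1.0, 15.0):
    K = se_kernel(t, t, sigma=1.0, length=length)
    print(length, log_marginal_likelihood(y, K, noise_sd=0.3))
```

The Cholesky factorization is the cubic-cost step, which is why repeated evaluation inside an MCMC loop becomes expensive for several thousand days.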

2.8 Extracting component-wise predictions

For any component $f_k$, the posterior mean at prediction points $\tilde{x}$ can be obtained using the usual GP conditioning formula, but using only the corresponding kernel $K_k$ in the cross-covariance:

Example for the slow trend $f_1$: $\mathbb{E}[\tilde{f}_1] = K_1(\tilde{x}, x)\,\big(K(x,x) + \sigma^2 I\big)^{-1} y,$

where:

$K_1(\tilde{x}, x)$ is the covariance between new points and training points under kernel $k_1$,
$K(x,x)$ is the covariance matrix of the training points under the summed kernel $k_1 + k_2 + k_3 + k_4 + k_5$.
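The component-wise conditioning formula can be illustrated on a toy two-component model (a slow trend plus a weekly pattern). The data are synthetic, the predictions are made at the training points, and the kernel helpers and hyperparameter values are assumptions for this sketch.

```python
import numpy as np

def se_kernel(t1, t2, sigma, length):
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * length**2))

def weekly_kernel(t1, t2, sigma, l_per, period=7.0):
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    return sigma**2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / l_per**2)

# Synthetic series: linear drift plus a dip on two days of each week, plus noise.
rng = np.random.default_rng(4)
t = np.arange(1.0, 141.0)
y = 0.01 * t + np.where(t % 7 < 2, -1.0, 0.0) + 0.1 * rng.standard_normal(t.size)
y = (y - y.mean()) / y.std()

noise_sd = 0.2                       # hypothetical noise scale
K1 = se_kernel(t, t, 1.0, 50.0)      # slow trend component
K3 = weekly_kernel(t, t, 1.0, 1.0)   # weekly component
K = K1 + K3                          # covariance of the summed latent function

# Shared solve: (K + sigma^2 I)^{-1} y
alpha = np.linalg.solve(K + noise_sd**2 * np.eye(t.size), y)

mean_trend = K1 @ alpha   # E[f_1 | y]: only k_1 in the cross-covariance
mean_weekly = K3 @ alpha  # E[f_3 | y]: only k_3 in the cross-covariance
mean_total = K @ alpha    # posterior mean of the full latent function

print(np.allclose(mean_trend + mean_weekly, mean_total))
```

Because every component reuses the same solve, the component means add up to the posterior mean of the full latent function.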

Interpretation of the first model:

The slow trend captures multi-year changes in birth numbers.
The fast non-periodic component describes smoother, medium-term fluctuations.
The weekly component shows strong day-of-week patterns that become more pronounced over time, consistent with increased use of scheduled births.
The seasonal component correlates with factors like temperature or daylight approximately nine months earlier.
The special-day effects confirm patterns such as fewer births on some holidays and more on others, likely driven by elective C-sections and induced labor.

3. Improved model: more flexible special-day and short-scale structure

The first model relies on a small preselected set of special days, so it cannot detect unexpected special-day effects or structures such as “ringing”, where births are reduced on a special day and shifted to neighboring days.

Residuals from the first model also display slight autocorrelation, suggesting that a very short time-scale component is missing.
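One simple way to check for such leftover structure is the lag-1 sample autocorrelation of the residuals. The sketch below uses simulated residuals rather than the actual model residuals, and the helper name `lag_autocorrelation` is an assumption.

```python
import numpy as np

def lag_autocorrelation(residuals, lag=1):
    """Sample autocorrelation at a positive integer lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(r[:-lag] @ r[lag:] / (r @ r))

rng = np.random.default_rng(2)
white = rng.standard_normal(2000)  # residuals with no short-range structure

# Simulated residuals with short-range dependence (first-order autoregression).
correlated = np.empty_like(white)
correlated[0] = white[0]
for i in range(1, white.size):
    correlated[i] = 0.5 * correlated[i - 1] + white[i]

print(lag_autocorrelation(white), lag_autocorrelation(correlated))
```

A value clearly away from zero, as in the second series, is the kind of signal that motivates adding a very short length-scale component.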

To address these issues, an improved additive GP model is constructed: $y_t = f_1(t) + f_2(t) + f_3(t) + f_4(t) + f_5(t) + f_6(t) + f_7(t) + f_8(t) + \varepsilon_t,$

with components:

$f_1$: long-term trend (same form as before, squared exponential kernel).
$f_2$: shorter-term non-periodic variation (same structure as before).
$f_3$: weekly quasi-periodic pattern (same periodic × SE structure as before).
$f_4$: yearly smooth seasonal pattern with improved leap-day handling.
$f_5$: yearly fast-changing day-of-year effect for weekdays.
$f_6$: yearly fast-changing day-of-year effect for weekends.
$f_7$: fixed effects for floating holidays whose dates change each year.
$f_8$: very short-term non-periodic component to capture residual autocorrelation.
$\varepsilon_t$: Gaussian residual noise as before.

Key differences from the first model:

3.1 Updated yearly periodic structure and leap-day handling

A modified time index $s(t)$ is created so that the effective year length is exactly 365 days, using a 0.5-day adjustment around leap day.
This modification simplifies the periodic structure for yearly effects.

The revised seasonal component: $f_4(t) \sim \text{GP}(0, k_4), \quad k_4(t, t') = \sigma_4^2 \exp\left( - \frac{2 \sin^2\big(\pi (s - s') / 365\big)}{l_{4,1}^2} \right) \exp\left( -\frac{|s - s'|^2}{2 l_{4,2}^2} \right).$

3.2 Day-of-year effects separated by weekday vs weekend

To allow every day-of-year to have its own effect, and to differentiate weekdays and weekends:

For weekdays: $f_5(t) \sim \text{GP}(0, k_5), \quad k_5(t, t') = I_{\text{weekday}}(t, t') \,\sigma_5^2 \exp\left( - \frac{2 \sin^2\big(\pi (s - s') / 365\big)}{l_5^2} \right),$ where $I_{\text{weekday}}(t, t') = 1$ if both $t$ and $t'$ are weekdays, and 0 otherwise.
For weekends: $f_6(t) \sim \text{GP}(0, k_6), \quad k_6(t, t') = I_{\text{weekend}}(t, t') \,\sigma_6^2 \exp\left( - \frac{2 \sin^2\big(\pi (s - s') / 365\big)}{l_6^2} \right),$ where $I_{\text{weekend}}(t, t') = 1$ if both $t$ and $t'$ are Saturday or Sunday, and 0 otherwise.

These components allow different yearly patterns on weekdays and weekends, including strong local deviations such as dips before or after holidays.
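A sketch of the masked day-of-year kernels is given below. It makes one simplifying assumption that differs from the text: the index is taken as $s = t \bmod 365$ and the leap-day adjustment is ignored. The function names and hyperparameter values are likewise hypothetical.

```python
import numpy as np
from datetime import date, timedelta

START = date(1969, 1, 1)  # calendar date of t = 1

def is_weekday(t):
    """Boolean array: True where day index t falls on Monday to Friday."""
    return np.array([(START + timedelta(days=int(ti) - 1)).weekday() < 5 for ti in t])

def masked_day_of_year_kernel(t1, t2, mask1, mask2, sigma, length, year_length=365.0):
    # Simplifying assumption: s = t mod 365, ignoring the leap-day adjustment.
    s1 = np.mod(np.asarray(t1, dtype=float), year_length)
    s2 = np.mod(np.asarray(t2, dtype=float), year_length)
    d = s1[:, None] - s2[None, :]
    periodic = np.exp(-2.0 * np.sin(np.pi * d / year_length) ** 2 / length**2)
    indicator = np.outer(mask1, mask2).astype(float)  # 1 only if both days qualify
    return indicator * sigma**2 * periodic

t = np.arange(1, 15)  # 1 to 14 January 1969
wd = is_weekday(t)
K5 = masked_day_of_year_kernel(t, t, wd, wd, sigma=1.0, length=0.05)    # weekdays
K6 = masked_day_of_year_kernel(t, t, ~wd, ~wd, sigma=1.0, length=0.05)  # weekends

print(wd.astype(int))
print(np.count_nonzero(K5), np.count_nonzero(K6))
```

The two kernels never overlap: any pair of days contributes to at most one of them, so weekday and weekend day-of-year effects are learned separately.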

3.3 Floating holidays: $f_7(t)$

Certain special days do not fall on the same calendar date every year. Some are defined as a specific weekday in a given month (e.g., Thanksgiving, Memorial Day, Labor Day), and Leap Day occurs only in leap years.

Define:
- $I_{\text{special}}(t)$: row vector of 4 indicator variables, one for each of these floating holidays.
Model: $f_7(t) = I_{\text{special}}(t)\, \beta,$ where $\beta$ is a 4-dimensional coefficient vector.
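The floating-holiday indicators can be built from calendar rules. The sketch below covers three of the four days (Memorial Day is omitted for brevity), uses the standard rules "first Monday of September" and "fourth Thursday of November", and plugs in made-up coefficients; the helper names are assumptions.

```python
import numpy as np
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def floating_holiday_indicators(dates):
    """n x 3 indicator matrix with columns: Labor Day, Thanksgiving, Leap Day."""
    rows = []
    for d in dates:
        labor = d == nth_weekday(d.year, 9, 0, 1)          # first Monday of September
        thanksgiving = d == nth_weekday(d.year, 11, 3, 4)  # fourth Thursday of November
        leap = (d.month, d.day) == (2, 29)
        rows.append([float(labor), float(thanksgiving), float(leap)])
    return np.array(rows)

dates = [date(1972, 1, 1) + timedelta(days=i) for i in range(366)]  # leap year 1972
I_special = floating_holiday_indicators(dates)
beta = np.array([-0.8, -1.2, -0.3])  # hypothetical coefficients
f7 = I_special @ beta

print(I_special.sum(axis=0))
```

Each column flags exactly one day in 1972, and `f7` is zero on every other day.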

3.4 Extra short-term component: $f_8(t)$

To capture very short-range autocorrelation not explained by the other components: $f_8(t) \sim \text{GP}(0, k_8), \quad k_8(t, t') = \sigma_8^2 \exp\left( - \frac{|t - t'|^2}{2 l_8^2} \right),$

with small length-scale $l_8$, allowing rapid day-to-day fluctuations.

3.5 Residuals and priors

Residual noise remains $\varepsilon_t \sim N(0, \sigma^2)$.
Time-scale parameters $l$ again get weakly informative log–t priors.
Other hyperparameters (variances, noise scale) use log-uniform priors.
Birth counts are normalized (mean 0, standard deviation 1) as before.

3.6 Model comparison via leave-one-out cross-validation

Using properties of multivariate Gaussian distributions, leave-one-out (LOO) predictive distributions for each day can be computed efficiently, with similar computational cost to standard GP predictions.

The LOO log pointwise predictive density ($\text{lppd}_{\text{loo}}$) serves as a measure of predictive accuracy:

First model: $\text{lppd}_{\text{loo}} = 2074$,
Improved model: $\text{lppd}_{\text{loo}} = 2477$.

The higher value for the improved model indicates substantially better predictive performance, confirming that the added structure (full day-of-year effects, extra short-scale component, refined leap-day handling) captures real signal rather than noise.
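The efficient LOO computation relies on a standard identity for zero-mean multivariate Gaussians: with $C = K + \sigma^2 I$, the leave-one-out predictive variance for $y_i$ is $1 / [C^{-1}]_{ii}$ and the mean is $y_i - [C^{-1} y]_i / [C^{-1}]_{ii}$. The sketch below applies it to synthetic data with fixed hyperparameters; the function name and all numeric values are assumptions.

```python
import numpy as np

def loo_log_predictive(y, C):
    """Per-point LOO log predictive densities for y ~ N(0, C), C = K + noise variance * I."""
    C_inv = np.linalg.inv(C)
    precision_diag = np.diag(C_inv)
    loo_var = 1.0 / precision_diag
    loo_mean = y - (C_inv @ y) / precision_diag
    return -0.5 * np.log(2.0 * np.pi * loo_var) - 0.5 * (y - loo_mean) ** 2 / loo_var

# Small synthetic example (NOT the births data), hyperparameters held fixed.
rng = np.random.default_rng(3)
t = np.arange(1.0, 61.0)
y = np.sin(t / 8.0) + 0.2 * rng.standard_normal(t.size)
d = t[:, None] - t[None, :]
C = np.exp(-d**2 / (2.0 * 8.0**2)) + 0.2**2 * np.eye(t.size)

lppd_loo = loo_log_predictive(y, C).sum()
print(lppd_loo)
```

A single matrix inverse yields all $n$ leave-one-out densities, so no model needs to be refit $n$ times.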

3.7 Interpretation of the improved decomposition

In the improved model:

The slow trend and day-of-week effects are essentially unchanged compared to the first model, indicating that these components are robust.
The seasonal component becomes smoother, because patterns like increased births before major holidays or before year-end are now handled by the day-of-year components $f_5$ and $f_6$.
The day-of-year component clearly reveals systematic reductions and increases around special days (for example, fewer births on the holiday, more births on nearby days), consistent with birth scheduling.
The extra short-scale component $f_8$ absorbs local residual autocorrelation.

The model still could be refined further; for example, it would be natural to constrain the positive and negative local effects around each special day to average approximately to zero (making explicit that babies “missing” from one day must be “moved” to nearby days). However, the improved model already gives a clear and interpretable decomposition of the major patterns at different time scales.

4. Main lessons from the example

Additive Gaussian processes allow a complex time series to be decomposed into interpretable components with different time scales and structures (trend, seasonal, weekly, special days, etc.).
The sum of GPs is still a GP, so standard GP formulas for marginal likelihood and prediction remain valid, even for quite rich additive structures.
It is possible to iteratively improve the model by inspecting residuals and component behavior, then adding targeted GP components (e.g., short-term, day-of-year effects) without losing control over estimation.
Cross-validated predictive performance (LOO) provides a quantitative check that each added component genuinely improves predictive accuracy rather than just overfitting.
