Functional Data Analysis (FDA) focuses on situations where observations for each subject are not single scalar outcomes or finite-dimensional vectors, but entire functions defined over a continuum of input points.

Examples of such random functions include:

  • A person’s blood pressure curve as a function of age
  • A patient’s body weight trajectory over time
  • Sensor readings recorded continuously across a spatial domain

In theory, each subject has an underlying smooth function $f_i(t)$ defined on a domain $T$, but in practice we observe it only through finitely many noisy measurements.


1. Observation Model for Functional Data

For subject $i$, we observe:

  • Measurement times: $t_{ij} \in T$
  • Observations: $y_{ij}$

Let the underlying function for subject $i$ be $f_i(t)$.

The observed data are modeled as:

$$y_{ij} \sim N(f_i(t_{ij}), \sigma^2)$$

This captures:

  • A smooth underlying signal $f_i(t)$, which is the mean of each observation
  • Measurement noise with variance $\sigma^2$
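
The observation model above can be simulated directly. The following is a minimal sketch, assuming NumPy; the helper `simulate_subject`, the two example functions `f1` and `f2`, the domain $[0, 1]$, and the value of `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(f, n_obs, sigma, rng, t_min=0.0, t_max=1.0):
    """Draw irregular times t_ij and observations y_ij ~ N(f(t_ij), sigma^2)."""
    t = np.sort(rng.uniform(t_min, t_max, size=n_obs))  # irregular measurement times
    y = f(t) + sigma * rng.standard_normal(n_obs)       # additive Gaussian noise
    return t, y

# Two hypothetical smooth subject-level functions (illustrative only).
def f1(t):
    return np.sin(2 * np.pi * t)

def f2(t):
    return np.sin(2 * np.pi * t) + 0.5 * t

sigma = 0.1
t1, y1 = simulate_subject(f1, n_obs=30, sigma=sigma, rng=rng)  # densely observed subject
t2, y2 = simulate_subject(f2, n_obs=4, sigma=sigma, rng=rng)   # sparsely observed subject
```

Each subject gets its own set of measurement times, and the number of observations can differ across subjects, which is the typical situation in functional data.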

2. The Main Question in FDA

We want to model the entire collection of functions:

$$\{ f_1, f_2, \ldots, f_n \}$$

in a way that:

  1. Allows flexible shapes for individual functions
  2. Shares information (“borrows strength”) across subjects
    • Subjects with fewer observations should benefit from subjects with more data

Gaussian processes (GPs) provide exactly this type of structure.


3. Gaussian Processes for Functional Data

A convenient approach is to place a GP prior on a single joint function of the subject predictors and time.

Regression setting

Suppose we observe data:

$$y_{ij} \sim N(f(x_i, t_{ij}), \sigma^2)$$

Here:

  • $x_i$ = subject-specific predictors (e.g., age, sex, treatment)
  • $t_{ij}$ = time for the $j$-th measurement

We treat $f(x, t)$ as a smooth function of both subject predictors and time.

GP prior on the functional relationship

We assume:

$$f \sim \text{GP}(m, k)$$

where $m$ is the mean function and $k$ is the covariance kernel.
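
For reference, the standard GP regression identities implied by this prior and the Gaussian noise model can be written out as follows. The notation $z_{ij} = (x_i, t_{ij})$ for a stacked input, $N$ for the total number of observations across all subjects, and the symbols $\mathbf{y}$, $\mathbf{m}$, $K$, $\mathbf{k}_*$ are introduced here for illustration and do not appear elsewhere in these notes.

```latex
% Stack all observations y_{ij} into a vector \mathbf{y} of length N,
% with inputs z_{ij} = (x_i, t_{ij}).
\mathbf{y} \sim N\!\left(\mathbf{m},\; K + \sigma^2 I_N\right),
\qquad \mathbf{m}_{(ij)} = m(z_{ij}),
\qquad K_{(ij),(i'j')} = k(z_{ij}, z_{i'j'}).

% Posterior of f at a new input z_* = (x_*, t_*),
% with (\mathbf{k}_*)_{(ij)} = k(z_{ij}, z_*):
\mathbb{E}[f(z_*) \mid \mathbf{y}]
  = m(z_*) + \mathbf{k}_*^\top \left(K + \sigma^2 I_N\right)^{-1} (\mathbf{y} - \mathbf{m}),
\qquad
\operatorname{Var}[f(z_*) \mid \mathbf{y}]
  = k(z_*, z_*) - \mathbf{k}_*^\top \left(K + \sigma^2 I_N\right)^{-1} \mathbf{k}_*.
```

Because $\mathbf{k}_*$ involves every observation from every subject, the prediction for one subject's curve draws on all subjects' data, weighted by the kernel.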


4. Kernel Construction for Functional Data

A common kernel is the squared-exponential (SE) kernel extended to include time as another dimension:

$$k((x,t),(x',t')) = \tau^2 \exp\left( -\sum_{j=1}^{p} \frac{(x_j - x'_j)^2}{2l_j^2} - \frac{(t - t')^2}{2l_{p+1}^2} \right)$$

Where:

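  • $\tau$ controls the overall function magnitude ($\tau^2$ is the prior variance of $f$ at any input)
  • $l_1, \ldots, l_p$ are length-scales controlling smoothness with respect to the predictor dimensions
  • $l_{p+1}$ is the length-scale controlling smoothness in the time dimension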
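
The SE kernel with one length-scale per input dimension can be computed in a few lines. This is a minimal sketch, assuming NumPy; the function name `se_kernel`, the convention of storing each input as a row $(x_1, \ldots, x_p, t)$, and the example inputs and hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def se_kernel(Z1, Z2, tau, lengthscales):
    """SE kernel on stacked inputs z = (x_1, ..., x_p, t).

    Z1: (n1, p+1) array, Z2: (n2, p+1) array.
    lengthscales: (p+1,) array holding (l_1, ..., l_p, l_{p+1}).
    Returns the (n1, n2) matrix of kernel values.
    """
    ls = np.asarray(lengthscales, dtype=float)
    A = np.asarray(Z1, dtype=float) / ls
    B = np.asarray(Z2, dtype=float) / ls
    # Squared scaled differences, summed over all p+1 input dimensions.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return tau**2 * np.exp(-0.5 * sq_dist)

# Hypothetical inputs with p = 1 predictor plus time: rows are (x, t).
Z = np.array([[0.0, 0.0],
              [0.0, 0.1],   # same predictor, nearby time
              [1.0, 0.0],   # different predictor, same time
              [5.0, 3.0]])  # far away in both
K = se_kernel(Z, Z, tau=2.0, lengthscales=[1.0, 0.5])
```

The diagonal of `K` equals $\tau^2$, and off-diagonal entries shrink toward zero as inputs move apart in either the predictor or the time dimension.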

Interpretation

  • If $(x, t)$ and $(x', t')$ are close, then $f(x,t)$ and $f(x',t')$ are strongly correlated.
  • If predictors are similar, subjects' functions will also be similar.
  • If times are close, function values at those times are similar, so each trajectory varies smoothly over time.

5. Key Advantages

  1. Natural modeling of curves or trajectories
    The GP prior enforces smoothness in $t$ and in the predictors.
  2. Handles irregular measurement times
    No need for equally spaced time points.
  3. Automatically borrows information
    Subjects with sparse trajectories are informed by subjects with dense observations.
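  4. No new machinery beyond standard GP regression
    Functional data simply enlarges the input space to include time. The cost is that of one GP regression on all subjects' observations pooled together, which for exact inference grows cubically with the total number of observations.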
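
The borrowing of information described above can be illustrated with a small numerical experiment. This is a minimal sketch, assuming NumPy, a zero mean function, and fixed hyperparameters rather than fitted ones. The function names `se_kernel` and `gp_posterior`, the "true" surface `truth`, and the two subjects A and B are all hypothetical.

```python
import numpy as np

def se_kernel(Z1, Z2, tau, ls):
    """SE kernel on rows z = (x, t), with one length-scale per column."""
    d = (Z1[:, None, :] - Z2[None, :, :]) / ls
    return tau**2 * np.exp(-0.5 * (d**2).sum(axis=-1))

def gp_posterior(Z, y, Z_new, tau, ls, sigma):
    """Posterior mean and variance of f at Z_new under a zero-mean GP prior."""
    K = se_kernel(Z, Z, tau, ls) + sigma**2 * np.eye(len(y))
    K_s = se_kernel(Z, Z_new, tau, ls)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = tau**2 - (v**2).sum(axis=0)              # k(z*, z*) = tau^2 for the SE kernel
    return mean, var

def truth(x, t):
    """Hypothetical smooth f(x, t), used only to generate data."""
    return np.sin(2 * np.pi * t) + 0.3 * x

rng = np.random.default_rng(1)
sigma = 0.1

# Subject A (x = 0.0): 40 irregular times. Subject B (x = 0.2): only 3 early times.
t_a = np.sort(rng.uniform(0.0, 1.0, 40))
t_b = np.array([0.05, 0.10, 0.15])
Z_a = np.column_stack([np.zeros_like(t_a), t_a])
Z_b = np.column_stack([np.full_like(t_b, 0.2), t_b])
y_a = truth(Z_a[:, 0], Z_a[:, 1]) + sigma * rng.standard_normal(len(t_a))
y_b = truth(Z_b[:, 0], Z_b[:, 1]) + sigma * rng.standard_normal(len(t_b))

# Predict subject B's curve at t = 0.75, far from B's own observations.
z_new = np.array([[0.2, 0.75]])
hyper = dict(tau=1.0, ls=np.array([1.0, 0.2]), sigma=sigma)

# Using B's data alone versus pooling A and B in one GP.
mean_alone, var_alone = gp_posterior(Z_b, y_b, z_new, **hyper)
mean_joint, var_joint = gp_posterior(
    np.vstack([Z_a, Z_b]), np.concatenate([y_a, y_b]), z_new, **hyper
)
```

Comparing `var_joint` with `var_alone` shows how much the densely observed subject reduces uncertainty about the sparsely observed one. In practice the hyperparameters would be estimated, for example by maximizing the marginal likelihood, rather than fixed by hand as they are here.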

6. Final Insight

Functional data analysis becomes conceptually simple with Gaussian Processes:
you treat the time variable just like an additional predictor. The GP prior handles all smoothing and correlation structure without requiring specialized functional methods or basis expansions.