Functional Data Analysis (FDA) focuses on situations where observations for each subject are not single scalar outcomes or finite-dimensional vectors, but entire functions defined over a continuum of input points.
Examples of such random functions include:
- A person’s blood pressure curve as a function of age
- A patient’s body weight trajectory over time
- Sensor readings recorded continuously across a spatial domain
In theory, each subject has an underlying smooth function $f_i(t)$ defined on a domain $T$, but in practice we observe it only through finitely many noisy measurements.
1. Observation Model for Functional Data
For subject $i$, we observe:
- Measurement times: $t_{ij} \in T$
- Observations: $y_{ij}$
Let the underlying function for subject $i$ be $f_i(t)$.
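The observed data are modeled as:
$$y_{ij} = f_i(t_{ij}) + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2), \qquad j = 1, \ldots, n_i$$
This captures: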
- The assumption that $f_i(t)$ is smooth
- Measurement noise with variance $\sigma^2$
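To make the observation model concrete, the following is a minimal simulation sketch, assuming additive Gaussian noise, $y_{ij} = f_i(t_{ij}) + \epsilon_{ij}$ with $\epsilon_{ij} \sim N(0, \sigma^2)$. The helper `simulate_subject`, the two example curves, and the values of `sigma` and `n_obs` are illustrative choices, not part of the original notes.

```python
import numpy as np

def simulate_subject(f, n_obs, sigma, rng, t_min=0.0, t_max=1.0):
    """Draw irregular times t_ij and noisy values y_ij = f(t_ij) + eps_ij."""
    t = np.sort(rng.uniform(t_min, t_max, size=n_obs))  # irregular measurement times
    eps = rng.normal(0.0, sigma, size=n_obs)             # i.i.d. N(0, sigma^2) noise
    return t, f(t) + eps

rng = np.random.default_rng(0)

# Two hypothetical smooth subject-level functions f_1 and f_2.
curves = [
    lambda t: np.sin(2 * np.pi * t),
    lambda t: np.sin(2 * np.pi * t) + 0.5 * t,
]
n_obs = [25, 4]  # subject 1 is densely observed, subject 2 sparsely

data = [simulate_subject(f, n, sigma=0.1, rng=rng) for f, n in zip(curves, n_obs)]

for i, (t, y) in enumerate(data, start=1):
    print(f"subject {i}: {len(t)} observations")
```

Each subject gets its own number of observations and its own time points, which is the typical situation in practice: the curves are never seen directly, only sampled.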
2. The Main Question in FDA
We want to model the entire collection of functions:
$$\{ f_i(t) : i = 1, \ldots, n \}$$
in a way that:
- Allows flexible shapes for individual functions
- Shares information (“borrows strength”) across subjects
- Lets subjects with fewer observations benefit from subjects with more data
Gaussian Processes provide exactly this type of structure.
3. Gaussian Processes for Functional Data
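A convenient approach is to place a GP prior on a single joint function of predictors and time, $f(x, t)$, so that the curve for subject $i$ is $f_i(t) = f(x_i, t)$.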
Regression setting
Suppose we observe data:
$$\{ (x_i, t_{ij}, y_{ij}) : i = 1, \ldots, n, \; j = 1, \ldots, n_i \}$$
Here:
- $x_i$ = subject-specific predictors (e.g., age, sex, treatment)
- $t_{ij}$ = time for the $j$-th measurement
We treat $f(x, t)$ as a smooth function of both subject predictors and time.
GP prior on the functional relationship
We assume:
$$f \sim \mathcal{GP}(m, k)$$
where $m$ is the mean function and $k$ is the covariance kernel.
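For reference, here is the standard GP regression posterior that this prior implies, assuming Gaussian measurement noise with variance $\sigma^2$. The notation is introduced only for this display: $z = (x, t)$ is an augmented input, $z_1, \ldots, z_N$ are the pooled training inputs from all subjects, $y$ is the stacked vector of observations, $\mathbf{m}$ is the vector of prior means $m(z_a)$, $K$ has entries $K_{ab} = k(z_a, z_b)$, and $k_*$ has entries $k(z_*, z_a)$ for a new input $z_*$.

$$\mathbb{E}[f(z_*) \mid y] = m(z_*) + k_*^\top (K + \sigma^2 I)^{-1} (y - \mathbf{m}), \qquad \mathrm{Var}[f(z_*) \mid y] = k(z_*, z_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*$$

Because $K$ is built from all subjects' observations at once, the prediction for one subject at time $t$ draws on every other subject's data, weighted by the kernel.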
4. Kernel Construction for Functional Data
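A common kernel is the squared-exponential (SE) kernel extended to include time as another dimension:
$$k\big((x, t), (x', t')\big) = \tau^2 \exp\left( -\sum_{d=1}^{p} \frac{(x_d - x'_d)^2}{2 l_d^2} - \frac{(t - t')^2}{2 l_{p+1}^2} \right)$$
Where: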
- $\tau$ controls the overall function magnitude
- $l_1, \ldots, l_p$ control smoothness with respect to predictor dimensions
- $l_{p+1}$ controls smoothness in the time dimension
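A minimal NumPy sketch of such a kernel follows. It assumes the common parameterization $k(z, z') = \tau^2 \exp\big(-\tfrac{1}{2} \sum_d (z_d - z'_d)^2 / l_d^2\big)$ on augmented inputs $z = (x_1, \ldots, x_p, t)$; other conventions, for example omitting the factor of 2, differ only by a rescaling of the length-scales. The function name `se_kernel` and the example values are illustrative.

```python
import numpy as np

def se_kernel(Z1, Z2, tau, lengthscales):
    """Squared-exponential kernel on augmented inputs z = (x_1, ..., x_p, t).

    Z1: (n1, p+1) array and Z2: (n2, p+1) array; the last column is time.
    lengthscales: (p+1,) array holding (l_1, ..., l_p, l_{p+1}).
    Returns the (n1, n2) matrix of covariances.
    """
    ls = np.asarray(lengthscales, dtype=float)
    Z1 = np.atleast_2d(np.asarray(Z1, dtype=float)) / ls  # rescale each dimension
    Z2 = np.atleast_2d(np.asarray(Z2, dtype=float)) / ls
    diff = Z1[:, None, :] - Z2[None, :, :]                # pairwise differences
    sq_dist = np.sum(diff**2, axis=-1)                    # scaled squared distances
    return tau**2 * np.exp(-0.5 * sq_dist)

# One predictor (p = 1) plus time, with hypothetical length-scales.
ls = np.array([1.0, 0.2])
a = np.array([[0.5, 0.30]])  # (x, t)
b = np.array([[0.5, 0.35]])  # same predictor value, nearby time
c = np.array([[3.0, 0.90]])  # different predictor value, distant time

k_ab = se_kernel(a, b, tau=1.0, lengthscales=ls)[0, 0]
k_ac = se_kernel(a, c, tau=1.0, lengthscales=ls)[0, 0]
print(k_ab > k_ac)
```

Using a separate length-scale per dimension lets the data decide how quickly correlation decays in time versus in each predictor.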
Interpretation
- If $(x, t)$ and $(x', t')$ are close, then $f(x,t)$ and $f(x',t')$ are strongly correlated.
- If predictors are similar, subjects’ functions will also be similar.
- If times are close, the function values at those times are similar, so each trajectory varies smoothly over time.
5. Key Advantages
- Natural modeling of curves or trajectories
The GP automatically enforces smoothness in $t$ and in the predictors.
- Handles irregular measurement times
No need for equally spaced time points.
- Automatically borrows information
Subjects with sparse trajectories are informed by subjects with dense observations.
- No specialized machinery beyond standard GP regression
Functional data simply enlarges the input space to include time; the cost is that of GP regression on the pooled observations from all subjects.
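The borrowing of information can be illustrated with a small sketch: a zero-mean GP regression on inputs $(x, t)$, comparing predictions for a sparsely observed subject with and without a densely observed neighbor. Everything here is an assumption made for illustration: the function `truth`, the fixed hyperparameters in `hyper` (in practice they would be estimated), the SE kernel parameterization with a factor of $\tfrac{1}{2}$ in the exponent, and the helper names.

```python
import numpy as np

def se_kernel(Z1, Z2, tau, ls):
    """SE kernel on augmented inputs (x, t) with per-dimension length-scales."""
    d = (Z1[:, None, :] - Z2[None, :, :]) / ls
    return tau**2 * np.exp(-0.5 * np.sum(d**2, axis=-1))

def gp_posterior_mean(Z, y, Z_new, tau, ls, sigma):
    """Zero-mean GP regression mean: k(Z_new, Z) (K + sigma^2 I)^{-1} y."""
    K = se_kernel(Z, Z, tau, ls) + sigma**2 * np.eye(len(Z))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K)^{-1} y
    return se_kernel(Z_new, Z, tau, ls) @ alpha

def truth(x, t):
    """Hypothetical smooth f(x, t), used only to generate data."""
    return np.sin(2 * np.pi * t) + 0.3 * x

rng = np.random.default_rng(1)
sigma = 0.1

# Subject A (x = 0.0): 30 irregular times. Subject B (x = 0.2): only 2 times.
t_a = np.sort(rng.uniform(0, 1, 30))
t_b = np.array([0.1, 0.9])
Z = np.vstack([
    np.column_stack([np.full_like(t_a, 0.0), t_a]),
    np.column_stack([np.full_like(t_b, 0.2), t_b]),
])
y = truth(Z[:, 0], Z[:, 1]) + rng.normal(0, sigma, len(Z))

# Predict subject B's curve on a grid, with and without subject A's data.
t_grid = np.linspace(0.1, 0.9, 9)
Z_new = np.column_stack([np.full_like(t_grid, 0.2), t_grid])
hyper = dict(tau=1.0, ls=np.array([1.0, 0.2]), sigma=sigma)  # fixed, not estimated
pooled = gp_posterior_mean(Z, y, Z_new, **hyper)
alone = gp_posterior_mean(Z[30:], y[30:], Z_new, **hyper)

f_true = truth(Z_new[:, 0], Z_new[:, 1])

def rmse(est):
    return np.sqrt(np.mean((est - f_true) ** 2))

print(f"RMSE pooled: {rmse(pooled):.3f}, subject B alone: {rmse(alone):.3f}")
```

With only two observations, subject B's curve is poorly determined on its own. Pooling with subject A, whose predictor value is close, lets the kernel transfer the shape of A's trajectory to B.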
6. Final Insight
Functional data analysis becomes conceptually simple with Gaussian Processes: you treat the time variable as just another predictor. The GP prior then handles the smoothing and correlation structure without requiring specialized functional methods or explicit basis expansions.
