Latent Gaussian process (GP) models extend standard GP regression to cases where the data likelihood is non-Gaussian. Instead of modeling $y$ directly with a Gaussian likelihood, the model introduces a latent function $f$ that has a GP prior. A link function (as in generalized linear models) connects $f$ to the likelihood.
1. Structure of Latent GP Models
Latent Function
- A GP prior is placed on the latent function:
- The likelihood depends on $f$ through a link function:
where $\phi$ is a likelihood shape parameter.
Optional: Separate GPs for Location and Shape
- In more flexible models, one can use:
- GP for the mean (location) $f$
- GP for the shape parameter $\phi$
- This allows scale or shape to depend on predictors.
2. Posterior Structure
The conditional posterior of latent $f$ is:
Sampling from this posterior is challenging due to:
- High correlation between $f$, $\theta$, and $\phi$
- High-dimensional GP priors
Thus mixing in MCMC is slow, especially for large datasets.
3. GP-Specific MCMC Samplers
To achieve more efficient sampling, special samplers exploit the multivariate Gaussian prior structure:
- Elliptical Slice Sampling
- Scaled Metropolis–Hastings
- Scaled HMC or NUTS
These samplers alternate between:
- Sampling latent values $f$
- Sampling covariance (kernel) parameters $\theta$ and likelihood parameters $\phi$
However, because of strong coupling between parameters, mixing can remain slow.
4. Gaussian Posterior Approximation (Laplace Approximation)
Since the latent prior is Gaussian and the posterior is often close to Gaussian, a normal approximation is used:
where:
- $\hat f$ is the posterior mode
- The covariance satisfies:
Here:
- $K(x,x)$ = GP prior covariance matrix
- $W$ = diagonal matrix with elements:
(negative second derivative of log-likelihood)
Predictive Distribution
Approximate predictive distribution for a new point $\tilde y_i$ is:
This integral is usually evaluated by quadrature.
5. Approximate Marginal Likelihood via Laplace’s Method
To estimate hyperparameters $(\theta, \phi)$, the model approximates:
where:
This allows efficient hyperparameter optimization using gradients.
6. Expectation Propagation (EP) and Variational Inference
When the likelihood is highly skewed (e.g., logistic likelihood), Laplace approximation can be inaccurate.
Alternatives:
Expectation Propagation (EP)
- More accurate for skewed likelihoods.
- Often slower than Laplace.
Variational Approximation
- Sometimes used
- Often slower and less accurate than EP for GP models.
Once the latent posterior is approximated, one can compute an approximate marginal posterior:
This can be used for:
- Hyperparameter mode finding
- Grid-based integration
- MCMC over hyperparameters
- CCD (Central Composite Design) integration
7. Example: Leukemia Survival Times (AML dataset)
GPs are used to model nonlinear predictor effects and interactions in a real survival dataset from 1043 adult AML cases (1982–1998, UK).
Approximately 16% are right-censored.
Predictors
- Age
- Sex
- White Blood Cell Count (WBC) at diagnosis
- Highly skewed and positive → log-transformed
- Townsend deprivation index (measure of socioeconomic deprivation)
All continuous predictors and log-survival times are normalized.
7.1 Likelihood: Log-Logistic Model
The log-logistic model performed best among tested survival models (proportional hazards, Weibull, log-Gaussian).
Likelihood for each observation:
- $r$ = shape parameter
- $z_i$ = censoring indicator (0 = observed, 1 = censored)
This model is robust compared to log-Gaussian, especially with skewed data.
7.2 Latent Function Model
The latent function combines linear effects and a GP:
where:
- $\mu \sim \text{GP}(0, k)$
- Kernel: squared exponential
7.3 Priors
- $\log r$ ~ uniform
- $\alpha \sim N(0, \sigma_\alpha^2)$
- $\beta \sim N(0, \sigma_\beta^2)$
- $\sigma_\alpha^2, \sigma_\beta^2, \sigma_g^2 \sim \text{Inv-}\chi^2(1,1)$
- Length-scales $l_j \sim \text{half-Cauchy}(0,1)$
7.4 Inference Approach
- Laplace approximation for latent values $f$
- CCD integration for hyperparameters
- EP gave similar results but was much slower
Cross-Validation Performance
Leave-one-out predictive accuracy:
- Linear-only model:
$ \text{lppd}_{\text{loo-cv}} = -1662 $ - Model with nonlinear GP components:
$ \text{lppd}_{\text{loo-cv}} = -1629 $
Lower negative value indicates better predictive accuracy, showing clear improvement from nonlinear structure.
7.5 Discovered Patterns
Figure 21.6 showed:
- Strong nonlinear effects for predictors
- An interaction between WBC and deprivation index (Townsend)
- Previous studies did not detect these patterns because:
- They did not log-transform WBC
- They only used additive models (no interactions)
- Spatial component was not helpful once GP nonlinearities were included
The GP model automatically discovered:
- Complex nonlinear shapes
- Interactions without explicitly specifying them
This demonstrates the flexibility and modeling power of latent GP models.
Summary
Latent Gaussian process models extend GP regression to non-Gaussian likelihoods by introducing a latent function with a GP prior. Because sampling from the posterior is difficult due to high dimensionality and strong correlations, specialized GP samplers or Gaussian approximations (Laplace, EP, variational) are used. These approximations allow efficient hyperparameter inference and prediction.
The leukemia survival example illustrates the power of GPs in uncovering nonlinear predictors and interactions without pre-specification. Nonlinear effects of WBC and its interaction with socioeconomic deprivation were discovered automatically, improving predictive performance over simpler additive models.
