Latent Gaussian process (GP) models extend standard GP regression to cases where the data likelihood is non-Gaussian. Instead of modeling $y$ directly with a Gaussian likelihood, the model introduces a latent function $f$ that has a GP prior. A link function (as in generalized linear models) connects $f$ to the likelihood.

1. Structure of Latent GP Models

Latent Function

A GP prior is placed on the latent function:
$f \sim \text{GP}(0, k)$
The likelihood depends on $f$ through a link function:
$p(y \mid f, \phi)$
where $\phi$ is a likelihood shape parameter.

Optional: Separate GPs for Location and Shape

In more flexible models, one can use:
- GP for the mean (location) $f$
- GP for the shape parameter $\phi$
This allows scale or shape to depend on predictors.

2. Posterior Structure

The conditional posterior of latent $f$ is: $p(f \mid x, y, \theta, \phi) \propto p(y \mid f, \phi)\, p(f \mid x, \theta)$

Sampling from this posterior is challenging due to:

High correlation between $f$, $\theta$, and $\phi$
High-dimensional GP priors

Thus mixing in MCMC is slow, especially for large datasets.

3. GP-Specific MCMC Samplers

To achieve more efficient sampling, special samplers exploit the multivariate Gaussian prior structure:

Elliptical Slice Sampling
Scaled Metropolis–Hastings
Scaled HMC or NUTS

These samplers alternate between:

Sampling latent values $f$
Sampling covariance (kernel) parameters $\theta$ and likelihood parameters $\phi$

However, because of strong coupling between parameters, mixing can remain slow.

4. Gaussian Posterior Approximation (Laplace Approximation)

Since the latent prior is Gaussian and the posterior is often close to Gaussian, a normal approximation is used: $p(f \mid x, y, \theta, \phi) \approx N(\hat f, \Sigma)$

where:

$\hat f$ is the posterior mode
The covariance satisfies:
$\Sigma^{-1} = K(x,x)^{-1} + W$

Here:

$K(x,x)$ = GP prior covariance matrix
$W$ = diagonal matrix with elements:
$W_{ii} = -\frac{d^2}{df_i^2} \log p(y_i \mid f_i, \phi)\Big|_{f_i = \hat f_i}$
(negative second derivative of log-likelihood)

Predictive Distribution

Approximate predictive distribution for a new point $\tilde y_i$ is: $p(\tilde y_i \mid \tilde f_i, \phi) \approx \int p(\tilde y_i \mid \tilde f_i, \phi)\, N(\tilde f_i \mid x, y, \theta, \phi)\, d\tilde f_i$

This integral is usually evaluated by quadrature.

5. Approximate Marginal Likelihood via Laplace’s Method

To estimate hyperparameters $(\theta, \phi)$, the model approximates: $\log p(y \mid x, \theta, \phi) \approx \log p(y \mid \hat f, \phi) – \frac{1}{2}\hat f^\top K^{-1}\hat f – \frac{1}{2}\log |B|$

where: $B = I + W^{1/2} K W^{1/2}$

This allows efficient hyperparameter optimization using gradients.

6. Expectation Propagation (EP) and Variational Inference

When the likelihood is highly skewed (e.g., logistic likelihood), Laplace approximation can be inaccurate.
Alternatives:

Expectation Propagation (EP)

More accurate for skewed likelihoods.
Often slower than Laplace.

Variational Approximation

Sometimes used
Often slower and less accurate than EP for GP models.

Once the latent posterior is approximated, one can compute an approximate marginal posterior: $q(\theta, \phi \mid x, y)$

This can be used for:

Hyperparameter mode finding
Grid-based integration
MCMC over hyperparameters
CCD (Central Composite Design) integration

7. Example: Leukemia Survival Times (AML dataset)

GPs are used to model nonlinear predictor effects and interactions in a real survival dataset from 1043 adult AML cases (1982–1998, UK).
Approximately 16% are right-censored.

Predictors

Age
Sex
White Blood Cell Count (WBC) at diagnosis
- Highly skewed and positive → log-transformed
Townsend deprivation index (measure of socioeconomic deprivation)

All continuous predictors and log-survival times are normalized.

7.1 Likelihood: Log-Logistic Model

The log-logistic model performed best among tested survival models (proportional hazards, Weibull, log-Gaussian).

Likelihood for each observation: $p(y_i \mid – ) = y_i^{r y_i – 1} \, \exp(f(X_i))^{1-z_i} \left[1 + y_i^{r}\exp(f(X_i))\right]^{-z_i – 2}$

$r$ = shape parameter
$z_i$ = censoring indicator (0 = observed, 1 = censored)

This model is robust compared to log-Gaussian, especially with skewed data.

7.2 Latent Function Model

The latent function combines linear effects and a GP: $f_i = \alpha + X_i\beta + \mu(X_i)$

where:

$\mu \sim \text{GP}(0, k)$
Kernel: squared exponential $k(x, x’) = \sigma_g^2 \exp\left( -\sum_j \frac{|x_j – x_j’|^2}{2l_j^2} \right)$

7.3 Priors

$\log r$ ~ uniform
$\alpha \sim N(0, \sigma_\alpha^2)$
$\beta \sim N(0, \sigma_\beta^2)$
$\sigma_\alpha^2, \sigma_\beta^2, \sigma_g^2 \sim \text{Inv-}\chi^2(1,1)$
Length-scales $l_j \sim \text{half-Cauchy}(0,1)$

7.4 Inference Approach

Laplace approximation for latent values $f$
CCD integration for hyperparameters
EP gave similar results but was much slower

Cross-Validation Performance

Leave-one-out predictive accuracy:

Linear-only model:
$ \text{lppd}_{\text{loo-cv}} = -1662 $
Model with nonlinear GP components:
$ \text{lppd}_{\text{loo-cv}} = -1629 $

Lower negative value indicates better predictive accuracy, showing clear improvement from nonlinear structure.

7.5 Discovered Patterns

Figure 21.6 showed:

Strong nonlinear effects for predictors
An interaction between WBC and deprivation index (Townsend)
Previous studies did not detect these patterns because:
- They did not log-transform WBC
- They only used additive models (no interactions)
- Spatial component was not helpful once GP nonlinearities were included

The GP model automatically discovered:

Complex nonlinear shapes
Interactions without explicitly specifying them

This demonstrates the flexibility and modeling power of latent GP models.

Summary

Latent Gaussian process models extend GP regression to non-Gaussian likelihoods by introducing a latent function with a GP prior. Because sampling from the posterior is difficult due to high dimensionality and strong correlations, specialized GP samplers or Gaussian approximations (Laplace, EP, variational) are used. These approximations allow efficient hyperparameter inference and prediction.

The leukemia survival example illustrates the power of GPs in uncovering nonlinear predictors and interactions without pre-specification. Nonlinear effects of WBC and its interaction with socioeconomic deprivation were discovered automatically, improving predictive performance over simpler additive models.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.

Latent Gaussian process models

1. Structure of Latent GP Models

Latent Function

Optional: Separate GPs for Location and Shape

2. Posterior Structure

3. GP-Specific MCMC Samplers

4. Gaussian Posterior Approximation (Laplace Approximation)

Predictive Distribution

5. Approximate Marginal Likelihood via Laplace’s Method

6. Expectation Propagation (EP) and Variational Inference

Expectation Propagation (EP)

Variational Approximation

7. Example: Leukemia Survival Times (AML dataset)

Predictors

7.1 Likelihood: Log-Logistic Model

7.2 Latent Function Model

7.3 Priors

7.4 Inference Approach

Cross-Validation Performance

7.5 Discovered Patterns

Summary

Like this:

Related

Leave a ReplyCancel reply

1. Structure of Latent GP Models

Latent Function

Optional: Separate GPs for Location and Shape

2. Posterior Structure

3. GP-Specific MCMC Samplers

4. Gaussian Posterior Approximation (Laplace Approximation)

Predictive Distribution

5. Approximate Marginal Likelihood via Laplace’s Method

6. Expectation Propagation (EP) and Variational Inference

Expectation Propagation (EP)

Variational Approximation

7. Example: Leukemia Survival Times (AML dataset)

Predictors

7.1 Likelihood: Log-Logistic Model

7.2 Latent Function Model

7.3 Priors

7.4 Inference Approach

Cross-Validation Performance

7.5 Discovered Patterns

Summary

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from Your Gateway to Data Mastery