Gaussian processes (GPs) can be extended beyond modeling the location or shape of a parametric likelihood. They can also directly define nonparametric density functions or conditional densities, giving far more modeling flexibility. This section describes three major approaches:

  1. Logistic Gaussian Processes (LGP) for density estimation
  2. LGP for density regression (conditional density modeling)
  3. Latent-variable regression models as an alternative GP-based density model

1. Logistic Gaussian Process (LGP) for Density Estimation

1.1 Objective

We want to model a probability density nonparametrically:

  • No parametric distribution assumption (e.g., Gaussian, mixture)
  • Smooth, flexible density estimated from data
  • Automatically normalized and non-negative

Assume observations: $y_i \stackrel{\text{iid}}{\sim} p$

We construct $p(y)$ from a latent GP function $f(y)$.


1.2 Logistic Transformation to Ensure a Valid Density

The LGP approach defines the density as: $p(y \mid f) = \frac{e^{f(y)}}{\int e^{f(y')} \, dy'}$

This ensures that:

  1. $p(y) \ge 0$
  2. $\int p(y)\,dy = 1$

Here:

  • $f \sim \text{GP}(m, k)$
  • $m$ is often chosen as the log-density of an elicited parametric distribution, such as a Student’s t

This centers the GP around a reasonable default shape.
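The logistic transformation can be illustrated on a discretized domain. The following is a minimal sketch, assuming an evenly spaced grid and a Riemann-sum approximation of the integral; the helper names, the grid limits, the degrees of freedom `nu=4`, and the sinusoidal perturbation standing in for a GP draw are all illustrative assumptions rather than part of the method.

```python
import numpy as np

def lgp_density_on_grid(f_vals, grid):
    """Map latent values f(y_j) on an evenly spaced grid to a normalized density."""
    dy = grid[1] - grid[0]
    # Subtract the maximum before exponentiating to avoid overflow.
    w = np.exp(f_vals - np.max(f_vals))
    # Riemann-sum approximation of the integral of exp(f).
    return w / (w.sum() * dy)

def student_t_log_density(y, nu=4.0):
    """Student's t log-density up to a constant (the constant cancels on normalization)."""
    return -0.5 * (nu + 1.0) * np.log1p(y**2 / nu)

grid = np.linspace(-6.0, 6.0, 241)
m = student_t_log_density(grid)      # mean function m(y)
f = m + 0.5 * np.sin(2.0 * grid)     # stand-in for one GP draw centered on m
p = lgp_density_on_grid(f, grid)     # non-negative and sums to 1 on the grid
```

Because the additive constant of $m$ cancels in the ratio, the mean function only needs to be specified up to a constant.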


1.3 Kernel Choice

A typical covariance function: $k(y, y') = \tau^2 \exp\left(-\frac{(y-y')^2}{2l^2}\right)$

  • $\tau$ controls overall variation in the density
  • $l$ controls smoothness in $y$
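To see the roles of $\tau$ and $l$, one can evaluate the squared exponential covariance on a grid and draw latent functions from the prior. This is a sketch under assumed values (`tau=1.0`, `length=0.7`, a 50-point grid); the jitter term is a common numerical safeguard and is not part of the kernel.

```python
import numpy as np

def sq_exp_kernel(y1, y2, tau=1.0, length=1.0):
    """Squared exponential covariance between two 1-D arrays of inputs."""
    d = y1[:, None] - y2[None, :]
    return tau**2 * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
grid = np.linspace(-3.0, 3.0, 50)
K = sq_exp_kernel(grid, grid, tau=1.0, length=0.7)

# A small diagonal jitter keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(grid.size))
f_draws = L @ rng.standard_normal((grid.size, 3))   # three zero-mean prior draws
```

Larger `tau` produces draws that deviate more from the mean function; larger `length` produces smoother draws.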

1.4 Alternative Compactified Representation

Another form uses a zero-mean GP $W(t)$ defined on $t\in[0,1]$: $p(y) = \frac{ g_0(y) \, e^{W(G_0(y))}} {\int g_0(v)\, e^{W(G_0(v))}\, dv}$

Where:

  • $g_0$: baseline parametric density
  • $G_0$: cumulative distribution of $g_0$

Compactification to $[0,1]$ improves smoothness in tail regions.
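One consequence of the compactified form is worth spelling out. Assuming $G_0$ is a continuous CDF with density $g_0$, the substitution $t = G_0(v)$ rewrites the normalizing integral as an integral over the unit interval:

```latex
t = G_0(v), \qquad dt = g_0(v)\,dv
\quad\Longrightarrow\quad
\int g_0(v)\, e^{W(G_0(v))}\, dv \;=\; \int_0^1 e^{W(t)}\, dt .
```

The normalizing constant therefore depends on $W$ only through its values on $[0,1]$, whatever the support of $y$.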


1.5 Computational Challenge

The denominator: $\int e^{f(y')} \, dy'$

is analytically intractable.
Solutions:

  • Approximation via finite basis expansion
  • Discretization of the domain
  • MCMC for $f$ and hyperparameters
  • Laplace approximation for $f$ + quadrature for hyperparameters

A major advantage of LGP:

  • Posterior over $f$ is unimodal given hyperparameters
  • Hence, Laplace approximation works well
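The mode-finding step of a Laplace approximation can be sketched by combining two of the ideas above: discretize the domain into bins, then run Newton iterations on the log-posterior of $f$ with hyperparameters held fixed. Everything specific here is an assumption made for illustration: the binned (multinomial) likelihood, the simulated data, the kernel settings, the Gaussian-shaped mean function, and the step-halving safeguard. The parameterization $f = m + Ka$ is used only to avoid inverting $K$.

```python
import numpy as np

def sq_exp_kernel(y, tau, length):
    d = y[:, None] - y[None, :]
    return tau**2 * np.exp(-0.5 * (d / length) ** 2)

def log_softmax(f):
    z = f - f.max()
    return z - np.log(np.exp(z).sum())

def lgp_posterior_mode(counts, m, K, max_iter=100, tol=1e-8):
    """Newton iterations for the posterior mode of f on a grid, with f = m + K a."""
    N, J = counts.sum(), counts.size
    a, f = np.zeros(J), m.copy()

    def objective(f, a):
        # Binned log-likelihood plus Gaussian log-prior, constants dropped.
        return counts @ log_softmax(f) - 0.5 * a @ (f - m)

    obj = objective(f, a)
    for _ in range(max_iter):
        p = np.exp(log_softmax(f))
        g = counts - N * p                      # gradient of the log-likelihood
        W = N * (np.diag(p) - np.outer(p, p))   # negative Hessian of the log-likelihood
        a_new = np.linalg.solve(np.eye(J) + W @ K, W @ (f - m) + g)
        step = 1.0
        while step > 1e-6:                      # step halving guards against overshoot
            a_try = a + step * (a_new - a)
            f_try = m + K @ a_try
            obj_try = objective(f_try, a_try)
            if obj_try >= obj:
                break
            step *= 0.5
        else:
            break                               # no improving step was found
        change = np.max(np.abs(f_try - f))
        a, f, obj = a_try, f_try, obj_try
        if change < tol:
            break
    return f, obj

# Simulated bimodal data (illustrative only).
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-1.5, 0.5, 60), rng.normal(1.5, 0.7, 40)])
edges = np.linspace(-5.0, 5.0, 61)
grid = 0.5 * (edges[:-1] + edges[1:])
counts, _ = np.histogram(y, bins=edges)

m = -0.5 * grid**2 / 4.0                        # Gaussian log-density shape (sd 2)
K = sq_exp_kernel(grid, tau=1.0, length=1.0)
f_hat, obj_hat = lgp_posterior_mode(counts, m, K)
dy = grid[1] - grid[0]
density = np.exp(log_softmax(f_hat)) / dy       # density estimate at the mode
```

A full Laplace approximation would additionally use the curvature at the mode as the approximate posterior covariance of $f$ and would handle the hyperparameters separately; this sketch stops at the mode.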

1.6 Example: Galaxy Speeds and Lake Acidity

Two classical datasets:

  1. Galaxy speeds (n=82)
  2. Lake acidity (n=155)

Model:

  • GP with Matérn(ν=5/2) kernel
  • GP centered on a Gaussian log-density
  • Laplace approximation for $f$
  • Hyperparameter posterior mode estimation
  • Shape constraint (monotone tails) enforced via rejection sampling

LGP produced more flexible densities than mixture-of-Gaussians.


2. Logistic Gaussian Process for Density Regression

We now generalize LGP to conditional densities $p(y \mid x)$.

Define: $p(y \mid x) = \frac{e^{f(x,y)}}{\int e^{f(x,y')} \, dy'}$

The latent function is: $f \sim \text{GP}(0,k)$
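The key point of the conditional construction is that normalization happens over $y$ only, separately for each value of $x$. A minimal sketch, assuming $f$ has been evaluated on a rectangular grid; the quadratic surface below is a hypothetical stand-in for a GP draw.

```python
import numpy as np

def conditional_density_on_grid(f_xy, y_grid):
    """f_xy[i, j] = f(x_i, y_j); each row is normalized over y."""
    dy = y_grid[1] - y_grid[0]
    # Row-wise max subtraction for numerical stability.
    w = np.exp(f_xy - f_xy.max(axis=1, keepdims=True))
    return w / (w.sum(axis=1, keepdims=True) * dy)

x_grid = np.linspace(0.0, 1.0, 5)
y_grid = np.linspace(-4.0, 4.0, 161)
# Stand-in latent surface: the location of y shifts linearly with x.
f_xy = -0.5 * (y_grid[None, :] - 2.0 * x_grid[:, None]) ** 2
p_y_given_x = conditional_density_on_grid(f_xy, y_grid)
```

Each row of `p_y_given_x` is a valid density in $y$ for the corresponding $x$.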


2.1 Covariance Function

A common squared exponential form: $k((x,y),(x',y')) = \tau^2 \exp\left( -\sum_{j=1}^p \frac{(x_j-x'_j)^2}{2l_j^2} -\frac{(y-y')^2}{2l_{p+1}^2} \right)$

  • Predictor dimensions have length-scales $l_1,\dots,l_p$
  • Outcome dimension has its own length-scale $l_{p+1}$

Hyperpriors on $l_j$ allow:

  • Automatic variable selection
  • Adaptively shrinking unimportant predictors
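The variable-selection effect comes from the length-scales: as $l_j$ grows, the $j$-th predictor stops influencing the covariance. The sketch below checks this numerically; the function name, the random inputs, and the value `1e6` used as a "very large" length-scale are illustrative assumptions.

```python
import numpy as np

def ard_sq_exp_kernel(Z1, Z2, tau, lengths):
    """Rows of Z are (x_1, ..., x_p, y); lengths holds l_1, ..., l_{p+1}."""
    lengths = np.asarray(lengths, dtype=float)
    A, B = Z1 / lengths, Z2 / lengths
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return tau**2 * np.exp(-0.5 * sq)

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))                       # p = 2 predictors plus y

# A huge length-scale on x_2 makes the kernel effectively ignore that predictor.
K_drop = ard_sq_exp_kernel(Z, Z, 1.0, [1.0, 1e6, 1.0])
K_ref = ard_sq_exp_kernel(Z[:, [0, 2]], Z[:, [0, 2]], 1.0, [1.0, 1.0])
```

Here `K_drop` and `K_ref` agree to within numerical tolerance, which is the sense in which a hyperprior that lets $l_j$ become large can switch a predictor off.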

2.2 Compactified Conditional LGP

Another representation: $p(y\mid x) = \frac{ g_0(y)\,e^{W(F(x),G_0(y))}} {\int g_0(v)\,e^{W(F(x),G_0(v))}\,dv}$

Where:

  • $F(x) = (F_1(x_1),\ldots,F_p(x_p))$ maps predictors to $[-1,1]^p$
  • $G_0(y)$ compactifies $y$
  • $W$ is a GP over a $(p+1)$-dimensional input

This improves tail behavior and stabilizes computation.


2.3 Computational Issues

Density regression places a GP over the $(p+1)$-dimensional input $(x,y)$, which becomes expensive as the number of predictors grows.
This requires:

  • Approximation to the GP
  • Careful finite representation to control computational cost

Methods:

  • Laplace approximation
  • MCMC
  • Grid-based integration for $y$

3. Latent-Variable Regression Model (Alternative to LGP)

A newer, simpler alternative models densities via a latent uniform variable:

Model

$y_i \sim N(\mu(u_i), \sigma^2), \quad u_i \sim \text{Uniform}(0,1)$

Where:

  • $\mu : [0,1]\to\mathbb{R}$ is a smooth function
  • Prior: $\mu \sim \text{GP}(\mu_0, k)$

This induces a nonparametric prior over the marginal density of $y_i$.
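Marginalizing over the latent $u_i$ gives the induced density $\int_0^1 N(y \mid \mu(u), \sigma^2)\,du$, a continuous mixture of Gaussians. A sketch of this computation follows, using a midpoint rule over $u$; the particular `mu_fn` (a scaled tanh) and `sigma=0.4` are hypothetical choices made only to produce a visibly bimodal density.

```python
import numpy as np

def induced_density(y, mu_fn, sigma, n_u=400):
    """Approximate the integral over u in (0, 1) of N(y | mu(u), sigma^2)."""
    u = (np.arange(n_u) + 0.5) / n_u                  # midpoints of a grid on (0, 1)
    mu = mu_fn(u)
    z = (np.asarray(y)[:, None] - mu[None, :]) / sigma
    normal_pdf = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return normal_pdf.mean(axis=1)                    # average over the u grid

def mu_fn(u):
    # Hypothetical smooth mu: flat near both ends, steep in the middle.
    return 3.0 * np.tanh(6.0 * (u - 0.5))

y_grid = np.linspace(-6.0, 6.0, 601)
p = induced_density(y_grid, mu_fn, sigma=0.4)
```

Regions where $\mu$ is flat collect probability mass, and regions where it is steep spread mass thinly, so a single smooth $\mu$ can already yield a multimodal density.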


3.1 Why This Is Flexible

Any density $f_0$ can be generated via:

  1. Draw $u_i \sim U(0,1)$
  2. Let $y_i = F_0^{-1}(u_i)$, where $F_0$ is the CDF of $f_0$

Thus, if $\mu$ is flexible enough to approximate $F_0^{-1}$ and $\sigma^2$ can shrink toward 0, the model can approximate any density.

Centering:

  • $\mu_0 = G_0^{-1}$
    where $G_0$ is the CDF of a baseline guess density $g_0$

helps in practice.
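The inverse-CDF argument can be checked by simulation. The sketch below assumes, purely for illustration, that the target $f_0$ is the Exponential(1) density, so that $F_0^{-1}(u) = -\log(1-u)$, and sets $\mu = F_0^{-1}$ with a small noise scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

u = rng.uniform(size=n)            # latent uniforms
mu = -np.log1p(-u)                 # mu(u) = F_0^{-1}(u) for the Exponential(1) target
sigma = 0.01                       # small noise, so y is close to F_0^{-1}(u)
y = mu + sigma * rng.standard_normal(n)

# Sample summaries should be close to the Exponential(1) values (mean 1, median log 2).
sample_mean, sample_median = y.mean(), np.median(y)
```

As `sigma` shrinks, the simulated `y` values become draws from the target itself; with a larger `sigma` the induced density is a smoothed version of it.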


3.2 Computational Method

With the latent $u_i$:

  • Conditional on $u_i$, the model becomes standard GP regression
  • $\mu$ is updated at unique $u_i$ values from a multivariate Gaussian conditional
  • $u_i$ can be updated using data augmentation
  • Approximating $U(0,1)$ by a grid enables:
    • Conjugate updating
    • Reduced computational burden
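With $U(0,1)$ replaced by a grid, the full conditional of each $u_i$ is a discrete distribution with weights proportional to $N(y_i \mid \mu(u_k), \sigma^2)$. The following sketch covers only that one update, assuming $\mu$ is already available at the grid points; the linear `mu_grid`, the grid size, and the function name are hypothetical, and the update of $\mu$ itself is not shown.

```python
import numpy as np

def sample_u_on_grid(y, mu_grid, u_grid, sigma, rng):
    """Draw each u_i from its discrete full conditional on the grid."""
    # Log-weights: Gaussian log-likelihood up to a constant; the uniform
    # prior over grid points contributes only another constant.
    logw = -0.5 * ((y[:, None] - mu_grid[None, :]) / sigma) ** 2
    logw -= logw.max(axis=1, keepdims=True)
    probs = np.exp(logw)
    probs /= probs.sum(axis=1, keepdims=True)
    # Row-wise inverse-CDF sampling from the discrete distribution.
    cdf = np.cumsum(probs, axis=1)
    r = rng.uniform(size=(y.size, 1))
    idx = np.minimum((r > cdf).sum(axis=1), u_grid.size - 1)
    return u_grid[idx], probs

rng = np.random.default_rng(0)
u_grid = (np.arange(50) + 0.5) / 50
mu_grid = 4.0 * u_grid - 2.0                 # hypothetical current value of mu on the grid
y = np.array([mu_grid[5], mu_grid[30]])      # two observations lying exactly on mu
u_new, probs = sample_u_on_grid(y, mu_grid, u_grid, sigma=0.1, rng=rng)
```

Because every $u_i$ takes one of a fixed set of values, $\mu$ only ever needs to be represented at the grid points, which is what keeps the Gaussian conditional for $\mu$ small.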

3.3 Extension to Density Regression

Generalize by adding predictors: $y_i \sim N(\mu(u_i, x_i), \sigma^2), \quad u_i \sim U(0,1)$

Here:

  • $\mu$ is a GP over $(u,x)$
  • Kernel uses different length-scales for each dimension (including $u$)

This allows:

  • Automatic relevance determination for predictors
  • Flexible conditional densities $p(y\mid x)$

Summary

LGP Approach

  • Directly models log-density via a GP
  • Ensures valid density by logistic transformation
  • Flexible and smooth
  • Unimodal posterior → Laplace approximation works well
  • More flexible than mixture models

LGP for Density Regression

  • Extends LGP to model $p(y\mid x)$
  • Requires careful computation because of the high-dimensional GP over $(x,y)$

Latent-Variable Regression

  • Much simpler computationally
  • Uses GP regression with latent uniform variables
  • Induces flexible density or conditional density
  • Easy to extend to predictors