Gaussian processes (GPs) can be extended beyond modeling the location or shape of a parametric likelihood. They can also directly define nonparametric density functions or conditional densities, giving far more modeling flexibility. This section describes three major approaches:
- Logistic Gaussian Processes (LGP) for density estimation
- LGP for density regression (conditional density modeling)
- Latent-variable regression models as an alternative GP-based density model
1. Logistic Gaussian Process (LGP) for Density Estimation
1.1 Objective
We want to model a probability density nonparametrically:
- No parametric distribution assumption (e.g., Gaussian, mixture)
- Smooth, flexible density estimated from data
- Automatically normalized and non-negative
Assume observations $y_1, \dots, y_n$ drawn independently from an unknown density $p$.
We construct $p(y)$ from a latent GP function $f(y)$.
1.2 Logistic Transformation to Ensure a Valid Density
The LGP approach defines the density as:
$$p(y) = \frac{\exp(f(y))}{\int \exp(f(y'))\,dy'}$$
This ensures that:
- $p(y) \ge 0$
- $\int p(y)\,dy = 1$
Here:
- $f \sim \text{GP}(m, k)$
- $m$ is often chosen as the log-density of an elicited parametric distribution, such as a Student’s t
This centers the GP around a reasonable default shape.
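The logistic transformation can be checked numerically on a grid. The sketch below is only an illustration: the truncated domain, the Student's $t$ degrees of freedom `nu = 4`, and the sinusoidal deviation standing in for a GP draw are all assumptions made for the example, not part of the model description above.

```python
import numpy as np

def logistic_density(f_vals, dy):
    """Map function values on an evenly spaced grid to a normalized density."""
    w = np.exp(f_vals - f_vals.max())  # subtract the max for numerical stability
    return w / (w.sum() * dy)          # Riemann-sum normalization

# Evenly spaced grid standing in for the (truncated) domain of y.
y_grid = np.linspace(-6.0, 6.0, 601)
dy = y_grid[1] - y_grid[0]

# Mean function m: unnormalized log-density of a Student's t with nu = 4 (assumed).
nu = 4.0
m = -0.5 * (nu + 1.0) * np.log1p(y_grid**2 / nu)

# Hypothetical smooth deviation standing in for one GP draw around m.
f = m + 0.8 * np.sin(1.5 * y_grid)

p = logistic_density(f, dy)
total_mass = p.sum() * dy  # equals 1 up to floating-point error
```

Whatever values `f` takes, the exponential makes `p` non-negative and the division makes it integrate to one on the grid, which is why no constraint needs to be placed on the GP itself.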
1.3 Kernel Choice
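A typical covariance function is the squared exponential, written here in one common parameterization:
$$k(y, y') = \tau^2 \exp\left(-\frac{(y - y')^2}{2l^2}\right)$$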
- $\tau$ controls overall variation in the density
- $l$ controls smoothness in $y$
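To make the roles of $\tau$ and $l$ concrete, the following sketch draws two zero-mean GP paths with different length-scales. It assumes a squared exponential kernel with the $2l^2$ scaling; the grid, the jitter value, and the helper names `se_kernel` and `draw_gp` are hypothetical choices for illustration.

```python
import numpy as np

def se_kernel(a, b, tau, ell):
    """Squared exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return tau**2 * np.exp(-0.5 * (d / ell) ** 2)

def draw_gp(grid, tau, ell, rng, jitter=1e-6):
    """One zero-mean GP draw on `grid` via a jittered Cholesky factor."""
    K = se_kernel(grid, grid, tau, ell) + jitter * np.eye(grid.size)
    return np.linalg.cholesky(K) @ rng.standard_normal(grid.size)

rng = np.random.default_rng(0)
grid = np.linspace(-3.0, 3.0, 120)

f_smooth = draw_gp(grid, tau=1.0, ell=2.0, rng=rng)  # long length-scale
f_wiggly = draw_gp(grid, tau=1.0, ell=0.2, rng=rng)  # short length-scale

# Total variation of each path: the short length-scale draw moves far more.
tv_smooth = np.abs(np.diff(f_smooth)).sum()
tv_wiggly = np.abs(np.diff(f_wiggly)).sum()
```

The diagonal of the kernel matrix is $\tau^2$, so $\tau$ sets the prior scale of deviations in the log-density, while a smaller `ell` permits sharper local features in the resulting density.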
1.4 Alternative Compactified Representation
Another form uses a zero-mean GP $W(t)$ defined on $t\in[0,1]$:
$$p(y) = \frac{g_0(y)\,\exp(W(G_0(y)))}{\int_0^1 \exp(W(t))\,dt}$$
Where:
- $g_0$: baseline parametric density
- $G_0$: cumulative distribution of $g_0$
Compactification to $[0,1]$ improves smoothness in tail regions.
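A short change of variables shows why a density of this kind is automatically normalized. The derivation assumes the compactified form $p(y) = g_0(y)\exp(W(G_0(y))) / \int_0^1 \exp(W(t))\,dt$ and that $g_0$ is positive on the support of $y$, so that $t = G_0(y)$ is a valid substitution.

```latex
\int_{-\infty}^{\infty} p(y)\,dy
  = \frac{\int_{-\infty}^{\infty} g_0(y)\,\exp\{W(G_0(y))\}\,dy}
         {\int_0^1 \exp\{W(t)\}\,dt}
  = \frac{\int_0^1 \exp\{W(t)\}\,dt}{\int_0^1 \exp\{W(t)\}\,dt}
  = 1,
\qquad \text{using } t = G_0(y),\; dt = g_0(y)\,dy .
```

The normalizing integral is therefore taken over the bounded interval $[0,1]$ rather than the whole real line.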
1.5 Computational Challenge
The denominator $\int \exp(f(y'))\,dy'$ is analytically intractable.
Solutions:
- Approximation via finite basis expansion
- Discretization of the domain
- MCMC for $f$ and hyperparameters
- Laplace approximation for $f$ + quadrature for hyperparameters
A major advantage of LGP:
- Posterior over $f$ is unimodal given hyperparameters
- Hence, Laplace approximation works well
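As an illustration of the discretization route combined with the mode-finding step of a Laplace approximation, the sketch below bins the data on a grid and runs a damped Newton ascent on the log posterior of $f$. It is a minimal example under stated assumptions: zero prior mean, fixed hyperparameters (`tau`, `ell`), a squared exponential kernel, synthetic data, and hypothetical helper names. A full Laplace approximation would additionally use the curvature at the mode, $(K^{-1} + W)^{-1}$, as the approximate posterior covariance.

```python
import numpy as np

def se_kernel(a, b, tau, ell):
    """Squared exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return tau**2 * np.exp(-0.5 * (d / ell) ** 2)

def log_post(a, K, counts):
    """Unnormalized log posterior of f = K a on the grid (zero prior mean)."""
    f = K @ a
    lse = f.max() + np.log(np.exp(f - f.max()).sum())
    return counts @ f - counts.sum() * lse - 0.5 * (a @ f)

def find_mode(counts, K, max_iter=100, tol=1e-8):
    """Damped Newton ascent; returns the mode f_hat and a_hat = K^{-1} f_hat."""
    n, m = counts.sum(), counts.size
    a = np.zeros(m)
    for _ in range(max_iter):
        f = K @ a
        p = np.exp(f - f.max())
        p /= p.sum()
        W = n * (np.diag(p) - np.outer(p, p))  # negative Hessian of the log-likelihood
        a_newton = np.linalg.solve(np.eye(m) + W @ K, W @ f + counts - n * p)
        step, current = 1.0, log_post(a, K, counts)
        while step > 1e-4 and log_post(a + step * (a_newton - a), K, counts) < current:
            step *= 0.5                         # halve until the objective does not drop
        if step <= 1e-4:
            break
        a_next = a + step * (a_newton - a)
        done = np.max(np.abs(K @ (a_next - a))) < tol
        a = a_next
        if done:
            break
    return K @ a, a

# Synthetic bimodal sample (illustrative data only).
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(-1.5, 0.5, 100), rng.normal(1.5, 0.7, 100)])
edges = np.linspace(-5.0, 5.0, 81)
grid = 0.5 * (edges[:-1] + edges[1:])
dy = edges[1] - edges[0]
counts = np.histogram(y, bins=edges)[0].astype(float)

K = se_kernel(grid, grid, tau=3.0, ell=0.6) + 1e-6 * np.eye(grid.size)
f_hat, a_hat = find_mode(counts, K)
p_hat = np.exp(f_hat - f_hat.max())
p_hat /= p_hat.sum() * dy                       # density estimate at the mode
```

Because both the discretized log-likelihood and the Gaussian log prior are concave in $f$, the Newton search has a single maximum to find, which is the practical meaning of the unimodality noted above. Working with `a = K^{-1} f` avoids inverting the ill-conditioned kernel matrix directly.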
1.6 Example: Galaxy Speeds and Lake Acidity
Two classical datasets:
- Galaxy speeds (n=82)
- Lake acidity (n=155)
Model:
- GP with Matérn(ν=5/2) kernel
- GP centered on a Gaussian log-density
- Laplace approximation for $f$
- Hyperparameter posterior mode estimation
- Shape constraint (monotone tails) enforced via rejection sampling
LGP produced more flexible densities than mixture-of-Gaussians.
2. Logistic Gaussian Process for Density Regression
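We now generalize LGP to conditional densities $p(y \mid x)$, where $x = (x_1,\dots,x_p)$ is a vector of predictors.
Define:
$$p(y \mid x) = \frac{\exp(f(x, y))}{\int \exp(f(x, y'))\,dy'}$$
The latent function $f$ is a GP over the joint $(p+1)$-dimensional input $(x, y)$:
$$f \sim \text{GP}(m, k)$$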
2.1 Covariance Function
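A common squared exponential form, with $\tau$ again controlling overall variation:
$$k((x,y),(x',y')) = \tau^2 \exp\left(-\sum_{j=1}^{p} \frac{(x_j - x_j')^2}{2l_j^2} - \frac{(y-y')^2}{2l_{p+1}^2}\right)$$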
- Predictor dimensions have length-scales $l_1,\dots,l_p$
- Outcome dimension has its own length-scale $l_{p+1}$
Hyperpriors on $l_j$ allow:
- Automatic variable selection
- Adaptively shrinking unimportant predictors
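The variable-selection effect of per-dimension length-scales can be seen directly in the kernel values. The sketch below assumes a squared exponential kernel with one length-scale per input column (the last column being the outcome); the function name `ard_se_kernel` and the specific length-scale values are illustrative.

```python
import numpy as np

def ard_se_kernel(A, B, tau, lengthscales):
    """Squared exponential kernel with one length-scale per input column.

    A, B: arrays of shape (n, p + 1) whose last column is the outcome y.
    """
    A = np.asarray(A, float) / lengthscales
    B = np.asarray(B, float) / lengthscales
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return tau**2 * np.exp(-0.5 * sq)

# Two predictors plus the outcome; the second predictor gets a huge length-scale.
ls = np.array([1.0, 1e3, 0.5])
base = np.array([[0.0, 0.0, 0.0]])
move_x1 = np.array([[1.0, 0.0, 0.0]])  # shift only the first predictor
move_x2 = np.array([[0.0, 1.0, 0.0]])  # shift only the second predictor

k_x1 = ard_se_kernel(base, move_x1, tau=1.0, lengthscales=ls)[0, 0]
k_x2 = ard_se_kernel(base, move_x2, tau=1.0, lengthscales=ls)[0, 0]
```

Moving along the first predictor reduces the covariance to $\exp(-1/2)$, while the same move along the second predictor leaves it essentially at 1. A predictor whose length-scale is pushed toward large values by its hyperprior therefore has almost no influence on $f$, and hence on $p(y \mid x)$.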
2.2 Compactified Conditional LGP
Another representation:
$$p(y \mid x) = \frac{g_0(y)\,\exp(W(F(x), G_0(y)))}{\int_0^1 \exp(W(F(x), t))\,dt}$$
Where:
- $F(x) = (F_1(x_1),\ldots,F_p(x_p))$ maps predictors to $[-1,1]^p$
- $G_0(y)$ compactifies $y$
- $W$ is a GP over a $(p+1)$-dimensional input
This improves tail behavior and stabilizes computation.
2.3 Computational Issues
Density regression involves a GP over $(x,y)$, which may be high-dimensional.
Requires:
- Approximation to the GP
- Careful finite representation to control computational cost
Methods:
- Laplace approximation
- MCMC
- Grid-based integration for $y$
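Grid-based integration over $y$ can be sketched as follows: draw the latent surface on a product grid of $(x, y)$ values and normalize each row over $y$. This is a prior draw rather than a fitted model, and the grid sizes, hyperparameter values, and kernel form are assumptions chosen to keep the example small.

```python
import numpy as np

def ard_se_kernel(Z, tau, lengthscales):
    """Squared exponential kernel on rows of Z with per-column length-scales."""
    Zs = Z / lengthscales
    sq = ((Zs[:, None, :] - Zs[None, :, :]) ** 2).sum(axis=-1)
    return tau**2 * np.exp(-0.5 * sq)

rng = np.random.default_rng(1)
x_grid = np.linspace(0.0, 1.0, 12)
y_grid = np.linspace(-3.0, 3.0, 40)
dy = y_grid[1] - y_grid[0]

# All (x, y) pairs, ordered so that reshape(len(x_grid), len(y_grid)) is valid.
X, Y = np.meshgrid(x_grid, y_grid, indexing="ij")
Z = np.column_stack([X.ravel(), Y.ravel()])

# One GP draw over the joint input; jitter keeps the Cholesky factor stable.
K = ard_se_kernel(Z, tau=2.0, lengthscales=np.array([0.5, 1.0]))
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(Z)))
f = (L @ rng.standard_normal(len(Z))).reshape(x_grid.size, y_grid.size)

# Normalize over y separately for each x: one conditional density per row.
w = np.exp(f - f.max(axis=1, keepdims=True))
p_cond = w / (w.sum(axis=1, keepdims=True) * dy)
```

Even this small example needs a 480 by 480 covariance matrix, and the size grows multiplicatively with each added predictor, which is why approximations to the GP are required in realistic settings.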
3. Latent-Variable Regression Model (Alternative to LGP)
A newer, simpler alternative models densities via a latent uniform variable:
Model:
$$y_i = \mu(u_i) + \epsilon_i, \quad u_i \sim U(0,1), \quad \epsilon_i \sim N(0, \sigma^2)$$
Where:
- $\mu : [0,1]\to\mathbb{R}$ is a smooth function
- Prior: $\mu \sim \text{GP}(\mu_0, k)$
This induces a nonparametric prior over densities $f$ of $y_i$.
3.1 Why This Is Flexible
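Any density $f_0$ with CDF $F_0$ can be generated by inverse-CDF sampling:
- Draw $u_i \sim U(0,1)$
- Let $y_i = F_0^{-1}(u_i)$
The choice $\mu = F_0^{-1}$ with $\sigma^2 \to 0$ therefore reproduces $f_0$ exactly. Thus, if the prior on $\mu$ is flexible and $\sigma^2$ can shrink toward 0, the model can approximate any density.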
Centering the prior at $\mu_0 = G_0^{-1}$, where $G_0$ is the CDF of a baseline guess density $g_0$, helps in practice.
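The flexibility argument can be checked by simulation. In the sketch below the target is an Exponential(1) density, chosen only because its quantile function has a closed form; $\mu$ is set to that quantile function and the noise scale `sigma` is made small. The target, sample size, and `sigma` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 200_000, 1e-3

def mu(u):
    """Quantile function F_0^{-1} of the Exponential(1) target."""
    return -np.log1p(-u)

# Latent-variable model: y_i = mu(u_i) + eps_i with u_i ~ U(0, 1).
u = rng.uniform(size=n)
y = mu(u) + sigma * rng.standard_normal(n)

# Exponential(1) has mean 1 and variance 1; the sample should be close to both.
sample_mean, sample_var = y.mean(), y.var()
```

With a non-negligible `sigma`, the induced density is the target smoothed by Gaussian noise, so the noise scale acts somewhat like a bandwidth.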
3.2 Computational Method
With the latent $u_i$:
- Conditional on $u_i$, the model becomes standard GP regression
- $\mu$ is updated at unique $u_i$ values from a multivariate Gaussian conditional
- $u_i$ can be updated using data augmentation
- Approximating $U(0,1)$ by a grid enables:
  - Conjugate updating
  - Reduced computational burden
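The grid-based scheme can be sketched as a two-block Gibbs sampler. This is a simplified illustration, not a definitive implementation: $\sigma$ and the kernel hyperparameters are held fixed, the baseline $g_0$ is a Gaussian matched to the sample moments, each $u_i$ is drawn from its discrete full conditional on the grid, and the function name `gibbs_latent_uniform` and all default values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def se_kernel(a, b, tau, ell):
    """Squared exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return tau**2 * np.exp(-0.5 * (d / ell) ** 2)

def gibbs_latent_uniform(y, m=50, sigma=0.3, tau=1.0, ell=0.2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n = y.size
    u_grid = (np.arange(m) + 0.5) / m               # grid approximating U(0, 1)
    # Prior mean mu_0 = G_0^{-1} for a Gaussian baseline matched to the data (assumed).
    g0 = NormalDist(mu=float(y.mean()), sigma=float(y.std()))
    mu0 = np.array([g0.inv_cdf(float(u)) for u in u_grid])
    K = se_kernel(u_grid, u_grid, tau, ell) + 1e-6 * np.eye(m)
    z = rng.integers(m, size=n)                     # grid index of each u_i
    mu = mu0.copy()
    for _ in range(n_iter):
        # 1) mu | u, y: conjugate Gaussian update at the grid points.
        counts = np.bincount(z, minlength=m)
        sums = np.bincount(z, weights=y, minlength=m)
        A = np.diag(counts / sigma**2)
        cov = np.linalg.solve(np.eye(m) + K @ A, K)  # equals (K^{-1} + A)^{-1}
        mean = mu0 + cov @ (sums / sigma**2 - A @ mu0)
        evals, evecs = np.linalg.eigh(0.5 * (cov + cov.T))
        root = np.sqrt(np.clip(evals, 0.0, None))    # clip tiny negative round-off
        mu = mean + evecs @ (root * rng.standard_normal(m))
        # 2) u_i | mu, y_i: discrete full conditional over grid cells (uniform prior).
        logits = -0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2
        z = np.argmax(logits + rng.gumbel(size=(n, m)), axis=1)  # Gumbel-max draw
    return u_grid, mu, z

# Synthetic two-component sample (illustrative data only).
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2.0, 0.5, 80), rng.normal(1.0, 1.0, 120)])
u_grid, mu_draw, z = gibbs_latent_uniform(y, n_iter=50)
```

Restricting the $u_i$ to `m` grid values means $\mu$ only ever needs to be represented at `m` points, so each sweep costs one `m` by `m` linear solve regardless of the sample size.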
4. Extension of the Latent-Variable Model to Density Regression
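Generalize by adding predictors $x_i$:
$$y_i = \mu(u_i, x_i) + \epsilon_i, \quad u_i \sim U(0,1), \quad \epsilon_i \sim N(0,\sigma^2)$$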
Here:
- $\mu$ is a GP over $(u,x)$
- Kernel uses different length-scales for each dimension (including $u$)
This allows:
- Automatic relevance determination for predictors
- Flexible conditional densities $p(y\mid x)$
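To show how a surface $\mu(u, x)$ shapes the conditional density, the sketch below simulates from the model with a fixed, hand-picked $\mu$ rather than a GP draw. The particular surface (location $\sin x$, spread growing with $x$), the noise scale, and the helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Standard normal quantile function, vectorized over NumPy arrays.
_inv_phi = np.vectorize(NormalDist().inv_cdf)

def mu(u, x):
    """Hypothetical mean surface: location sin(x), spread growing with x."""
    return np.sin(x) + (0.5 + x) * _inv_phi(u)

def sample_y_given_x(x, n, sigma, rng):
    """Draw n values from p(y | x) under y = mu(u, x) + eps."""
    u = np.clip(rng.uniform(size=n), 1e-12, 1.0 - 1e-12)  # keep u inside (0, 1)
    return mu(u, x) + sigma * rng.standard_normal(n)

rng = np.random.default_rng(5)
y_at_0 = sample_y_given_x(0.0, 20_000, sigma=0.1, rng=rng)
y_at_1 = sample_y_given_x(1.0, 20_000, sigma=0.1, rng=rng)

sd_at_0, sd_at_1 = y_at_0.std(), y_at_1.std()
```

Here the dependence of $\mu$ on $u$ changes with $x$, so both the location and the spread of $p(y \mid x)$ vary across predictor values. A GP prior on $\mu$ lets such dependence be learned rather than specified.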
Summary
LGP Approach
- Directly models log-density via a GP
- Ensures valid density by logistic transformation
- Flexible and smooth
- Unimodal posterior → Laplace approximation works well
- More flexible than mixture models
LGP for Density Regression
- Extends LGP to model $p(y\mid x)$
- Requires careful computation because of the high-dimensional GP over $(x,y)$
Latent-Variable Regression
- Much simpler computationally
- Uses GP regression with latent uniform variables
- Induces flexible density or conditional density
- Easy to extend to predictors
