Motivation
While Dirichlet process (DP) and Dirichlet process mixture (DPM) models are often introduced through density estimation, their primary value lies in relaxing parametric assumptions within hierarchical models. These models allow probability distributions themselves to be treated as random objects, enabling flexible borrowing of information across subjects, groups, and time while maintaining Bayesian coherence.
Nonparametric Error Distributions in Regression
Consider a linear regression model

$$y_i = x_i^\top \beta + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} f,$$

where the error distribution $f$ is unknown. Classical approaches assume a Gaussian form, $f = \mathrm{N}(0, \sigma^2)$, while robust alternatives replace this with a Student-t distribution via a scale mixture of normals.
A more flexible approach models the error distribution nonparametrically using a Dirichlet process mixture (DPM). For example, a scale mixture formulation is

$$\epsilon_i \sim \int \mathrm{N}(0, \tau^{-1})\, dG(\tau), \qquad G \sim \mathrm{DP}(\alpha G_0),$$

where the base measure $G_0 = \mathrm{Ga}(\nu/2, \nu/2)$ is chosen to center the prior on a Student-t distribution. This preserves robustness while allowing deviations from the parametric form. However, the resulting error distribution remains unimodal and symmetric about zero.
Greater flexibility is obtained by using a location mixture,

$$\epsilon_i \sim \int \mathrm{N}(\mu, \sigma^2)\, dG(\mu), \qquad G \sim \mathrm{DP}(\alpha G_0),$$

with $G_0$ typically Gaussian. This removes the unimodality and symmetry constraints entirely.
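As a concrete illustration, a draw from the location-mixture prior can be simulated via truncated stick-breaking. The sketch below uses numpy; the truncation level, concentration, and base measure $\mathrm{N}(0, 4)$ are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking: w_h = v_h * prod_{l<h} (1 - v_l)."""
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()  # renormalise to absorb the truncation error

# Illustrative hyperparameters (assumptions, not from the text)
alpha, K, sigma = 1.0, 50, 0.5
w = stick_breaking_weights(alpha, K, rng)
mu = rng.normal(0.0, 2.0, size=K)  # atoms from the base measure G0 = N(0, 4)

# Errors from the induced location mixture: sum_h w_h N(mu_h, sigma^2)
z = rng.choice(K, size=1000, p=w)
eps = rng.normal(mu[z], sigma)
```

Because the atoms $\mu_h$ can sit anywhere, the induced error density can be multimodal and skewed, unlike the scale-mixture case.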
Nonparametric Distributions for Group-Varying Parameters
In hierarchical models with subject-specific parameters, uncertainty about the distribution of those parameters can be handled nonparametrically. For example, in a one-way ANOVA model,

$$y_{ij} = \mu_i + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathrm{N}(0, \sigma^2),$$

placing a Dirichlet process prior on the distribution of the subject-specific means $\mu_i$,

$$\mu_i \sim G, \qquad G \sim \mathrm{DP}(\alpha G_0),$$
induces a latent clustering structure:

$$\mu_i = \theta^*_{S_i}, \qquad S_i \in \{1, 2, \ldots\},$$

with cluster labels $S_i$ and cluster-specific parameters $\theta^*_h$. Subjects are probabilistically grouped into an unknown number of clusters, allowing the data to determine how many distinct latent parameter values are needed.
This approach raises identifiability issues when the number of observations per subject is small, since variability in the $\mu_i$ and residual variability in the $\epsilon_{ij}$ may be confounded. These issues can be mitigated by also modeling the residual distribution nonparametrically.
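The latent clustering obtained by marginalizing out $G$ is the Chinese restaurant process: each subject joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to $\alpha$. A minimal sequential sketch (the concentration $\alpha = 1$ is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def crp_assignments(n, alpha, rng):
    """Sequential Chinese-restaurant-process draw of cluster labels."""
    labels = [0]
    counts = [1]  # occupancy of each existing cluster
    for _ in range(1, n):
        # P(existing cluster h) ~ counts[h], P(new cluster) ~ alpha
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # subject opens a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)

labels = crp_assignments(200, alpha=1.0, rng=rng)
n_clusters = labels.max() + 1  # data-determined number of clusters
```

The number of occupied clusters is random and grows only logarithmically with the number of subjects, which is how the data determine the number of distinct latent values.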
Functional Data Analysis via Dirichlet Processes
Functional observations are modeled as noisy realizations of smooth subject-specific trajectories:

$$y_{ij} = f_i(t_{ij}) + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathrm{N}(0, \sigma^2).$$
Each function is represented using a basis expansion,

$$f_i(t) = \sum_{k=1}^{K} \theta_{ik}\, b_k(t),$$

with subject-specific coefficient vectors $\theta_i = (\theta_{i1}, \ldots, \theta_{iK})^\top$. Placing a Dirichlet process prior on the distribution of coefficients,

$$\theta_i \sim G, \qquad G \sim \mathrm{DP}(\alpha G_0),$$
induces functional clustering:

$$f_i(t) = \sum_{k=1}^{K} \theta^*_{S_i k}\, b_k(t).$$

All subjects assigned to cluster $h$ share the same underlying function $f^*_h$. By choosing the base measure $G_0$ appropriately, cluster-specific basis selection is enabled.
Two common choices for $G_0$ are:
- Spike-and-slab priors, which allow exact basis selection via point masses at zero.
- Heavy-tailed shrinkage priors (e.g., normal-gamma or Cauchy-type), which encourage small coefficients without forcing exact zeros and allow efficient block updates in MCMC.
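The induced functional clustering can be sketched in a few lines. Below, the polynomial basis and the two hand-picked coefficient atoms are illustrative assumptions; the point is that subjects with the same cluster label get identical trajectories.

```python
import numpy as np

rng = np.random.default_rng(2)

# Polynomial basis b_k(t) = t^(k-1) on a grid (illustrative choice)
t = np.linspace(0.0, 1.0, 50)
B = np.vander(t, N=4, increasing=True)  # 50 x 4 basis matrix

# Two coefficient atoms theta*_h of G (hand-picked for illustration)
theta_star = np.array([[0.0, 1.0, 0.0, 0.0],    # cluster 1: linear trend
                       [1.0, 0.0, -2.0, 1.0]])  # cluster 2: cubic shape
S = rng.choice(2, size=10, p=[0.6, 0.4])        # subject cluster labels

# f_i = B @ theta*_{S_i}; one column per subject
F = B @ theta_star[S].T
```

With a spike-and-slab $G_0$, some entries of each atom $\theta^*_h$ would be exactly zero, giving each cluster its own active basis subset.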
Hierarchical Dependence Across Random Probability Measures
Nested Dirichlet Process (NDP)
In the nested Dirichlet process, group-specific distributions are themselves clustered:

$$G_j \sim Q, \qquad Q \sim \mathrm{DP}\big(\alpha\, \mathrm{DP}(\beta G_0)\big).$$

This induces clustering of entire distributions, so that for distinct groups $j \neq j'$,

$$\Pr(G_j = G_{j'}) = \frac{1}{1 + \alpha} > 0.$$
Groups assigned to different clusters have completely independent atoms.
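A truncated sketch of the nested construction: the distribution-level atoms are themselves independent truncated DP draws, and each group selects one of them wholesale. All truncation levels and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def stick(a, K, rng):
    """Truncated stick-breaking weights."""
    v = rng.beta(1.0, a, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()

K, L, J = 10, 15, 6  # atoms per distribution, distribution atoms, groups

# Outer DP: weights over whole distributions G*_l
q = stick(2.0, L, rng)
# Inner DPs: each G*_l is an independent truncated DP draw
G_star = [(stick(1.0, K, rng), rng.normal(size=K)) for _ in range(L)]

# Each group picks an entire distribution; equal labels share ALL
# atoms and weights, different labels share none
labels = rng.choice(L, size=J, p=q)
```

This makes the all-or-nothing sharing explicit: two groups either have identical distributions or have no atoms in common.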
Hierarchical Dirichlet Process (HDP)
In contrast, the HDP uses a shared set of global atoms but group-specific weights:

$$G_j \sim \mathrm{DP}(\alpha G_0), \qquad G_0 \sim \mathrm{DP}(\gamma H).$$

As a result,

$$\Pr(G_j = G_{j'}) = 0,$$

even though the distributions are related through their shared support. The HDP is therefore appropriate when groups should share mixture components but not identical distributions.
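A finite-truncation sketch of the HDP's shared-atoms structure, using the standard Dirichlet surrogate $\pi_j \sim \mathrm{Dir}(\alpha \beta)$ for the group-level weights; the truncation level and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def stick(a, K, rng):
    """Truncated stick-breaking weights."""
    v = rng.beta(1.0, a, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()

K, J, gamma, alpha = 30, 3, 2.0, 1.0  # illustrative values

# Global level: shared atoms theta with global weights beta
theta = rng.normal(0.0, 1.0, size=K)
beta = stick(gamma, K, rng)

# Group level: new weights over the SAME atoms, concentrated around
# beta (finite-K Dirichlet surrogate for pi_j ~ DP(alpha, beta))
pi = rng.dirichlet(alpha * beta, size=J)
```

Every group mixes the same atom set `theta` with its own weights `pi[j]`, so components are shared while the group distributions remain almost surely distinct.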
Convex Mixtures of Random Probability Measures
An alternative to DP-based dependence is to use convex combinations of random probability measures. For example,

$$F_j = \epsilon\, F_0 + (1 - \epsilon)\, F^*_j, \qquad F_0,\, F^*_1, \ldots, F^*_J \overset{ind}{\sim} \mathrm{DP}(\alpha G_0).$$

This formulation decomposes group-level variability into a global component $F_0$ and a group-specific deviation $F^*_j$, analogous to random-effects models but defined on the space of probability measures. Marginally, $F_j$ does not follow a DP, so this construction falls outside the dependent Dirichlet process (DDP) class.
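Because both components are discrete, a draw of $F_j$ is obtained by pooling the two atom sets and scaling the weights. A minimal truncated sketch (truncation level and mixing weight are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

def dp_sample(alpha, K, rng):
    """Truncated stick-breaking draw: (weights, atoms from N(0, 1))."""
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum(), rng.normal(0.0, 1.0, size=K)

K, eps = 40, 0.3  # truncation and mixing weight (illustrative)
w0, th0 = dp_sample(1.0, K, rng)  # global component F0
w1, th1 = dp_sample(1.0, K, rng)  # group-specific deviation F*_j

# F_j = eps * F0 + (1 - eps) * F*_j: pool the atoms, scale the weights
w_j = np.concatenate((eps * w0, (1.0 - eps) * w1))
theta_j = np.concatenate((th0, th1))
```

The atoms of $F_0$ appear in every group with the same weights (up to the factor $\epsilon$), which is what produces the shared, random-effects-like component.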
Dynamic Models for Random Probability Measures
Temporal dependence can be introduced via measure-valued autoregressive models:

$$G_t = \pi\, G_{t-1} + (1 - \pi)\, \nu_t, \qquad \nu_t \sim \mathrm{DP}(\alpha G_0).$$

This represents a random walk in the space of probability measures. A limitation is that atoms introduced early persist indefinitely, though with geometrically decreasing weights. This can be addressed by placing an HDP prior on the innovations $\nu_t$, allowing atoms to reappear and disappear over time.
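The recursion and the persistence of early atoms can be seen directly by unrolling a few steps; a truncated sketch with independent DP innovations (truncation, AR weight, and horizon are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

def dp_sample(alpha, K, rng):
    """Truncated stick-breaking draw: (weights, atoms from N(0, 1))."""
    v = rng.beta(1.0, alpha, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum(), rng.normal(0.0, 1.0, size=K)

K, pi, T = 20, 0.8, 5  # truncation, AR weight, horizon (illustrative)
w, theta = dp_sample(1.0, K, rng)  # G_1
weights_hist, atoms_hist = [w], [theta]
for t in range(1, T):
    w_new, th_new = dp_sample(1.0, K, rng)  # innovation nu_t
    # G_t = pi * G_{t-1} + (1 - pi) * nu_t: old atoms persist but
    # their weights shrink geometrically by a factor pi per step
    w = np.concatenate((pi * weights_hist[-1], (1.0 - pi) * w_new))
    theta = np.concatenate((atoms_hist[-1], th_new))
    weights_hist.append(w)
    atoms_hist.append(theta)
```

After $T$ steps the weight of an atom introduced at time 1 has been damped by $\pi^{T-1}$ but is never exactly zero, which is the persistence limitation noted above.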
Summary
- Dirichlet processes enable nonparametric modeling of distributions within hierarchical models.
- DPMs generalize finite mixtures by allowing the number of components to grow with the data.
- NDPs cluster entire distributions; HDPs share atoms across distributions.
- Convex mixtures provide a flexible alternative that does not enforce DP marginals.
- Functional data and regression models benefit substantially from DP-based priors.
- Dynamic extensions allow dependence across time while preserving Bayesian coherence.
