Motivation

While Dirichlet process (DP) and Dirichlet process mixture (DPM) models are often introduced through density estimation, their primary value lies in relaxing parametric assumptions within hierarchical models. These models allow probability distributions themselves to be treated as random objects, enabling flexible borrowing of information across subjects, groups, and time while maintaining Bayesian coherence.


Nonparametric Error Distributions in Regression

Consider a linear regression model

y_i = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim f,

where the error distribution f is unknown. Classical approaches assume a Gaussian form, while robust alternatives replace this with a Student-t distribution via a scale mixture of normals.

A more flexible approach models the error distribution nonparametrically using a Dirichlet process mixture (DPM). For example, a scale mixture formulation is

\varepsilon_i \mid \phi_i \sim \mathcal{N}(0, \phi_i^{-1}), \qquad \phi_i \sim P, \qquad P \sim \mathrm{DP}(\alpha P_0),

where P_0 is chosen to center the prior on a Student-t distribution. This preserves robustness while allowing deviations from the parametric form. However, the resulting distribution remains unimodal and symmetric.
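To make the construction concrete, here is a minimal simulation sketch of this scale mixture, assuming a truncated stick-breaking approximation to the DP (truncation level H) and the Gamma(ν/2, ν/2) base measure that centers the prior on a t_ν error law; all names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_stick_breaking(alpha, base_sampler, H=50):
    """Truncated stick-breaking draw from P ~ DP(alpha * P0)."""
    v = rng.beta(1.0, alpha, size=H)                            # stick fractions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # mixture weights
    w /= w.sum()                                                # absorb truncated mass
    return w, base_sampler(H)                                   # atoms i.i.d. from P0

# Base measure P0 = Gamma(nu/2, rate nu/2) on the precisions phi,
# so the prior on the error density is centered on a Student-t_nu.
alpha, nu = 1.0, 4.0
w, phi = dp_stick_breaking(alpha, lambda H: rng.gamma(nu / 2, 2 / nu, size=H))

# Sample errors: pick an atom by its weight, then draw N(0, 1/phi).
idx = rng.choice(len(w), size=1000, p=w)
eps = rng.normal(0.0, 1.0 / np.sqrt(phi[idx]))
```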

Greater flexibility is obtained by using a location mixture,

\varepsilon_i \mid \mu_i, \tau \sim \mathcal{N}(\mu_i, \tau^{-1}), \qquad \mu_i \sim P, \qquad P \sim \mathrm{DP}(\alpha P_0),

with P_0 typically Gaussian. This removes unimodality and symmetry constraints entirely.
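Under the same truncation assumptions, the location-mixture variant only changes what the atoms are; a brief sketch reusing dp_stick_breaking from above (τ and s_0 are illustrative hyperparameters):

```python
# Atoms are now component means mu_h ~ P0 = N(0, s0^2); the residual
# precision tau is shared. The induced error density can be multimodal
# and asymmetric, unlike the scale mixture.
tau, s0 = 4.0, 2.0
w, mu_atoms = dp_stick_breaking(1.0, lambda H: rng.normal(0.0, s0, size=H))
idx = rng.choice(len(w), size=1000, p=w)
eps = rng.normal(mu_atoms[idx], 1.0 / np.sqrt(tau))
```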


Nonparametric Distributions for Group-Varying Parameters

In hierarchical models with subject-specific parameters, uncertainty about the distribution of those parameters can be handled nonparametrically. For example, in a one-way ANOVA model,

y_{ij} = \mu_i + \varepsilon_{ij},

placing a Dirichlet process prior on the distribution of μ_i,

\mu_i \sim P, \qquad P \sim \mathrm{DP}(\alpha P_0),

induces a latent clustering structure:

\mu_i = \mu^{*}_{S_i}, \qquad \Pr(S_i = c) = \pi_c,

with cluster-specific parameters μ*_c drawn from P_0. Subjects are probabilistically grouped into an unknown number of clusters, allowing the data to determine how many distinct latent parameter values are needed.
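The induced clustering can be seen directly from the Pólya-urn (Chinese restaurant process) representation of the DP. A small sketch with illustrative settings, reusing the numpy setup above:

```python
def crp_assignments(n, alpha):
    """Sequential cluster assignments S_i from the Chinese restaurant process."""
    counts, S = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)  # existing clusters, then a new one
        probs /= probs.sum()
        c = rng.choice(len(probs), p=probs)
        if c == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[c] += 1            # join an existing cluster
        S.append(c)
    return np.array(S)

S = crp_assignments(n=30, alpha=1.0)
mu_star = rng.normal(0.0, 1.0, size=S.max() + 1)  # cluster means mu*_c ~ P0 = N(0, 1)
mu = mu_star[S]                                   # subject-specific means mu_i = mu*_{S_i}
```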

This approach raises identifiability issues when the number of observations per subject is small, since variability in μ_i and residual variability may be confounded. These issues can be mitigated by also modeling the residual distribution nonparametrically.


Functional Data Analysis via Dirichlet Processes

Functional observations are modeled as noisy realizations of smooth subject-specific trajectories:

y_{ij} \sim \mathcal{N}(f_i(t_{ij}), \sigma^2).

Each function is represented using a basis expansion,

f_i(t) = \sum_{h=1}^{H} \theta_{ih} b_h(t),

with subject-specific coefficient vectors θ_i. Placing a Dirichlet process prior on the distribution of coefficients,

\theta_i \sim P, \qquad P \sim \mathrm{DP}(\alpha P_0),

induces functional clustering:

f_i(t) = f^{*}_{S_i}(t), \qquad f^{*}_c(t) = b(t) \theta^{*}_c, \qquad \theta^{*}_c \sim P_0.

All subjects assigned to cluster c share the same underlying function. By choosing P_0 appropriately, cluster-specific basis selection is enabled.
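A sketch of the induced functional clustering, reusing crp_assignments from above and assuming an illustrative Gaussian-bump basis:

```python
H, n = 8, 20
grid = np.linspace(0.0, 1.0, 100)
centers = np.linspace(0.0, 1.0, H)
B = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / 0.15) ** 2)  # 100 x H basis matrix

S = crp_assignments(n=n, alpha=1.0)                       # functional cluster labels
theta_star = rng.normal(0.0, 1.0, size=(S.max() + 1, H))  # theta*_c ~ P0 = N(0, I)
F = B @ theta_star[S].T   # column i is f_i(t) = f*_{S_i}(t) on the grid
```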

Two common choices for P_0 are:

  1. Spike-and-slab priors, which allow exact basis selection via point masses at zero (see the sketch after this list).
  2. Heavy-tailed shrinkage priors (e.g., normal-gamma or Cauchy-type), which encourage small coefficients without forcing exact zeros and allow efficient block updates in MCMC.
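For the first choice, a spike-and-slab base measure can be sketched as follows (p and slab_sd are illustrative); plugging it in for P_0 zeroes out unused basis functions within each cluster:

```python
def spike_slab(H, p=0.5, slab_sd=1.0):
    """Draw theta*_c from a spike-and-slab P0: exact zeros with prob 1 - p."""
    keep = rng.random(H) < p
    return np.where(keep, rng.normal(0.0, slab_sd, size=H), 0.0)

# Replace the Gaussian P0 in the functional sketch above with:
theta_star = np.stack([spike_slab(H) for _ in range(S.max() + 1)])
```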

Hierarchical Dependence Across Random Probability Measures

Nested Dirichlet Process (NDP)

In the nested Dirichlet process, group-specific distributions P_j are themselves clustered. Each P_j is drawn from a DP whose base measure is the law of a Dirichlet process:

P_j \sim Q, \qquad Q \sim \mathrm{DP}\bigl(\alpha \, \mathrm{DP}(\beta P_0)\bigr),

so the atoms of Q are entire distributions, each drawn from DP(β P_0).

This induces clustering of entire distributions, so that for distinct groups j ≠ j′,

\Pr(P_j = P_{j'}) = \frac{1}{1+\alpha}.

Groups assigned to different clusters have completely independent atoms.
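A simulation sketch of the NDP, reusing dp_stick_breaking, with the outer DP truncated at K atoms (all settings illustrative):

```python
def ndp_groups(J, alpha, beta, K=50):
    """Each group picks an entire distribution P*_k ~ DP(beta * P0)."""
    q, _ = dp_stick_breaking(alpha, lambda H: np.zeros(H), H=K)   # outer weights only
    dists = [dp_stick_breaking(beta, lambda H: rng.normal(size=H))
             for _ in range(K)]                                   # inner (weights, atoms) pairs
    labels = rng.choice(K, size=J, p=q)   # group j uses dists[labels[j]]
    return labels, dists

labels, dists = ndp_groups(J=5, alpha=1.0, beta=1.0)
# Groups with equal labels share an identical distribution; with alpha = 1,
# any two groups coincide with probability 1 / (1 + 1) = 1/2.
```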


Hierarchical Dirichlet Process (HDP)

In contrast, the HDP draws each group-specific distribution from a DP whose base measure is a single shared draw from another DP,

P_j \sim \mathrm{DP}(\alpha P_{00}), \qquad P_{00} \sim \mathrm{DP}(\beta P_0).

Since P_{00} is almost surely discrete, all groups share the same set of global atoms but with group-specific weights. As a result, for j ≠ j′,

\Pr(P_j = P_{j'}) = 0,

even though distributions are related through shared support. The HDP is therefore appropriate when groups should share mixture components but not identical distributions.
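A truncated sketch of this, reusing dp_stick_breaking: when the shared draw P_00 has finitely many atoms with weights w00, each group's weight vector is exactly Dirichlet(α · w00) distributed (this is the DP with an atomic base measure, a truncation-level device rather than the full HDP):

```python
beta_conc, alpha_h, J = 1.0, 5.0, 4
w00, atoms00 = dp_stick_breaking(beta_conc, lambda H: rng.normal(size=H))  # shared atoms/weights
pi = rng.dirichlet(alpha_h * w00 + 1e-10, size=J)  # group weights pi_j ~ Dirichlet(alpha * w00)

# Every row of pi puts its mass on the same atoms00, but the rows differ
# almost surely, so groups share support while Pr(P_j = P_j') = 0.
```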


Convex Mixtures of Random Probability Measures

An alternative to DP-based dependence is to use convex combinations of random probability measures. For example,

P_c = \pi G_0 + (1-\pi) G_c, \qquad G_c \sim \mathrm{DP}(\alpha G_0), \qquad \pi \sim \mathrm{Beta}(a, b).

This formulation decomposes group-level variability into a global component and a group-specific deviation, analogous to random-effects models but defined in the space of probability measures. Marginally, P_c does not follow a DP, so this construction falls outside the dependent Dirichlet process (DDP) class.
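A sampling sketch for the convex mixture, again with truncated DPs; assuming for illustration that G_0 is itself a truncated DP draw with a standard normal base:

```python
a, b, alpha = 2.0, 2.0, 1.0
pi_mix = rng.beta(a, b)

def sample_from(w, th, n):
    """Draw n points from a discrete measure with weights w and atoms th."""
    return th[rng.choice(len(w), size=n, p=w)]

w0, th0 = dp_stick_breaking(alpha, lambda H: rng.normal(size=H))       # global G0 (illustrative)
wc, thc = dp_stick_breaking(alpha, lambda H: sample_from(w0, th0, H))  # G_c ~ DP(alpha * G0)

def sample_pc(n):
    """Draw from P_c = pi * G0 + (1 - pi) * G_c by flipping a pi-coin per draw."""
    use_global = rng.random(n) < pi_mix
    out = np.empty(n)
    out[use_global] = sample_from(w0, th0, use_global.sum())
    out[~use_global] = sample_from(wc, thc, (~use_global).sum())
    return out
```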


Dynamic Models for Random Probability Measures

Temporal dependence can be introduced via measure-valued autoregressive models:

P_t = (1-\pi) P_{t-1} + \pi G_t, \qquad G_t \sim \mathrm{DP}(\alpha P_0).

This represents a random walk in the space of probability measures. A limitation is that atoms introduced early persist indefinitely, though with decreasing weights. This can be addressed by placing an HDP prior on P_0, allowing atoms to reappear and disappear over time.
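A sketch of the recursion under truncation, reusing dp_stick_breaking; it makes the stated limitation visible, since old atoms keep nonzero (if geometrically shrinking) weight:

```python
pi_t, T = 0.3, 5
w, atoms = dp_stick_breaking(1.0, lambda H: rng.normal(size=H))   # initial measure P_0-step
for t in range(T):
    w_new, atoms_new = dp_stick_breaking(1.0, lambda H: rng.normal(size=H))  # innovation G_t
    w = np.concatenate(((1.0 - pi_t) * w, pi_t * w_new))  # old weights shrink geometrically
    atoms = np.concatenate((atoms, atoms_new))            # early atoms never disappear
# After T steps the support has grown to (T + 1) * H atoms; an atom born
# at step s still carries weight of order (1 - pi_t)^(T - s).
```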


Summary

  • Dirichlet processes enable nonparametric modeling of distributions within hierarchical models.
  • DPMs generalize finite mixtures by allowing the number of components to grow with the data.
  • NDPs cluster entire distributions; HDPs share atoms across distributions.
  • Convex mixtures provide a flexible alternative that does not enforce DP marginals.
  • Functional data and regression models benefit substantially from DP-based priors.
  • Dynamic extensions allow dependence across time while preserving Bayesian coherence.