Beyond density estimation

1. Nonparametric residual distributions in regression

Density estimation has primarily served as a pedagogical entry point. The main strength of Dirichlet process mixture (DPM) models lies in relaxing parametric assumptions inside hierarchical models, especially for error distributions.

Consider the linear regression model
$y_i = X_i\beta + \varepsilon_i$, with $\varepsilon_i \sim f$.
Standard regression assumes a parametric form for $f$, often Gaussian.

A common robust alternative is the $t$ distribution, represented as a scale mixture:
$\varepsilon_i \sim N(0, \phi_i^{-1}\sigma^2)$ with $\phi_i \sim \text{Gamma}(\nu/2, \nu/2)$.
Although this handles heavy tails, the error density is still symmetric and unimodal, with tail weight controlled by the single parameter $\nu$.
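The scale-mixture representation can be checked by simulation. The following is a minimal sketch, assuming NumPy; the function name `sample_t_scale_mixture` and the values $\nu = 5$, $\sigma = 2$ are illustrative choices, not part of the model above.

```python
import numpy as np

def sample_t_scale_mixture(n, nu, sigma, rng):
    """Draw n errors from t_nu(0, sigma^2) via the normal scale mixture."""
    # phi_i ~ Gamma(shape = nu/2, rate = nu/2); NumPy takes scale = 1/rate.
    phi = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # eps_i | phi_i ~ N(0, sigma^2 / phi_i)
    return rng.normal(loc=0.0, scale=sigma / np.sqrt(phi), size=n)

rng = np.random.default_rng(0)
nu, sigma = 5.0, 2.0  # illustrative values
eps = sample_t_scale_mixture(200_000, nu, sigma, rng)

# For nu > 2 the variance of t_nu(0, sigma^2) is sigma^2 * nu / (nu - 2).
print("sample variance:", eps.var())
print("theoretical variance:", sigma**2 * nu / (nu - 2.0))
```

The two printed numbers should be close, since marginalizing over $\phi_i$ recovers the $t_\nu(0, \sigma^2)$ distribution.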

A more flexible approach models the error distribution nonparametrically using a Dirichlet process scale mixture:
$\varepsilon_i \sim N(0, \phi_i^{-1})$, $\phi_i \sim P$, $P \sim DP(\alpha P_0)$,
with $P_0$ chosen as $\text{Gamma}(\nu/2, \nu/2)$ so the prior centers on a $t$ distribution.
The resulting error density is more flexible in its tails but remains symmetric about zero and unimodal.
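As a rough illustration of the Dirichlet process scale mixture, the precisions $\phi_i$ can be simulated from the prior with $P$ integrated out, using the Pólya urn scheme: each new $\phi_i$ is a fresh draw from $P_0$ with probability $\alpha/(\alpha + i - 1)$ and otherwise copies an earlier value. This is a sketch of the prior only, not a posterior sampler; the function name and the values of $\alpha$ and $\nu$ are assumptions made for the example.

```python
import numpy as np

def polya_urn_precisions(n, alpha, nu, rng):
    """Draw phi_1..phi_n from DP(alpha * P_0) with P integrated out."""
    phi = np.empty(n)
    for i in range(n):
        # i previous draws exist; a new atom has probability alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            # Fresh draw from P_0 = Gamma(shape = nu/2, rate = nu/2).
            phi[i] = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)
        else:
            # Otherwise copy one of the earlier values, chosen uniformly.
            phi[i] = phi[rng.integers(i)]
    return phi

rng = np.random.default_rng(1)
phi = polya_urn_precisions(500, alpha=1.0, nu=4.0, rng=rng)
# eps_i | phi_i ~ N(0, 1 / phi_i): zero-mean, so symmetry is preserved.
eps = rng.normal(0.0, 1.0 / np.sqrt(phi))
print("distinct precisions:", np.unique(phi).size)
```

Only a handful of distinct precisions appear among the 500 draws, which is the clustering behavior the Dirichlet process induces.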

To remove both symmetry and unimodality, a location mixture of Gaussians is used:
$\varepsilon_i \sim N(\mu_i, \tau^{-1})$, $\mu_i \sim P$, $P \sim DP(\alpha P_0)$, $\tau \sim \text{Gamma}(a_\tau, b_\tau)$,
with $P_0 = N(0, \tau^{-1})$.
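Computation alternates two steps: the mixture is updated as in density estimation, treating the current residuals $y_i - X_i\beta$ as data, and the regression coefficients $\beta$ are then updated in a separate step.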
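The separate update for $\beta$ can be sketched as follows. Given the current locations $\mu_i$ and precision $\tau$, the quantity $y_i - \mu_i$ follows an ordinary Gaussian regression on $X_i$, so $\beta$ has a Gaussian conditional. The prior $\beta \sim N(0, s^2 I)$ used here is an assumption introduced for the example (the text does not specify one), as are the function name `sample_beta` and the simulated data.

```python
import numpy as np

def sample_beta(y, X, mu, tau, prior_var, rng):
    """One draw of beta given locations mu_i and precision tau.

    Assumes the illustrative prior beta ~ N(0, prior_var * I).
    """
    r = y - mu                                  # remove each observation's location
    p = X.shape[1]
    prec = tau * (X.T @ X) + np.eye(p) / prior_var
    cov = np.linalg.inv(prec)
    cov = 0.5 * (cov + cov.T)                   # symmetrize against rounding error
    mean = cov @ (tau * (X.T @ r))
    return rng.multivariate_normal(mean, cov)

# Simulated data with a bimodal error distribution (hypothetical values).
rng = np.random.default_rng(2)
n, p = 2000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
mu = rng.choice([-1.0, 1.5], size=n)            # two error locations
tau = 4.0
y = X @ beta_true + mu + rng.normal(0.0, 1.0 / np.sqrt(tau), size=n)

beta_draw = sample_beta(y, X, mu, tau, prior_var=100.0, rng=rng)
print(beta_draw)
```

In a full sampler the $\mu_i$ would come from the current mixture state rather than being known, and this step would alternate with the mixture update.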

2. Nonparametric distributions for group-level parameters

Hierarchical models often assume normally distributed random effects. For example, in a one-factor ANOVA:
$y_{ij} = \mu_i + \varepsilon_{ij}$, with $\mu_i \sim f$ and $\varepsilon_{ij} \sim g$.

Instead of assuming $f$ is Gaussian, one can place a Dirichlet process prior:
$\mu_i \sim P$, $P \sim DP(\alpha P_0)$.

This induces a latent clustering structure:
$\mu_i = \mu^*_{S_i}$ with $\Pr(S_i = h) = \pi_h$,
where the weights $\{\pi_h\}$ follow a stick-breaking construction.

Subjects sharing the same cluster index $S_i$ share the same underlying parameter $\mu^*_h$.
Because cluster membership is uncertain, the posterior mean of each $\mu_i$ averages over possible allocations and therefore remains subject-specific.
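The clustering structure above can be simulated from the prior with a truncated stick-breaking sketch. The truncation level `H = 50`, the base measure $P_0 = N(0, 3^2)$, and the function name are assumptions made for the example.

```python
import numpy as np

def stick_breaking(alpha, H, rng):
    """Truncated stick-breaking weights pi_1..pi_H for DP(alpha * P_0)."""
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation: the last piece takes whatever stick remains
    # pi_h = v_h * prod_{l < h} (1 - v_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(3)
H = 50
pi = stick_breaking(alpha=1.0, H=H, rng=rng)
mu_star = rng.normal(0.0, 3.0, size=H)   # atoms mu*_h from the assumed P_0
S = rng.choice(H, size=30, p=pi)         # cluster index S_i for 30 subjects
mu = mu_star[S]                          # mu_i = mu*_{S_i}

print("clusters occupied:", np.unique(S).size)
```

Subjects with equal `S` values share exactly the same `mu`, and with $\alpha = 1$ most of the weight falls on the first few atoms.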

Identifiability depends on the number of observations per subject.
If $n_i = 1$, subject-level and residual variability cannot be separated.
If $n_i$ is large, the distribution $P$ primarily reflects between-subject heterogeneity.
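A small simulation illustrates the point about replicates. With a single observation per subject only the total variance is visible, whereas the covariance between two replicates on the same subject isolates the between-subject part. The Gaussian choices and the variances 4 and 1 below are arbitrary values assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
mu = rng.normal(0.0, 2.0, size=n)                      # subject effects, variance 4
y = mu[:, None] + rng.normal(0.0, 1.0, size=(n, 2))    # two replicates, residual variance 1

# One observation per subject: only Var(mu_i) + Var(eps_ij) is identified.
total_var = y[:, 0].var()
# Two replicates: Cov(y_i1, y_i2) = Var(mu_i), since residuals are independent.
between_var = np.cov(y[:, 0], y[:, 1])[0, 1]

print("total variance:", total_var)
print("between-subject variance:", between_var)
```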

To reduce confounding, the residual distribution $g$ can also be modeled nonparametrically.
In such fully nonparametric settings, identifiability of the mean requires post-processing.

3. Functional data analysis with Dirichlet processes

Functional data analysis models observations as noisy realizations of subject-specific functions.
Let $y_{ij} \sim N(f_i(t_{ij}), \sigma^2)$.

Each function is expressed via basis expansion:
$f_i(t) = \sum_{h=1}^H \theta_{ih} b_h(t)$, with coefficient vector $\theta_i$.

Instead of assuming $\theta_i$ follows a multivariate normal distribution, a Dirichlet process prior is placed:
$\theta_i \sim P$, $P \sim DP(\alpha P_0)$.

This induces functional clustering:
$f_i(t) = f^*_{S_i}(t)$, where $f_c^{*}(t) = b(t)\theta_c^{*}$ and $\theta_c^{*} \sim P_0$.

All subjects within a cluster share the same functional form.
Flexibility arises through the choice of the base measure $P_0$.
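The functional clustering structure can be sketched by generating curves from cluster-specific coefficient vectors. The Gaussian radial basis, the standard normal base measure for $\theta_c^{*}$, and the uniformly drawn allocations (a stand-in for Dirichlet process allocations) are all assumptions made for this illustration.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Return the (len(t), H) matrix whose rows are b(t)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

rng = np.random.default_rng(5)
H, n_clusters, n_subjects = 8, 3, 12
centers = np.linspace(0.0, 1.0, H)

theta_star = rng.normal(0.0, 1.0, size=(n_clusters, H))  # theta*_c from the assumed P_0
S = rng.integers(n_clusters, size=n_subjects)            # stand-in for DP allocations

t = np.linspace(0.0, 1.0, 25)
B = gaussian_basis(t, centers, width=0.15)
f = B @ theta_star[S].T                    # column i holds f_i(t) = b(t) theta*_{S_i}
y = f + rng.normal(0.0, 0.1, size=f.shape) # noisy observations y_ij
```

Columns of `f` belonging to the same cluster are identical, while the observed `y` differ only through measurement noise.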

4. Basis selection through the base measure

Two effective strategies exist for handling basis selection:

(a) Spike-and-slab prior
Each coefficient follows
$P_{0h}(\cdot) = \pi_{0h}\delta_0(\cdot) + (1-\pi_{0h})N(0, \psi_h^{-1})$,
with $\pi_{0h} \sim \text{Beta}(a,b)$ and possibly $\psi_h \sim \text{Gamma}(\nu/2,\nu/2)$.

This yields exact zeros in $\theta^*_{ch}$, enabling cluster-specific basis selection.
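A draw from the spike-and-slab base measure can be sketched as below. For simplicity the exclusion probabilities $\pi_{0h}$ and precisions $\psi_h$ are held fixed at hypothetical values rather than given the Beta and Gamma priors mentioned above; the function name is also illustrative.

```python
import numpy as np

def draw_spike_slab(n_clusters, pi0, psi, rng):
    """Draw theta*_{ch} from pi0_h * delta_0 + (1 - pi0_h) * N(0, 1/psi_h)."""
    pi0 = np.asarray(pi0, dtype=float)
    psi = np.asarray(psi, dtype=float)
    H = pi0.shape[0]
    is_zero = rng.random((n_clusters, H)) < pi0           # spike with probability pi0_h
    slab = rng.normal(0.0, 1.0 / np.sqrt(psi), size=(n_clusters, H))
    return np.where(is_zero, 0.0, slab)

rng = np.random.default_rng(6)
pi0 = [0.2, 0.5, 0.9]      # hypothetical exclusion probabilities per basis function
psi = [1.0, 1.0, 4.0]      # hypothetical slab precisions
theta_star = draw_spike_slab(2000, pi0, psi, rng)

# Fraction of exact zeros per basis function, close to pi0.
print((theta_star == 0.0).mean(axis=0))
```

Each row has its own pattern of zeros, which is what allows different clusters to use different subsets of the basis functions.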

(b) Heavy-tailed shrinkage prior
$\theta^*_{ch} \sim N(0, \psi_{ch}^{-1})$, $\psi_{ch} \sim \text{Gamma}(\nu/2,\nu/2)$,
with small $\nu$ (e.g., $\nu=1$, which gives a Cauchy marginal prior), producing heavy-tailed marginal priors.

This approach avoids exact zeros but strongly shrinks irrelevant coefficients toward zero.
Block updates of $\theta^*_c$ improve computational efficiency and mixing.
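One way such a block update might look is sketched below for a single cluster, with the cluster allocations and the noise variance $\sigma^2$ held fixed. Given the precisions $\psi_{ch}$, the whole vector $\theta^*_c$ has a Gaussian conditional, and given $\theta^*_c$ each $\psi_{ch}$ has a Gamma conditional with shape $(\nu+1)/2$ and rate $(\nu + \theta_{ch}^{*2})/2$. The function names and simulated data are assumptions made for the example.

```python
import numpy as np

def update_theta_block(y_c, B_c, psi_c, sigma2, rng):
    """Draw the full vector theta*_c given the stacked data of cluster c."""
    prec = B_c.T @ B_c / sigma2 + np.diag(psi_c)
    cov = np.linalg.inv(prec)
    cov = 0.5 * (cov + cov.T)            # symmetrize against rounding error
    mean = cov @ (B_c.T @ y_c) / sigma2
    return rng.multivariate_normal(mean, cov)

def update_psi(theta_c, nu, rng):
    """Draw each psi_ch from Gamma(shape=(nu+1)/2, rate=(nu+theta_ch^2)/2)."""
    rate = (nu + theta_c**2) / 2.0
    return rng.gamma(shape=(nu + 1.0) / 2.0, scale=1.0 / rate)

# Hypothetical data: stacked basis rows and observations for one cluster.
rng = np.random.default_rng(7)
H, N = 5, 400
B_c = rng.normal(size=(N, H))
theta_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
sigma2 = 0.25
y_c = B_c @ theta_true + rng.normal(0.0, np.sqrt(sigma2), size=N)

psi_c = np.ones(H)
for _ in range(50):                      # alternate the two conditional draws
    theta_c = update_theta_block(y_c, B_c, psi_c, sigma2, rng)
    psi_c = update_psi(theta_c, nu=1.0, rng=rng)
print(theta_c)
```

Coefficients whose true value is zero are drawn close to zero but not exactly at it, in line with the shrinkage behavior described above.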

5. Key takeaways

Dirichlet process mixtures provide a principled way to relax parametric assumptions.
Nonparametric modeling applies to residuals, random effects, and functional coefficients.
Clustering emerges naturally, with uncertainty in the number of clusters handled automatically.
The base measure $P_0$ plays a critical role in controlling flexibility and computational behavior.
Heavy-tailed shrinkage priors often provide a practical balance between flexibility and efficiency.
