1. Alternative residual distributions

Gaussian residuals can be replaced with heavier-tailed residual distributions using a scale-mixture representation. A common choice is:

  • Residual model:
    yiN(μ(xi),ϕiσ2)y_i \sim N(\mu(x_i),\, \phi_i \sigma^2)
  • Mixing distribution:
    ϕiInv-gamma(ν/2, ν/2)\phi_i \sim \text{Inv-gamma}(\nu/2,\ \nu/2)

Integrating out ϕi\phi_i​ yields a tνt_\nutν​ distribution for the residuals. Small ν\nu produces heavy tails that reduce the influence of outliers on the posterior of μ(x)\mu(x) without discarding any observations.

Posterior computation remains efficient. A Gibbs sampler can include a step sampling each ϕi\phi_i​ from its inverse-gamma full conditional, followed by modified updates that replace σ2\sigma^2 with ϕiσ2\phi_i\sigma^2. The degrees of freedom ν\nu may be fixed or assigned a prior and updated with a Metropolis–Hastings step.

Example: Chloride concentration

A single observation in the chloride dataset was artificially contaminated by adding noise ten times larger than the estimated residual standard deviation. Under Gaussian residuals, the posterior mean of σ\sigma increased from 0.27 to 0.61, widening posterior intervals but leaving the fitted curve essentially unchanged. When contamination was increased to one hundred times the residual standard deviation, the Gaussian model produced poor results: σ^\hat{\sigma} increased to 22.4 and the fitted curve was pulled toward zero.

Using a tνt_\nu​ residual model with ν=4\nu = 4 yielded estimates close to the uncontaminated Gaussian analysis. The fitted curve shifted only slightly, and posterior intervals increased moderately, demonstrating robustness to extreme outliers.


2. Exponential family outcomes and latent-variable augmentation

For exponential-family responses, a generalized linear model formulation can be used with linear predictor
ηi=wiβ\eta_i = w_i \beta.

Marginal likelihoods are generally unavailable in closed form, and Laplace approximations are commonly used. For categorical probit models, a latent Gaussian response can be introduced. Conditional on these latent variables, full conditional distributions are available in closed form, allowing straightforward Gibbs updates.


3. Multivariate regression surfaces

3.1 The curse of dimensionality

Multivariate nonparametric regression faces two challenges:

  1. Computational growth:
    The number of basis functions required to characterize an unconstrained multivariate regression surface increases rapidly with the number of predictors. Bayesian variable selection or shrinkage can help remove unnecessary terms, but the initial full basis expansion may become prohibitively large.
  2. Data sparsity:
    As the predictor dimension pp increases while the sample size nnn remains fixed, observations become sparse in the predictor space, reducing the ability to estimate μ(x)\mu(x) without strong modeling assumptions or informative priors.

4. Additive models

A practical strategy is to assume an additive structure:μ(x)=μ0+j=1pβj(xj),\mu(x) = \mu_0 + \sum_{j=1}^p \beta_j(x_j),

where each component is expanded in its own basis:βj(xj)=h=1Hjθjhbjh(xj).\beta_j(x_j) = \sum_{h=1}^{H_j} \theta_{jh} \, b_{jh}(x_j).

Basis functions may be B-splines or any predefined set suitable for the predictor. Variable selection or shrinkage priors applied to θjh\theta_{jh}​ operate independently across predictors, allowing irrelevant predictors or unnecessary basis elements to be removed automatically.

In MCMC, the additive structure enables a divide-and-conquer strategy. For each predictor, the regression update conditions on the contributions of the other predictors. Gibbs or Metropolis–within–Gibbs steps can be used to update the block of coefficients {θjh}\{\theta_{jh}\}.

Additive models reduce the effective dimensionality and provide efficient approximations when interactions are weak.


5. Shape-restricted regression

Prior information can impose structural constraints on βj(xj)\beta_j(x_j). A common example is monotonicity, enforced by using splines whose coefficients are constrained to be nonnegative. A prior mixing a point mass at zero with a truncated normal distribution on the coefficients produces functions that may be constant over regions of the predictor space. This reduces bias and accommodates intervals in which the predictor has no effect.

Uncertainty in the locations of flat regions allows inference on thresholds at which the response begins to change.

Example: DDE exposure and preterm birth

A semiparametric probit additive model was fitted:Pr(yi=1θ,xi,zi)=Φ(ziα+f(xi)),\Pr(y_i = 1 \mid \theta, x_i, z_i ) = \Phi\big( z_i \alpha + f(x_i) \big),

where:

  • yiy_i​ indicates preterm birth,
  • ziz_i​ contains maternal covariates,
  • f(x)f(x) is a nondecreasing function of DDE exposure.

A dense set of spline knots was used, with slope parameters constrained to be nonnegative. A latent threshold prior was applied:

  • latent slopes follow a first-order Gaussian random walk,
  • the actual slopes are thresholded versions of the latent slopes, ensuring monotonicity and allowing flat regions.

Posterior inference indicated strong evidence against the global null of no association, and provided an estimated dose threshold at which the risk begins to increase.


6. Tensor-product basis expansions for interactions

Additivity may not hold when predictors interact. A more flexible alternative uses tensor-product bases:μ(x)=h1=1Hhp=1Hθh1hpj=1pbjhj(xj).\mu(x) = \sum_{h_1=1}^H \cdots \sum_{h_p=1}^H \theta_{h_1\cdots h_p} \prod_{j=1}^p b_{jh_j}(x_j).

The coefficient array θ\theta is a pp-way tensor. Its size grows exponentially with pp, but shrinkage or variable-selection priors can set large numbers of elements near zero, effectively collapsing the regression surface to a lower-dimensional structure. Predictors and interactions may be removed through this mechanism. Given the basis functions, the model remains linear in θ\theta, making Gibbs sampling feasible.