1. Alternative residual distributions
Gaussian residuals can be replaced with heavier-tailed residual distributions using a scale-mixture representation. A common choice is:
- Residual model:
- Mixing distribution:
Integrating out yields a tν distribution for the residuals. Small produces heavy tails that reduce the influence of outliers on the posterior of without discarding any observations.
Posterior computation remains efficient. A Gibbs sampler can include a step sampling each from its inverse-gamma full conditional, followed by modified updates that replace with . The degrees of freedom may be fixed or assigned a prior and updated with a Metropolis–Hastings step.
Example: Chloride concentration
A single observation in the chloride dataset was artificially contaminated by adding noise ten times larger than the estimated residual standard deviation. Under Gaussian residuals, the posterior mean of increased from 0.27 to 0.61, widening posterior intervals but leaving the fitted curve essentially unchanged. When contamination was increased to one hundred times the residual standard deviation, the Gaussian model produced poor results: increased to 22.4 and the fitted curve was pulled toward zero.
Using a residual model with yielded estimates close to the uncontaminated Gaussian analysis. The fitted curve shifted only slightly, and posterior intervals increased moderately, demonstrating robustness to extreme outliers.
2. Exponential family outcomes and latent-variable augmentation
For exponential-family responses, a generalized linear model formulation can be used with linear predictor
.
Marginal likelihoods are generally unavailable in closed form, and Laplace approximations are commonly used. For categorical probit models, a latent Gaussian response can be introduced. Conditional on these latent variables, full conditional distributions are available in closed form, allowing straightforward Gibbs updates.
3. Multivariate regression surfaces
3.1 The curse of dimensionality
Multivariate nonparametric regression faces two challenges:
- Computational growth:
The number of basis functions required to characterize an unconstrained multivariate regression surface increases rapidly with the number of predictors. Bayesian variable selection or shrinkage can help remove unnecessary terms, but the initial full basis expansion may become prohibitively large. - Data sparsity:
As the predictor dimension increases while the sample size n remains fixed, observations become sparse in the predictor space, reducing the ability to estimate without strong modeling assumptions or informative priors.
4. Additive models
A practical strategy is to assume an additive structure:
where each component is expanded in its own basis:
Basis functions may be B-splines or any predefined set suitable for the predictor. Variable selection or shrinkage priors applied to operate independently across predictors, allowing irrelevant predictors or unnecessary basis elements to be removed automatically.
In MCMC, the additive structure enables a divide-and-conquer strategy. For each predictor, the regression update conditions on the contributions of the other predictors. Gibbs or Metropolis–within–Gibbs steps can be used to update the block of coefficients .
Additive models reduce the effective dimensionality and provide efficient approximations when interactions are weak.
5. Shape-restricted regression
Prior information can impose structural constraints on . A common example is monotonicity, enforced by using splines whose coefficients are constrained to be nonnegative. A prior mixing a point mass at zero with a truncated normal distribution on the coefficients produces functions that may be constant over regions of the predictor space. This reduces bias and accommodates intervals in which the predictor has no effect.
Uncertainty in the locations of flat regions allows inference on thresholds at which the response begins to change.
Example: DDE exposure and preterm birth
A semiparametric probit additive model was fitted:
where:
- indicates preterm birth,
- contains maternal covariates,
- is a nondecreasing function of DDE exposure.
A dense set of spline knots was used, with slope parameters constrained to be nonnegative. A latent threshold prior was applied:
- latent slopes follow a first-order Gaussian random walk,
- the actual slopes are thresholded versions of the latent slopes, ensuring monotonicity and allowing flat regions.
Posterior inference indicated strong evidence against the global null of no association, and provided an estimated dose threshold at which the risk begins to increase.
6. Tensor-product basis expansions for interactions
Additivity may not hold when predictors interact. A more flexible alternative uses tensor-product bases:
The coefficient array is a -way tensor. Its size grows exponentially with , but shrinkage or variable-selection priors can set large numbers of elements near zero, effectively collapsing the regression surface to a lower-dimensional structure. Predictors and interactions may be removed through this mechanism. Given the basis functions, the model remains linear in , making Gibbs sampling feasible.
