Example: population toxicokinetics

1. Goal and context

The objective is to estimate how much of an inhaled toxin (PERC) is metabolized in the liver, as a function of exposure level, for a whole population, not just for a few studied subjects.

PERC (perchloroethylene) is an industrial solvent.
It is inhaled, distributed in the body, and metabolized primarily in the liver.
Cancer risk is believed to depend on the amount metabolized, not just on inhaled concentration.

The analysis uses:

A physiological toxicokinetic model (multi-compartment, differential equations)
A hierarchical population model for parameters across individuals
Informative priors based on physiological and toxicological literature
Bayesian inference to combine prior information and experimental data

Figure shows the final result: curves of “fraction of inhaled PERC that is metabolized” versus exposure concentration in air, for multiple hypothetical individuals drawn from the estimated population distribution.

2. Experimental data

Data come from an experiment with six volunteers:

Each person is exposed to a relatively high concentration of PERC in air for 4 hours.
- This duration is considered long enough for organ concentrations to be approximately at equilibrium.
After exposure ends, PERC concentration in:
- Exhaled air, and
- Blood
  is measured repeatedly for up to 168 hours (one week).
Each individual is also studied at a second exposure level (not shown in the excerpt).

These repeated time–concentration measurements are used to estimate individual-level physiological parameters in the toxicokinetic model.

3. Toxicokinetic model (individual level)

3.1 Compartments and dynamics

The model assumes PERC:

Enters and leaves the body via the lungs (breath).
Is carried by blood flow to four compartments:
1. Well-perfused tissues
2. Poorly perfused tissues
3. Fat
4. Liver (where metabolism occurs)

Each compartment has:

A volume parameter
A blood-flow parameter
A partition coefficient (equilibrium concentration ratio between tissue and blood)

The liver compartment has additional parameters for metabolic rate, typically in some form of saturable kinetics (e.g., Michaelis–Menten).

Concentrations over time in each compartment are determined by a system of first-order differential equations, driven by:

Exposure concentration in air
Physiological parameters
Metabolic parameters

3.2 Parameter vector for each person

For each individual $k$ :

Define a parameter vector $\theta_k = (\theta_{k1}, \dots, \theta_{kL}), \quad L = 15$
These 15 parameters encode all relevant volumes, flows, partition coefficients, and metabolic parameters.

Given $\theta_k$ and input exposure profile, numerical solution of the differential equations provides:

Concentration trajectories in each compartment
Predicted PERC concentrations in exhaled air and blood
Predicted rate and amount of metabolism as a function of time

4. Estimation challenges and role of priors

Concentration–time curves from multi-compartment models often resemble mixtures of decaying exponentials. Such models are ill-conditioned:

Many very different parameter combinations can produce similar curves.
It is difficult or impossible to identify all parameters from data alone.

A physiologically based model solves part of this problem:

Parameters correspond to physically meaningful quantities (e.g., liver volume, fat/blood partition ratio).
Literature provides prior information about plausible values and variation across people.

Because the data alone cannot identify all parameters well, informative and hierarchical priors are essential.

5. Measurement model

5.1 Observed data notation

Let:

$j$ : exposure replication index ( $j = 1,2$ for two exposure levels)
$k$ : individual index ( $k = 1,\dots,6$ )
$m$ : measurement type ( $m=1$ for blood, $m=2$ for exhaled air)
$t$ : time of measurement

Observed concentrations are denoted: $y_{jkmt}$

5.2 Expected values from the toxicokinetic model

For individual $k$ , exposure level $E_j$ , and time $t$ :

The model predicts expected concentration in blood and air through nonlinear functions $g_m(\theta_k, E_j, t), \quad m = 1,2$
These functions are obtained by numerically solving the differential equations with parameters $\theta_k$ and exposure input $E_j$ .

5.3 Error model

Observed measurements differ from model expectations due to measurement and model error:

Work on the log scale.
Assume errors are independent log-normal with:
- Mean 0
- Standard deviation $\sigma_m$ σm on the log scale, possibly different for blood ( $\sigma_1$ ) and exhaled air ( $\sigma_2$ )

Formally: $\log y_{jkmt} \sim N\big(\log g_m(\theta_k, E_j, t), \ \sigma_m^2\big), \quad m = 1,2$

Independent priors are assigned to $\log \sigma_1$ and $\log \sigma_2$ (e.g., uniform or weakly informative).

6. Population (hierarchical) model for parameters

6.1 Constraints on physiological parameters

The 15 parameters $\theta_{kl}$ include some constrained components:

θk2,θk3,θk4,θk5\theta_{k2}, \theta_{k3}, \theta_{k4}, \theta_{k5}: fractions of blood flow to each compartment
- Must sum to 1 for each individual $k$ .
θk6,θk7,θk8\theta_{k6}, \theta_{k7}, \theta_{k8}: scaling coefficients of organ volumes
- Must sum to 0.873 (standard fraction of lean body mass excluding bone) for each $k$ .
- Among these, liver volume $\theta_{k8}$ is small and well studied.

To impose these constraints while maintaining a convenient distribution for sampling, parameters are reparameterized via a set of unconstrained latent variables $\psi_{kl}$ .

6.2 Transformation to unconstrained parameters $\psi$

Define a transformation from $\psi_{kl}$ to $\theta_{kl}$ such that:

Fractions sum to 1 for blood flows.
Volume fractions sum to 0.873.
Other parameters remain positive.

For example (schematically):

For fractional flows $l=2,3,4,5$ : $\theta_{kl} = \frac{e^{\psi_{kl}}}{e^{\psi_{k2}} + e^{\psi_{k3}} + e^{\psi_{k4}} + e^{\psi_{k5}}}$
For volume fractions $l=6,7,8$ , use a similar normalization scaled to 0.873.
For unconstrained positive parameters $l=1,9,\dots,15$ , use $\theta_{kl} = e^{\psi_{kl}}$ .

This ensures constraints on $\theta$ but allows modeling on a normal scale for $\psi$ .

6.3 Population distribution of $\psi$

For each parameter index $l = 1,\dots,15$ :

Individual parameters $\psi_{kl}$ are modeled as: $\psi_{kl} \sim N(\mu_l, \tau_l^2)$
Distribution is truncated at ±3 standard deviations to restrict parameters to physiologically reasonable values.
- Truncation is also used as a diagnostic: if posterior draws accumulate at boundaries, the model and priors may be incompatible with data.

The parameters $\mu_l$ and $\tau_l$ represent, for each physiological parameter:

$\mu_l$ : population mean (on the transformed scale)
$\tau_l$ : population standard deviation (inter-individual variability)

Model assumes independence across $l$ at the population level.
This assumption is checked after fitting by examining correlations among estimated parameters across individuals.

7. Hyperpriors (prior information)

For each parameter index $l$ :

Prior for population mean $\mu_l$ : $\mu_l \sim N(M_l, S_l^2)$
Prior for population variance $\tau_l^2$ : $\tau_l^2 \sim \text{Inv-}\chi^2(\nu_l, \tau_{0l}^2)$

Hyperparameters $M_l, S_l, \tau_{0l}$ and $\nu_l$ are based on physiological and toxicological literature:

Human studies when available
Allometric scaling from animals otherwise
Weakly informative design:
- $S_l$ and $\tau_{0l}$ are chosen large rather than small when literature is ambiguous.
- $\nu_l$ typically set low (e.g., 2) to allow substantial uncertainty.

Example:

Liver weight as fraction of lean body weight:
- Population geometric mean ≈ 0.033
- Both prior uncertainty on the mean and population heterogeneity ≈ 10–20%
- So $M_k = \log 0.033, S_k = \log 1.1, \tau_{0k} = \log 1.1$ .
A Michaelis–Menten parameter (poorly known):
- Population geometric mean ≈ 0.7
- Prior uncertainty up to factor 100 above/below (very wide)
- Population variability believed to be only up to factor 4 across individuals
- So $M_k = \log 0.7, S_k = \log 10, \tau_{0k} = \log 2$ .

This structure separates:

Uncertainty about the typical value in the population ( $\mu_l$ ),
Actual inter-individual variation ( $\tau_l$ ).

8. Joint posterior distribution

Collect parameters:

Individual parameters: $\psi = \{\psi_{kl}\}$
Population means: $\mu = (\mu_1,\dots,\mu_L)$
Population standard deviations: $\tau = (\tau_1,\dots,\tau_L)$
Measurement variances: $\sigma = (\sigma_1,\sigma_2)$

Given data $y$ , exposures $E$ , times $t$ , and hyperparameters $M, S, \tau_0, \nu$ , the joint posterior is (up to normalization): $p(\psi, \mu, \tau^2, \sigma^2 \mid y, E, t, M, S, \tau_0^2, \nu) \propto p(y \mid \psi, E, t, \sigma^2)\, p(\psi \mid \mu, \tau^2)\, p(\mu, \tau^2 \mid M, S, \tau_0^2)\, p(\sigma^2)$

$p(y \mid \psi, E, t, \sigma^2)$ : product of lognormal densities for all measurements, based on $g_m(\theta_k, E_j, t)$ .
$p(\psi \mid \mu, \tau^2)$ : product of truncated normal densities for each $\psi_{kl}$ .
$p(\mu, \tau^2 \mid \cdot)$ : product of normal priors for $\mu_l$ and inverse-chi-square priors for $\tau_l^2$ .
$p(\sigma^2)$ : priors for measurement variances.

Differential equation solutions $g_m$ must be evaluated numerically for each combination of parameters and observation times.

9. Computation (Gibbs + Metropolis)

Sampling scheme:

Use Gibbs sampling at the block level, cycling through:
- $\sigma^2$
- $\tau^2$
- $\mu$
- $\psi_1, \dots, \psi_K$ (individual parameter vectors)
Conditional distributions:
- For components of $\sigma^2$ : inverse-chi-square (or similar)
- For components of $\tau^2$ : inverse-chi-square
- For components of $\mu$ : normal
- For each vector $\psi_k$ : no closed form → Metropolis steps
Metropolis updates for ψk\psi_k:
- Update one person at a time (vector of length 15)
- Use a multivariate normal proposal centered at current $\psi_k$
- Covariance matrix tuned from pilot runs
- Scale to achieve acceptance rate ≈ 0.23
- This reduces computation because only likelihood components for person $k$ need to be recomputed.

Several independent chains (e.g., 5 runs of 50,000 iterations) are started from dispersed initial values:

Initial $\psi_{kl}$ draws from prior
$\mu_l$ set to prior means $M_l$ initially
First update $\sigma$ and $\tau$

Only every 10th iteration is stored due to memory constraints. Convergence is monitored by within- vs between-chain variance diagnostics (e.g., $\hat{R}$ ).

10. Inference for fraction metabolized and population risk

After fitting the model, main quantities of interest:

For each person kk and exposure scenario (e.g., a given constant concentration and exposure duration):
- Fraction of inhaled PERC metabolized, a nonlinear function computed by solving the differential equations with $\theta_k$ .
For the population:
- Distribution of this fraction across individuals.

10.1 Individual-level distributions

For each subject $k$ :

Take posterior draws of $\psi_k$ , transform to $\theta_k$ .
For each draw, run the toxicokinetic model for a given exposure profile (e.g., 50 ppm or 0.001 ppm).
Compute fraction metabolized under that scenario.
The resulting sample forms the posterior distribution of “fraction metabolized” for that individual.

Variation within this distribution reflects parameter uncertainty for person $k$ .

10.2 Population-level distributions

To simulate the distribution in a new individual from the same population:

Draw $(\mu, \tau)$ from the posterior.
Draw $\psi_{\text{new}, l} \sim N(\mu_l, \tau_l^2)$ (with truncation), transform to $\theta_{\text{new}}$ .
Run the model for the exposure scenario.
Compute the fraction metabolized.

Repeating this yields the population distribution of the fraction metabolized, combining:

Posterior uncertainty about population parameters, and
True inter-individual variability.

Example numeric results:

At 50 ppm (high, occupational exposure), the 95% interval for the fraction metabolized in the population is approximately [0.5%, 4.1%].
At 0.001 ppm (low, environmental exposure), the 95% interval is approximately [15%, 58%].

Figure shows posterior distributions for each of the six experimental subjects at these two exposure levels. There are large differences across individuals, e.g., a factor-2 difference between some subjects.

10.3 Dose–response curve

The model is used to compute, for a “typical” individual or for random draws from the population:

The fraction of PERC metabolized per day after long-term inhalation exposure, as a function of exposure level.

Resulting curve:

Constant fraction at very low exposures (linear, unsaturated metabolism).
Saturation begins around 1 ppm, nearly complete by 10 ppm.
At high levels, fraction metabolized decreases with exposure because metabolism hits a maximum rate.

This is the curve shown in the earlier figure.

11. Model checking and external validation

11.1 Internal fit

Posterior simulations are used to check fit:

Compare observed concentrations $y_{jkmt}$ to predicted $g_m(\theta_k, E_j, t)$ .
Examine relative errors (observed / predicted) vs predicted values.
Errors are modest in magnitude, suggesting reasonable fit and no major systematic bias.

11.2 External validation

Model predictions are compared with independent data:

A second experiment on 6 volunteers with lower exposure levels (0.5–9 ppm) and shorter follow-up (up to 50 minutes).
For each exposure and time, simulate from the population distribution $p(\theta \mid \mu, \tau)$ and compute predicted concentration ratios (blood/air).
Observed values are plotted against predictive intervals (95%, 99%) from the model.

Overall:

Model predictions agree reasonably well with external data, even though exposure levels and time scales differ.
The main discrepancy occurs in the first ~15 minutes after starting exposure, where the model assumes instantaneous equilibrium in some compartments, but real physiology requires some time to equilibrate.

12. Key takeaways

The analysis combines five essential elements:

Physiological model
- Encodes real biology and allows prior information from literature.
Population model (hierarchical)
- Distinguishes individual parameters from population distribution.
Informative priors on physiological parameters
- Based on human and animal data, separates typical values from inter-individual variability.
Experimental data (concentration–time curves)
- Provide likelihood information but are insufficient alone.
Bayesian inference with MCMC
- Produces a full posterior for parameters, combining data and prior information.
- Gives direct uncertainty quantification for risk-relevant quantities (e.g., fraction metabolized at environmental exposures).

If any of these components is removed, the problem becomes either statistically or scientifically ill-posed. Together, they yield a coherent and interpretable risk assessment framework for PERC exposure in a human population.

Your Gateway to Data Mastery

Learn, explore, and innovate with data science.

Example: population toxicokinetics

1. Goal and context

2. Experimental data