1. Goal and context

The objective is to estimate how much of an inhaled toxin (PERC) is metabolized in the liver, as a function of exposure level, for a whole population, not just for a few studied subjects.

  • PERC (perchloroethylene) is an industrial solvent.
  • It is inhaled, distributed in the body, and metabolized primarily in the liver.
  • Cancer risk is believed to depend on the amount metabolized, not just on inhaled concentration.

The analysis uses:

  • A physiological toxicokinetic model (multi-compartment, differential equations)
  • A hierarchical population model for parameters across individuals
  • Informative priors based on physiological and toxicological literature
  • Bayesian inference to combine prior information and experimental data

Figure shows the final result: curves of “fraction of inhaled PERC that is metabolized” versus exposure concentration in air, for multiple hypothetical individuals drawn from the estimated population distribution.


2. Experimental data

Data come from an experiment with six volunteers:

  • Each person is exposed to a relatively high concentration of PERC in air for 4 hours.
    • This duration is considered long enough for organ concentrations to be approximately at equilibrium.
  • After exposure ends, PERC concentration in:
    • Exhaled air, and
    • Blood
      is measured repeatedly for up to 168 hours (one week).
  • Each individual is also studied at a second exposure level (not shown in the excerpt).

These repeated time–concentration measurements are used to estimate individual-level physiological parameters in the toxicokinetic model.


3. Toxicokinetic model (individual level)

3.1 Compartments and dynamics

The model assumes PERC:

  • Enters and leaves the body via the lungs (breath).
  • Is carried by blood flow to four compartments:
    1. Well-perfused tissues
    2. Poorly perfused tissues
    3. Fat
    4. Liver (where metabolism occurs)

Each compartment has:

  • A volume parameter
  • A blood-flow parameter
  • A partition coefficient (equilibrium concentration ratio between tissue and blood)

The liver compartment has additional parameters for metabolic rate, typically in some form of saturable kinetics (e.g., Michaelis–Menten).

Concentrations over time in each compartment are determined by a system of first-order differential equations, driven by:

  • Exposure concentration in air
  • Physiological parameters
  • Metabolic parameters

3.2 Parameter vector for each person

For each individual kk:

  • Define a parameter vector θk=(θk1,,θkL),L=15\theta_k = (\theta_{k1}, \dots, \theta_{kL}), \quad L = 15
  • These 15 parameters encode all relevant volumes, flows, partition coefficients, and metabolic parameters.

Given θk\theta_k​ and input exposure profile, numerical solution of the differential equations provides:

  • Concentration trajectories in each compartment
  • Predicted PERC concentrations in exhaled air and blood
  • Predicted rate and amount of metabolism as a function of time

4. Estimation challenges and role of priors

Concentration–time curves from multi-compartment models often resemble mixtures of decaying exponentials. Such models are ill-conditioned:

  • Many very different parameter combinations can produce similar curves.
  • It is difficult or impossible to identify all parameters from data alone.

A physiologically based model solves part of this problem:

  • Parameters correspond to physically meaningful quantities (e.g., liver volume, fat/blood partition ratio).
  • Literature provides prior information about plausible values and variation across people.

Because the data alone cannot identify all parameters well, informative and hierarchical priors are essential.


5. Measurement model

5.1 Observed data notation

Let:

  • jj: exposure replication index (j=1,2j = 1,2 for two exposure levels)
  • kk: individual index (k=1,,6k = 1,\dots,6)
  • mm: measurement type (m=1m=1 for blood, m=2m=2 for exhaled air)
  • tt: time of measurement

Observed concentrations are denoted:yjkmty_{jkmt}

5.2 Expected values from the toxicokinetic model

For individual kk, exposure level EjE_j, and time tt:

  • The model predicts expected concentration in blood and air through nonlinear functions gm(θk,Ej,t),m=1,2g_m(\theta_k, E_j, t), \quad m = 1,2
  • These functions are obtained by numerically solving the differential equations with parameters θk\theta_k​ and exposure input EjE_j.

5.3 Error model

Observed measurements differ from model expectations due to measurement and model error:

  • Work on the log scale.
  • Assume errors are independent log-normal with:
    • Mean 0
    • Standard deviation σm\sigma_mσm​ on the log scale, possibly different for blood (σ1\sigma_1​) and exhaled air (σ2\sigma_2)

Formally:logyjkmtN(loggm(θk,Ej,t), σm2),m=1,2\log y_{jkmt} \sim N\big(\log g_m(\theta_k, E_j, t), \ \sigma_m^2\big), \quad m = 1,2

Independent priors are assigned to logσ1\log \sigma_1 and logσ2\log \sigma_2​ (e.g., uniform or weakly informative).


6. Population (hierarchical) model for parameters

6.1 Constraints on physiological parameters

The 15 parameters θkl\theta_{kl}​ include some constrained components:

  • θk2,θk3,θk4,θk5\theta_{k2}, \theta_{k3}, \theta_{k4}, \theta_{k5}: fractions of blood flow to each compartment
    • Must sum to 1 for each individual kk.
  • θk6,θk7,θk8\theta_{k6}, \theta_{k7}, \theta_{k8}​: scaling coefficients of organ volumes
    • Must sum to 0.873 (standard fraction of lean body mass excluding bone) for each kk.
    • Among these, liver volume θk8\theta_{k8}​ is small and well studied.

To impose these constraints while maintaining a convenient distribution for sampling, parameters are reparameterized via a set of unconstrained latent variables ψkl\psi_{kl}.

6.2 Transformation to unconstrained parameters ψ\psi

Define a transformation from ψkl\psi_{kl} to θkl\theta_{kl} such that:

  • Fractions sum to 1 for blood flows.
  • Volume fractions sum to 0.873.
  • Other parameters remain positive.

For example (schematically):

  • For fractional flows l=2,3,4,5l=2,3,4,5: θkl=eψkleψk2+eψk3+eψk4+eψk5\theta_{kl} = \frac{e^{\psi_{kl}}}{e^{\psi_{k2}} + e^{\psi_{k3}} + e^{\psi_{k4}} + e^{\psi_{k5}}}
  • For volume fractions l=6,7,8l=6,7,8, use a similar normalization scaled to 0.873.
  • For unconstrained positive parameters l=1,9,,15l=1,9,\dots,15, use θkl=eψkl\theta_{kl} = e^{\psi_{kl}}​.

This ensures constraints on θ\theta but allows modeling on a normal scale for ψ\psi.

6.3 Population distribution of ψ\psi

For each parameter index l=1,,15l = 1,\dots,15:

  • Individual parameters ψkl\psi_{kl} are modeled as: ψklN(μl,τl2)\psi_{kl} \sim N(\mu_l, \tau_l^2)
  • Distribution is truncated at ±3 standard deviations to restrict parameters to physiologically reasonable values.
    • Truncation is also used as a diagnostic: if posterior draws accumulate at boundaries, the model and priors may be incompatible with data.

The parameters μl\mu_l​ and τl\tau_l represent, for each physiological parameter:

  • μl\mu_l: population mean (on the transformed scale)
  • τl\tau_l​: population standard deviation (inter-individual variability)

Model assumes independence across $l$ at the population level.
This assumption is checked after fitting by examining correlations among estimated parameters across individuals.


7. Hyperpriors (prior information)

For each parameter index ll:

  • Prior for population mean μl\mu_l: μlN(Ml,Sl2)\mu_l \sim N(M_l, S_l^2)
  • Prior for population variance τl2\tau_l^2​: τl2Inv-χ2(νl,τ0l2)\tau_l^2 \sim \text{Inv-}\chi^2(\nu_l, \tau_{0l}^2)

Hyperparameters Ml,Sl,τ0lM_l, S_l, \tau_{0l} and νl\nu_l are based on physiological and toxicological literature:

  • Human studies when available
  • Allometric scaling from animals otherwise
  • Weakly informative design:
    • SlS_l​ and τ0l\tau_{0l} are chosen large rather than small when literature is ambiguous.
    • νl\nu_l​ typically set low (e.g., 2) to allow substantial uncertainty.

Example:

  • Liver weight as fraction of lean body weight:
    • Population geometric mean ≈ 0.033
    • Both prior uncertainty on the mean and population heterogeneity ≈ 10–20%
    • So Mk=log0.033,Sk=log1.1,τ0k=log1.1M_k = \log 0.033, S_k = \log 1.1, \tau_{0k} = \log 1.1.
  • A Michaelis–Menten parameter (poorly known):
    • Population geometric mean ≈ 0.7
    • Prior uncertainty up to factor 100 above/below (very wide)
    • Population variability believed to be only up to factor 4 across individuals
    • So Mk=log0.7,Sk=log10,τ0k=log2M_k = \log 0.7, S_k = \log 10, \tau_{0k} = \log 2.

This structure separates:

  • Uncertainty about the typical value in the population (μl\mu_l),
  • Actual inter-individual variation (τl\tau_l).

8. Joint posterior distribution

Collect parameters:

  • Individual parameters: ψ={ψkl}\psi = \{\psi_{kl}\}
  • Population means: μ=(μ1,,μL)\mu = (\mu_1,\dots,\mu_L)
  • Population standard deviations: τ=(τ1,,τL)\tau = (\tau_1,\dots,\tau_L)
  • Measurement variances: σ=(σ1,σ2)\sigma = (\sigma_1,\sigma_2)

Given data yy, exposures EE, times tt, and hyperparameters M,S,τ0,νM, S, \tau_0, \nu, the joint posterior is (up to normalization):p(ψ,μ,τ2,σ2y,E,t,M,S,τ02,ν)p(yψ,E,t,σ2)p(ψμ,τ2)p(μ,τ2M,S,τ02)p(σ2)p(\psi, \mu, \tau^2, \sigma^2 \mid y, E, t, M, S, \tau_0^2, \nu) \propto p(y \mid \psi, E, t, \sigma^2)\, p(\psi \mid \mu, \tau^2)\, p(\mu, \tau^2 \mid M, S, \tau_0^2)\, p(\sigma^2)

  • p(yψ,E,t,σ2)p(y \mid \psi, E, t, \sigma^2): product of lognormal densities for all measurements, based on gm(θk,Ej,t)g_m(\theta_k, E_j, t).
  • p(ψμ,τ2)p(\psi \mid \mu, \tau^2): product of truncated normal densities for each ψkl\psi_{kl}​.
  • p(μ,τ2)p(\mu, \tau^2 \mid \cdot): product of normal priors for μl\mu_l and inverse-chi-square priors for τl2\tau_l^2​.
  • p(σ2)p(\sigma^2): priors for measurement variances.

Differential equation solutions gmg_m​ must be evaluated numerically for each combination of parameters and observation times.


9. Computation (Gibbs + Metropolis)

Sampling scheme:

  1. Use Gibbs sampling at the block level, cycling through:
    • σ2\sigma^2
    • τ2\tau^2
    • μ\mu
    • ψ1,,ψK\psi_1, \dots, \psi_K​ (individual parameter vectors)
  2. Conditional distributions:
    • For components of σ2\sigma^2: inverse-chi-square (or similar)
    • For components of τ2\tau^2: inverse-chi-square
    • For components of μ\mu: normal
    • For each vector ψk\psi_k​: no closed form → Metropolis steps
  3. Metropolis updates for ψk\psi_k​:
    • Update one person at a time (vector of length 15)
    • Use a multivariate normal proposal centered at current ψk\psi_k
    • Covariance matrix tuned from pilot runs
    • Scale to achieve acceptance rate ≈ 0.23
    • This reduces computation because only likelihood components for person kk need to be recomputed.

Several independent chains (e.g., 5 runs of 50,000 iterations) are started from dispersed initial values:

  • Initial ψkl\psi_{kl}​ draws from prior
  • μl\mu_l​ set to prior means MlM_l initially
  • First update σ\sigma and τ\tau

Only every 10th iteration is stored due to memory constraints. Convergence is monitored by within- vs between-chain variance diagnostics (e.g., R^\hat{R}).


10. Inference for fraction metabolized and population risk

After fitting the model, main quantities of interest:

  • For each person kk and exposure scenario (e.g., a given constant concentration and exposure duration):
    • Fraction of inhaled PERC metabolized, a nonlinear function computed by solving the differential equations with θk\theta_k​.
  • For the population:
    • Distribution of this fraction across individuals.

10.1 Individual-level distributions

For each subject kk:

  1. Take posterior draws of ψk\psi_k​, transform to θk\theta_k​.
  2. For each draw, run the toxicokinetic model for a given exposure profile (e.g., 50 ppm or 0.001 ppm).
  3. Compute fraction metabolized under that scenario.
  4. The resulting sample forms the posterior distribution of “fraction metabolized” for that individual.

Variation within this distribution reflects parameter uncertainty for person kk.

10.2 Population-level distributions

To simulate the distribution in a new individual from the same population:

  1. Draw (μ,τ)(\mu, \tau) from the posterior.
  2. Draw ψnew,lN(μl,τl2)\psi_{\text{new}, l} \sim N(\mu_l, \tau_l^2) (with truncation), transform to θnew\theta_{\text{new}}​.
  3. Run the model for the exposure scenario.
  4. Compute the fraction metabolized.

Repeating this yields the population distribution of the fraction metabolized, combining:

  • Posterior uncertainty about population parameters, and
  • True inter-individual variability.

Example numeric results:

  • At 50 ppm (high, occupational exposure), the 95% interval for the fraction metabolized in the population is approximately [0.5%, 4.1%].
  • At 0.001 ppm (low, environmental exposure), the 95% interval is approximately [15%, 58%].

Figure shows posterior distributions for each of the six experimental subjects at these two exposure levels. There are large differences across individuals, e.g., a factor-2 difference between some subjects.

10.3 Dose–response curve

The model is used to compute, for a “typical” individual or for random draws from the population:

  • The fraction of PERC metabolized per day after long-term inhalation exposure, as a function of exposure level.

Resulting curve:

  • Constant fraction at very low exposures (linear, unsaturated metabolism).
  • Saturation begins around 1 ppm, nearly complete by 10 ppm.
  • At high levels, fraction metabolized decreases with exposure because metabolism hits a maximum rate.

This is the curve shown in the earlier figure.


11. Model checking and external validation

11.1 Internal fit

Posterior simulations are used to check fit:

  • Compare observed concentrations yjkmty_{jkmt}​ to predicted gm(θk,Ej,t)g_m(\theta_k, E_j, t).
  • Examine relative errors (observed / predicted) vs predicted values.
  • Errors are modest in magnitude, suggesting reasonable fit and no major systematic bias.

11.2 External validation

Model predictions are compared with independent data:

  • A second experiment on 6 volunteers with lower exposure levels (0.5–9 ppm) and shorter follow-up (up to 50 minutes).
  • For each exposure and time, simulate from the population distribution p(θμ,τ)p(\theta \mid \mu, \tau) and compute predicted concentration ratios (blood/air).
  • Observed values are plotted against predictive intervals (95%, 99%) from the model.

Overall:

  • Model predictions agree reasonably well with external data, even though exposure levels and time scales differ.
  • The main discrepancy occurs in the first ~15 minutes after starting exposure, where the model assumes instantaneous equilibrium in some compartments, but real physiology requires some time to equilibrate.

12. Key takeaways

The analysis combines five essential elements:

  1. Physiological model
    • Encodes real biology and allows prior information from literature.
  2. Population model (hierarchical)
    • Distinguishes individual parameters from population distribution.
  3. Informative priors on physiological parameters
    • Based on human and animal data, separates typical values from inter-individual variability.
  4. Experimental data (concentration–time curves)
    • Provide likelihood information but are insufficient alone.
  5. Bayesian inference with MCMC
    • Produces a full posterior for parameters, combining data and prior information.
    • Gives direct uncertainty quantification for risk-relevant quantities (e.g., fraction metabolized at environmental exposures).

If any of these components is removed, the problem becomes either statistically or scientifically ill-posed. Together, they yield a coherent and interpretable risk assessment framework for PERC exposure in a human population.