1. Psychological experiment and motivation

The example analyzes reaction times in a psychology experiment using a finite mixture model.

  • There are $17$ individuals in total:
    • $11$ non-schizophrenic participants.
    • $6$ schizophrenic participants.
  • Each person has reaction time measured $30$ times.
  • The analysis uses the logarithms of reaction times (in milliseconds). The histograms in Figure show:
    • Schizophrenic participants have slower reaction times on average than non-schizophrenic participants.
    • For some schizophrenic participants, reaction times are much more variable than for others.

Psychological theory suggests two components for schizophrenic participants:

  1. Motor reflex retardation: a general slowing that affects all trials.
  2. Attentional lapses on some trials: when attention fails, reactions are much slower than usual.

The statistical model is built to reflect these two mechanisms.


2. Initial model for non-schizophrenic participants

For non-schizophrenic individuals ($j = 1, \dots, 11$), the model assumes a normal random-effects structure:

  • For person $j$, the $i$-th log reaction time is $y_{ij}$.
  • The responses for person $j$ are modeled as
    • $y_{ij} \sim \text{Normal}(\alpha_j, \sigma_y^2)$,
    • where $\alpha_j$ is a person-specific mean,
    • and $\sigma_y^2$ is a common within-person variance shared by all individuals.

So non-schizophrenic reaction times have:

  • Individual baselines $\alpha_j$.
  • A common variability $\sigma_y^2$ around those baselines.

3. Mixture model for schizophrenic participants

For schizophrenic individuals ($j = 12, \dots, 17$), the model introduces a two-component mixture to reflect attentional lapses:

  • With probability $(1 – \lambda)$:
    • No attentional delay.
    • $y_{ij} \sim \text{Normal}(\alpha_j, \sigma_y^2)$.
  • With probability $\lambda$:
    • A delayed response due to an attentional lapse.
    • $y_{ij} \sim \text{Normal}(\alpha_j + \tau, \sigma_y^2)$.

Here:

  • $\lambda$ is the probability that a given trial for a schizophrenic individual has an attentional delay.
  • $\tau$ is the size of the additional delay (on the log scale) when such a lapse occurs.

Because reaction times are positive and skewed, this model is applied to the logarithms of reaction times.


4. Hierarchical population model for the person means

The person-specific means $\alpha_j$ themselves are modeled hierarchically to capture motor reflex retardation at the group level:

  • Let $S_j$ be an indicator:
    • $S_j = 0$ if person $j$ is non-schizophrenic.
    • $S_j = 1$ if person $j$ is schizophrenic.

The hierarchical model is:

  • $\alpha_j \sim \text{Normal}(\mu + \beta S_j, \sigma_\alpha^2)$.

Interpretation:

  • For non-schizophrenic individuals ($S_j = 0$): $\alpha_j$ is centered at $\mu$.
  • For schizophrenic individuals ($S_j = 1$): $\alpha_j$ is centered at $\mu + \beta$.
  • $\beta$ measures the average shift in log reaction time between schizophrenics and non-schizophrenics (i.e., motor reflex retardation).
  • $\sigma_\alpha^2$ captures between-person variability in means.

The three key parameters of scientific interest are:

  1. $\beta$: average difference in baseline log reaction time between schizophrenics and non-schizophrenics (motor reflex retardation).
  2. $\lambda$: proportion of schizophrenic responses that are delayed (attentional lapses).
  3. $\tau$: size of the delay during a lapse, on the log scale.

5. Rewriting the model using indicator variables

To implement computation more easily (ECM and Gibbs sampling), the mixture is written using indicator variables $z_{ij}$:

  • $y_{ij}$ is the $i$-th response for individual $j$.
  • $z_{ij}$ is an unobserved indicator:
    • $z_{ij} = 0$ if trial $(i, j)$ comes from the non-delayed component.
    • $z_{ij} = 1$ if trial $(i, j)$ comes from the delayed component.

The hierarchical model is then:

  • $y_{ij} \mid \alpha_j, z_{ij}, \phi \sim \text{Normal}(\alpha_j + \tau z_{ij}, \sigma_y^2)$,
  • $\alpha_j \mid z, \phi \sim \text{Normal}(\mu + \beta S_j, \sigma_\alpha^2)$,
  • $z_{ij} \mid \phi \sim \text{Bernoulli}(\lambda S_j)$,

where $\phi = (\sigma_\alpha^2, \beta, \lambda, \tau, \mu, \sigma_y^2)$ collects the main hyperparameters, and $z$ collects all the $z_{ij}$.

The notation sometimes uses $\omega = (\alpha, \phi)$ to represent all parameters except the indicators $z$.

The indicator variables $z_{ij}$ are not strictly necessary to define the model, but they make the conditional distributions much simpler, which is crucial for ECM and Gibbs sampling. Since there are only two components, a single indicator $z_{ij}$ is sufficient for each observation.


6. Prior specification and identifiability

A noninformative uniform joint prior is placed on $\phi$:

  • $\tau$ is restricted to be positive to avoid non-identifiability (otherwise, “no delay” plus a negative $\tau$ could represent the same situation as “delay” plus a positive $\tau$).
  • $\sigma_\alpha^2 > 0$ and $\sigma_y^2 > 0$.
  • $\lambda$ is taken to be uniform on $[0.001, 0.999]$:
    • $\lambda = 0$ or $\lambda = 1$ would imply no mixture at all, which is inconsistent with the intended model.
  • Previous analyses and scientific reasoning indicate that a model without a mixture is inadequate.

7. Crude parameter estimates

Before running more sophisticated algorithms, the analysis constructs rough estimates:

  • For each person $j$, estimate $\alpha_j$ by the sample mean of their $30$ observations.
  • Estimate $\sigma_y^2$ by the average within-person sample variance among non-schizophrenic participants.
  • Using the estimated $\hat{\alpha}_j$, estimate:
    • $\mu$ as the average of $\hat{\alpha}_j$ over non-schizophrenics.
    • $\beta$ as the average difference between schizophrenics and non-schizophrenics.
    • $\sigma_\alpha^2$ as the variance of $\hat{\alpha}_j$ within groups.

From visual inspection of the six schizophrenic histograms, the authors crudely set:

  • $\hat{\lambda} = \frac{1}{3}$,
  • $\hat{\tau} = 1.0$.

No initial estimate of the indicators $z_{ij}$ is needed because these are updated in the first step of ECM/Gibbs.


8. ECM algorithm to find posterior modes

To find the modes of the posterior $p(z, \omega \mid y)$, the analysis uses an ECM (Expectation/Conditional Maximization) algorithm:

  1. Starting points:
    • Draw $100$ starting values for $\phi$ from a “simplified” distribution.
    • This distribution is built by taking the crude estimates and dividing each parameter by an independent $\chi^2_1$ random variable, to spread the starting points over the parameter space.
  2. E-step:
    • For a current guess $\omega_{\text{old}}$, compute the expectation
      • $E_{\text{old}}[\log p(z, \omega \mid y)]$,
      • where the expectation is over $z$ given $y$ and $\omega_{\text{old}}$.
    • For each $(i, j)$, compute:
      • $\Pr(z_{ij} = 0 \mid \omega_{\text{old}}, y) = 1 – \zeta_{ij}$,$\Pr(z_{ij} = 1 \mid \omega_{\text{old}}, y) = \zeta_{ij}$,
        where
      ζij=λoldN(yijαjold+τold,(σyold)2)(1λold)N(yijαjold,(σyold)2)+λoldN(yijαjold+τold,(σyold)2).\zeta_{ij} = \frac{\lambda_{\text{old}} \, N(y_{ij} \mid \alpha_j^{\text{old}} + \tau^{\text{old}}, (\sigma_y^{\text{old}})^2)} {(1 – \lambda_{\text{old}}) \, N(y_{ij} \mid \alpha_j^{\text{old}}, (\sigma_y^{\text{old}})^2) + \lambda_{\text{old}} \, N(y_{ij} \mid \alpha_j^{\text{old}} + \tau^{\text{old}}, (\sigma_y^{\text{old}})^2)}.
    • Here $N(\cdot \mid m, v)$ denotes the normal density with mean $m$ and variance $v$.
  3. M-step (conditional maximization):
    ECM updates one group of parameters at a time to increase $E_{\text{old}}[\log p(z, \omega \mid y)]$.
    • Update $\lambda$:
      • Compute the expected proportion of delayed trials among schizophrenics: λnew=1630j=1217i=130ζij,\lambda_{\text{new}} = \frac{1}{6 \cdot 30} \sum_{j=12}^{17} \sum_{i=1}^{30} \zeta_{ij},because there are $6$ schizophrenic individuals and $30$ trials each ($180$ trials total).
    • Update each $\alpha_j$:
      • Combine the normal prior $\alpha_j \sim \text{Normal}(\mu + \beta S_j, \sigma_\alpha^2)$ with the mixture likelihood.
      • The updated conditional mode has the form αj=(1σα2+301σy2)1(μ+βSjσα2+1σy2i=130(yijζijτ)),\alpha_j = \left(\frac{1}{\sigma_\alpha^2} + 30 \cdot \frac{1}{\sigma_y^2}\right)^{-1} \left( \frac{\mu + \beta S_j}{\sigma_\alpha^2} + \frac{1}{\sigma_y^2} \sum_{i=1}^{30} (y_{ij} – \zeta_{ij}\tau) \right),which corresponds to equation.
    • Update $\tau$ and $\sigma_y^2$:
      • Use only the delayed components for schizophrenic individuals (weighted by $\zeta_{ij}$): τnew=j=1217i=130ζij(yijαj)j=1217i=130ζij,\tau_{\text{new}} = \frac{\sum_{j=12}^{17} \sum_{i=1}^{30} \zeta_{ij} (y_{ij} – \alpha_j)} {\sum_{j=12}^{17} \sum_{i=1}^{30} \zeta_{ij}},τnew​=∑j=1217​∑i=130​ζij​∑j=1217​∑i=130​ζij​(yij​−αj​)​, (σynew)2=11730j=117i=130(yijαjζijτnew)2.(\sigma_y^{\text{new}})^2 = \frac{1}{17 \cdot 30} \sum_{j=1}^{17} \sum_{i=1}^{30} (y_{ij} – \alpha_j – \zeta_{ij}\tau_{\text{new}})^2.(σynew​)2=17⋅301​j=1∑17​i=1∑30​(yij​−αj​−ζij​τnew​)2.
    • Update $\mu$, $\beta$, and $\sigma_\alpha^2$:
      • These come from the normal population model for $\alpha_j$ with a uniform hyperprior:
        • $\mu_{\text{new}}$ is basically the average of $\alpha_j$ for non-schizophrenics.
        • $\beta_{\text{new}}$ is essentially the difference between the average $\alpha_j$ for schizophrenics and the average for non-schizophrenics.
        • $(\sigma_\alpha^{\text{new}})^2$ is the variance of $\alpha_j$ around the group means.

After running ECM from $100$ starting points, the algorithm finds three local modes:

  • One dominant (major) mode.
  • Two minor modes corresponding to nearly degenerate cases with $\lambda$ near $0$.

The minor modes have much lower posterior density (by a factor smaller than $e^{-20}$), so the analysis treats the target distribution as effectively unimodal.


9. Approximating the posterior near the major mode

The marginal posterior for $\omega$ after integrating out $z$ is:p(ωy)j=117N(αjμ+βSj,σα2)×j=117i=130[(1λSj)N(yijαj,σy2)+λSjN(yijαj+τ,σy2)].p(\omega \mid y) \propto \prod_{j=1}^{17} N(\alpha_j \mid \mu + \beta S_j, \sigma_\alpha^2) \times \prod_{j=1}^{17} \prod_{i=1}^{30} \left[ (1 – \lambda S_j) \, N(y_{ij} \mid \alpha_j, \sigma_y^2) + \lambda S_j \, N(y_{ij} \mid \alpha_j + \tau, \sigma_y^2) \right].

This function is evaluated along the ECM iterations to confirm that it increases.

At the major mode, the analysis constructs a multivariate $t_4$ approximation to $p(\omega \mid y)$:

  • Centered at the mode.
  • With scale given by the inverse of the Hessian (second-derivative matrix).

Then:

  • Draw $S = 2000$ samples of $\omega$ from this $t_4$ approximation.
  • Use these as starting points for importance resampling and for initializing the Gibbs sampler.

Samples near minor modes would have negligible weights and therefore do not matter.


10. Gibbs sampler construction

The Gibbs sampler uses the fact that the full conditional distributions:

  • $p(\phi \mid \alpha, z, y)$,
  • $p(\alpha \mid \phi, z, y)$,
  • $p(z \mid \alpha, \phi, y)$

have standard forms. One full Gibbs sweep:

  1. Update $z_{ij}$ for schizophrenics ($j = 12, \dots, 17$):
    • For $i = 1, \dots, 30$, draw
      • $z_{ij} \sim \text{Bernoulli}(\zeta_{ij})$,
      • where $\zeta_{ij}$ is as defined in equation (22.5).
    • For non-schizophrenics ($j < 12$), $z_{ij}$ is fixed at $0$.
  2. Update each $\alpha_j$:
    • Draw $\alpha_j$ from a normal distribution whose mean is the ECM conditional mode (equation (22.6) with $\zeta_{ij}$ replaced by the sampled $z_{ij}$) and whose variance is the inverse of the denominator in that expression.
  3. Update $\lambda$:
    • Let $h = \sum_{j=12}^{17} \sum_{i=1}^{30} z_{ij}$ be the total number of delayed trials among schizophrenic individuals.
    • Draw $\lambda$ from a $\text{Beta}(h + 1, 180 – h + 1)$ distribution, restricted to $[0.001, 0.999]$.
  4. Update the remaining parameters $(\beta, \mu, \tau, \sigma_y^2, \sigma_\alpha^2)$:
    • First, draw variance components from inverse–$\chi^2$ distributions:
      • $\sigma_y^2 \mid \alpha, \lambda, z \sim \text{Inv-}\chi^2!\left(508,; \frac{1}{508} \sum_{j=1}^{17} \sum_{i=1}^{30} (y_{ij} – \alpha_j – z_{ij}\tau)^2 \right)$.
      • $\sigma_\alpha^2 \mid \alpha, \lambda, z \sim \text{Inv-}\chi^2!\left(15,; \frac{1}{15} \sum_{j=1}^{17} (\alpha_j – \mu – \beta S_j)^2 \right)$.
    • Then draw $\tau$ (conditional on $\sigma_y^2$) from a normal distribution: τα,λ,z,σy2,σα2Normal ⁣(j=1217i=130zij(yijαj)j=1217i=130zij,σy2j=1217i=130zij).\tau \mid \alpha, \lambda, z, \sigma_y^2, \sigma_\alpha^2 \sim \text{Normal}\!\left( \frac{\sum_{j=12}^{17} \sum_{i=1}^{30} z_{ij} (y_{ij} – \alpha_j)} {\sum_{j=12}^{17} \sum_{i=1}^{30} z_{ij}}, \quad \frac{\sigma_y^2}{\sum_{j=12}^{17} \sum_{i=1}^{30} z_{ij}} \right).
    • Draw $\mu$ from a normal distribution that depends on $\alpha$, $\beta$, and $\sigma_\alpha^2$: μα,λ,z,β,σy2,σα2Normal ⁣(117j=117(αjβSj),σα217).\mu \mid \alpha, \lambda, z, \beta, \sigma_y^2, \sigma_\alpha^2 \sim \text{Normal}\!\left( \frac{1}{17} \sum_{j=1}^{17} (\alpha_j – \beta S_j), \quad \frac{\sigma_\alpha^2}{17} \right).
    • Draw $\beta$ from a normal distribution that depends on the difference between schizophrenic and non-schizophrenic means: βα,λ,z,μ,σy2,σα2Normal ⁣(16j=1217(αjμ),σα26).\beta \mid \alpha, \lambda, z, \mu, \sigma_y^2, \sigma_\alpha^2 \sim \text{Normal}\!\left( \frac{1}{6} \sum_{j=12}^{17} (\alpha_j – \mu), \quad \frac{\sigma_\alpha^2}{6} \right).

In essence, each Gibbs step is the stochastic counterpart to an ECM update: instead of taking the mode, the sampler draws from the full conditional distribution.


11. Posterior inferences for the original model

From the Gibbs output, the analysis computes posterior medians and $95%$ intervals for:

  • $\lambda$: probability that an observation from a schizophrenic participant is delayed.
  • $\tau$: attentional delay (log scale).
  • $\beta$: difference in average log reaction time between schizophrenics and non-schizophrenics for non-delayed trials.

The results (for the original model) show:

  • $\beta$ is clearly positive, implying that schizophrenic participants have slower average reaction times.
    • On the original time scale, $\exp(\beta)$ has a $95%$ posterior interval of approximately $[1.18, 1.62]$.
  • $\lambda$ is in the range $[0.07, 0.18]$, so attentional lapses are fairly infrequent.
  • $\exp(\tau)$ has a $95%$ interval around $[2.10, 2.61]$, so when a lapse occurs, the response time is about two to three times longer on average.

12. Posterior predictive checks for the original model

The model is checked against the data using posterior predictive distributions:

  • For a new measurement on an existing person $j$, predictions condition on that person’s $\alpha_j$.
  • For a new person, first draw a new $\tilde{\alpha}j$ from the population distribution $\text{Normal}(\mu + \beta S_j, \sigma\alpha^2)$, then simulate $\tilde{y}_{ij}$ given $\tilde{\alpha}_j$.

To assess fit for schizophrenic individuals, the analysis uses test statistics based on the within-person standard deviation:

  • For each schizophrenic $j = 12, \dots, 17$, compute
    • $s_j = \text{standard deviation of the 30 log reaction times } y_{ij}$.
  • Define
    • $T_{\min} = \min_j s_j$,
    • $T_{\max} = \max_j s_j$ across the six schizophrenic participants.

Then:

  • Simulate many replicated datasets $y^{\text{rep}}$ from the posterior predictive distribution.
  • For each replicate, compute $T_{\min}(y^{\text{rep}})$ and $T_{\max}(y^{\text{rep}})$.
  • Plot a scatter of $\big(T_{\min}(y^{\text{rep}}), T_{\max}(y^{\text{rep}})\big)$ and mark the observed pair $(T_{\min}(y), T_{\max}(y))$.

The result:

  • The observed $(T_{\min}, T_{\max})$ lies in an extreme region of the posterior predictive distribution.
  • Posterior predictive $p$-values are approximately $0.000$ for $T_{\min}$ and $1.000$ for $T_{\max}$.
  • Interpretation: the model underestimates the range of within-person variances among schizophrenic individuals. There is more heterogeneity than the model allows.

A naive test like the average $s_j$ would not be useful, because it is essentially a sufficient statistic for $\sigma_y^2$ and is automatically well-fitted by the model.


13. Expanded model with additional heterogeneity

To better capture person-to-person differences among schizophrenic individuals, the model is expanded by adding two parameters:

  1. $\omega$: the probability that a schizophrenic individual is prone to attentional delays at all.
  2. $\sigma_{y2}^2$: the variance of delayed observations, allowing delayed trials to be more variable than non-delayed trials.

The original model is a special case of this expanded model when $\omega = 1$ and $\sigma_{y2}^2 = \sigma_y^2$.

A new indicator variable $W_j$ is introduced:

  • $W_j = 1$ if person $j$ is prone to attentional delays,
  • $W_j = 0$ otherwise.
  • For non-schizophrenics, $W_j = 0$ automatically.
  • For each schizophrenic, $W_j \sim \text{Bernoulli}(\omega)$.

The expanded model becomes:

  • $y_{ij} \mid z_{ij}, \omega \sim \text{Normal}\big(\alpha_j + \tau z_{ij}, , (1 – z_{ij})\sigma_y^2 + z_{ij}\sigma_{y2}^2 \big)$,
  • $\alpha_j \mid z, S, \mu, \beta, \sigma_\alpha^2 \sim \text{Normal}(\mu + \beta S_j, \sigma_\alpha^2)$,
  • $z_{ij} \mid S, W, \omega \sim \text{Bernoulli}(\lambda S_j W_j)$,
  • $W_j \mid S, \omega \sim \text{Bernoulli}(\omega S_j)$.

Both $\omega$ and $\sigma_{y2}^2$ are given uniform priors.


14. Fitting the expanded model and comparing results

The new model can be fit by:

  • Adding Gibbs steps to update $\omega$, $\sigma_{y2}^2$, and $W_j$.
  • Slightly modifying the earlier Gibbs steps to condition on these new parameters.
  • Using ten draws from the previous posterior as starting values for the expanded model’s Gibbs chains.

Ten chains of length $500$ (discarding the first half as burn-in) are sufficient for approximate convergence (all $\hat{R}$ below $1.1$).

Posterior summaries show:

  • Under the expanded model, the proportion of delayed observations for schizophrenics ($\lambda$) is larger.
  • The size of each delay ($\tau$) is smaller.
  • $\beta$ is adjusted accordingly.
  • The new parameter $\omega$ (probability that a schizophrenic individual experiences attentional delays) has a posterior interval roughly in $[0.24, 0.84]$.

These differences indicate that the expanded model describes the data in a meaningfully different way and suggests a real improvement in fit, rather than mere overfitting.


15. Posterior predictive checks for the expanded model

The expanded model is evaluated using the same test quantities $T_{\min}$ and $T_{\max}$ based on within-person standard deviations for schizophrenics.

  • Posterior predictive simulations again generate $T_{\min}(y^{\text{rep}})$ and $T_{\max}(y^{\text{rep}})$.
  • The scatterplot shows that the posterior predictive cloud moves closer to the observed $(T_{\min}(y), T_{\max}(y))$.

Key points:

  • The observed value is still somewhat in the periphery, with predictive $p$-values about $0.97$ and $0.05$.
  • However, the magnitude of misfit is much smaller than for the original model. This is visible from the change in scale between previous figure and this.
  • Conclusion:
    • The expanded model provides a substantially better fit to the variation in within-person variability among schizophrenics.
    • Some lack of fit remains, which suggests possible directions for further model development or for collecting more data.