This section introduces a Bayesian formulation of the histogram for estimating an unknown density $f$ from iid data. The purpose is twofold: to provide a simple parametric Bayesian density estimator with closed-form posterior updates, and to motivate the fully nonparametric Bayesian density models developed later using the Dirichlet process.

1. Problem setting

Assume observations $y_i \sim f$ are independent and identically distributed, and the objective is to estimate the density function $f$. Histograms are often used as simple density estimators, but classical histograms depend strongly on arbitrary choices such as bin width and bin locations. The Bayesian histogram preserves the histogram structure while treating the bin probabilities as random variables, allowing uncertainty quantification and principled incorporation of prior information.

2. Histogram-based density model

Fix a set of knots (bin boundaries)

$\xi = (\xi_0,\xi_1,\dots,\xi_k)$ satisfying $\xi_0 < \xi_1 < \cdots < \xi_k$, and assume $y_i \in [\xi_0,\xi_k]$.

Define a piecewise-constant density:

$f(y) = \sum_{h=1}^k 1_{\xi_{h-1} < y \le \xi_h}\,\frac{\pi_h}{\xi_h - \xi_{h-1}}, \quad y \in \mathbb{R},$

where $\pi = (\pi_1,\dots,\pi_k)$ is an unknown probability vector with $\pi_h \ge 0$ and $\sum_{h=1}^k \pi_h = 1$. Each $\pi_h$ represents the total probability mass in bin $h$, and dividing by the bin width $(\xi_h - \xi_{h-1})$ yields a constant density within that bin.

Under this model, the shape of the density is determined by the fixed bins, while uncertainty is expressed through the unknown bin probabilities.
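The piecewise-constant density above can be sketched directly in code. The following is a minimal illustration using NumPy; the function name `histogram_density` and the specific knots and probabilities are illustrative, not part of the text.

```python
import numpy as np

def histogram_density(y, xi, pi):
    """Evaluate f(y) = pi_h / (xi_h - xi_{h-1}) for y in bin h, else 0."""
    y = np.asarray(y, dtype=float)
    # searchsorted with side="left" maps y in (xi_{h-1}, xi_h] to index h
    h = np.searchsorted(xi, y, side="left")
    inside = (y > xi[0]) & (y <= xi[-1])
    widths = np.diff(xi)                       # bin widths xi_h - xi_{h-1}
    out = np.zeros_like(y)
    out[inside] = pi[h[inside] - 1] / widths[h[inside] - 1]
    return out

# illustrative knots and bin probabilities (three unequal bins)
xi = np.array([0.0, 0.25, 0.5, 1.0])
pi = np.array([0.2, 0.3, 0.5])
heights = histogram_density(np.array([0.1, 0.3, 0.7]), xi, pi)
```

Because each $\pi_h$ is spread uniformly over its bin, the heights integrate to $\sum_h \pi_h = 1$, so the function is a valid density for any probability vector $\pi$.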

3. Prior specification: Dirichlet distribution

A Dirichlet prior is placed on $\pi$:

$\pi \sim \text{Dirichlet}(a_1,\dots,a_k),$

with density

$p(\pi \mid a) = \frac{\Gamma\left(\sum_{h=1}^k a_h\right)}{\prod_{h=1}^k \Gamma(a_h)} \prod_{h=1}^k \pi_h^{a_h - 1}.$

The hyperparameters can be written as $a = \alpha \pi_0$, where

$\pi_0 = E(\pi \mid a) = \left(\frac{a_1}{\sum_{h=1}^k a_h},\dots,\frac{a_k}{\sum_{h=1}^k a_h}\right)$

is the prior mean, and

$\alpha = \sum_{h=1}^k a_h$

acts as a prior sample size controlling the strength of prior information. Larger values of $\alpha$ imply stronger prior beliefs about $\pi$.
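The $(\pi_0, \alpha)$ decomposition can be checked by simulation. A short sketch, assuming NumPy; the values of `pi0` and `alpha` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
pi0 = np.array([0.25, 0.25, 0.25, 0.25])   # prior mean (k = 4 bins)
alpha = 8.0                                 # prior "sample size"
a = alpha * pi0                             # Dirichlet hyperparameters

draws = rng.dirichlet(a, size=20000)        # Monte Carlo prior draws
prior_mean = draws.mean(axis=0)             # should be close to pi0
```

Increasing `alpha` while holding `pi0` fixed leaves the prior mean unchanged but shrinks the spread of the draws around it, which is exactly the "prior sample size" interpretation.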

4. Posterior distribution and conjugacy

Let $n_h$ denote the number of observations in bin $h$:

$n_h = \sum_{i=1}^n 1_{\xi_{h-1} < y_i \le \xi_h}.$

Because the knots are fixed constants, the likelihood depends on the data only through the bin counts and is proportional to $\prod_{h=1}^k \pi_h^{n_h}$. Combining this multinomial-form likelihood with the Dirichlet prior yields a conjugate posterior:

$p(\pi \mid y) \propto \prod_{h=1}^k \pi_h^{a_h + n_h - 1},$

so that

$\pi \mid y \sim \text{Dirichlet}(a_1 + n_1,\dots,a_k + n_k).$

This conjugacy allows exact posterior computation and straightforward simulation from the posterior distribution of the density.

5. Illustrative example

Data are simulated from a bimodal mixture on $[0,1]$:

$f(y) = 0.75\,\text{Beta}(y \mid 1,5) + 0.25\,\text{Beta}(y \mid 20,2),$

with $n=100$ observations. Using 10 equally spaced bins over $[0,1]$, the Bayesian histogram posterior is computed. Posterior draws of the histogram density approximate the true density reasonably well, but the approximation quality depends strongly on the number and placement of the knots.
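The example can be reproduced along the following lines. This is a sketch assuming NumPy; the random seed, the uniform Dirichlet(1, …, 1) prior, and the number of posterior draws are illustrative choices not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 100
comp = rng.random(n_obs) < 0.75             # mixture component indicator
y = np.where(comp, rng.beta(1, 5, n_obs),   # 0.75 Beta(1,5)
                   rng.beta(20, 2, n_obs))  # + 0.25 Beta(20,2)

xi = np.linspace(0.0, 1.0, 11)              # 10 equally spaced bins
a = np.full(10, 1.0)                        # illustrative uniform prior
counts, _ = np.histogram(y, bins=xi)
post_draws = rng.dirichlet(a + counts, size=500)

# posterior draws of the density: heights pi_h / bin width
density_draws = post_draws / np.diff(xi)
post_mean_density = density_draws.mean(axis=0)
```

Plotting `density_draws` against the true mixture density shows the behavior described above: the broad bimodal shape is captured, but within-bin detail is lost and the fit degrades if the 10-bin grid is replaced by a much coarser or finer one.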

6. Advantages and limitations

Advantages

  • Posterior computation is analytically tractable due to conjugacy.
  • Prior beliefs about the density can be expressed through $(\pi_0,\alpha)$.
  • Posterior uncertainty is easily quantified using credible intervals.
  • Implementation is simple and computationally efficient.

Limitations

  • Results are sensitive to the choice of bin locations and bin counts.
  • Treating the knots as random requires trans-dimensional methods such as reversible jump MCMC, which are computationally demanding.
  • Averaging over random bins may introduce artificial local irregularities.
  • The Dirichlet prior does not enforce smoothing across adjacent bins, which can result in visually rough density estimates.

Despite these limitations, Bayesian histograms provide a clear and interpretable bridge between classical histograms and fully nonparametric Bayesian density models, motivating the use of Dirichlet process–based approaches in subsequent sections.