1. The core problem
A time histogram (often called a peristimulus time histogram, or PSTH) is used to estimate a time-dependent firing rate $\lambda(t)$ from repeated spike trains.
The estimate depends critically on the choice of the bin width $\Delta$:
- If $\Delta$ is too large, the estimator smooths out genuine temporal structure and fails to capture changes in $\lambda(t)$.
- If $\Delta$ is too small, each bin contains very few spikes, and the estimator is dominated by random fluctuations rather than signal.
The central goal is to choose $\Delta$ objectively, using only the observed spike data, so that the histogram best represents the unknown rate $\lambda(t)$.
2. Data structure and assumptions
Consider the following experimental setup:
- A stimulus is presented repeatedly under identical conditions.
- The neuron is recorded for a fixed duration $T$ on each trial.
- The experiment is repeated $n$ times, producing $n$ spike trains.
Let $\lambda(t)$ denote the underlying firing rate, which may vary over time.
When spikes are pooled across trials, the number of spikes falling into a small time interval can be approximated by an inhomogeneous Poisson process. This approximation becomes increasingly accurate as more independent trials are pooled, because the superposition of many independent point processes approaches a Poisson process.
3. Binning and spike counts
Choose a bin width $\Delta$, which divides the interval $[0,T]$ into $N = T/\Delta$ bins.
For bin $i$:
- Let $k_i$ be the total number of spikes across all $n$ trials that fall into that bin.
- Define the bin-averaged rate as
$\theta_i = \frac{1}{\Delta} \int_{(i-1)\Delta}^{i\Delta} \lambda(t)\,dt$.
Under the Poisson assumption,
$k_i \sim \text{Poisson}(n\Delta\,\theta_i)$.
4. Bar-histogram rate estimator
The histogram estimator assigns a constant rate within each bin:
$\hat{\theta}_i = \frac{k_i}{n\Delta}$
This estimator is unbiased, meaning
$\mathbb{E}[\hat{\theta}_i] = \theta_i$.
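The binning and estimation steps above can be sketched in a few lines of Python (NumPy assumed; the function name and the toy spike times are illustrative, not from the original):

```python
import numpy as np

def bin_spikes(pooled_spike_times, n_trials, T, delta):
    """Count pooled spikes per bin and form the histogram rate estimate.

    pooled_spike_times: 1-D array of spike times from all trials, in [0, T).
    Returns (k, theta_hat), where k[i] is the spike count in bin i and
    theta_hat[i] = k[i] / (n_trials * delta) is the rate estimate.
    """
    n_bins = int(np.round(T / delta))
    edges = np.linspace(0.0, T, n_bins + 1)
    k, _ = np.histogram(pooled_spike_times, bins=edges)
    theta_hat = k / (n_trials * delta)
    return k, theta_hat

# Toy example: 3 trials pooled, T = 1 s, bin width 0.25 s.
pooled = np.array([0.1, 0.12, 0.3, 0.31, 0.33, 0.8])
k, theta_hat = bin_spikes(pooled, n_trials=3, T=1.0, delta=0.25)
# k = [2, 3, 0, 1]; theta_hat = k / (3 * 0.25)
```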
5. Criterion for optimality: mean integrated squared error
To quantify how well the estimator approximates the true rate, define
$\text{MISE} = \frac{1}{T} \int_0^T \mathbb{E}\big[(\hat{\lambda}(t) - \lambda(t))^2\big]\,dt$
This quantity measures the expected squared error, averaged over time.
Direct minimization of MISE is impossible because $\lambda(t)$ is unknown. The strategy is therefore to transform this expression into an equivalent form that depends only on observable quantities.
6. Decomposition of the error
Within a single bin, the squared error separates naturally into two components:
- Estimation variance:
$\mathbb{E}[(\hat{\theta} - \theta)^2]$, arising from random spike counts.
- Approximation error:
$\frac{1}{\Delta} \int_0^\Delta (\lambda(t) - \theta)^2\,dt$, arising because the histogram forces the rate to be constant within the bin.
Only the balance between these two terms depends on $\Delta$.
7. Removing terms independent of $\Delta$
Terms that do not depend on $\Delta$ do not affect the location of the minimum. Removing them leads to a cost function $C_n(\Delta)$ that satisfies:
- Minimizing $C_n(\Delta)$ is equivalent to minimizing MISE.
- $C_n(\Delta)$ can be estimated from the data.
The resulting cost function has the form
$C_n(\Delta) = \mathbb{E}[(\hat{\theta} - \theta)^2] - \mathbb{E}[(\theta - \langle \theta \rangle)^2]$
where $\langle \theta \rangle$ denotes the average of $\theta$ over time.
8. Key Poisson identity
For a Poisson random variable, variance equals the mean.
Applied to the rate estimator, this yields
$\mathbb{E}[(\hat{\theta} - \theta)^2] = \frac{1}{n\Delta}\,\mathbb{E}[\hat{\theta}]$
This identity allows the cost function to be rewritten entirely in terms of observable spike counts.
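For completeness, the identity follows in one line from $\operatorname{Var}(k_i) = \mathbb{E}[k_i] = n\Delta\,\theta_i$ together with unbiasedness, $\mathbb{E}[\hat{\theta}_i] = \theta_i$:

$\mathbb{E}[(\hat{\theta}_i - \theta_i)^2] = \frac{\operatorname{Var}(k_i)}{(n\Delta)^2} = \frac{n\Delta\,\theta_i}{(n\Delta)^2} = \frac{\theta_i}{n\Delta} = \frac{1}{n\Delta}\,\mathbb{E}[\hat{\theta}_i]$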
9. Empirical form of the cost function
For a fixed $\Delta$:
- Mean spike count per bin:
$\bar{k} = \frac{1}{N}\sum_{i=1}^N k_i$
- Variance of spike counts across bins:
$v = \frac{1}{N}\sum_{i=1}^N (k_i - \bar{k})^2$
The empirical cost function becomes
$C_n(\Delta) = \frac{2\bar{k} - v}{(n\Delta)^2}$
10. Definition of the optimal bin width
The optimal bin width is defined as
$\Delta^* \equiv \arg\min_{\Delta} C_n(\Delta)$
Here, the superscript $*$ is not an exponent; it labels the value of $\Delta$ that minimizes the cost.
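The selection rule can be sketched end to end in Python (NumPy assumed; the function names, candidate grid, and synthetic spike data are illustrative, not from the original):

```python
import numpy as np

def cost(pooled_spike_times, n_trials, T, delta):
    """Empirical cost C_n(delta) = (2*kbar - v) / (n*delta)^2."""
    n_bins = max(1, int(np.round(T / delta)))
    k, _ = np.histogram(pooled_spike_times, bins=n_bins, range=(0.0, T))
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)  # biased variance, matching the definition
    return (2.0 * kbar - v) / (n_trials * delta) ** 2

def optimal_bin_width(pooled_spike_times, n_trials, T, deltas):
    """Delta* = argmin of the empirical cost over candidate bin widths."""
    costs = [cost(pooled_spike_times, n_trials, T, d) for d in deltas]
    return deltas[int(np.argmin(costs))]

# Synthetic data: 20 trials of an inhomogeneous Poisson process on [0, 1] s,
# with rate ~30 Hz in the first half and ~5 Hz in the second half.
rng = np.random.default_rng(0)
n, T = 20, 1.0
pooled = np.concatenate([
    rng.uniform(0.0, 0.5, rng.poisson(30 * 0.5 * n)),
    rng.uniform(0.5, 1.0, rng.poisson(5 * 0.5 * n)),
])
deltas = T / np.arange(2, 51)  # widths T/2 ... T/50, so each divides T exactly
best = optimal_bin_width(pooled, n, T, deltas)
```

Choosing candidate widths of the form $T/N$ keeps the last bin full, so every bin has the same exposure $n\Delta$.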
11. Interpretation of divergence
If the number of trials $n$ is too small and the underlying rate fluctuates only weakly, the cost function may have:
- no finite minimum, or
- a minimum at $\Delta \approx T$.
This indicates that the available data do not support meaningful time-resolved rate estimation using a histogram. The divergence is therefore a diagnostic signal, not a failure.
12. Extrapolation to additional trials
Even with only $n$ observed trials, it is possible to estimate what would happen if more trials were available.
For a hypothetical number of trials $m > n$, the extrapolated cost function is
$C_m(\Delta \mid n) = \left(\frac{1}{m} - \frac{1}{n}\right)\frac{\bar{k}}{n\Delta^2} + C_n(\Delta)$
This expression predicts how the balance between noise and resolution would change with additional data.
Define the corresponding optimal bin width as
$\Delta_m^* = \arg\min_{\Delta} C_m(\Delta \mid n)$
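A minimal sketch of the extrapolated cost, assuming the mean rate is estimated as $\hat{\mu} = \bar{k}/(n\Delta)$ so that the added term is $(1/m - 1/n)\,\bar{k}/(n\Delta^2)$ (function name is hypothetical; NumPy assumed):

```python
import numpy as np

def extrapolated_cost(k, n, m, delta):
    """C_m(delta | n): predicted cost if m trials were available,
    computed from per-bin counts k pooled over the n observed trials."""
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)
    c_n = (2.0 * kbar - v) / (n * delta) ** 2
    return (1.0 / m - 1.0 / n) * kbar / (n * delta ** 2) + c_n
```

By construction, setting `m = n` recovers the observed cost $C_n(\Delta)$, and larger `m` shrinks the sampling-noise contribution.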
13. Critical number of trials
Empirically, the inverse optimal bin width satisfies
$\frac{1}{\Delta_m^*} \propto \left(\frac{1}{\hat{n}_c} - \frac{1}{m}\right)$
This relation defines a critical number of trials $\hat{n}_c$:
- If $m < \hat{n}_c$, then $\Delta_m^*$ diverges.
- If $m > \hat{n}_c$, then $\Delta_m^*$ is finite.
Thus, $\hat{n}_c$ estimates the minimum number of repetitions required before time-dependent structure can be resolved.
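In practice, $\hat{n}_c$ can be read off by fitting a line to $1/\Delta_m^*$ against $1/m$ and locating its zero crossing. A sketch with made-up data points that obey the relation exactly (the helper name is hypothetical; NumPy assumed):

```python
import numpy as np

def critical_trials(ms, inv_deltas):
    """Fit 1/Delta*_m = a * (1/n_c - 1/m) and return the n_c estimate."""
    x = 1.0 / np.asarray(ms, dtype=float)    # 1/m
    y = np.asarray(inv_deltas, dtype=float)  # 1/Delta*_m
    slope, intercept = np.polyfit(x, y, 1)   # y = slope*x + intercept
    # Zero crossing at 1/m = -intercept/slope = 1/n_c.
    return -slope / intercept

# Made-up data obeying 1/Delta* = 4 * (1/10 - 1/m), i.e. true n_c = 10.
ms = np.array([20, 40, 80, 160])
inv_deltas = 4.0 * (1.0 / 10.0 - 1.0 / ms)
# critical_trials(ms, inv_deltas) -> approximately 10
```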
14. Theoretical interpretation using rate correlations
Let $\mu = \mathbb{E}[\lambda(t)]$ and define the autocorrelation of rate fluctuations as
$\phi(\tau) = \mathbb{E}[(\lambda(t)-\mu)(\lambda(t+\tau)-\mu)]$
The cost function can be written as
$C_n(\Delta) = \frac{\mu}{n\Delta} - \frac{1}{\Delta^2} \int_0^\Delta \int_0^\Delta \phi(t_1 - t_2)\,dt_1\,dt_2$
This expression clarifies the tradeoff:
- The term $\mu/n$ reflects sampling noise.
- The integral term reflects how temporal correlations are averaged out by binning.
15. Closed-form expression for the critical trial number
From the correlation structure,
$n_c = \frac{\mu}{\int_{-\infty}^{\infty} \phi(t)\,dt}$
This shows that stronger or longer-lasting rate fluctuations reduce the number of trials required to resolve time variation.
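As a numerical sanity check of this formula, take an exponential correlation $\phi(t) = \sigma^2 e^{-|t|/\tau}$, for which $\int \phi = 2\sigma^2\tau$ and hence $n_c = \mu/(2\sigma^2\tau)$. The parameter values below are made up for illustration:

```python
import numpy as np

mu, sigma2, tau = 30.0, 100.0, 0.05  # mean rate (Hz), fluctuation variance, timescale (s)

t = np.linspace(-5.0, 5.0, 200001)   # grid extending far beyond +-tau
dt = t[1] - t[0]
phi = sigma2 * np.exp(-np.abs(t) / tau)

n_c_numeric = mu / (np.sum(phi) * dt)    # numerical integral of phi
n_c_closed = mu / (2.0 * sigma2 * tau)   # closed form: 30 / 10 = 3
```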
16. Scaling laws for the optimal bin width
The behavior of $\Delta^*$ as $n$ increases depends on the smoothness of $\lambda(t)$.
Smooth rate functions
If $\phi(t)$ is smooth near $t=0$, then
$\Delta^* \propto n^{-1/3}$
Rates with sharp fluctuations
If $\phi(t)$ has a cusp at $t=0$, then
$\Delta^* \propto n^{-1/2}$
The scaling exponent therefore provides information about the temporal structure of the underlying rate.
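A sketch of where these exponents come from, using the identity $\int_0^\Delta \int_0^\Delta f(t_1 - t_2)\,dt_1\,dt_2 = \int_{-\Delta}^{\Delta} (\Delta - |u|) f(u)\,du$ and expanding $\phi$ near zero:
- Smooth case, $\phi(u) \approx \phi(0) + \tfrac{1}{2}\phi''(0)u^2$ with $\phi''(0) < 0$ at the maximum:
$C_n(\Delta) \approx \text{const} + \frac{\mu}{n\Delta} + \frac{|\phi''(0)|}{12}\Delta^2$, minimized at $\Delta^* = \left(\frac{6\mu}{|\phi''(0)|\,n}\right)^{1/3} \propto n^{-1/3}$.
- Cusp case, $\phi(u) \approx \phi(0) - c|u|$ with $c > 0$:
$C_n(\Delta) \approx \text{const} + \frac{\mu}{n\Delta} + \frac{c}{3}\Delta$, minimized at $\Delta^* = \left(\frac{3\mu}{c\,n}\right)^{1/2} \propto n^{-1/2}$.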
17. Extension to piecewise-linear histograms
Replacing the piecewise-constant histogram with a piecewise-linear estimator improves approximation accuracy.
For this estimator:
- Smooth rates yield
$\Delta^* \propto n^{-1/5}$
- Non-smooth rates yield
$\Delta^* \propto n^{-1/2}$
The same cost-minimization and extrapolation principles apply.
18. Conceptual summary
This framework provides:
- A fully data-driven rule for selecting histogram bin width.
- A diagnostic for determining whether time-resolved estimation is justified.
- A method for estimating how many additional trials are required.
- A link between bin-width scaling and temporal smoothness of the underlying rate.
