1. The core problem

A time histogram (often called a peristimulus time histogram, or PSTH) is used to estimate a time-dependent firing rate $\lambda(t)$ from repeated spike trains.

The estimate depends critically on the choice of the bin width $\Delta$:

  • If $\Delta$ is too large, the estimator smooths out genuine temporal structure and fails to capture changes in $\lambda(t)$.
  • If $\Delta$ is too small, each bin contains very few spikes, and the estimator is dominated by random fluctuations rather than signal.

The central goal is to choose $\Delta$ objectively, using only the observed spike data, so that the histogram best represents the unknown rate $\lambda(t)$.


2. Data structure and assumptions

Consider the following experimental setup:

  • A stimulus is presented repeatedly under identical conditions.
  • The neuron is recorded for a fixed duration $T$ on each trial.
  • The experiment is repeated $n$ times, producing $n$ spike trains.

Let $\lambda(t)$ denote the underlying firing rate, which may vary over time.

When spikes are pooled across trials, the count falling in each small time interval is approximately Poisson distributed, so the pooled data can be treated as an inhomogeneous (time-dependent) Poisson process. This approximation becomes increasingly accurate as more trials are aggregated.
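
This pooled-Poisson picture can be made concrete by simulating spike trains from a known rate. Below is a minimal sketch using the standard thinning method; the Gaussian-bump rate function and all numerical values are assumptions chosen purely for illustration:

```python
import numpy as np

def sample_inhomogeneous_poisson(rate_fn, rate_max, T, rng):
    """Sample one spike train on [0, T] from rate_fn by thinning.

    Candidate spikes are drawn from a homogeneous Poisson process of
    rate rate_max, then each is kept with probability rate_fn(t) / rate_max.
    rate_max must upper-bound rate_fn on [0, T].
    """
    n_candidates = rng.poisson(rate_max * T)
    t = np.sort(rng.uniform(0.0, T, size=n_candidates))
    keep = rng.uniform(0.0, rate_max, size=n_candidates) < rate_fn(t)
    return t[keep]

# Assumed example rate: 30 Hz baseline plus a transient bump near t = 0.5 s.
rate = lambda t: 30.0 + 50.0 * np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)

rng = np.random.default_rng(0)
T, n_trials = 1.0, 20
trains = [sample_inhomogeneous_poisson(rate, 80.0, T, rng) for _ in range(n_trials)]
pooled = np.sort(np.concatenate(trains))  # all n trials merged onto one time axis
```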


3. Binning and spike counts

Choose a bin width $\Delta$, which divides the interval $[0,T]$ into $N = T/\Delta$ bins (assume for simplicity that $N$ is an integer).

For bin $i$:

  • Let $k_i$ be the total number of spikes across all $n$ trials that fall into that bin.
  • Define the bin-averaged rate as
$\theta_i = \frac{1}{\Delta} \int_{(i-1)\Delta}^{i\Delta} \lambda(t)\,dt$.

Under the Poisson assumption,
$k_i \sim \text{Poisson}(n\Delta\,\theta_i)$, i.e., a Poisson count with mean $n\Delta\theta_i$.
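
Forming the counts $k_i$ from pooled spike times is a one-line histogram operation; a sketch with NumPy, where the spike data are synthetic placeholders:

```python
import numpy as np

def bin_counts(pooled_spikes, T, delta):
    """Pooled spike counts k_i for bins of width delta on [0, T].

    Any trailing partial bin is dropped, so every k_i covers a full
    width delta.
    """
    n_bins = int(T / delta)
    edges = np.linspace(0.0, n_bins * delta, n_bins + 1)
    k, _ = np.histogram(pooled_spikes, bins=edges)
    return k

# Placeholder pooled data: spike times from n = 10 trials, concatenated.
rng = np.random.default_rng(1)
pooled = np.sort(rng.uniform(0.0, 2.0, size=400))  # T = 2 s
k = bin_counts(pooled, T=2.0, delta=0.1)           # 20 bins of width 0.1 s
```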


4. Bar-histogram rate estimator

The histogram estimator assigns a constant rate within each bin:

$\hat{\theta}_i = \frac{k_i}{n\Delta}$

This estimator is unbiased, meaning
$\mathbb{E}[\hat{\theta}_i] = \theta_i$.


5. Criterion for optimality: mean integrated squared error

To quantify how well the estimator approximates the true rate, define

$\text{MISE} = \frac{1}{T} \int_0^T \mathbb{E}\big[(\hat{\lambda}(t) - \lambda(t))^2\big]\,dt$

This quantity measures the expected squared error, averaged over time.

Direct minimization of MISE is impossible because $\lambda(t)$ is unknown. The strategy is therefore to transform this expression into an equivalent form that depends only on observable quantities.


6. Decomposition of the error

Within a single bin, the squared error separates naturally into two components:

  1. Estimation variance:
    $\mathbb{E}[(\hat{\theta} - \theta)^2]$, arising from random spike counts.
  2. Approximation error:
    $\frac{1}{\Delta} \int_0^\Delta (\lambda(t) - \theta)^2\,dt$, arising because the histogram forces the rate to be constant within the bin.

Only the balance between these two terms depends on $\Delta$.


7. Removing terms independent of $\Delta$

Terms that do not depend on $\Delta$ do not affect the location of the minimum. Removing them leads to a cost function $C_n(\Delta)$ that satisfies:

  • Minimizing $C_n(\Delta)$ is equivalent to minimizing MISE.
  • $C_n(\Delta)$ can be estimated from the data.

The resulting cost function has the form

$C_n(\Delta) = \mathbb{E}[(\hat{\theta} - \theta)^2] - \mathbb{E}[(\theta - \langle \theta \rangle)^2]$

where $\langle \theta \rangle$ denotes the average of $\theta$ over time.


8. Key Poisson identity

For a Poisson random variable, variance equals the mean.

Applied to the rate estimator, this yields

$\mathbb{E}[(\hat{\theta} - \theta)^2] = \frac{1}{n\Delta}\,\mathbb{E}[\hat{\theta}]$

This identity allows the cost function to be rewritten entirely in terms of observable spike counts.
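
The identity can be checked numerically for a single bin. The values of $\theta$, $n$, and $\Delta$ below are arbitrary assumptions:

```python
import numpy as np

# Numerical check of E[(theta_hat - theta)^2] = E[theta_hat] / (n * Delta)
# for one bin, with assumed values theta = 25 Hz, n = 10 trials, Delta = 0.1 s.
rng = np.random.default_rng(2)
theta, n, delta = 25.0, 10, 0.1

k = rng.poisson(n * delta * theta, size=200_000)  # k ~ Poisson(n * Delta * theta)
theta_hat = k / (n * delta)                       # rate estimates, one per draw

mse = np.mean((theta_hat - theta) ** 2)           # left-hand side
rhs = np.mean(theta_hat) / (n * delta)            # right-hand side
```

Both sides come out close to $\theta/(n\Delta) = 25$, as the Poisson variance-equals-mean property requires.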


9. Empirical form of the cost function

For a fixed $\Delta$:

  • Mean spike count per bin:
    $\bar{k} = \frac{1}{N}\sum_{i=1}^N k_i$
  • Variance of spike counts across bins:
    $v = \frac{1}{N}\sum_{i=1}^N (k_i - \bar{k})^2$

The empirical cost function becomes

$C_n(\Delta) = \frac{2\bar{k} – v}{(n\Delta)^2}$
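
The empirical cost can be evaluated directly from the pooled spike times; a sketch (the helper name `cost` is ours, not from any library):

```python
import numpy as np

def cost(pooled_spikes, T, n_trials, delta):
    """Empirical cost C_n(Delta) = (2*kbar - v) / (n * Delta)^2."""
    n_bins = max(int(T / delta), 1)
    edges = np.linspace(0.0, n_bins * delta, n_bins + 1)
    k, _ = np.histogram(pooled_spikes, bins=edges)
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)  # biased (population) variance, matching the definition above
    return (2.0 * kbar - v) / (n_trials * delta) ** 2
```

Note that $v$ uses the $1/N$ (population) normalization, as in the definition above, not the $1/(N-1)$ sample variance.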


10. Definition of the optimal bin width

The optimal bin width is defined as

$\Delta^* \equiv \arg\min_{\Delta} C_n(\Delta)$

Here, the superscript $*$ is not an exponent; it labels the value of $\Delta$ that minimizes the cost.
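
In practice $\Delta^*$ is found by evaluating the cost on a grid of candidate widths; a sketch assuming pooled spike times in a NumPy array:

```python
import numpy as np

def optimal_bin_width(pooled_spikes, T, n_trials, deltas):
    """Evaluate C_n(Delta) on a grid of candidate widths; return the minimizer."""
    costs = np.empty(len(deltas))
    for j, delta in enumerate(deltas):
        n_bins = max(int(T / delta), 1)
        edges = np.linspace(0.0, n_bins * delta, n_bins + 1)
        k, _ = np.histogram(pooled_spikes, bins=edges)
        kbar = k.mean()
        v = np.mean((k - kbar) ** 2)
        costs[j] = (2.0 * kbar - v) / (n_trials * delta) ** 2
    return deltas[np.argmin(costs)], costs
```

Plotting `costs` against `deltas` is also worthwhile: a flat or monotonically decreasing curve is the divergence warning discussed in Section 11.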


11. Interpretation of divergence

If the number of trials $n$ is too small and the underlying rate fluctuates only weakly, the cost function may have:

  • no finite minimum, or
  • a minimum at $\Delta \approx T$.

This indicates that the available data do not support meaningful time-resolved rate estimation using a histogram. The divergence is therefore a diagnostic signal, not a failure.


12. Extrapolation to additional trials

Even with only $n$ observed trials, it is possible to estimate what would happen if more trials were available.

For a hypothetical number of trials $m > n$, the extrapolated cost function is

$C_m(\Delta \mid n) = \left(\frac{1}{m} - \frac{1}{n}\right)\frac{\bar{k}}{n\Delta^2} + C_n(\Delta)$

This expression predicts how the balance between noise and resolution would change with additional data.

Define the corresponding optimal bin width as

$\Delta_m^* = \arg\min_{\Delta} C_m(\Delta \mid n)$
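
A sketch of the extrapolation, built from the two empirical ingredients of the cost in Section 9: the sampling-variance term, which rescales with the hypothetical trial count $m$, and the rate-fluctuation term, which is a property of $\lambda(t)$ and stays fixed:

```python
import numpy as np

def extrapolated_cost(k, n, m, delta):
    """Cost extrapolated from n observed trials to a hypothetical m trials.

    Only the sampling-variance term depends on the trial count: for m
    trials it becomes kbar / (m * n * delta^2).  The fluctuation term
    (v - kbar) / (n * delta)^2 does not change.  Setting m = n recovers
    C_n(Delta) = (2*kbar - v) / (n * delta)^2.
    """
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)
    return kbar / (m * n * delta**2) - (v - kbar) / (n * delta) ** 2
```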


13. Critical number of trials

Empirically, the inverse optimal bin width varies linearly with $1/m$ near the divergence:

$\frac{1}{\Delta_m^*} \propto \frac{1}{\hat{n}_c} - \frac{1}{m}$

This relation defines a critical number of trials $\hat{n}_c$:

  • If $m < \hat{n}_c$, then $\Delta_m^*$ diverges.
  • If $m > \hat{n}_c$, then $\Delta_m^*$ is finite.

Thus, $\hat{n}_c$ estimates the minimum number of repetitions required before time-dependent structure can be resolved.
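
Given optimal widths $\Delta_m^*$ computed for several hypothetical $m$, $\hat{n}_c$ can be estimated by fitting $1/\Delta_m^*$ as a linear function of $1/m$ and locating the zero crossing, where the width diverges. A sketch (the helper name is ours):

```python
import numpy as np

def estimate_n_c(ms, delta_stars):
    """Estimate the critical trial number from optimal widths Delta*_m.

    Fits 1/Delta*_m = intercept + slope * (1/m) and returns the trial
    count at which the fitted line crosses zero, i.e. where the optimal
    bin width diverges.
    """
    x = 1.0 / np.asarray(ms, dtype=float)
    y = 1.0 / np.asarray(delta_stars, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    return -slope / intercept  # root of intercept + slope * (1/m) = 0
```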


14. Theoretical interpretation using rate correlations

Let $\mu = \mathbb{E}[\lambda(t)]$ and define the autocorrelation of rate fluctuations as

$\phi(\tau) = \mathbb{E}[(\lambda(t)-\mu)(\lambda(t+\tau)-\mu)]$

The cost function can be written as

$C_n(\Delta) = \frac{\mu}{n\Delta} - \frac{1}{\Delta^2} \int_0^\Delta \int_0^\Delta \phi(t_1 - t_2)\,dt_1\,dt_2$

This expression clarifies the tradeoff:

  • The term $\mu/(n\Delta)$ reflects sampling noise, which shrinks as bins widen or trials accumulate.
  • The integral term reflects how temporal correlations are averaged out by binning.

15. Closed-form expression for the critical trial number

From the correlation structure,

$n_c = \frac{\mu}{\int_{-\infty}^{\infty} \phi(t)\,dt}$

This shows that stronger or longer-lasting rate fluctuations reduce the number of trials required to resolve time variation.
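
As a worked example with assumed numbers: for exponential rate correlations $\phi(\tau) = \sigma^2 e^{-|\tau|/\tau_c}$, the integral is $\int_{-\infty}^{\infty} \phi(t)\,dt = 2\sigma^2\tau_c$, so $n_c = \mu / (2\sigma^2\tau_c)$. With $\mu = 30$ Hz, $\sigma = 10$ Hz, and $\tau_c = 10$ ms, this gives $n_c = 30 / (2 \cdot 100 \cdot 0.01) = 15$ trials.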


16. Scaling laws for the optimal bin width

The behavior of $\Delta^*$ as $n$ increases depends on the smoothness of $\lambda(t)$.

Smooth rate functions

If $\phi(t)$ is smooth near $t=0$, then

$\Delta^* \propto n^{-1/3}$

Rates with sharp fluctuations

If $\phi(t)$ has a cusp at $t=0$, then

$\Delta^* \propto n^{-1/2}$

The scaling exponent therefore provides information about the temporal structure of the underlying rate.


17. Extension to piecewise-linear histograms

Replacing the piecewise-constant histogram with a piecewise-linear estimator improves approximation accuracy.

For this estimator:

  • Smooth rates yield
    $\Delta^* \propto n^{-1/5}$
  • Non-smooth rates yield
    $\Delta^* \propto n^{-1/2}$

The same cost-minimization and extrapolation principles apply.


18. Conceptual summary

This framework provides:

  1. A fully data-driven rule for selecting histogram bin width.
  2. A diagnostic for determining whether time-resolved estimation is justified.
  3. A method for estimating how many additional trials are required.
  4. A link between bin-width scaling and temporal smoothness of the underlying rate.