1. The core problem
A time histogram (often called a peristimulus time histogram, or PSTH) is used to estimate a time-dependent firing rate $\lambda(t)$ from repeated spike trains.
The estimate depends critically on the choice of the bin width $\Delta$:
- If $\Delta$ is too large, the estimator smooths out genuine temporal structure and fails to capture changes in $\lambda(t)$.
- If $\Delta$ is too small, each bin contains very few spikes, and the estimator is dominated by random fluctuations rather than signal.
The central goal is to choose $\Delta$ objectively, using only the observed spike data, so that the histogram best represents the unknown rate $\lambda(t)$.
2. Data structure and assumptions
Consider the following experimental setup:
- A stimulus is presented repeatedly under identical conditions.
- The neuron is recorded for a fixed duration $T$ on each trial.
- The experiment is repeated $n$ times, producing $n$ spike trains.
Let $\lambda(t)$ denote the underlying firing rate, which may vary over time.
When spikes are pooled across trials, the number of spikes falling into a small time interval can be approximated by an inhomogeneous Poisson process. This approximation becomes increasingly accurate as more independent trials are pooled, because the superposition of many independent point processes approaches a Poisson process.
3. Binning and spike counts
Choose a bin width $\Delta$, which divides the interval $[0,T]$ into $N = T/\Delta$ bins.
For bin $i$:
- Let $k_i$ be the total number of spikes across all $n$ trials that fall into that bin.
- Define the bin-averaged rate as
$\theta_i = \frac{1}{\Delta} \int_{(i-1)\Delta}^{i\Delta} \lambda(t)\,dt$.
Under the Poisson assumption,
$k_i \sim \text{Poisson}(n\Delta\,\theta_i)$.
4. Bar-histogram rate estimator
The histogram estimator assigns a constant rate within each bin:
$\hat{\theta}_i = \frac{k_i}{n\Delta}$
This estimator is unbiased, meaning
$\mathbb{E}[\hat{\theta}_i] = \theta_i$.
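The binning and estimation steps above can be sketched in a few lines of Python (NumPy assumed; the function name and the toy spike times are illustrative, not from the original):

```python
import numpy as np

def bin_spikes(pooled_spike_times, n_trials, T, delta):
    """Count pooled spikes per bin and form the histogram rate estimate.

    pooled_spike_times: 1-D array of spike times from all trials, in [0, T).
    Returns (k, theta_hat), where k[i] is the spike count in bin i and
    theta_hat[i] = k[i] / (n_trials * delta) is the rate estimate.
    """
    n_bins = int(np.round(T / delta))
    edges = np.linspace(0.0, T, n_bins + 1)
    k, _ = np.histogram(pooled_spike_times, bins=edges)
    theta_hat = k / (n_trials * delta)
    return k, theta_hat

# Toy example: 3 trials pooled, T = 1 s, bin width 0.25 s.
pooled = np.array([0.1, 0.12, 0.3, 0.31, 0.33, 0.8])
k, theta_hat = bin_spikes(pooled, n_trials=3, T=1.0, delta=0.25)
# k = [2, 3, 0, 1]; theta_hat = k / (3 * 0.25)
```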
5. Criterion for optimality: mean integrated squared error
To quantify how well the estimator approximates the true rate, define
$\text{MISE} = \frac{1}{T} \int_0^T \mathbb{E}\big[(\hat{\lambda}(t) - \lambda(t))^2\big]\,dt$
This quantity measures the expected squared error, averaged over time.
Direct minimization of MISE is impossible because $\lambda(t)$ is unknown. The strategy is therefore to transform this expression into an equivalent form that depends only on observable quantities.
6. Decomposition of the error
Within a single bin, the squared error separates naturally into two components:
- Estimation variance:
$\mathbb{E}[(\hat{\theta} - \theta)^2]$, arising from random spike counts.
- Approximation error:
$\frac{1}{\Delta} \int_0^\Delta (\lambda(t) - \theta)^2\,dt$, arising because the histogram forces the rate to be constant within the bin.
Only the balance between these two terms depends on $\Delta$.
7. Removing terms independent of $\Delta$
Terms that do not depend on $\Delta$ do not affect the location of the minimum. Removing them leads to a cost function $C_n(\Delta)$ that satisfies:
- Minimizing $C_n(\Delta)$ is equivalent to minimizing MISE.
- $C_n(\Delta)$ can be estimated from the data.
The resulting cost function has the form
$C_n(\Delta) = \mathbb{E}[(\hat{\theta} - \theta)^2] - \mathbb{E}[(\theta - \langle \theta \rangle)^2]$
where $\langle \theta \rangle$ denotes the average of $\theta$ over time.
8. Key Poisson identity
For a Poisson random variable, variance equals the mean.
Applied to the rate estimator, this yields
$\mathbb{E}[(\hat{\theta} - \theta)^2] = \frac{1}{n\Delta}\,\mathbb{E}[\hat{\theta}]$
This identity allows the cost function to be rewritten entirely in terms of observable spike counts.
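For completeness, the identity follows in one line from $\operatorname{Var}(k_i) = \mathbb{E}[k_i] = n\Delta\,\theta_i$ together with unbiasedness, $\mathbb{E}[\hat{\theta}_i] = \theta_i$:

$\mathbb{E}[(\hat{\theta}_i - \theta_i)^2] = \frac{\operatorname{Var}(k_i)}{(n\Delta)^2} = \frac{n\Delta\,\theta_i}{(n\Delta)^2} = \frac{\theta_i}{n\Delta} = \frac{1}{n\Delta}\,\mathbb{E}[\hat{\theta}_i]$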
9. Empirical form of the cost function
For a fixed $\Delta$:
- Mean spike count per bin:
$\bar{k} = \frac{1}{N}\sum_{i=1}^N k_i$
- Variance of spike counts across bins:
$v = \frac{1}{N}\sum_{i=1}^N (k_i - \bar{k})^2$
The empirical cost function becomes
$C_n(\Delta) = \frac{2\bar{k} - v}{(n\Delta)^2}$
10. Definition of the optimal bin width
The optimal bin width is defined as
$\Delta^* \equiv \arg\min_{\Delta} C_n(\Delta)$
Here, the superscript $*$ is not an exponent; it labels the value of $\Delta$ that minimizes the cost.
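The selection rule can be sketched end to end in Python (NumPy assumed; the function names, candidate grid, and synthetic spike data are illustrative, not from the original):

```python
import numpy as np

def cost(pooled_spike_times, n_trials, T, delta):
    """Empirical cost C_n(delta) = (2*kbar - v) / (n*delta)^2."""
    n_bins = max(1, int(np.round(T / delta)))
    k, _ = np.histogram(pooled_spike_times, bins=n_bins, range=(0.0, T))
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)  # biased variance, matching the definition
    return (2.0 * kbar - v) / (n_trials * delta) ** 2

def optimal_bin_width(pooled_spike_times, n_trials, T, deltas):
    """Delta* = argmin of the empirical cost over candidate bin widths."""
    costs = [cost(pooled_spike_times, n_trials, T, d) for d in deltas]
    return deltas[int(np.argmin(costs))]

# Synthetic data: 20 trials of an inhomogeneous Poisson process on [0, 1] s,
# with rate ~30 Hz in the first half and ~5 Hz in the second half.
rng = np.random.default_rng(0)
n, T = 20, 1.0
pooled = np.concatenate([
    rng.uniform(0.0, 0.5, rng.poisson(30 * 0.5 * n)),
    rng.uniform(0.5, 1.0, rng.poisson(5 * 0.5 * n)),
])
deltas = T / np.arange(2, 51)  # widths T/2 ... T/50, so each divides T exactly
best = optimal_bin_width(pooled, n, T, deltas)
```

Choosing candidate widths of the form $T/N$ keeps the last bin full, so every bin has the same exposure $n\Delta$.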
11. Interpretation of divergence
If the number of trials $n$ is too small and the underlying rate fluctuates only weakly, the cost function may have:
- no finite minimum, or
- a minimum at $\Delta \approx T$.
This indicates that the available data do not support meaningful time-resolved rate estimation using a histogram. The divergence is therefore a diagnostic signal, not a failure.
12. Extrapolation to additional trials
Even with only $n$ observed trials, it is possible to estimate what would happen if more trials were available.
For a hypothetical number of trials $m > n$, the extrapolated cost function is
$C_m(\Delta \mid n) = \left(\frac{1}{m} - \frac{1}{n}\right)\frac{\bar{k}}{n\Delta^2} + C_n(\Delta)$
This expression predicts how the balance between noise and resolution would change with additional data.
Define the corresponding optimal bin width as
$\Delta_m^* = \arg\min_{\Delta} C_m(\Delta \mid n)$
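A minimal sketch of the extrapolated cost, assuming the mean rate is estimated as $\hat{\mu} = \bar{k}/(n\Delta)$ so that the added term is $(1/m - 1/n)\,\bar{k}/(n\Delta^2)$ (function name is hypothetical; NumPy assumed):

```python
import numpy as np

def extrapolated_cost(k, n, m, delta):
    """C_m(delta | n): predicted cost if m trials were available,
    computed from per-bin counts k pooled over the n observed trials."""
    kbar = k.mean()
    v = np.mean((k - kbar) ** 2)
    c_n = (2.0 * kbar - v) / (n * delta) ** 2
    return (1.0 / m - 1.0 / n) * kbar / (n * delta ** 2) + c_n
```

By construction, setting `m = n` recovers the observed cost $C_n(\Delta)$, and larger `m` shrinks the sampling-noise contribution.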
13. Critical number of trials
Empirically, the inverse optimal bin width satisfies
$\frac{1}{\Delta_m^*} \propto \left(\frac{1}{\hat{n}_c} - \frac{1}{m}\right)$
This relation defines a critical number of trials $\hat{n}_c$:
- If $m < \hat{n}_c$, then $\Delta_m^*$ diverges.
- If $m > \hat{n}_c$, then $\Delta_m^*$ is finite.
Thus, $\hat{n}_c$ estimates the minimum number of repetitions required before time-dependent structure can be resolved.
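In practice, $\hat{n}_c$ can be read off by fitting a line to $1/\Delta_m^*$ against $1/m$ and locating its zero crossing. A sketch with made-up data points that obey the relation exactly (the helper name is hypothetical; NumPy assumed):

```python
import numpy as np

def critical_trials(ms, inv_deltas):
    """Fit 1/Delta*_m = a * (1/n_c - 1/m) and return the n_c estimate."""
    x = 1.0 / np.asarray(ms, dtype=float)    # 1/m
    y = np.asarray(inv_deltas, dtype=float)  # 1/Delta*_m
    slope, intercept = np.polyfit(x, y, 1)   # y = slope*x + intercept
    # Zero crossing at 1/m = -intercept/slope = 1/n_c.
    return -slope / intercept

# Made-up data obeying 1/Delta* = 4 * (1/10 - 1/m), i.e. true n_c = 10.
ms = np.array([20, 40, 80, 160])
inv_deltas = 4.0 * (1.0 / 10.0 - 1.0 / ms)
# critical_trials(ms, inv_deltas) -> approximately 10
```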
14. Theoretical interpretation using rate correlations
Let $\mu = \mathbb{E}[\lambda(t)]$ and define the autocorrelation of rate fluctuations as
$\phi(\tau) = \mathbb{E}[(\lambda(t)-\mu)(\lambda(t+\tau)-\mu)]$
The cost function can be written as
$C_n(\Delta) = \frac{\mu}{n\Delta} - \frac{1}{\Delta^2} \int_0^\Delta \int_0^\Delta \phi(t_1 - t_2)\,dt_1\,dt_2$
This expression clarifies the tradeoff:
- The term $\mu/n$ reflects sampling noise.
- The integral term reflects how temporal correlations are averaged out by binning.
15. Closed-form expression for the critical trial number
From the correlation structure,
$n_c = \frac{\mu}{\int_{-\infty}^{\infty} \phi(t)\,dt}$
This shows that stronger or longer-lasting rate fluctuations reduce the number of trials required to resolve time variation.
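As a numerical sanity check of this formula, take an exponential correlation $\phi(t) = \sigma^2 e^{-|t|/\tau}$, for which $\int \phi = 2\sigma^2\tau$ and hence $n_c = \mu/(2\sigma^2\tau)$. The parameter values below are made up for illustration:

```python
import numpy as np

mu, sigma2, tau = 30.0, 100.0, 0.05  # mean rate (Hz), fluctuation variance, timescale (s)

t = np.linspace(-5.0, 5.0, 200001)   # grid extending far beyond +-tau
dt = t[1] - t[0]
phi = sigma2 * np.exp(-np.abs(t) / tau)

n_c_numeric = mu / (np.sum(phi) * dt)    # numerical integral of phi
n_c_closed = mu / (2.0 * sigma2 * tau)   # closed form: 30 / 10 = 3
```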
16. Scaling laws for the optimal bin width
The behavior of $\Delta^*$ as $n$ increases depends on the smoothness of $\lambda(t)$.
Smooth rate functions
If $\phi(t)$ is smooth near $t=0$, then
$\Delta^* \propto n^{-1/3}$
Rates with sharp fluctuations
If $\phi(t)$ has a cusp at $t=0$, then
$\Delta^* \propto n^{-1/2}$
The scaling exponent therefore provides information about the temporal structure of the underlying rate.
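A sketch of where these exponents come from, using the identity $\int_0^\Delta \int_0^\Delta f(t_1 - t_2)\,dt_1\,dt_2 = \int_{-\Delta}^{\Delta} (\Delta - |u|) f(u)\,du$ and expanding $\phi$ near zero:
- Smooth case, $\phi(u) \approx \phi(0) + \tfrac{1}{2}\phi''(0)u^2$ with $\phi''(0) < 0$ at the maximum:
$C_n(\Delta) \approx \text{const} + \frac{\mu}{n\Delta} + \frac{|\phi''(0)|}{12}\Delta^2$, minimized at $\Delta^* = \left(\frac{6\mu}{|\phi''(0)|\,n}\right)^{1/3} \propto n^{-1/3}$.
- Cusp case, $\phi(u) \approx \phi(0) - c|u|$ with $c > 0$:
$C_n(\Delta) \approx \text{const} + \frac{\mu}{n\Delta} + \frac{c}{3}\Delta$, minimized at $\Delta^* = \left(\frac{3\mu}{c\,n}\right)^{1/2} \propto n^{-1/2}$.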
17. Extension to piecewise-linear histograms
Replacing the piecewise-constant histogram with a piecewise-linear estimator improves approximation accuracy.
For this estimator:
- Smooth rates yield
$\Delta^* \propto n^{-1/5}$
- Non-smooth rates yield
$\Delta^* \propto n^{-1/2}$
The same cost-minimization and extrapolation principles apply.
18. Conceptual summary
This framework provides:
- A fully data-driven rule for selecting histogram bin width.
- A diagnostic for determining whether time-resolved estimation is justified.
- A method for estimating how many additional trials are required.
- A link between bin-width scaling and temporal smoothness of the underlying rate.
