1. Standard Neural Networks

  • In a normal neural net, the weights $W$ are fixed after training.
  • Training finds the best point estimate (usually by minimizing loss with gradient descent).
  • Problem: this doesn’t capture uncertainty. If the data is limited or noisy, the model might be overconfident.
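The point-estimate idea can be made concrete with a minimal sketch (toy data and a single weight, chosen here purely for illustration): gradient descent on a loss returns one number, with no notion of how uncertain that number is.

```python
import numpy as np

# Toy data: y = 2x plus noise; training seeks a single point-estimate weight w.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 0.1 * rng.normal(size=50)

w = 0.0    # the point estimate, initialized at zero
lr = 0.1   # learning rate
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of the mean squared error
    w -= lr * grad

print(w)  # a single fixed number close to 2.0 -- no uncertainty attached
```

However noisy or scarce the data, this procedure always reports one weight with full confidence.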

2. Bayesian Perspective

In Bayesian statistics, parameters are treated as random variables with probability distributions.

For BNNs:

  • Instead of a fixed weight $W$, we assume a posterior distribution:
    • $p(W \mid D) \propto p(D \mid W) \, p(W)$
      • $D$: training data
      • $p(W)$: prior distribution over weights (e.g., Gaussian with mean 0, small variance)
      • $p(D \mid W)$: likelihood (how well the weights explain the data)
      • $p(W \mid D)$: posterior distribution of weights

This means after training, we don’t just get one set of weights—we get a whole distribution of possible weights.
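The ingredients of Bayes' rule above can be sketched numerically. This is a toy illustration (single weight, Gaussian prior and likelihood, made-up noise scales), not a real BNN: the unnormalized log posterior is just the log likelihood plus the log prior.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2.0 * x + 0.1 * rng.normal(size=30)

def log_prior(w, sigma_p=1.0):
    # log p(W): zero-mean Gaussian prior over the weight, up to a constant
    return -0.5 * w**2 / sigma_p**2

def log_likelihood(w, sigma=0.1):
    # log p(D | W): Gaussian observation noise around the model's prediction
    return -0.5 * np.sum((y - w * x) ** 2) / sigma**2

def log_posterior(w):
    # log p(W | D) = log p(D | W) + log p(W) + const
    return log_likelihood(w) + log_prior(w)

# The posterior concentrates near the data-generating weight w = 2
ws = np.linspace(0.0, 4.0, 401)
w_map = ws[np.argmax([log_posterior(w) for w in ws])]
print(w_map)
```

Evaluating the posterior on a grid only works because there is one weight; for a real network with millions of weights, this is exactly what becomes intractable (see §4).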


3. Prediction with BNNs

To make predictions, we integrate over all possible weights:

$p(y \mid x, D) = \int p(y \mid x, W) \, p(W \mid D) \, dW$

  • This accounts for both data uncertainty (noise in labels) and model uncertainty (not enough data to be sure about weights).
  • The output is not just a point prediction, but a distribution over predictions.
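In practice the integral is approximated by Monte Carlo: draw weight samples $W_s \sim p(W \mid D)$, predict with each, and summarize the resulting predictions. The sketch below assumes posterior samples are already available (they are faked with a Gaussian here purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)

# Assume we already have samples from the posterior p(W | D);
# here they are faked as Gaussian draws for illustration only.
w_samples = rng.normal(loc=2.0, scale=0.1, size=1000)

x_star = 3.0                  # a new input
preds = w_samples * x_star    # one prediction per posterior weight sample

mean = preds.mean()           # predictive mean
std = preds.std()             # spread = model (epistemic) uncertainty at x_star
print(mean, std)
```

The standard deviation of the sampled predictions is the uncertainty estimate a point-estimate network cannot provide.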

4. Why It’s Hard

The integral above is intractable: it has no closed form, and the weight space of a modern network is far too high-dimensional to integrate numerically.
So we use approximation methods:

  • Variational Inference (VI) → approximate the posterior with a simpler distribution $q(W)$.
  • Markov Chain Monte Carlo (MCMC) → sample weights from the posterior (accurate but slow).
  • Dropout as Approximate Bayesian Inference (Gal & Ghahramani, 2016) → using dropout at test time mimics sampling from a weight distribution.
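The dropout idea is the easiest of the three to sketch: keep dropout active at prediction time, so each forward pass samples a different sub-network, and the spread of repeated predictions acts as an uncertainty estimate. The network below is a toy with fixed, untrained weights, used only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical one-hidden-layer net with fixed (pretend "trained") weights.
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) / 32

def forward(x, p_drop=0.5):
    # Dropout stays ON at test time: each call samples a random sub-network,
    # which mimics drawing weights from an approximate posterior.
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) > p_drop  # random dropout mask
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return (h @ W2).item()

x = np.array([[0.5]])
preds = np.array([forward(x) for _ in range(200)])
print(preds.mean(), preds.std())  # predictive mean and uncertainty
```

A single forward pass gives one prediction; only the ensemble of stochastic passes reveals how much the sub-networks disagree.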

5. Advantages

  • Uncertainty estimation: model knows when it doesn’t know.
  • Better generalization: avoids overconfident wrong predictions.
  • Decision making: useful in medicine, self-driving cars, finance, etc., where uncertainty is critical.

6. Disadvantages

  • Computationally heavy (slower training and inference).
  • Approximation quality depends on chosen method (VI vs. MCMC vs. dropout).
  • Not as widely adopted in large-scale deep learning compared to standard nets.

Summary:
A Bayesian Neural Network treats weights not as fixed numbers but as probability distributions.
Instead of one “best” network, you get a whole family of networks weighted by probability.
Predictions come with uncertainty estimates, making BNNs safer and more reliable than standard neural nets.