An amazing fact
If we add just the right amount of Gaussian noise to the
weights after each update, and if we let the weight vector
wander around for long enough before we take a sample, we
will get a sample from the true posterior over weight vectors.
This is called a “Markov Chain Monte Carlo” (MCMC) method
and it makes it feasible to use full Bayesian learning
with hundreds or thousands of parameters.
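
The claim can be checked on a toy problem where the true posterior is known in closed form. The sketch below is illustrative only. It assumes the standard unadjusted Langevin update (half a step along the gradient of the log posterior, plus Gaussian noise whose variance equals the step size), which is one common way to make "just the right amount of noise" precise. The function name, the step size, the burn-in length and the one-parameter Gaussian model are all assumptions chosen for illustration, and with a finite step size the samples are only approximately from the posterior.

```python
import numpy as np

def langevin_samples(grad_log_post, w0, step, n_steps, burn_in, rng):
    """Noisy gradient ascent on the log posterior (unadjusted Langevin).

    Hypothetical helper: the noise variance is tied to the step size,
    which is the assumed meaning of "just the right amount" of noise.
    """
    w = np.array(w0, dtype=float)
    samples = []
    for t in range(n_steps):
        noise = rng.normal(size=w.shape)
        w = w + 0.5 * step * grad_log_post(w) + np.sqrt(step) * noise
        # Discard early iterations: the weights have not wandered long enough.
        if t >= burn_in:
            samples.append(w.copy())
    return np.array(samples)

# Toy model (assumed): y_i ~ N(w, 1) with prior w ~ N(0, 1).
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=20)
n = len(y)

def grad_log_post(w):
    # Gradient of -0.5 * sum((y - w)^2) - 0.5 * w^2 with respect to w.
    return np.sum(y - w) - w

# Closed-form posterior for this conjugate model, used only as a check.
post_mean = y.sum() / (n + 1)
post_var = 1.0 / (n + 1)

samples = langevin_samples(grad_log_post, w0=np.zeros(1), step=0.01,
                           n_steps=25000, burn_in=5000, rng=rng)

print("sample mean:", samples.mean(), "analytic mean:", post_mean)
print("sample var: ", samples.var(), "analytic var: ", post_var)
```

The sample mean and variance should land close to the analytic values. Shrinking the step size reduces the remaining bias but makes the chain wander more slowly, so it has to run for longer.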
There are related MCMC methods that are more
complicated but more efficient (we don’t need to let the
weights wander around for so long before we get
samples from the posterior).
Radford Neal (1995) showed that this works extremely
well when data is limited but the model needs to be
complicated.