• If we use just the right amount of Gaussian noise, and if we let the weight vector wander around for long enough before we take a sample, we will get a sample from the true posterior over weight vectors.
  – This is called a “Markov Chain Monte Carlo” method, and it makes it feasible to use full Bayesian learning with hundreds or thousands of parameters.
  – There are related MCMC methods that are more complicated but more efficient (we don’t need to let the weights wander around for so long before we get samples from the posterior).
• Radford Neal (1995) showed that this works extremely well when data is limited but the model needs to be complicated.
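The idea of letting a weight vector wander under gradient steps plus Gaussian noise can be illustrated with a minimal sketch. The slide does not spell out the update rule, so this assumes unadjusted Langevin dynamics, where the noise variance equals the step size. The toy one-weight linear model, the unit-variance prior and noise, and all names (`langevin_samples`, `grad_log_post`, `step`, `burn_in`) are illustrative assumptions, not something taken from the slide or from Neal's work. The toy model is chosen only because its exact posterior is known, so the samples can be checked against it.

```python
import numpy as np

def langevin_samples(grad_log_post, w0, step, n_steps, burn_in, rng):
    """Gradient ascent on the log posterior plus Gaussian noise (assumed scheme)."""
    w = np.array(w0, dtype=float)
    samples = []
    for t in range(n_steps):
        noise = rng.normal(size=w.shape)
        # Half a gradient step plus noise with variance equal to `step`.
        w = w + 0.5 * step * grad_log_post(w) + np.sqrt(step) * noise
        if t >= burn_in:  # discard the early part of the wander
            samples.append(w.copy())
    return np.array(samples)

# Toy data (assumption): y = w * x + unit-variance Gaussian noise, prior w ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.5 * x + rng.normal(size=20)

def grad_log_post(w):
    # Gradient of -0.5 * sum((y - w*x)^2) - 0.5 * w^2 with respect to w.
    return np.sum((y - w * x) * x) - w

# Exact Gaussian posterior for this toy model, used only for comparison.
post_var = 1.0 / (1.0 + np.sum(x ** 2))
post_mean = post_var * np.sum(x * y)

samples = langevin_samples(grad_log_post, w0=np.zeros(1), step=0.01,
                           n_steps=20000, burn_in=1000, rng=rng)
print("sample mean:", samples.mean(), "exact mean:", post_mean)
print("sample var: ", samples.var(), "exact var: ", post_var)
```

With a finite step size the samples are only approximately from the posterior. A smaller `step` reduces that bias but makes the weights wander more slowly, which is the cost the more efficient MCMC methods aim to reduce.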