• If the neural net has only a few parameters, we could put a grid over the parameter space and evaluate p(W | D) at each grid point.
  – This is expensive, but it does not involve any gradient descent and there are no local-optimum issues.
• After evaluating each grid point, we use all of them to make predictions on test data.
  – This is also expensive, but it works much better than maximum-likelihood (ML) learning when the posterior is vague or multimodal (this happens when data is scarce).
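The grid procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the lecture's own code: the "net" is a single logistic unit with two parameters, the prior is taken to be uniform over the grid points (so the posterior is proportional to the likelihood), and the names `grid_posterior`, `bayes_predict`, the toy data and the grid values are all hypothetical.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Toy 'network' (assumption): one logistic unit, weight w[0], bias w[1]."""
    return sigmoid(w[0] * x + w[1])


def log_likelihood(w, data):
    """Sum of log p(t | x, W) over the training cases, with binary targets t."""
    total = 0.0
    for x, t in data:
        p = predict(w, x)
        total += math.log(p) if t == 1 else math.log(1.0 - p)
    return total


def grid_posterior(data, values):
    """Evaluate p(W | D) at every grid point.

    Assumes a uniform prior over the grid, so the posterior is the
    likelihood renormalised to sum to 1 over the grid points.
    """
    grid = list(itertools.product(values, repeat=2))
    log_post = [log_likelihood(w, data) for w in grid]
    largest = max(log_post)  # subtract the max before exp for stability
    unnorm = [math.exp(lp - largest) for lp in log_post]
    z = sum(unnorm)
    return grid, [u / z for u in unnorm]


def bayes_predict(grid, post, x):
    """Average the predictions of all grid points, weighted by p(W | D)."""
    return sum(p * predict(w, x) for w, p in zip(grid, post))


# Hypothetical scarce training set: (input, binary target) pairs.
data = [(-1.0, 0), (0.5, 1), (2.0, 1)]
values = [-2.0 + 0.5 * i for i in range(9)]  # 9 values per parameter

grid, post = grid_posterior(data, values)
p_test = bayes_predict(grid, post, 1.0)
print(len(grid), round(p_test, 3))
```

No gradient is computed anywhere: every grid point is scored independently, and the test prediction uses all of them. The cost is the size of the grid, which is the number of values per parameter raised to the number of parameters, so this is only feasible for very small nets.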