lec9


Approximating full Bayesian learning in a neural network

• If the neural net has only a few parameters, we could put a grid over the parameter space and evaluate p( W | D ) at each grid-point.

  – This is expensive, but it does not involve any gradient descent and there are no local-optimum issues.

• After evaluating each grid point, we use all of them to make predictions on test data.

  – This is also expensive, but it works much better than maximum-likelihood (ML) learning when the posterior is vague or multimodal (this happens when data is scarce).
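
The two steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the lecture's own code: it assumes the "net" is a single logistic unit with parameters W = (w, b), an independent zero-mean Gaussian prior on each parameter, a grid covering [-5, 5] in both directions, and a four-point toy data set. The names grid_posterior, predict and prior_std are hypothetical.

```python
import numpy as np

def grid_posterior(x, t, grid_values, prior_std=3.0):
    """Evaluate p(W | D) at every grid point for W = (w, b).

    Assumed model: p(t=1 | x, W) = sigmoid(w*x + b), with a zero-mean
    Gaussian prior of standard deviation prior_std on w and on b.
    """
    w, b = np.meshgrid(grid_values, grid_values, indexing="ij")
    # Log prior at each grid point (constants dropped; they cancel below).
    log_prior = -(w**2 + b**2) / (2.0 * prior_std**2)
    # Logits for every grid point and every training case: shape (G, G, N).
    z = w[..., None] * x + b[..., None]
    # log sigmoid(z) = -logaddexp(0, -z); log(1 - sigmoid(z)) = -logaddexp(0, z).
    log_lik = np.sum(-t * np.logaddexp(0.0, -z) - (1 - t) * np.logaddexp(0.0, z),
                     axis=-1)
    log_post = log_prior + log_lik
    # Subtract the max before exponentiating, then normalise over the grid.
    post = np.exp(log_post - log_post.max())
    return w, b, post / post.sum()

def predict(x_new, w, b, post):
    """Average the prediction of every grid point, weighted by its posterior."""
    x_new = np.atleast_1d(x_new)
    z = w[..., None] * x_new + b[..., None]
    p = 1.0 / (1.0 + np.exp(-z))                # p(t=1 | x_new, W) per grid point
    return np.sum(post[..., None] * p, axis=(0, 1))

# Scarce toy data (assumed): negative inputs are class 0, positive are class 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
t = np.array([0.0, 0.0, 1.0, 1.0])
grid_values = np.linspace(-5.0, 5.0, 41)        # 41 x 41 = 1681 grid points

w, b, post = grid_posterior(x, t, grid_values)
preds = predict(np.array([-3.0, 0.0, 3.0]), w, b, post)
print(preds)
```

No gradient descent is involved: every grid point is scored once and all of them contribute to the prediction. The cost is the size of the grid, which grows exponentially with the number of parameters (41 values per parameter gives 41^k points for k parameters), so this is only practical for very small nets.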