Lecture 23

The flaws in the wake-sleep algorithm

• The recognition weights are trained to invert the generative model in parts of the space where there is no data.

  – This is wasteful.

• The recognition weights do not follow the gradient of the log probability of the data. Nor do they follow the gradient of a bound on this probability.

  – This leads to incorrect mode-averaging.

• The posterior over the top hidden layer is very far from independent because the independent prior cannot eliminate explaining-away effects.
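The last two points can be illustrated with a small numerical sketch. The model below is a hypothetical toy, not one taken from the lecture: two binary hidden causes with an independent prior and a single visible unit that is likely to be on if either cause is on. The names `P_H`, `P_V_ON`, `posterior_given_v_on` and `product_of_marginals`, and all the probabilities, are assumptions made for illustration. The sketch computes the exact posterior over the hidden causes when the visible unit is on, and compares it with the product of its marginals, which is the factorial distribution closest to the posterior in the KL(P||Q) direction and is roughly what a factorial recognition model would settle on when trained on samples from the generative model.

```python
import itertools

# Hypothetical toy model (assumed values, not from the lecture).
P_H = 0.1  # independent prior P(h_i = 1) for each hidden cause
# P(v = 1 | h1, h2): the visible unit is almost never on without a cause.
P_V_ON = {(0, 0): 0.001, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.9}


def prior(h):
    """Independent (factorial) prior over the hidden state h = (h1, h2)."""
    p = 1.0
    for bit in h:
        p *= P_H if bit else 1.0 - P_H
    return p


def posterior_given_v_on():
    """Exact posterior P(h1, h2 | v = 1) by enumerating all hidden states."""
    states = list(itertools.product([0, 1], repeat=2))
    joint = {h: prior(h) * P_V_ON[h] for h in states}
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}


def product_of_marginals(dist):
    """Factorial distribution with the same single-unit marginals as dist."""
    q1 = sum(p for h, p in dist.items() if h[0] == 1)
    q2 = sum(p for h, p in dist.items() if h[1] == 1)
    return {
        (a, b): (q1 if a else 1.0 - q1) * (q2 if b else 1.0 - q2)
        for (a, b) in dist
    }


post = posterior_given_v_on()
fact = product_of_marginals(post)

for h in sorted(post):
    print(h, f"true={post[h]:.3f}", f"factorial={fact[h]:.3f}")
```

In this toy the true posterior puts almost all of its mass on the two states with exactly one cause on, even though the prior is independent: once one cause explains the visible unit, the other becomes unlikely. The factorial approximation averages these two modes and so assigns substantial probability to "both causes on" and "neither cause on", which the true posterior considers unlikely.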