lec2b

The flaws in the wake-sleep algorithm

• The recognition weights are trained to invert the generative model in parts of the space where there is no data.
  – This is wasteful.

• The recognition weights follow the gradient of the wrong divergence. They minimize KL(P||Q) but the variational bound requires minimization of KL(Q||P).
  – This leads to incorrect mode-averaging.

• The posterior over the top hidden layer is very far from independent because the independent prior cannot eliminate explaining away effects.
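
The mode-averaging point can be illustrated with a toy sketch. This is not the wake-sleep algorithm itself: it assumes a one-dimensional bimodal P (standing in for a true posterior with two modes) and a single Gaussian Q (standing in for the recognition distribution), both discretized on a grid, and uses a brute-force grid search instead of gradient learning. The names `log_gaussian_on_grid`, `kl`, and `best_fit`, the grid ranges, and the mode positions are all made up for the illustration.

```python
import numpy as np

# Grid on which both distributions are discretized (an assumption of this toy).
x = np.linspace(-6.0, 6.0, 241)

def log_gaussian_on_grid(mean, std):
    """Log-probabilities of a Gaussian discretized and normalized on the grid x."""
    log_w = -0.5 * ((x - mean) / std) ** 2
    return log_w - np.log(np.sum(np.exp(log_w)))

def kl(log_a, log_b):
    """KL(a || b) for two grid distributions given as log-probabilities."""
    return float(np.sum(np.exp(log_a) * (log_a - log_b)))

# Bimodal P: an equal mixture of two narrow Gaussians at -2 and +2.
log_p = np.logaddexp(log_gaussian_on_grid(-2.0, 0.5),
                     log_gaussian_on_grid(2.0, 0.5)) - np.log(2.0)

# Candidate single-Gaussian Qs, searched exhaustively.
means = np.linspace(-3.0, 3.0, 61)
stds = np.linspace(0.3, 3.0, 28)

def best_fit(objective):
    """Return (divergence, mean, std) of the candidate Q minimizing objective."""
    scored = [(objective(log_gaussian_on_grid(m, s)), float(m), float(s))
              for m in means for s in stds]
    return min(scored)

# KL(P||Q): the divergence the slide says the recognition weights follow.
fwd = best_fit(lambda log_q: kl(log_p, log_q))
# KL(Q||P): the divergence the variational bound requires.
rev = best_fit(lambda log_q: kl(log_q, log_p))

print("min KL(P||Q): mean=%.2f std=%.2f" % (fwd[1], fwd[2]))
print("min KL(Q||P): mean=%.2f std=%.2f" % (rev[1], rev[2]))
```

Under these assumptions, the Q that minimizes KL(P||Q) is broad and centred between the two modes, where P has almost no mass, while the Q that minimizes KL(Q||P) is narrow and sits on one of the modes.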