The up-down algorithm:
A contrastive divergence version of wake-sleep
Replace the top layer of the DAG by an RBM
This eliminates bad variational approximations caused
by top-level units that are independent in the prior.
The RBM also gives the model an associative memory at the top.
Replace the ancestral pass in the sleep phase by a
top-down pass starting with the state of the RBM produced by
the wake phase.
This makes sure the recognition weights are trained in
the vicinity of the data.
It also reduces mode averaging. If the recognition
weights prefer one mode, they will stick with that mode
even if the generative weights like some other mode
just as much.
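
The two replacements above can be sketched in code. This is a minimal illustration rather than a reference implementation, and it assumes a three-layer net: visible units, one hidden layer, and a top-level RBM between the hidden layer and a top layer. The function name `up_down_step`, the weight names `R`, `G`, `W`, the learning rate, and the omission of biases are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    # Draw binary states from Bernoulli probabilities p.
    return (rng.random(p.shape) < p).astype(p.dtype)

def up_down_step(v, R, G, W, rng, lr=0.05, cd_steps=1):
    """One up-down update on a batch v of binary rows (hypothetical helper).

    R: recognition weights, shape (n_vis, n_hid), used bottom-up
    G: generative weights,  shape (n_hid, n_vis), used top-down
    W: top-level RBM weights, shape (n_hid, n_top), symmetric
    Biases are omitted to keep the sketch short.
    """
    n = v.shape[0]

    # Up pass (wake): infer hidden states with the recognition weights.
    h_wake = sample(sigmoid(v @ R), rng)

    # Top-level RBM: positive statistics, then a few Gibbs steps (CD).
    top = sample(sigmoid(h_wake @ W), rng)
    pos = h_wake.T @ top
    h_neg = h_wake
    for _ in range(cd_steps):
        h_neg = sample(sigmoid(top @ W.T), rng)
        top = sample(sigmoid(h_neg @ W), rng)
    neg = h_neg.T @ top

    # Down pass (sleep): start from the RBM state reached by CD,
    # not from an ancestral sample of the prior.
    v_sleep = sigmoid(h_neg @ G)

    # Wake-phase delta rule: generative weights learn to reconstruct
    # the data from the hidden states the recognition weights chose.
    v_pred = sigmoid(h_wake @ G)
    G_new = G + lr * h_wake.T @ (v - v_pred) / n

    # Sleep-phase delta rule: recognition weights learn to recover
    # the hidden states that generated the fantasy v_sleep.
    h_pred = sigmoid(v_sleep @ R)
    R_new = R + lr * v_sleep.T @ (h_neg - h_pred) / n

    # Contrastive divergence update for the top-level RBM.
    W_new = W + lr * (pos - neg) / n
    return R_new, G_new, W_new
```

Because `h_neg` comes from a short Gibbs chain started at the wake-phase state, the fantasies `v_sleep` stay close to the data, which is where the recognition weights need to work well.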