What the wake phase achieves
The bottom-up recognition weights are used to compute
a sample from the distribution Q over hidden
configurations. Q approximates the true posterior, P.
In each layer Q assumes the states are independent
given the states in the layer below, so Q is
factorial within a layer and ignores explaining away.
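
The bottom-up pass described above might be sketched as follows. This is a minimal illustration, assuming binary stochastic units with logistic activations; the function name `sample_recognition_pass` and the arguments `rec_weights` and `rec_biases` are hypothetical and not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_recognition_pass(data, rec_weights, rec_biases, rng):
    """Sample hidden states layer by layer, bottom-up, from a factorial Q.

    rec_weights[k] has shape (units in layer k, units in layer k+1).
    Returns [data, h1, h2, ...] with binary states in every hidden layer.
    """
    states = [np.asarray(data, dtype=float)]
    for W, b in zip(rec_weights, rec_biases):
        # Each unit's probability depends only on the layer below,
        # so units within a layer are sampled independently.
        probs = sigmoid(states[-1] @ W + b)
        states.append((rng.random(probs.shape) < probs).astype(float))
    return states
```

Because every unit is sampled independently given the layer below, this Q cannot represent the correlations between hidden causes that explaining away would induce in the true posterior.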
The changes to the generative weights are designed to
reduce the average cost (i.e. energy) of generating the
data when the hidden configurations are sampled from
the approximate posterior.
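
In symbols, the quantity being reduced can be written as below. The notation is an assumption made here for illustration: d is a data vector, h a hidden configuration, and theta the generative weights.

```latex
% Expected cost of generating d when h is sampled from Q
C(\theta) = \sum_{h} Q(h \mid d)\,\bigl[-\log p(h, d; \theta)\bigr]

% This is the only theta-dependent term in the variational bound
\log p(d; \theta) \;\ge\; \sum_{h} Q(h \mid d)\,\log p(h, d; \theta)
                  \;-\; \sum_{h} Q(h \mid d)\,\log Q(h \mid d)
```

Lowering C with Q held fixed therefore raises the bound on log p(d).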
The updates to the generative weights follow the
gradient of the variational bound with respect to the
parameters of the model.
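
One way to sketch such an update is the delta rule below, applied to hidden states already sampled from Q. It assumes binary logistic units and a single sampled configuration per data vector; the names `wake_update_generative`, `generation_cost`, `gen_weights`, and `gen_biases` are hypothetical, and the prior over the top layer is left out for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wake_update_generative(states, gen_weights, gen_biases, lr=0.1):
    """Update the generative parameters in place from one sampled configuration.

    states[k] is the binary state of layer k (states[0] is the data).
    gen_weights[k] has shape (units in layer k+1, units in layer k) and
    generates layer k from layer k+1. Top-layer biases are omitted here.
    """
    for k, (G, c) in enumerate(zip(gen_weights, gen_biases)):
        below, above = states[k], states[k + 1]
        pred = sigmoid(above @ G + c)      # top-down prediction of layer k
        err = below - pred                 # sampled state minus prediction
        G += lr * np.outer(above, err)     # gradient step on log p(below | above)
        c += lr * err

def generation_cost(states, gen_weights, gen_biases):
    """Negative log probability of generating each layer from the one above."""
    cost = 0.0
    for k, (G, c) in enumerate(zip(gen_weights, gen_biases)):
        below, above = states[k], states[k + 1]
        p = sigmoid(above @ G + c)
        cost -= np.sum(below * np.log(p) + (1.0 - below) * np.log(1.0 - p))
    return cost
```

With a small enough learning rate, calling `wake_update_generative` and then re-evaluating `generation_cost` on the same sampled states should give a lower cost, which is the sense in which the update follows the gradient of the bound.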