• The bottom-up recognition weights are used to compute a sample from the distribution Q over hidden configurations. Q approximates the true posterior, P.
  – In each layer, Q assumes the states are independent given the states in the layer below. It ignores explaining away.
• The changes to the generative weights are designed to reduce the average cost (i.e. energy) of generating the data when the hidden configurations are sampled from the approximate posterior.
  – The updates to the generative weights follow the gradient of the variational bound with respect to the parameters of the model.
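The two steps above can be sketched for a toy network with one binary hidden layer and one binary visible layer. This is a minimal illustration under stated assumptions, not the slides' own implementation: the names `sample_hidden`, `generation_cost` and `wake_update`, the layer sizes, and the learning rate are all hypothetical, and the cost shown covers only the visible units given the sampled hidden states (the cost of the hidden states under the generative prior is left out for brevity).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, rec_w, rec_b, rng):
    """Draw h ~ Q(h | v) using the recognition weights.

    Each hidden unit is sampled independently given the layer below,
    so Q is factorial and cannot represent explaining away.
    """
    h = []
    for j in range(len(rec_b)):
        total = rec_b[j] + sum(rec_w[j][i] * v[i] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(total) else 0)
    return h


def generation_cost(v, h, gen_w, gen_b):
    """Cost of generating v from h, i.e. -log P(v | h) (assumed simplification)."""
    cost = 0.0
    for i in range(len(v)):
        p = sigmoid(gen_b[i] + sum(gen_w[i][j] * h[j] for j in range(len(h))))
        cost -= math.log(p) if v[i] == 1 else math.log(1.0 - p)
    return cost


def wake_update(v, h, gen_w, gen_b, lr=0.1):
    """One gradient step on the generative weights, in place.

    The step is the delta rule: (observed - predicted) * presynaptic state.
    """
    for i in range(len(v)):
        p = sigmoid(gen_b[i] + sum(gen_w[i][j] * h[j] for j in range(len(h))))
        err = v[i] - p
        gen_b[i] += lr * err
        for j in range(len(h)):
            gen_w[i][j] += lr * err * h[j]


# Hypothetical toy setup: 4 visible units, 3 hidden units.
rng = random.Random(0)
v = [1, 0, 1, 1]
rec_w = [[rng.gauss(0.0, 0.1) for _ in v] for _ in range(3)]
rec_b = [0.0, 0.0, 0.0]
gen_w = [[0.0, 0.0, 0.0] for _ in v]
gen_b = [0.0, 0.0, 0.0, 0.0]

h = sample_hidden(v, rec_w, rec_b, rng)          # bottom-up sample from Q
before = generation_cost(v, h, gen_w, gen_b)
wake_update(v, h, gen_w, gen_b)                  # adjust generative weights
after = generation_cost(v, h, gen_w, gen_b)
```

With a small learning rate, `after` is lower than `before` for the sampled configuration, which is the sense in which the update reduces the cost of generating the data. The recognition weights are left untouched here; this sketch covers only the updates described in the bullets above.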