• The recognition weights are trained to invert the generative model in parts of the space where there is no data.
  – This is wasteful.
• The recognition weights follow the gradient of the wrong divergence. They minimize KL(P||Q) but the variational bound requires minimization of KL(Q||P).
  – This leads to incorrect mode-averaging.
• The posterior over the top hidden layer is very far from independent because the independent prior cannot eliminate explaining away effects.
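
The mode-averaging point can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: P is a toy bimodal "posterior" on a 1-D grid, and Q is restricted to a single bump of fixed width, standing in for a recognition distribution that cannot represent two modes. The helper names (`normal_pmf`, `kl`) are hypothetical.

```python
import numpy as np

# 1-D grid on which both distributions are discretized (illustrative choice).
x = np.linspace(-8.0, 8.0, 801)

def normal_pmf(mean, sd):
    """Gaussian bump evaluated on the grid and normalized to sum to 1."""
    w = np.exp(-0.5 * ((x - mean) / sd) ** 2)
    return w / w.sum()

def kl(a, b):
    """Discrete KL(a||b) for strictly positive probability vectors."""
    return float(np.sum(a * (np.log(a) - np.log(b))))

# Toy bimodal target P: two well-separated modes at -3 and +3.
p = 0.5 * normal_pmf(-3.0, 0.5) + 0.5 * normal_pmf(3.0, 0.5)

# Unimodal family Q(mean) with fixed width; search over the mean only.
means = np.linspace(-4.0, 4.0, 81)
forward = [kl(p, normal_pmf(m, 1.0)) for m in means]  # KL(P||Q)
reverse = [kl(normal_pmf(m, 1.0), p) for m in means]  # KL(Q||P)

mu_forward = float(means[int(np.argmin(forward))])
mu_reverse = float(means[int(np.argmin(reverse))])

print("mean minimizing KL(P||Q):", mu_forward)
print("mean minimizing KL(Q||P):", mu_reverse)
```

In this toy setting, minimizing KL(P||Q) places Q between the two modes, where P has almost no mass, while minimizing KL(Q||P) places Q on one of the modes. Because P is symmetric, which of the two modes is reported is arbitrary.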
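
Explaining away can be seen in a minimal sketch with two hidden causes and one visible unit. The model below is an assumption chosen for illustration, not the network from the slides: two binary causes with independent priors and a noisy-OR likelihood, with made-up constants `PRIOR_ON`, `LEAK` and `FAIL`. It computes the exact posterior by enumeration and compares it with the product of its marginals.

```python
from itertools import product

# Illustrative constants (assumptions, not taken from the slides).
PRIOR_ON = 0.1   # independent prior probability that each cause is on
LEAK = 0.001     # probability that v turns on with no active cause
FAIL = 0.1       # probability that an active cause fails to turn v on

def prior(h):
    """Independent Bernoulli prior on a single hidden cause."""
    return PRIOR_ON if h else 1.0 - PRIOR_ON

def p_v_on(h1, h2):
    """Noisy-OR likelihood P(v=1 | h1, h2)."""
    return 1.0 - (1.0 - LEAK) * FAIL ** (h1 + h2)

# Exact posterior P(h1, h2 | v=1) by enumerating the four hidden states.
joint = {(h1, h2): prior(h1) * prior(h2) * p_v_on(h1, h2)
         for h1, h2 in product((0, 1), repeat=2)}
z = sum(joint.values())
posterior = {h: j / z for h, j in joint.items()}

# Posterior marginals and one conditional.
p_h1 = posterior[(1, 0)] + posterior[(1, 1)]
p_h2 = posterior[(0, 1)] + posterior[(1, 1)]
p_h1_given_h2 = posterior[(1, 1)] / p_h2

print("P(h1=1 | v=1)       =", round(p_h1, 4))
print("P(h1=1 | v=1, h2=1) =", round(p_h1_given_h2, 4))
print("P(h1=1, h2=1 | v=1) =", round(posterior[(1, 1)], 4))
print("product of marginals =", round(p_h1 * p_h2, 4))
```

Although the prior is independent, observing v=1 couples the causes: once h2 is known to be on, h1 becomes much less probable, so the posterior is far from the product of its marginals. A factorial recognition distribution cannot represent this dependence.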