• The recognition weights are trained to invert the generative model in parts of the space where there is no data.
  – This is wasteful.
• The recognition weights do not follow the gradient of the log probability of the data. Nor do they follow the gradient of a bound on this probability.
  – This leads to incorrect mode-averaging.
• The posterior over the top hidden layer is very far from independent because the independent prior cannot eliminate explaining away effects.
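The first point concerns the sleep phase, in which the recognition weights are trained only on fantasies sampled from the generative model. The sketch below is a minimal, hypothetical pure-Python version of one sleep-phase update for a net with a single hidden layer. The names (`sleep_step`, `gen_W`, `rec_W`), the toy training set of two patterns and the all-zero initial model are assumptions made for illustration. The sketch counts how many updates are spent on visible vectors that never occur in the data.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli(p: float, rng: random.Random) -> int:
    return 1 if rng.random() < p else 0


def sleep_step(gen_bias_h, gen_W, gen_bias_v, rec_W, rec_bias_h, lr, rng):
    """One sleep-phase update (hypothetical layout, not a reference implementation).

    gen_W[j][i]: generative weight from hidden unit j to visible unit i.
    rec_W[i][j]: recognition weight from visible unit i to hidden unit j.
    Returns the fantasy visible vector that the recognition weights were trained on.
    """
    n_h, n_v = len(gen_bias_h), len(gen_bias_v)
    # 1. Dream: sample hidden causes from the generative prior...
    h = [bernoulli(sigmoid(b), rng) for b in gen_bias_h]
    # ...then a fantasy visible vector from the generative weights.
    v = [
        bernoulli(sigmoid(gen_bias_v[i] + sum(h[j] * gen_W[j][i] for j in range(n_h))), rng)
        for i in range(n_v)
    ]
    # 2. Recognition model's probabilities for the hidden units given the fantasy.
    q = [sigmoid(rec_bias_h[j] + sum(v[i] * rec_W[i][j] for i in range(n_v))) for j in range(n_h)]
    # 3. Delta rule: push q towards the hidden states that actually generated v.
    for j in range(n_h):
        err = h[j] - q[j]
        rec_bias_h[j] += lr * err
        for i in range(n_v):
            rec_W[i][j] += lr * err * v[i]
    return tuple(v)


# Hypothetical toy setting: 4 visible units, 2 hidden units, two training patterns.
DATA = {(1, 1, 0, 0), (0, 0, 1, 1)}
N_V, N_H = 4, 2
rng = random.Random(0)

# Untrained generative model: all weights and biases zero, so each fantasy pixel is a fair coin.
gen_bias_h = [0.0] * N_H
gen_W = [[0.0] * N_V for _ in range(N_H)]
gen_bias_v = [0.0] * N_V
rec_W = [[0.0] * N_H for _ in range(N_V)]
rec_bias_h = [0.0] * N_H

n_steps = 4000
off_data = sum(
    sleep_step(gen_bias_h, gen_W, gen_bias_v, rec_W, rec_bias_h, 0.05, rng) not in DATA
    for _ in range(n_steps)
)
frac_off_data = off_data / n_steps
print(f"sleep updates spent on vectors absent from the data: {frac_off_data:.1%}")
```

Because this untrained model produces all 16 visible patterns equally often, roughly 14 out of 16 sleep updates go to vectors that never appear in the data. The wake phase, which trains the generative weights, is omitted from the sketch.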
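The explaining-away point can be checked by enumeration on a tiny net. The sketch assumes a hypothetical net with two independent binary causes. Each cause has bias -10, and both connect with weight +20 to a single visible unit whose bias is -20. Once the visible unit is observed to be on, the exact posterior concentrates on (1, 0) and (0, 1). The two causes therefore become strongly anti-correlated, even though the prior treats them as independent.

```python
import itertools
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters for a two-cause, one-effect sigmoid belief net.
HIDDEN_BIAS = -10.0   # each cause is rare a priori
WEIGHT = 20.0         # either cause strongly turns the effect on
VISIBLE_BIAS = -20.0  # the effect almost never happens on its own


def exact_posterior(v: int = 1) -> dict:
    """Exact p(h1, h2 | v) by enumerating all four hidden configurations."""
    p_on = sigmoid(HIDDEN_BIAS)  # independent (factorial) prior over the causes
    joint = {}
    for h1, h2 in itertools.product((0, 1), repeat=2):
        prior = (p_on if h1 else 1 - p_on) * (p_on if h2 else 1 - p_on)
        p_v_on = sigmoid(VISIBLE_BIAS + WEIGHT * (h1 + h2))
        joint[(h1, h2)] = prior * (p_v_on if v == 1 else 1 - p_v_on)
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}


post = exact_posterior(v=1)
for h, p in sorted(post.items()):
    print(h, f"{p:.5f}")

# A factorial posterior would assign (1, 1) the product of the marginals.
m1 = post[(1, 0)] + post[(1, 1)]
m2 = post[(0, 1)] + post[(1, 1)]
print(f"true p(1,1|v) = {post[(1, 1)]:.5f}, product of marginals = {m1 * m2:.5f}")
```

The product of the posterior marginals gives (1, 1) a probability of about 0.25, while its true posterior probability is well below 0.001, so no independent distribution over the two causes can represent this posterior.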
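For the mode-averaging point, the sketch below uses a hypothetical two-cause net: cause biases -10, weights +20 and a visible bias of -20. It compares two factorial recognition distributions for the visible state v = 1. The first, q_avg, matches the posterior marginals (0.5, 0.5). This is what the sleep phase learns here, because half of the fantasies with the visible unit on are caused by (1, 0) and half by (0, 1). The second, q_mode, commits to the (1, 0) mode. In expectation over fantasies, the sleep-phase delta rule minimizes KL(p‖q), which favours q_avg. The variational bound log p(v) − KL(q‖p) instead favours q_mode, because q_avg puts half its mass on the very improbable configurations (0, 0) and (1, 1). The value 0.99 used for q_mode is an arbitrary choice.

```python
import itertools
import math

CONFIGS = list(itertools.product((0, 1), repeat=2))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def exact_posterior() -> dict:
    """p(h1, h2 | v=1) for the hypothetical net (biases -10, weights +20, visible bias -20)."""
    p_on = sigmoid(-10.0)
    joint = {}
    for h1, h2 in CONFIGS:
        prior = (p_on if h1 else 1 - p_on) * (p_on if h2 else 1 - p_on)
        joint[(h1, h2)] = prior * sigmoid(-20.0 + 20.0 * (h1 + h2))
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}


def factorial(q1: float, q2: float) -> dict:
    """Factorial q(h1) q(h2), as produced by independent logistic recognition units."""
    return {(h1, h2): (q1 if h1 else 1 - q1) * (q2 if h2 else 1 - q2) for h1, h2 in CONFIGS}


def kl(a: dict, b: dict) -> float:
    """KL(a || b) in nats; assumes b > 0 wherever a > 0."""
    return sum(a[h] * math.log(a[h] / b[h]) for h in CONFIGS if a[h] > 0)


post = exact_posterior()
q_avg = factorial(0.5, 0.5)    # matches the posterior marginals
q_mode = factorial(0.99, 0.01)  # commits to the (1, 0) mode

kl_p_avg, kl_p_mode = kl(post, q_avg), kl(post, q_mode)
kl_avg_p, kl_mode_p = kl(q_avg, post), kl(q_mode, post)
print(f"KL(p||q), sleep-phase objective: avg={kl_p_avg:.3f}  mode={kl_p_mode:.3f}")
print(f"KL(q||p), bound gap:             avg={kl_avg_p:.3f}  mode={kl_mode_p:.3f}")
```

Each objective prefers a different q. That mismatch is why the recognition weights learned in the sleep phase need not improve the bound on the log probability of the data.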