• The recognition weights are trained to invert the generative model in parts of the space where there is no data.
  – This is wasteful.
• The recognition weights follow the gradient of the wrong divergence. They minimize KL(P||Q), but the variational bound requires minimization of KL(Q||P).
  – This leads to incorrect mode-averaging.
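The mode-averaging effect can be illustrated with a small numerical sketch. It is not taken from the slides: the bimodal P, the fixed-width Gaussian family for Q, the grid, and all names (`normal_pdf`, `discretize`, `kl`, `mu_forward`, `mu_reverse`) are assumptions chosen for illustration. P stands in for a posterior with two modes, and Q is a single Gaussian whose mean is picked by brute-force search under each divergence.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Gaussian density evaluated pointwise on the array x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def discretize(density):
    """Turn density values on a grid into a discrete distribution summing to 1."""
    return density / density.sum()


def kl(a, b):
    """KL(a||b) for two discrete distributions defined on the same grid."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


# Assumed toy posterior P: two well-separated modes at -3 and +3.
x = np.linspace(-8.0, 8.0, 1601)
p = discretize(0.5 * normal_pdf(x, -3.0, 0.5) + 0.5 * normal_pdf(x, 3.0, 0.5))

# Assumed approximating family Q: one Gaussian with fixed width 0.5,
# so only the mean is free.
candidate_means = np.linspace(-5.0, 5.0, 101)
q_family = [discretize(normal_pdf(x, m, 0.5)) for m in candidate_means]

# Evaluate both divergences for every candidate mean.
kl_pq = np.array([kl(p, q) for q in q_family])  # KL(P||Q)
kl_qp = np.array([kl(q, p) for q in q_family])  # KL(Q||P)

mu_forward = float(candidate_means[np.argmin(kl_pq)])
mu_reverse = float(candidate_means[np.argmin(kl_qp)])

print("mean minimizing KL(P||Q):", mu_forward)
print("mean minimizing KL(Q||P):", mu_reverse)
```

In this toy setting, minimizing KL(P||Q) places Q between the two modes, where P has almost no mass, while minimizing KL(Q||P) places Q on one of the modes. The brute-force search over means is only a stand-in for gradient-based learning of the recognition weights.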