• What if we use an approximation to the posterior distribution over hidden configurations?
  – e.g. assume the posterior factorizes into a product of distributions for each separate hidden cause.
• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.
• But maybe we can find a different and sensible objective function that is guaranteed to improve at each update.
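One standard choice of such an objective is the variational free energy: for any factorized posterior q(h), the quantity F(q) = E_q[log p(v, h)] + H(q) is a lower bound on log p(v), so raising F can only tighten the bound. The numeric sketch below is not from the slides; the toy joint distribution over two binary hidden causes and all of its numbers are invented purely for illustration.

```python
import numpy as np

# Hypothetical joint p(v, h1, h2) for one fixed observation v, with two
# binary hidden causes. Values are made up; they sum to p(v) = 0.70.
p_joint = np.array([[0.30, 0.05],
                    [0.10, 0.25]])  # rows: h1 in {0,1}, cols: h2 in {0,1}

# Exact log p(v) -- tractable only because this toy model is tiny.
log_p_v = np.log(p_joint.sum())

def free_energy(q1, q2):
    """Variational bound F(q) = E_q[log p(v,h)] + H(q) for a factorized
    posterior q(h1, h2) = q1(h1) * q2(h2), with q1 = q(h1=1) etc."""
    q = np.outer([1.0 - q1, q1], [1.0 - q2, q2])      # full q(h1, h2)
    return np.sum(q * np.log(p_joint)) - np.sum(q * np.log(q))

# The bound never exceeds the true log-likelihood, whatever q we pick:
rng = np.random.default_rng(0)
for _ in range(5):
    q1, q2 = rng.uniform(0.05, 0.95, size=2)
    assert free_energy(q1, q2) <= log_p_v + 1e-9

# Searching over factorized posteriors improves F monotonically toward
# (but never past) log p(v); the gap is KL(q || p(h|v)).
best = max(free_energy(a, b)
           for a in np.linspace(0.01, 0.99, 99)
           for b in np.linspace(0.01, 0.99, 99))
print(f"log p(v) = {log_p_v:.4f}, best factorized bound = {best:.4f}")
```

Because F differs from log p(v) by exactly KL(q || p(h|v)) ≥ 0, any update that increases F is "sensible" in the slide's sense: it is guaranteed to improve a well-defined objective even when the true posterior does not factorize.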