












• What if we use an approximation to the posterior distribution over hidden configurations?




– e.g. assume the posterior factorizes into a product of distributions for each separate hidden cause.



• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.



• But maybe we can find a different and sensible objective function that is guaranteed to improve at each update.
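
A minimal sketch of one candidate objective. It assumes the standard variational lower bound, which these bullets do not name: the expected log joint under the approximation Q plus the entropy of Q. The toy model, its numbers, and the names `joint`, `factorized_q`, and `lower_bound` are all hypothetical. The sketch uses a factorized Q over two binary hidden causes and checks that the bound never exceeds the log probability of the observed data.

```python
import itertools
import math

# Hypothetical toy model: two binary hidden causes h = (h1, h2), one fixed observation v.
# joint[h] holds p(v, h) for that observation; the numbers are made up for illustration.
joint = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.20}


def log_evidence(joint):
    """log p(v), obtained by summing the joint over all hidden configurations."""
    return math.log(sum(joint.values()))


def factorized_q(q1, q2):
    """Q(h) = Q1(h1) * Q2(h2), where q1 = Q1(h1=1) and q2 = Q2(h2=1)."""
    return {
        (h1, h2): (q1 if h1 else 1 - q1) * (q2 if h2 else 1 - q2)
        for h1, h2 in itertools.product((0, 1), repeat=2)
    }


def lower_bound(joint, q):
    """E_Q[log p(v, h)] - E_Q[log Q(h)], skipping zero-probability terms."""
    return sum(
        qh * (math.log(joint[h]) - math.log(qh))
        for h, qh in q.items()
        if qh > 0
    )


log_pv = log_evidence(joint)
bound = lower_bound(joint, factorized_q(0.4, 0.4))
# The gap is the KL divergence from Q to the true posterior, so it is never negative.
gap = log_pv - bound
```

In this toy model the true posterior is correlated across the two hidden causes, so no factorized Q closes the gap completely. Plugging in the exact posterior makes the bound equal to log p(v).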

