•What if we use an
approximation to the posterior distribution
over hidden configurations?
–e.g. assume
the posterior factorizes into a product of distributions for each separate hidden cause.
•
•If we use the
approximation for learning, there is no guarantee that learning will increase the probability
that the model would
generate the observed data.
•
•But maybe we can find
a different and sensible objective function
that is guaranteed to improve at each update.