Approximate inference
What if we use an approximation to the posterior
distribution over hidden configurations?
e.g. assume the posterior factorizes into a product of
distributions, one for each separate hidden cause.
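In symbols (notation assumed here, not given in the slide): writing $v$ for the observed data and $h = (h_1, \dots, h_n)$ for a configuration of the hidden causes, this factorized ("mean-field") approximation to the true posterior $p(h \mid v)$ is

\[
q(h \mid v) \;=\; \prod_i q_i(h_i \mid v),
\]

i.e. each hidden cause gets its own independent distribution given the data, ignoring the dependencies between causes that the true posterior generally has.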
If we use this approximation during learning, there is no
guarantee that each update will increase the probability that
the model would generate the observed data.
But maybe we can find a different, sensible objective
function that is guaranteed to improve at each update.
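A hedged sketch of such an objective (the slide does not name it at this point; the standard choice is the variational lower bound, equivalently a negative variational free energy): for any approximating distribution $q$ and model parameters $\theta$,

\[
\log p(v)
\;=\;
\underbrace{\mathbb{E}_{q(h \mid v)}\!\big[\log p(v, h)\big]
\;-\;
\mathbb{E}_{q(h \mid v)}\!\big[\log q(h \mid v)\big]}_{\mathcal{L}(q,\,\theta)}
\;+\;
\mathrm{KL}\!\big(q(h \mid v)\,\big\|\,p(h \mid v)\big).
\]

Since the KL term is never negative, $\mathcal{L}(q, \theta)$ is a lower bound on the log probability of the data. Updates that increase $\mathcal{L}$, whether to the parameters $\theta$ or to the approximation $q$, are therefore guaranteed to improve this bound, even though they may not increase $\log p(v)$ itself.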