Approximate inference
•What if we use an approximation to the posterior distribution over hidden configurations?
–e.g. assume the posterior factorizes into a product of independent distributions, one for each hidden cause.
•If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.
•But maybe we can find a different and sensible objective function that is guaranteed to improve at each update.
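One objective with the property the last bullet asks for is the variational lower bound on log p(v), which is the negative of the variational free energy: E_q[log p(v,h)] + H(q). By Jensen's inequality it never exceeds log p(v), and it equals log p(v) only when q is the true posterior. Below is a minimal sketch, assuming a hypothetical toy model with three binary hidden units whose log-joint for one fixed visible vector is quadratic (Boltzmann-style). The parameters C, B, W and all function names are illustrative, not from the source. It builds the factorized approximation from the second bullet and runs mean-field coordinate ascent. It then checks, by brute-force enumeration over all hidden configurations, that the bound stays below log p(v) and never decreases.

```python
import itertools
import math

# Hypothetical toy model: for one fixed visible vector v,
# log p(v, h) = C + sum_i B[i]*h[i] + sum_{i<j} W[i][j]*h[i]*h[j]
# over binary hidden units h. W is symmetric with a zero diagonal.
C = -2.0
B = [0.5, -1.0, 0.3]
W = [[0.0, 1.2, -0.7],
     [1.2, 0.0, 0.4],
     [-0.7, 0.4, 0.0]]
N = len(B)


def log_joint(h):
    """Log p(v, h) for one hidden configuration h."""
    s = C + sum(B[i] * h[i] for i in range(N))
    s += sum(W[i][j] * h[i] * h[j] for i in range(N) for j in range(i + 1, N))
    return s


def configs():
    """All 2**N binary hidden configurations."""
    return itertools.product((0, 1), repeat=N)


def log_evidence():
    """Exact log p(v) = log sum_h p(v, h), by enumeration (log-sum-exp)."""
    vals = [log_joint(h) for h in configs()]
    m = max(vals)
    return m + math.log(sum(math.exp(x - m) for x in vals))


def q_prob(q, h):
    """Factorized q(h) = prod_i q_i^h_i * (1 - q_i)^(1 - h_i)."""
    p = 1.0
    for qi, hi in zip(q, h):
        p *= qi if hi else 1.0 - qi
    return p


def lower_bound(q):
    """E_q[log p(v, h)] + H(q), computed exactly by enumeration."""
    total = 0.0
    for h in configs():
        p = q_prob(q, h)
        if p > 0.0:
            total += p * (log_joint(h) - math.log(p))
    return total


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(q, sweeps=10):
    """Coordinate ascent: each update sets q_i to the exact maximizer
    of the bound with the other q_j held fixed, so the bound cannot drop."""
    q = list(q)
    history = [lower_bound(q)]
    for _ in range(sweeps):
        for i in range(N):
            q[i] = sigmoid(B[i] + sum(W[i][j] * q[j] for j in range(N) if j != i))
            history.append(lower_bound(q))
    return q, history


q, history = mean_field([0.5] * N)
print("log p(v)    =", round(log_evidence(), 4))
print("final bound =", round(history[-1], 4))
```

The sketch only updates q. In a learning setting the same bound would also be raised with respect to the model parameters (here C, B, W), which is what makes it usable as a training objective in place of log p(v) itself.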