Learning Energy-Based Models of High-Dimensional Data

Approximate inference

• What if we use an approximation to the posterior distribution over hidden configurations?

– e.g. assume the posterior factorizes into a product of distributions, one for each separate hidden cause.

• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.

• But maybe we can find a different and sensible objective function that is guaranteed to improve at each update.
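
One candidate for such an objective is the standard variational lower bound, F(Q) = E_Q[log p(v, h)] + H(Q) = log p(v) - KL(Q || p(h | v)). The slide does not name it, so treat this as an assumption about what is meant. The sketch below uses a hypothetical toy joint table (`JOINT`) over two binary hidden causes for one fixed observation, a factorized Q(h) = Q1(h1) Q2(h2), and coordinate-wise mean-field updates. It only illustrates the inference updates; parameter learning is not shown.

```python
import math

# Hypothetical joint p(v, h1, h2) for one fixed observation v.
# The four entries sum to p(v); the numbers are made up for illustration.
JOINT = [[0.10, 0.05],
         [0.02, 0.23]]
LOG_PV = math.log(sum(sum(row) for row in JOINT))


def bernoulli(q):
    """Return [P(h=0), P(h=1)] for a binary variable with P(h=1) = q."""
    return [1.0 - q, q]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bound(q1, q2):
    """E_Q[log p(v, h)] + H(Q) for the factorized Q(h) = Q1(h1) Q2(h2)."""
    total = 0.0
    for h1, w1 in enumerate(bernoulli(q1)):
        for h2, w2 in enumerate(bernoulli(q2)):
            w = w1 * w2
            if w > 0.0:
                total += w * (math.log(JOINT[h1][h2]) - math.log(w))
    return total


def update_q1(q2):
    """Mean-field update: Q1(h1) is proportional to exp(E_Q2[log p(v, h1, h2)])."""
    w = bernoulli(q2)
    log_odds = sum(
        w[h2] * (math.log(JOINT[1][h2]) - math.log(JOINT[0][h2]))
        for h2 in range(2)
    )
    return sigmoid(log_odds)


def update_q2(q1):
    """Same update for the second factor, holding Q1 fixed."""
    w = bernoulli(q1)
    log_odds = sum(
        w[h1] * (math.log(JOINT[h1][1]) - math.log(JOINT[h1][0]))
        for h1 in range(2)
    )
    return sigmoid(log_odds)


def run(steps=10, q1=0.5, q2=0.5):
    """Alternate the two updates and record the bound after each one."""
    history = [bound(q1, q2)]
    for _ in range(steps):
        q1 = update_q1(q2)
        history.append(bound(q1, q2))
        q2 = update_q2(q1)
        history.append(bound(q1, q2))
    return history


history = run()
```

Each update maximizes the bound with respect to one factor while the other is held fixed, so the values in `history` never decrease, and they never exceed `LOG_PV`. The gap that remains is the KL divergence between the factorized Q and the true posterior.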