• What if we use an approximation to the posterior distribution over hidden configurations?
  – e.g. assume the posterior factorizes into a product of distributions for each separate hidden cause.
• If we use the approximation for learning, there is no guarantee that learning will increase the probability that the model would generate the observed data.
• But maybe we can find a different and sensible objective function that is guaranteed to improve at each update.
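One standard choice of such an objective is the variational free energy: for any factorized posterior q(h), the quantity F(q) = E_q[log p(v, h)] + H(q) is a lower bound on log p(v), so raising F can only tighten the bound. The numeric sketch below is not from the slides; the toy joint distribution over two binary hidden causes and all of its numbers are invented purely for illustration.

```python
import numpy as np

# Hypothetical joint p(v, h1, h2) for one fixed observation v, with two
# binary hidden causes. Values are made up; they sum to p(v) = 0.70.
p_joint = np.array([[0.30, 0.05],
                    [0.10, 0.25]])  # rows: h1 in {0,1}, cols: h2 in {0,1}

# Exact log p(v) -- tractable only because this toy model is tiny.
log_p_v = np.log(p_joint.sum())

def free_energy(q1, q2):
    """Variational bound F(q) = E_q[log p(v,h)] + H(q) for a factorized
    posterior q(h1, h2) = q1(h1) * q2(h2), with q1 = q(h1=1) etc."""
    q = np.outer([1.0 - q1, q1], [1.0 - q2, q2])      # full q(h1, h2)
    return np.sum(q * np.log(p_joint)) - np.sum(q * np.log(q))

# The bound never exceeds the true log-likelihood, whatever q we pick:
rng = np.random.default_rng(0)
for _ in range(5):
    q1, q2 = rng.uniform(0.05, 0.95, size=2)
    assert free_energy(q1, q2) <= log_p_v + 1e-9

# Searching over factorized posteriors improves F monotonically toward
# (but never past) log p(v); the gap is KL(q || p(h|v)).
best = max(free_energy(a, b)
           for a in np.linspace(0.01, 0.99, 99)
           for b in np.linspace(0.01, 0.99, 99))
print(f"log p(v) = {log_p_v:.4f}, best factorized bound = {best:.4f}")
```

Because F differs from log p(v) by exactly KL(q || p(h|v)) ≥ 0, any update that increases F is "sensible" in the slide's sense: it is guaranteed to improve a well-defined objective even when the true posterior does not factorize.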