Learning Energy-Based Models of High-Dimensional Data

Intuitive motivation

• It is silly to run the Markov chain all the way to equilibrium if we can get the information required for learning in just a few steps.

– The way in which the model systematically distorts the data distribution in the first few steps tells us a lot about how the model is wrong.

– But the model could have strong modes far from any data. Because each short chain starts at a data point, these modes will not be sampled by the confabulations. Is this a problem in practice?
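
The idea above can be sketched on a toy problem. The following is a minimal illustration, not the method from these slides: it assumes a hypothetical one-dimensional energy E(x) = (x - mu)^2 / 2, a simple Metropolis transition, and invented names (`energy`, `metropolis_step`, `confabulate`, `cd_update`). Chains start at the data and run for only k steps; the direction in which the resulting confabulations drift away from the data is used as the learning signal.

```python
import math
import random

def energy(x, mu):
    # Assumed toy energy: a unit-variance Gaussian centred on mu.
    return 0.5 * (x - mu) ** 2

def metropolis_step(x, mu, rng, step=0.5):
    # One Metropolis update that leaves exp(-energy) invariant.
    proposal = x + rng.gauss(0.0, step)
    delta = energy(proposal, mu) - energy(x, mu)
    if delta <= 0 or rng.random() < math.exp(-delta):
        return proposal
    return x

def confabulate(data, mu, k, rng):
    # Start one chain at every data point and run it for only k steps.
    samples = list(data)
    for _ in range(k):
        samples = [metropolis_step(x, mu, rng) for x in samples]
    return samples

def cd_update(data, mu, k, rng, lr=0.5):
    # For this energy, -dE/dmu = x - mu, so the contrastive signal
    # reduces to mean(data) - mean(confabulations).
    confabs = confabulate(data, mu, k, rng)
    grad = sum(data) / len(data) - sum(confabs) / len(confabs)
    return mu + lr * grad

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(500)]  # synthetic data near 2.0
mu = 0.0                                          # deliberately wrong start
for _ in range(300):
    mu = cd_update(data, mu, k=1, rng=rng)
print(round(mu, 2), round(sum(data) / len(data), 2))
```

Even with k=1 the confabulations drift toward the model's mode, so the update pushes mu toward the data mean without ever sampling from equilibrium. In this sketch the model has a single mode, so the concern about strong modes far from the data does not arise; it would need a multimodal energy to show up.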