Learning Energy-Based Models of High-Dimensional Data

Intuitive motivation

• It is silly to run the Markov chain all the way to equilibrium if we can get the information required for learning in just a few steps.
  – The way in which the model systematically distorts the data distribution in the first few steps tells us a lot about how the model is wrong.
  – But the model could have strong modes far from any data. These modes will not be sampled by confabulations. Is this a problem in practice?
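
The idea in the bullet above can be sketched with a toy example. Everything concrete here is an assumption made for illustration and is not from the slide: the 1-D energy E(x) = (x - mu)^2 / 2, the Metropolis sampler, and the hypothetical names `energy`, `metropolis_steps`, and `train_cd`. Each chain starts at a data point and runs only k steps. The resulting confabulations drift toward the model's mode, and the direction of that drift is the learning signal.

```python
import math
import random

def energy(x, mu):
    # Assumed toy energy: a unit-variance Gaussian centred at mu.
    return 0.5 * (x - mu) ** 2

def metropolis_steps(x, mu, k, rng, step=0.5):
    # Run k Metropolis steps starting from x; the result is a confabulation.
    for _ in range(k):
        proposal = x + rng.gauss(0.0, step)
        delta = energy(proposal, mu) - energy(x, mu)
        # Always accept downhill moves; accept uphill moves with prob exp(-delta).
        if delta <= 0 or rng.random() < math.exp(-delta):
            x = proposal
    return x

def train_cd(data, mu=0.0, k=1, lr=1.0, epochs=300, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        # Start every chain at a data point, not at a random state.
        confabs = [metropolis_steps(x, mu, k, rng) for x in data]
        # dE/dmu = -(x - mu), averaged over data and over confabulations.
        grad_data = sum(-(x - mu) for x in data) / len(data)
        grad_conf = sum(-(x - mu) for x in confabs) / len(confabs)
        # Lower the energy of the data, raise the energy of the confabulations.
        mu -= lr * (grad_data - grad_conf)
    return mu

# Usage: data drawn around 3.0, model mode initialised at 0.0.
data_rng = random.Random(1)
data = [data_rng.gauss(3.0, 1.0) for _ in range(200)]
learned_mu = train_cd(data)
print("data mean:", sum(data) / len(data), "learned mu:", learned_mu)
```

In this toy case the update reduces to moving mu by the difference between the data mean and the confabulation mean, so a single step of distortion is enough to point the parameter the right way. The model here has only one mode, so the sketch says nothing about the second sub-bullet: chains started at the data would never visit a strong mode far from it.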