• It is silly to run the Markov chain all the way to equilibrium if we can get the information required for learning in just a few steps.
– The way in which the model systematically distorts the data distribution in the first few steps tells us a lot about how the model is wrong.
– But the model could have strong modes far from any data. These modes will not be sampled by confabulations. Is this a problem in practice?
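
The idea of stopping the chain after a few steps can be sketched in code. The following is a minimal illustration only, assuming a binary restricted Boltzmann machine; the function name `cd_k_gradient` and the parameters `W`, `b_vis`, `b_hid` and `k` are hypothetical names chosen for this sketch, not taken from the slides. The chain starts at the data, runs `k` Gibbs steps to produce confabulations, and the difference between data-driven and confabulation-driven statistics is used as the learning signal.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd_k_gradient(v_data, W, b_vis, b_hid, k=1, rng=None):
    """Approximate weight gradient from k Gibbs steps started at the data.

    Assumes binary visible and hidden units (an assumption of this sketch).
    Returns the gradient estimate and the final confabulations.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Statistics with the visible units clamped to the data.
    p_h_data = sigmoid(v_data @ W + b_hid)
    positive = v_data.T @ p_h_data

    # Run only k steps of the chain instead of going to equilibrium.
    v = v_data
    p_h = p_h_data
    for _ in range(k):
        h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden states
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)   # confabulation
        p_h = sigmoid(v @ W + b_hid)

    # Statistics measured on the confabulations.
    negative = v.T @ p_h

    n_cases = v_data.shape[0]
    return (positive - negative) / n_cases, v
```

Because every chain starts at a data vector and takes only `k` steps, the confabulations stay close to the data. This is the limitation raised in the last point above: a mode of the model that lies far from any data vector would rarely be visited, so this estimate would not push its probability down.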