Intuitive motivation
It is silly to run the Markov chain all the way to
equilibrium if we can get the information required
for learning in just a few steps.
The way in which the model systematically
distorts the data distribution in the first few
steps tells us a lot about how the model is
wrong.
But the model could have strong modes far
from any data. Confabulations that start at the
data and run for only a few steps will not
sample these modes, so learning gets no direct
signal to suppress them. Is this a problem
in practice?