Why does stacking RBMs produce this kind
of generative model?
It is not at all obvious that stacking RBMs
produces a model in which the top two layers of
features form an RBM, but the layers beneath
that are not at all like a Boltzmann Machine.
To understand why this happens, we need to ask
how an RBM defines a probability distribution
over visible vectors.
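
As a concrete reference point, the sketch below computes the distribution over visible vectors for a tiny binary RBM by brute-force enumeration, using the standard energy-based definition p(v) = sum_h exp(-E(v, h)) / Z. It is only an illustration under stated assumptions: the names `energy` and `visible_distribution`, the weights `W`, and the biases `b` and `c` are hypothetical and not taken from the source, and enumeration is only feasible for a handful of units.

```python
import itertools
import math


def energy(v, h, W, b, c):
    """Standard binary RBM energy (assumed form):
    E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j
    """
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def visible_distribution(W, b, c):
    """Return {v: p(v)} by summing exp(-E(v, h)) over every hidden vector h
    and normalizing by the partition function Z."""
    visibles = list(itertools.product([0, 1], repeat=len(b)))
    hiddens = list(itertools.product([0, 1], repeat=len(c)))
    unnormalized = {
        v: sum(math.exp(-energy(v, h, W, b, c)) for h in hiddens)
        for v in visibles
    }
    Z = sum(unnormalized.values())  # partition function over all (v, h) pairs
    return {v: u / Z for v, u in unnormalized.items()}


# Hypothetical toy RBM: 3 visible units, 2 hidden units, zero biases.
W = [[1.0, -0.5], [0.5, 0.5], [-1.0, 1.0]]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
p_v = visible_distribution(W, b, c)
```

The hidden units never appear in the result: they are summed out, so the RBM assigns each visible vector a probability through the energies of all joint configurations that contain it.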