Faster mixing chains
Hybrid Monte Carlo can only take small steps
because the energy surface is curved.
With a single layer of hidden units, it is possible
to use alternating parallel Gibbs sampling.
Much less computation than hybrid Monte Carlo
Much faster mixing
Can be extended to use a pooled second layer
(Max Welling)
Can only be used in deep networks by
learning one hidden layer at a time.
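
The alternating parallel Gibbs sampling mentioned above can be sketched roughly as follows. This is a minimal illustration only, assuming a binary restricted Boltzmann machine with logistic units; the slide does not specify the model, and the energy-based models it refers to may use different conditionals. The names `sample_hidden`, `sample_visible`, `gibbs_chain`, and the toy weights are hypothetical. The point it shows is that, with a single hidden layer, every hidden unit is conditionally independent given the visible units (and vice versa), so each layer can be resampled in one parallel sweep.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hidden(v, W, b_hid, rng):
    # Given the visible vector, hidden units are conditionally independent,
    # so the whole layer is sampled in one sweep (assumed binary units).
    h = []
    for j in range(len(b_hid)):
        act = b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(act) else 0)
    return h


def sample_visible(h, W, b_vis, rng):
    # Symmetric update: visible units are independent given the hidden layer.
    v = []
    for i in range(len(b_vis)):
        act = b_vis[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(act) else 0)
    return v


def gibbs_chain(v0, W, b_vis, b_hid, n_steps, rng):
    # Alternate between the two layers for n_steps full sweeps.
    v = list(v0)
    h = sample_hidden(v, W, b_hid, rng)
    for _ in range(n_steps):
        v = sample_visible(h, W, b_vis, rng)
        h = sample_hidden(v, W, b_hid, rng)
    return v, h


# Toy usage with hypothetical sizes and small random weights.
rng = random.Random(0)
n_vis, n_hid = 6, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
b_vis = [0.0] * n_vis
b_hid = [0.0] * n_hid
v, h = gibbs_chain([1, 0, 1, 0, 1, 0], W, b_vis, b_hid, n_steps=10, rng=rng)
print(v, h)
```

Each sweep costs one matrix-vector product per layer and needs no gradient of the energy or step-size tuning, which is one way to read the "much less computation" claim. In a network with more than one hidden layer the units within a layer are no longer conditionally independent in this simple way, which is consistent with the restriction to learning one hidden layer at a time.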