Gibbs sampling
First fix a datavector from the training set on the visible
units.
Then keep visiting hidden units and updating their binary
states using information from their parents, their
children, and their children's other parents.
If each update samples the unit's state from its
conditional distribution given the current states of all
the other units, we will eventually get unbiased samples
from the posterior distribution for that datavector.
This is relatively efficient because almost all hidden
configurations will have negligible probability and will
probably not be visited.
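
The procedure above can be sketched in a few lines of Python. This is
a minimal illustration, not a definitive implementation. It assumes a
two-layer sigmoid belief net in which each hidden unit has only a bias
(no hidden parents) and every hidden unit is a parent of every visible
unit. The names b, W, c, log_joint, gibbs_posterior_samples and
exact_posterior are hypothetical and chosen only for this example. For
simplicity the full joint is recomputed at each update; only the terms
involving the unit's own prior and its children actually change.

```python
import math
import random
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_joint(h, v, b, W, c):
    """log p(h, v) for an assumed two-layer sigmoid belief net.

    b[j]    : bias of hidden unit j (its prior log-odds)
    W[j][i] : weight from hidden unit j to visible unit i
    c[i]    : bias of visible unit i
    """
    total = 0.0
    for j, hj in enumerate(h):
        p = sigmoid(b[j])
        total += math.log(p if hj else 1.0 - p)
    for i, vi in enumerate(v):
        p = sigmoid(c[i] + sum(W[j][i] * h[j] for j in range(len(h))))
        total += math.log(p if vi else 1.0 - p)
    return total


def gibbs_posterior_samples(v, b, W, c, n_sweeps, burn_in, rng):
    """Sample hidden states given a fixed visible vector v."""
    h = [rng.randint(0, 1) for _ in b]  # arbitrary starting state
    samples = []
    for sweep in range(n_sweeps):
        for j in range(len(h)):
            # Log-odds of h[j] = 1 versus 0 with every other unit held fixed.
            h[j] = 1
            lp_on = log_joint(h, v, b, W, c)
            h[j] = 0
            lp_off = log_joint(h, v, b, W, c)
            p_on = sigmoid(lp_on - lp_off)
            h[j] = 1 if rng.random() < p_on else 0
        if sweep >= burn_in:  # discard early sweeps taken before equilibrium
            samples.append(tuple(h))
    return samples


def exact_posterior(v, b, W, c):
    """Brute-force posterior over all hidden configurations (tiny nets only)."""
    configs = list(product([0, 1], repeat=len(b)))
    weights = [math.exp(log_joint(list(h), v, b, W, c)) for h in configs]
    z = sum(weights)
    return {h: w / z for h, w in zip(configs, weights)}
```

On a net small enough to enumerate, the fraction of sweeps spent in
each hidden configuration can be compared against exact_posterior as
a sanity check on the sampler.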