Gibbs sampling
First fix a datavector from the training set on the visible units.
Then keep visiting hidden units and updating their binary
states using information from their parents and descendants.
If we do this in the right way, we will eventually get
unbiased samples from the posterior distribution for that datavector.
This is relatively efficient because almost all hidden
configurations will have negligible probability and will
probably not be visited.
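
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes a sigmoid belief net with binary units, and the function names (`log_joint`, `gibbs_posterior_samples`), the toy network, and its weights and biases are all hypothetical. Each hidden unit is resampled from its conditional distribution given every other unit, obtained here by comparing the joint probability with the unit on and with it off. Only the terms for the unit itself (which depend on its parents) and for its children differ between the two cases, so a practical version would compute just those terms.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_joint(state, weights, biases):
    """Log-probability of a full binary configuration of a sigmoid belief net.

    Assumption: units are ordered so that parents come before children, and
    weights[j][i] is the weight from parent j to child i (0.0 if no edge).
    """
    total = 0.0
    for i, s_i in enumerate(state):
        net = biases[i] + sum(weights[j][i] * state[j] for j in range(i))
        p_on = sigmoid(net)
        total += math.log(p_on if s_i == 1 else 1.0 - p_on)
    return total


def gibbs_posterior_samples(visible, n_hidden, weights, biases,
                            n_sweeps, burn_in, rng):
    """Sample hidden configurations with the visible units held fixed."""
    hidden = [rng.randint(0, 1) for _ in range(n_hidden)]
    samples = []
    for sweep in range(n_sweeps):
        for k in range(n_hidden):
            # Compare the joint probability with unit k on and off;
            # everything else, including the datavector, stays fixed.
            hidden[k] = 1
            log_on = log_joint(hidden + visible, weights, biases)
            hidden[k] = 0
            log_off = log_joint(hidden + visible, weights, biases)
            p_on = sigmoid(log_on - log_off)
            hidden[k] = 1 if rng.random() < p_on else 0
        if sweep >= burn_in:  # discard early sweeps, before the chain settles
            samples.append(tuple(hidden))
    return samples


# Hypothetical toy net, units ordered h0, h1, h2, v:
# h0 -> h1, h0 -> h2, h1 -> v, h2 -> v.
WEIGHTS = [
    [0.0, 1.5, -1.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
BIASES = [0.0, -1.0, -1.0, -2.0]

samples = gibbs_posterior_samples(
    visible=[1], n_hidden=3, weights=WEIGHTS, biases=BIASES,
    n_sweeps=20000, burn_in=1000, rng=random.Random(0),
)
```

With only three hidden units the exact posterior can also be found by enumerating all eight hidden configurations, which makes this toy net a convenient check on the sampler. The weights are deliberately small: with large weights the chain can mix slowly between separated modes, and `sigmoid` as written can overflow or saturate.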