The recipe for Gibbs sampling
Imagine a huge ensemble of networks.
The networks have identical parameters.
They have the same clamped datavector.
The fraction of the ensemble with each possible hidden
configuration defines a distribution over hidden
configurations.
Each time we pick the state of a hidden unit from its
posterior distribution given the states of the other units, the
distribution represented by the ensemble gets closer to the
equilibrium distribution.
A quantity called the “free energy” never increases under
these updates (see next lecture)
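The ensemble picture can be sketched numerically. This is a minimal illustration, not the lecture's own code. It assumes binary 0/1 hidden units, a Boltzmann-style energy at temperature 1, and hand-picked toy weights `W` and biases `b` (the biases stand in for the input from the clamped datavector). All names are hypothetical. Instead of simulating many networks, it tracks the exact fraction of an infinitely large ensemble in each hidden configuration, and it uses the KL divergence to the equilibrium distribution as the measure of "closeness".

```python
import itertools
import numpy as np

# Toy network (assumed values): 3 hidden units, symmetric weights, zero diagonal.
n = 3
W = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 0.5],
              [-1.0, 0.5, 0.0]])
b = np.array([0.2, -0.3, 0.1])  # effective biases given the clamped datavector

# All 2^n hidden configurations; the first unit is the most significant bit.
configs = np.array(list(itertools.product([0, 1], repeat=n)))
bit_weights = 2 ** np.arange(n)[::-1]

def index(h):
    """Row of `configs` that holds configuration h."""
    return int(h @ bit_weights)

def energy(h):
    return -(b @ h) - 0.5 * (h @ W @ h)

# Equilibrium (Boltzmann) distribution over hidden configurations.
E = np.array([energy(h) for h in configs])
p_eq = np.exp(-E)
p_eq /= p_eq.sum()

def update_unit(p, i):
    """Exact effect on the ensemble of resampling unit i in every network
    from its conditional distribution given the other units."""
    new_p = np.zeros_like(p)
    for a, h in enumerate(configs):
        x = b[i] + W[i] @ h               # total input to unit i (W[i, i] == 0)
        p_on = 1.0 / (1.0 + np.exp(-x))   # P(h_i = 1 | other units)
        h_on, h_off = h.copy(), h.copy()
        h_on[i], h_off[i] = 1, 0
        new_p[index(h_on)] += p[a] * p_on
        new_p[index(h_off)] += p[a] * (1.0 - p_on)
    return new_p

def kl_to_equilibrium(p):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / p_eq[mask])))

# Start with every network in the all-zeros configuration.
p = np.zeros(2 ** n)
p[0] = 1.0
kl_trace = [kl_to_equilibrium(p)]
for sweep in range(200):
    for i in range(n):
        p = update_unit(p, i)
        kl_trace.append(kl_to_equilibrium(p))

print("first few KL values:", np.round(kl_trace[:7], 4))
print("max |p - p_eq| after 200 sweeps:", np.abs(p - p_eq).max())
```

In this sketch the divergence never goes up after any single-unit update, and the ensemble distribution ends up numerically indistinguishable from the equilibrium distribution.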
Eventually, we reach the stationary distribution in which
the number of networks that change from configuration a
to configuration b is exactly the same as the number that
change from b to a: