Learning Energy-Based Models of High-Dimensional Data

The recipe for Gibbs sampling

• Imagine a huge ensemble of networks.
  – The networks have identical parameters.
  – They have the same clamped datavector.
  – The fraction of the ensemble with each possible hidden configuration defines a distribution over hidden configurations.
• Each time we pick the state of a hidden unit from its posterior distribution given the states of the other units, the distribution represented by the ensemble gets closer to the equilibrium distribution.
  – A quantity called the “free energy” always decreases (see next lecture).
  – Eventually, we reach the stationary distribution in which the number of networks that change from configuration a to configuration b is exactly the same as the number that change from b to a:
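
In symbols (notation assumed here, not taken from the slides): if p(a) is the fraction of the ensemble in configuration a and T(a → b) is the probability that one update moves a network from a to b, the balance condition reads p(a) T(a → b) = p(b) T(b → a).

The ensemble picture can be checked numerically on a toy model. The sketch below is illustrative only: the three hidden units, their biases and their weights are hypothetical, the clamped datavector is assumed to be folded into the biases, and the unit to update is assumed to be picked uniformly at random. The "huge ensemble" is represented exactly by a probability vector over the 8 hidden configurations rather than by sampled networks.

```python
import itertools
import math

# Hypothetical toy model: 3 binary hidden units with symmetric couplings.
# The effect of the clamped datavector is assumed to be folded into `bias`.
bias = [0.5, -0.3, 0.2]
weights = [[0.0, 1.0, -0.5],
           [1.0, 0.0, 0.8],
           [-0.5, 0.8, 0.0]]
n_units = len(bias)

configs = list(itertools.product([0, 1], repeat=n_units))
index = {c: i for i, c in enumerate(configs)}


def energy(h):
    """Energy of a hidden configuration (lower energy = more probable)."""
    pair = sum(weights[i][j] * h[i] * h[j]
               for i in range(n_units) for j in range(i + 1, n_units))
    return -(sum(b * s for b, s in zip(bias, h)) + pair)


# Equilibrium (Boltzmann) distribution at temperature 1.
unnorm = [math.exp(-energy(c)) for c in configs]
Z = sum(unnorm)
equilibrium = [u / Z for u in unnorm]


def transition_matrix():
    """T[a][b]: probability that one Gibbs update moves a network from a to b."""
    n = len(configs)
    T = [[0.0] * n for _ in range(n)]
    for a, h in enumerate(configs):
        for i in range(n_units):  # each unit is chosen with probability 1/n_units
            on = h[:i] + (1,) + h[i + 1:]
            off = h[:i] + (0,) + h[i + 1:]
            # Posterior of unit i being on, given the states of the other units.
            p_on = 1.0 / (1.0 + math.exp(energy(on) - energy(off)))
            T[a][index[on]] += p_on / n_units
            T[a][index[off]] += (1.0 - p_on) / n_units
    return T


def free_energy(q):
    """Expected energy minus entropy of the ensemble distribution q."""
    return sum(qi * (energy(c) + math.log(qi))
               for qi, c in zip(q, configs) if qi > 0.0)


T = transition_matrix()

# Balance check: the flow from a to b equals the flow from b to a at equilibrium.
max_imbalance = max(abs(equilibrium[a] * T[a][b] - equilibrium[b] * T[b][a])
                    for a in range(len(configs)) for b in range(len(configs)))

# Start with every network in the all-off configuration, then apply updates.
q = [0.0] * len(configs)
q[index[(0,) * n_units]] = 1.0
free_energies = [free_energy(q)]
for _ in range(500):
    q = [sum(q[a] * T[a][b] for a in range(len(configs)))
         for b in range(len(configs))]
    free_energies.append(free_energy(q))

print("largest flow imbalance at equilibrium:", max_imbalance)
print("free energy, first and last:", free_energies[0], free_energies[-1])
print("minus log Z:", -math.log(Z))
```

In this sketch the recorded free energies never increase from one update to the next, and they approach -log Z, the free energy of the equilibrium distribution, as the ensemble distribution q approaches the Boltzmann distribution.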