Learning Energy-Based Models of High-Dimensional Data

Faster mixing chains

• Hybrid Monte Carlo can only take small steps because the energy surface is curved.

• With a single layer of hidden units, it is possible to use alternating parallel Gibbs sampling.

– Much less computation than Hybrid Monte Carlo

– Much faster mixing

– Can be extended to use a pooled second layer (Max Welling)

– Can only be used in deep networks by learning one hidden layer at a time.
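
Alternating parallel Gibbs sampling with a single hidden layer can be sketched roughly as follows. This is a minimal illustration, not the method from the slides: it assumes binary stochastic units with logistic conditionals, and the names `gibbs_chain`, `W`, `b_vis`, and `b_hid` are hypothetical. The point it shows is that, given the visible units, all hidden units are conditionally independent and can be sampled in one parallel step, and likewise for the visible units given the hidden ones.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_chain(v0, W, b_vis, b_hid, n_steps, rng):
    """Run alternating parallel Gibbs sampling (assumed binary units).

    v0:    (batch, n_vis) initial visible states
    W:     (n_vis, n_hid) weights (hypothetical parameter layout)
    b_vis: (n_vis,) visible biases
    b_hid: (n_hid,) hidden biases
    Returns the final visible and hidden samples.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    v = v0
    for _ in range(n_steps):
        # All hidden units are updated at once, given the visibles.
        p_h = sigmoid(v @ W + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # All visible units are updated at once, given the hiddens.
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.normal(size=(6, 4))       # illustrative random weights
    v0 = rng.integers(0, 2, size=(3, 6)).astype(float)
    v, h = gibbs_chain(v0, W, np.zeros(6), np.zeros(4), n_steps=5, rng=rng)
    print(v.shape, h.shape)
```

Each full step costs two matrix products, with no gradient evaluations or small-step trajectory simulation, which is one way to read the "much less computation" point above.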