Lecture 23


Another explanation of the contrastive divergence learning procedure

• Think of an RBM as an infinite sigmoid belief net with tied weights.

• If we start at the data, alternating Gibbs sampling computes samples from the posterior distribution in each hidden layer of the infinite net.

• In deeper layers the derivatives w.r.t. the weights are very small.

  – Contrastive divergence just ignores these small derivatives in the deeper layers of the infinite net.

  – It's silly to compute the derivatives exactly when you know the weights are going to change a lot.
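
The points above can be made concrete with a minimal sketch of a one-step contrastive divergence (CD-1) update in NumPy. It is an illustration under stated assumptions, not the lecture's own code: the RBM is binary and has no bias terms, and the names `cd1_update`, `lr`, and the toy sizes are hypothetical. One up-down-up pass of alternating Gibbs sampling is run from the data, and the chain is then cut off, which is how the sketch reflects ignoring the derivatives from the deeper layers of the infinite net.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, rng, lr=0.1):
    """One CD-1 weight update for a bias-free binary RBM (illustrative only)."""
    # Up pass from the data: p(h = 1 | v0), then sample binary hidden states.
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Down pass with the same (tied) weights: sample a reconstruction v1.
    p_v1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)

    # Up pass again; the chain stops here instead of running to equilibrium.
    p_h1 = sigmoid(v1 @ W)

    # Data statistics minus reconstruction statistics, averaged over the batch.
    grad = (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
    return W + lr * grad

# Toy usage (assumed sizes): 4 binary data vectors, 6 visible and 3 hidden units.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 3))
v0 = rng.integers(0, 2, size=(4, 6)).astype(float)
W_new = cd1_update(W, v0, rng)
```

Using the hidden probabilities rather than sampled states in the statistics is a common noise-reduction choice assumed here; running more Gibbs steps before collecting the negative statistics would correspond to keeping derivatives from more layers of the infinite net.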