Lecture 1b

The shortcut

• Instead of taking the negative samples from the equilibrium distribution, use slight corruptions of the datavectors. Only add random momentum once, and only follow the dynamics for a few steps.

  – Much less variance because a datavector and its confabulation form a matched pair.

  – Gives a very biased estimate of the gradient of the log likelihood.

  – Gives a good estimate of the gradient of the contrastive divergence (i.e. the amount by which F falls during the brief HMC).

• It's very hard to say anything about what this method does to the log likelihood because it only looks at rivals in the vicinity of the data.

• It's hard to say exactly what this method does to the contrastive divergence because the Markov chain defines what we mean by "vicinity", and the chain keeps changing as the parameters change.

  – But it works well empirically, and it can be proved to work well in some very simple cases.
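
The shortcut can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the lecture's own code: the toy energy F(x) = 0.5 * sum(theta * x**2) (a zero-mean Gaussian with one precision per dimension), the function names, the step size, and the learning rate are all hypothetical choices, and the Metropolis accept/reject test of full HMC is left out for brevity. Each datavector gets random momentum once, follows a few leapfrog steps, and the resulting confabulation is paired with the datavector it started from.

```python
import numpy as np


def energy_grad_x(x, theta):
    """dF/dx for the assumed toy energy F(x) = 0.5 * sum(theta * x**2)."""
    return theta * x


def energy_grad_theta(x, theta):
    """dF/dtheta for the same toy energy, one row per datavector."""
    return 0.5 * x ** 2


def brief_hmc(data, theta, rng, n_steps=5, step_size=0.2):
    """Start at the data, add random momentum once, run a few leapfrog steps."""
    x = data.copy()
    p = rng.standard_normal(x.shape)  # momentum is sampled only once
    for _ in range(n_steps):
        p = p - 0.5 * step_size * energy_grad_x(x, theta)
        x = x + step_size * p
        p = p - 0.5 * step_size * energy_grad_x(x, theta)
    return x  # the confabulations, row-aligned with the datavectors


def cd_step(data, theta, rng, lr=0.05, **hmc_kwargs):
    """One parameter update from matched (datavector, confabulation) pairs."""
    confab = brief_hmc(data, theta, rng, **hmc_kwargs)
    # Lower F on the data and raise it on the confabulations.
    diff = energy_grad_theta(confab, theta) - energy_grad_theta(data, theta)
    return theta + lr * diff.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = 2.0 * rng.standard_normal((2000, 2))  # synthetic datavectors
    theta = np.ones(2)
    for _ in range(300):
        theta = cd_step(data, theta, rng)
    # For this Gaussian case theta should end up near 1 / mean(x**2).
    print(theta, 1.0 / (data ** 2).mean(axis=0))
```

The Gaussian is one of the very simple cases where the behaviour is easy to check: the update stops changing theta, on average, once the brief dynamics no longer shrink or expand the data, which happens when theta is close to the inverse of the data's second moment.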