Learning Energy-Based Models of High-Dimensional Data

A surprising shortcut

•Instead of taking the negative samples from the equilibrium distribution, use slight corruptions of the datavectors. Only add random momentum once, and only follow the dynamics for a few steps.

–Much less variance because a datavector and its confabulation form a matched pair.

–The resulting gradient estimate seems to be very biased, but maybe the procedure is optimizing a different objective function.

•If the model is perfect and there is an infinite amount of data, the datavectors are already equilibrium samples, so the confabulations will be equilibrium samples too. So the shortcut will not cause learning to mess up a perfect model.
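
The shortcut described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the slides' own implementation: the energy is a toy quadratic E(x; w) = 0.5 * sum(w * x^2), the function names (confabulate, shortcut_update) and the step size, step count and learning rate are hypothetical, and no accept/reject step is applied after the dynamics.

```python
import numpy as np

def energy_grad_x(x, w):
    # dE/dx for the assumed toy energy E(x; w) = 0.5 * sum(w * x**2)
    return w * x

def energy_grad_w(x, w):
    # dE/dw for the same toy energy, one row per datavector
    return 0.5 * x ** 2

def confabulate(data, w, rng, n_steps=3, step_size=0.1):
    """Start at the datavectors, add random momentum once,
    then follow leapfrog dynamics for only a few steps."""
    x = data.copy()
    p = rng.standard_normal(x.shape)             # momentum is drawn only once
    p -= 0.5 * step_size * energy_grad_x(x, w)   # initial half step for momentum
    for i in range(n_steps):
        x += step_size * p                       # full step for position
        if i < n_steps - 1:
            p -= step_size * energy_grad_x(x, w) # full step for momentum
    # The final half step for momentum is skipped because only x is returned.
    return x

def shortcut_update(data, w, rng, lr=0.01, **kw):
    """One parameter update using confabulations as the negative samples."""
    confab = confabulate(data, w, rng, **kw)
    # Lower the energy of the data, raise the energy of the confabulations.
    grad = energy_grad_w(data, w).mean(axis=0) - energy_grad_w(confab, w).mean(axis=0)
    return w - lr * grad, confab

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))  # synthetic datavectors, for illustration only
w = np.ones(4)
w_new, confab = shortcut_update(data, w, rng)
```

Because each confabulation starts at its own datavector, the two terms in the update are computed on matched pairs, which is where the variance reduction in the first sub-bullet comes from.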