A surprising shortcut
Instead of taking the negative samples from the
equilibrium distribution, use slight corruptions of
the datavectors: add random momentum only
once, and follow the dynamics for only a few steps.
This gives much less variance, because a datavector
and its confabulation form a matched pair.
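A minimal sketch of the shortcut, under an illustrative assumption: a toy one-parameter energy model E_w(x) = w x^2 / 2 (the model, step size, and step count are all mine, not the original's setup). It corrupts the data by adding momentum once and running a few leapfrog steps, then shows the matched-pair effect: differencing each datavector against its own confabulation is far less noisy than using unpaired negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy energy model: E_w(x) = 0.5 * w * x**2, whose
# equilibrium distribution is a zero-mean Gaussian with variance 1/w.
w_true = 2.0
data = rng.normal(0.0, 1.0 / np.sqrt(w_true), size=10_000)

def grad_E_x(x, w):
    return w * x          # dE/dx, drives the dynamics

def grad_E_w(x):
    return 0.5 * x ** 2   # dE/dw, used by the learning rule

def confabulate(x0, w, n_steps=3, eps=0.1):
    """Slightly corrupt the datavectors: add random momentum ONCE,
    then follow leapfrog dynamics for only a few steps."""
    x = x0.copy()
    p = rng.normal(size=x.shape)          # momentum, added once
    p -= 0.5 * eps * grad_E_x(x, w)       # initial half step for momentum
    for i in range(n_steps):
        x += eps * p                      # full step for position
        if i < n_steps - 1:
            p -= eps * grad_E_x(x, w)     # full step for momentum
    return x

confabs = confabulate(data, w_true)

# Each datavector is paired with its own confabulation, so the per-pair
# gradient differences are strongly correlated and mostly cancel:
matched = grad_E_w(confabs) - grad_E_w(data)
# Breaking the pairing (independent negatives) is much noisier:
broken = grad_E_w(rng.permutation(confabs)) - grad_E_w(data)
print(matched.std(), broken.std())
```

With slight corruptions (small steps, few of them), the matched-pair standard deviation comes out well under half that of the broken pairing.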
The resulting gradient estimate seems to be very
biased, but maybe it is optimizing a different
objective function.
If the model is perfect and there is an infinite
amount of data, the confabulations will already be
equilibrium samples, so the positive and negative
statistics agree in expectation and the expected
update is zero. The shortcut will not cause
learning to mess up a perfect model.
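The fixed-point claim can be checked numerically on the same hypothetical quadratic model (again my illustrative setup, not the original's): using the shortcut as a learning rule drives a deliberately wrong parameter toward the data's true value and then holds it there, since confabulations of equilibrium data are themselves (approximately) equilibrium samples and the updates average out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy energy model: E_w(x) = 0.5 * w * x**2.
w_true = 2.0
data = rng.normal(0.0, 1.0 / np.sqrt(w_true), size=1000)

def grad_E_x(x, w):
    return w * x

def grad_E_w(x):
    return 0.5 * x ** 2

def confabulate(x0, w, n_steps=5, eps=0.3):
    """Add random momentum once, follow leapfrog dynamics briefly."""
    x = x0.copy()
    p = rng.normal(size=x.shape)
    p -= 0.5 * eps * grad_E_x(x, w)
    for i in range(n_steps):
        x += eps * p
        if i < n_steps - 1:
            p -= eps * grad_E_x(x, w)
    return x

w, lr = 1.0, 0.2                # start from a deliberately wrong model
for _ in range(500):
    confabs = confabulate(data, w)
    # raise the energy of the confabulations, lower it on the data
    w += lr * (grad_E_w(confabs).mean() - grad_E_w(data).mean())

print(w)   # should settle near w_true = 2.0, up to the shortcut's bias
```

Once w matches the data, the matched-pair differences cancel in expectation, so the learned parameter just hovers around the true value rather than drifting away from it.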