- Instead of taking the negative samples from the equilibrium distribution, use slight corruptions of the datavectors. Only add random momentum once, and only follow the dynamics for a few steps.
  - Much less variance, because a datavector and its confabulation form a matched pair.
  - Gives a very biased estimate of the gradient of the log likelihood.
  - Gives a good estimate of the gradient of the contrastive divergence (i.e. the amount by which F falls during the brief HMC).
- It's very hard to say anything about what this method does to the log likelihood, because it only looks at rivals in the vicinity of the data.
- It's hard to say exactly what this method does to the contrastive divergence, because the Markov chain defines what we mean by vicinity, and the chain keeps changing as the parameters change.
  - But it works well empirically, and it can be proved to work well in some very simple cases.
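The procedure described above can be sketched on one of the "very simple cases". This is only an illustration under stated assumptions, not the method as implemented by the author: the free energy is assumed to be the toy quadratic F(x; mu) = 0.5 * ||x - mu||^2 with a single parameter vector mu, the function names (`brief_hmc`, `cd_update`) and all step sizes are hypothetical, and the Metropolis accept/reject step of full HMC is omitted. Each confabulation starts at its own datavector, receives random momentum once, and follows a few leapfrog steps, so data and confabulations form matched pairs.

```python
import numpy as np

def free_energy(x, mu):
    # Assumed toy free energy: F(x; mu) = 0.5 * ||x - mu||^2, one value per row.
    return 0.5 * np.sum((x - mu) ** 2, axis=1)

def grad_x(x, mu):
    # dF/dx, used to drive the Hamiltonian dynamics.
    return x - mu

def grad_mu(x, mu):
    # dF/dmu, used for the parameter update.
    return mu - x

def brief_hmc(x0, mu, rng, n_steps=5, step_size=0.2):
    # Start at the datavectors, add random momentum ONCE,
    # then follow leapfrog dynamics for only a few steps.
    # Assumption: no accept/reject step is applied.
    x = x0.copy()
    p = rng.standard_normal(x.shape)
    p -= 0.5 * step_size * grad_x(x, mu)      # initial half step for momentum
    for i in range(n_steps):
        x += step_size * p                    # full step for position
        if i < n_steps - 1:
            p -= step_size * grad_x(x, mu)    # full step for momentum
    return x

def cd_update(data, mu, rng, lr=0.1):
    # Contrastive divergence step: lower F on the data,
    # raise F on the matched confabulations.
    confab = brief_hmc(data, mu, rng)
    grad = grad_mu(data, mu).mean(axis=0) - grad_mu(confab, mu).mean(axis=0)
    return mu - lr * grad, confab

rng = np.random.default_rng(0)
data = rng.normal(loc=[2.0, -1.0], scale=1.0, size=(500, 2))

mu = np.zeros(2)
_, first_confab = cd_update(data, mu, rng)
# Amount by which F falls, on average, during the brief HMC at the initial mu.
initial_drop = free_energy(data, mu).mean() - free_energy(first_confab, mu).mean()

for _ in range(500):
    mu, _ = cd_update(data, mu, rng)
```

In this toy setting the update reduces to the difference between the mean datavector and the mean confabulation, so `mu` drifts toward the data mean, and the average fall in F shrinks as the model comes to fit the data. Nothing here says how the method behaves for richer energy functions.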