The obvious method is to start with a random hidden configuration for each datavector and to do Gibbs sampling until we have reached equilibrium.
Then use the equilibrium samples from the posterior distribution over hidden configurations to update the weights (online, batch, or mini-batch).
But how do we decide how much Gibbs sampling is required to reach equilibrium?
There is no simple test, and if we don't do enough sampling there is no guarantee that the learning will work, even if we use an infinitesimal learning rate.
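The procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not name the model, so the sketch assumes a sigmoid belief net with one layer of binary hidden units, independent hidden priors `sigmoid(c[j])`, and visible probabilities `sigmoid(b[i] + sum_j h[j] * W[j][i])`. All names (`gibbs_posterior_sample`, `online_update`, `n_sweeps`, `W`, `b`, `c`) are hypothetical. In particular, `n_sweeps` is a hand-picked number of Gibbs sweeps; nothing in the code checks that the chain has actually reached equilibrium.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_bernoulli(v, p):
    # Log-probability of a binary value v under Bernoulli(p), clipped for safety.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return math.log(p) if v == 1 else math.log(1.0 - p)


def visible_probs(h, W, b):
    # p(v_i = 1 | h) for every visible unit i (assumed model, see lead-in).
    return [sigmoid(b[i] + sum(h[j] * W[j][i] for j in range(len(h))))
            for i in range(len(b))]


def gibbs_posterior_sample(v, W, b, c, n_sweeps, rng):
    # Start from a random hidden configuration for this datavector.
    h = [rng.randint(0, 1) for _ in range(len(c))]
    for _ in range(n_sweeps):
        for j in range(len(h)):
            # Log-odds of h_j = 1 versus h_j = 0 given v and the other hidden units.
            log_odds = c[j]
            for state, sign in ((1, 1.0), (0, -1.0)):
                h[j] = state
                p = visible_probs(h, W, b)
                log_odds += sign * sum(log_bernoulli(v[i], p[i])
                                       for i in range(len(v)))
            h[j] = 1 if rng.random() < sigmoid(log_odds) else 0
    return h


def online_update(v, h, W, b, c, lr):
    # One online weight update from a single (hopefully equilibrium) sample h.
    p = visible_probs(h, W, b)
    for i in range(len(v)):
        b[i] += lr * (v[i] - p[i])
        for j in range(len(h)):
            W[j][i] += lr * h[j] * (v[i] - p[i])
    for j in range(len(h)):
        c[j] += lr * (h[j] - sigmoid(c[j]))


# Toy usage: two datavectors, two hidden units, 10 sweeps per datavector.
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(2)]
b = [0.0] * 4
c = [0.0] * 2
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for epoch in range(20):
    for v in data:
        h = gibbs_posterior_sample(v, W, b, c, n_sweeps=10, rng=rng)
        online_update(v, h, W, b, c, lr=0.1)
```

The sketch uses the online variant; a batch or mini-batch version would accumulate the same increments over several datavectors before applying them. If `n_sweeps` is too small, `h` is not a sample from the posterior, and the update is no longer the one the procedure calls for.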