• Instead of starting with a random hidden configuration, use the last hidden configuration for that training datavector before the weights were updated.
  – If the weight updates are small enough, the hidden configurations will start very close to the equilibrium distribution for each training datavector, and the Gibbs sampling will make them even closer.
  – So we might as well update the weights after one round of Gibbs updating for each training datavector.
• This method is even cleverer than it appears.
  – We will see in the next lecture that it works even if the hidden configurations are not close to equilibrium.
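The warm-start idea above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code. It assumes a small Boltzmann machine with binary units, visible-to-hidden weights `W`, hidden-to-hidden weights `L`, and hidden biases `b`. These names, and the helpers `gibbs_sweep_hidden` and `positive_statistics`, are hypothetical. The sketch covers only the data-dependent statistics. The rest of the weight update is left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gibbs_sweep_hidden(v, h, W, L, b, rng):
    """One round of Gibbs updating of the hidden units, visibles clamped to v.

    Assumed layout (hypothetical): W[i][j] links visible i to hidden j,
    L[j][k] links hidden j to hidden k, b[j] is the bias of hidden j.
    h is updated in place and returned.
    """
    for j in range(len(h)):
        total = b[j]
        total += sum(v[i] * W[i][j] for i in range(len(v)))
        total += sum(h[k] * L[j][k] for k in range(len(h)) if k != j)
        h[j] = 1 if rng.random() < sigmoid(total) else 0
    return h


def positive_statistics(data, hidden_states, W, L, b, rng):
    """Estimate the data-dependent <v_i h_j> with one sweep per datavector.

    hidden_states[n] is the hidden configuration left over from the previous
    call for datavector n, so each chain starts where it last stopped
    instead of from a random configuration.
    """
    n_vis, n_hid = len(W), len(b)
    stats = [[0.0] * n_hid for _ in range(n_vis)]
    for n, v in enumerate(data):
        h = gibbs_sweep_hidden(v, hidden_states[n], W, L, b, rng)
        for i in range(n_vis):
            for j in range(n_hid):
                stats[i][j] += v[i] * h[j] / len(data)
    return stats


rng = random.Random(0)
data = [[1, 0, 1], [0, 1, 1]]
n_vis, n_hid = 3, 2
W = [[0.0] * n_hid for _ in range(n_vis)]
L = [[0.0] * n_hid for _ in range(n_hid)]
b = [0.0] * n_hid

# Random initialisation happens once only; later calls reuse these states.
hidden_states = [[rng.randint(0, 1) for _ in range(n_hid)] for _ in data]

for step in range(3):
    stats = positive_statistics(data, hidden_states, W, L, b, rng)
    # A weight update would go here; hidden_states carries over to the next step.
```

Storing one hidden configuration per training datavector costs extra memory. In return, each learning step needs only a single Gibbs sweep per datavector, provided the weights change slowly between steps.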