A clever trick
Instead of starting each run of Gibbs sampling from a random
hidden configuration, start from the hidden configuration that
the same training datavector ended in just before the weights
were last updated.
If the weight updates are small enough, the hidden
configurations will start very close to the equilibrium
distribution for each training datavector and the Gibbs
sampling will make them even closer.
So we might as well update the weights after one
round of Gibbs updating for each training datavector.
This method is even cleverer than it appears.
We will see in the next lecture that it works even if the
hidden configurations are not close to equilibrium.
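
The slide does not name the model, so the following is only a minimal sketch under an assumption: a sigmoid belief net with one layer of binary hidden units, where the hidden configuration for each datavector is kept between weight updates and gets a single round of Gibbs updating before each update. The function names (`train`, `gibbs_sweep`), the parameters `W`, `b`, `c`, the store `H`, the learning rate, and the toy data are all illustrative and not taken from the lecture.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_bernoulli(x, logit):
    # log p(x) for binary x whose probability of being 1 is sigmoid(logit).
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return x * logit - softplus

def visible_logits(h, W, b):
    # Total input to each visible unit from its bias and the hidden units.
    return [b[i] + sum(h[j] * W[j][i] for j in range(len(h)))
            for i in range(len(b))]

def log_joint(v, h, W, b, c):
    # log p(v, h) under the assumed model: p(h) * p(v | h).
    total = sum(log_bernoulli(hj, cj) for hj, cj in zip(h, c))
    total += sum(log_bernoulli(vi, a)
                 for vi, a in zip(v, visible_logits(h, W, b)))
    return total

def gibbs_sweep(v, h, W, b, c, rng):
    # One round of Gibbs updating: resample each hidden unit in place,
    # given the datavector v and the current states of the other hidden units.
    for j in range(len(h)):
        h[j] = 1
        log_on = log_joint(v, h, W, b, c)
        h[j] = 0
        log_off = log_joint(v, h, W, b, c)
        h[j] = 1 if rng.random() < sigmoid(log_on - log_off) else 0

def train(data, n_hidden, epochs=20, lr=0.05, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)]
         for _ in range(n_hidden)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    # One stored hidden configuration per training datavector.
    # It is random only once, at the very start of training.
    H = [[rng.randint(0, 1) for _ in range(n_hidden)] for _ in data]
    for _ in range(epochs):
        for v, h in zip(data, H):
            # Continue from the configuration left by the previous update.
            gibbs_sweep(v, h, W, b, c, rng)
            # Update the weights right away using this single sample.
            p_v = [sigmoid(a) for a in visible_logits(h, W, b)]
            for i in range(n_visible):
                err = v[i] - p_v[i]
                b[i] += lr * err
                for j in range(n_hidden):
                    W[j][i] += lr * h[j] * err
            for j in range(n_hidden):
                c[j] += lr * (h[j] - sigmoid(c[j]))
    return W, b, c, H
```

Calling `train([[1, 1, 0, 0], [0, 0, 1, 1]], n_hidden=2)` returns the learned parameters together with `H`, the stored hidden configuration for each datavector. The point of the design is that `H` is never reinitialised: each sweep begins where the previous one stopped, so one sweep per weight update can be enough as long as `lr` keeps the weight changes small.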