Learning Energy-Based Models of High-Dimensional Data

A clever trick

• Instead of starting with a random hidden configuration, use the last hidden configuration for that training datavector before the weights were updated.

– If the weight updates are small enough, the hidden configurations will start very close to the equilibrium distribution for each training datavector, and the Gibbs sampling will make them even closer.

– So we might as well update the weights after one round of Gibbs updating for each training datavector.

• This method is even cleverer than it appears.

– We will see in the next lecture that it works even if the hidden configurations are not close to equilibrium.
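
The trick can be sketched in a few lines of code. What follows is a minimal illustration, not the lecture's own implementation. It assumes binary stochastic units, a visible-to-hidden weight matrix W, and symmetric hidden-to-hidden weights L with a zero diagonal; biases are left out for brevity. The names gibbs_sweep, train, and update_fn are hypothetical. In particular, update_fn stands in for whatever learning rule the model uses, which the slide does not specify. The point being shown is only that each training datavector keeps its own hidden configuration between weight updates, and that a single round of Gibbs updating is run before each update.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_sweep(v, h, W, L, rng):
    """One round of Gibbs updating over the hidden units, with v clamped.

    W[i][j] links visible unit i to hidden unit j (assumed layout).
    L[j][k] links hidden units j and k (assumed symmetric, zero diagonal).
    The list h is updated in place and also returned.
    """
    for j in range(len(h)):
        total = sum(v[i] * W[i][j] for i in range(len(v)))
        total += sum(h[k] * L[j][k] for k in range(len(h)) if k != j)
        h[j] = 1 if rng.random() < sigmoid(total) else 0
    return h


def train(data, W, L, epochs, update_fn, rng):
    """Keep one persistent hidden configuration per training datavector."""
    n_hid = len(L)
    # Random initialisation happens once, not before every weight update.
    hidden = [[rng.randint(0, 1) for _ in range(n_hid)] for _ in data]
    for _ in range(epochs):
        for n, v in enumerate(data):
            # Warm start: continue from the configuration this datavector
            # had before the weights were last updated.
            h = gibbs_sweep(v, hidden[n], W, L, rng)
            # Hypothetical learning rule supplied by the caller.
            update_fn(v, h, W)
    return hidden
```

Because hidden[n] is reused across epochs, each datavector's chain picks up where it left off; if update_fn changes W only slightly, the stored configuration is already a near-equilibrium sample for the new weights.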