lec2a

Multilayer contrastive divergence

• Start by learning one hidden layer.
• Then re-present the data as the activities of the hidden units.
  – The same learning algorithm can now be applied to the re-presented data.
• Can we prove that each step of this greedy learning improves a bound on the log probability of the data under the overall model?
  – What is the overall model?
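The greedy procedure can be sketched in a few lines. This is a minimal pure-Python sketch, and it makes several assumptions the slide does not state. Each one-hidden-layer model is taken to be a binary restricted Boltzmann machine (RBM) trained with one-step contrastive divergence (CD-1). After a layer is trained, its hidden-unit probabilities become the data for the next layer. All names (`RBM`, `train_rbm`, `greedy_stack`) are hypothetical, and the learning rate, epoch count and layer sizes are illustrative only.

```python
import math
import random

# Hypothetical sketch: binary RBMs trained with CD-1, stacked greedily.
# Not the lecture's own code; hyperparameters are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    # Draw binary states from Bernoulli probabilities.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

class RBM:
    """One hidden layer: binary visible units, binary hidden units."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr):
        # Positive phase: hidden probabilities driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = sample(ph0, self.rng)
        # Negative phase: one reconstruction step (probabilities, not samples).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 update: <v h>_data - <v h>_reconstruction.
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - pv1[i])
        for j in range(len(ph0)):
            self.b_h[j] += lr * (ph0[j] - ph1[j])

def train_rbm(data, n_hidden, epochs, lr, rng):
    rbm = RBM(len(data[0]), n_hidden, rng)
    for _ in range(epochs):
        for v in data:
            rbm.cd1_update(v, lr)
    return rbm

def greedy_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Train one layer, re-present the data as its hidden activities, repeat."""
    rng = random.Random(seed)
    layers, current = [], data
    for n_hidden in layer_sizes:
        # The same learning algorithm is applied to the re-presented data.
        rbm = train_rbm(current, n_hidden, epochs, lr, rng)
        layers.append(rbm)
        current = [rbm.hidden_probs(v) for v in current]
    return layers, current

if __name__ == "__main__":
    data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 2
    layers, top = greedy_stack(data, [3, 2], epochs=20)
    for row in top:
        print([round(p, 3) for p in row])
```

Passing hidden probabilities rather than sampled binary states up to the next layer is one common choice. Sampling is another. The sketch only illustrates the greedy training loop. It does not compute the bound on the log probability that the slide asks about.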