lec2b

Multilayer contrastive divergence

• Start by learning one hidden layer.

• Then re-present the data as the activities of the hidden units.
  – The same learning algorithm can now be applied to the re-presented data.

• Can we prove that each step of this greedy learning improves the log probability of the data under the overall model?
  – What is the overall model?
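
The greedy procedure in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the lecture's own code: it assumes binary units, full-batch one-step contrastive divergence (CD-1), and hidden activation probabilities as the "re-presented" data. The names `train_rbm_cd1` and `greedy_stack`, and the learning rate and epoch count, are hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=10, lr=0.1, rng=None):
    """Train one binary RBM with one-step contrastive divergence (CD-1).

    Assumption: full-batch updates and fixed hyperparameters, for brevity.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step from the sampled hiddens.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid

def greedy_stack(data, layer_sizes, **kwargs):
    """Learn one hidden layer at a time, feeding each layer's activities upward."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm_cd1(x, n_hidden, **kwargs)
        layers.append((W, b_vis, b_hid))
        # Re-present the data as hidden-unit activities
        # (assumption: probabilities rather than binary samples).
        x = sigmoid(x @ W + b_hid)
    return layers, x
```

For example, `greedy_stack(toy_data, [4, 3])` on a binary array with 6 columns would train a 6-to-4 layer first, then a 4-to-3 layer on the first layer's hidden activities. The sketch only shows the training loop; it does not address whether each added layer improves the log probability of the data.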