lecture 24

A solution to all of these problems

•

Use greedy unsupervised learning to find a sensible set of

weights one layer at a time. Then fine-tune with

backpropagation.

•

Greedily learning one layer at a time scales well to really

deep networks.

•

Most of the information in the final weights comes from

modeling the distribution of input vectors.

–

The precious information in the labels is only used for

the final fine-tuning.

•

We do not start backpropagation until we already have

sensible weights that already do well at the task.

–

So the fine-tuning is well-behaved and quite fast.