• Use greedy unsupervised learning to find a sensible set of weights one layer at a time. Then fine-tune with backpropagation.
• Greedily learning one layer at a time scales well to really deep networks.
• Most of the information in the final weights comes from modeling the distribution of input vectors.
  – The precious information in the labels is only used for the final fine-tuning.
• We do not start backpropagation until we have sensible weights that already do well at the task.
  – So the fine-tuning is well-behaved and quite fast.
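The recipe in the bullets above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the slide says "greedy unsupervised learning" without naming a model, so the sketch assumes each layer is a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1), and it assumes a single logistic output unit for fine-tuning. The function names (`train_rbm`, `pretrain_stack`, `fine_tune`), the layer sizes, the learning rates, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, rng, epochs=50, lr=0.1):
    """Train one binary RBM with CD-1 (an assumed choice of unsupervised learner)."""
    n, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

def pretrain_stack(data, layer_sizes, rng):
    """Greedy step: train each layer on the features of the layer below. No labels used."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, b = train_rbm(inputs, n_hidden, rng)
        layers.append([W, b])
        inputs = sigmoid(inputs @ W + b)  # becomes the "data" for the next layer
    return layers

def forward(layers, w_out, b_out, X):
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts, sigmoid(acts[-1] @ w_out + b_out)

def cross_entropy(p, y):
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fine_tune(layers, X, y, rng, epochs=200, lr=0.1):
    """Supervised step: full-batch backpropagation starting from the pretrained weights."""
    w_out = 0.01 * rng.standard_normal(layers[-1][0].shape[1])
    b_out = 0.0
    losses = []
    for _ in range(epochs):
        acts, p = forward(layers, w_out, b_out, X)
        losses.append(cross_entropy(p, y))
        delta = (p - y)[:, None] / len(y)      # gradient at the output logit
        back = delta * w_out[None, :]          # gradient at the top hidden layer
        w_out -= lr * (acts[-1].T @ delta[:, 0])
        b_out -= lr * delta.sum()
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            a = acts[i + 1]
            dz = back * a * (1 - a)            # through the sigmoid
            back = dz @ W.T                    # pass gradient down before updating W
            layers[i][0] = W - lr * (acts[i].T @ dz)
            layers[i][1] = b - lr * dz.sum(axis=0)
    return w_out, b_out, losses

# Synthetic binary data: two noisy prototype patterns, one per class (made up for the demo).
rng = np.random.default_rng(0)
prototypes = rng.random((2, 20)) < 0.5
y = rng.integers(0, 2, size=200)
flips = rng.random((200, 20)) < 0.1
X = np.logical_xor(prototypes[y], flips).astype(float)

layers = pretrain_stack(X, layer_sizes=[16, 8], rng=rng)       # labels never touched here
w_out, b_out, losses = fine_tune(layers, X, y.astype(float), rng)
print(f"loss before fine-tuning: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Note that `pretrain_stack` never sees `y`; the labels enter only in `fine_tune`, which mirrors the point that the label information is reserved for the final stage.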