the next generation of neural networks

Using backpropagation for fine-tuning

•

Greedily learning one layer at a time scales well to really

big networks, especially if we have locality in each layer.

•

We do not start backpropagation until we already have

sensible weights that already do well at the task.

–

So the initial gradients are sensible and

backpropagation only needs to perform a local search.

•

Most of the information in the final weights comes from

modeling the distribution of input vectors.

–

The precious information in the labels is only used for

the final fine-tuning. It slightly modifies the features. It

does not need to discover features.