Lecture 24

Some problems with backpropagation

• The amount of information that each training case provides about the weights is at most the log of the number of possible output labels (log2 of the number of labels, measured in bits).

– So to train a big net we need lots of labeled data.

• In nets with many layers of weights, the backpropagated derivatives either grow or shrink multiplicatively at each layer, because each layer multiplies them by its weights and by the slopes of its units.

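– Learning is tricky either way: shrinking derivatives leave the early layers learning very slowly, and growing derivatives make the updates unstable.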

• Dumb gradient descent is not a good way to perform a global search for a good region of a very large, very non-linear space.

– So deep nets trained by backpropagation are rare in practice.
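The first point is a counting argument. A minimal Python sketch, with hypothetical function names and made-up dataset and network sizes chosen only for illustration, makes the arithmetic concrete:

```python
import math

def max_bits_per_case(num_labels: int) -> float:
    """Upper bound on the information one labeled case gives about the weights.

    The label is one of num_labels possible values, so it can convey
    at most log2(num_labels) bits.
    """
    return math.log2(num_labels)

def max_bits_from_dataset(num_cases: int, num_labels: int) -> float:
    """Upper bound on the total label information in num_cases labeled cases."""
    return num_cases * max_bits_per_case(num_labels)

# Hypothetical sizes, assumed only for illustration.
num_cases, num_labels, num_weights = 60_000, 10, 1_000_000
total_bits = max_bits_from_dataset(num_cases, num_labels)
print(f"at most {total_bits:,.0f} bits, or {total_bits / num_weights:.2f} bits per weight")
```

With these illustrative numbers the labels supply at most about 0.2 bits per weight, which is why a big net needs lots of labeled data.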
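The second point follows from the chain rule. The sketch below is a deliberately simplified model with hypothetical names: it treats the net as a chain of single scalar units y_i = f(w_i * y_{i-1}) and fixes each unit's slope f'(z_i) to a constant, whereas in a real net the slopes depend on the inputs. It uses the fact that a logistic unit's slope is never larger than 0.25.

```python
def backpropagated_derivative(weights, slopes):
    """Backpropagate a derivative of 1.0 from the top of a chain of scalar
    units y_i = f(w_i * y_{i-1}) down to the input.

    By the chain rule each layer multiplies the derivative by w_i * f'(z_i);
    slopes[i] stands in for f'(z_i), assumed constant for illustration.
    """
    grad = 1.0
    for w, s in zip(reversed(weights), reversed(slopes)):
        grad *= w * s
    return grad

depth = 20
# Logistic slopes are at most 0.25, so with weights of 1.0 the derivative
# shrinks by a factor of 4 per layer.
shrink = backpropagated_derivative([1.0] * depth, [0.25] * depth)
# With weights of 8.0 each factor is 2.0, so the derivative blows up instead.
grow = backpropagated_derivative([8.0] * depth, [0.25] * depth)
print(f"shrink: {shrink:.3e}, grow: {grow:.3e}")  # shrink: 9.095e-13, grow: 1.049e+06
```

Real nets multiply by weight matrices rather than single numbers, but the per-layer factors still compound multiplicatively, so a modest change in their typical size turns into a huge change after many layers.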