• The amount of information that each training case provides about the weights is at most the log of the number of possible output labels.
  – So to train a big net we need lots of labeled data.
• In nets with many layers of weights the backpropagated derivatives either grow or shrink multiplicatively at each layer.
  – Learning is tricky either way.
• Dumb gradient descent is not a good way to perform a global search for a good region of a very large, very non-linear space.
  – So deep nets trained by backpropagation are rare in practice.
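
The first point can be made concrete with a little arithmetic. The sketch below computes the upper bound on label information in bits, taking the log to base 2. The function name `max_label_bits` and the figures used (10 labels, 60,000 training cases) are hypothetical and chosen only for illustration.

```python
import math


def max_label_bits(n_cases, n_labels):
    """Upper bound, in bits, on the information the labels can provide.

    Each training case contributes at most log2(n_labels) bits, so the
    bound for the whole training set is n_cases * log2(n_labels).
    """
    return n_cases * math.log2(n_labels)


# Hypothetical numbers, not taken from the text above.
bits_per_case = max_label_bits(1, 10)
total_bits = max_label_bits(60_000, 10)

print(f"at most {bits_per_case:.2f} bits per training case")
print(f"at most {total_bits:,.0f} bits from the whole training set")
```

Because the bound grows only linearly with the number of cases, a net with many weights needs a correspondingly large labeled set before the labels can constrain those weights.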
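
The multiplicative growth or shrinkage of backpropagated derivatives can be seen in a deliberately simplified sketch. It assumes a chain of one-unit linear layers that all share the same weight, so the chain rule multiplies the derivative by that weight at every layer. The function `backprop_magnitudes`, the weights 0.5 and 1.5, and the depth of 20 are illustrative assumptions. In a real net the per-layer factor also includes the activation derivative and differs from unit to unit.

```python
def backprop_magnitudes(weight, n_layers):
    """Magnitude of the backpropagated derivative at each layer.

    Assumption (illustration only): h[l + 1] = weight * h[l], so passing
    the derivative back through one layer multiplies it by `weight`.
    """
    grad = 1.0  # derivative of the error at the output layer
    magnitudes = [abs(grad)]
    for _ in range(n_layers):
        grad *= weight  # chain rule through one linear layer
        magnitudes.append(abs(grad))
    return magnitudes


shrinking = backprop_magnitudes(weight=0.5, n_layers=20)
growing = backprop_magnitudes(weight=1.5, n_layers=20)

print(f"weight 0.5: derivative magnitude after 20 layers = {shrinking[-1]:.3e}")
print(f"weight 1.5: derivative magnitude after 20 layers = {growing[-1]:.3e}")
```

With a per-layer factor below 1 the derivative reaching the early layers is tiny, and with a factor above 1 it is huge, which is why learning is tricky either way.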