Some problems with backpropagation
The amount of information that each training case
provides about the weights is at most the log of the
number of possible output labels.
So to train a big net we need lots of labeled data.
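As a rough illustration of the bound above, the sketch below computes the maximum number of bits one labeled case can provide, and a crude lower bound on how many cases would be needed. The function names and the `bits_per_weight` parameter are hypothetical, introduced only for this example; how many bits it takes to pin down each weight is an assumption, not something the slide states.

```python
import math


def max_bits_per_case(num_labels: int) -> float:
    """Upper bound, in bits, on what one labeled case can tell us: log2(#labels)."""
    return math.log2(num_labels)


def min_cases_needed(num_weights: int, bits_per_weight: float, num_labels: int) -> int:
    """Crude lower bound on the number of labeled cases.

    Assumes (hypothetically) that the weights jointly need
    num_weights * bits_per_weight bits to be specified, and that every
    case delivers the full log2(num_labels) bits, which is optimistic.
    """
    total_bits = num_weights * bits_per_weight
    return math.ceil(total_bits / max_bits_per_case(num_labels))


if __name__ == "__main__":
    print(max_bits_per_case(10))                  # bits per case with 10 labels
    print(min_cases_needed(1_000_000, 1.0, 10))   # cases for 1M weights at 1 bit each
```

Even under these generous assumptions, a net with a million weights and ten output labels would need hundreds of thousands of labeled cases.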
In nets with many layers of weights the backpropagated
derivatives either grow or shrink multiplicatively at each
layer.
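Learning is tricky either way: derivatives that grow make the updates
unstable, and derivatives that shrink leave the early layers barely
learning.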
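The multiplicative effect can be seen in a small NumPy sketch. It is a simplified setting chosen for this example, not the slide's own experiment: every layer is purely linear, and each weight matrix is a random orthogonal matrix times a constant `scale`, so the backpropagated derivative changes in norm by exactly that factor per layer. A real net's nonlinearities would contribute a further factor at each layer.

```python
import numpy as np


def backprop_norms(scale: float, num_layers: int = 20, width: int = 50, seed: int = 0):
    """Norm of the backpropagated derivative after each of num_layers linear layers.

    Each layer's weight matrix is scale * Q, with Q a random orthogonal
    matrix (an assumption made so the per-layer factor is exactly `scale`).
    """
    rng = np.random.default_rng(seed)
    grad = rng.standard_normal(width)
    grad /= np.linalg.norm(grad)  # start with a unit-norm derivative at the output
    norms = [1.0]
    for _layer in range(num_layers):
        q, _r = np.linalg.qr(rng.standard_normal((width, width)))
        weights = scale * q
        grad = weights.T @ grad   # backward pass through one linear layer
        norms.append(float(np.linalg.norm(grad)))
    return norms


if __name__ == "__main__":
    for scale in (0.5, 1.0, 1.5):
        print(scale, backprop_norms(scale)[-1])
```

With 20 layers, a per-layer factor of 0.5 shrinks the derivative by about a factor of a million, while 1.5 grows it by a factor of a few thousand.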
Dumb gradient descent is not a good way to perform a
global search for a good region of a very large, very
non-linear space.
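A one-dimensional toy makes the point concrete. The function below is a hypothetical example chosen for this sketch (it does not come from the slide): it has two minima, and plain gradient descent simply rolls into whichever one is downhill from its starting point, with no mechanism for finding the better one.

```python
def f(x: float) -> float:
    """Toy non-convex objective with two local minima (chosen for illustration)."""
    return x**4 - 3 * x**2 + x


def grad_f(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # ends in the deeper minimum
x_right = gradient_descent(2.0)   # ends in the shallower minimum

if __name__ == "__main__":
    print(x_left, f(x_left))
    print(x_right, f(x_right))
```

The two runs differ only in where they start, yet they finish at different minima with different objective values. In a very large, very non-linear weight space the same dependence on the starting point applies.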
So deep nets trained by backpropagation are rare in
practice.