What is wrong with back-propagation?
It requires labeled training data.
Almost all data is unlabeled.
We need to fit about 10^14 connection weights in only about 10^9 seconds.
Unless the weights are highly redundant, labels cannot possibly provide enough information.
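The two numbers above can be turned into a rough information budget. A minimal sketch, using the slide's 10^14 weights and 10^9 seconds, plus two hypothetical assumptions that are not from the slide: at most one label per second, and each label naming one of 1000 classes, so it carries at most log2(1000), about 10 bits:

```python
import math

# Figures from the slide.
NUM_WEIGHTS = 10**14   # connection weights to fit
SECONDS = 10**9        # time available for learning

# Hypothetical assumptions, NOT from the slide.
LABELS_PER_SECOND = 1  # at most one labelled example per second
NUM_CLASSES = 1000     # each label picks one of 1000 classes

weights_per_second = NUM_WEIGHTS // SECONDS       # weights to pin down per second
bits_per_label = math.log2(NUM_CLASSES)           # upper bound on information per label
total_label_bits = LABELS_PER_SECOND * SECONDS * bits_per_label
bits_per_weight = total_label_bits / NUM_WEIGHTS  # label information available per weight

print(f"weights per second:      {weights_per_second:,}")
print(f"bits supplied by labels: {total_label_bits:.1e}")
print(f"bits per weight:         {bits_per_weight:.1e}")
```

Under these assumptions the labels supply roughly 10^10 bits in total, on the order of 10^-4 bits per weight, far too little to determine each weight independently.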
The learning time does not scale well.
It is very slow in networks with more than two or three hidden layers.
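The slide does not say why deep networks learn slowly. One commonly cited factor in networks of logistic units is that the backward signal dE/dy shrinks at every layer, because the logistic derivative y(1 - y) is at most 0.25. A minimal sketch, assuming a hypothetical fully connected net of logistic units with Gaussian random weights of variance 1/width, a random input, and dE/dy = 1 at every top-layer unit; `gradient_magnitudes` is a hypothetical helper name:

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_magnitudes(num_layers, width=50, seed=0):
    """Mean |dE/dy| at every layer of a random net of logistic units.

    Hypothetical setup: weights ~ N(0, 1/width), a random input vector,
    and dE/dy = 1 for every unit in the top layer. Index k of the result
    holds the value for layer k (0 = input, num_layers = top).
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(width)
    # weights[l][j][i]: connection from unit i in layer l to unit j in layer l + 1
    weights = [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(num_layers)
    ]

    # Forward pass: each layer sends its activities y upward.
    ys = [[rng.uniform(-1.0, 1.0) for _ in range(width)]]
    for W in weights:
        prev = ys[-1]
        ys.append([logistic(sum(W[j][i] * prev[i] for i in range(width)))
                   for j in range(width)])

    # Backward pass: each layer sends dE/dy downward.
    dE_dy = [1.0] * width
    grads = [dE_dy]
    for layer in range(num_layers, 0, -1):
        W, y = weights[layer - 1], ys[layer]
        # Multiply by the logistic derivative dy/dz = y(1 - y) <= 0.25.
        dE_dz = [dE_dy[j] * y[j] * (1.0 - y[j]) for j in range(width)]
        # Send the signal through the weights to the layer below.
        dE_dy = [sum(W[j][i] * dE_dz[j] for j in range(width)) for i in range(width)]
        grads.append(dE_dy)
    grads.reverse()
    return [sum(abs(g) for g in gk) / width for gk in grads]


mags = gradient_magnitudes(num_layers=10)
for k, m in enumerate(mags):
    print(f"layer {k:2d}: mean |dE/dy| = {m:.2e}")
```

With ten layers the signal reaching the bottom is orders of magnitude smaller than at the top, so the lowest weights receive very small updates and learning there is slow.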
The neurons need to send two different types of signal:
Forward pass: signal = activity = y
Backward pass: signal = dE/dy
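The two signals can be seen in a minimal sketch, assuming a hypothetical chain of two logistic neurons with scalar weights `w1` and `w2` (no biases) and the squared error E = 0.5 * (y2 - t)^2 as an illustrative loss. `forward` passes activities y upward; `backward` passes dE/dy downward:

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: each neuron sends its activity y to the next one."""
    y1 = logistic(w1 * x)
    y2 = logistic(w2 * y1)
    return y1, y2


def backward(x, y1, y2, w2, target):
    """Backward pass: each neuron sends dE/dy to the one below it.

    Assumes E = 0.5 * (y2 - target)**2 (an illustrative choice of loss).
    """
    dE_dy2 = y2 - target                # error derivative at the output neuron
    dE_dz2 = dE_dy2 * y2 * (1.0 - y2)   # through the logistic: dy/dz = y(1 - y)
    dE_dy1 = dE_dz2 * w2                # signal sent back to the hidden neuron
    dE_dz1 = dE_dy1 * y1 * (1.0 - y1)
    return {
        "dE_dy1": dE_dy1,
        "dE_dy2": dE_dy2,
        "dE_dw1": dE_dz1 * x,           # gradients used to update the weights
        "dE_dw2": dE_dz2 * y1,
    }


x, w1, w2, target = 0.7, -1.3, 2.1, 0.2
y1, y2 = forward(x, w1, w2)
grads = backward(x, y1, y2, w2, target)
print("forward signals (y):      ", y1, y2)
print("backward signals (dE/dy): ", grads["dE_dy1"], grads["dE_dy2"])
```

The value `dE_dy1` that the output neuron sends back to the hidden neuron is a derivative of the error, not an activity, so the same connection has to carry a second, different kind of signal.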