Learning multilayer networks
We want to learn models with multiple layers of non-
linear features.
Perceptrons: Use a layer of hand-coded, non-adaptive
features followed by a layer of adaptive decision units.
Needs supervision signal for each training case.
Only one layer of adaptive weights.
Back-propagation: Use multiple layers of  adaptive
features and train by backpropagating error derivatives
Needs supervision signal for each training case.
Learning time scales poorly for deep networks.
Support Vector Machines: Use a very large set of fixed
features
Needs supervision signal for each training case.
Does not learn multiple layers of features