Preprocessing the input vectors
Instead of trying to predict the answer directly from the
raw inputs we could start by extracting  a layer of
“features”.
Sensible if we already know that certain combinations
of input values would be useful (e.g. edges or corners
in an image).
Instead of learning the features we could design them by
hand.
The hand-coded features are equivalent to a layer of
non-linear neurons that do not need to be learned.
If we use a very big set of features for a two-class
problem, the classes will almost certainly be linearly
separable.
But surely the linear separator will give poor generalization.