 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
Instead of
trying to predict the answer directly from the
|
|
|
raw inputs we
could start by extracting a layer of
|
|
|
“features”.
|
|
|
– |
Sensible
if we already know that certain combinations
|
|
|
of
input values would be useful (e.g. edges or corners
|
|
|
in
an image).
|
|
• |
Instead of
learning the features we could design them by
|
|
hand.
|
|
|
– |
The
hand-coded features are equivalent to a layer of
|
|
|
non-linear
neurons that do not need to be learned.
|
|
|
– |
If
we use a very big set of features for a two-class
|
|
|
problem,
the classes will almost certainly be linearly
|
|
|
separable.
|
|
|
• |
But
surely the linear separator will give poor generalization.
|
|