 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
Networks without
hidden units are very limited in the
|
|
|
input-output
mappings they can model.
|
|
|
|
– |
More
layers of linear units do not help. Its still linear.
|
|
|
|
– |
Fixed
output non-linearities are not enough
|
|
|
| • |
We need multiple
layers of adaptive non-linear hidden
|
|
|
units. This gives us a universal approximator.
But how
|
|
|
can we train such
nets?
|
|
|
|
– |
We
need an efficient way of adapting all the weights,
|
|
|
not
just the last layer. This is hard. Learning the
|
|
|
weights
going into hidden units is equivalent to
|
|
|
learning
features.
|
|
|
|
– |
Nobody
is telling us directly what hidden units should
|
|
do.
|
|