We don't know what the hidden units ought to do, but we can compute how fast the error changes as we change a hidden activity.
Instead of using desired activities to train the hidden units, use error derivatives w.r.t. hidden activities.
Each hidden activity can affect many output units and can therefore have many separate effects on the error. These effects must be combined.
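One common way to write this combination is the chain rule below. The notation is an assumption made for illustration: y_i is the activity of hidden unit i, z_j is the total input to a unit j that it feeds, and w_{ij} is the weight on that connection, with z_j taken to be a weighted sum of the y_i.

```latex
\frac{\partial E}{\partial y_i}
  = \sum_j \frac{\partial z_j}{\partial y_i}\,\frac{\partial E}{\partial z_j}
  = \sum_j w_{ij}\,\frac{\partial E}{\partial z_j}
```

Each term in the sum is one of the separate effects of the hidden activity on the error, routed through a single downstream unit.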
We can compute error derivatives for all the hidden units efficiently.
Once we have the error derivatives for the hidden activities, it's easy to get the error derivatives for the weights going into a hidden unit.
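The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: one hidden layer of logistic units, linear output units, a squared-error measure, and made-up weights and data. All names (`forward`, `backward`, `numeric_dE_dW1`, `W1`, `W2`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    # W1[i][k]: weight from input k into hidden unit i (assumed layout)
    # W2[j][i]: weight from hidden unit i into output unit j (assumed layout)
    h = [sigmoid(sum(w * xk for w, xk in zip(row, x))) for row in W1]
    y = [sum(w * hi for w, hi in zip(row, h)) for row in W2]  # linear outputs
    return h, y

def error(x, t, W1, W2):
    # Squared error, assumed here for illustration.
    _, y = forward(x, W1, W2)
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, t))

def backward(x, t, W1, W2):
    h, y = forward(x, W1, W2)
    # Error derivative w.r.t. each output activity.
    dE_dy = [yj - tj for yj, tj in zip(y, t)]
    # Each hidden activity affects every output unit: sum those effects.
    dE_dh = [sum(W2[j][i] * dE_dy[j] for j in range(len(W2)))
             for i in range(len(h))]
    # Go through the logistic nonlinearity: dh/dz = h * (1 - h).
    dE_dz = [dE_dh[i] * h[i] * (1.0 - h[i]) for i in range(len(h))]
    # Derivative for each weight going into a hidden unit.
    dE_dW1 = [[dE_dz[i] * xk for xk in x] for i in range(len(h))]
    return dE_dh, dE_dW1

def numeric_dE_dW1(x, t, W1, W2, i, k, eps=1e-6):
    # Central finite difference, used only to check the analytic result.
    Wp = [row[:] for row in W1]
    Wm = [row[:] for row in W1]
    Wp[i][k] += eps
    Wm[i][k] -= eps
    return (error(x, t, Wp, W2) - error(x, t, Wm, W2)) / (2 * eps)

# Made-up example values.
x = [0.5, -1.0, 2.0]
t = [1.0, 0.0]
W1 = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
W2 = [[0.7, -0.5], [0.2, 0.9]]

dE_dh, dE_dW1 = backward(x, t, W1, W2)
```

Comparing `dE_dW1[i][k]` against `numeric_dE_dW1(x, t, W1, W2, i, k)` is a simple way to confirm that the backward pass agrees with a direct measurement of how fast the error changes. The finite-difference version needs two forward passes per weight, whereas the backward pass gets every derivative from a single sweep.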