The idea behind backpropagation
• We don’t know what the hidden units ought to do, but we
can compute how fast the error changes as we change a
hidden activity.
–  Instead of using desired activities to train the hidden
units, use error derivatives w.r.t. hidden activities.
– Each hidden activity can affect many output units and
can therefore have many separate effects on the error.
These effects must be combined.
– We can compute error derivatives for all the hidden units
efficiently.
– Once we have the error derivatives for the hidden
activities, its easy to get the error derivatives for the
weights going into a hidden unit.