Adaptive learning rates on each connection
Use a global learning rate multiplied by a local gain on each connection.
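As an update rule, the effective step size of each weight is the global rate scaled by that connection's gain. The slide names no symbols, so the ones below are chosen here for illustration: ε is the global learning rate, g_ij is the local gain on the connection from unit i to unit j, and E is the error being minimised.

```latex
\Delta w_{ij} = -\varepsilon \, g_{ij} \, \frac{\partial E}{\partial w_{ij}}
```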
Increase the local gain on a connection if the gradient for that weight does not change sign between successive updates.
Use additive increases and multiplicative decreases.
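One way to write this gain rule, as a sketch: the gain g_ij on the connection from unit i to unit j grows by a fixed small amount δ when successive gradients of the error E agree in sign, and is multiplied by (1 − δ) when they disagree. The symbols and the single shared constant δ (for example 0.05) are assumptions made for illustration; the slide does not specify them.

```latex
g_{ij}(t) =
\begin{cases}
g_{ij}(t-1) + \delta, & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t)\,\dfrac{\partial E}{\partial w_{ij}}(t-1) > 0,\\[6pt]
g_{ij}(t-1)\,(1-\delta), & \text{otherwise.}
\end{cases}
```

Because the decrease is proportional to the gain while the increase is a fixed step, a large gain loses more per sign flip than a small one. If flips and non-flips were equally likely, the expected change per step would be δ/2 − g·δ/2, which is zero at g = 1, so under a random gradient sign the gain would hover around 1.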
This ensures that large gains, and hence large effective learning rates, decay rapidly once oscillations start.
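A minimal NumPy sketch of the scheme follows. It assumes an additive increase of 0.05, a multiplicative decay of 0.95, gains initialised to 1, and a global learning rate of 0.01. None of these values come from the slide. It also feeds in synthetic gradient sequences rather than gradients of a real loss, so that the gain dynamics are easy to see. The names `update_gains`, `apply_update` and `simulate` are hypothetical.

```python
import numpy as np

def update_gains(gains, grad, prev_grad, increase=0.05, decrease=0.95):
    """Per-connection gain update.

    Additive increase where the gradient kept its sign since the previous
    step, multiplicative decrease where it flipped. A zero product (either
    gradient exactly zero) is treated as a flip here, which is an assumption.
    """
    same_sign = grad * prev_grad > 0
    return np.where(same_sign, gains + increase, gains * decrease)

def apply_update(weights, gains, grad, global_lr=0.01):
    """Weight change = -(global learning rate) * (local gain) * gradient."""
    return weights - global_lr * gains * grad

def simulate(steady_steps=40, oscillating_steps=20):
    """Two connections with synthetic gradients.

    Connection A always sees a positive gradient. Connection B does too for
    `steady_steps` updates, then its gradient alternates in sign.
    Returns the final gains of [A, B].
    """
    weights = np.zeros(2)
    gains = np.ones(2)                  # every gain starts at 1
    prev_grad = np.array([1.0, 1.0])
    for t in range(steady_steps + oscillating_steps):
        if t < steady_steps:
            grad = np.array([1.0, 1.0])
        else:
            # B's gradient flips sign on every step once oscillation starts
            grad = np.array([1.0, -prev_grad[1]])
        gains = update_gains(gains, grad, prev_grad)
        weights = apply_update(weights, gains, grad)
        prev_grad = grad
    return gains

print(simulate())
```

With these settings both gains climb from 1 to about 3 over the 40 steady steps. A keeps climbing to 4, while B's gain, once its gradient starts oscillating, shrinks by 5% per step to roughly 3 × 0.95²⁰ ≈ 1.08. Undoing that growth with additive decreases of the same size would take 40 steps rather than 20.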