How the learning goes wrong
If the learning rate is big,
it sloshes to and fro
across the ravine. If the
rate is too big, this
oscillation diverges.
How can we move quickly
in directions with small
gradients without getting
divergent oscillations in
directions with big
gradients?
E
w