• If the learning rate is big, it sloshes to and fro across the ravine. If the rate is too big, this oscillation diverges.
• How can we move quickly in directions with small gradients without getting divergent oscillations in directions with big gradients?
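The sloshing and divergence described above can be reproduced on a toy problem. The sketch below is only an illustration, not taken from the slides: it assumes a quadratic error surface f(w) = 0.5 * (c1 * w1^2 + c2 * w2^2) with hypothetical curvatures c1 = 1 (along the ravine) and c2 = 100 (across it), and the function name `gradient_descent` and the three learning rates are chosen for this example. On this surface each step multiplies the coordinate in direction i by (1 - lr * c_i), so the steep direction oscillates once lr exceeds 1/c2 and diverges once lr exceeds 2/c2 = 0.02.

```python
def gradient_descent(lr, steps=50, curvatures=(1.0, 100.0), start=(1.0, 1.0)):
    """Run plain gradient descent on f(w) = 0.5 * sum(c_i * w_i**2).

    Illustrative assumption: a 2-D quadratic "ravine" with a shallow
    direction (curvature 1) and a steep direction (curvature 100).
    Returns every weight vector visited, including the start.
    """
    w = list(start)
    path = [tuple(w)]
    for _ in range(steps):
        # The gradient of this quadratic is c_i * w_i in each direction.
        grad = [c * wi for c, wi in zip(curvatures, w)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
        path.append(tuple(w))
    return path


if __name__ == "__main__":
    # Small rate, rate just below the stability limit, rate just above it.
    for lr in (0.005, 0.019, 0.021):
        along, across = gradient_descent(lr)[-1]
        print(f"lr={lr}: along ravine={along:.3g}, across ravine={across:.3g}")
```

With the small rate, neither direction oscillates but progress along the ravine is slow. Just below the limit, the across-ravine coordinate flips sign on every step while shrinking. Just above it, the same coordinate grows without bound, even though the along-ravine coordinate is still far from the minimum. That is the tension the second bullet asks about.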