Convergence speed
• The direction of steepest descent does not point at the minimum unless the ellipse is a circle.
  – The gradient is big in the direction in which we only want to travel a small distance.
  – The gradient is small in the direction in which we want to travel a large distance.
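A small numerical sketch may make the point above concrete. It assumes a hypothetical quadratic error surface E(w) = 0.5 * (c1*w1^2 + c2*w2^2) with its minimum at the origin. The curvatures, the starting point, and the function name `alignment` are illustrative choices, not taken from the slide. The function returns the cosine of the angle between the steepest-descent direction and the direction that points straight at the minimum.

```python
import math

def alignment(curvatures, w):
    """Cosine of the angle between the steepest-descent direction and the
    direction pointing straight at the minimum (assumed to be the origin)."""
    # For E(w) = 0.5 * sum(c_i * w_i^2), the gradient is dE/dw_i = c_i * w_i.
    step = [-c * wi for c, wi in zip(curvatures, w)]
    to_minimum = [-wi for wi in w]
    dot = sum(s * t for s, t in zip(step, to_minimum))
    return dot / (math.hypot(*step) * math.hypot(*to_minimum))

# Start near the minimum along w1 (steep axis) and far from it along w2 (shallow axis).
w = [1.0, 10.0]
elliptical = alignment([100.0, 1.0], w)  # elongated elliptical contours
circular = alignment([1.0, 1.0], w)      # circular contours

print(f"elliptical bowl: {elliptical:.3f}")
print(f"circular bowl:   {circular:.3f}")
```

With these assumed numbers the cosine is about 0.2 for the elliptical bowl and 1 for the circular one. In the elliptical case the gradient is dominated by the steep w1 axis, where only a small move is needed, so the step mostly heads across the valley rather than along it.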
This equation is sick. The RHS needs to be multiplied by a term of dimension w^2 to make the dimensions balance.
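The equation itself did not survive in this text. As a hedged reconstruction of the argument, assume it is the plain gradient descent update with a learning rate \varepsilon, and treat the error E as dimensionless. Both are assumptions, not confirmed by the slide. The dimensional check would then read:

```latex
% Assumed update rule (not shown in the extracted text):
\Delta w = -\varepsilon \, \frac{\partial E}{\partial w}

% Dimensions, with E treated as dimensionless:
[\Delta w] = [w],
\qquad
\left[ \frac{\partial E}{\partial w} \right] = \frac{1}{[w]}

% For the two sides to balance, the multiplier must satisfy
[\varepsilon] \cdot \frac{1}{[w]} = [w]
\quad \Longrightarrow \quad
[\varepsilon] = [w]^2
```

One quantity with this dimension is the inverse curvature (\partial^2 E / \partial w^2)^{-1}. That is offered here only as an illustration of a term that would balance the update, not as the slide's own proposal.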