Another, more general way to fix up the
error surface
We can leave the error surface alone, but apply
the correction to the gradient vector.
We multiply the vector of gradients by the
inverse of the curvature matrix (the matrix of
second derivatives of the error). Stepping against
the resulting vector takes us straight to the
minimum of a quadratic surface.
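
As a minimal sketch of this idea, assume a hypothetical two-weight
quadratic error E(w) = 0.5 * w^T A w - b^T w, whose curvature matrix
is A and whose gradient is A w - b. The names A, b, w, grad and solve2
are illustrative only and are not taken from the lecture. One step of
w - A^{-1} g should land on the minimum, where the gradient is zero.

```python
def grad(A, b, w):
    # Gradient of E(w) = 0.5 * w^T A w - b^T w, which is A w - b.
    return [sum(A[i][j] * w[j] for j in range(2)) - b[i] for i in range(2)]


def solve2(A, g):
    # Solve A d = g for a 2x2 matrix by Cramer's rule, i.e. d = A^{-1} g.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [
        (A[1][1] * g[0] - A[0][1] * g[1]) / det,
        (A[0][0] * g[1] - A[1][0] * g[0]) / det,
    ]


# Assumed example values: a symmetric, positive definite curvature matrix.
A = [[10.0, 2.0], [2.0, 1.0]]
b = [1.0, 1.0]
w = [3.0, -4.0]          # arbitrary starting weights

g = grad(A, b, w)        # ordinary gradient at the start
d = solve2(A, g)         # gradient multiplied by the inverse curvature matrix
w_newton = [w[i] - d[i] for i in range(2)]

# On a quadratic surface the gradient at the new point is (numerically) zero.
g_after = grad(A, b, w_newton)
print(w_newton, g_after)
```

The ordinary gradient g generally does not point at the minimum when
the surface is an elongated ellipse; the corrected direction d does,
but only because the surface here is exactly quadratic.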
The curvature matrix has too many terms to be
of use in a big network: one for every pair of
weights. Maybe we can get some benefit from just
using the terms along the leading diagonal
(Le Cun).
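
The diagonal idea can be sketched on the same kind of hypothetical
quadratic error, E(w) = 0.5 * w^T A w - b^T w. Each gradient component
is divided by the matching diagonal curvature term A[i][i], ignoring
the off-diagonal terms. This is an illustration under the assumption
that the diagonal terms are positive, not a reproduction of Le Cun's
method; all names below are made up for the example.

```python
def grad(A, b, w):
    # Gradient of E(w) = 0.5 * w^T A w - b^T w, which is A w - b.
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]


def error(A, b, w):
    # The quadratic error E(w) itself.
    n = len(w)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(w[i] * Aw[i] for i in range(n)) - sum(
        b[i] * w[i] for i in range(n)
    )


def diagonal_step(A, b, w):
    # Scale each gradient component by its own diagonal curvature term.
    # Assumes every A[i][i] is positive.
    g = grad(A, b, w)
    return [w[i] - g[i] / A[i][i] for i in range(len(w))]


b = [1.0, 1.0]
w0 = [3.0, -4.0]

# Case 1: no off-diagonal curvature, so the diagonal is the whole matrix
# and one step reaches the minimum.
A_diag = [[10.0, 0.0], [0.0, 1.0]]
w_diag = diagonal_step(A_diag, b, w0)
g_diag = grad(A_diag, b, w_diag)

# Case 2: off-diagonal curvature is present, so the step is only
# approximate. Here it lowers the error but leaves a nonzero gradient.
A_full = [[10.0, 2.0], [2.0, 1.0]]
w_full = diagonal_step(A_full, b, w0)
g_full = grad(A_full, b, w_full)

print(g_diag, g_full, error(A_full, b, w0), error(A_full, b, w_full))
```

The diagonal needs only one curvature term per weight, which is why it
stays affordable in a big network, at the cost of ignoring how the
weights interact.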