IRLS (Iteratively Reweighted Least Squares)
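For a linear model y(x, w) = w^T φ(x) with a sum-of-squares error, the
optimal weights are given by the normal equations:

$$\mathbf{w}^{*} = (\boldsymbol{\Phi}^{T}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{T}\mathbf{t}$$

where Φ is the N × M design matrix whose n-th row is φ(x_n)^T and t is the
vector of N targets.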
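As a minimal sketch (assuming NumPy, synthetic random data, and a hypothetical helper name `least_squares_weights`), the normal-equation solution can be computed with a linear solve and checked against NumPy's own least-squares routine:

```python
import numpy as np

def least_squares_weights(Phi, t):
    """Solve the normal equations (Phi^T Phi) w = Phi^T t for w (hypothetical helper)."""
    # np.linalg.solve avoids forming the explicit inverse, which is
    # numerically preferable to inv(Phi.T @ Phi) @ Phi.T @ t.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Synthetic data: a bias column plus two random features.
rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_w = np.array([1.0, -2.0, 0.5])
t = Phi @ true_w + 0.1 * rng.normal(size=50)

w_star = least_squares_weights(Phi, t)
w_ref, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # reference least-squares solution
print(np.allclose(w_star, w_ref))  # True
```

Solving the M × M system rather than inverting Φ^T Φ gives the same weights in exact arithmetic but is cheaper and more stable in floating point.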
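The same result can be derived as a single Newton-Raphson update from an
arbitrary initial weight vector w^(old): the gradient vector is
pre-multiplied by the inverse of the Hessian H (the curvature of the error
surface), which sets both the direction and the magnitude of the weight
update:

$$\mathbf{w}^{(\mathrm{new})} = \mathbf{w}^{(\mathrm{old})} - \mathbf{H}^{-1}\nabla E(\mathbf{w}^{(\mathrm{old})})$$

For the sum-of-squares error (with design matrix Φ and targets t),
∇E(w) = Φ^T Φ w − Φ^T t and H = Φ^T Φ, so the update gives
w^(new) = (Φ^T Φ)^{-1} Φ^T t, the least-squares solution, for any w^(old).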
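The claim that one Newton step lands on the optimum can be checked numerically. This is a sketch under stated assumptions: NumPy, random synthetic data, and a hypothetical function `newton_step` written for E(w) = ½‖Φw − t‖²:

```python
import numpy as np

def newton_step(w_old, Phi, t):
    """One Newton-Raphson update for E(w) = 0.5 * ||Phi w - t||^2 (hypothetical helper)."""
    grad = Phi.T @ (Phi @ w_old - t)  # gradient: Phi^T Phi w - Phi^T t
    H = Phi.T @ Phi                   # Hessian; constant for this linear model
    # Pre-multiply the gradient by H^{-1}, done as a linear solve rather than an explicit inverse.
    return w_old - np.linalg.solve(H, grad)

rng = np.random.default_rng(1)
Phi = rng.normal(size=(40, 3))
t = rng.normal(size=40)

w0 = 10 * rng.normal(size=3)  # arbitrary, deliberately far-off starting point
w1 = newton_step(w0, Phi, t)

w_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)  # normal-equation solution
print(np.allclose(w1, w_ls))                   # True: one step reaches the optimum
print(np.allclose(Phi.T @ (Phi @ w1 - t), 0))  # True: the gradient vanishes there
```

Because the error is exactly quadratic in w, the Hessian does not depend on w and a single step suffices; for non-quadratic errors such as the logistic-regression cross-entropy, H changes with w and the update has to be iterated, which is what IRLS does.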