• For a linear model with a squared error, the optimal weights are given by
• This can be derived as a single update on the initial weight vector, in which the gradient vector is pre-multiplied by the inverse of the curvature of the error surface to decide the direction and magnitude of the weight update:
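The claim in the two points above can be checked numerically. The sketch below is only an illustration under assumed notation that does not come from the original: a design matrix X, targets y, the error E(w) = 0.5 * ||Xw - y||^2, its gradient g = X^T (X w0 - y), and its curvature (Hessian) H = X^T X. With these assumptions, the single update w1 = w0 - H^{-1} g should land on the least-squares solution (X^T X)^{-1} X^T y from any starting point. To keep the example dependency-free, it fits a straight line with two weights (intercept and slope), so H is 2x2 and is inverted by hand. The function names and the toy data are hypothetical.

```python
def _normal_matrix(xs):
    """Entries of H = X^T X for rows [1, x]: [[n, sx], [sx, sxx]], plus its determinant."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; the inputs must not all be equal")
    return n, sx, sxx, det


def fit_line_newton(xs, ys, w0):
    """One update w1 = w0 - H^{-1} g for the model y ~ w[0] + w[1] * x."""
    b, m = w0
    n, sx, sxx, det = _normal_matrix(xs)
    # Residuals of the initial weights: X w0 - y
    residuals = [b + m * x - y for x, y in zip(xs, ys)]
    # Gradient g = X^T (X w0 - y)
    g0 = sum(residuals)
    g1 = sum(r * x for r, x in zip(residuals, xs))
    # H^{-1} g, using the explicit inverse of the 2x2 matrix H
    step0 = (sxx * g0 - sx * g1) / det
    step1 = (-sx * g0 + n * g1) / det
    return (b - step0, m - step1)


def fit_line_closed_form(xs, ys):
    """Least-squares weights (X^T X)^{-1} X^T y for the same model."""
    n, sx, sxx, det = _normal_matrix(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return ((sxx * sy - sx * sxy) / det, (-sx * sy + n * sxy) / det)


# Toy data (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

w_newton = fit_line_newton(xs, ys, w0=(10.0, -3.0))  # arbitrary starting point
w_closed = fit_line_closed_form(xs, ys)
print("single update:", w_newton)
print("closed form:  ", w_closed)
```

The two printed weight pairs should agree up to floating-point rounding, whatever `w0` is chosen. This holds because a squared error on a linear model is exactly quadratic in the weights, so the curvature is constant and one curvature-scaled step reaches the minimum.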