• Finding a set of weights, W, that minimizes the squared errors is exactly the same as finding a W that maximizes the log probability that the model would produce the desired outputs on all the training cases.
  – We implicitly assume that zero-mean Gaussian noise is added to the model's actual output.
  – We do not need to know the variance of the noise because we are assuming it's the same in all cases. So it just scales the squared error.
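
The equivalence above can be checked numerically. With Gaussian noise of standard deviation sigma, the log probability of a target t given model output y is -(t - y)^2 / (2 sigma^2) plus a term that does not depend on W, so the W that maximizes the summed log probability is the W that minimizes the summed squared error. The following is a minimal sketch, not taken from the slides: the one-weight linear model y = w * x, the toy data, the grid of candidate weights, and all function names are illustrative assumptions.

```python
import math

def sum_squared_error(w, xs, ts):
    """Sum of squared errors for the assumed model y = w * x."""
    return sum((t - w * x) ** 2 for x, t in zip(xs, ts))

def gaussian_log_likelihood(w, xs, ts, sigma):
    """Log probability of the targets under y = w * x plus zero-mean
    Gaussian noise with the same standard deviation sigma on every case."""
    n = len(xs)
    # Normalising term: depends on sigma only, not on w.
    const = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
    # The squared error enters scaled by 1 / (2 * sigma^2).
    return const - sum_squared_error(w, xs, ts) / (2.0 * sigma ** 2)

# Toy training cases (hypothetical data, roughly t = x plus noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [0.1, 0.9, 2.2, 2.8, 4.1]

# Candidate weights on a coarse grid from 0.00 to 2.00.
candidates = [i / 100.0 for i in range(0, 201)]

best_by_sse = min(candidates, key=lambda w: sum_squared_error(w, xs, ts))
best_by_ll_small = max(
    candidates, key=lambda w: gaussian_log_likelihood(w, xs, ts, sigma=0.5)
)
best_by_ll_large = max(
    candidates, key=lambda w: gaussian_log_likelihood(w, xs, ts, sigma=3.0)
)

print(best_by_sse, best_by_ll_small, best_by_ll_large)
```

All three searches pick the same weight. Changing sigma changes the value of the log likelihood but not where its maximum lies, which is why the noise variance does not need to be known.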