When is minimizing the squared error equivalent to Maximum Likelihood Learning?
• Minimizing the squared residuals is equivalent to maximizing the log probability of the correct answer under a Gaussian centered at the model's guess.
t = the correct answer
y = the model's estimate of the most probable value
Under a Gaussian of width σ centered at the model's guess, the probability density of the correct answer is

$$p(t \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(t-y)^2 / 2\sigma^2}$$

Taking the negative log probability gives the cost:

$$-\log p(t \mid y) = \frac{1}{2}\log 2\pi + \log\sigma + \frac{(t-y)^2}{2\sigma^2}$$

• The factor 1/(2σ²) multiplying the squared residual can be ignored if σ is the same for every case: it just rescales the cost uniformly.
• The additive terms ½ log 2π + log σ can be ignored if σ is fixed: they do not depend on y.

So minimizing the squared residual (t − y)² minimizes the same cost as maximizing log p(t | y).
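As a quick numeric check (a minimal sketch, not from the slide; NumPy and the variable names t, y, sigma are assumptions following the definitions above), the Gaussian negative log probability equals the squared residual scaled by 1/(2σ²) plus a constant, so the two objectives share the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=5)      # correct answers
y = rng.normal(size=5)      # model's guesses
sigma = 0.7                 # same sigma for every case (assumed value)

# Squared residuals, as on the slide
squared_error = (t - y) ** 2

# -log p(t|y) for a Gaussian centered at y with standard deviation sigma
nll = 0.5 * np.log(2 * np.pi) + np.log(sigma) + (t - y) ** 2 / (2 * sigma ** 2)

# The NLL is the squared error scaled by 1/(2 sigma^2) plus an additive
# constant, so minimizing one minimizes the other
constant = 0.5 * np.log(2 * np.pi) + np.log(sigma)
assert np.allclose(nll, constant + squared_error / (2 * sigma ** 2))
print("Gaussian NLL = constant + squared error / (2 sigma^2): verified")
```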