When is minimizing the squared error equivalent to Maximum Likelihood Learning?
Minimizing the squared residuals is equivalent to maximizing the log probability of the correct answer under a Gaussian centered at the model's guess.
Here $t_c$ is the correct answer for training case $c$, and $y_c$ is the model's estimate of the most probable value. Under a Gaussian centered at $y_c$ with standard deviation $\sigma$,

$$p(t_c \mid y_c) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(t_c - y_c)^2}{2\sigma^2}},$$

so the cost being minimized is

$$-\log p(t_c \mid y_c) = \log\!\left(\sqrt{2\pi}\,\sigma\right) + \frac{(t_c - y_c)^2}{2\sigma^2}.$$

The first term can be ignored if $\sigma$ is the same for every case, and the factor $\frac{1}{2\sigma^2}$ can be ignored if $\sigma$ is fixed. What remains is the squared residual $(t_c - y_c)^2$, so minimizing the squared error maximizes the log probability of the correct answers.