The probability distribution that is implicitly
assumed when using squared error
Minimizing the squared
residuals is equivalent to
maximizing the log probability
of the correct answers under a
Gaussian centered at the
model’s guess.
If we assume that the
variance of the Gaussian is
the same for all cases, its
value does not matter.
      d
correct
answer
    y
model’s
prediction