An example where minimizing the squared error gives terrible estimates
Suppose we have a network of 500 computers and they all have slightly imperfect clocks.
After doing Statistics 101, we decide to improve the clocks by averaging all of the reported times to get a least squares estimate. Then we broadcast the average to all of the clocks.
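As a minimal sketch of what that averaging step computes (the clock readings below are hypothetical, in hours), note that the mean of the reported times is exactly the value that minimizes the summed squared error:

  # Minimal sketch: the mean of the reported times is the least squares
  # estimate, i.e. the value c minimizing sum over i of (t_i - c)^2.
  import numpy as np

  reported_times = np.array([12.00, 12.02, 11.98, 12.05, 11.97])  # hypothetical readings, in hours
  least_squares_estimate = reported_times.mean()

  # Brute-force check over a grid of candidate broadcast values c.
  candidates = np.linspace(11.5, 12.5, 1001)
  costs = ((reported_times[:, None] - candidates[None, :]) ** 2).sum(axis=0)
  assert abs(candidates[costs.argmin()] - least_squares_estimate) < 1e-3
  print(least_squares_estimate)  # this is the value we would broadcast

This is why "average the times" and "take the least squares estimate" are the same procedure.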
Problem: The probability of a clock being wrong by ten hours is much more than one hundredth of the probability of it being wrong by one hour. In fact, it's about the same! Minimizing squared error implicitly assumes the errors are Gaussian, but real clock errors are heavy-tailed (a few clocks are off by whole hours), so a handful of wildly wrong clocks drags the average far from the true time.
[Figure: negative log prob of error (vertical axis) plotted against error (horizontal axis), with zero error at the center]
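To see why this error distribution ruins the average, here is a hedged simulation sketch; the error mixture below (most clocks off by minutes, about one in ten off by ten hours, as if set to the wrong time zone) is an assumption for illustration, not data from the slide:

  import numpy as np

  rng = np.random.default_rng(0)
  true_time = 12.0   # hypothetical ground truth, in hours
  n = 500            # the 500 networked computers

  # Heavy-tailed errors: small Gaussian jitter on every clock, plus a
  # ten-hour offset on roughly 10% of them (assumed wrong time zone).
  wrong_zone = rng.random(n) < 0.10
  reported = true_time + rng.normal(0.0, 0.05, size=n) + wrong_zone * 10.0

  print(f"mean   estimate: {reported.mean():.3f}")     # dragged about an hour off
  print(f"median estimate: {np.median(reported):.3f}") # stays near 12.0

The mean (the least squares estimate) lands about an hour from the truth, while the median barely moves: under a heavy-tailed error distribution, minimizing squared error is the wrong objective.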