The bias-variance trade-off
(a figment of the frequentists' lack of imagination?)
Imagine that the training set was drawn at random from a
whole set of training sets.
The squared loss can be decomposed into a “bias” term
and a “variance” term.
Bias = systematic error in the model’s estimates
Variance = noise in the estimates caused by sampling
noise in the training set.
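As a sketch of the decomposition (the notation is added here, not from the slide): let $\hat{y}(x;D)$ be the prediction of a model trained on data set $D$, and $\bar{t}(x)$ the noise-free target. Averaging over the ensemble of training sets $D$,
\[
E_D\big[(\hat{y}(x;D) - \bar{t}(x))^2\big]
  = \underbrace{\big(E_D[\hat{y}(x;D)] - \bar{t}(x)\big)^2}_{\text{bias}^2}
  + \underbrace{E_D\big[(\hat{y}(x;D) - E_D[\hat{y}(x;D)])^2\big]}_{\text{variance}}.
\]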
There is also an additional loss because the target values
themselves are noisy.
We eliminate this extra, irreducible loss from the math
by using the average target values (i.e. the unknown,
noise-free values).
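A minimal numerical sketch of the same idea (not from the slide; function names, degrees, and constants here are illustrative): repeatedly draw training sets from a known noisy function, fit a polynomial to each, and estimate bias-squared and variance of the predictions against the noise-free targets.

    import numpy as np

    rng = np.random.default_rng(0)

    def true_f(x):
        # Noise-free target: the "average target values" from the slide.
        return np.sin(x)

    def draw_training_set(n=20, noise=0.3):
        # One training set drawn at random from the ensemble of training sets.
        x = rng.uniform(0.0, 2.0 * np.pi, size=n)
        t = true_f(x) + rng.normal(0.0, noise, size=n)
        return x, t

    def fit_and_predict(degree, x_train, t_train, x_test):
        # Fit a polynomial of the given degree and predict at the test points.
        coeffs = np.polyfit(x_train, t_train, degree)
        return np.polyval(coeffs, x_test)

    x_test = np.linspace(0.5, 5.5, 50)   # evaluation points
    n_sets = 500                          # size of the ensemble of training sets

    for degree in (1, 5):
        preds = np.array([fit_and_predict(degree, *draw_training_set(), x_test)
                          for _ in range(n_sets)])
        mean_pred = preds.mean(axis=0)
        bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)  # systematic error
        variance = np.mean(preds.var(axis=0))                  # sampling noise in estimates
        print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")

Because the predictions are compared against true_f (the noise-free targets) rather than noisy draws, the irreducible loss does not appear in the printed numbers; the low-degree fit shows high bias and low variance, the high-degree fit the reverse.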