• Imagine that the training set was drawn at random from a whole set of training sets.
• The squared loss can be decomposed into a “bias” term and a “variance” term.
  – Bias = systematic error in the model’s estimates.
  – Variance = noise in the estimates caused by sampling noise in the training set.
• There is also an additional loss due to the fact that the target values are noisy.
  – We eliminate this extra, irreducible loss from the math by using the average target values (i.e. the unknown, noise-free values).
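The decomposition above can be checked numerically. The following is a minimal sketch, not a definitive implementation. Its assumptions are not from the slides: the noise-free target is taken to be sin(2πx), the model is a polynomial fit with `numpy.polyfit`, and the function name `bias_variance` and all its parameters are hypothetical. It draws many training sets at random, fits one model per set, and measures the squared loss against the noise-free targets, so the irreducible noise term does not appear.

```python
import numpy as np


def bias_variance(degree, n_sets=200, n_train=30, noise_sd=0.3, seed=0):
    """Estimate squared bias, variance, and expected squared loss.

    Assumed setup (illustrative only): noise-free target sin(2*pi*x),
    Gaussian target noise, polynomial model of the given degree.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)

    def noise_free(x):
        # The "average target value": the unknown, noise-free function.
        return np.sin(2.0 * np.pi * x)

    # One row of test-point predictions per randomly drawn training set.
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        x = rng.uniform(0.0, 1.0, n_train)
        y = noise_free(x) + rng.normal(0.0, noise_sd, n_train)  # noisy targets
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x_test)

    target = noise_free(x_test)
    mean_pred = preds.mean(axis=0)

    # Bias: systematic error of the average model.
    bias_sq = np.mean((mean_pred - target) ** 2)
    # Variance: spread of the individual models around the average model.
    variance = np.mean(preds.var(axis=0))
    # Expected squared loss against the noise-free targets.
    expected_loss = np.mean((preds - target) ** 2)
    return bias_sq, variance, expected_loss


if __name__ == "__main__":
    for degree in (1, 3):
        b, v, loss = bias_variance(degree)
        print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}  loss={loss:.4f}")
```

In this sketch the expected loss equals bias² + variance up to floating-point error, because the variance is computed around the same average prediction that defines the bias. Under the assumed target, a straight line (degree 1) cannot follow the sine curve, so its bias term should be noticeably larger than that of the cubic fit.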