We can reduce the variance term by averaging lots of models trained on different datasets.

This seems silly. If we had lots of different datasets, it would be better to combine them into one big training set. With more training data there will be much less variance.

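To make the variance argument concrete, here is a minimal sketch of the usual idealization (it assumes the N models' errors are independent with a common variance sigma squared, which models trained on related data only approximate):

```latex
% Variance of the average of N predictions y_1, ..., y_N with
% independent errors of common variance \sigma^2 (an idealization):
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} y_i\right)
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(y_i)
  = \frac{\sigma^2}{N}
```

In practice the models' errors are correlated, so the reduction is smaller than this, but averaging still helps.
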
Weird idea: We can create different datasets by bootstrap sampling of our single training dataset.

This is called bagging, and it works surprisingly well.

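As a concrete illustration of the idea, here is a minimal bagging sketch in NumPy (the data, the polynomial base model, and the number of models are illustrative assumptions, not a prescribed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# One (hypothetical) training set: noisy samples of a sine curve.
X = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(X.shape)

def fit_model(x, t, degree=5):
    """Fit one high-variance base model (here, a degree-5 polynomial)."""
    return np.polyfit(x, t, degree)

def bagged_predict(X, y, x_test, n_models=25):
    """Bagging: train each model on a bootstrap resample, then average."""
    preds = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(X) examples with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        coeffs = fit_model(X[idx], y[idx])
        preds.append(np.polyval(coeffs, x_test))
    # Averaging the bootstrap models reduces the variance of the prediction.
    return np.mean(preds, axis=0)

x_test = np.linspace(0.0, 1.0, 200)
y_bagged = bagged_predict(X, y, x_test)
```

Each bootstrap resample gives a slightly different training set, so the base models disagree; averaging their predictions cancels much of that disagreement.
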
But if we have enough computation, it's better to do the right Bayesian thing: combine the predictions of many models using the posterior probability of each parameter vector as the combination weight.

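A minimal sketch of that Bayesian combination for a toy linear model (the model, the flat prior, the known noise level, and the coarse parameter grid are all illustrative assumptions; a real system would sample parameter vectors, e.g. with MCMC, rather than enumerate them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a (hypothetical) linear model y = w0 + w1 * x + noise.
X = np.linspace(-1.0, 1.0, 30)
y = 0.5 + 2.0 * X + 0.2 * rng.standard_normal(X.shape)

noise_std = 0.2                     # assumed known observation noise
grid = np.linspace(-3.0, 3.0, 61)   # coarse grid over each parameter
W0, W1 = np.meshgrid(grid, grid)
params = np.column_stack([W0.ravel(), W1.ravel()])  # candidate parameter vectors

# Log-likelihood of the training data under each parameter vector.
preds = params[:, 0:1] + params[:, 1:2] * X[None, :]        # (n_params, n_data)
log_lik = -0.5 * np.sum((y[None, :] - preds) ** 2, axis=1) / noise_std**2

# Flat prior over the grid, so the posterior is proportional to the likelihood.
log_post = log_lik - np.max(log_lik)
post = np.exp(log_post)
post /= post.sum()                  # posterior probability of each parameter vector

# Prediction at a new input: combine the models' predictions, weighting
# each one by the posterior probability of its parameter vector.
x_new = 0.3
pred_per_model = params[:, 0] + params[:, 1] * x_new
bayes_prediction = np.sum(post * pred_per_model)
```

Unlike bagging, where every model gets equal weight, here the combination weights are the posterior probabilities of the parameter vectors given the training data.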