Beating the bias-variance trade-off
• We can reduce the variance term by averaging lots of
models trained on different datasets.
– This seems silly. If we had lots of different datasets, it
would be better to combine them into one big training
set.
– With more training data, there would be much less variance.
• Weird idea: we can create different datasets by bootstrap
sampling from our single training dataset.
– This is called “bagging” and it works surprisingly well
(a minimal sketch appears after this list).
• But if we have enough computation, it's better to do the
right Bayesian thing:
– Combine the predictions of many models using the
posterior probability of each parameter vector as the
combination weight (see the second sketch below).
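A minimal sketch of bagging, assuming a synthetic 1-D regression task, 25 ensemble members, and sklearn decision trees as the base models (none of these choices are specified above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # noisy targets

n_models = 25
models = []
for _ in range(n_models):
    # Bootstrap sample: draw N points *with replacement* from the
    # single training set, giving each model a slightly different dataset.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
# The bagged prediction is the average over the bootstrap-trained models;
# averaging reduces the variance term of the error.
y_bagged = np.mean([m.predict(X_test) for m in models], axis=0)
```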
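And a minimal sketch of combining predictions with posterior weights, assuming a one-parameter linear model y = w·x with Gaussian noise, a coarse grid of candidate parameter values, and a Gaussian prior (the slide does not specify a model family, so all of these are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=30)
y = 1.5 * x + rng.normal(scale=0.5, size=30)  # data from a true slope of 1.5

ws = np.linspace(-3, 3, 201)      # candidate parameter vectors (scalars here)
sigma, prior_sigma = 0.5, 1.0     # assumed noise and prior scales

# log posterior = log likelihood + log prior (up to an additive constant)
log_lik = np.array([-0.5 * np.sum((y - w * x) ** 2) / sigma**2 for w in ws])
log_prior = -0.5 * ws**2 / prior_sigma**2
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()                # normalized posterior probabilities

x_new = 1.0
# Every candidate model predicts, and the predictions are combined using
# the posterior probability of each parameter value as the weight.
y_pred = np.sum(post * (ws * x_new))
```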