We can reduce the variance term by averaging lots of models trained on different datasets.

This seems silly. If we had lots of different datasets, it would be better to combine them into one big training set. With more training data there will be much less variance.

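To make the variance argument concrete, here is a minimal sketch of the usual idealization (it assumes the N models' errors are independent with a common variance sigma squared, which models trained on related data only approximate):

```latex
% Variance of the average of N predictions y_1, ..., y_N with
% independent errors of common variance \sigma^2 (an idealization):
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} y_i\right)
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(y_i)
  = \frac{\sigma^2}{N}
```

In practice the models' errors are correlated, so the reduction is smaller than this, but averaging still helps.
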
Weird idea: We can create different datasets by bootstrap sampling of our single training dataset.

This is called bagging, and it works surprisingly well.

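As a concrete illustration of the idea, here is a minimal bagging sketch in NumPy (the data, the polynomial base model, and the number of models are illustrative assumptions, not a prescribed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# One (hypothetical) training set: noisy samples of a sine curve.
X = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(X.shape)

def fit_model(x, t, degree=5):
    """Fit one high-variance base model (here, a degree-5 polynomial)."""
    return np.polyfit(x, t, degree)

def bagged_predict(X, y, x_test, n_models=25):
    """Bagging: train each model on a bootstrap resample, then average."""
    preds = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(X) examples with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        coeffs = fit_model(X[idx], y[idx])
        preds.append(np.polyval(coeffs, x_test))
    # Averaging the bootstrap models reduces the variance of the prediction.
    return np.mean(preds, axis=0)

x_test = np.linspace(0.0, 1.0, 200)
y_bagged = bagged_predict(X, y, x_test)
```

Each bootstrap resample gives a slightly different training set, so the base models disagree; averaging their predictions cancels much of that disagreement.
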
But if we have enough computation, it's better to do the right Bayesian thing: combine the predictions of many models using the posterior probability of each parameter vector as the combination weight.

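A minimal sketch of that Bayesian combination for a toy linear model (the model, the flat prior, the known noise level, and the coarse parameter grid are all illustrative assumptions; a real system would sample parameter vectors, e.g. with MCMC, rather than enumerate them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a (hypothetical) linear model y = w0 + w1 * x + noise.
X = np.linspace(-1.0, 1.0, 30)
y = 0.5 + 2.0 * X + 0.2 * rng.standard_normal(X.shape)

noise_std = 0.2                     # assumed known observation noise
grid = np.linspace(-3.0, 3.0, 61)   # coarse grid over each parameter
W0, W1 = np.meshgrid(grid, grid)
params = np.column_stack([W0.ravel(), W1.ravel()])  # candidate parameter vectors

# Log-likelihood of the training data under each parameter vector.
preds = params[:, 0:1] + params[:, 1:2] * X[None, :]        # (n_params, n_data)
log_lik = -0.5 * np.sum((y[None, :] - preds) ** 2, axis=1) / noise_std**2

# Flat prior over the grid, so the posterior is proportional to the likelihood.
log_post = log_lik - np.max(log_lik)
post = np.exp(log_post)
post /= post.sum()                  # posterior probability of each parameter vector

# Prediction at a new input: combine the models' predictions, weighting
# each one by the posterior probability of its parameter vector.
x_new = 0.3
pred_per_model = params[:, 0] + params[:, 1] * x_new
bayes_prediction = np.sum(post * pred_per_model)
```

Unlike bagging, where every model gets equal weight, here the combination weights are the posterior probabilities of the parameter vectors given the training data.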