Deciding how much to restrict the capacity
How do we decide which limit to use and how
strong to make the limit?
If we use the test data to make this choice,
we get an unfairly optimistic prediction of
the error rate we would get on new test data.
Suppose we compared a set of models that
gave random results. The best one on a
particular dataset would do better than
chance, but it won't do better than chance on
another test set.
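A small simulation makes the random-models argument concrete. This is a sketch under stated assumptions: labels are balanced random bits, each hypothetical "model" guesses each label independently at random, and the counts (`n_models`, `n_cases`) are arbitrary illustrative choices. The model picked for looking best on test set A scores well above 0.5 there, yet only around 0.5 on a fresh test set B.

```python
import random

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def random_model_demo(n_models=200, n_cases=100, seed=0):
    """Select the best of n_models random guessers on test set A,
    then score that same model on an independent test set B."""
    rng = random.Random(seed)
    # Assumption: two independent test sets with random 0/1 labels.
    labels_a = [rng.randint(0, 1) for _ in range(n_cases)]
    labels_b = [rng.randint(0, 1) for _ in range(n_cases)]
    # Each hypothetical "model" guesses every label at random.
    preds_a = [[rng.randint(0, 1) for _ in range(n_cases)] for _ in range(n_models)]
    preds_b = [[rng.randint(0, 1) for _ in range(n_cases)] for _ in range(n_models)]
    # Choose the model that happens to look best on test set A.
    best = max(range(n_models), key=lambda m: accuracy(preds_a[m], labels_a))
    return accuracy(preds_a[best], labels_a), accuracy(preds_b[best], labels_b)

acc_a, acc_b = random_model_demo()
print(f"best model on test set A: {acc_a:.2f}")
print(f"same model on test set B: {acc_b:.2f}")
```

The score on set A is inflated only because set A was used to do the choosing; set B never influenced the selection, so it gives an honest estimate.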
So use a separate validation set to do model
selection.
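Below is a minimal sketch of model selection with a separate validation set. As an assumption chosen purely for illustration, it uses the number of neighbours `k` in k-nearest-neighbour regression as the "how strong to make the limit" knob (larger `k` gives a smoother, lower-capacity predictor) on a hypothetical noisy `sin(x)` task. The validation set picks `k`, and the test set is used once, only after the choice is made.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN predictions on a dataset of (x, y) pairs."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    # Hypothetical task: noisy samples of sin(x) on [0, 2*pi].
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

rng = random.Random(1)
train = make_data(200, rng)
validation = make_data(100, rng)
test = make_data(100, rng)

candidate_ks = [1, 3, 5, 9, 15, 25, 51]
# Choose the capacity setting using the validation set only.
val_errors = {k: mse(train, validation, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)
# Touch the test set once, after the choice has been made.
test_error = mse(train, test, best_k)
print(f"chosen k = {best_k}, test MSE = {test_error:.3f}")
```

The same pattern applies to any capacity control (for example a weight-decay coefficient or the number of hidden units): loop over candidate settings, score each on the validation set, and report the test error only for the single setting that was chosen.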