lec7

Overfitting

•

The training data contains information about the

regularities in the mapping from input to output. But it

also contains noise

–

The target values may be unreliable.

–

There is sampling error. There will be accidental

regularities just because of the particular training

cases that were chosen.

•

When we fit the model, it cannot tell which regularities

are real and which are caused by sampling error.

–

So it fits both kinds of regularity.

–

If the model is very flexible it can model the sampling

error really well. This is a disaster.