




















Suppose that we pick n datapoints and assign labels of + or − to them at random. If our model class (e.g. a neural net with a certain number of hidden units) is powerful enough to learn any association of labels with the data, it's too powerful!




Maybe we can characterize the power of a model class by asking how many datapoints it can shatter, i.e. learn perfectly for all possible assignments of labels.
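
The idea can be checked by brute force for a simple model class. The sketch below (an illustrative construction, not from the original text) uses lines in 2D as the model class and a perceptron as the learner: the perceptron converges exactly when a labeling is linearly separable, so capping its epochs gives a separability test that is heuristic in general but reliable for the tiny, clear-cut point sets used here.

```python
import itertools

def separable(points, labels, max_iters=1000):
    # Perceptron on bias-augmented inputs: it converges iff some line
    # w1*x + w2*y + b = 0 realizes the labeling.  We cap the epochs and
    # read non-convergence as "not separable" -- a heuristic, but safe
    # for the small, well-separated configurations below.
    aug = [(x, y, 1.0) for (x, y) in points]
    w = [0.0, 0.0, 0.0]
    for _ in range(max_iters):
        updated = False
        for p, t in zip(aug, labels):
            if t * sum(wi * pi for wi, pi in zip(w, p)) <= 0:
                w = [wi + t * pi for wi, pi in zip(w, p)]
                updated = True
        if not updated:
            return True
    return False

def shatters(points):
    # Shattering: every one of the 2^n labelings must be learnable.
    return all(separable(points, labs)
               for labs in itertools.product([1, -1], repeat=len(points)))

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]             # general position
square   = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(shatters(triangle))  # True: lines in 2D shatter 3 points
print(shatters(square))    # False: the XOR labeling defeats every line
```

Enumerating all 2^n labelings is exponential, so this only works as a demonstration on a handful of points.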





The largest number of datapoints that can be shattered is called the Vapnik-Chervonenkis (VC) dimension, denoted h.





The model does not need to shatter all sets of datapoints of size h. One set is sufficient.





For planes in 3D, h = 4 (achieved, e.g., by the vertices of a tetrahedron) even though 4 coplanar points cannot be shattered.
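
Both halves of this claim can be verified by brute force. The sketch below (an illustrative construction; the tetrahedron vertices and the coplanar square are assumed test configurations, not from the original text) uses a perceptron on bias-augmented 3D inputs, which converges exactly when some plane realizes the labeling; capping its epochs turns non-convergence into a "not separable" verdict that is reliable for these small, clear-cut point sets.

```python
import itertools

def separable(points, labels, max_iters=2000):
    # Perceptron in any dimension, inputs augmented with a bias
    # coordinate.  It converges iff some plane realizes the labeling;
    # non-convergence within the cap is read as "not separable".
    aug = [tuple(p) + (1.0,) for p in points]
    w = [0.0] * len(aug[0])
    for _ in range(max_iters):
        updated = False
        for p, t in zip(aug, labels):
            if t * sum(wi * pi for wi, pi in zip(w, p)) <= 0:
                w = [wi + t * pi for wi, pi in zip(w, p)]
                updated = True
        if not updated:
            return True
    return False

def shatters(points):
    # All 2^n labelings must be realizable for the set to be shattered.
    return all(separable(points, labs)
               for labs in itertools.product([1, -1], repeat=len(points)))

tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]  # general position
coplanar    = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]  # square in z = 0

print(shatters(tetrahedron))  # True: one shattered 4-point set gives h >= 4
print(shatters(coplanar))     # False: the XOR labeling defeats every plane
```

The coplanar square fails for the same reason XOR defeats a line in 2D: restricted to the plane z = 0, a separating plane is just a line.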

