Suppose that we pick n datapoints and assign labels of + or - to them at random. If our model class (e.g. a neural net with a certain number of hidden units) is powerful enough to learn any association of labels with the data, it's too powerful!
Maybe we can characterize the power of a model class by asking how many datapoints it can shatter, i.e. learn perfectly for all possible assignments of labels.
This number of datapoints is called the Vapnik-Chervonenkis (VC) dimension, h.
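As a toy illustration (not part of the original notes), the sketch below brute-forces this definition for a very simple model class: 1-D threshold classifiers of the form sign(s * (x - t)). Some pair of points can be shattered but no triple can, so the VC dimension of this class is 2. The function names and the NumPy dependency are assumptions made for the sketch.

    import itertools
    import numpy as np

    def threshold_fits(points, labels):
        # Candidate thresholds: one below all points, one in each gap between
        # consecutive points, and one above all points.
        sorted_x = np.sort(points)
        candidates = np.concatenate([[sorted_x[0] - 1.0],
                                     (sorted_x[:-1] + sorted_x[1:]) / 2.0,
                                     [sorted_x[-1] + 1.0]])
        # Return True if some classifier sign(s * (x - t)) reproduces the labelling exactly.
        for s in (-1.0, 1.0):
            for t in candidates:
                if np.all(np.sign(s * (points - t)) == labels):
                    return True
        return False

    def shatters(points):
        # Shattering: every one of the 2^n +/- assignments can be learned perfectly.
        return all(threshold_fits(points, np.array(signs))
                   for signs in itertools.product([-1.0, 1.0], repeat=len(points)))

    print(shatters(np.array([0.0, 1.0])))       # True: this pair is shattered
    print(shatters(np.array([0.0, 1.0, 2.0])))  # False: the +,-,+ labelling cannot be learned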
The model does not need to shatter all sets of datapoints of size h. One set is sufficient.
For planes in 3-D, h = 4 even though 4 co-planar points cannot be shattered.
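To make the 3-D claim concrete, here is a small sketch (again an illustration, not part of the original notes) that tests whether planes can shatter a set of points by checking linear separability of each of the 2^n labellings with a feasibility LP. Four points in general position (the vertices of a tetrahedron) are shattered, so h >= 4, while four co-planar points fail on the XOR-style labelling of the square's diagonals. The helper names are made up, and SciPy's linprog is an assumed dependency.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def plane_separates(points, labels):
        # Feasibility LP over (w, b): labels[i] * (w . points[i] + b) >= 1 for all i,
        # which is feasible iff some plane w . x + b = 0 separates the two label groups.
        n, d = points.shape
        A_ub = -labels[:, None] * np.hstack([points, np.ones((n, 1))])
        res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                      bounds=[(None, None)] * (d + 1), method="highs")
        return res.success

    def shattered_by_planes(points):
        # Planes shatter the points if every +/- labelling is linearly separable.
        return all(plane_separates(points, np.array(signs))
                   for signs in itertools.product([-1.0, 1.0], repeat=len(points)))

    tetrahedron = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    square      = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)

    print(shattered_by_planes(tetrahedron))  # True: one shattered set of size 4 gives h >= 4
    print(shattered_by_planes(square))       # False: 4 co-planar points cannot be shattered

The code only establishes h >= 4; the other direction, that no set of 5 points can be shattered by planes in 3-D (half-spaces in R^d have VC dimension d + 1), is a standard result that follows from Radon's theorem.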