




















Suppose that we pick n datapoints and assign labels of + or − to them at random. If our model class (e.g. a neural net with a certain number of hidden units) is powerful enough to learn any association of labels with the data, it's too powerful!




Maybe we can characterize the power of a model class by asking how many datapoints it can shatter, i.e. learn perfectly for all possible assignments of labels.
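
The idea can be checked by brute force for a simple model class. The sketch below (an illustrative construction, not from the original text) uses lines in 2D as the model class and a perceptron as the learner: the perceptron converges exactly when a labeling is linearly separable, so capping its epochs gives a separability test that is heuristic in general but reliable for the tiny, clear-cut point sets used here.

```python
import itertools

def separable(points, labels, max_iters=1000):
    # Perceptron on bias-augmented inputs: it converges iff some line
    # w1*x + w2*y + b = 0 realizes the labeling.  We cap the epochs and
    # read non-convergence as "not separable" -- a heuristic, but safe
    # for the small, well-separated configurations below.
    aug = [(x, y, 1.0) for (x, y) in points]
    w = [0.0, 0.0, 0.0]
    for _ in range(max_iters):
        updated = False
        for p, t in zip(aug, labels):
            if t * sum(wi * pi for wi, pi in zip(w, p)) <= 0:
                w = [wi + t * pi for wi, pi in zip(w, p)]
                updated = True
        if not updated:
            return True
    return False

def shatters(points):
    # Shattering: every one of the 2^n labelings must be learnable.
    return all(separable(points, labs)
               for labs in itertools.product([1, -1], repeat=len(points)))

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]             # general position
square   = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(shatters(triangle))  # True: lines in 2D shatter 3 points
print(shatters(square))    # False: the XOR labeling defeats every line
```

Enumerating all 2^n labelings is exponential, so this only works as a demonstration on a handful of points.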





The largest number of datapoints that can be shattered is called the Vapnik-Chervonenkis (VC) dimension, denoted h.





The model does not need to shatter all sets of datapoints of size h. One set is sufficient.





For planes in 3D, h = 4 (achieved, e.g., by the vertices of a tetrahedron) even though 4 coplanar points cannot be shattered.
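
Both halves of this claim can be verified by brute force. The sketch below (an illustrative construction; the tetrahedron vertices and the coplanar square are assumed test configurations, not from the original text) uses a perceptron on bias-augmented 3D inputs, which converges exactly when some plane realizes the labeling; capping its epochs turns non-convergence into a "not separable" verdict that is reliable for these small, clear-cut point sets.

```python
import itertools

def separable(points, labels, max_iters=2000):
    # Perceptron in any dimension, inputs augmented with a bias
    # coordinate.  It converges iff some plane realizes the labeling;
    # non-convergence within the cap is read as "not separable".
    aug = [tuple(p) + (1.0,) for p in points]
    w = [0.0] * len(aug[0])
    for _ in range(max_iters):
        updated = False
        for p, t in zip(aug, labels):
            if t * sum(wi * pi for wi, pi in zip(w, p)) <= 0:
                w = [wi + t * pi for wi, pi in zip(w, p)]
                updated = True
        if not updated:
            return True
    return False

def shatters(points):
    # All 2^n labelings must be realizable for the set to be shattered.
    return all(separable(points, labs)
               for labs in itertools.product([1, -1], repeat=len(points)))

tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]  # general position
coplanar    = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]  # square in z = 0

print(shatters(tetrahedron))  # True: one shattered 4-point set gives h >= 4
print(shatters(coplanar))     # False: the XOR labeling defeats every plane
```

The coplanar square fails for the same reason XOR defeats a line in 2D: restricted to the plane z = 0, a separating plane is just a line.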

