Why do large margin separators have lower VC
dimension?
Consider a random set of N points that
all fit inside a unit hypercube.
If the number of dimensions is bigger
than N-2, it is easy to find a separating
plane for any labeling of the points.
So the fact that there is a separating
plane doesn’t tell us much. It is like
putting a straight line through 2 data
points.
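
The claim above can be checked numerically. The following is a minimal sketch, not part of the original slide: it draws N random points in a unit hypercube with N-1 dimensions and confirms that a plane can be found for many random labelings by solving a linear system. The function name `separating_plane`, the choice N = 10, and the random seed are all assumptions made for illustration.

```python
import numpy as np

def separating_plane(X, y):
    """Return (w, b) such that X @ w + b is approximately y.

    With N points in N-1 dimensions the augmented matrix [X, 1] is
    square and, for random points, invertible with probability 1, so
    any labeling y in {-1, +1} can be fitted exactly.
    """
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    return sol[:-1], sol[-1]

rng = np.random.default_rng(0)   # illustrative seed
N = 10                           # illustrative number of points
d = N - 1                        # dimensions bigger than N-2
X = rng.random((N, d))           # points inside the unit hypercube

all_separated = True
margins = []
for _ in range(200):
    y = rng.choice([-1.0, 1.0], size=N)      # a random labeling
    w, b = separating_plane(X, y)
    scores = X @ w + b
    if not np.all(np.sign(scores) == y):
        all_separated = False
    # Geometric margin of this particular plane (not the maximum margin).
    margins.append(np.min(np.abs(scores)) / np.linalg.norm(w))

print("every labeling separated:", all_separated)
print("largest margin seen for these planes:", max(margins))
```

The margin printed here belongs to the particular plane the linear solve returns, so it is only a lower bound on the best margin available for each labeling; finding the true maximum-margin plane would need a quadratic program.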
But there is unlikely to be a separating
plane with a margin that is big.
If we find such a plane, it’s unlikely to
be a coincidence. So it will probably
apply to the test data too.
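
One standard way to make this precise, given here as background rather than something the text above derives, is Vapnik's bound on the VC dimension h of separating planes that are required to have margin at least \gamma on points lying inside a ball of radius R in d dimensions. The symbols h, R, \gamma and d are introduced here for illustration.

```latex
h \le \min\left( \left\lceil \frac{R^{2}}{\gamma^{2}} \right\rceil,\; d \right) + 1
```

For the unit hypercube the enclosing ball has radius \sqrt{d}/2. When the margin is large relative to R, the bound can be far below the d + 1 that applies to planes with no margin requirement.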