Training a linear SVM
To find the maximum margin separator, we have to solve
the following optimization problem:
 This is tricky but it’s a convex problem. There is only one
optimum and we can find it without fiddling with learning
rates or weight decay or early stopping.
Don’t worry about the optimization problem. It has been
solved. Its called quadratic programming.
It takes time proportional to N^2 which is really bad for
very big datasets
so for big datasets we end up doing approximate optimization!