• If we use a neural net to define the features, maybe we can use convex optimization for the final layer of weights and then backpropagate derivatives to “learn the kernel” (see the sketch after this list).
• The convex optimization is quadratic in the number of training cases, so this approach works best when most of the data is unlabelled.
– Unsupervised pre-training can then use the unlabelled data to learn features that are appropriate for the domain.
– The final convex optimization can use these features as well as possible and also provide derivatives that allow them to be fine-tuned.
– This seems better than just trying lots of kernels and selecting the best ones (which is the current method).
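
A minimal PyTorch sketch of this recipe, under toy assumptions: the data, layer sizes, and training schedules below are invented purely for illustration, and logistic regression fit by L-BFGS stands in for the convex final-layer solver (an SVM on the fixed features would play the same role). The three phases are: pre-train the feature net as an autoencoder on the unlabelled data, solve the convex problem for the final weights with the features held fixed, then backpropagate through that final layer to fine-tune the features.

```python
# Illustrative sketch only; all names and hyperparameters are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: lots of unlabelled inputs, few labelled ones.
X_unlab = torch.randn(1000, 20)
X_lab, y_lab = torch.randn(50, 20), torch.randint(0, 2, (50,)).float()

# Feature-defining net ("the kernel"), plus a decoder used only for pre-training.
encoder = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 16))
decoder = nn.Linear(16, 20)

# Phase 1: unsupervised pre-training (autoencoder) on the unlabelled data.
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    recon = decoder(encoder(X_unlab))
    loss = ((recon - X_unlab) ** 2).mean()
    loss.backward()
    opt.step()

# Phase 2: convex optimization of the final layer of weights.
# With the features held fixed, logistic regression in w is convex,
# so L-BFGS finds the global optimum of this sub-problem.
w = nn.Linear(16, 1)
feats = encoder(X_lab).detach()        # features fixed for the convex step
lbfgs = torch.optim.LBFGS(w.parameters(), max_iter=100)
bce = nn.BCEWithLogitsLoss()

def closure():
    lbfgs.zero_grad()
    l = bce(w(feats).squeeze(1), y_lab)
    l.backward()
    return l

lbfgs.step(closure)

# Phase 3: backpropagate through the optimized final layer to
# fine-tune the features, i.e. "learn the kernel".
ft = torch.optim.Adam(encoder.parameters(), lr=1e-4)
for _ in range(50):
    ft.zero_grad()
    l = bce(w(encoder(X_lab)).squeeze(1), y_lab)
    l.backward()
    ft.step()
```

In a fuller version one would alternate phases 2 and 3, re-solving the convex problem for the final weights as the features move.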