CSC 2515 2008

Lecture 10
Support Vector Machines

Getting good generalization on big datasets

Preprocessing the input vectors

Is preprocessing cheating?

A hierarchy of model classes

A way to choose a model class

A weird measure of model complexity

An example of VC dimension

Some examples of VC dimension

The probabilistic guarantee

Preventing overfitting when using big sets of features

Support Vector Machines

Training a linear SVM

Testing a linear SVM

A Bayesian interpretation

What to do if there is no separating plane

Introducing slack variables

A picture of the best plane with a slack variable

The story so far

Why do large margin separators have lower VC dimension?

How to make a plane curved

A potential problem and a magic solution

What the kernel trick achieves

The kernel trick

Dealing with the test data

Dealing with the test data (continued)

The classification rule

Some commonly used kernels

Performance

Support Vector Machines are Perceptrons!

A problem that cannot be solved using a kernel that computes the similarity of a test image to a training case

A hybrid approach

Learning to extract the orientation of a face patch (Ruslan Salakhutdinov)

The training and test sets

The root mean squared error in the orientation when combining Gaussian processes (GPs) with deep belief nets