Lecture 10: Support Vector Machines

• Suppose we use a big set of features to ensure that the two classes are linearly separable. What is the best separating line to use?

• The Bayesian answer is to use them all (including ones that do not quite separate the data).

• Weight each line by its posterior probability (i.e. by a combination of how well it fits the data and how well it fits the prior).

• Is there an efficient way to approximate the correct Bayesian answer?
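
The posterior weighting described above can be sketched by brute force. This is only an illustration under stated assumptions, not the lecture's method: the Gaussian prior over line parameters, the logistic likelihood, and the name `bayesian_line_average` are all choices made here for the example. Candidate lines are drawn from the prior, each is weighted by how well it fits the data, and the prediction is the weighted average over all of them.

```python
import numpy as np

def bayesian_line_average(X, y, X_new, n_lines=5000, prior_std=3.0, seed=0):
    """Posterior-weighted average of many candidate lines (hypothetical helper).

    X: (n, 2) training points, y: labels in {-1, +1}, X_new: (m, 2) query points.
    Returns an (m,) array of approximate P(label = +1) values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    rng = np.random.default_rng(seed)

    # Append a constant feature so every candidate line has a bias term.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    Xb_new = np.hstack([X_new, np.ones((len(X_new), 1))])

    # Assumption: Gaussian prior over (w1, w2, b). Because the candidates are
    # sampled from the prior, each one's posterior weight is proportional to
    # its likelihood alone.
    W = rng.normal(0.0, prior_std, size=(n_lines, 3))

    # Assumption: logistic likelihood, log p(y | x, w) = log sigmoid(y * w.x).
    margins = (Xb @ W.T) * y[:, None]                    # shape (n, n_lines)
    log_lik = -np.logaddexp(0.0, -margins).sum(axis=0)   # shape (n_lines,)

    # Normalise in log space to avoid underflow.
    weights = np.exp(log_lik - log_lik.max())
    weights /= weights.sum()

    # Each line votes with its own P(+1 | x); combine votes by posterior weight.
    probs = np.exp(-np.logaddexp(0.0, -(Xb_new @ W.T)))  # stable sigmoid
    return probs @ weights


# Usage: two small, linearly separable clusters.
X_train = [[-2, -1], [-1, -2], [-2, -2], [1, 2], [2, 1], [2, 2]]
y_train = [-1, -1, -1, 1, 1, 1]
print(bayesian_line_average(X_train, y_train, [[1.5, 1.5], [-1.5, -1.5]]))
```

Lines that misclassify some points still contribute, only with smaller weight. The cost grows with the number of sampled lines, which is what makes the question of an efficient approximation relevant.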