













• 
Suppose we use a
big set of features



to ensure that
the two classes are



linearly
separable. What is the best



separating line
to use?



• 
The Bayesian
answer is to use them



all (including ones that do not quite



separate
the data.)



• 
Weight each line
by its posterior



probability (i.e. by a combination of



how
well it fits the data and how well it


fits
the prior).



• 
Is there an
efficient way to



approximate the
correct Bayesian



answer?

