 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
Suppose we use a
big set of features
|
|
|
to ensure that
the two classes are
|
|
|
linearly
separable. What is the best
|
|
|
separating line
to use?
|
|
|
• |
The Bayesian
answer is to use them
|
|
|
all (including ones that do not quite
|
|
|
separate
the data.)
|
|
|
• |
Weight each line
by its posterior
|
|
|
probability (i.e. by a combination of
|
|
|
how
well it fits the data and how well it
|
|
fits
the prior).
|
|
|
• |
Is there an
efficient way to
|
|
|
approximate the
correct Bayesian
|
|
|
answer?
|
|