 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
Instead of trying
to find the best single setting of the
|
|
|
parameters (as
in ML or MAP) compute the full posterior
|
|
|
distribution over
parameter settings
|
|
|
|
– |
This
is extremely computationally intensive for all but
|
|
|
the
simplest models (its
feasible for a biased coin).
|
|
|
| • |
To make
predictions, let each different setting of the
|
|
|
parameters make
its own prediction and then combine
|
|
|
all these
predictions by weighting each of them by the
|
|
|
posterior
probability of that setting of the parameters.
|
|
|
|
– |
This
is also computationally intensive.
|
|
|
| • |
The full
Bayesian approach allows us to use complicated
|
|
models even when
we do not have much data
|
|