Full Bayesian Learning
Instead of trying to find the best single setting of the
parameters (as in ML or MAP) compute the full posterior
distribution over parameter settings
This is extremely computationally intensive for all but
the simplest models (its feasible for a biased coin).
To make predictions, let each different setting of the
parameters make its own prediction and then combine
all these predictions by weighting each of them by the
posterior probability of that setting of the parameters.
This is also computationally intensive.
The full Bayesian approach allows us to use complicated
models even when we do not have much data