A cheap trick to avoid computing the
posterior probabilities of all weight vectors
Suppose we just try to find the most probable
weight vector.
We can do this by starting with a random
weight vector and then adjusting it in the
direction that increases p(W | D).
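
As a rough illustration of this procedure, here is a minimal sketch in Python.
It is not from the lecture: it assumes a linear model with Gaussian noise and a
zero-mean Gaussian prior on the weights, and the names log_posterior,
grad_log_posterior and find_map_weights are hypothetical. It climbs the log of
p(W | D), which has its maximum at the same weight vector as p(W | D) itself.

```python
import numpy as np

def log_posterior(w, X, y, noise_var=1.0, prior_var=1.0):
    """Unnormalised log p(W | D), assuming Gaussian noise and a Gaussian prior."""
    residual = y - X @ w
    log_likelihood = -0.5 * (residual @ residual) / noise_var
    log_prior = -0.5 * (w @ w) / prior_var
    return log_likelihood + log_prior

def grad_log_posterior(w, X, y, noise_var=1.0, prior_var=1.0):
    """Gradient of log_posterior with respect to the weight vector w."""
    return X.T @ (y - X @ w) / noise_var - w / prior_var

def find_map_weights(X, y, steps=2000, lr=0.01, seed=0):
    """Start from a random weight vector and follow the gradient uphill."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])  # random starting weight vector
    for _ in range(steps):
        # Step in the direction that increases the (log) posterior.
        w = w + lr * grad_log_posterior(w, X, y)
    return w
```

The step size lr and the number of steps are arbitrary choices for this sketch;
too large a step size makes the updates diverge.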
It is easier to work in the log domain. If we want
to minimize a cost, we use negative log
probabilities: