An even cheaper trick
Suppose we completely ignore the prior over
weight vectors.
This is equivalent to giving all possible weight
vectors the same prior probability density.
Then all we have to do is to maximize the
probability of the training data given the
weight vector, p(D | W).
This is called maximum likelihood learning. It is
very widely used for fitting models in statistics.
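
To make the idea concrete, here is a minimal sketch of maximum likelihood
learning. It assumes a toy model that is not from the lecture: a single
weight w, outputs y = w*x plus Gaussian noise with a fixed standard
deviation, and a small made-up dataset. The function names and the learning
rate are likewise illustrative. With no prior term, learning reduces to
gradient ascent on the log probability of the data given the weight.

```python
import math


def log_likelihood(w, xs, ys, sigma=1.0):
    """Log probability of the data given w, for y = w*x + Gaussian noise.

    Assumes independent training cases, so the log likelihood is a sum
    of per-case log densities.
    """
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (y - w * x) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))


def fit_maximum_likelihood(xs, ys, sigma=1.0, lr=0.01, steps=2000):
    """Gradient ascent on the log likelihood; no prior over w is used."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the log likelihood with respect to w.
        grad = sum((y - w * x) * x for x, y in zip(xs, ys)) / sigma ** 2
        w += lr * grad
    return w


# Made-up training data, roughly following y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

w_ml = fit_maximum_likelihood(xs, ys)

# For this Gaussian model the maximum likelihood weight coincides with
# the least-squares solution, which has a closed form.
w_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(w_ml, w_ls)
```

Under these assumptions the two printed values should agree closely, which
illustrates why maximizing likelihood under Gaussian noise amounts to
minimizing squared error.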