Maximum A Posteriori Learning
This trades off the prior probabilities of the parameters
against the probability of the data given the parameters.
It looks for the parameters that have the greatest product
of the prior term and the likelihood term.
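Written out, the product of the prior term and the likelihood term can be sketched as follows. The symbol D for the observed data is an assumption introduced here for illustration; w denotes the parameters.

```latex
w_{\mathrm{MAP}}
  = \arg\max_{w} \; p(w \mid D)
  = \arg\max_{w} \; p(D \mid w)\, p(w)
  = \arg\min_{w} \; \bigl[ -\log p(D \mid w) \; - \; \log p(w) \bigr]
```

The second equality drops the normalizer p(D) from Bayes' rule because it does not depend on w; the third takes the negative logarithm, which turns the product into a sum of a likelihood cost and a prior cost.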
Minimizing the squared weights is equivalent to
maximizing the log probability of the weights under a
zero-mean Gaussian prior.
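A minimal numerical sketch of this equivalence, assuming a linear model with Gaussian output noise (the slide does not specify a model). For a zero-mean Gaussian prior with standard deviation sigma_prior, -log p(w) equals the sum of squared weights divided by 2 * sigma_prior^2, plus a constant. The names `sigma_noise`, `sigma_prior`, and `neg_log_posterior` and the synthetic data are illustrative assumptions.

```python
import numpy as np

# Synthetic data for a linear model (an assumption made for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
sigma_noise = 0.5   # assumed std of the Gaussian output noise
sigma_prior = 1.0   # assumed std of the zero-mean Gaussian prior on w
y = X @ true_w + sigma_noise * rng.normal(size=50)

def neg_log_posterior(w):
    """Negative log posterior, up to constants that do not depend on w."""
    # Likelihood term: squared error scaled by the noise variance.
    nll = np.sum((y - X @ w) ** 2) / (2 * sigma_noise ** 2)
    # Prior term: squared weights scaled by the prior variance.
    nlp = np.sum(w ** 2) / (2 * sigma_prior ** 2)
    return nll + nlp

# Setting the gradient to zero gives a squared-weight-penalized least-squares
# problem whose penalty strength is the ratio of the two variances.
lam = sigma_noise ** 2 / sigma_prior ** 2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Maximum likelihood solution (no prior) for comparison.
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of the negative log posterior evaluated at the MAP solution.
grad = -(X.T @ (y - X @ w_map)) / sigma_noise ** 2 + w_map / sigma_prior ** 2

print("MAP weights:", w_map)
print("ML weights: ", w_ml)
print("gradient norm at MAP:", np.linalg.norm(grad))
```

In this setting the penalty coefficient on the squared weights is the noise variance divided by the prior variance, so a narrower prior corresponds to a stronger penalty.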
[Figure: the prior p(w) plotted against w, centered at w = 0]