A problem with the regularizer
We would like the solution we find to be independent of the units we
use to measure the components of the input vector.
If different components have different units (e.g. age and height), we
have a problem.
If we measure age in months and height in meters, the relative
values of the two weights are very different from what we get if
we use years and millimeters. So the squared weight penalty has
very different effects.
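
The units problem can be seen numerically. The following is a minimal sketch, assuming a ridge-style squared weight penalty with no bias term; the synthetic data, the helper name ridge_fit, and the value of lam are illustrative assumptions, not taken from the lecture. It fits the same data in two unit systems and compares the predictions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||y - X w||^2 + lam * ||w||^2 (no bias term).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
age_years = rng.uniform(20.0, 60.0, n)
height_m = rng.uniform(1.5, 2.0, n)
y = 0.5 * age_years + 10.0 * height_m + rng.normal(0.0, 1.0, n)

# The same inputs expressed in two unit systems.
X_a = np.column_stack([age_years, height_m])                  # years, meters
X_b = np.column_stack([age_years * 12.0, height_m * 1000.0])  # months, millimeters

lam = 10.0
pred_a = X_a @ ridge_fit(X_a, y, lam)
pred_b = X_b @ ridge_fit(X_b, y, lam)
max_ridge_gap = np.max(np.abs(pred_a - pred_b))

# Without the penalty (lam = 0) the change of units makes no difference.
ols_a = X_a @ ridge_fit(X_a, y, 0.0)
ols_b = X_b @ ridge_fit(X_b, y, 0.0)
max_ols_gap = np.max(np.abs(ols_a - ols_b))

print("largest prediction gap with penalty:   ", max_ridge_gap)
print("largest prediction gap without penalty:", max_ols_gap)
```

The unpenalized fits agree up to rounding, while the penalized fits do not: the same value of lam shrinks the height weight strongly when height is in meters and hardly at all when it is in millimeters.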
One way to avoid the units problem: whiten the data so that the
input components all have unit variance and no covariance.
Whitening is done as preprocessing, so the regularizer penalizes
only the weights on the whitened inputs, not the whitening matrix.
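
A minimal sketch of whitening, assuming PCA whitening via an eigendecomposition of the sample covariance (one of several possible choices; the helper names whiten and ridge_fit and the synthetic data are assumptions for illustration). After whitening, the penalized fit no longer depends on the original units.

```python
import numpy as np

def whiten(X):
    # Center, rotate onto the covariance eigenvectors, then rescale each
    # direction to unit variance (PCA whitening).
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals)

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||y - X w||^2 + lam * ||w||^2 (no bias term).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
age_years = rng.uniform(20.0, 60.0, n)
height_m = rng.uniform(1.5, 2.0, n)
y = 0.5 * age_years + 10.0 * height_m + rng.normal(0.0, 1.0, n)
y_centered = y - y.mean()

X_a = np.column_stack([age_years, height_m])                  # years, meters
X_b = np.column_stack([age_years * 12.0, height_m * 1000.0])  # months, millimeters

Z_a = whiten(X_a)
Z_b = whiten(X_b)

# Whitened inputs have identity covariance.
cov_a = np.cov(Z_a, rowvar=False)

# The two whitened datasets differ only by a rotation or reflection, and the
# squared penalty is unchanged by those, so the predictions coincide.
lam = 10.0
pred_a = Z_a @ ridge_fit(Z_a, y_centered, lam)
pred_b = Z_b @ ridge_fit(Z_b, y_centered, lam)
max_gap = np.max(np.abs(pred_a - pred_b))
print("largest prediction gap after whitening:", max_gap)
```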
But this can cause other problems when two input components
are almost perfectly correlated.
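
One plausible reading of this problem (an assumption here, since the slide does not spell it out) is that whitening must divide by the square root of a nearly zero variance, which hugely amplifies whatever small difference separates the two components. A minimal sketch with synthetic, illustrative data:

```python
import numpy as np

# Two input components that are almost perfectly correlated
# (synthetic data, an assumption for illustration only).
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + 1e-3 * rng.normal(0.0, 1.0, n)  # x2 is x1 plus a tiny perturbation
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)
# eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Whitening rescales each eigen-direction by 1 / sqrt(eigenvalue).
scale = 1.0 / np.sqrt(eigvals)
amplification = scale[0]  # factor applied to the smallest-variance direction

print("covariance eigenvalues:", eigvals)
print("whitening scale factors:", scale)
```

The direction with the tiny eigenvalue is essentially the difference between the two components. Whitening stretches it to unit variance, so a difference that may be mostly measurement noise ends up on the same footing as the real signal.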
We really need a separate prior on the weight for each input
component.
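
A minimal sketch of what a per-component penalty buys, assuming a zero-mean Gaussian prior on each weight, which corresponds to a squared penalty with its own coefficient per weight. The helper name ridge_fit_per_weight, the coefficients, and the synthetic data are illustrative assumptions. If each coefficient is rescaled along with the units of its input, the fit becomes independent of the units.

```python
import numpy as np

def ridge_fit_per_weight(X, y, lams):
    # Minimizer of ||y - X w||^2 + sum_i lams[i] * w[i]^2 (no bias term).
    return np.linalg.solve(X.T @ X + np.diag(lams), X.T @ y)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
age_years = rng.uniform(20.0, 60.0, n)
height_m = rng.uniform(1.5, 2.0, n)
y = 0.5 * age_years + 10.0 * height_m + rng.normal(0.0, 1.0, n)

# Unit conversion factors: years -> months, meters -> millimeters.
unit_scale = np.array([12.0, 1000.0])
X_a = np.column_stack([age_years, height_m])
X_b = X_a * unit_scale

# Per-weight penalties chosen in the (years, meters) system.
lams_a = np.array([10.0, 10.0])
# Multiplying an input by s divides its weight by s, so the matching
# penalty coefficient is multiplied by s**2.
lams_b = lams_a * unit_scale**2

w_a = ridge_fit_per_weight(X_a, y, lams_a)
w_b = ridge_fit_per_weight(X_b, y, lams_b)
max_gap = np.max(np.abs(X_a @ w_a - X_b @ w_b))
print("largest prediction gap with rescaled per-weight penalties:", max_gap)
```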