Weight-decay via noisy inputs
Weight-decay reduces the effect of noise in the inputs.
The noise variance on each input is amplified by the squared weight on that input's connection.
The amplified noise makes an additive contribution to the squared error.
So minimizing the squared error tends to minimize the squared weights when the inputs are noisy.
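The argument above can be made explicit for a single linear output unit. The following is a sketch under assumptions not stated on the slide: independent zero-mean Gaussian noise \varepsilon_i with variance \sigma_i^2 is added to each input x_i, the output is y_j = \sum_i w_i x_i, and t_j is the target. The symbols are chosen here for illustration.

```latex
\begin{align*}
y_j^{\text{noisy}} &= \sum_i w_i (x_i + \varepsilon_i) = y_j + \sum_i w_i \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_i^2) \\
\mathbb{E}\!\left[(y_j^{\text{noisy}} - t_j)^2\right]
&= (y_j - t_j)^2
 + 2\,(y_j - t_j)\,\mathbb{E}\Big[\sum_i w_i \varepsilon_i\Big]
 + \mathbb{E}\Big[\Big(\sum_i w_i \varepsilon_i\Big)^2\Big] \\
&= (y_j - t_j)^2 + \sum_i w_i^2 \sigma_i^2
\end{align*}
```

The cross term vanishes because the noise has zero mean, and independence removes the products \varepsilon_i \varepsilon_k for i \neq k. Under these assumptions the remaining term has the form of an L2 penalty on the weights, with \sigma_i^2 playing the role of the penalty coefficient for weight w_i.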
It gets more complicated for non-linear networks.
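A small numerical check may help make the contrast concrete. The sketch below assumes independent Gaussian noise with standard deviation sigma[i] on each input and a single output unit. For a linear unit the extra expected squared error should be close to the sum of w_i^2 * sigma_i^2; for a tanh unit (chosen here only as an example non-linearity) it is not. The weights, inputs, target, and the helper name expected_noisy_error are all hypothetical.

```python
import numpy as np


def expected_noisy_error(w, x, t, sigma, activation=None,
                         n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(y_noisy - t)^2] under Gaussian input noise."""
    rng = np.random.default_rng(seed)
    # One row per sample; sigma broadcasts so input i gets std sigma[i].
    noise = rng.normal(0.0, sigma, size=(n_samples, len(x)))
    z = (x + noise) @ w  # noisy total input to the output unit
    y = z if activation is None else activation(z)
    return float(np.mean((y - t) ** 2))


# Hypothetical values, picked only for illustration.
w = np.array([0.5, -1.5, 2.0])
x = np.array([1.0, 0.2, -0.3])
t = 0.4
sigma = np.array([0.1, 0.3, 0.2])

# The L2-style term predicted for a linear unit: sum_i w_i^2 * sigma_i^2.
penalty = float(np.sum(w ** 2 * sigma ** 2))

# Linear output unit: extra error caused by the noise.
linear_clean = float((x @ w - t) ** 2)
linear_extra = expected_noisy_error(w, x, t, sigma) - linear_clean

# tanh output unit: same noise, but the unit saturates.
tanh_clean = float((np.tanh(x @ w) - t) ** 2)
tanh_extra = expected_noisy_error(w, x, t, sigma, activation=np.tanh) - tanh_clean

print(f"sum w^2 sigma^2      : {penalty:.4f}")
print(f"extra error (linear) : {linear_extra:.4f}")
print(f"extra error (tanh)   : {tanh_extra:.4f}")
```

With these values the linear estimate should agree with the penalty term up to sampling error, while the tanh estimate should differ noticeably, so the noise no longer acts as an exact penalty on the squared weights.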
[Figure: diagram with units labelled j and i]