Limiting the size of the weights
Weight decay adds an extra term to the cost function that penalizes the squared weights.
This keeps the weights small unless they have large error derivatives.
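
A minimal sketch of the penalty described above, assuming the common convention C = E + (lam / 2) * sum(w_i^2). The coefficient name lam, the factor of 1/2, and the example numbers are assumptions for illustration, not taken from the slide. Under that assumption the gradient is dC/dw_i = dE/dw_i + lam * w_i, so it is zero when w_i = -(1 / lam) * dE/dw_i: a weight can only stay large if its error derivative is large.

```python
def cost_with_weight_decay(error, weights, lam):
    """Total cost C = E + (lam / 2) * sum of squared weights (assumed form)."""
    return error + 0.5 * lam * sum(w * w for w in weights)


def gradient_with_weight_decay(error_grads, weights, lam):
    """dC/dw_i = dE/dw_i + lam * w_i for each weight."""
    return [g + lam * w for g, w in zip(error_grads, weights)]


def stationary_weights(error_grads, lam):
    """Weights at which dC/dw_i = 0, treating each dE/dw_i as given."""
    return [-g / lam for g in error_grads]


# Hypothetical values, chosen only to illustrate the effect.
weights = [0.5, -2.0, 3.0]
error_grads = [0.0, 0.1, -1.5]  # made-up dE/dw_i values
lam = 0.1                       # made-up weight-decay coefficient

total_cost = cost_with_weight_decay(1.0, weights, lam)
total_grads = gradient_with_weight_decay(error_grads, weights, lam)
balanced = stationary_weights(error_grads, lam)
```

In this example the first weight has a zero error derivative, so the penalty alone drives it toward zero, while the third weight has a large error derivative and settles at a large value.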
[Figure: cost C plotted against a weight w]