Preventing overfitting by early stopping
If we have lots of data and a big model, it's very
expensive to keep re-training it with different
amounts of weight decay.
It is much cheaper to start with very small
weights and let them grow until the performance
on the validation set starts getting worse (but
don't get fooled by noise: a single uptick in
validation error may not be the start of overfitting).
The capacity of the model is limited because the
weights have not had time to grow big.
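
The procedure above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the slide: a linear model trained by full-batch gradient descent on squared error, and a hypothetical `patience` parameter as one simple way to avoid being fooled by noise (training stops only after the validation loss has failed to improve for `patience` consecutive epochs). The function name, learning rate, and synthetic data are likewise made up for the example.

```python
import numpy as np

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              lr=0.01, max_epochs=5000, patience=20, seed=0):
    """Gradient descent on a linear model, stopped by validation loss.

    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Start with very small weights, so the model begins with low capacity.
    w = rng.normal(scale=1e-3, size=X_train.shape[1])
    best_w, best_val, best_epoch = w.copy(), np.inf, 0
    for epoch in range(max_epochs):
        # One full-batch gradient step on the mean squared training error.
        grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad
        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        if val_loss < best_val:
            # New best validation loss: remember these weights.
            best_w, best_val, best_epoch = w.copy(), val_loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: treat it as real, not noise.
            break
    return best_w, best_val, best_epoch

# Synthetic data: 30 inputs, only 3 of which matter, with noisy targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 30))
true_w = np.zeros(30)
true_w[:3] = [1.0, -2.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

best_w, best_val, best_epoch = train_with_early_stopping(
    X_train, y_train, X_val, y_val)
print("best epoch:", best_epoch, "validation loss:", best_val)
```

Returning the weights from the best validation epoch, rather than the final ones, means the extra `patience` epochs spent checking for noise do not cost any validation performance.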