• If we have lots of data and a big model, it's very expensive to keep re-training it with different amounts of weight decay.
• It is much cheaper to start with very small weights and let them grow until the performance on the validation set starts getting worse (but don’t get fooled by noise!)
• The capacity of the model is limited because the weights have not had time to grow big.
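
The stopping rule in the bullets above can be sketched as a small helper. This is a minimal illustration, not a prescribed method: the function name `early_stop_epoch` and the `patience` and `min_delta` parameters are assumptions introduced here as one common way to avoid being fooled by noise. The rule waits for several epochs without improvement before stopping, instead of stopping at the first uptick in validation loss.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    patience and min_delta are hypothetical knobs for this sketch:
    training stops once `patience` epochs pass without the validation
    loss improving on the best value by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never triggered: training ran to the end of the sequence.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses with a noisy bump at epoch 3.
losses = [1.0, 0.8, 0.7, 0.72, 0.69, 0.71, 0.73, 0.75, 0.9]
print(early_stop_epoch(losses, patience=1))  # stops on the noisy bump
print(early_stop_epoch(losses, patience=3))  # waits and finds the later minimum
```

In a real training loop one would also save a copy of the weights at `best_epoch` and restore them after stopping, since the weights have kept growing during the extra epochs spent waiting.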