The Bayesian interpretation of weight decay

The cost being minimised is the negative log posterior probability of the weights. Its data-fit term comes from assuming that the model makes a Gaussian prediction, and its weight penalty comes from assuming a Gaussian prior for the weights.

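The argument behind these two assumptions can be written out as a short derivation. This is a reconstruction of the standard maximum a posteriori reasoning, not a copy of the original equations; the symbols ($y_c$ for the model output on case $c$, $t_c$ for the target, $w_i$ for a weight, $\sigma_D^2$ for the variance of the output noise, $\sigma_W^2$ for the variance of the weight prior) are notational assumptions.

```latex
% Bayes' rule in the negative log domain; \log p(D) does not depend on W.
-\log p(W \mid D) = -\log p(D \mid W) - \log p(W) + \log p(D)

% Gaussian prediction and zero-mean Gaussian prior, dropping constants:
C = \frac{1}{2\sigma_D^2} \sum_c (y_c - t_c)^2
  + \frac{1}{2\sigma_W^2} \sum_i w_i^2

% Multiplying through by 2\sigma_D^2 leaves the minimiser unchanged:
C^{*} = \sum_c (y_c - t_c)^2 + \frac{\sigma_D^2}{\sigma_W^2} \sum_i w_i^2
```

The coefficient on the squared weights in the last line plays the role of the weight decay parameter.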
So the correct value of the weight decay parameter is the ratio of two variances: the variance of the output noise divided by the variance of the weight prior. It's not just an arbitrary hack.
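This claim can be checked numerically on a small linear model. The sketch below is only an illustration under stated assumptions: the model is linear (`y = X @ w`), the data are synthetic, and the names `ridge_solution`, `neg_log_posterior_grad`, `sigma_d` and `sigma_w` are hypothetical, chosen for this example. It sets the weight decay to the ratio of the two variances and confirms that the resulting weights sit at a stationary point of the negative log posterior.

```python
import numpy as np


def ridge_solution(X, t, weight_decay):
    """Minimise sum_c (y_c - t_c)^2 + weight_decay * sum_i w_i^2 for y = X @ w."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + weight_decay * np.eye(n_features), X.T @ t)


def neg_log_posterior_grad(w, X, t, sigma_d, sigma_w):
    """Gradient of -log p(w | data) with Gaussian output noise and a Gaussian prior."""
    return X.T @ (X @ w - t) / sigma_d**2 + w / sigma_w**2


# Synthetic data (assumption): weights drawn from the prior, targets with Gaussian noise.
rng = np.random.default_rng(0)
sigma_d, sigma_w = 0.5, 2.0
X = rng.normal(size=(50, 3))
w_true = rng.normal(scale=sigma_w, size=3)
t = X @ w_true + rng.normal(scale=sigma_d, size=50)

# Weight decay set to the ratio of the two variances.
weight_decay = sigma_d**2 / sigma_w**2
w_map = ridge_solution(X, t, weight_decay)

# The gradient of the negative log posterior should vanish at w_map.
grad_at_map = neg_log_posterior_grad(w_map, X, t, sigma_d, sigma_w)
print(weight_decay, np.abs(grad_at_map).max())
```

Any other choice of `weight_decay` would still give a valid penalised fit, but it would correspond to a different assumed noise or prior variance.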