The Bayesian interpretation of weight decay
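We find the most probable weights W given the data D by minimizing the negative log posterior. By Bayes' theorem,

$$-\log p(W \mid D) = -\log p(D \mid W) - \log p(W) + \log p(D)$$

Assuming that the model makes a Gaussian prediction (the target $t_c$ is Gaussian around the output $y_c$ with variance $\sigma_D^2$) and assuming a Gaussian prior for the weights (zero mean, variance $\sigma_W^2$), this becomes

$$C^* = \frac{1}{2\sigma_D^2}\sum_c (y_c - t_c)^2 + \frac{1}{2\sigma_W^2}\sum_i w_i^2 + \text{constant}$$

where the constant absorbs $\log p(D)$ and the Gaussian normalizers, none of which depend on $W$. Multiplying through by $2\sigma_D^2$ does not change the minimizer and, dropping the constant, gives

$$C = E + \frac{\sigma_D^2}{\sigma_W^2}\sum_i w_i^2, \qquad E = \sum_c (y_c - t_c)^2$$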
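As a numerical sanity check, here is a minimal sketch under stated assumptions: a hypothetical linear model y = Σ_i w_i x_i, synthetic random data, and arbitrary choices σ_D = 0.5 (Gaussian prediction noise) and σ_W = 2 (Gaussian weight prior). It computes the full negative log posterior, normalizers included, and checks that, scaled by 2σ_D², it differs from "squared error plus (σ_D²/σ_W²) × sum of squared weights" only by a constant. Comparing two weight vectors cancels that constant. The function names are illustrative, not from any library.

```python
import math
import random

def predict(w, x):
    # Hypothetical linear model: y = sum_i w_i * x_i
    return sum(wi * xi for wi, xi in zip(w, x))

def neg_log_posterior(w, X, t, sigma_d, sigma_w):
    """-log p(D|W) - log p(W); only log p(D), which does not depend on W, is dropped."""
    nll = 0.0
    for x, tc in zip(X, t):
        # Gaussian prediction: t_c ~ N(y_c, sigma_d^2)
        nll += (tc - predict(w, x)) ** 2 / (2 * sigma_d ** 2)
        nll += math.log(math.sqrt(2 * math.pi) * sigma_d)
    nlprior = 0.0
    for wi in w:
        # Gaussian prior: w_i ~ N(0, sigma_w^2)
        nlprior += wi ** 2 / (2 * sigma_w ** 2)
        nlprior += math.log(math.sqrt(2 * math.pi) * sigma_w)
    return nll + nlprior

def weight_decay_cost(w, X, t, lam):
    # E + lam * sum_i w_i^2, where E is the plain sum of squared errors
    E = sum((tc - predict(w, x)) ** 2 for x, tc in zip(X, t))
    return E + lam * sum(wi ** 2 for wi in w)

random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
t = [random.gauss(0, 1) for _ in range(20)]
sigma_d, sigma_w = 0.5, 2.0
lam = sigma_d ** 2 / sigma_w ** 2  # weight decay = ratio of the two variances

w1 = [0.3, -1.2, 0.7]
w2 = [1.0, 0.4, -0.5]
# Differences cancel every W-independent constant
lhs = 2 * sigma_d ** 2 * (neg_log_posterior(w1, X, t, sigma_d, sigma_w)
                          - neg_log_posterior(w2, X, t, sigma_d, sigma_w))
rhs = weight_decay_cost(w1, X, t, lam) - weight_decay_cost(w2, X, t, lam)
print(abs(lhs - rhs) < 1e-9)
```

Because the two costs differ only by a positive scale factor and a constant, they rank every weight vector the same way and so share the same minimizer.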
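So the correct value of the weight decay parameter is the ratio of two variances: the variance of the Gaussian noise in the model's predictions divided by the variance of the Gaussian prior on the weights. It's not just an arbitrary hack.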
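To see the conclusion concretely, here is a minimal sketch assuming a hypothetical one-weight linear model y = w·x, synthetic data with Gaussian noise of standard deviation σ_D = 0.5, and a Gaussian prior with σ_W = 2 (all values chosen arbitrarily). The weight-decay (ridge) solution with λ = σ_D²/σ_W² lands exactly where the gradient of the negative log posterior vanishes, while plain least squares (λ = 0) does not. `ridge_weight` and `grad_neg_log_posterior` are illustrative names only.

```python
import random

def grad_neg_log_posterior(w, xs, ts, sigma_d, sigma_w):
    # d/dw of [ sum_c (t_c - w x_c)^2 / (2 sigma_d^2) + w^2 / (2 sigma_w^2) ]
    data_term = -sum(x * (t - w * x) for x, t in zip(xs, ts)) / sigma_d ** 2
    prior_term = w / sigma_w ** 2
    return data_term + prior_term

def ridge_weight(xs, ts, lam):
    # Closed-form minimizer of sum_c (t_c - w x_c)^2 + lam * w^2
    return sum(x * t for x, t in zip(xs, ts)) / (sum(x * x for x in xs) + lam)

random.seed(1)
sigma_d, sigma_w = 0.5, 2.0
xs = [random.gauss(0, 1) for _ in range(50)]
ts = [1.5 * x + random.gauss(0, sigma_d) for x in xs]  # synthetic targets

lam = sigma_d ** 2 / sigma_w ** 2
w_map = ridge_weight(xs, ts, lam)    # weight decay set to the variance ratio
w_mle = ridge_weight(xs, ts, 0.0)    # no weight decay at all

g_map = grad_neg_log_posterior(w_map, xs, ts, sigma_d, sigma_w)
g_mle = grad_neg_log_posterior(w_mle, xs, ts, sigma_d, sigma_w)
print(abs(g_map) < 1e-9, abs(g_mle) > 1e-3)
```

At `w_mle` the data term is zero, so the remaining gradient is just the prior's pull `w_mle / sigma_w**2`; only the variance-ratio choice of λ balances the two terms.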