Every regularizer limits capacity, but each has its own characteristic effect on how the network develops. Some lead to many small weights; some lead to a mix of very small (or zero) and large weights while avoiding medium-sized ones; some have preferences about the size of the inputs to the units; and so on. Let's consider what happens to a multilayer neural network with logistic hidden units under the various regularization methods.

What is the effect of L2 weight decay on the network? What if the weight decay is excessively strong?

What is the effect of L1 weight decay? What if that weight decay is excessively strong?

What is the effect of adding noise to the input of each unit, i.e. before applying the logistic nonlinearity? Hint: how can the network arrange to suffer as little as possible from that noise?

What is the effect of adding noise after applying the logistic nonlinearity? Same hint: how can the network arrange to suffer as little as possible from that noise?
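To make the four cases concrete, here is a minimal NumPy sketch for a single layer of logistic hidden units. The shapes, the noise level `sigma`, and the penalty strengths `lambda_l2` and `lambda_l1` are illustrative assumptions, not values from the text; the comments point at how the network could dodge each kind of noise, as the hints above suggest.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 4 inputs feeding 3 logistic hidden units.
W = rng.normal(scale=0.5, size=(3, 4))   # incoming weights of the hidden units
x = rng.normal(size=4)                   # one input vector

# --- Weight-decay penalties and their gradients ---
lambda_l2 = 1e-3   # assumed strength
lambda_l1 = 1e-3   # assumed strength

l2_penalty = 0.5 * lambda_l2 * np.sum(W ** 2)
l2_grad = lambda_l2 * W            # pull toward zero proportional to the weight,
                                   # so it produces many small but nonzero weights

l1_penalty = lambda_l1 * np.sum(np.abs(W))
l1_grad = lambda_l1 * np.sign(W)   # constant-size pull toward zero,
                                   # so small weights get driven exactly to zero
                                   # while a few large weights survive

# --- Noise before vs. after the logistic nonlinearity ---
sigma = 0.5
pre_activation = W @ x

# Noise added to a unit's input (before the logistic): the network can limit
# the damage by growing the incoming weights so the unit saturates, where the
# added noise barely moves the output.
noisy_input_output = logistic(pre_activation + sigma * rng.normal(size=3))

# Noise added to a unit's output (after the logistic): the network can limit
# the damage by shrinking that unit's outgoing weights, so the corrupted
# activity has little effect downstream.
noisy_output_activity = logistic(pre_activation) + sigma * rng.normal(size=3)
```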