THE MLP-BGD-2 METHOD

Regression and classification with multilayer perceptron networks
trained using batch gradient descent, with dynamic input-dependent
learning rates

Radford M. Neal, 23 July 1997

This method is the same as mlp-bgd-1, except that the learning rates
(stepsizes) for weights on connections out of input units are adjusted
dynamically, in an attempt to make weights from the more relevant
inputs learn faster, thereby improving the performance of early
stopping.

In detail, the adjustment works as follows.  The magnitude of the
gradient vector restricted to the weights on connections out of each
input unit is computed.  The fourth power of this magnitude is then
found, and these fourth powers are divided by the largest of them,
producing a scaling factor between zero and one for each input.  The
stepsizes that would normally be used for weights out of an input are
multiplied by the corresponding scaling factor.  Other weights and
biases have the same stepsizes as in mlp-bgd-1.

All other aspects of the procedure are also the same as in mlp-bgd-1.
The shell files used to implement the method work the same way as for
mlp-bgd-1.
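
For concreteness, the following is a minimal sketch in Python of the
stepsize adjustment described above.  It is not part of the original
implementation (which was done with shell files around the author's
software); the function names, array shapes, and the guard against an
all-zero gradient are assumptions made for illustration.

    import numpy as np

    def input_stepsize_scales(grad_in_weights):
        # grad_in_weights: assumed shape (n_inputs, n_hidden), the
        # gradient of the training error with respect to the weights
        # on connections out of each input unit.
        #
        # Magnitude of the gradient restricted to each input's
        # outgoing weights.
        mags = np.sqrt(np.sum(grad_in_weights ** 2, axis=1))
        # Fourth power of each magnitude, divided by the largest,
        # giving a scaling factor between zero and one per input.
        fourth = mags ** 4
        top = fourth.max()
        if top == 0.0:
            # Assumed behaviour when all gradients vanish: leave the
            # stepsizes unscaled.
            return np.ones_like(fourth)
        return fourth / top

    def adjusted_input_stepsizes(base_stepsizes, grad_in_weights):
        # Multiply the stepsizes normally used for the weights out of
        # each input (assumed shape (n_inputs, n_hidden), as in
        # mlp-bgd-1) by that input's scaling factor.  Stepsizes for
        # other weights and biases are left unchanged.
        scales = input_stepsize_scales(grad_in_weights)
        return base_stepsizes * scales[:, None]

With this scaling, an input whose outgoing weights currently receive
half the gradient magnitude of the most relevant input gets only
one-sixteenth of its normal stepsize, so weights from apparently
irrelevant inputs move very little before early stopping halts
training.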