THE MLP-BGD-3 METHOD

Regression with multilayer perceptron networks trained using batch gradient descent with static, input-dependent learning rates

Radford M. Neal, 23 July 1997

This method is the same as mlp-bgd-1, except that the learning rates (stepsizes) for weights on connections out of input units are chosen based on correlations with the target, in an attempt to make weights from the more relevant inputs learn faster, thereby improving the performance of early stopping. This method differs from mlp-bgd-2 in that the learning rates are chosen once and for all at the beginning, not adapted dynamically during learning. This method requires a single real-valued target.

The effective learning rates (stepsizes) are actually set indirectly, by scaling the inputs. Scaling an input by a factor of f has the same effect as scaling the stepsize for that input by a factor of f^2, since the gradient is multiplied by f, and the size of the weight needed to get the same effect is multiplied by 1/f.

The scaling for an input, xi, is computed from the average value over the entire training set of xi*t, where t is the target value. The method is used with the standard DELVE encodings, which normalize the inputs and targets based on the median and average absolute deviation. Since this will usually be close to normalizing based on mean and standard deviation, the average value of xi*t will be close to the correlation of the input with the target.

Once ci, the average value of xi*t, is computed for each input, input i is scaled by ci^2 divided by the maximum value of ci^2 over all the inputs, except that the scaling factor is set to 0.01 if it would otherwise be less than this. The effect is similar to the initial adjustment of learning rates in mlp-bgd-2, but for mlp-bgd-3, the scaling is fixed during learning.

All other aspects of the procedure are the same as for mlp-bgd-1 and mlp-bgd-2.
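
The computation of the input scaling factors described above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the code used by the method itself. The function name input_scale_factors, its arguments, and the example data are hypothetical. The sketch assumes the inputs and target have already been normalized, and treats the case where every ci is zero as an error, which the description above does not address.

```python
import numpy as np

def input_scale_factors(x, t, floor=0.01):
    """Return one scaling factor per input (hypothetical helper).

    x: array of shape (n_cases, n_inputs), inputs already normalized
    t: array of shape (n_cases,), single real-valued target, normalized
    floor: smallest scaling factor allowed (0.01 in the description)
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # ci = average over the training set of xi*t
    c = (x * t[:, None]).mean(axis=0)
    c2 = c ** 2
    top = c2.max()
    if top == 0.0:
        # Assumption: this case is not covered by the description
        raise ValueError("all inputs have zero average product with target")
    # ci^2 divided by the largest ci^2, but never below the floor
    return np.maximum(c2 / top, floor)

# Made-up example: three inputs with ci = 1, 0.5 and 0.05
t = np.array([1.0, -1.0, 1.0, -1.0])
x = np.column_stack([1.0 * t, 0.5 * t, 0.05 * t])
scale = input_scale_factors(x, t)   # factors 1, 0.25 and 0.01
x_scaled = x * scale                # inputs as the network would see them
```

In this example the third input would get a factor of 0.0025, so it is raised to 0.01. Since scaling an input by f scales its effective stepsize by f^2, a factor of 0.01 corresponds to an effective stepsize 0.0001 times that of the most relevant input.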

The "runr" shell script used to implement the method works the same way as for mlp-bgd-1 and mlp-bgd-2. The "corrscale" program is used to produce a set of arguments to data-spec that implement the scaling.