Learning by perturbing weights
Randomly perturb one weight and see
if it improves performance. If so, save
the change.
This is very inefficient. We need to do
multiple forward passes on a
representative set of training data
just to change one weight.
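The procedure above can be sketched as a simple hill climber. This is a minimal sketch under loudly stated assumptions: the "network" is a hypothetical linear model with no hidden units, performance is mean squared error, perturbations are Gaussian, and the names `loss` and `perturb_one_weight` are illustrative only. It makes the cost visible: every single-weight trial needs a full pass over the training set.

```python
import random

def loss(weights, data):
    """Mean squared error of a hypothetical linear model y = w . x.
    Each call is one forward pass over the whole training set."""
    total = 0.0
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(data)

def perturb_one_weight(weights, data, steps, scale, rng):
    """Perturb one randomly chosen weight per step; keep the change only
    if the loss improves. Returns (weights, final loss, passes over data)."""
    weights = list(weights)
    current = loss(weights, data)
    passes = 1
    for _ in range(steps):
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = old + rng.gauss(0.0, scale)
        trial = loss(weights, data)  # a full pass just to judge one weight
        passes += 1
        if trial < current:
            current = trial          # improvement: save the change
        else:
            weights[i] = old         # no improvement: undo it
    return weights, current, passes

rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]            # hypothetical target weights
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

start = [0.0, 0.0, 0.0]
w, final_loss, passes = perturb_one_weight(start, data, steps=500, scale=0.1, rng=rng)
print(passes)  # 501
```

Here 500 attempted single-weight changes cost 501 passes over all 50 examples, whether or not each change was kept.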
Towards the end of learning, large
weight perturbations will nearly
always make things worse.
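The claim about large perturbations late in learning can be checked numerically. This is a minimal sketch assuming a hypothetical quadratic loss L(w) = sum(w_i^2), whose minimum is at the origin, with the weights already close to it; the function name `fraction_improving` is illustrative. It counts how often a random Gaussian perturbation of all weights lowers the loss.

```python
import random

def fraction_improving(offset, sigma, dim=10, trials=2000, seed=0):
    """Start near the minimum of L(w) = sum(w_i^2), at w_i = offset for
    every i, and return the fraction of random perturbations of size
    sigma that lower L."""
    rng = random.Random(seed)
    w = [offset] * dim
    base = sum(wi * wi for wi in w)
    improved = 0
    for _ in range(trials):
        trial = [wi + rng.gauss(0.0, sigma) for wi in w]
        if sum(t * t for t in trial) < base:
            improved += 1
    return improved / trials

small = fraction_improving(offset=0.01, sigma=0.001)  # step small relative to distance
large = fraction_improving(offset=0.01, sigma=1.0)    # step large relative to distance
print(small, large)
```

With small steps, a bit under half of the trials help; with large steps, they overshoot the nearby minimum and essentially none do.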
We could randomly perturb all the
weights in parallel and correlate the
performance gain with the weight
changes.
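Perturbing all weights in parallel and correlating the performance gain with each weight's change gives, on average, an estimate of the gradient. This is a minimal sketch, assuming Gaussian perturbations with standard deviation sigma and a hypothetical test loss L(w) = sum(w_i^2); `correlation_gradient` is an illustrative name. To first order, E[gain * delta_i] = -sigma^2 * dL/dw_i, so the sketch negates and rescales the average.

```python
import random

def correlation_gradient(loss, w, sigma, trials, rng):
    """Perturb every weight at once on each trial, then correlate the
    performance gain (drop in loss) with each weight's perturbation.
    Returns an estimate of dL/dw_i for every i."""
    base = loss(w)
    sums = [0.0] * len(w)
    for _ in range(trials):
        delta = [rng.gauss(0.0, sigma) for _ in w]
        gain = base - loss([wi + di for wi, di in zip(w, delta)])
        for i, di in enumerate(delta):
            sums[i] += gain * di
    # E[gain * delta_i] ~= -sigma^2 * dL/dw_i
    return [-s / (trials * sigma ** 2) for s in sums]

def sq(w):
    """Hypothetical test loss with gradient 2 * w."""
    return sum(wi * wi for wi in w)

rng = random.Random(0)
w0 = [1.0, -0.5, 0.25]
est = correlation_gradient(sq, w0, sigma=0.01, trials=20000, rng=rng)
exact = [2.0 * wi for wi in w0]
print(est, exact)
```

Even with only three weights, tens of thousands of trials are needed before the estimate settles close to the exact gradient.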
This is no better, because we need
lots of trials to “see” the effect of
changing one weight through the
noise created by all the others.
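The noise problem can be measured directly. This is a minimal sketch under the same assumptions as a correlation estimate with Gaussian perturbations: a hypothetical loss L(w) = sum(w_i^2) evaluated at w = (1, ..., 1), so the true dL/dw_0 is 2.0, and illustrative function names. With a fixed budget of trials, it compares the error in the estimate for one weight when there are few versus many other weights being perturbed at the same time.

```python
import math
import random

def estimate_first_grad(n_weights, trials, sigma, rng):
    """Perturb all n_weights of L(w) = sum(w_i^2) at w = (1, ..., 1) and
    estimate dL/dw_0 (exactly 2.0) by correlating the loss change with
    the perturbation of weight 0 alone."""
    w = [1.0] * n_weights
    base = float(n_weights)  # L(w) at w = (1, ..., 1)
    acc = 0.0
    for _ in range(trials):
        delta = [rng.gauss(0.0, sigma) for _ in w]
        change = sum((wi + di) ** 2 for wi, di in zip(w, delta)) - base
        acc += change * delta[0]
    return acc / (trials * sigma ** 2)

def rms_error(n_weights, trials=200, sigma=0.01, repeats=20, seed=0):
    """Root-mean-square error of the weight-0 estimate over several runs."""
    rng = random.Random(seed)
    errs = [estimate_first_grad(n_weights, trials, sigma, rng) - 2.0
            for _ in range(repeats)]
    return math.sqrt(sum(e * e for e in errs) / repeats)

few = rms_error(2)     # one other weight adds noise
many = rms_error(100)  # 99 other weights add noise
print(few, many)
```

With 200 trials, the estimate for weight 0 is reasonably accurate when only one other weight is perturbed, but with 99 others its error grows to the same order as the true value of 2.0.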
[Figure: network diagram with layers of input units, hidden units, and output units]