Stochastic gradient descent
If the dataset is highly redundant, the gradient
on the first half is almost identical to the gradient
on the second half.
So instead of computing the full gradient,
update the weights using the gradient on the
first half and then get a gradient for the new
weights on the second half.
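
A minimal sketch of the idea above, assuming a one-parameter linear model y = w*x with squared error. The dataset, learning rate, and the helper name gradient are all illustrative choices, not part of the original slide. It compares the gradient on each half with the full gradient, then compares one full-gradient update with two half-data updates that cost the same single pass over the data.

```python
import random


def gradient(w, data):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w over `data`."""
    return sum((w * x - y) * x for x, y in data) / len(data)


random.seed(0)

# Hypothetical redundant dataset: every example follows y = 3x plus small
# noise, so the two halves carry nearly the same information.
data = []
for _ in range(1000):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + random.gauss(0.0, 0.01)))
first_half, second_half = data[:500], data[500:]

w0 = 0.0
g_first = gradient(w0, first_half)
g_second = gradient(w0, second_half)
g_full = gradient(w0, data)
print("gradients (first half, second half, full):", g_first, g_second, g_full)

lr = 0.5  # assumed learning rate

# One full-gradient step: one pass over the data, one weight update.
w_full = w0 - lr * g_full

# Two half-data steps: the same single pass, but two weight updates.
# The second gradient is evaluated at the already-updated weight.
w_half = w0 - lr * g_first
w_half = w_half - lr * gradient(w_half, second_half)

print("after one full step:", w_full)
print("after two half steps:", w_half)
```

Under these assumptions the two half-gradients should come out close to each other, and the two half-data updates should end nearer to the generating value w = 3 than the single full update, for the same amount of gradient computation.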
The extreme version is to update the weights
after each example, but balanced mini-batches
are usually just as good and faster in MATLAB.
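
The same update rule can be written once with the batch size as a parameter, so that a batch size of 1 gives the per-example extreme. This is a sketch under assumptions: the function name sgd, the toy data, and the hyperparameters are hypothetical, and shuffling each epoch is used here only as a simple stand-in for balancing the mini-batches.

```python
import random


def sgd(data, batch_size, lr=0.1, epochs=5, seed=0):
    """Fit y ~ w*x by mini-batch SGD; batch_size=1 updates after each example."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # draw fresh random mini-batches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Mean gradient of 0.5 * (w*x - y)**2 over the mini-batch.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


# Hypothetical noise-free data generated from y = 3x.
rng = random.Random(1)
data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(1000))]

w_online = sgd(data, batch_size=1)   # update after every example
w_mini = sgd(data, batch_size=10)    # update after every 10 examples
print("online:", w_online, "mini-batch:", w_mini)
```

Both settings should recover a weight close to 3 on this toy problem. The plain-Python loop does not show the speed difference; in a matrix-oriented environment such as MATLAB, a mini-batch gradient can be computed with a single matrix operation, which is presumably the advantage the text refers to.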