lec10

Stochastic gradient descent

• If the dataset is highly redundant, the gradient on the first half is almost identical to the gradient on the second half.

– So instead of computing the full gradient, update the weights using the gradient on the first half and then get a gradient for the new weights on the second half.

– The extreme version is to update the weights after each example, but balanced mini-batches are just as good and faster in MATLAB.
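The points above can be illustrated with a minimal sketch. It is not from the lecture: the toy linear model y = 2x + 1, the helper names `gradient` and `minibatch_sgd`, and the learning rate, batch size, and epoch count are all assumptions chosen for illustration. It uses plain Python rather than MATLAB, so it shows only the update pattern, not the speed benefit of vectorized mini-batches.

```python
import random


def gradient(w, b, batch):
    """Mean squared-error gradient of the model y = w*x + b over one batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    n = len(batch)
    return gw / n, gb / n


def minibatch_sgd(data, batch_size, lr=0.1, epochs=10, seed=0):
    """Update the weights after every mini-batch instead of after a full pass."""
    rng = random.Random(seed)
    data = list(data)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle so each mini-batch is a fresh random sample
        for i in range(0, len(data), batch_size):
            gw, gb = gradient(w, b, data[i:i + batch_size])
            w -= lr * gw
            b -= lr * gb
    return w, b


# Hypothetical, highly redundant toy dataset: 4000 noisy samples of y = 2x + 1.
rng = random.Random(1)
data = []
for _ in range(4000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)))

# The gradient at the same weights, computed on each half separately.
g_first = gradient(0.0, 0.0, data[:2000])
g_second = gradient(0.0, 0.0, data[2000:])

# Mini-batch training; batch_size=1 would be the "extreme version".
w, b = minibatch_sgd(data, batch_size=10)
```

Comparing `g_first` with `g_second` shows how close the two half-dataset gradients are on redundant data, and `w`, `b` should end up near the generating values 2 and 1.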