• If the dataset is highly redundant, the gradient on the first half is almost identical to the gradient on the second half.
  – So instead of computing the full gradient, update the weights using the gradient on the first half and then get a gradient for the new weights on the second half.
  – The extreme version is to update the weights after each example, but balanced mini-batches are just as good and faster in MATLAB.
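The point about redundant data can be checked on a toy problem. The sketch below is only an illustration under stated assumptions: the one-weight linear model, the synthetic data (whose second half is a noisy copy of the first), the learning rate, and the names `gradient`, `loss`, `first_half`, and `second_half` are all hypothetical and not taken from the lecture. It compares one full-batch update against two half-batch updates, which cost the same number of per-example gradient computations.

```python
import random

def gradient(w, data):
    """Mean gradient of the squared error 0.5 * (w*x - y)^2 over `data`."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def loss(w, data):
    """Mean squared error 0.5 * (w*x - y)^2 over `data`."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in data) / len(data)

random.seed(0)

# Assumed toy data: y = 3x plus small noise.
xs = [random.uniform(-1, 1) for _ in range(50)]
first_half = [(x, 3.0 * x + random.gauss(0, 0.01)) for x in xs]
# The second half is a slightly perturbed copy, so the dataset is highly redundant.
second_half = [(x, y + random.gauss(0, 0.01)) for x, y in first_half]
dataset = first_half + second_half

lr = 0.5   # assumed learning rate
w0 = 0.0   # initial weight

# With redundant data, the two half-gradients at the same weights nearly agree.
g1 = gradient(w0, first_half)
g2 = gradient(w0, second_half)

# One full-batch step: 100 per-example gradients, one update.
w_full = w0 - lr * gradient(w0, dataset)

# Two half-batch steps: also 100 per-example gradients, but two updates.
w_half = w0 - lr * gradient(w0, first_half)
w_half = w_half - lr * gradient(w_half, second_half)

print("half-gradients:", g1, g2)
print("loss after one full-batch step: ", loss(w_full, dataset))
print("loss after two half-batch steps:", loss(w_half, dataset))
```

Because the second update is taken at the already-improved weights, the two half-batch steps should end closer to the optimum than the single full-batch step for the same amount of gradient computation. The size of the gap depends on the assumed learning rate and data.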