Ways to use weight derivatives
How often to update
After each training case?
After a full sweep through the training data?
After a “mini-batch” of training cases?
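The three choices above differ only in how many training cases contribute to the gradient before the weights change. A minimal sketch, assuming a hypothetical one-weight linear model with squared error (the names `grad`, `train`, and `batch_size`, and the toy data, are illustrative and not from the slides):

```python
def grad(w, xs, ys):
    """Mean gradient of 0.5 * (w*x - y)**2 with respect to w over the given cases."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, batch_size, lr=0.05, epochs=200):
    """Gradient descent that updates the weight once per batch.

    batch_size == 1        -> update after each training case (online)
    batch_size == len(xs)  -> update after a full sweep (full batch)
    anything in between    -> update after a mini-batch
    """
    w = 0.0
    updates = 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            w -= lr * grad(w, bx, by)
            updates += 1
    return w, updates


# Toy, noise-free data generated by y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

online_w, online_updates = train(xs, ys, batch_size=1)
full_w, full_updates = train(xs, ys, batch_size=len(xs))
mini_w, mini_updates = train(xs, ys, batch_size=2)
```

For the same number of sweeps through the data, the online version makes the most weight updates and the full-batch version the fewest; the mini-batch version sits in between.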
How much to update
Use a fixed learning rate?
Adapt the learning rate?
Add momentum?
Don’t use steepest descent?
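The first three options above can be compared on a toy problem. A minimal sketch, assuming a hypothetical one-dimensional quadratic error with its minimum at w = 3; the adaptive rule shown (grow the rate after a step that lowers the error, shrink it after a step that would raise it) is only one simple heuristic among many and is not necessarily the scheme the slides have in mind:

```python
def error(w):
    """Toy error surface with its minimum at w = 3 (illustrative assumption)."""
    return 0.5 * (w - 3.0) ** 2


def gradient(w):
    """Derivative of error(w) with respect to w."""
    return w - 3.0


def fixed_rate(w, lr=0.1, steps=300):
    """Steepest descent with a constant learning rate."""
    for _ in range(steps):
        w -= lr * gradient(w)
    return w


def adaptive_rate(w, lr=0.1, steps=300, up=1.1, down=0.5):
    """Steepest descent with a simple adaptive learning rate.

    A step is accepted only if it does not increase the error; the rate
    is multiplied by `up` after an accepted step and by `down` after a
    rejected one.
    """
    prev = error(w)
    for _ in range(steps):
        candidate = w - lr * gradient(w)
        e = error(candidate)
        if e <= prev:
            w, prev = candidate, e
            lr *= up
        else:
            lr *= down
    return w


def with_momentum(w, lr=0.1, mu=0.9, steps=300):
    """Gradient descent with momentum: the gradient changes a velocity,
    and the velocity changes the weight."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * gradient(w)
        w += v
    return w


w_fixed = fixed_rate(0.0)
w_adaptive = adaptive_rate(0.0)
w_momentum = with_momentum(0.0)
```

All three variants still follow the gradient; the last option on the slide, not using steepest descent at all, is not covered by this sketch.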