Logistic regression (jump to page 205)
When there are only two classes we can model
the conditional probability of the positive class as
If we use the right error function, something nice
happens: The gradient of the logistic and the
gradient of the error function cancel each other: