A natural way to try to prove convergence
The obvious approach is to write down an error function
and try to show that each step of the learning procedure
reduces the error.
For stochastic online learning we would like to show
that each step reduces the expected error, where the
expectation is taken over the random choice of training case.
The error measure cannot be a squared error, because the size
of the perceptron's update does not depend on the size of the mistake.
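The fixed-size update is easy to see in a minimal sketch of the standard
perceptron rule (the function name and the {0, 1} target convention below
are illustrative, not from the lecture): the weight change is always plus
or minus the input vector, never scaled by how badly the case was misclassified.

```python
import numpy as np

# A minimal sketch of one online perceptron update for a binary threshold unit.
# The change to the weights is +/- the input vector itself: its size does not
# depend on how far the weighted sum was from the threshold.
def perceptron_step(weights, x, target):
    """One update for a threshold unit; target is 0 or 1, bias folded into x."""
    prediction = 1 if np.dot(weights, x) >= 0 else 0
    if prediction == target:
        return weights          # correct: leave the weights alone
    if target == 1:
        return weights + x      # false negative: add the input vector
    return weights - x          # false positive: subtract the input vector
```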
The textbook uses the sum of the distances of the misclassified
cases on the wrong side of the decision surface as the error measure,
and concludes that the perceptron convergence procedure is not
guaranteed to reduce this total error at each step.
This is true for that error function even if there is a set of
weights that gets the right answer for every training case.
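A small numerical example, with made-up data, illustrates this. In the
sketch below, one perceptron update corrects the misclassified case, yet
the total wrong-side distance over both cases goes up, even though the
two cases are linearly separable (the specific numbers are hypothetical,
chosen only to make the effect visible).

```python
import numpy as np

def wrong_side_error(w, xs, targets):
    """Sum of distances of misclassified cases from the decision surface w.x = 0."""
    total = 0.0
    for x, t in zip(xs, targets):
        prediction = 1 if np.dot(w, x) >= 0 else 0
        if prediction != t:
            total += abs(np.dot(w, x)) / np.linalg.norm(w)
    return total

# Two training cases (illustrative numbers). They are linearly separable:
# for example w = (1, -0.4) gets both of them right.
xs      = [np.array([0.5, 2.0]),   # case A, target 0
           np.array([1.0, 1.0])]   # case B, target 1
targets = [0, 1]

w0 = np.array([1.0, 0.0])          # initial weights: A is wrong, B is right
print(wrong_side_error(w0, xs, targets))   # 0.5   (only A is misclassified)

w1 = w0 - xs[0]                    # perceptron update on the misclassified case A
print(wrong_side_error(w1, xs, targets))   # ~0.73 (A is now right, but B is worse)
```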