Why the learning procedure works
Consider the squared distance between any satisfactory weight vector (one that gets every training case right) and the current weight vector.
Every time the perceptron makes a mistake, the learning algorithm reduces the squared distance between the current weight vector and any satisfactory weight vector, unless that satisfactory vector lies so close to the constraint plane that the fixed-size update overshoots it, carrying the current vector past it to the other side.
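To make the claim concrete, here is a minimal Python sketch (not from the lecture; the data, the satisfactory vector w_sat, and the function name are all illustrative) that applies the standard perceptron update on each mistake and reports how the squared distance to w_sat changes:

```python
import numpy as np

def perceptron_step(w, x, label):
    """Standard perceptron rule: on a mistake, add the input vector
    for a missed positive case, subtract it for a missed negative."""
    prediction = 1 if w @ x > 0 else 0
    if prediction == label:
        return w                       # correct, so no update
    return w + x if label == 1 else w - x

rng = np.random.default_rng(0)
w_sat = np.array([2.0, -1.0, 0.5])     # a weight vector assumed satisfactory
w = np.zeros(3)                        # current weight vector
for _ in range(20):
    x = rng.normal(size=3)
    label = 1 if w_sat @ x > 0 else 0  # labels defined by w_sat itself
    before = np.sum((w_sat - w) ** 2)
    w = perceptron_step(w, x, label)
    after = np.sum((w_sat - w) ** 2)
    if after != before:                # an update happened
        print(f"mistake: squared distance {before:.3f} -> {after:.3f}")
```

Because the labels are generated by w_sat, it classifies every sampled case correctly; mistakes typically shrink the printed distance, and any increase is exactly the overshoot case noted above.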
So consider “generously satisfactory” weight vectors: those that lie within the feasible region by a margin at least as great as the largest update. Every time the perceptron makes a mistake, the squared distance to all of these weight vectors decreases by at least the squared length of the smallest update vector.
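The guarantee follows from a short calculation. As a sketch, take a misclassified positive case with input vector x, so w·x ≤ 0 and the update is w′ = w + x, and read “generously satisfactory” as w*·x ≥ ‖x‖² (w* clears this constraint plane by a margin of at least ‖x‖):

```latex
\begin{align*}
\|w^{*} - w'\|^{2} &= \|w^{*} - w - x\|^{2} \\
  &= \|w^{*} - w\|^{2} - 2\,(w^{*} - w)\cdot x + \|x\|^{2} \\
  &\le \|w^{*} - w\|^{2} - 2\|x\|^{2} + \|x\|^{2}
      \qquad (w^{*}\cdot x \ge \|x\|^{2},\; w\cdot x \le 0) \\
  &= \|w^{*} - w\|^{2} - \|x\|^{2}.
\end{align*}
```

The same algebra with the signs flipped handles a misclassified negative case, so every mistake costs at least ‖x‖² of squared distance, which is why the procedure can only make a bounded number of mistakes.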
[Figure: weight space, showing the “right” and “wrong” sides of a constraint plane.]