Using “least squares” for classification
This is not the right thing to do and it doesn’t work as well as better methods, but it is easy:
• It reduces classification to least squares regression.
• We already know how to do regression: we can just solve for the optimal weights with some matrix algebra (see lecture 2, and the first sketch after this list).
• We use targets that are equal to the conditional probability of the class given the input.
• When there are more than two classes, we treat each class as a separate regression problem (we can get away with this if we use the “max” decision function; a multi-class sketch follows the binary one below).