   
   
• This is not the right thing to do and it doesn’t work as well as better methods, but it is easy:
   
   
  – It reduces classification to least squares regression.
   
   
  – We already know how to do regression. We can just solve for the optimal weights with some matrix algebra (see lecture 2).
   
   
• We use targets that are equal to the conditional probability of the class given the input.
   
   
  – When there are more than two classes, we treat each class as a separate problem (we cannot get away with this if we use the “max” decision function).
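
The reduction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the lecture's own code: the function names (`solve_linear_system`, `fit_least_squares`, `fit_classifier`, `class_scores`), the toy data, and the choice to solve the normal equations (X^T X) w = X^T t by elimination are all hypothetical. For labelled training data the conditional probability of a class is taken to be 1 for the correct class and 0 otherwise, and each class is fitted as its own regression problem.

```python
def solve_linear_system(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination.

    Assumes A is invertible; a singular A raises ZeroDivisionError.
    """
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_least_squares(X, t):
    """Return weights w minimising sum_n (w . x_n - t_n)^2.

    Uses the normal equations (X^T X) w = X^T t.
    """
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xtt = [sum(x[i] * t_n for x, t_n in zip(X, t)) for i in range(d)]
    return solve_linear_system(XtX, Xtt)


def fit_classifier(inputs, labels, num_classes):
    """Fit one independent least squares regression per class."""
    X = [[1.0] + list(x) for x in inputs]  # constant 1.0 acts as a bias input
    weights = []
    for k in range(num_classes):
        # Target is 1 when the example belongs to class k, otherwise 0.
        targets = [1.0 if y == k else 0.0 for y in labels]
        weights.append(fit_least_squares(X, targets))
    return weights


def class_scores(weights, x):
    """Regression output for each class; a rough stand-in for P(class | x)."""
    xb = [1.0] + list(x)
    return [sum(w_i * x_i for w_i, x_i in zip(w, xb)) for w in weights]


# Toy data (an assumption for illustration): one input dimension, two classes.
inputs = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]

weights = fit_classifier(inputs, labels, num_classes=2)
print(weights)
print(class_scores(weights, [3.0]))
```

On this toy data the fitted output for class 1 at x = 3 is roughly 1.1, which is not a valid probability. That is one concrete way of seeing why the approach is easy but not the right thing to do. The sketch deliberately stops at per-class scores and does not combine them with a “max” decision, given the caveat above.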