• A simple linear discriminant function is a projection of the data down to 1-D.
    – So choose the projection that gives the best separation of the classes. What do we mean by “best separation”?
• An obvious direction to choose is the direction of the line joining the class means.
    – But if the main direction of variance in each class is not orthogonal to this line, this will not give good separation (see the next figure).
• Fisher’s method chooses the direction that maximizes the ratio of between-class variance to within-class variance.
    – This is the direction in which the projected points contain the most information about class membership (under Gaussian assumptions).
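
The comparison between the two candidate directions can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not part of the original slide: it assumes exactly two classes, uses synthetic Gaussian data whose main direction of variance is not orthogonal to the line joining the means, and relies on the standard two-class closed form w ∝ S_W⁻¹(m₂ − m₁) for Fisher's direction. The function names (`within_class_scatter`, `fisher_criterion`, `fisher_direction`) are hypothetical helpers introduced only for this example.

```python
import numpy as np


def within_class_scatter(X1, X2):
    """Sum of the two per-class scatter matrices (S_W)."""
    d1 = X1 - X1.mean(axis=0)
    d2 = X2 - X2.mean(axis=0)
    return d1.T @ d1 + d2.T @ d2


def fisher_criterion(w, X1, X2):
    """Between-class over within-class variance of the 1-D projections onto w."""
    mean_diff = X2.mean(axis=0) - X1.mean(axis=0)
    between = (w @ mean_diff) ** 2
    within = w @ within_class_scatter(X1, X2) @ w
    return between / within


def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m2 - m1) (two-class case only)."""
    mean_diff = X2.mean(axis=0) - X1.mean(axis=0)
    w = np.linalg.solve(within_class_scatter(X1, X2), mean_diff)
    return w / np.linalg.norm(w)


# Synthetic data (an assumption for illustration): both classes are elongated
# along (1, 1), which is not orthogonal to the line joining the class means.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([3.0, 1.0], cov, size=200)

# Candidate 1: the direction of the line joining the class means.
w_means = X2.mean(axis=0) - X1.mean(axis=0)
w_means = w_means / np.linalg.norm(w_means)

# Candidate 2: Fisher's direction.
w_fisher = fisher_direction(X1, X2)

J_means = fisher_criterion(w_means, X1, X2)
J_fisher = fisher_criterion(w_fisher, X1, X2)
print("criterion along the mean-difference direction:", J_means)
print("criterion along Fisher's direction:", J_fisher)
```

Because Fisher's direction maximizes this ratio, `J_fisher` can never be smaller than `J_means`. The gap widens as the within-class variance lines up more closely with the line joining the means.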