Fisher’s linear discriminant
A simple linear discriminant function is a projection of the
data down to 1-D.
So choose the projection that gives the best
separation of the classes. What do we mean by “best
separation”?
An obvious direction to choose is the direction of the line
joining the class means.
But if the main direction of variance in each class is
not orthogonal to this line, this will not give good
separation (see the next figure).
Fisher’s method chooses the direction that maximizes
the ratio of between class variance to within class
variance.
This is the direction in which the projected points
contain the most information about class membership
(under Gaussian assumptions)