CSC2515 Fall 2007
Introduction to Machine Learning
Lecture 3: Linear Classification Methods
What is “linear” classification?
Representing the target values for classification
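The usual choice for multi-class targets is 1-of-K ("one-hot") coding; a minimal NumPy sketch (the helper name `one_hot` is illustrative, not from the slides):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as 1-of-K target vectors, one per row."""
    labels = np.asarray(labels)
    targets = np.zeros((labels.size, num_classes))
    targets[np.arange(labels.size), labels] = 1.0
    return targets

print(one_hot([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```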
Three approaches to classification
Reminder: Three different spaces that are easy to confuse
Discriminant functions for N>2 classes
Problems with multi-class discriminant functions
Using “least squares” for classification
Problems with using least squares for classification
Another example where least squares regression gives poor decision surfaces
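For reference, "least squares for classification" here means regressing 1-of-K targets onto the inputs and classifying by the largest output; a minimal sketch under that assumption (function names are illustrative):

```python
import numpy as np

def fit_least_squares(X, T):
    """Minimize ||X_aug W - T||^2 for 1-of-K targets T; bias via a column of ones."""
    X_aug = np.hstack([np.ones((len(X), 1)), X])
    W, *_ = np.linalg.lstsq(X_aug, T, rcond=None)
    return W

def predict(W, X):
    """Assign each input to the class whose linear output is largest."""
    X_aug = np.hstack([np.ones((len(X), 1)), X])
    return np.argmax(X_aug @ W, axis=1)
```

The poor decision surfaces on these slides come from the squared error penalizing points that are "too correct", so a few outliers far on the right side of the boundary can drag it badly.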
A picture showing the advantage of Fisher’s linear discriminant
Math of Fisher’s linear discriminants
More math of Fisher’s linear discriminants
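The closed form behind these slides is the standard two-class result w ∝ S_W⁻¹(m₂ − m₁), where S_W is the within-class scatter; a minimal NumPy sketch (helper name illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction w ∝ S_W^{-1} (m2 - m1) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: summed (not averaged) scatter of each class.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)
```

A classification threshold on the 1-D projections w·x is then chosen separately.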
The perceptron convergence procedure
A natural way to try to prove convergence
A better way to prove convergence (using the convexity of the set of solutions in weight space)
Why the learning procedure works
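The procedure itself is simple to state in code: cycle through the training cases and add or subtract each misclassified input vector. A minimal sketch, assuming ±1 targets and a bias absorbed as an extra constant-1 input:

```python
import numpy as np

def perceptron(X, t, max_epochs=100):
    """Perceptron convergence procedure. Targets t must be +1/-1.
    Terminates once an epoch passes with no errors, which is guaranteed
    to happen if the data are linearly separable."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb the bias
    w = np.zeros(X_aug.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X_aug, t):
            if target * (w @ x) <= 0:   # on the wrong side of (or on) the plane
                w += target * x         # add/subtract the input vector
                errors += 1
        if errors == 0:
            break
    return w
```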
Why connectedness is hard to compute
Distinguishing T from C in any orientation and position
Logistic regression
The natural error function for the logistic
Using the chain rule to get the error derivatives
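For the logistic unit y = σ(wᵀx), the natural (cross-entropy) error and its chain-rule gradient take the standard forms sketched below (names illustrative); the derivative terms collapse to the simple expression Σₙ (yₙ − tₙ) xₙ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, t):
    """E = -sum_n [t_n log y_n + (1 - t_n) log(1 - y_n)] for targets t in {0, 1}."""
    y = sigmoid(X @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, X, t):
    """Chain rule: dE/dy * dy/dz * dz/dw simplifies to sum_n (y_n - t_n) x_n."""
    return X.T @ (sigmoid(X @ w) - t)
```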
The cross-entropy or “softmax” error function for multi-class classification
A special case of softmax for two classes
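A minimal sketch of softmax, including a check of the two-class special case, where softmax collapses to a logistic of the difference of the two inputs:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)  # shift is safe: softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Two-class special case: softmax(z)[0] equals the logistic of z[0] - z[1].
z = np.array([1.3, -0.4])
p = softmax(z)
logistic = 1.0 / (1.0 + np.exp(-(z[0] - z[1])))
assert np.isclose(p[0], logistic)
```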
Probabilistic Generative Models for Discrimination
A simple example for continuous inputs
A picture of the two Gaussian models and the resulting posterior for the red class
A way of thinking about the role of the inverse covariance matrix
The posterior when the covariance matrices are different for different classes
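A minimal sketch of the generative approach on these slides, assuming maximum-likelihood Gaussians with a shared covariance matrix, which is what makes the posterior a logistic of a linear function of x (all names illustrative):

```python
import numpy as np

def fit_gaussian_generative(X, t, num_classes):
    """ML fit: a prior and mean per class, one shared covariance.
    Class-specific covariances would give quadratic boundaries instead."""
    t = np.asarray(t)
    priors, means = [], []
    cov = np.zeros((X.shape[1], X.shape[1]))
    for k in range(num_classes):
        Xk = X[t == k]
        priors.append(len(Xk) / len(X))
        means.append(Xk.mean(axis=0))
        cov += (Xk - means[k]).T @ (Xk - means[k])
    return np.array(priors), np.array(means), cov / len(X)

def posterior(X, priors, means, cov):
    """p(class | x) by Bayes' rule; constants shared across classes cancel."""
    P = np.linalg.inv(cov)  # the inverse covariance (precision) matrix
    scores = np.stack([
        -0.5 * np.einsum('ni,ij,nj->n', X - m, P, X - m) + np.log(p)
        for m, p in zip(means, priors)
    ], axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)
```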
Two ways to train a set of class-specific generative models
An example where the two types of training behave very differently
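Presumably the two ways of training are fitting each class’s model by maximum likelihood on its own data versus maximizing the conditional likelihood of the labels under the same model; as objectives:

$$\mathcal{L}_{\text{gen}}(\theta)=\sum_n \log p(\mathbf{x}_n, t_n\mid\theta), \qquad \mathcal{L}_{\text{disc}}(\theta)=\sum_n \log p(t_n\mid \mathbf{x}_n,\theta)=\sum_n \log\frac{p(\mathbf{x}_n, t_n\mid\theta)}{\sum_k p(\mathbf{x}_n, k\mid\theta)}.$$

The two give different fits whenever the model class is wrong: the generative objective spends capacity modeling the inputs, while the discriminative one spends it all on the decision boundary.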