CSC2515 Fall 2007
 Introduction to Machine Learning

Lecture 3: Linear Classification Methods

What is “linear” classification?

Representing the target values for classification

Three approaches to classification


Reminder: Three different spaces that are easy to confuse

Discriminant functions for N>2 classes

Problems with multi-class discriminant functions

A simple solution
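For reference, a standard way to write this solution (a sketch; the symbols y_k, w_k, w_{k0} are generic notation, not necessarily the lecture's): use one linear function per class,

\[ y_k(\mathbf{x}) = \mathbf{w}_k^\top \mathbf{x} + w_{k0}, \]

and assign x to the class with the largest y_k(x). Because the decision depends only on which y_k is largest, the resulting decision regions are convex and singly connected.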

Using “least squares” for classification
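One common formulation of this approach (a sketch assuming a 1-of-K target matrix T and an input matrix X-tilde with a column of ones appended; the notation is mine, not necessarily the slides'):

\[ \tilde{W} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top T, \qquad \hat{y}(\mathbf{x}) = \arg\max_k \, [\tilde{W}^\top \tilde{\mathbf{x}}]_k . \]

Each column of W-tilde is the least-squares regression onto one class's indicator variable, which is where the problems discussed next (sensitivity to points far from the boundary, masking of classes) come from.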

Problems with using least squares for classification

Another example where least squares regression gives poor decision surfaces

Fisher’s linear discriminant

A picture showing the advantage of Fisher’s linear discriminant.

Math of Fisher’s linear discriminants

More math of Fisher’s linear discriminants
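For the two-class case, the standard form of the criterion (a sketch; m_1, m_2 denote the class means, S_W the within-class scatter matrix, and S_B the between-class scatter, in my notation):

\[ J(\mathbf{w}) = \frac{\mathbf{w}^\top S_B \mathbf{w}}{\mathbf{w}^\top S_W \mathbf{w}}, \qquad S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top, \]

which is maximized (up to an irrelevant scale factor) by

\[ \mathbf{w} \propto S_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1), \]

i.e. the direction between the class means, corrected by the within-class covariance.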

Perceptrons

The perceptron convergence procedure
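A minimal sketch of the procedure in Python (assuming targets coded as ±1 and the bias folded into the weight vector by appending a constant 1 to each input; the function name and data layout are mine, not from the lecture):

```python
import numpy as np

def perceptron_train(X, t, max_epochs=100):
    """Perceptron convergence procedure (sketch).

    X: (N, D) array of inputs; t: (N,) array of targets in {-1, +1}.
    The bias is handled by appending a constant 1 to each input vector.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # fold bias into the weights
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xb, t):
            if target * np.dot(w, x) <= 0:   # misclassified (or on the boundary)
                w += target * x              # add or subtract the input vector
                errors += 1
        if errors == 0:                      # every training case is now correct
            break
    return w
```

If the training cases are linearly separable, this loop is guaranteed to stop after a finite number of weight updates, which is what the convergence proofs on the following slides establish.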

A natural way to try to prove convergence

Weight and data space

A better way to prove the convergence (using the convexity of the solutions in weight-space)

Why the learning procedure works

What perceptrons cannot learn

What can perceptrons do?

The N-bit even parity task

Why connectedness is hard to compute

Distinguishing T from C in any orientation and position

Logistic regression (jump to page 205)

The logistic function
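The function itself, for reference:

\[ \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \frac{d\sigma}{dz} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr). \]

The simple form of the derivative is what makes the error gradient below so tidy.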

The natural error function for the logistic

Using the chain rule to get the error derivatives
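Putting the two previous slides together for a single training case (a sketch using y = σ(w^T x) = σ(z) and a target t ∈ {0, 1}; the notation is mine):

\[ E = -\bigl[t \ln y + (1 - t)\ln(1 - y)\bigr], \qquad \frac{\partial E}{\partial z} = y - t, \qquad \frac{\partial E}{\partial \mathbf{w}} = (y - t)\,\mathbf{x}. \]

The σ'(z) = y(1 − y) factor from the logistic cancels the 1/[y(1 − y)] factor from the error, leaving the simple "residual times input" gradient.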

The cross-entropy or “softmax” error function for multi-class classification
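The usual multi-class form (a sketch; z_k are the per-class linear scores and t_k is a 1-of-K target vector, in my notation):

\[ y_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad E = -\sum_k t_k \ln y_k, \qquad \frac{\partial E}{\partial z_k} = y_k - t_k . \]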

A special case of softmax for two classes

Probabilistic Generative Models for Discrimination

A simple example for continuous inputs
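The standard result for this example (a sketch assuming Gaussian class-conditional densities with a shared covariance Σ and class means μ_1, μ_2; notation mine): the posterior for one class is a logistic function of a linear function of x,

\[ p(C_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + w_0), \qquad \mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \]
\[ w_0 = -\tfrac{1}{2}\boldsymbol{\mu}_1^\top \Sigma^{-1}\boldsymbol{\mu}_1 + \tfrac{1}{2}\boldsymbol{\mu}_2^\top \Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{p(C_1)}{p(C_2)} . \]

With a shared covariance the quadratic terms cancel and the decision boundary is linear; when the covariance matrices differ between classes (a later slide) they do not cancel and the boundary becomes quadratic.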

A picture of the two Gaussian models and the resulting posterior for the red class

A way of thinking about the role of the inverse covariance matrix

The posterior when the covariance matrices are different for different classes.

Two ways to train a set of class-specific generative models

An example where the two types of training behave very differently