CSC2515 Fall 2007
Introduction to Machine Learning

Lecture 4: Backpropagation

Why we need backpropagation

Learning by perturbing weights

The idea behind backpropagation

A difference in notation

Non-linear neurons with smooth derivatives

Sketch of the backpropagation algorithm on a single training case

The derivatives

Some Success Stories

Overview of the applications in this lecture

An example of relational information

Another way to express the same information

A relational learning task

The structure of the neural net

How to show the weights of hidden units

The features it learned for person 1

What the network learns

Another way to see that it works

Why this is interesting

A basic problem in speech recognition

The standard “trigram” method

Why the trigram model is silly

Bengio’s neural net for predicting the next word

2-D display of some of the 100-D feature vectors learned by another language model

Applying backpropagation to shape recognition

The invariance problem

LeNet

The replicated feature approach

The architecture of LeNet5

Backpropagation with weight constraints

Combining the outputs of replicated features

The 82 errors made by LeNet5

Recurrent networks

An advantage of modeling sequential data

The equivalence between layered, feedforward nets and recurrent nets

Backpropagation through time

Teaching signals for recurrent networks

A good problem for a recurrent network

The algorithm for binary addition

A recurrent net for binary addition

The connectivity of the network

What the network learns

Preventing overfitting by early stopping

Why early stopping works

Full Bayesian Learning

How to deal with the fact that the space of all possible parameter vectors is huge

One method for sampling weight vectors

An amazing fact
