The next generation of neural networks

The main aim of neural networks

First generation neural networks

Second generation neural networks (~1985)

A temporary digression

What is wrong with back-propagation?

Overcoming the limitations of back-propagation

The building blocks: Binary stochastic neurons
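
As an illustration of this building block, here is a minimal sketch of a layer of binary stochastic neurons. It assumes the usual logistic form, in which each unit turns on with probability sigmoid(bias + weighted input). The function names, array shapes, and example values are hypothetical and not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary_units(inputs, weights, bias, rng):
    """Return (turn-on probabilities, sampled 0/1 states) for one layer.

    inputs: (n_in,), weights: (n_in, n_out), bias: (n_out,).
    """
    # Probability that each unit turns on, given its total input.
    p_on = sigmoid(bias + inputs @ weights)
    # Each unit is stochastic: it is 1 with probability p_on, else 0.
    states = (rng.random(p_on.shape) < p_on).astype(np.float64)
    return p_on, states

# Illustrative values only (assumed): zero weights give p_on = 0.5.
rng = np.random.default_rng(0)
inputs = np.array([1.0, 0.0, 1.0])
weights = np.zeros((3, 2))
bias = np.zeros(2)
p_on, states = sample_binary_units(inputs, weights, bias, rng)
```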

A simple learning module: A Restricted Boltzmann Machine

Weights → Energies → Probabilities

A picture of “alternating Gibbs sampling” which can be used to learn the weights of an RBM

Contrastive divergence learning: A quick way to learn an RBM
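
The following is a minimal sketch of one-step contrastive divergence (CD-1) for a binary RBM, under stated assumptions: biases are omitted for brevity, hidden probabilities rather than sampled states are used in the pairwise statistics, and the learning rate, layer sizes, and toy data are hypothetical. It is meant to show the shape of the update (data-driven statistics minus reconstruction-driven statistics), not the exact procedure used in the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, rng, lr=0.1):
    """One CD-1 weight update for a bias-free binary RBM (assumed setup).

    v0: (batch, n_visible) binary data, W: (n_visible, n_hidden).
    """
    # Positive phase: hidden units driven by the data.
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(np.float64)
    # One step of alternating Gibbs sampling: reconstruct the
    # visible units, then re-infer the hidden units.
    p_v1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(np.float64)
    p_h1 = sigmoid(v1 @ W)
    # Pairwise statistics with the data minus those with the
    # reconstruction, averaged over the batch.
    positive = v0.T @ p_h0
    negative = v1.T @ p_h1
    return W + lr * (positive - negative) / v0.shape[0]

# Toy data (assumed): two 4-pixel binary patterns, 3 hidden units.
rng = np.random.default_rng(0)
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=np.float64)
W = 0.01 * rng.standard_normal((4, 3))
for _ in range(100):
    W = cd1_update(data, W, rng)
```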

How to learn a set of features that are good for reconstructing images of the digit 2

How well can we reconstruct the digit images from the binary feature activations?

Training a deep network

Why does greedy learning work?

The generative model after learning 3 layers

A neural model of digit recognition

Fine-tuning with a contrastive divergence version of the wake-sleep algorithm

Show the movie of the network generating and recognizing digits

(available at www.cs.toronto/~hinton)

Examples of correctly recognized handwritten digits that the neural network had never seen before

How well does it discriminate on the MNIST test set with no extra information about geometric distortions?

Using backpropagation for fine-tuning

First, model the distribution of digit images

Deep Autoencoders
(Ruslan Salakhutdinov)

A comparison of methods for compressing digit images to 30 real numbers.

How to compress document count vectors

Finding binary codes for documents

Using a deep autoencoder as a hash-function for finding approximate matches

How good is a shortlist found this way?

Summary

THE END

The extra slides explain some points in more detail and give additional examples.

Why does greedy learning work?

Do the 30-D codes found by the autoencoder preserve the class structure of the data?

Inference in a directed net with replicated weights

What happens when the weights in higher layers become different from the weights in the first layer?

The Energy of a joint configuration

Using energies to define probabilities
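
To make the step from energies to probabilities concrete, here is a small sketch that enumerates every joint configuration of a tiny binary RBM. It assumes the standard energy E(v, h) = -(visible bias terms) - (hidden bias terms) - sum_ij v_i h_j w_ij and the Boltzmann form p(v, h) = exp(-E(v, h)) / Z. The weights, sizes, and function names are hypothetical, and exhaustive enumeration is only feasible for very small networks.

```python
import itertools
import numpy as np

def energy(v, h, W, b_vis, b_hid):
    """Energy of one joint configuration of visible and hidden states."""
    return -(v @ b_vis) - (h @ b_hid) - (v @ W @ h)

def joint_distribution(W, b_vis, b_hid):
    """Enumerate all (v, h) pairs and return them with p(v, h)."""
    n_vis, n_hid = W.shape
    configs = [
        (np.array(v, dtype=np.float64), np.array(h, dtype=np.float64))
        for v in itertools.product([0, 1], repeat=n_vis)
        for h in itertools.product([0, 1], repeat=n_hid)
    ]
    energies = np.array([energy(v, h, W, b_vis, b_hid) for v, h in configs])
    # Low energy means high probability; the sum is the partition function Z.
    unnormalized = np.exp(-energies)
    return configs, unnormalized / unnormalized.sum()

# Illustrative weights (assumed): 2 visible units, 2 hidden units, no biases.
W = np.array([[2.0, -1.0], [0.5, 0.0]])
configs, probs = joint_distribution(W, np.zeros(2), np.zeros(2))
best_v, best_h = configs[int(np.argmax(probs))]
```

In this toy setup the most probable joint configuration is the one with the lowest energy, which is the relationship the energy-based definition is built to express.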

An RBM with real-valued visible units
(you don’t have to understand this slide!)

And now for something a bit more realistic

A network with local connectivity