Project on Learning Deep Belief Nets

Deep Belief Nets (DBNs) will be explained in the lecture on Oct 29. Instead of learning layers of features by backpropagating errors, a DBN learns one layer at a time, with each layer trying to build a generative model of its input: either the raw data or the activities of the feature detectors in the layer below. After the features have been learned in this way, the whole network can be fine-tuned with backpropagation. The main advantage of DBNs is that they can learn their layers of features from large sets of unlabelled data.
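
To make the layer-by-layer idea concrete, here is a minimal Matlab sketch (not the course code; the variable names and sizes are our own, and biases are omitted for brevity) of one contrastive divergence (CD-1) weight update for a single layer of binary units. A DBN repeats this kind of learning, treating each layer's hidden activities as the data for the next layer.

    % Minimal CD-1 sketch for one RBM layer (binary units, biases omitted).
    numvis = 784; numhid = 500; epsilon = 0.1;
    data = double(rand(100, numvis) > 0.5);     % random stand-in for a batch of MNIST images
    W = 0.01 * randn(numvis, numhid);           % small random initial weights

    poshidprobs = 1 ./ (1 + exp(-data * W));    % P(h=1 | v) on the data
    posprods    = data' * poshidprobs;          % positive statistics
    hidstates   = double(poshidprobs > rand(size(poshidprobs)));  % sample the hidden units
    negdata     = 1 ./ (1 + exp(-hidstates * W'));                % one-step reconstruction
    neghidprobs = 1 ./ (1 + exp(-negdata * W)); % P(h=1 | reconstruction)
    negprods    = negdata' * neghidprobs;       % negative statistics
    W = W + epsilon * (posprods - negprods) / size(data, 1);      % CD-1 weight update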

The data and the code

For this project you should use the MNIST dataset. The data and the code for training DBNs are available here. You want the code for classifying images, not the code for deep autoencoders.

The main point of the project

You should investigate how the relative performance of three learning methods changes as you vary the relative amounts of labelled and unlabelled data. The three methods are DBNs, SVMlight, and SVMlight applied to the features learned by a DBN. SVMlight is documented in several places on the web. We will soon add more information on the easiest way to use SVMlight with Matlab for multiclass classification (as opposed to the easier two-class case). Support vector machines will be explained in the lecture on Nov 12, but you don't need to understand much about them to run SVMlight.
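
Since SVMlight is a two-class learner, one standard way to get a multiclass decision is to train one classifier per digit class and pick the class whose classifier gives the largest margin (one-versus-rest). The sketch below assumes hypothetical wrapper functions svmlight_learn and svmlight_classify; they are not a real API, so substitute whatever Matlab interface to SVMlight you end up using.

    % One-versus-rest multiclass sketch. svmlight_learn and svmlight_classify
    % are HYPOTHETICAL wrappers around the SVMlight executables, not a real API.
    traindata   = rand(1000, 784);               % stand-in features
    trainlabels = ceil(10 * rand(1000, 1));      % stand-in labels in 1..10 (digit + 1)
    testdata    = rand(100, 784);
    numclasses  = 10;
    models = cell(numclasses, 1);
    for c = 1:numclasses
        binlabels = 2 * double(trainlabels == c) - 1;   % +1 for class c, -1 for the rest
        models{c} = svmlight_learn(traindata, binlabels);
    end
    scores = zeros(size(testdata, 1), numclasses);
    for c = 1:numclasses
        scores(:, c) = svmlight_classify(testdata, models{c});  % margin for class c
    end
    [dummy, predictions] = max(scores, [], 2);   % choose the class with the largest margin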

Choosing the data

You have to decide how much labelled and how much unlabelled data to use. Starting with a small amount of labelled data is sensible: it makes the supervised learning fast, and it gives a high error rate, which makes comparisons between the methods easier.
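
As an illustration of one way to set this up, the sketch below splits the 60,000 MNIST training cases into a small labelled set and a large unlabelled pool; the sizes and variable names are just examples.

    numlabelled = 1000;                          % e.g. roughly 100 labelled examples per class
    alldata   = rand(60000, 784);                % stand-in for the MNIST training images
    alllabels = ceil(10 * rand(60000, 1));       % stand-in for the digit labels
    perm = randperm(size(alldata, 1));           % random split keeps classes roughly balanced
    labelledidx   = perm(1:numlabelled);
    unlabelledidx = perm(numlabelled+1:end);
    labelleddata   = alldata(labelledidx, :);
    labelledlabels = alllabels(labelledidx);
    unlabelleddata = alldata(unlabelledidx, :);  % labels of these cases are deliberately ignored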

Designing the Deep Belief Network

You have to decide how many hidden layers the DBN should have, how many units each layer should contain, and how long the pre-training should last. This project does not involve writing your own code from scratch, so we expect you to run sensible experiments to choose these numbers (and the numbers of labelled and unlabelled examples), and the project will be evaluated on how well you do this.
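
One simple way to organize these experiments is a small sweep over candidate settings, reusing the labelled/unlabelled split from the sketch above. Here pretrain_dbn and evaluate_dbn are hypothetical stand-ins for whatever pretraining and evaluation calls you build around the course code, and the particular layer sizes and epoch counts are just examples.

    % Hypothetical sweep; pretrain_dbn and evaluate_dbn are NOT real functions.
    layersizes = {[500], [500 500], [1000 500 250]};  % candidate hidden-layer architectures
    pretrainepochs = [10 30 50];                      % candidate pretraining lengths
    errors = zeros(numel(layersizes), numel(pretrainepochs));
    for i = 1:numel(layersizes)
        for j = 1:numel(pretrainepochs)
            dbn = pretrain_dbn(unlabelleddata, layersizes{i}, pretrainepochs(j));
            errors(i, j) = evaluate_dbn(dbn, labelleddata, labelledlabels);
        end
    end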

Extra work for teams

If you are working as a pair, you should also compare the three methods above with a single feedforward network trained with backpropagation on the labelled data.
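
For that baseline, here is a minimal Matlab sketch of a one-hidden-layer network trained with backpropagation (logistic hidden units, softmax outputs, full-batch gradient descent, biases omitted for brevity); the data are random stand-ins and all sizes are just examples.

    numhid = 200; eta = 0.1;
    X = double(rand(100, 784) > 0.5);            % stand-in for the labelled images
    labels = ceil(10 * rand(100, 1));            % stand-in digit labels in 1..10
    T = eye(10); T = T(labels, :);               % one-of-k target matrix
    W1 = 0.01 * randn(784, numhid);
    W2 = 0.01 * randn(numhid, 10);
    for epoch = 1:100
        H = 1 ./ (1 + exp(-X * W1));             % logistic hidden activities
        A = exp(H * W2);
        Y = A ./ repmat(sum(A, 2), 1, 10);       % softmax output probabilities
        dout = Y - T;                            % cross-entropy error derivative at the output
        dhid = (dout * W2') .* H .* (1 - H);     % error backpropagated to the hidden layer
        W2 = W2 - eta * (H' * dout) / size(X, 1);
        W1 = W1 - eta * (X' * dhid) / size(X, 1);
    end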