Course Projects (Worth: 35%)
The project is due before midnight on Monday, April 15, as a .pdf sent to hinton@cs.toronto.edu.
Please email your report, named yourname.pdf, to csc2535ta1@cs.toronto.edu.
General Guidelines
The idea of the project is to give you some experience in trying to do
a small piece of original research in machine learning, or in studying a specific
learning algorithm in depth.
What we expect to see is an empirical investigation of a variation of
a known learning algorithm, or an investigation of a known learning
algorithm on a new dataset. You need to describe the algorithm
clearly, relate it to existing work, implement it, and test it on a
small-scale problem.
To do this you will need to write code, run it on some data, make some
figures, read a few background papers, collect some references, and write a
few pages describing your task, the algorithm(s) you used and the results
you obtained.
You are not expected to spend the time that would be required for a
conference paper! The whole project should take about a week of
full-time work and about two days to write up.
Projects can be done individually, or in pairs. Of course, the expectations
will be higher for pair projects.
Specific Requirements
Your submission must include at least two figures which graphically illustrate
quantitative aspects of your results, such as training/testing error curves,
learned parameters, algorithm outputs, input data sorted by results in some
way, etc.
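For the error-curve figures, a minimal matplotlib sketch may help; the error values below are synthetic placeholders, purely illustrative — substitute the numbers from your own training logs:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

epochs = np.arange(1, 51)
# Synthetic, purely illustrative error curves (NOT real results).
train_err = 1.0 / np.sqrt(epochs)
test_err = train_err + 0.1 + 0.02 * np.log(epochs)

plt.figure(figsize=(5, 3.5))
plt.plot(epochs, train_err, label="training error")
plt.plot(epochs, test_err, label="test error")
plt.xlabel("epoch")
plt.ylabel("error")
plt.legend()
plt.tight_layout()
plt.savefig("error_curves.pdf")  # vector output embeds cleanly in a .pdf report
```

Saving figures in a vector format such as .pdf keeps axis labels readable when the figure is scaled in the final report.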
Your submission must include at least three references to previously
published papers or book sections.
Your submission should follow the generally accepted style of paper
writing: include an introduction section to motivate your problem and
algorithm, a section describing your approach and how it compares to
previous work, a section outlining the experiments you ran and the
results you obtained, and a short conclusions section to sum up what
you discovered. We expect the report for a single-person project to be
5 to 10 pages. You can write it up in whatever format you prefer, but
the submitted version must be sent as a .pdf.
Note: If you choose to do a project that is not one of the
suggested projects below, you should make an appointment with Geoffrey
Hinton soon to discuss it.
Marking Scheme
The projects will be marked out of 35, with each point being worth 1%
of your final grade.
The following criteria will be taken into account when marking:
1. Clarity/Relevance of problem statement and description of approach.
2. Discussion of relationship to previous work and references.
3. Design and execution of experiments.
4. Figures/Tables/Writing: easily readable, properly labeled, informative.
Friendly Advice
Be selective! Don't choose a project that has nothing to do with machine
learning. Don't investigate an algorithm that is clearly doomed to failure or
un-implementable. Don't attack a problem that is irrelevant, ill-defined or
unsolvable.
Be honest! You are not being marked on how good the results are. It doesn't
matter if your method is worse than the ones you compare
to. What matters is that you try something sensible, clearly describe the
problem, your method, what you did, and what the results were.
Be modest! Don't pick a project that is way too hard. Usually, if you select
the simplest thing you can think of to try, and do it carefully, it will take
much longer than you think.
Have fun! If you pick something you think is cool, that will make getting it
to work less painful and writing up your results less boring.
Suggested Projects
1. Train a Deep Boltzmann Machine with two hidden layers on a set of
binary vectors. Investigate the effect of pretraining on the speed of
training and on the quality of the examples generated by the trained
DBM. This project is mainly about implementing the rather complicated
algorithm described in the reading for that lecture.
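As a starting point for the pretraining stage, here is a minimal numpy sketch of CD-1 training for a single RBM, the building block used to pretrain each DBM layer. All sizes, the learning rate, and the toy data are arbitrary placeholders; the reading describes the extra bookkeeping needed to combine the pretrained layers into a proper DBM:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy random binary vectors standing in for your training set.
data = (rng.random((100, 20)) > 0.5).astype(float)

n_vis, n_hid = 20, 15          # placeholder layer sizes
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
lr = 0.05                      # placeholder learning rate

for epoch in range(10):
    # Positive phase: hidden probabilities given the data.
    h_prob = sigmoid(data @ W + b_hid)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Negative phase: one step of alternating Gibbs sampling (CD-1).
    v_recon = sigmoid(h_sample @ W.T + b_vis)
    h_recon = sigmoid(v_recon @ W + b_hid)
    # Approximate maximum-likelihood gradient: data term minus reconstruction term.
    W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
    b_vis += lr * (data - v_recon).mean(axis=0)
    b_hid += lr * (h_prob - h_recon).mean(axis=0)
```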
2. Compare different energy functions for modeling
image patches using contrastive backpropagation.
First, replicate the model in the notes for lecture 3b that uses two layers of logistic
hidden units to learn to model a two-dimensional density composed of four squares
that contain the data. The hybrid Monte Carlo method that you will need to use is explained
in “Probabilistic inference using Markov chain Monte Carlo methods”, Neal
(1993). For this example, it is probably sufficient to use CD1 without any repetitions
of the choice of random momentum in the trajectory that is used to get the “negative”
data.
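The hybrid (Hamiltonian) Monte Carlo update can be sketched generically as follows. The unit-Gaussian energy here is only a stand-in so the sketch is self-contained; you would substitute the energy and gradient defined by your two-layer network on the four-squares density:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(x):
    return 0.5 * np.sum(x * x)   # placeholder energy (unit Gaussian)

def grad_energy(x):
    return x                     # its gradient

def hmc_step(x, step=0.15, n_leapfrog=20):
    p = rng.standard_normal(x.shape)            # fresh random momentum
    x_new = x.copy()
    p_new = p - 0.5 * step * grad_energy(x_new)  # half step for momentum
    for _ in range(n_leapfrog):
        x_new += step * p_new                    # full step for position
        p_new -= step * grad_energy(x_new)       # full step for momentum
    p_new += 0.5 * step * grad_energy(x_new)     # convert last full step to a half step
    # Metropolis accept/reject on the total (potential + kinetic) energy.
    h_old = energy(x) + 0.5 * np.sum(p * p)
    h_new = energy(x_new) + 0.5 * np.sum(p_new * p_new)
    if rng.random() < np.exp(min(0.0, h_old - h_new)):
        return x_new
    return x

x = np.zeros(2)
samples = []
for _ in range(2000):
    x = hmc_step(x)
    samples.append(x.copy())
samples = np.asarray(samples)
```

The step size and trajectory length are arbitrary here; Neal (1993) discusses how to tune them so that the rejection rate stays low.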
Once your code works on this toy problem, try using contrastive backpropagation
with multi-layer feedforward neural nets of various designs with various energy
functions to learn a model of the 8x8 image patches that can be found at:
http://www.cs.toronto.edu/~hinton/data/patches.mat
You might need to use a relatively small training set with a relatively small feedforward
net in order to get your experiments finished in time.
Your report should discuss the effects of using different feedforward architectures,
different energy functions and different versions of the training procedure. You obviously
do not have time to explore all of these variations systematically, so it would
be sufficient to have one “standard” model and to report the effects of one sensible
variation of the network architecture, one sensible variation of the training procedure
and one sensible variation of the energy function.
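Whatever energy functions you compare, it is worth checking your backpropagated energy gradients against finite differences before launching long runs. A generic numpy check is sketched below; the quadratic and heavy-tailed energies are just examples of the kind of variation you might try, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two example energy functions over hidden activities h = W x.
def quadratic_energy(x, W):
    h = W @ x
    return 0.5 * np.sum(h * h)

def quadratic_grad(x, W):
    return W.T @ (W @ x)

def heavy_tailed_energy(x, W):
    h = W @ x
    return np.sum(np.log1p(h * h))   # log(1 + h^2): more tolerant of outliers

def heavy_tailed_grad(x, W):
    h = W @ x
    return W.T @ (2.0 * h / (1.0 + h * h))

def finite_diff(f, x, eps=1e-5):
    # Central-difference approximation to the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

W = rng.standard_normal((4, 6))
x = rng.standard_normal(6)
for e, g in [(quadratic_energy, quadratic_grad),
             (heavy_tailed_energy, heavy_tailed_grad)]:
    num = finite_diff(lambda v: e(v, W), x)
    ana = g(x, W)
    assert np.allclose(num, ana, atol=1e-5)
```

The same check applies unchanged to deeper feedforward nets: wrap the whole forward pass in the function handed to `finite_diff`.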