Course Projects

Due: Friday December 19, 2008 by noon

Please email your submission (.pdf or .ps only, no MS-Word) to csc2515prof@cs.toronto.edu

Worth: 36%

General Guidelines

The idea of the final project is to give you some experience trying to do a piece of original research in machine learning and writing up your results in a paper-style format. What we expect to see is an idea/task that you describe clearly, relate to existing work, implement and test on a dataset. To do this you will need to write code, run it on some data, make some figures, read a few background papers, collect some references, and write a few pages describing your task, the algorithm(s) you used and the results you obtained. As a rough rule of thumb, spend about one week's worth of work on the project (spread out over a longer time to allow the computers to do some work in the interim!), and about a day writing it up after that. Projects can be done individually or in pairs. We encourage you to work in pairs, but of course, the expectations will be higher for pair projects.

Specific Requirements

Your project must implement one or more machine learning algorithms and apply them to some data.

Your project may be a comparison of several existing algorithms, or it may propose a new algorithm, in which case you still must compare it to at least one other approach.

You can either pick a project of your own design, or you can choose from the set of pre-defined projects described below. Regardless of which way you select a project, you cannot use the excuse that you got a "bad project" to explain doing a poor job on it. So select wisely!

Your submission must include at least two figures which graphically illustrate quantitative aspects of your results, such as training/testing error curves, learned parameters, algorithm outputs, input data sorted by results in some way, etc.

Your submission must include at least 4 references to previously published papers or book sections.

Your submission should follow the generally accepted style of paper writing: include an introduction section to motivate your problem and algorithm, a section describing your approach and how it compares to previous work, a section outlining the experiments you ran and the results you obtained, and a short conclusions section to sum up what you discovered.

Your submission must be prepared in the NIPS 2006 paper style (using LaTeX is encouraged but not required), and must be no longer than 6 pages in length (10 for pair projects), including all figures, tables, references, etc.

Do not hand in any code of any kind.

Project Proposal

You must turn in a brief project proposal (1-2 paragraphs) in class on Oct 22nd. Your project proposal should either say which of the pre-defined projects you plan to pursue, or describe the idea behind your self-defined project. You should also briefly describe software you will need to write, and papers (2-3) you plan to read. Please also say if you will have a partner, and if so, who it will be.
Include your email address on your proposal. We need this to contact you and arrange meetings to discuss your proposal.

Pre-Defined Projects

1. Collaborative filtering

2. Mixture of Experts

3. Boosting

We decided that the project on using unlabeled data that was outlined in class was too routine to make a good project. Instead, a simplified version of that project is going to become the programming part of assignment 3. So if you already did some work on it, you will find assignment 3 very easy. You can still design your own project that uses this way of learning multilayer networks, but it should involve a very different dataset so that you have room for a significant amount of exploration.

Marking Scheme

The projects will be marked out of 36, with each point being worth 1% of your grade. The following criteria will be taken into account when marking:

Clarity/Relevance of problem statement and description of approach.

Discussion of relationship to previous work and references.

Design and execution of experiments.

Figures/Tables/Writing: easily readable, properly labeled, informative.

Friendly Advice

Be selective! Don't choose a project that has nothing to do with machine learning. Don't investigate an algorithm that is clearly doomed to failure or un-implementable. Don't attack a problem that is irrelevant, ill-defined or unsolvable.

Be honest! You are not being marked on how good the results are. It doesn't matter one bit if your method is better or worse than the ones you compare to. What matters is that you try something sensible, clearly describe the problem, your method, what you did, and what the results were.

Be modest! Don't pick a project that is way too hard. Usually, if you select the simplest thing you can think of to try, and do it carefully, it will take much longer than you think.

Be careful! Don't do foolish things like test on your training data, set parameters by cheating, compare unfairly against other methods, include plots with unlabeled axes, use undefined symbols in equations, etc. Do sensible cross-checks like running your algorithms several times, leaving out small parts of your data, adding a few noisy points, etc. to make sure everything still works reasonably well. Make lots of pictures along the way.
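As one illustration of these cross-checks, here is a minimal Python sketch that never tests on training data (k-fold splits) and repeats the whole experiment with several random shuffles so you can see how much the result moves. Everything in it is an assumption made up for illustration: the toy data generator `make_data`, the placeholder learner `fit_threshold`, and the names `k_fold_errors` and `repeated_cv` are hypothetical and stand in for your own data and algorithm.

```python
import random
import statistics


def make_data(n, seed, noise=0.0):
    """Hypothetical toy data: 1-D points, class 0 near -1 and class 1 near +1.
    `noise` is the fraction of labels flipped at random ("a few noisy points")."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2 * label - 1, 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def fit_threshold(train):
    """Placeholder learner: threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2


def error_rate(threshold, data):
    """Fraction of points whose predicted class (x > threshold) is wrong."""
    wrong = sum((x > threshold) != (y == 1) for x, y in data)
    return wrong / len(data)


def k_fold_errors(data, k, seed):
    """Shuffle, split into k folds, and test each fold on a model
    trained only on the other k - 1 folds."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(error_rate(fit_threshold(train), test))
    return errors


def repeated_cv(data, k=5, seeds=(0, 1, 2)):
    """Repeat k-fold evaluation with different shuffles; return the mean
    held-out error and its standard deviation across the repeats."""
    per_run = [statistics.mean(k_fold_errors(data, k, s)) for s in seeds]
    return statistics.mean(per_run), statistics.stdev(per_run)


if __name__ == "__main__":
    clean = make_data(200, seed=0)
    noisy = make_data(200, seed=0, noise=0.1)
    print("clean:", repeated_cv(clean))
    print("noisy:", repeated_cv(noisy))
```

Reporting the spread across repeats alongside the mean makes it easier to tell a real difference between two methods from one that is just an artifact of a lucky split.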

Learn! The point of the project is to give you a chance to "test drive" the process of writing a paper, which many of you have never done, in a low-stress setting, away from the pressures of your thesis and conference deadlines. Consider this an opportunity to learn how to write code to run large experiments, make nice figures, lay out readable equations, describe your work concisely to a smart but uninitiated reader, etc.

Have fun! If you pick something you think is cool, that will make getting it to work less painful and writing up your results less boring.