CSC411/2515: Machine Learning and Data Mining

Winter 2018

CSC2515 Graduate Project

Students enrolled in CSC2515 will complete a graduate course project instead of writing an exam.

You must submit a project proposal by the end of February, though we encourage you to submit it earlier. We will review your proposal, give feedback, and either approve it or discuss more suitable project options with you.

The grad project will give you some experience doing original research in machine learning and writing up your results in a conference paper format. We expect to see an idea or task that you describe clearly, relate to existing work, and implement and test on a dataset. To do this, you will need to write code, run it on some data, make some figures, read a few background papers, and write a few pages describing your task, the algorithm(s) you used, and the results you obtained. There is no expectation at all that you will produce a publishable paper, but we'll be very happy if you do!

Students can work on projects individually or in pairs. We encourage you to work in pairs.

We expect the final project report to be 4-8 pages in length, not including appendices.

Grading rubric

A detailed rubric, along with more advice, is here.

Intro talk on ML research and 2515 projects

Please find the slides here.

Where to get good project ideas?

Research is open-ended. However, here is some advice on coming up with project ideas. We will use published papers as examples. Again, you do not have to produce a publishable paper; published papers are just project ideas that worked out really well.

Modifying an existing algorithm

In class we will often mention shortcomings of the algorithms we are discussing, or trade-offs we encounter when selecting an algorithm to apply to data. Can you think of a way to address a shortcoming of an existing algorithm, perhaps improving performance in some settings at the cost of worse performance in others?

Applying an existing algorithm to a (potentially new) interesting problem domain

Do you have an idea for something that can be done with machine learning that hasn't been done before, preferably by applying a complex algorithm? (Applying linear regression to a boring dataset will not quite cut it!) If so, it might be worth trying. Here are some ideas for datasets: Project Gutenberg and Wikipedia contain a lot of text data, and Wikipedia contains graph data (the link structure) as well. Historical election and sports results are another rich source of data. Flickr and Twitter can be scraped. If you want to publish, adding some analysis would be useful. See here for an example.

Coming up with a suitable algorithm for a novel problem

See the papers under "AI Applications" (among others) here for examples. (To find the text of a paper, enter its title into Google Scholar.)

Working with an existing complex algorithm

This is pretty much the same as "modifying an existing algorithm" or "applying an algorithm to an interesting problem domain," but with new, complex algorithms. Be on the lookout for links to implementations of new algorithms on the internet (e.g., /r/MachineLearning, or the social media accounts of Yann LeCun or Andrej Karpathy).

Examples of good projects

Pretty much all projects from Stanford CS231n and CS229 are excellent. The Canadian Conference on Artificial Intelligence publishes many papers that started out as course projects, and is a good source of accessible papers. (Search for the title of the paper on Google Scholar to find the full text.) Conferences like NIPS, ICML, ICLR, KDD, CVPR, and ACL publish top-tier research that you may find inspiring.

When to start thinking about projects

The sooner the better, although that might require you to read ahead! By the end of February, we will have covered the basics of supervised and unsupervised learning as well as neural networks. You are of course welcome to work on something that we have not covered in class.