SML310 Course Project

General guidelines

The idea of the course project is for you to get some experience trying to do a piece of original research in data science and writing up your results in a format that is similar to what you would see in published literature. We expect to see an idea/task that you describe clearly, relate to existing work, and implement, as well as a report on the results. You will need to write code, run it on some data, make some figures, read a few background papers, collect some references, and write a few pages describing your task, the algorithm(s) you used (if any) and the results you obtained.

You should submit both your write-up and the code that you wrote.

Requirements

There is no specific length requirement, but we would expect that good projects would require about 4-8 pages of text (not including appendices and figures). Note that many journals and conferences would limit you to 6-10 pages in total. For the course project, you are allowed to write more, which is sometimes easier than writing less and still covering all the ground that you need to cover.

It is to be expected that the rubric below won't be a perfect fit for some people's projects, so don’t feel obligated to always follow the instructions below exactly if there is a way to make your paper better by deviating from them. However, if you want to structure your paper significantly differently, please consult us before doing so! When grading, we will consider the problem at hand. For example, if it is very difficult to produce interesting visualizations for your problem (because you are mainly developing an algorithm), we will take that into account when grading.

  1. Abstract (5 points): a summary of the main idea of the project and its contributions.


  2. Introduction (10 points): a statement of the problem being addressed and why we might want to solve it. In this section, it makes sense to link the work that you are doing to prior work and explain what your contribution is; to explain in more detail why the work you are doing is potentially important; to briefly state your results; and to outline the structure of the paper and what each of the following sections in the paper achieves.

  3. Background (10 points): an explanation of where the data comes from and how it is collected. In this section, you should explain your dataset. For example, if you are working with finance data, briefly explain any financial terminology (e.g., if you are working with stock prices, explain briefly how stock trading works); if you are working with sociological data, explain each variable being measured and why it is important to measure it.

  4. Prior work (15 points): a survey of at least 4-5 research papers that are important for understanding the dataset you are working with and the methods you are using (there should be at least one paper related to the data, and one paper related to the methods). This would usually be in the form of an essay rather than in the form of a list. You should relate the background work to what you are doing. For each work you survey, the reader should understand the main contribution of the paper (i.e., what the paper did that was new and interesting) and the methods that were used. While you do not need to re-explain things like linear regression, it is appropriate to discuss the methods being used if they were not mentioned in the course and are not explained in something like OpenIntro Statistics or PRML.

  5. Exploratory Analysis and Visualization (15 points): Produce figures and tables that summarize the data in an understandable way. It is important that your analysis both tells the reader about the dataset and gives the reader insights into the problem that you are addressing. For example, if you are trying to predict Y using X, there should be a visualization of the relationship between Y and X. If you are fitting a statistical model, you should present visual evidence that the model makes sense, if possible. You should produce at least 2 figures. You will likely want to visualize the results of your analysis/experiments as well.

  6. Method description (10 points): A description of your method. You should include a description of everything you did here (including model checking etc.). All terms must be defined, and all algorithms that you use must be explained. Highlight how your model is different from other approaches, or what the main relevant considerations are for the domain. Including a diagram is usually useful. You can highlight the differences by comparing your model to an existing one, either in words or with a second diagram. For example, if you are proposing a new algorithm that only changes one line in an existing algorithm, highlight that one line, or do a side-by-side comparison.

  7. Design of the Analysis and/or Experiments (15 points): This might or might not be a separate section in your paper. You should design the analysis and/or experiments carefully, and justify the design choices that you made. You will be graded both on the clarity of the description and on the design itself. The grading will vary from person to person.

  8. Limitations (5 points): A section describing the limitations of your approach with respect to being able to draw conclusions about the population you are researching and/or the method. For example, the dataset might not be fully representative of the population you are trying to research; or the algorithm you are using might only work in some situations, but might fail on new data under some circumstances.

  9. Results and Conclusion (10 points): A section describing the results. Note that you are not graded on whether your hypothesis turned out to be correct – the only thing that’s important is that the results are presented well. For the conclusion section, present the main takeaways, and give some ideas for future work.

  10. Depth and Originality (5 points): 5 points are assigned for the depth of the idea. Essentially, 5/5 will be awarded for original ideas and/or especially difficult-to-execute projects. 4/5 will be awarded for good ideas that make sense. Fewer points will be awarded for projects that fall short of that, but I hope that this will rarely happen since we will work together to make the project make sense.
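
As an illustration of item 5 only, here is a minimal sketch of the kind of figure that shows the relationship between Y and X and gives visual evidence that a fitted model makes sense. It assumes a simple straight-line model and synthetic data; the function names (fit_line, residuals, plot_fit, make_synthetic_data) are hypothetical and are not part of any course-provided code. Your own project will likely need different plots and a different model.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed Y minus the value predicted by the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def make_synthetic_data(n=200, seed=0):
    """Illustrative data only (an assumption): Y is linear in X plus Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def plot_fit(x, y, path):
    """Save a two-panel figure: Y against X with the fitted line,
    and residuals against X as a check on the linear model."""
    # matplotlib is imported here so the helpers above work without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    slope, intercept = fit_line(x, y)
    res = residuals(x, y, slope, intercept)

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(9, 4))

    # Panel 1: the raw relationship between Y and X, plus the fitted line.
    ax_fit.scatter(x, y, s=12)
    ends = [min(x), max(x)]
    ax_fit.plot(ends, [intercept + slope * v for v in ends], color="black")
    ax_fit.set_xlabel("X")
    ax_fit.set_ylabel("Y")
    ax_fit.set_title("Y against X with fitted line")

    # Panel 2: residuals should scatter around zero with no visible pattern
    # if a straight line is a reasonable model.
    ax_res.scatter(x, res, s=12)
    ax_res.axhline(0, color="black", linewidth=1)
    ax_res.set_xlabel("X")
    ax_res.set_ylabel("Residual")
    ax_res.set_title("Residuals against X")

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling plot_fit(*make_synthetic_data(), "fit_check.png") writes one such figure to disk. Curvature or a funnel shape in the residual panel would suggest that the straight-line model is not appropriate, which is worth reporting in your Limitations section.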