Machine Learning in Computational Biology


Time:                           Tuesdays 10-12

Location:                     BA 2179

Instructor office:          686 Bay St. Rm 12-9708

Instructor availability:  meeting pre-arranged by email

Prerequisites:              background in machine learning (at least an introductory course) and

                                              basic probability and statistics are required

Course Overview

The goal of this graduate seminar course is to investigate the areas of computational biology where machine learning can make the most difference. We will cover topics in areas as diverse as variation in the genome, regulation, epigenetics, and the microbiome, all in relation to human disease. Together we will read and discuss the most recent computational developments in these areas and brainstorm about what can be done to improve current performance.

Importantly, this will not be an introduction to molecular biology or to machine learning. The course is intended for people with a machine learning background (at least an introductory course) who will be able to develop new methods, or modify and apply existing algorithms, for problems in computational biology. The emphasis in my lectures will therefore be on introducing the areas and framing the problems in computational biology and computational medicine.

Each lecture will be divided into an hour-long discussion of relevant literature, led by assigned students, and a brief introduction to the new topic.

Grading Scheme

Paper presentations     30%

Written summaries       20%

Project                          50%

Paper presentations:     each student will present about 3 papers on 3 different topics over the course of the term. Each presentation will be 20-25 minutes, followed by a question-and-answer portion.

Written summaries:     each student is required to submit one paragraph (two at most) on the topic covered in the previous lecture and on the presented papers, by 5pm on the day of the presentations. The paragraphs should reflect the student’s understanding of the area covered. They should list definitions and the advantages and disadvantages of existing methods. The summaries should be submitted by email from a uoft account in plain text (or as a pdf attachment if formulas are included) to anna dot goldenberg at utoronto dot ca with the subject “CSC2431 summary”.

Project (in teams of 1-2 people):     the goal of the project is to produce a machine learning contribution (preferably publishable) addressing an area of computational biology/medicine. Each student will be evaluated on the proposal, midterm report, final paper, and presentation. Each paper should be written using a conference template of your choice, with the goal of publishing at a conference.

    Proposal: goals, division of work (in team projects), aims, methods, datasets, challenges and 

                     alternatives, competitors (methods you will compare to)

    Midterm report: adjusted aims, draft of the final paper with placeholders for figures (which figures    

                     and tables you expect to generate in the final draft)

    Final draft: paper written in a conference format

                    (pick your conference style - ISMB, RECOMB, ICML, NIPS)

CSC 2431