CSC2431: Topics in Computational Biology
Analysis of Next Generation Sequencing Data

Winter 2008


Classes: W 11-1 in Bahen 025

Instructor: Michael Brudno
Office: Pratt (PT) 286C & CCBR 604
Office Hours: By appointment



Announcements

General information

Next Generation Sequencing (NGS) technologies, such as Illumina/Solexa, AB SOLiD, and 454 Pyrosequencing, are revolutionizing the acquisition of genomics data. These platforms offer much lower costs and faster data acquisition, but the sequences they produce are much shorter: read lengths drop from 500-1000 base pairs to as little as 25 base pairs per read. At the same time, the methodologies offer several important advantages, for example the ability to acquire paired reads on a very large scale.

The development of NGS is forcing a reconsideration of the computational methods used for genome analysis, with the problems of read mapping and genome assembly becoming much more complex. Simultaneously, NGS is enabling the development of methods for problems that were not previously addressed with genome sequencing, such as the prediction of structural or copy number polymorphisms. NGS data has a very different error model, requiring modifications to classical algorithms, and the sheer size of the data requires efficient algorithms, appropriate hardware, and efficient implementations. In this class we will explore the features of NGS data that make it different from classical sequencing data, and try to determine which methods could address some of these differences. Because of the novelty of the data and of the problems, the emphasis will be on discovering the right solutions, rather than just learning about them.

The prerequisite is CSC 2417 -- Algorithms for Genome Analysis, or permission of the instructor. Permission will be given if you have a basic knowledge of molecular biology (transcription, etc.), a strong background in algorithms (at least CSC 373 level), and a knowledge of basic probability theory.

Grading:
The basic requirements for the class will be a course project (60% of the grade), paper presentations and participation (20% of the grade) and written paper summaries (20% of the grade).

Syllabus & Readings

Writing paper summaries

Each person taking the class for credit is responsible for submitting a one-page summary of *at least two* of the assigned papers before every class. The system for grading them will be a simple check-off, so no need to sweat too much. From the writeup I am looking for evidence that you read the papers and thought about them. Some evidence of this would be talking about (1) the weaknesses of the paper (the strengths are in the abstract :)), and (2) if the method is not directly applicable to NGS, how it could be used there. The writeup need not be long or thoroughly polished; it is supposed to be evidence that you've done the work, not work in itself. If you are presenting a paper, you are exempt from doing a writeup that week.

The whole point of the paper summaries is to make sure that you've read the papers before coming to class. However, I will allow you to hand in no more than two summaries up to two days late (by Friday of the same week).

Administrative details:

The class will satisfy the 2c breadth requirement.