CSC2431: Topics in Computational Biology
Analysis of High Throughput Sequencing Data

Winter 2010

Classes: W 10-12 in Bahen 3000
Instructor: Michael Brudno
Office: Pratt (PT) 286C & CCBR 604
Office Hours: By appointment

Announcements.
General information.
Topics & Reading
A guideline for writing paper summaries

Announcements

1/1 -- We now have a Google group: csc2431w10uoft. Please sign up for it.

High Throughput Sequencing (HTS) technologies, such as Illumina/Solexa, AB SOLiD, and 454 Pyrosequencing, are revolutionizing the acquisition of genomics data. These platforms offer much lower costs and faster data acquisition, but the sequences they produce are much shorter: read length drops from 500-1000 base pairs to as little as 35 base pairs. At the same time, the methodologies offer several important advantages, for example the ability to acquire paired reads on a very large scale.

The development of HTS is forcing a reconsideration of the computational methods used for genome analysis, with the problems of read mapping and genome assembly becoming much more complex. At the same time, HTS is enabling the development of methods for problems that genome sequencing did not previously address, such as the prediction of structural or copy number polymorphisms. HTS data has a very different error model, requiring modifications to classical algorithms, and the sheer size of the data requires efficient algorithms, appropriate hardware, and efficient implementations. In this class we will explore the features of HTS data that make it different from classical sequencing data, and try to determine which methods can address some of these differences. Because of the novelty of the data and of the problems, the emphasis will be on discovering the right solutions, rather than just learning about them.

The prerequisite is an undergraduate-level bioinformatics course, or permission of the instructor. Permission will be given if you have a basic knowledge of molecular biology (transcription, etc.), a strong background in algorithms (at least CSC 373 level), and basic probability theory.

Grading:
The basic requirements for the class will be a course project (60% of the grade), paper presentations and participation (20% of the grade) and written paper summaries (20% of the grade).

Syllabus & Readings

January 6th -- Organizational Meeting Slides
January 13th -- High Throughput Sequencing Platforms. Presenter: Michael Brudno
Reading: Nature Methods -- Method of the Year 2007. pp 11-21.
Background:
Personal Genomics
Functional Genomics
DNA Sequencing
RNA Sequencing
ChIP-SEQ
January 20th -- Image processing for base calling
Alta-cyclic Presenter: Jian Zhao
NaiveBayesCall Presenter: Frank Li
January 27th -- De novo Genome Assembly
Overview paper: Medvedev et al. This is not an assigned paper, but may be useful for background
Velvet assembler Presenter: Hui Yuan Xiong
Allpaths assembler Presenter: Mohit Jain
ABYSS-Explorer visualization tool Presenter: Andrew Trusty
February 3rd -- Short Read Alignment
Overview Paper: Dalca et al. This is not an assigned paper. It may be useful for understanding this week's and next week's reading.
BWA Aligner Presenter: Alecia Fowler
SHRiMP aligner Presenter: Billy Chang
Slider Aligner Presenter: Marc Fiume
February 10th -- SNP Calling
PolyBayes Presenter: Hui Yuan Xiong
SoapSNP Presenter: Mark Sun
VARiD Presenter: Velian Pandeliev
February 24th -- Structural Variation Discovery
Overview Paper: Medvedev et al. This is not an assigned paper. It may be useful for understanding this week's and next week's reading.
Pindel Presenter: Jian Zhao
Hormozdiari et al. Presenter: Alyssa Rosenzweig
BreakDancer Presenter: Nilgun Donmez
March 3rd -- Copy Number Variation
Chiang et al. Presenter: Akhil Mathur
Yoon et al. Presenter: Mohit Jain
CNVer Presenter: Michael Brudno
March 10th -- Transcriptome Analysis
TopHat Presenter: Nick Shim
Abyss-RNA Presenter: Nilgun Donmez
Lacroix et al. Presenter: Mark Sun
March 17th -- Association Mapping
Biesecker et al. Presenter: Alyssa Rosenzweig
Kim et al. Presenter: Marc Fiume
Homer et al. Presenter: Billy Chang
March 24th -- HCI issues for HTS
EagleView Presenter: Velian Pandeliev
Savant Presenter: Nick Shim
Gambit (no paper, but look at the website) Presenter: Alecia Fowler
March 31st -- Systems & Hardware issues in HTS
MummerGPU Presenter: Frank Li
CloudBurst Presenter: Akhil Mathur

Writing paper summaries

Each person taking the class for credit is responsible for submitting a one-page summary of *at least two* of the assigned papers before every class. The system for grading them will be a simple check-off, so no need to sweat too much. From the writeup I am looking for evidence that you read the papers and thought about them. Some evidence of this would be talking about 1. the weaknesses of the paper (the strengths are in the abstract :)), and 2. if the method is not directly applicable to HTS, how it could be used there. The writeup need not be long or thoroughly polished; it is supposed to be evidence that you've done the work, not work in itself. If you are presenting a paper, you are exempt from doing a writeup that week.

The whole point of the paper summaries is to make sure that you've read the papers before coming to class. However, I will allow you to hand in no more than two summaries up to 2 days late (by Friday of the same week).

Administrative details:

The class will satisfy the 2c breadth.
