Michael Brudno -- Research


The Group

I am privileged to work with:
  • Elango Cheran (CS MSc student)
  • Adrian Dalca (CS MSc)
  • Nilgun Donmez (CS PhD)
  • Marc Fiume (Bioinformatics BSc)
  • Seunghak Lee (CS MSc)
  • Paul Medvedev (CS PhD)
  • Stephen Rumble (CS BSc)
  • Taya Santare (CS & Math BSc)
  • Joseph Whitney (CS PhD)
  • Vlad Yanovsky (CS PhD)

Current Projects:

Members of the group work on a diverse set of topics, ranging from theory to machine learning to systems (on the computer science side), and from genome analysis to protein-protein interaction (PPI) networks and protein structure (on the biology side). The projects below are some of the most active research areas within the group.

Algorithms for Genome Assembly

In a recent paper with Paul Medvedev, Konstantinos Georgiou, and Gene Myers, we analyzed the complexity of several popular assembly paradigms, as well as of assembling double-stranded DNA molecules rather than single-stranded strings. Following up on this work, Paul Medvedev and I developed an algorithm for genome assembly from short, mated reads that uses convex optimization together with an all-pairs shortest path algorithm. A pre-print is available here. We are now extending and improving this work, including developing algorithms for the assembly of diploid organisms.
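Because DNA is double-stranded, a read may come from either strand, so an assembler must treat a sequence and its reverse complement as the same molecule. One common way to handle this (shown here only as an illustration, not necessarily the approach of the papers above) is to count "canonical" k-mers, representing each k-mer by the smaller of itself and its reverse complement. The function names below are hypothetical.

```python
from collections import Counter

# Complement table for the four DNA bases (uppercase input assumed).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmer_counts(reads: list[str], k: int) -> Counter:
    """Count k-mers so that reads sequenced from either strand support the same entries."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts
```

For example, `canonical_kmer_counts(["ACGTTG"], 3)` and `canonical_kmer_counts(["CAACGT"], 3)` are equal, since the two reads are reverse complements of each other. An odd k is often preferred because then no k-mer can be its own reverse complement (with k = 4, `ACGT` is).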

Alignment & Mapping of Short Reads to a Genome

Together with Stephen Rumble (in collaboration with Arend Sidow and his group), I have been working on SHRiMP, the SHort Read Mapping Program. SHRiMP aligns short reads to a reference genome quickly and accurately, while allowing for insertions and deletions. It also provides special color-space options for reads produced by the AB SOLiD technology. Adrian Dalca and I have been working to generalize sequence alignment scoring schemes into a common framework we call "Rectangle Scoring". Adrian implemented an alignment program that handles any rectangle scoring scheme in the FRESCO package. Vlad Yanovsky is exploring more efficient algorithms for genome indexing and sequence alignment. I am also still maintaining the LAGAN alignment toolkit (see the Past projects section below).
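SOLiD reads are reported in "color space": each color encodes a pair of adjacent bases rather than a single base. A minimal sketch of the standard encoding, assuming the 2-bit codes A=0, C=1, G=2, T=3 (under which the color is the XOR of the two codes) and a known first base for decoding. This illustrates the encoding only, not SHRiMP's internals; the function names are hypothetical.

```python
# 2-bit base codes; with this assignment the SOLiD di-base color is code(a) XOR code(b).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Encode a nucleotide string as one color (0-3) per adjacent base pair."""
    codes = [BASE_TO_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colors back to bases; the first base must be known (e.g. from the primer)."""
    code = BASE_TO_CODE[first_base.upper()]
    bases = [CODE_TO_BASE[code]]
    for color in colors:
        code ^= color  # XOR is its own inverse, so this recovers the next base
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)
```

This explains why color-space reads need special handling: a true SNP changes two adjacent colors (compare `to_colors("ACGTA")` with `to_colors("ACCTA")`), whereas a single sequencing error changes only one color, and naive decoding propagates that error to every downstream base. Aligning in color space avoids the propagation.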

Genome Variation

Several members of our group are exploring the variation present among individuals of a single species. In collaboration with Alexey Kondrashov and Yegor Bazykin, Nilgun Donmez explored the genome of Ciona savignyi for evidence of positive selection. Elango Cheran and Seunghak Lee are developing algorithms to detect large-scale (structural) variation in the human genome.
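A common starting point for detecting structural variation is paired-end mapping: mates are sequenced a roughly known distance apart, so a pair whose mapped span on the reference differs strongly from that distance hints at a deletion or insertion. The sketch below shows only this generic idea, not the group's algorithms; `MatePair`, the z-score threshold, and the assumption that the library mean `mu` and standard deviation `sigma` are already known are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """A mapped read pair; coordinates are positions on the reference (hypothetical layout)."""
    name: str
    left: int   # leftmost reference position covered by the pair
    right: int  # rightmost reference position covered by the pair


def discordant_pairs(pairs: list[MatePair], mu: float, sigma: float,
                     z: float = 3.0) -> list[tuple[str, str]]:
    """Flag pairs whose mapped span is more than z standard deviations from the library mean.

    A span longer than expected suggests the sequenced genome lacks reference
    sequence between the mates (a deletion); a shorter span suggests extra
    sequence (an insertion).
    """
    flagged = []
    for p in pairs:
        span = p.right - p.left
        if abs(span - mu) > z * sigma:
            flagged.append((p.name, "deletion" if span > mu else "insertion"))
    return flagged
```

Real callers then cluster many discordant pairs that support the same event, since a single outlier is often just a mapping error.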

SnowFlock: Parallelization with Virtual Machines

In collaboration with Andres Lagar Cavilla and Eyal de Lara, Joe Whitney and Stephen Rumble have been working to enable the use of Virtual Machines for parallelizable applications. You can view a talk I gave on this topic at Google.

Past projects:

Ciona genome co-assembly

With Arend Sidow and Kerrin Small at Stanford, we assembled the Ciona savignyi genome. Assembling the Ciona genome was especially difficult because of its high polymorphism rate: 5%, or 50-fold higher than in humans. As a result, a standard assembly algorithm is effectively given two genomes, as different from each other as human and macaque, sequenced together, and its output is enriched for misassemblies.

DNA Alignment

I led the development of the LAGAN toolkit, which consists of several algorithms for sequence alignment. LAGAN was developed in Serafim Batzoglou's lab at Stanford; Chuong Do, Sanket Malde, Michael F. Kim and Mukund Sundararajan have contributed to various programs in the package. LAGAN has been cited in over 50 publications in the year and a half since it appeared, and has been incorporated into several packages for biological sequence alignment. Seven hundred users from more than thirty countries have used LAGAN over 7,000 times through its website, and 130 users have subscribed to receive updates about the program.

LAGAN proper consists of three main parts:

  • (Multi-)LAGAN

    LAGAN is a global aligner for long genomic sequences. It has proven effective not only at aligning closely related genomes, such as those of mammals, but also at revealing significant conservation of non-coding functional elements between distant organisms such as mammals and fish.

  • Shuffle-LAGAN

    Shuffle-LAGAN is a glocal aligner (one that combines features of global and local alignment) for genomic sequences that have undergone rearrangements. The initial approach aligned two sequences; we have since extended it to the alignment of whole genomes. A multiple sequence version of Shuffle-LAGAN is in the works.

  • CHAOS

    CHAOS is a highly sensitive local aligner I wrote in collaboration with Burkhard Morgenstern. It is used as the anchoring system in the LAGAN programs and in the CHAOS/DIALIGN alignment program. CHAOS has been used for C. intestinalis-C. savignyi and human-fish comparisons.
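Anchor-based aligners in the LAGAN family first find short local matches ("seeds") between two sequences, chain a consistent subset of them into anchors, and then fill in the global alignment between the anchors. A minimal sketch of the seed-and-chain step, assuming exact k-mer seeds and a simple score of k per seed; CHAOS itself uses more sensitive seeds and its own chaining rules, so this is an illustration of the idea, not its implementation. The function names are hypothetical.

```python
def find_seeds(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) for every length-k substring with a[i:i+k] == b[j:j+k]."""
    index: dict[str, list[int]] = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]


def best_chain(seeds: list[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """Highest-scoring chain of non-overlapping seeds, increasing in both sequences.

    Each seed scores k; this is a simple O(n^2) dynamic program.
    """
    seeds = sorted(seeds)
    if not seeds:
        return []
    score = [k] * len(seeds)
    prev = [-1] * len(seeds)
    for t, (i2, j2) in enumerate(seeds):
        for s in range(t):
            i1, j1 = seeds[s]
            # seed s may precede seed t only if it ends before t starts in both sequences
            if i1 + k <= i2 and j1 + k <= j2 and score[s] + k > score[t]:
                score[t] = score[s] + k
                prev[t] = s
    t = max(range(len(seeds)), key=score.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]
```

For `a = "AAAACCCCGGGG"` and `b = "AAAATTCCCCGGGG"` with k = 4, the chain is `[(0, 0), (4, 6), (8, 10)]`; the two-base insertion in `b` shows up as the jump in offset between the first and second anchors, and a global aligner would only need to resolve the short regions between them.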

Whole Genome Alignments

Working within the Rat Genome Consortium we developed some of the first methods for multiple alignment of whole genomes, and applied them to the comparison and analysis of the rat genome. More recently I worked on developing methodologies for whole genome synteny mapping using the Shuffle-LAGAN algorithm.

Protein Sequence Alignment

I participated in the development of the ProbCons protein aligner, written by Chuong (Tom) Do. ProbCons combines the idea of consistency, introduced in earlier programs such as DIALIGN and T-COFFEE, with a maximum expected accuracy parse of the alignment pair-HMM. The resulting alignments are more accurate than those of other alignment tools, yet the method uses no heuristics.
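A maximum expected accuracy (MEA) alignment picks the set of aligned residue pairs, increasing in both sequences, that maximizes the sum of their posterior match probabilities. A minimal sketch of that dynamic program, assuming the posterior matrix has already been computed (in ProbCons it comes from the pair-HMM and a consistency transformation; here it is simply passed in). Gaps contribute nothing to the score. The function name is hypothetical.

```python
def mea_alignment(post: list[list[float]]) -> tuple[float, list[tuple[int, int]]]:
    """Return (expected accuracy, aligned pairs) maximizing the sum of post[i][j] over pairs."""
    n = len(post)
    m = len(post[0]) if n else 0
    # A[i][j] = best expected accuracy when aligning prefixes x[:i] and y[:j]
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(A[i - 1][j - 1] + post[i - 1][j - 1],  # align x[i-1] with y[j-1]
                          A[i - 1][j],                            # x[i-1] against a gap
                          A[i][j - 1])                            # y[j-1] against a gap
    # Traceback: recompute the same expressions to find which case produced A[i][j].
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if post[i - 1][j - 1] > 0 and A[i][j] == A[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return A[n][m], pairs[::-1]
```

With `post = [[0.9, 0.05], [0.1, 0.8]]` the result aligns residue 0 with 0 and 1 with 1, for an expected accuracy of about 1.7. Because there are no gap penalties, the only parameters are those of the model that produced the posteriors.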

Alternative Splicing Regulation

Alternative splicing is an important regulatory mechanism known to be used in about half of all mammalian genes. During this process an exon present in the DNA may be left out of the mature mRNA, and hence will not be translated into protein. This mechanism can be used to tailor the protein to the current needs of the cell, and many of the known alternatively spliced exons are either tissue-specific or development-specific. With John Conboy, Inna Dubchak, and Mikhail Gelfand, we worked on the identification of enhancers of alternative splicing.
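One generic way to look for candidate regulatory elements is to compare short-word (k-mer) frequencies in sequences near alternatively spliced exons against a background set, such as sequences near constitutive exons. The sketch below ranks k-mers by their pseudocounted frequency ratio; it illustrates the general enrichment idea only and is not the method of the work described above. The function names, the pseudocount, and the toy sequences in the usage note are all hypothetical.

```python
from collections import Counter


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count every overlapping k-mer in a set of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(foreground: list[str], background: list[str], k: int,
                   pseudo: float = 1.0) -> list[tuple[str, float]]:
    """Rank foreground k-mers by their frequency ratio versus the background.

    A pseudocount is added to every possible k-mer (4**k of them) so that
    k-mers absent from the background do not cause a division by zero.
    """
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    vocab = 4 ** k
    fg_total = sum(fg.values()) + pseudo * vocab
    bg_total = sum(bg.values()) + pseudo * vocab
    ratios = {
        kmer: ((fg[kmer] + pseudo) / fg_total) / ((bg[kmer] + pseudo) / bg_total)
        for kmer in fg
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

For instance, with foreground `["AATGCATGAA", "CCTGCATGCC"]`, background `["AAAAAAAAAA", "CCCCCCCCCC"]`, and k = 6, the top-ranked word is `TGCATG`, the only 6-mer shared by both foreground sequences. A real analysis would also need a significance test and careful choice of background.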

Alignment Visualization

My work in sequence alignment has led me to think extensively about methods for interpreting the resulting alignments for the biologist. This interest has led to my participation in both the VISTA and Phylo-VISTA projects with Inna Dubchak, Nameeta Shah, Kelly Frazer, and many others.
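Conservation plots of the kind these visualization tools display summarize an alignment as percent identity in a window sliding along it. A minimal sketch of that computation, assuming a gapped pairwise alignment given as two equal-length strings with `-` for gaps; the window size is a free parameter here, not a setting taken from VISTA, and the function name is hypothetical.

```python
def identity_profile(aln_a: str, aln_b: str, window: int) -> list[float]:
    """Percent identity in each window of `window` consecutive alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    # 1 where both rows carry the same residue, 0 for mismatches and gap columns
    matches = [1 if x == y and x != "-" else 0 for x, y in zip(aln_a, aln_b)]
    profile = []
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # slide the window by one column: add the new column, drop the old one
            running += matches[start + window - 1] - matches[start - 1]
        profile.append(100.0 * running / window)
    return profile
```

For `identity_profile("ACGT-ACGT", "ACGTTACCT", 4)` the profile is `[100.0, 75.0, 75.0, 75.0, 50.0, 75.0]`; plotting such values along the reference coordinate makes conserved regions stand out as peaks.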