Computability of models for sequence assembly.

Paul Medvedev, Konstantinos Georgiou, Gene Myers, Michael Brudno

Abstract

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of _-long words. This algorithm has applications to sequencing by hybridization and short read assembly

Publication

International Workshop on Algorithms in Bioinformatics, 2007

Date

September, 2007

Links

PDF