My research is in bioinformatics. More specifically, I like to explore the application of computer science to genomic data in order to help realize the idea of personalized genomics. Here are some of the projects I am working on or have worked on in the past:
High Throughput Sequencing (HTS) technologies have revolutionized the way we determine the sequence of nucleotides in a DNA molecule. While the data is cheaper to produce, the reads are shorter than those from traditional sequencing approaches, which makes analysis more difficult. Savant is a visualization tool designed to help present large datasets, such as those produced by HTS machines, to researchers.
Genetic variation comes in many forms. A Copy Number Variation (CNV) is a type of variation where two individuals differ in the number of times some sequence occurs in their genomes. CNVer is a computational tool that locates potential CNVs in an individual whose genome has been sequenced by High Throughput Sequencing technologies.
Genomics is largely based on comparison. For example, one may infer rates of evolution by comparing the genomes of related species such as human and chimpanzee. The first step after sequencing the genome of an individual is typically to compare the reads to a reference genome of the same species. The SHort Read Mapping Package (SHRiMP) is a tool for aligning reads generated by High Throughput Sequencing machines to a reference genome, and it works particularly well for species that exhibit a high degree of polymorphism.
Copyright © 2012