K. Jun Gao

they/them

A photo of Jun at ISMB 2022

I’m a research student and fourth-year undergraduate at the University of Toronto. I am pursuing a degree in computer science, bioinformatics and computational biology, with a math minor and focus in theoretical CS. I work at the Simpson Lab at the Ontario Institute for Cancer Research (OICR). Previously, I worked as an undergraduate student at the Yu Lab at the University of Toronto.

My research interests are algorithms and data structures for bioinformatics and, more broadly, the combinatorial problems that arise in the field. Apart from research, I am also interested in mobile development, photography, and Japanese language and culture.

I can be reached at kgao at cs dot toronto dot edu or jgao at oicr dot on dot ca.

Links: { cv, short-resume, blog, github, linkedin }

Teaching

University of Toronto

  • CSC165 (Mathematical Expression and Reasoning for Computer Science) Winter 2022 and 2023, Teaching assistant
  • CSC236 (Intro to Theory of Computation) Fall 2022 and 2023, Teaching assistant

You can find my notes for theory of computation, data structures, and algorithm design on my GitHub. I have compiled notes focused on the mathematical and algorithmic foundations of computational biology that may be used as teaching or workshop material.

Algorithms and Data Structures for Computational Biology: Math, Theory, and Practice for Large-Scale Biological Data Analysis

Projects

A new approach for efficient storage and retrieval of homology data
We created a new data structure and index format for storing homology pair data that accommodates a large homology database. By leveraging the gene tree hierarchy, we avoided storing every homologous relationship explicitly, which reduces the space complexity of the index. We also implemented interval-based labeling to parse trees and extract homology information efficiently.
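
To give a flavor of interval-based labeling, here is a minimal sketch in Python. It is not the project's actual index format. The `Node` class, the `event` labels, and the rule "the event at the lowest common ancestor decides the relationship" are assumptions made for illustration. Each node stores the half-open range of leaf indices beneath it, so the relationship between any two genes can be recovered from the tree without storing the pair itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical gene tree node; `event` is "speciation", "duplication", or "leaf".
    name: str
    event: str = "leaf"
    children: list = field(default_factory=list)
    start: int = -1  # index of the first leaf below this node
    end: int = -1    # one past the index of the last leaf below this node


def label_intervals(root):
    """Assign every node the half-open leaf range [start, end) and return the leaves."""
    leaves = []

    def visit(node):
        node.start = len(leaves)
        if not node.children:
            leaves.append(node)
        for child in node.children:
            visit(child)
        node.end = len(leaves)

    visit(root)
    return leaves


def relation(root, i, j):
    """Return the event at the lowest common ancestor of leaves i and j."""
    node = root
    while True:
        for child in node.children:
            # Descend while a single child interval still contains both leaves.
            if child.start <= i < child.end and child.start <= j < child.end:
                node = child
                break
        else:
            return node.event


# Toy tree: a duplication produced A1 and A2; a speciation separates them from B1.
root = Node("root", "speciation", [
    Node("dupA", "duplication", [Node("A1"), Node("A2")]),
    Node("B1"),
])
leaves = label_intervals(root)
print(relation(root, 0, 1))  # duplication (A1 and A2 would be paralogs)
print(relation(root, 0, 2))  # speciation (A1 and B1 would be orthologs)
```

In this sketch the stored data grows with the number of tree nodes rather than with the number of gene pairs, which is the intuition behind avoiding an explicit pair table.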
Algorithms for mapping sequencing reads to population references
We created ChromMiniGraph, a memory-efficient tool for constructing pangenome references that uses k-mer sampling and node coloring to reduce storage while maintaining accuracy. ChromMiniGraph maps reads efficiently and accurately using subsampling and colinear chaining on a linearized coordinate system. It offers a streamlined workflow for pangenome references, read phasing, and structural variation identification.
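
As a rough illustration of the k-mer sampling idea, the sketch below computes window minimizers in Python. It is not ChromMiniGraph's implementation. The function name `minimizers`, the parameters `k` and `w`, and the use of plain lexicographic order (rather than a hash of each k-mer) are assumptions chosen to keep the example short. For every window of `w` consecutive k-mers, only the smallest k-mer is kept, so far fewer k-mers need to be stored than the sequence contains.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Each window of w consecutive k-mers contributes its smallest k-mer;
    ties are broken by choosing the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


sampled = minimizers("ACGTACGT", k=3, w=3)
print(sampled)  # [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
```

Here six k-mers are reduced to three sampled positions. Because two sequences that share a long enough exact match also share the minimizers inside it, the sampled k-mers can still serve as anchors for a later chaining step.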

Conference and Poster Presentations

* indicates equal contribution

ChromMiniGraph: Space-Efficient Minimizer-based Pangenome Reference Graph and Haplotype Mapping Tool

Publications

Coming soon (hopefully)