The Genomic Birthday Paradox: How Much Is Enough?

Abstract

Genomic matchmaking databases (GMDs) allow participants to submit genomic and phenotypic data with the goal of identifying previously uncharacterized disease-associated genes by 'matching' to other comparable cases. Current estimates suggest that there are at least 3,000 Mendelian disease-associated genes that have not yet been characterized as such, but the true number may be substantially higher. Therefore, GMDs are addressing a pressing medical need, and it is important to ask how they should be designed and how much data they should strive to contain in order to identify a certain number of these genes. In this work, we argue that genomic matchmaking has similarities to the so-called 'birthday paradox,' which refers to the observation that within a group of just 23 persons, two people will have the same birthday with probability greater than 50%. We develop a series of simulations to provide a rough estimate of the number of cases required and to explore the influence of parameters such as genetic heterogeneity, mode of inheritance, background variation, precision of phenotypic descriptions, disease prevalence, and the accuracy of bioinformatics pathogenicity prediction programs on the performance of genomic matchmaking.
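The birthday-paradox figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of the classical calculation, not the simulations described in the abstract. It assumes that all outcomes are equally likely and independent, and the function names and the `days` parameter are hypothetical, introduced only for this example.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely birthdays and independent people
    (a simplification made for this illustration).
    """
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose match probability exceeds `threshold`."""
    n = 1
    while shared_birthday_probability(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group())  # prints 23
    print(round(shared_birthday_probability(23), 4))
```

As a loose analogy only, calling `smallest_group(days=3000)` treats each of roughly 3,000 uncharacterized genes as an equally likely "birthday". Real matchmaking departs from this uniform model through the factors the abstract lists, such as genetic heterogeneity, background variation, and disease prevalence, which is why the authors turn to simulation.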

Publication
Human Mutation, 36: 989-997, 2015