Motivation
One of the deadliest cancer diagnoses is carcinoma of unknown primary origin. Without knowledge of the site of origin, treatment regimens are limited in their specificity, and mortality rates are high. Classification models based on microarray gene expression data have previously been constructed to predict the site of origin, but they require previously classified cancer expression data for training, do not account for sample heterogeneity, and rely on noisy microarray technology.
Results
We present ISOLATE, a statistical model that simultaneously predicts the primary site of origin of cancers and addresses sample heterogeneity, while taking advantage of new high-throughput sequencing technology that promises to bring higher accuracy and reproducibility to gene expression profiling experiments. ISOLATE makes predictions de novo, without having seen any training expression profiles of cancers with identified origin. Compared to previous methods, ISOLATE predicts the primary site of origin, deconvolves and removes sample heterogeneity, and identifies differentially expressed genes with higher accuracy, across both synthetic and clinical datasets. Models such as ISOLATE will be an asset to clinicians faced with carcinomas of unknown origin.
Contact
gerald.quon @ utoronto.ca ; quaid.morris @ utoronto.ca
Paper download
ISOLATE version 0.3 download - updated August 21, 2009
ISOLATE is currently available as a Python package; see the included README file for details on how to run it. Please contact us at the above email addresses if you have any questions or concerns, or need help running the software! In the near future, we will be releasing executables that do not require you to download Python and the associated NumPy and SciPy packages, so stay tuned!
See the CHANGELOG for a revision history.