Software analysis and modeling tools

Our focus is on network analysis and modeling, integrated with cancer profiles, to identify diagnostic and prognostic biomarkers and to understand disease initiation and progression, with the goal of improving cancer treatment. Our tools and resources, such as pathDIP, GSOAP, IID, NAViGaTOR, FpClass, I2D, mirDIP, CDIP, GAP, RNSC and BTSVQ, enable users to interpret integrated cancer profiles and to create relevant models dynamically. We also host a GeneCards mirror and provide RQSA, a robust quantitative scratch assay analysis system.

pathDIP - An annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

pathDIP integrates data from twenty source pathway databases ("core pathways") with physical protein-protein interactions (PPIs) from IID to predict biologically relevant protein-pathway associations ("extended pathways"). Cross-validation showed a 71% recovery rate for our predictions (randomization test, p-value < 0.0001). Data integration and predictions increase the coverage of pathway annotations for protein-coding genes from 57% to 86%, and provide novel annotations for 5,786 pathway orphans.

Rahmati, S., Abovsky, M., Pastrello, C., Kotlyar, M., Lu, R., Cumbaa, C.A., Rahman, P., Chandran, V. and Jurisica, I. pathDIP 4: An extended pathway annotations and enrichment analysis resource for human, model organisms and domesticated species, Nucl Acids Res, In press. 2019.

Rahmati, S., Abovsky, M., Pastrello, C., Jurisica, I. pathDIP: An annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis. Nucl Acids Res, 45(D1): D419-D426, 2017.

Go to pathDIP home page

GSOAP: A novel tool for visualisation of gene set over-representation analysis

Gene set over-representation analysis (GSOA) is a method of enrichment analysis that measures the fraction of genes of interest (e.g., differentially expressed genes) belonging to a tested group of genes (e.g., a pathway, protein family, or Gene Ontology instance). GSOAP (Gene Set Over-representation Analysis Plots) is a simple yet efficient tool for exploration and visualisation of GSOA results. By applying binary distance measures and dimensionality reduction techniques, GSOAP depicts relationships between the pathways (or other instances) obtained from GSOA for a given set of query genes, and lets users highlight important instance attributes such as significance, closeness (centrality), clustering, or outliers. It can be used to visualise the results obtained from the most common GSOA tools, including pathDIP, clusterProfiler and topGO.
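As a rough illustration of the two ideas above, the sketch below computes a one-sided hypergeometric p-value for a single gene set and a Jaccard distance between the query-gene hits of two pathways. The gene identifiers, the function names, and the choice of the Jaccard measure are assumptions made for this example; they are not taken from GSOAP or pathDIP, whose own statistics and defaults may differ.

```python
from math import comb


def overrepresentation_pvalue(query, gene_set, universe_size):
    """One-sided hypergeometric p-value: P(overlap >= observed overlap).

    Assumes both `query` and `gene_set` are drawn from the same background
    universe of `universe_size` genes.
    """
    query, gene_set = set(query), set(gene_set)
    overlap = len(query & gene_set)   # query genes found in the tested set
    n_query = len(query)
    n_set = len(gene_set)
    total = comb(universe_size, n_query)
    upper = min(n_query, n_set)
    tail = sum(
        comb(n_set, i) * comb(universe_size - n_set, n_query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def jaccard_distance(hits_a, hits_b):
    """Binary (set-based) distance between the query hits of two pathways."""
    hits_a, hits_b = set(hits_a), set(hits_b)
    union = hits_a | hits_b
    if not union:
        return 0.0
    return 1.0 - len(hits_a & hits_b) / len(union)


# Hypothetical data: a universe of 20 genes, one 5-gene pathway, 4 query genes.
pathway = {"g0", "g1", "g2", "g3", "g4"}
query_genes = {"g0", "g1", "g2", "g10"}
p_value = overrepresentation_pvalue(query_genes, pathway, universe_size=20)
distance = jaccard_distance({"g0", "g1", "g2"}, {"g1", "g2", "g10"})
print(p_value, distance)
```

A matrix of such pairwise distances is the kind of input a dimensionality reduction step could project into two dimensions for plotting.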

Tokar, T., Pastrello, C., Jurisica, I. GSOAP: A tool for visualization of gene set over-representation analysis, Bioinformatics, 2020. In press.

Go to GSOAP home page

IID - Integrated Interactions Database

IID is the first database providing tissue-specific protein-protein interactions (PPIs) for model organisms (yeast, worm, fly, rat, mouse) and human, providing access to 1,566,043 PPIs among 68,831 proteins. PPIs are annotated with up to 30 tissues per species.

Kotlyar, M., Pastrello, C., Malik, Z., Jurisica, I., IID 2018 update: context-specific physical protein-protein interactions in human, model organisms and domesticated species. Nucl Acids Res, 47(D1):D581-D589, 2019.

Kotlyar M, Pastrello C, Sheahan N, Jurisica I. Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucl Acids Res, 44(D1):D536-41, 2016

Go to IID home page

FpClass - Data mining-based prediction of physical protein interactions

FpClass is an association mining algorithm that we used and validated for comprehensive, in silico prediction of physical protein interactions. It provides 250,542 high confidence interactions among 10,529 human proteins, including 1,089 interactome orphans. Extensive computational and biological validation shows that FpClass outperforms existing computational methods and most biological assays in sensitivity and specificity. Using three bioassays, we tested 233 high and medium confidence predictions and validated 137 interactions, including seven novel potential partners of the tumor suppressor p53. Importantly, GST pull-down assays validated 5 of the 6 tested p53 interactions with orphans (a validation rate of 83%). Overall, validation rates were 40% (2/5) for co-IP, 47% (14/30) for GST pull-down, and 61% (121/198) for MaMTH (Petschnigg et al., Nat Methods, 2014). The high validation rate for MaMTH suggests that FpClass could help guide high-throughput screening in a combined computational-experimental approach to interactome mapping. This substantially extends our interactome work, including I2D (Brown, Jurisica, Genome Biol, 2007) and earlier work (Brown, Jurisica, Bioinformatics, 2005). NAViGaTOR (Brown et al., Bioinformatics, 2009) was used for network analysis and visualization.
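The per-assay validation rates quoted above can be re-derived from the stated counts. The short sketch below does that arithmetic; the dictionary layout and variable names are illustrative choices, while the counts themselves come from the paragraph above.

```python
# (validated, tested) counts per assay, as reported in the text above.
assays = {
    "co-IP": (2, 5),
    "GST pull-down": (14, 30),
    "MaMTH": (121, 198),
}

# Validation rate per assay, as a fraction of tested predictions.
rates = {name: validated / tested for name, (validated, tested) in assays.items()}

total_validated = sum(validated for validated, _ in assays.values())
total_tested = sum(tested for _, tested in assays.values())
overall_rate = total_validated / total_tested

for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
print(f"overall: {total_validated}/{total_tested} = {overall_rate:.0%}")
```

The totals agree with the 233 tested and 137 validated predictions reported in the text.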

Kotlyar M, Pastrello C, Pivetta F, Lo Sardo A, Cumbaa C, Li H, Naranian T, Niu Y, Ding Z, Vafaee F, Broackes-Carter F, Petschnigg J, Mills GB, Jurisicova A, Stagljar I, Maestro R, Jurisica I. In silico prediction of physical protein interactions and characterization of interactome orphans. Nat Methods. 12(1):79-84, 2015.

Go to FpClass home page

NAViGaTOR - Network Analysis, Visualization, & Graphing TORonto

NAViGaTOR is a software package for scalable, interactive visual data mining: the visualization and analysis of large, typed graphs. These networks could be protein-protein interaction networks, microRNA:gene or transcriptional regulatory networks, metabolic networks, or other graphs, such as transportation networks, communication networks or even the solar system. NAViGaTOR can query IID, our online database of interaction data, as well as PSICQUIC, pathDIP, KEGG, Reactome and other data sources; it can also link annotations from UniProt, GO and PubMed, and display networks in 2D or 3D. To improve scalability and performance, NAViGaTOR combines Java with OpenGL to provide a 2D/3D visualization system on multiple hardware platforms. NAViGaTOR also provides analytical capabilities and supports standard import and export formats such as GO and the Proteomics Standards Initiative (PSI). In protein-protein interaction networks, nodes represent proteins, and edges between nodes represent physical interactions between the proteins. These visualizations can enable insights into the proteins that play key roles in diseases such as cancer.
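The node-and-edge model described above can be sketched in a few lines. The example below builds an undirected interaction network and ranks proteins by their number of interaction partners, one simple way to spot candidate key proteins. It is a minimal illustration only: the protein names are placeholders, and nothing here reflects NAViGaTOR's internal data structures or API.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network: protein -> set of interaction partners."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        network[a].add(b)
        network[b].add(a)
    return network


def top_hubs(network, k=1):
    """Return the k proteins with the most partners (ties broken by name)."""
    return sorted(network, key=lambda p: (-len(network[p]), p))[:k]


# Hypothetical physical interactions between placeholder proteins.
interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
network = build_network(interactions)
hubs = top_hubs(network, k=2)
print(hubs)
```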

Brown KR, Otasek D, Ali M, McGuffin MJ, Xie W, Devani B, van Toch IL, Jurisica I. NAViGaTOR: Network Analysis, Visualization and Graphing Toronto. Bioinformatics. 25(24):3327-9, 2009.

Figure from: Benleulmi-Chaachoua A, Chen L, Sokolina K, Wong V, Jurisica I, Emerit MB, Darmon M, Espin A, Stagljar I, Tafelmeyer P, Zamponi GW, Delagrange P, Maurice P, Jockers R. Protein interactome mining defines melatonin MT1 receptors as integral component of presynaptic protein complexes of neurons. J Pineal Res. 60(1):95-108, 2016

Go to NAViGaTOR home page

I2D - Interologous Interaction Database

I2D is an on-line database of known and predicted mammalian and eukaryotic protein-protein interactions. It has been built by mapping high-throughput (HTP) data between species. Thus, until experimentally verified, these interactions should be considered "predictions". I2D remains one of the most comprehensive sources of known and predicted eukaryotic PPI.
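Mapping interactions between species is commonly done through orthologs: if two proteins interact in a source species, their orthologs in a target species are predicted to interact. The sketch below shows that idea under simplifying assumptions. The identifiers and the ortholog table are hypothetical, and the exact mapping rules used to build I2D are not described here.

```python
def map_interologs(source_ppis, orthologs):
    """Predict target-species interactions from source-species interactions.

    source_ppis: iterable of (protein_a, protein_b) pairs in the source species.
    orthologs:   dict mapping a source protein to its target-species orthologs.
    Returns a set of unordered predicted pairs (frozensets).
    """
    predicted = set()
    for a, b in source_ppis:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:  # skip self-pairs in this sketch
                    predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical yeast interactions and yeast-to-human ortholog assignments.
yeast_ppis = [("y1", "y2"), ("y2", "y3")]
yeast_to_human = {"y1": ["h1"], "y2": ["h2a", "h2b"]}  # y3 has no ortholog
predictions = map_interologs(yeast_ppis, yeast_to_human)
print(sorted(tuple(sorted(pair)) for pair in predictions))
```

Pairs produced this way stay predictions until they are verified experimentally, which is why the text above treats them as such.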

Brown KR, Jurisica I. Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biol.8(5):R95, 2007.

Figure includes interactomes from 6 species, with over 1 million edges, generated using NAViGaTOR: Brown KR, Otasek D, Ali M, McGuffin MJ, Xie W, Devani B, van Toch IL, Jurisica I. NAViGaTOR: Network Analysis, Visualization and Graphing Toronto. Bioinformatics. 25(24):3327-9, 2009.

Go to I2D home page

mirDIP - microRNA:target prediction Data Integration Portal

mirDIP is an on-line database that integrates thirty microRNA resources, providing nearly 152 million human microRNA:target predictions. We also introduce an integrative score, statistically inferred from the obtained predictions and assigned to each unique microRNA:target interaction, to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not accumulate prediction bias towards particular biological processes or pathways.

Tokar, T., Pastrello, C., Rossos, A., Abovsky, M., Hauschild, A.C., Tsay, M., Lu, R., Jurisica, I. mirDIP 4.1 – Integrative database of human microRNA target predictions, Nucl Acids Res, 46(D1): D360-D370, 2018.

Shirdel EA, Xie W, Mak TW, Jurisica I. NAViGaTing the micronome - using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs. PLoS One. 6(2):e17429, 2011.

Go to mirDIP home page

CDIP - Cancer Data Integration Portal

CDIP is an on-line database of significantly deregulated genes in lung, ovarian, prostate, and head and neck cancers. Work on pancreatic cancer and sarcoma is ongoing.

Go to CDIP home page

CMapBatch - A computational pipeline for drug repositioning

CMapBatch is a computational pipeline that, based on a set of disease signatures, produces a list of drugs predicted to consistently reverse pathological gene changes. We validated CMapBatch by conducting the largest and most systematic repurposing study on lung cancer transcriptomes, using 21 signatures. We show that scaling up transcriptional knowledge significantly increases the reproducibility of top drug hits, from 44% to 78%.
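One way to picture "consistently reverse" is to score each drug against every disease signature and keep only drugs whose score indicates reversal in most of them. The sketch below does this with a simple vote and a mean score. It is an assumption-laden illustration: the scoring convention (negative means reversal), the threshold, and the drug names are hypothetical, and this is not the aggregation procedure CMapBatch itself uses.

```python
from statistics import mean


def consistent_reversers(scores_by_signature, min_fraction=0.75):
    """Rank drugs that reverse most signatures.

    scores_by_signature: list of dicts, one per disease signature, mapping
    drug name -> connectivity-style score (negative = predicted reversal).
    Only drugs scored in every signature are considered.
    """
    drugs = set(scores_by_signature[0])
    for scores in scores_by_signature[1:]:
        drugs &= set(scores)

    kept = []
    for drug in drugs:
        values = [scores[drug] for scores in scores_by_signature]
        fraction_reversing = sum(v < 0 for v in values) / len(values)
        if fraction_reversing >= min_fraction:
            kept.append((mean(values), drug))
    # Most negative mean score first, i.e. strongest average reversal.
    return [drug for _, drug in sorted(kept)]


# Hypothetical scores for three drugs across four signatures.
signatures = [
    {"drugA": -0.8, "drugB": -0.9, "drugC": -0.3},
    {"drugA": -0.6, "drugB": 0.4, "drugC": -0.4},
    {"drugA": -0.7, "drugB": -0.2, "drugC": 0.1},
    {"drugA": -0.5, "drugB": 0.3, "drugC": -0.2},
]
ranked = consistent_reversers(signatures)
print(ranked)
```

A drug that looks strong in a single signature, like the placeholder drugB here, drops out once agreement across signatures is required.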

Fortney K, Griesman J, Kotlyar M, Pastrello C, Angeli M, Sound-Tsao M, Jurisica I. Prioritizing therapeutics for lung cancer: an integrative meta-analysis of cancer gene signatures and chemogenomic data. PLoS Comput Biol. 11(3):e1004068, 2015.

Go to CMapBatch home page

NetwoRx - A database for linking drugs to pathways and networks

NetwoRx stores pre-computed drug lists for KEGG pathways, GO categories, YEASTRACT transcription factor targets, and orthologs of human KEGG DISEASE groups. Users can interactively explore or download pathway-drug, pathway-pathway, and drug-drug networks, or submit a new gene set to NetwoRx and retrieve the drugs that target it.

Fortney K, Xie W, Kotlyar M, Griesman J, Kotseruba Y, Jurisica I. NetwoRx: connecting drugs to networks and phenotypes in Saccharomyces cerevisiae. Nucleic Acids Res. 41:D720-7, 2013

Go to NetwoRx home page

SCRIPDB - A Portal for Easy Access to Syntheses, Chemicals, and Reactions In Patents

SCRIPDB provides the full original patent text, reactions, and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction, and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure, or molecular similarity and the results may be restricted to patents describing synthetic routes.

Heifets A, Jurisica I. SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents. Nucleic Acids Res. 40:D428-33, 2012

Go to SCRIPDB home page

RQSA - Robust Quantitative Scratch Assay Analysis Tool

The RQSA algorithm was created to help analyze results from the wound healing assay (or scratch assay), a technique used to quantify how cell motility, a central process in tissue repair and in the evolution of disease, depends on various treatment conditions. RQSA implements statistical methods that estimate migration rates, distinguish cellular behaviour, and identify outliers among groups of unique experimental conditions. Compared to previously developed methods, it decreased measurement errors and increased the accuracy of the wound boundary at comparable processing times.
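To illustrate what a robust migration-rate estimate can look like, the sketch below fits the rate of wound closure with the Theil-Sen estimator (the median of all pairwise slopes), which tolerates an occasional bad measurement. This estimator is a stand-in chosen for the example; the statistical methods RQSA actually implements are not detailed here, and the measurements are made up.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times, widths):
    """Median of pairwise slopes: a robust estimate of width change per unit time."""
    points = list(zip(times, widths))
    slopes = [
        (w2 - w1) / (t2 - t1)
        for (t1, w1), (t2, w2) in combinations(points, 2)
        if t2 != t1
    ]
    if not slopes:
        raise ValueError("need at least two measurements at distinct times")
    return median(slopes)


# Hypothetical wound widths (in micrometres) measured every 2 hours.
# The reading at t=6 is a deliberate outlier, e.g. a mis-detected boundary.
times = [0, 2, 4, 6, 8]
widths = [100, 90, 80, 150, 60]

slope = theil_sen_slope(times, widths)
migration_rate = -slope  # the wound narrows, so closure rate is the negated slope
print(migration_rate)
```

An ordinary least-squares fit on the same data would be pulled toward the outlier, which is the motivation for a median-based estimate in this sketch.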

Vargas A, Angeli M, Pastrello C, McQuaid R, Li H, Jurisicova A, Jurisica I. Robust quantitative scratch assay. Bioinformatics. 32(9):1439-40, 2016

Go to RQSA home page

RNSC - Restricted Neighborhood Search Clustering Algorithm

RNSC efficiently partitions networks into clusters using a cost function with the goal to identify and predict protein complexes.
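A cost function for a clustering can be as simple as counting the edges that disagree with it: edges that cross between clusters, plus edges that are missing inside a cluster. The sketch below computes such a cost so that two candidate partitions can be compared. It illustrates the general idea of cost-based clustering only; the cost functions and the search procedure used by RNSC are defined in the cited paper, and the graph here is hypothetical.

```python
def clustering_cost(edges, clusters):
    """Count cross-cluster edges plus missing within-cluster edges.

    edges:    iterable of (u, v) pairs of an undirected graph.
    clusters: list of disjoint vertex sets covering every vertex in `edges`.
    """
    cluster_of = {v: i for i, members in enumerate(clusters) for v in members}
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}

    cross = sum(
        1 for e in edge_set if len({cluster_of[v] for v in e}) == 2
    )
    within_present = len(edge_set) - cross
    within_possible = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    missing = within_possible - within_present
    return cross + missing


# Hypothetical network: a triangle A-B-C attached to a chain C-D-E.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]

good = clustering_cost(edges, [{"A", "B", "C"}, {"D", "E"}])
poor = clustering_cost(edges, [{"A", "B"}, {"C", "D", "E"}])
print(good, poor)
```

A local search can then move vertices between clusters and keep moves that lower the cost; dense, low-cost clusters are the candidates for protein complexes.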

King AD, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 20(17):3013-20, 2004.

Go to RNSC home page

SDREGION: fast spotting of changing communities in biological networks

We designed a novel algorithm, SDREGION, that identifies subgraphs whose density decreases monotonically over time, referred to as d-regions, or increases monotonically over time, referred to as i-regions. We introduced the objective function, Δdensity, for identifying d-(i-)regions. SDREGION is a generic algorithm, and we evaluated it by modeling the progression of lung cancer. In particular, we observed that SDREGION identified d-(i-)regions that capture mechanisms supported by the literature. Importantly, additional findings may provide novel mechanisms in tumor progression that will guide future biological experiments. SDREGION is scalable, with a time complexity of O(m log n + n log n), where m is the number of edges and n is the number of vertices in a given dynamic graph.
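The definitions of d-regions and i-regions can be checked directly for a given vertex set. The sketch below computes the density of an induced subgraph in each time snapshot and tests whether the sequence is monotonic. It only verifies the definition for a candidate region; it does not implement the SDREGION search or its Δdensity objective. The density formula (edges over possible vertex pairs) and the use of strict monotonicity are assumptions made for this example.

```python
def density(vertices, edges):
    """Fraction of possible vertex pairs inside `vertices` joined by an edge."""
    vertices = set(vertices)
    n = len(vertices)
    if n < 2:
        return 0.0
    inside = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in vertices and v in vertices
    }
    return len(inside) / (n * (n - 1) / 2)


def classify_region(vertices, snapshots):
    """Return 'd-region', 'i-region', or None for a vertex set over time."""
    densities = [density(vertices, edges) for edges in snapshots]
    steps = list(zip(densities, densities[1:]))
    if steps and all(later < earlier for earlier, later in steps):
        return "d-region"
    if steps and all(later > earlier for earlier, later in steps):
        return "i-region"
    return None


# Hypothetical dynamic graph: one edge list per time point (e.g. disease stage).
snapshots = [
    [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("C", "D")],
    [("A", "B"), ("C", "D")],
]
region = {"A", "B", "C"}
label = classify_region(region, snapshots)
print(label)
```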

Wong, S., Pastrello, C., Kotlyar, M., Faloutsos, C., Jurisica, I. SDREGION: Fast spotting of changing communities in biological networks. ACM KDD Proceedings, 2018. In press.

Temp-O: Modeling tumor progression via the comparison of stage-specific graphs

We proposed the Temporal-Omics (Temp-O) workflow to model tumor progression in non-small cell lung cancer (NSCLC) by comparing multiple stage-specific graphs. While the Temp-O workflow is generic, we applied it to NSCLC expression data from tumor samples across disease stages, creating stage-specific tumor graphs. Validation in independent datasets showed that differences in temporal network structures capture diverse mechanisms in NSCLC. The structures are consistent and potentially biologically important: genes with similar protein names were captured in the same cliques, for all cliques in all datasets. Importantly, the identified temporal structures are meaningful in the tumor progression of NSCLC, as they agree with the molecular mechanisms of NSCLC tumor progression and carcinogenesis. In particular, the major histocompatibility complex class II temporal structures capture mechanisms concerning carcinogenesis; the proteasome temporal structures capture mechanisms active in early or late stages of lung cancer; and the ribosomal cliques capture the role of ribosome biosynthesis in cancer development and maintenance. Further, on a large independent dataset, we validated that temporal network structures identified proteins that are prognostic for overall survival in NSCLC adenocarcinoma.

Wong SWH, Pastrello C, Kotlyar M, Faloutsos C, Jurisica I. Modeling tumor progression via the comparison of stage-specific graphs. Methods. 2018 Jan 1;132:34-41. doi: 10.1016/j.ymeth.2017.06.033. Epub 2017 Jul 3. PubMed PMID: 28684340.

T-WPPDC: A Tree-based Approach to Motif Discovery and Sequence Classification

Tree-based Weighted-Position Pattern Discovery and Classification algorithm (T-WPPDC) supports both unsupervised pattern discovery and supervised sequence classification. It is a minimally parameterized algorithm for both pattern discovery and sequence classification that directly incorporates positional information. It identifies positionally enriched patterns using the Kullback-Leibler distance between foreground and background sequences at each position. This spatial information is used to discover positionally important patterns. T-WPPDC then uses a scoring function to discriminate different biological classes. We validated T-WPPDC by prediction of single nucleotide polymorphisms (SNPs) from flanking sequence. We evaluated 672 separate experiments on 120 datasets derived from multiple species. T-WPPDC outperformed other pattern discovery methods and was comparable to the supervised machine learning algorithms. The algorithm is computationally efficient and largely insensitive to dataset size. It allows arbitrary parameterization and is embarrassingly parallelizable.
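The per-position Kullback-Leibler comparison mentioned above can be sketched for single nucleotides as follows. The example estimates nucleotide frequencies at each position in a foreground and a background set and reports the divergence per position. It is a simplified illustration: T-WPPDC works with patterns in a tree structure, whereas this sketch scores single positions only, and the pseudocount, the DNA alphabet, and the toy sequences are assumptions.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGT"


def position_frequencies(sequences, position, pseudocount=1.0):
    """Smoothed nucleotide frequencies at one position across sequences."""
    counts = Counter(
        seq[position] for seq in sequences if seq[position] in ALPHABET
    )
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {base: (counts[base] + pseudocount) / total for base in ALPHABET}


def positional_kl(foreground, background, pseudocount=1.0):
    """KL divergence (in bits) of foreground from background at each position."""
    length = min(len(seq) for seq in foreground + background)
    scores = []
    for position in range(length):
        p = position_frequencies(foreground, position, pseudocount)
        q = position_frequencies(background, position, pseudocount)
        scores.append(sum(p[b] * log2(p[b] / q[b]) for b in ALPHABET))
    return scores


# Hypothetical aligned sequences: the foreground is fixed to 'A' at position 0,
# while the background varies there; the remaining positions match.
foreground = ["ACGT", "ACGA", "ACGC", "ACGG"]
background = ["ACGT", "CCGA", "GCGC", "TCGG"]

scores = positional_kl(foreground, background)
most_informative = max(range(len(scores)), key=scores.__getitem__)
print(most_informative, scores)
```

Positions with a high score are those where the foreground composition departs most from the background, which is what makes them candidates for positionally enriched patterns.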

Yan, R., Boutros, P.C., Jurisica, I. A tree-based approach for motif discovery and sequence classification. Bioinformatics. 27(15):2054-61, 2011.

GAP Portal: Integrative approach to predicting gene functional associations using a novel semantic similarity measure

GAP (Gene functional Association Predictor) is an integrative method for predicting and characterizing gene functional associations. It integrates different biological features using a novel taxonomy-based semantic similarity measure in predicting and prioritizing high-quality putative gene associations. The proposed similarity measure increases information gain from the available gene annotations. The annotation information is incorporated from several public pathway databases, Gene Ontology annotations as well as drug and disease associations from the scientific literature.

Vafaee F, Rosu D, Broackes-Carter F, Jurisica I. Novel semantic similarity measure improves an integrative approach to predicting gene functional associations. BMC Syst Biol. 7:22, 2013.

Go to GAP home page

BTSVQ - Binary tree structured vector quantization

BTSVQ is a computational tool to analyze and visualize microarray gene expression data. This technique merges the results of self-organizing maps (SOMs), applied in gene space, and partitive k-means, applied in specimen space. The algorithm uses the vector quantization and self-organizing capabilities of SOMs to find significant gene centers in gene space (high dimensionality and a large number of clusters), and the effectiveness of k-means in experiment space (medium dimensionality and a low number of clusters).

Sultan M, Wigle DA, Cumbaa CA, Maziarz M, Glasgow J, Tsao MS, Jurisica I. Binary tree-structured vector quantization approach to clustering and visualizing microarray data. Bioinformatics. 18 Suppl 1:S111-9, 2002.

Go to BTSVQ home page

All contents copyright Jurisica Lab, Krembil Research Institute, UHN. Last modified January 2020