World Community Grid
World Community Grid is a public, high-performance computing platform available for "open science/open data" research that benefits humanity. WCG has been created and funded by a philanthropic initiative of IBM Corporate Social Responsibility since 2004 till 2021, to empower anyone with a computer, Android or Raspberry Pi device to contribute to research on health and sustainability. Check out our Youtube Channel, where you can find videos such as; How World Community Grid works and How can World Community Grid transform scientific research.

It is a global, heterogeneous, distributed cloud infrastructure that runs on secure BOINC platform developed at Berkeley. Over the years, 806,240 volunteers with almost 7.6 million devices contributed more than 2.4 million CPU years towards research on cancer, tuberculosis, COVID-19, HIV/AIDS, microbiome immunity, understanding genomes and protein structure, developing new materials, clean energy, ensuring sustainable water, understanding weather patterns in Africa, and other globally important projects.

WCG embraces open-source and open-data sharing policy. The policy contributes to collaborative and reproducible science. Access to data is a significant factor driving science today. So far WCG has generated over a petabyte of data, available freely to the global research community. Our scientific and contributing partners include: Scripps Research Harvard University, The University of Texas Medical Branch, Pierre et Marie Curie University, University of Washington, Chiba Cancer Center Research Institute, University of Cape Town, UMDNJ-Robert Wood Johnson Medical School and The Cancer Institute of New Jersey, Institute for Systems Biology, New York University, Georgia Tech.

WCG meets important security standards. Following security by design principles, the physical servers are located within a managed services environment which follows the Compute Canada security standards, and communication with client-side machines are secured using private digital keys.

WCG uses Berkeley Open Infrastructure for Network Computing (BOINC) clients, built at U Berkeley, and tested by over 4 million volunteers since 1995. BOINC is used world-wide by over 45 projects, including: ERN (LHC@Home), University of Washington (Rosetta@Home), Max Planck Institute for Graviational Physics - Hanover (Einstein@Home), Rensselaer Polytechnic Institute (Milkyway@Home), Oxford University (ClimatePrediction.net).

We strive to make the world a better and healthier place for all of humanity. Our mission is to accelerate science by creating a supercomputer empowered by a global community of volunteers. Ultimately, our volunteers and partners have helped create an impressive global - yet individually driven - people's supercomputer!

We are in the process of transferring World Community Grid to Krembil Research, UHN.
(UHN announcement; IBM WCG announcement)
Software analysis and modeling tools
Our focus is on network analysis and modeling, integrated with cancer profiles that will enable us to identify diagnostic and prognostic biomarkers, understand disease initiation and progression, which will lead to improving cancer treatment. Our tools and resources, such as IID (I2D, OPHID), mirDIP, pathDIP, BIP, NAViGaTOR, FpClass, GAP, CDIPliver, OsteoDIP, miRAnno, GSOAP, SDREGION, RNSC and BTSVQ enable users to interpret integrated cancer profiles, and create relevant models dynamically. We also host a GeneCards mirror. We also have RQSA - robust, quantitative scratch assay analysis system.
pathDIP - An annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis
pathDIP integrates data from twenty source pathway databases -"core pathways" - with physical protein-protein interactions (PPIs) from IID to predict biologically relevant protein-pathway associations, i.e., "extended pathways". Cross-validation determined 71% recovery rate of our predictions (randomization test, p-value < 0.0001). Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86% from 57%, and provide novel annotations for 5,786 pathway orphans.

Rahmati, S., Abovsky, M., Pastrello, C., Kotlyar, M., Lu, R., Cumbaa, C.A., Rahman, P., Chandran, V. and Jurisica, I. pathDIP 4: An extended pathway annotations and enrichment analysis resource for human, model organisms and domesticated species, Nucl Acids Res, 48(D1): D479–D488, 2020.

Rahmati, S., Abovsky, M., Pastrello, C., Jurisica, I. pathDIP: An annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis. Nucl Acids Res, 45(D1): D419-D426, 2017.

Rahmati, S., Pastrello, C., Rossos, A., Jurisica, I. Two Decades of Biological Pathway Databases: Results and Challenges, In: Ranganathan, S., Nakai, K., Schönbach C. and Gribskov, M. (eds.), Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 1071–1084. Oxford: Elsevier, 2018.

Agapito, G., Pastrello, C., Jurisica, I. Comprehensive pathway enrichment analysis workflows: Covid-19 case study, Briefings Bioinform, 22(2): 676-689, 2021.

Pastrello, C, Niu, Y, Jurisica, I. Pathway Enrichment Analysis of Microarray Data. Ed. G. Agapito. Microarray Data Analysis, Methods Mol Biol, vol. 2401: 147-159, 2022.

Go to pathDIP home page
GSOAP: A novel tool for visualisation of gene set over-representation analysis
Gene set over-representation analysis (GSOA) is a method of enrichment analysis that measures the fraction of genes of interest (e.g differentially expressed genes) belonging to a tested group of genes (e.g. pathway, protein family, Gene Ontology instance, etc.). GSOAP (Gene Set Over-representation Analysis Plots) is a tool for exploration and visualization of GSOA results. GSOAP provides simple yet efficient tool for exploration and visualisation of the results obtained by GSOA. Applying binary distance measures and dimensionality reduction techniques, GSOAP depicts relationships between the pathways (or other instances) obtained from GSOA, given the set of query genes; and allows to highlight important instance attributes, such as significance, closeness (centrality), clustering, or outliers. It can be used to visualise the results obtained from most common GSOA tools, including pathDIP, clusterProfiler, topGO, etc.

Tokar, T., Pastrello, C., Jurisica, I. GSOAP: A tool for visualization of gene set over-representation analysis, Bioinformatics, 36(9):2923-2925, 2020.

Go to GSOAP home page
IID - Integrated Interactions Database
IID is the first database providing tissue-specific protein-protein interactions (PPIs) for model organisms (yeast, worm, fly, rat, mouse) and human, providing access to 1,566,043 PPIs among 68,831 proteins. PPIs are annotated with up to 30 tissues per species.

Kotlyar M, Pastrello C, Ahmed Z, Chee J, Varyova Z, Jurisica I, IID 2021: Towards context-specific protein interaction analyses by increased coverage, enhanced annotation and enrichment analysis. Nucl Acids Res, 50(D1):D640-D647, 2022.

Kotlyar, M., Pastrello, C., Malik, Z., Jurisica, I., IID 2018 update: context-specific physical protein-protein interactions in human, model organisms and domesticated species. Nucl Acids Res, 47(D1):D581-D589, 2019.

Kotlyar M, Pastrello C, Sheahan N, Jurisica I. Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucl Acids Res, 44(D1):D536-41, 2016.

Kotlyar, M, Wong, SWH, Pastrello, C, Jurisica, I. Improving Analysis and Annotation of Microarray Data with Protein Interactions. Ed. G. Agapito. Microarray Data Analysis, Methods Mol Biol, vol. 2401: 51-68, 2022.

Pastrello C, Kotlyar M, Jurisica I., Informed Use of Protein-Protein Interaction Data: A Focus on the Integrated Interactions Database (IID). Methods Mol Biol., 2074:125-134, 2020.

Hauschild, A-C, Pastrello, C, Kotlyar, M and Jurisica, I. Protein-protein interaction data, their quality, and major public databases. Ed. N. Przulj. Analyzing Network Data in Biology and Medicine, An Interdisciplinary Textbook for Biological, Medical and Computational Scientists, Cambridge University Press, Cambridge, UK, pp.151-192, 2019.

Go to IID home page
FpClass - Data mining-based prediction of physical protein interactions
FpClass is an association mining algorithm that we used and validated for comprehensive, in silico prediction of physical protein interactions. FpClass is a reliable, validated method for data mining-based prediction of physical protein interactions, and provides 250,542 high confidence interactions among 10,529 human proteins, including 1,089 interactome orphans. Extensive computational and biological validation shows FpClass outperforms existing computational methods and most biological assays in sensitivity and specificity. Using three bioassays we tested 233 high and medium confidence predictions, and validated 137 interactions, including seven novel potential partners of the tumor suppressor p53. Importantly, we validated 5 of these p53 interactions with orphans by GST pull-down assay (5 of 6 tested -- validation rate of 83%). Overall, validation rates were 40% (2/5) for co-IP, 47% (14/30) for GST pull-down, and 61% (121/198) for MaMTH (Petschnigg et al., Nat Methods, 2014). The high validation rate for MaMTH suggests that FpClass could help guide high-throughput screening, in a combined computational-experimental approach to interactome mapping. This substantially extends our interactome work, including I2D (Brown, Jurisica, Genome Biol, 2007) and (Brown, Jurisica, Bioiformatics, 2005). NAViGaTOR (Brown et al., Bioinformatics, 2009) was used for network analysis and visualization.

Kotlyar M, Pastrello C, Pivetta F, Lo Sardo A, Cumbaa C, Li H, Naranian T, Niu Y, Ding Z, Vafaee F, Broackes-Carter F, Petschnigg J, Mills GB, Jurisicova A, Stagljar I, Maestro R, Jurisica I. In silico prediction of physical protein interactions and characterization of interactome orphans. Nat Methods.12(1):79-847, 2015.

Go to FpClass home page
NAViGaTOR-Network Analysis, Visualization, & Graphing TORonto
NAViGaTOR is a software package for scalable, interactive visual data mining - visualization and analysis of large, typed graphs. These networks could be protein-protein interaction networks, microRNA:gene or transcriptional regulatory networks, metabolic networks, or other graphs, such as transportationa networks, communication networks or even solar system. NAViGaTOR can query IID - online database of interaction data - as well as PSICQUIC, pathDIP, KEGG, Reactome and other data sources, as well as link annotation from Uniprot, GO, Pubmed, and display networks in 2D or 3D. To improve scalability and performance, NAViGaTOR combines Java with OpenGL to provide a 2D/3D visualization system on multiple hardware platforms. NAViGaTOR also provides analytical capabilities and supports standard import and export formats such as GO and the Proteomics Standards Initiative (PSI). In protein-protein interaction networks, nodes represent proteins, and edges between nodes represent physical interactions between the proteins.These visualizations can enable insights into the proteins that play key roles in diseases such as cancer.

Brown KR, Otasek D, Ali M, McGuffin MJ, Xie W, Devani B, Toch IL, Jurisica I. NAViGaTOR: Network Analysis, Visualization and Graphing Toronto. Bioinformatics. 25(24):3327-9, 2009.

Figure from: Benleulmi-Chaachoua A, Chen L, Sokolina K, Wong V, Jurisica I, Emerit MB, Darmon M, Espin A, Stagljar I, Tafelmeyer P, Zamponi GW, Delagrange P, Maurice P, Jockers R. Protein interactome mining defines melatonin MT1 receptors as integral component of presynaptic protein complexes of neurons. J Pineal Res. 60(1):95-108, 2016.

Hauschild, AC, Pastrello, C., Rossos, A., Jurisica, I. Visualization of Biomedical Networks, In: Ranganathan, S., Nakai, K., Schönbach C. and Gribskov, M. (eds.), Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 1016–1035. Oxford: Elsevier, 2018.

Go to NAViGaTOR home page
mirDIP - microRNA:target prediction Data Integration Portal
mirDIP is an on-line database that integrates thirty microRNA resources, providing nearly 152 million human microRNA:target predictions. We also introduce an integrative score, which was statistically inferred from the obtained predictions, and was assigned to each unique microRNA:target interaction to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not cumulate prediction bias towards biological processes or pathways.

Tokar, T., Pastrello, C., Rossos, A., Abovsky, M., Hauschild, A.C., Tsay, M., Lu, R., Jurisica, I. mirDIP 4.1 – Integrative database of human microRNA target predictions, Nucl Acids Res, 46(D1): D360-D370, 2018.

Shirdel EA, Xie W, Mak TW, Jurisica I. NAViGaTing the micronome - using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs. PLoS One. 6(2):e17429, 2011.

Go to mirDIP home page
CDIP - Cancer Data Integration Portal - Liver
CDIP is an on-line database of significantly deregulated genes in Liver, lung, ovarian, prostate and head&neck cancers. Work on pancreas cancer and sarcoma is ongoing.

Bhat, M. Elisa Pasini, Chiara Pastrello, Sarah Rahmati, Marc Angeli, Max Kotlyar, Anand Ghanekar and Igor Jurisica, Integrative Analysis of Layers of Data in Hepatocellular Carcinoma Reveals Pathway Dependencies, World J Hepatology, 13(1):94-108, 2021.

Bhat M, Pasini E, Pastrello C, Angeli M, Baciu C, Abovsky M, Coffee A, Adeyi O, Kotlyar M, Jurisica I., Estrogen Receptor 1 Inhibition of Wnt/β-catenin Signaling Contributes to Sex Differences in Hepatocarcinogenesis, Front Oncol. 11:777834, eCollection, 2021

Go to CDIPliver home page
OsteoDIP - Osteoarthritis Data Integration Portal
OsteoDIP is a database collecting and annotating osteoarthritis-specific omics data from human studies. The curation considered 1,204 papers and 28 data sets from GEO. The databese includes 1,924 patient samples.

Pastrello C, Abovsky M, Lu R, Ahmed Z, Kotlyar M, Veillette C, Jurisica I, Osteoarthritis Data Integration Portal (OsteoDIP): A web-based gene and non-coding RNA expression database, Osteoarthritis and Cartilage Open, 4(1): 100237, 2022.

Go to OsteoDIP home page
CMapBatch - A computational pipeline for drug repositioning
CMapBatch is computational pipeline that based on a set of disease signatures produces a list of drugs predicted to consistently reverse pathological gene changes. We have validated CMapBatch by conduct the largest and most systematic repurposing study on lung cancer transcriptomes, using 21 signatures. We show that scaling up transcriptional knowledge significantly increases the reproducibility of top drug hits, from 44% to 78%.

Fortney K, Griesman J, Kotlyar M, Pastrello C, Angeli M, Sound-Tsao M, Jurisica I. Prioritizing therapeutics for lung cancer: an integrative meta-analysis of cancer gene signatures and chemogenomic data. PLoS Comput Biol. 11(3):e1004068, 2015.

Go to CMapBatch home page
NetwoRx - A database for linking drugs to pathways and networks
NetwoRx stores pre-computed drug lists for KEGG pathways, GO categories, YEASTRACT transcription factor targets, and orthologs of human KEGG DISEASE groups. Users can interactively explore or download pathway-drug, pathwaypathway, and drug-drug networks, or submit a new gene set to NetwoRx and retrieve the drugs that target it.

Fortney K, Xie W, Kotlyar M, Griesman J, Kotseruba Y, Jurisica I. NetwoRx: connecting drugs to networks and phenotypes in Saccharomyces cerevisiae. Nucleic Acids Res. 41:D720-7, 2013

Go to NetwoRx home page
SCRIPDB - A Portal for Easy Access to Syntheses,Chemicals, and Reactions In Patents
SCRIPDB provides the full original patent text, reactions, and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction, and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure, or molecular similarity and the results may be restricted to patents describing synthetic routes.

Heifets A, Jurisica I. SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents. Nucleic Acids Res. 40:D428-33, 2012

Go to SCRIPDB home page
miRAnno - an integrative tool for pathway annotation of miRNAs
miRAnno uses comprehensive molecular interaction network and random walks with restart to measure the association between miRNAs and individual pathways. Independent validation shows that miRAnno achieves higher signal-to-noise ratio compared to the standard enrichment analysis.

Tomas Tokar, Chiara Pastrello, Mark Abovsky, Sara Rahmati, Igor Jurisica, miRAnno—network-based functional microRNA annotation, Bioinformatics, 38(2):592–593, 2022.

Go to miRAnno home page
BIP - Parsing and enrichment analysis of BioPAX pathways
BIP is an automatic and graphical software tool aimed at performing the parsing and accessing of BioPAX pathway data, along with pathway enrichment analysis by using information coming from pathways encoded in BioPAX.

Agapito G, Pastrello C, Guzzi PH, Jurisica I, Cannataro M. BioPAX-Parser: parsing and enrichment analysis of BioPAX pathways, Bioinformatics, 36(15):4377-4378, 2020.

Go to BIP home page
RQSA - Robust Quantitative Scratch Assay Analysis Tool
RQSA algorithm was created to help analyze results from the wound healing assay (or scratch assay) - a technique used to quantify the dependence of cell motility-a central process in tissue repair and evolution of disease-subject to various treatments conditions. RQSA implements statistical methods where migration rates are estimated, cellular behaviour is distinguished and outliers are identified among groups of unique experimental conditions. It decreased measurement errors and increased accuracy in the wound boundary at comparable processing times compared to previously developed methods.

Vargas A, Angeli M, Pastrello C, McQuaid R, Li H, Jurisicova A, Jurisica I. Robust quantitative scratch assay. Bioinformatics. 32(9):1439-40, 2016

Go to RQSA home page
RNSC - Restricted Neighborhood Search Clustering Algorithm
RNSC efficiently partitions networks into clusters using a cost function with the goal to identify and predict protein complexes.

King AD, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 20(17):3013-20, 2004.

Go to RNSC home page
SDREGION: fast spotting of changing communities in biological networks
We designed a novel algorithm, SDREGION, that identifies subgraphs that decrease in density monotonically over time, referred to as d-regions or increase in density monotonically over time, referred to as i-regions. We introduced the objective function, Î"density, for identifying d-(i-)regions. SDREGION is a generic algorithm, and we evaluated it by modeling of the progression of lung cancer. In particular, we observed that SDREGION identified d-(i-)regions that capture mechanisms supported by literature. Importantly, andditional findings may provide novel mechanisms in tumor progression that will guide future biological experiments. SDREGION is scalable with a time complexity of O(mlogn + nlogn) where m is the number of edges, and n is the number of vertices in a given dynamic graph.

Wong, S., Pastrello, C., Kotlyar, M., Faloutsos, C., Jurisica, I. SDREGION: Fast spotting of changing communies in biological networks. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 867-875, 2018.
Temp-O: Modeling tumor progression via the comparison of stage-specific graphs
We proposed the Temporal-Omics -Temp-O- workflow to model tumor progression in non-small cell lung cancer (NSCLC) using graph comparisons between multiple stage-specific graphs. We showed that temporal structures are meaningful in the tumor progression of NSCLC. While the Temp-O workflow is generic, we applied it to NSCLC expression data from tumor samples across disease stages to model lung cancer progression, creating stage-specific tumor graphs. Validating our findings in independent datasets showed that differences in temporal network structures capture diverse mechanisms in NSCLC. Furthermore, results showed that structures are consistent and potentially biologically important as we observed that genes with similar protein names were captured in the same cliques for all cliques in all datasets. Importantly, the identified temporal structures are meaningful in the tumor progression of NSCLC as they agree with the molecular mechanism in the tumor progression or carcinogenesis of NSCLC. In particular, the identified major histocompatibility complex of class II temporal structures capture mechanisms concerning carcinogenesis; the proteasome temporal structures capture mechanisms that are in early or late stages of lung cancer; the ribosomal cliques capture the role of ribosome biosynthesis in cancer development and sustainment. Further, on a large independent dataset we validated that temporal network structures identified proteins that are prognostic for overall survival in NSCLC adenocarcinoma.

Wong SWH, Pastrello C, Kotlyar M, Faloutsos C, Jurisica I. Modeling tumor progression via the comparison of stage-specific graphs. Methods. 2018 Jan 1;132:34-41. doi: 10.1016/j.ymeth.2017.06.033. Epub 2017 Jul 3. PubMed PMID: 28684340.
T-WPPDC: A Tree-based Approach to Motif Discovery and Sequence Classification
Tree-based Weighted-Position Pattern Discovery and Classification algorithm (T-WPPDC) supports both unsupervised pattern discovery and supervised sequence classification. It is a minimally parameterized algorithm for both pattern discovery and sequence classification that directly incorporates positional information. It identifies positionally enriched patterns using the Kullbackâ€" Leibler distance between foreground and background sequences at each position. This spatial information is used to discover positionally important patterns. T-WPPDC then uses a scoring function to discriminate different biological classes. We validated T-WPPDC by prediction of single nucleotide polymorphisms (SNPs) from flanking sequence. We evaluated 672 separate experiments on 120 datasets derived from multiple species. T-WPPDC outperformed other pattern discovery methods and was comparable to the supervised machine learning algorithms. The algorithm is computationally efficient and largely insensitive to dataset size. It allows arbitrary parameterization and is embarrassingly parallelizable.

Yan, R., Boutros, P.C., Jurisica, I. A tree-based approach for motif discovery and sequence classification. Bioinformatics. 27(15):2054-61, 2011.
GAP Portal: Integrative approach to predicting gene functional associations using using novel semantic similarity measure
GAP (Gene functional Association Predictor) is an integrative method for predicting and characterizing gene functional associations. It integrates different biological features using a novel taxonomy-based semantic similarity measure in predicting and prioritizing high-quality putative gene associations. The proposed similarity measure increases information gain from the available gene annotations. The annotation information is incorporated from several public pathway databases, Gene Ontology annotations as well as drug and disease associations from the scientific literature.

Vafaee F, Rosu D, Broackes-Carter F, Jurisica I. Novel semantic similarity measure improves an integrative approach to predicting gene functional associations. BMC Syst Biol. 7:22, 2013.

Go to GAP home page
BTSVQ-Binary tree structured vector quantization
BTSVQ is a computational tool to analyze and visualize microarray gene expression data. This technique merges the results of SOM (genes space), and partitive k-means (specimen space). The algorithm uses vector quantization and self-organizing capabilities of SOMs in finding significant gene centers in gene space (high dimensionality and large number of clusters), and the effectiveness of k-means in experiment space (medium dimensionality and low number of clusters).

Sultan M, Wigle DA, Cumbaa CA, Maziarz M, Glasgow J, Tsao MS, Jurisica I. Binary tree-structured vector quantization approach to clustering and visualizing microarray data. Bioinformatics. 18 Suppl 1:S111-9, 2002.

Go to BTSVQ home page