Journal Publications

Gortzak-Uzan, L., Ignatchenko, A., Evangelou, A., Agochiya, M., Brown, K., St.Onge, P., Kireeva, I., Schmitt-Ulms, G., Brown, T., Murphy, J., Rosen, B., Shaw, P., Jurisica, I., Kislinger, T. A proteome resource of ovarian cancer ascites: Integrated proteomic and bioinformatic analyses to identify putative biomarkers. Journal of Proteome Research. 7(1): 339-351, 2008.

Ghavidel, A., T. Kislinger, O. Pogoutse, R. Sopko, I. Jurisica, and A. Emili. Regulated tRNA export mediates the execution of G1 checkpoint in response to DNA damage. Cell, 131(5):915-26, 2007.

Kim S.S., Shago M., Kaustov L., Boutros P.C., Clendening J.W., Sheng Y., Trentin G.A., Barsyte-Lovejoy D., Mao D.Y., Kay R., Jurisica I., Arrowsmith C., Penn L.Z. CUL7 is a novel anti-apoptotic oncogene, Cancer Research, 67(20): 9616-9622, 2007.

Lau, S.K., P. C. Boutros, M. Pintilie, F. H. Blackhall, C.-Q. Zhu, D. Strumpf, M. R. Johnston, G. Darling, S. Keshavjee, T. K. Waddell, N. Liu, D. Lau, L. Z. Penn, F. A. Shepherd, I. Jurisica, S. D. Der, M.-S. Tsao. A three-gene prognostic classifier for early-stage non-small cell lung cancer. J Clinical Oncology, 25(35): 5562-5569, 2007.

Zhu, C.Q., S. Popova, E. R S Brown, D. Barsyte-Lovejoy, R. Navab, W. Shih, M. Li, M. Lu, I. Jurisica, L. Penn, D. Gullberg and M.-S. Tsao. Integrin α11 regulates IGF-2 expression in fibroblasts to enhance tumorigenicity of human non-small cell lung cancer cells, PNAS, 104(28): 11754-9, 2007.

Barrios-Rodiles, M., A. Viloria-Petit, K. R. Brown, I. Jurisica, and J. L. Wrana. High-throughput screening of protein interaction networks in the TGFβ interactome: understanding the signaling mechanisms driving tumor progression. Cancer Drug Discovery and Development: Transforming Growth Factor-β in Cancer Therapy, Vol 2: Cancer Treatment and Therapy. Edited by: Sonia B. Jakowlew, Humana Press Inc., Totowa, N.J., 2007.

Brown, K. and I. Jurisica. Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biology, 8(5), 2007.

Wu, C., Ma, M. H., Brown, K. R., Geisler, M., Li, L., Tzeng, E., Jia, C. Y., Jurisica, I., Li, S. S. Systematic identification of SH3 domain-mediated human protein-protein interactions by peptide array target screening. Proteomics. 7(11):1775-85, 2007.

Cox, B., T. Kislinger, D. A. Wigle, A. Kannan, K. Brown, T. Okubo, B. Hogan, I. Jurisica, B. Frey, J. Rossant and A. Emili. Integrated proteomic and transcriptomic profiling of mouse lung development and Nmyc target genes, Molecular Systems Biology, 3:109, 2007.

Wei-Lynn Wong, W., J. W. Clendening, A. Martirosyan, P. C. Boutros, C. Bros, F. Khosravi, I. Jurisica, K. Stewart, P. L. Bergsagel, and L. Z. Penn. Determinants of sensitivity to lovastatin-induced apoptosis in multiple myeloma, Molecular Cancer Therapeutics, 6(6):1886-97, 2007.

Bachtiary, B., P. Boutros, M. Pintilie, W. Shi, C. Bastianutto, J.-H. Li, J. Schwock, L. Z. Penn, I. Jurisica, A. Fyles, F.-F. Liu. Gene expression profiling in cervical cancer: an exploration of intra-tumor heterogeneity. Clinical Cancer Research, 12(19):5632-5640, 2006.

D. A. Wigle and I. Jurisica. Cancer as a system failure. Cancer Informatics. Systems Biology Special Issue editorial. 3(2):10-18, 2007.

A. Evangelou, L. Gortzak-Uzan, I. Jurisica and T. Kislinger. Mass spectrometry, proteomics, data mining and their applications in infectious disease research, Anti-Infective Agents in Medicinal Chemistry, 6(2):89-105, 2007.

Motamed-Khorasani, A., I. Jurisica, M. Letarte, P.A. Shaw, R.K. Parkes, X. Zhang, A. Evangelou, B. Rosen, K.J. Murphy, and T.J. Brown. Differentially Androgen-Modulated Genes in Ovarian Epithelial Cells from BRCA Mutation Carriers and Control Patients Predict Ovarian Cancer Survival and Disease Progression. Oncogene, 26(2):198-214, 2007.

    Epidemiological studies have implicated androgens in the etiology and progression of epithelial ovarian cancer. We previously reported that some androgen responses were dysregulated in malignant ovarian epithelial cells relative to control, non-malignant ovarian surface epithelial (OSE) cells. Moreover, dysregulated androgen responses were observed in OSE cells derived from patients with germline BRCA-1 or -2 mutations (OSEb), which account for the majority of familial ovarian cancer predisposition, and such altered responses may be involved in ovarian carcinogenesis or progression. In the present study, gene expression profiling using cDNA microarrays identified 17 genes differentially expressed in response to continuous androgen exposure in OSEb cells and ovarian cancer cells as compared to OSE cells derived from control patients. A subset of these differentially affected genes was selected and verified by quantitative real-time RT-PCR. Six of the gene products mapped to the OPHID protein-protein interaction database, and five were networked within two interacting partners. Basic leucine zipper transcription factor 2 (BACH2) and acetylcholinesterase (ACHE), which were up-regulated by androgen in OSEb cells relative to OSE cells, were further investigated using an ovarian cancer tissue microarray from a separate set of 149 clinical samples. Cytoplasmic ACHE and BACH2 immunostaining were both significantly increased in ovarian cancer relative to benign cases. High levels of cytoplasmic ACHE staining correlated with decreased survival, whereas nuclear BACH2 staining correlated with decreased time to disease recurrence. The finding that products of genes differentially responsive to androgen in OSEb cells may predict survival and disease progression supports a role for altered androgen effects in ovarian cancer. In addition to BACH2 and ACHE, this study highlights a set of potentially functionally related genes for further investigation in ovarian cancer.

Shi, W., C. Bastianutto, A. Li, B. Perez-Ordonez, R. Ng, K.-Y. Chow, W. Zhang, I. Jurisica, A. Bayley, J. Kim, B. O'Sullivan, L. Siu, E. Chen, F.-F. Liu. Multiple dysregulated pathways in nasopharyngeal carcinoma revealed by gene expression profiling, Int J Cancer, 119(10):2467-2475, 2006.

    Gene expression profiling was conducted using primary human nasopharyngeal carcinoma (NPC) biopsy samples to improve the understanding of the molecular pathways defining NPC and to identify novel potential therapeutic targets. RNA samples were extracted from 36 patients suspected to have NPC and hybridized onto the Affymetrix U133A chip. NPC was diagnosed in 19 patients, 11 had lymphoid hyperplasia (LH), and 6 were "normal" biopsies. Clinical stages for these NPC patients ranged from I-IV, including one M1. All NPC patients (except the M1) were treated with curative intent, which included radiotherapy alone (4 patients), or combined with chemotherapy (14 patients). Unsupervised clustering demonstrated a distinct NPC expression pattern, compared to normal biopsies. Subsequent Significance Analysis of Microarrays (SAM) derived from 14 NPC and 6 normal samples discovered 1089 differentially regulated genes. Pathway analyses revealed novel insights into the mechanisms leading to NPC, whereby up-regulation of NFκB2 and survivin play central roles in increasing resistance to apoptosis, and changes in integrin and WNT/β-catenin signaling lead to uncontrolled proliferation. The role of survivin in resisting apoptosis in NPC was confirmed by RNA interference. Our data provide novel insights into the development and progression of NPC, and suggest survivin as a novel therapeutic target for NPC.

Barsyte-Lovejoy, D., Lau, S.K., Boutros, P.C., Khosravi, F., Jurisica, I., Andrulis, I.L., Tsao, M.S., Penn, L.Z. The c-Myc oncogene directly induces H19 non-coding RNA by allele specific binding to potentiate tumorigenesis. Cancer Research, 66(10):1-8, 2006.

    The product of the MYC oncogene is widely deregulated in cancer and functions as a regulator of gene transcription. Despite an extensive profile of regulated genes, the transcriptional targets of c-Myc essential for transformation remain unclear. In this study we show that c-Myc significantly induces the expression of the H19 non-coding RNA in several diverse cell types including breast epithelial, glioblastoma and fibroblast cells. C-Myc binds to evolutionarily conserved E boxes in the imprinting control region to facilitate histone acetylation and transcriptional initiation of the H19 promoter. In addition, c-Myc downregulates the expression of IGF2, the reciprocally imprinted gene at the H19/IGF2 locus. Evidence shows that c-Myc regulates these two genes independently and does not affect the imprinting of H19. Indeed, allele-specific chromatin immunoprecipitation and expression analyses indicate that c-Myc binds and drives the expression of only the maternal H19 allele. The role of H19 in transformation is addressed using a knockdown approach and shows that downregulation of H19 significantly decreases breast and lung cancer cell clonogenicity and anchorage independent growth. In addition, c-Myc and H19 expression shows strong association in primary breast and lung carcinomas. This work indicates that c-Myc induction of the H19 gene product holds an important role in transformation.

Brierley, M., K. L. Marchington, I. Jurisica, E. N. Fish. The role of STAT2 in interferon inducible GAS-mediated gene transcription, FEBS J, 273(7):1569-1581, 2006.

    STAT2 is a critical component of interferon (IFN) signaling. To identify genes regulated by IFN-inducible STAT2-DNA binding, cDNA from IFN-treated cells expressing intact STAT2 or a DNA-binding mutant STAT2 were analyzed by Affymetrix microarrays. IFN-inducible expression of genes regulated by IFN-stimulated gene factor 3 (ISGF3), wherein STAT2 functions as a transactivator (2'-5' OAS, Mx, ISG15, 9-27, MHC-I), is similar in both cell types. Nineteen genes were identified whose expression was higher in IFN-treated cells expressing intact STAT2 compared with cells expressing the mutant STAT2. Using quantitative PCR, we confirmed that ISGF3-dependent gene transcription is unaffected in cells expressing mutant STAT2 but that a subset of IFN-inducible genes is differentially regulated in these cells: CLDN4, BF, DGFK, MSR1 and TLR3, containing γ-activated sequence (GAS)-like elements in their 5' flanking sequences. Our data indicate that the DNA binding domain of STAT2 is required for full IFN-inducible activation of (GAS)-regulated target genes.

Kislinger, T. and I. Jurisica. Proteomics and bioinformatics in biomedical research. Cancer Genomics and Proteomics, 3(1):11-28, 2006.

    Proteomics, the science of globally detecting proteins in cells, tissues or organisms under defined conditions, has benefited greatly from recent developments in mass spectrometry (MS). It is now possible to detect hundreds to thousands of proteins with high confidence in a single experiment. In this review, we summarize the basic MS technologies currently used by laboratories around the world to identify proteins in complex biological samples. We further provide the reader with a short overview of useful separation strategies to minimize the initial complexity of biological samples, and the multitude of bioinformatics tools essential to manage large-scale proteomics data to obtain meaningful biological insight. Finally, we summarize recent advances in three main areas of medical proteomics: proteomics in cancer research, proteomics of the heart, and proteomics in diabetes research.

Przulj, N., D. G. Corneil, I. Jurisica. Efficient estimation of graphlet frequency distributions in protein-protein interaction networks. Bioinformatics, 22(8):974-980, 2006. Advance Access published on February 1, 2006; doi:10.1093/bioinformatics/btl030

    Algorithmic and modeling advances in the area of protein-protein interaction (PPI) network analysis could contribute to the understanding of biological processes. Local structure of networks can be measured by the frequency distribution of graphlets, small connected non-isomorphic induced subgraphs. This measure of local structure has been used to show that high-confidence PPI networks have local structure of geometric random graphs. Finding graphlets exhaustively in a large network is computationally intensive. More complete PPI networks, as well as PPI networks of higher organisms, will thus require efficient heuristic approaches.

    We propose two efficient and scalable heuristics for finding graphlets in high-confidence PPI networks. We show that both PPI networks and the geometric random networks that model them have defined boundaries that are sparser than the "inner parts" of the networks. In addition, these networks exhibit "uniformity" of local structure inside the networks. Our first heuristic exploits these two structural properties of PPI and geometric random networks to find good estimates of graphlet frequency distributions in these networks up to 690 times faster than the exhaustive searches. Our second heuristic is a variant of a more standard sampling technique and it produces accurate approximate results up to 377 times faster than the exhaustive searches. We indicate how the combination of these approaches may result in an even better heuristic.
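
The contrast between exhaustive graphlet counting and a sampling-based estimate can be illustrated with a minimal Python sketch. It is restricted to 3-node graphlets (paths and triangles) and uses plain uniform sampling of node triples; it is not either of the heuristics described in the abstract, and all function names are hypothetical.

```python
import itertools
import random
from math import comb


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def classify_triple(adj, a, b, c):
    """Return 'triangle', 'path', or None for the induced subgraph on {a, b, c}."""
    edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
    if edges == 3:
        return "triangle"
    if edges == 2:
        return "path"  # two edges on three nodes always form a connected path
    return None  # fewer than two edges: disconnected, so not a graphlet


def count_exhaustive(adj):
    """Count 3-node graphlets by visiting every node triple (cubic in the node count)."""
    counts = {"triangle": 0, "path": 0}
    for a, b, c in itertools.combinations(sorted(adj), 3):
        kind = classify_triple(adj, a, b, c)
        if kind:
            counts[kind] += 1
    return counts


def estimate_by_sampling(adj, samples, seed=0):
    """Estimate the same counts from `samples` uniformly drawn node triples."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    hits = {"triangle": 0, "path": 0}
    for _ in range(samples):
        a, b, c = rng.sample(nodes, 3)
        kind = classify_triple(adj, a, b, c)
        if kind:
            hits[kind] += 1
    # Scale the hit rate up to the total number of possible triples.
    scale = comb(len(nodes), 3) / samples
    return {kind: n * scale for kind, n in hits.items()}
```

Uniform triple sampling is wasteful on sparse networks because most triples are disconnected, which is one reason more structure-aware heuristics are attractive there.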

Kotlyar, M. and I. Jurisica. Predicting protein-protein interactions by association mining. Information Systems Frontiers, 8: 37-47, 2006.

    Identifying protein-protein interactions is a key problem in molecular biology. Currently, interactions cannot be reliably predicted on a proteome-wide scale but direct and indirect evidence for interactions is increasingly available from high-throughput interaction detection methods, gene expression microarrays, and protein annotation projects. In this paper we propose an association mining approach to integrating these diverse types of evidence. We apply this approach to a number of datasets consisting of interacting and non-interacting protein pairs annotated with different types of evidence. We identify patterns that distinguish interacting and non-interacting protein pairs, and use these patterns to assign a confidence level to proposed interactions.
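
As a rough illustration of the idea of scoring evidence patterns, the following Python sketch computes the support and confidence of small evidence combinations over labelled protein pairs. It is a simplified stand-in, not the method of the paper; the evidence labels in the usage note and the function name `mine_rules` are hypothetical.

```python
from itertools import combinations


def mine_rules(pairs, min_support=2, max_size=2):
    """Score evidence patterns by how often they co-occur with interaction.

    pairs: list of (evidence_items, interacts) where evidence_items is a set
    of strings and interacts is a bool.
    Returns a list of (pattern, support, confidence) tuples, where support is
    the number of pairs containing the pattern and confidence is the fraction
    of those pairs that interact.
    """
    items = sorted(set().union(*(ev for ev, _ in pairs)))
    rules = []
    for size in range(1, max_size + 1):
        for pattern in combinations(items, size):
            wanted = set(pattern)
            matched = [label for ev, label in pairs if wanted <= ev]
            if len(matched) < min_support:
                continue  # pattern is too rare to score reliably
            confidence = sum(matched) / len(matched)
            rules.append((pattern, len(matched), confidence))
    return rules
```

For example, with toy data such as `[({"coexpr", "same_go"}, True), ({"coexpr"}, False)]` and `min_support=1`, each surviving pattern's confidence could serve as a crude confidence level for a proposed interaction that matches it.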

Seiden-Long, I. M., K. Brown, W. Shih, D. A. Wigle, N. Radulovich, I. Jurisica, M.-S. Tsao. Transcriptional targets of Hepatocyte Growth Factor signaling and Ki-ras oncogene activation in colorectal cancer, Oncogene, 25(1): 91-102, 2006.

    Both Ki-ras mutation and Hepatocyte Growth Factor (HGF) receptor Met overexpression occur at high frequency in colon cancer. This study investigated the transcriptional changes induced by Ki-ras oncogene and HGF-Met signaling activation in colon cancer cell lines in vitro and in vivo. The microarray global transcriptional profiling data demonstrate that changes induced by Met receptor activation overlap with those induced by Ki-ras oncogene. However, in the presence of Ki-ras mutation, the magnitude of transcriptional alterations in response to HGF-Met signaling in vitro and in vivo was attenuated. Overlapping genes between in vitro and in vivo microarray datasets were selected as a subset of HGF/Met and Ki-ras oncogene regulated targets, and were investigated further for validation. Using the Online Predicted Human Interaction Database (OPHID), we identified novel Met and Ki-ras regulated proteins and other functionally linked targets. The novel proteins comprised histone acetyltransferase 1 (HAT1), phosphoribosyl pyrophosphate synthetase 2 (PRPS2), chaperonin containing TCP1, subunit 8 (CCT8), CSE1 chromosome segregation 1-like (yeast)/cellular apoptosis susceptibility (mammals) (CSE1L/CAS) and Cyclin H. The results demonstrate a strategy that may reveal novel pathways or mechanisms by which HGF/Met and Ki-ras oncogene signaling affects the biology of colon cancer cells.

Heisler L.E., Torti D., Boutros P.C., Watson J., Chan C., Winegarden N., Takahashi M., Yau P., Huang T.H., Farnham P.J., Jurisica I., Woodgett J.R., Bremner R., Penn L.Z., Der S.D. CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome. Nucl. Acids Res. 33(9):2952-2961, 2005.

    An effective tool for the global analysis of both DNA methylation status and protein-chromatin interactions is a microarray constructed with sequences containing regulatory elements. One type of array suited for this purpose takes advantage of the strong association between CpG Islands (CGIs) and gene regulatory regions. We have obtained 20,736 clones from a CGI Library and used these to construct CGI arrays. The utility of this library requires proper annotation and assessment of the clones, including CpG content, genomic origin and proximity to neighboring genes. Alignment of clone sequences to the human genome (UCSC hg17) identified 9595 distinct genomic loci; 64% were defined by a single clone while the remaining 36% were represented by multiple, redundant clones. Approximately 68% of the loci were located near a transcription start site. The distribution of these loci covered all 23 chromosomes, with 63% overlapping a bioinformatically identified CGI. The high representation of genomic CGI in this rich collection of clones supports the utilization of microarrays produced with this library for the study of global epigenetic mechanisms and protein-chromatin interactions. A browsable database is available on-line to facilitate exploration of the CGIs in this library and their association with annotated genes or promoter elements.

M. Trus, R. L. Yang, F. Suarez-Saiz, L. Bordeleau, I. Jurisica and M.D. Minden. The histone deacetylase inhibitor valproic acid alters sensitivity towards all trans retinoic acid in acute myeloblastic leukemia cells, Leukemia, 19(7):1161-1168, 2005.

    Acute myeloblastic leukemia (AML) may be classified in a number of ways. Using the French American British classification, the M3 form of the disease or acute promyelocytic leukemia (APL) has been found to be sensitive in vitro and in vivo to the retinoid all trans retinoic acid (ATRA). The mechanism for this is by restoration of normal gene expression through the release of histone deacetylase complexes (HDACs). In contrast to APL, other forms of AML are either nonresponsive or show blunted responses to ATRA. We evaluated if the inhibitor of HDAC activity, valproic acid (VPA), could mimic or enhance retinoid sensitivity in the AML cell line, OCI/AML-2, and clinical samples derived from patients with AML. An Affymetrix GeneChip experiment demonstrated that VPA modulated the expression of numerous genes in OCI/AML-2 cells that were not affected by ATRA including p21, a retinoid responsive gene in APL. VPA induced p21 expression in OCI/AML-2 cells and the majority of the AML samples tested; this was associated with cell cycle arrest and apoptosis not seen with ATRA alone. The addition of ATRA to VPA accentuated many of these responses, supporting the potential beneficial combination of these drugs in the treatment of AML. Leukemia advance online publication, 5 May 2005; doi:10.1038/sj.leu.2403773.

Soleymanlou, N., Jurisica, I., Nevo, O., Ietta, F., Zhang, X., Zamudio, S., Post, M. and Caniggia, I. Molecular evidence of placental hypoxia in preeclampsia. J Clin Endocr Metab, 90(7):4299-308, 2005.

    Oxygen plays a central role in human placental pathologies including preeclampsia, a leading cause of fetal and maternal death and morbidity. Insufficient utero-placental oxygenation in preeclampsia is believed to be responsible for the molecular events leading to the clinical manifestations of this disease. Using high-throughput functional genomics, we determined the global gene expression profiles of placentae from high altitude pregnancies, a natural in vivo model of chronic hypoxia, as well as that of first trimester explants under 3% and 20% oxygen, an in vitro organ culture model. We next compared the genomic profile from these two models to that obtained from pregnancies complicated by preeclampsia. Microarray data was analyzed using the Binary Tree-Structured Vector Quantization (BTSVQ) algorithm, which is capable of generating global gene expression maps. Our data highlight a striking global gene expression similarity between 3% O2-treated explants, high altitude placentae and importantly placentae from preeclamptic pregnancies. We demonstrate herein the utility of explant culture and high altitude placenta as biologically-relevant and powerful models for studying the oxygen-mediated events in preeclampsia. Our results provide the first molecular evidence that aberrant global placental gene expression changes in preeclampsia are due to reduced oxygenation and that these events can successfully be mimicked in vivo and in vitro models of placental hypoxia.

Barrios-Rodiles, M., K. R. Brown, B. Ozdamar, R. Bose, Z. Liu, R. S. Donovan, F. Shinjo, Y. Liu, J. Dembowy, I. W. Taylor, V. Luga, N. Przulj, M. Robinson, H. Suzuki, Y. Hayashizaki, I. Jurisica, and J. L. Wrana. High-Throughput Mapping of a Dynamic Signaling Network in Mammalian Cells, Science 307(5715): 1621-1625, 2005.

    Signaling pathways transmit information through protein interaction networks that are dynamically regulated by complex extracellular cues. We developed LUMIER (for luminescence-based mammalian interactome mapping), an automated high-throughput technology, to map protein-protein interaction networks systematically in mammalian cells and applied it to the transforming growth factor-β (TGFβ) pathway. Analysis using self-organizing maps and k-means clustering identified links of the TGFβ pathway to the p21-activated kinase (PAK) network, to the polarity complex, and to Occludin, a structural component of tight junctions. We show that Occludin regulates TGFβ type I receptor localization for efficient TGFβ-dependent dissolution of tight junctions during epithelial-to-mesenchymal transitions.

Arshadi, N. and I. Jurisica. Integrating case-based reasoning systems with data mining techniques for discovering and using disease biomarkers. IEEE Transactions on Knowledge and Data Engineering. Special Issue-Mining Biological Data. 17(8): 1127-1137, 2005. e-pub June 17, 2005.

    Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain, and the number and the complexity of the rules affecting the problem are too large for formal knowledge representation. To extend the capabilities of CBR, we propose mixture of experts for case-based reasoning (MOE4CBR), a method that combines an ensemble of CBR classifiers with spectral clustering and Logistic Regression. Our approach not only achieves higher prediction accuracy, but also leads to the selection of a subset of features that have meaningful relationships with their class labels.

    We evaluate MOE4CBR by applying the method to a CBR system called TA3 -- a computational framework for CBR systems. For two mass spectrometry data sets, the prediction accuracy improves from 80% to 93% and from 90% to 98.4%, respectively. We also apply the method to leukemia and lung microarray data sets with prediction accuracy improving from 65% to 74% and from 60% to 70%, respectively. Finally, we compare our list of discovered biomarkers with the lists of selected biomarkers from other studies for the mass spectrometry data sets.

Cumbaa, C. A. and I. Jurisica. Automatic classification and pattern discovery in high-throughput protein crystallization trials, Journal of Structural and Functional Genomics, 6(2-3):195-202, 2005.

    Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase, and has a potential to increase process quality.

    Automated image classification helps to increase throughput and consistently generate objective results. Although the classification accuracy can always be improved, our image analysis system can classify images from 1536-well plates with high classification accuracy (85%) and ROC score (0.87), as evaluated on 127 human-classified protein screens containing 5600 crystal images and 189472 non-crystal images.

    Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react in a broad range of conditions, and clusters cocktails to reflect their potential to achieve crystallization. These results may lead to crystallization screen optimization, and reveal associations between protein properties and crystallization conditions. We also postulate that past experience may lead us to the identification of initial conditions favorable to crystallization for novel proteins.

Brown, K. and I. Jurisica. Online Predicted Human Interaction Database OPHID, Bioinformatics, 21(9):2076-2082, 2005. Advance Access published on January 18, 2005. doi:10.1093/bioinformatics/bti273.

    Motivation: High-throughput experiments are being performed at an ever-increasing rate to systematically elucidate protein-protein interaction (PPI) networks for model organisms, while complexities of higher eukaryotes have prevented these experiments for humans.

    Results: The Online Predicted Human Interaction Database (OPHID) is a web-based database of predicted interactions between human proteins. It combines the literature-derived human PPI from BIND, HPRD and MINT, with predictions made from S. cerevisiae, C. elegans, D. melanogaster, and M. musculus. The 23,889 predicted interactions currently listed in OPHID are evaluated using protein domains, gene co-expression and Gene Ontology terms. OPHID can be queried using single or multiple IDs, and results can be visualized using our custom graph visualization program.

    Availability: Freely available to academic users at http://ophid.utoronto.ca, both in tab-delimited and PSI-MI formats. Commercial users, please contact I.J.

Blackhall FH, Pintilie M, Wigle DA, Jurisica I, Liu N, Radulovich N, Johnston MR, Keshavjee S, Tsao MS. Stability and heterogeneity of expression profiles in lung cancer specimens harvested following surgical resection. Neoplasia, 6(6):761-767, 2004.

    One of the major concerns in microarray profiling studies of clinical samples is the effect of tissue sampling and RNA extraction on data. We analyzed gene expression in lung cancer specimens that were serially harvested from tumor mass and snap-frozen at several intervals up to 120 minutes after surgical resection. Global gene expression was profiled on cDNA microarrays, and selected stress and hypoxia-activated genes were evaluated using real-time reverse transcription polymerase chain reaction (RT-PCR). Remarkably, similar gene expression profiles were obtained for the majority of samples regardless of the time that had elapsed between resection and freezing. Real-time RT-PCR studies showed significant heterogeneity in the expression levels of stress and hypoxia-activated genes in samples obtained from different areas of a tumor specimen at one time point after resection. The variations between multiple samplings were significantly greater than those of elapsed time between sampling/freezing. Overall samples snap-frozen within 30 to 60 minutes of surgical resection are acceptable for gene expression studies, thus making sampling and snap-freezing of tumor samples in a routine surgical pathology laboratory setting feasible. However, sampling and pooling from multiple sites of each tumor may be necessary for expression profiling studies to overcome the molecular heterogeneity present in tumor specimens.

Przulj, N., Corneil, D., Jurisica, I. Modeling interactome: Scale-free or geometric? Bioinformatics, 20(18):3508-3515, 2004. Advance Access published on July 29, 2004; doi:10.1093/bioinformatics/bth436.

    Motivation: Networks have been used to model many real-world phenomena to better understand the phenomena and to guide experiments in order to predict their behavior. Since incorrect models lead to incorrect predictions, it is vital to have an improved model. As a result, new techniques and models for analyzing and modeling real-world networks have recently been introduced.

    Results: One example of large and complex networks involves protein-protein interaction (PPI) networks. We analyze PPI networks of yeast S. cerevisiae and fruitfly D. melanogaster using a newly introduced measure of local network structure as well as the standardly used measures of global network structure. We examine the fit of four different network models, including Erdős-Rényi, scale-free, and geometric random network models, to these PPI networks with respect to the measures of local and global network structure. We demonstrate that the currently popular scale-free model of PPI networks fails to fit the data in several respects and show that a random geometric model provides a much more accurate model of the PPI data. We hypothesize that only the noise in these networks is scale-free.

    Conclusions: We systematically evaluate how well different network models fit the PPI networks. We show that the structure of PPI networks is better modeled by a geometric random graph than by a scale-free model.
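
To make two of the network models named above concrete, here is a minimal Python sketch that generates a geometric random graph and an Erdős-Rényi graph and computes one simple structural measure, the average clustering coefficient. The choice of a 2-D unit square, the fixed-edge-count G(n, m) variant, and all function names are assumptions for illustration; the paper's own measures of local and global structure are not reproduced here.

```python
import itertools
import math
import random


def geometric_random_graph(n, radius, seed=0):
    """Place n points uniformly in the unit square; link pairs within `radius`."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(pts[i], pts[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def erdos_renyi_graph(n, m, seed=0):
    """Pick exactly m distinct node pairs uniformly at random (the G(n, m) model)."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    adj = {i: set() for i in range(n)}
    for i, j in rng.sample(pairs, m):
        adj[i].add(j)
        adj[j].add(i)
    return adj


def edge_count(adj):
    """Number of undirected edges (each edge appears in two neighbour sets)."""
    return sum(len(nb) for nb in adj.values()) // 2


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in itertools.combinations(nb, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

One way to use this is to build a geometric graph, read off its edge count with `edge_count`, generate an Erdős-Rényi graph with the same `n` and `m`, and compare `average_clustering` on the two; geometric graphs generally show much higher clustering at matched density.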

King, A. D., N. Przulj, Jurisica, I. Protein complex prediction via cost-based clustering. Bioinformatics, 20(17):3013-3020, 2004. Advance Access published on June 4, 2004; doi:10.1093/bioinformatics/bth351.

    Motivation: When studying the workings of a biological cell, it is useful to be able to detect known and predict still undiscovered protein complexes within the cell's protein-protein interaction (PPI) network. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates a fast, accurate approach to protein complex identification.

    Results: We have developed the Restricted Neighbourhood Search Clustering Algorithm (RNSC) to efficiently partition networks into clusters using a cost function. We applied this cost-based clustering algorithm to PPI networks of S. cerevisiae, D. melanogaster, and C. elegans to identify and predict protein complexes. We also investigated functional and graph-theoretical properties of known complexes in the MIPS database, and by filtering clusters based on these properties, we attained a high matching rate between filtered clusters and true protein complexes.

    Conclusions: Our application of the cost-based clustering algorithm provides a scalable, accurate, and efficient method of detecting and predicting protein complexes within a PPI network.
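
The general idea of partitioning a network by minimizing a cost function can be sketched in Python as follows. This is a deliberately simple greedy local search, not the RNSC algorithm itself: the cost used here (edges between clusters plus missing edges inside clusters) and the move rule are assumptions chosen for illustration, and all names are hypothetical.

```python
import itertools


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_cost(adj, label):
    """Count node pairs that are linked across clusters or unlinked within one."""
    cost = 0
    for u, v in itertools.combinations(sorted(adj), 2):
        same_cluster = label[u] == label[v]
        linked = v in adj[u]
        if linked != same_cluster:
            cost += 1
    return cost


def greedy_cost_clustering(adj, max_rounds=100):
    """Move one node at a time to whichever cluster lowers the cost the most."""
    label = {v: i for i, v in enumerate(sorted(adj))}  # start from singletons
    best = clustering_cost(adj, label)
    for _ in range(max_rounds):
        improved = False
        for v in sorted(adj):
            original = label[v]
            best_label = original
            fresh = max(label.values()) + 1  # option: split v into a new cluster
            for cand in sorted(set(label.values()) | {fresh}):
                label[v] = cand
                cost = clustering_cost(adj, label)
                if cost < best:
                    best, best_label = cost, cand
            label[v] = best_label
            if best_label != original:
                improved = True
        if not improved:
            break  # local minimum: no single-node move helps
    return label, best
```

Recomputing the full cost for every candidate move keeps the sketch short but is slow; a practical implementation would update the cost incrementally and add a mechanism for escaping local minima.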

Jiang Liu, Fiona Blackhall, Isolde Seiden-Long, Igor Jurisica, Roya Navab, Ni Liu, Nikolina Radulovich, Dennis Wigle, Mujahid Sultan, Jim Hu, Ming-Sound Tsao, and Michael R. Johnston. Modeling of lung cancer by an orthotopically growing H460SM variant cell line reveals novel candidate genes for systemic metastasis, Oncogene, 23(37): 6316-6324, 2004.

    Endobronchial implantation of NCI-H460 cells into the nude rat generates a primary lung tumor with mediastinal lymph node spread, but rarely systemic metastases. We isolated tumor cells from mediastinal nodes, orthotopically reimplanted the cells into nude rats and repeated this four times to derive a cell line, designated H460SM, that spontaneously metastasizes to bone, kidney, brain, soft tissue and contralateral lung. H460SM cells demonstrated higher invasive activity in vitro than parental NCI-H460 cells. Spectral karyotyping revealed a new inversion within 17q and loss of an extra normal copy of chromosome 14 present in parental NCI-H460 cells. Expression profiling of orthotopic primary tumors revealed differential expression of 360 genes. Of these, 173 were represented in the probe set of a 19.2K OCI cDNA microarray previously used to profile the gene expression of surgically resected lung cancer specimens. We have computationally validated clinical importance of these genes by using in silico analysis of 18 cases of pulmonary adenocarcinoma, which were split into two patient groups with markedly different clinical outcome. The model identifies additional novel candidate genes for the progression of lung cancer to systemic metastases and poor prognosis.

Fiona H. Blackhall, Dennis Wigle, Igor Jurisica, Melania Pintilie, Ni Liu, Gail Darling, Michael R. Johnston, Shaf Keshavjee, Thomas Waddell, Frances A. Shepherd and Ming-Sound Tsao. Validating the prognostic value of marker genes derived from a non-small cell lung cancer microarray study. Lung Cancer, 46(2): 197-204, 2004.

We previously reported that our cDNA microarray analysis of primary non-small cell lung carcinoma (NSCLC) could identify patients at increased risk of cancer recurrence. From the result of this analysis, we selected 11 genes that were considered candidate prognostic marker genes and used the real-time reverse transcription polymerase chain reaction (RT-PCR) to investigate their expression in the same set of NSCLC cases used in the microarray study. Cluster analysis of the real-time RT-PCR data separated these patients into two groups with significantly different disease-free survivals (log-rank test, [Formula: see text]). In contrast, cluster analysis failed to confirm the prognostic significance of the real-time RT-PCR results for these 11 genes in a validation series of 92 NSCLC cases. In univariate analysis, hypoxia inducible factor 1alpha, Rho-GDP dissociation inhibitor (GDI) alpha (RhoGDI) and Citron/rho-interacting serine-threonine kinase 21 (Citron K21) were significant prognostic factors for disease-free survival in the entire cohort of 130 NSCLC patients, but none were significant in multivariate analysis. The results demonstrate that the prognostic significance of microarray (SAM) results can be partially validated using real-time RT-PCR, but secondary validation using larger and independent series of tumors is necessary to identify true prognostic marker genes.

Dennis A. Wigle, Ming Tsao, Igor Jurisica. Making sense of lung cancer gene expression profiles, Genome Biology, 5, 309.1-309.3, 2004.

Giles C. Warner, Patricia P. Reis, Igor Jurisica, Mujahid Sultan, Christina Macmillan, Nigel Beasley, Antti A. Makitie, Shilpi Arora, Mahadeo Sukhai, Reidar Grénman, Richard A. Wells, Dale Brown, Ralph Gilbert, Patrick Gullane, Jonathan Irish, Suzanne Kamel-Reid. Molecular classification of oral cancer by cDNA microarrays identifies overexpressed genes correlated with nodal metastasis. International Journal of Cancer, 110:857-868, 2004.

Our purpose was to classify OSCCs based on their gene expression profiles, to identify differentially expressed genes in these cancers and to correlate genetic deregulation with clinical and histopathologic data and patient outcome. After conducting proof-of-principle experiments utilizing 6 HNSCC cell lines, the gene expression profiles of 20 OSCCs were determined using cDNA microarrays containing 19,200 sequences and the BTSVQ method of data analysis. We identified 2 sample clusters that correlated with the T3-T4 category of disease (p = 0.035) and nodal metastasis (p = 0.035). BTSVQ analysis identified a subset of 23 differentially expressed genes with the lowest QE scores in the cluster containing more advanced-stage tumors. Expression of 6 of these differentially expressed genes was validated by quantitative real-time RT-PCR. Statistical analysis of quantitative real-time RT-PCR data was performed and, after Bonferroni correction, CLDN1 overexpression was significantly correlated with the cluster containing more advanced-stage tumors (p = 0.007). Despite the clinical heterogeneity of OSCC, molecular subtyping by cDNA microarray analysis identified distinct patterns of gene expression associated with relevant clinical parameters. Application of this methodology represents an advance in the classification of oral cavity tumors and may ultimately aid in the development of more tailored therapies for oral carcinoma.

Acton, B.M., Jurisicova, A., Jurisica, I. and Casper, R.F. Alterations in mitochondrial membrane potential during preimplantation stages of murine and human embryo development. Molecular Human Reproduction, 10(1):23-32, 2004.

    Mitochondria are cellular organelles regulating metabolism and cell death pathways. This study examined changes in mitochondrial membrane potential (ΔΨm) throughout the stages of preimplantation development in murine embryos conceived either in vivo or in vitro and human embryos donated to research from IVF. Embryos stained with the ΔΨm-sensitive dye (JC-1) were quantified for the ratio of highly to lowly polarized mitochondria using a deconvolution microscope. Overall, murine zygotes and early embryos contain a subset of highly polarized mitochondria with a progressive increase in the ratio of highly to lowly polarized mitochondria observed with increasing cleavage. A transient increase in the ratio of high to low ΔΨm was observed in in vivo fertilized two-cell stage embryos, coincident with embryonic genome activation in the mouse, but not in two-cell embryos obtained through IVF. We further observed that arrested murine two-cell embryos possessed an increased ratio of highly to lowly polarized mitochondria compared to non-arrested embryos. In human eight-cell embryos we observed an increased ratio of highly to lowly polarized mitochondria with increasing degrees of embryo fragmentation. We concluded that the pattern of ΔΨm progressively changes throughout preimplantation development, and that an aberrant shift in ΔΨm could contribute to or is associated with embryo abnormalities.

Przulj, N., Wigle, D., Jurisica, I. Functional topology in a network of protein interactions. Bioinformatics, 20(3):340-348, 2004.

    The building blocks of biological networks are individual protein-protein interactions (PPI). The cumulative PPI dataset in S. cerevisiae now exceeds 78,000. Studying the network of these interactions will provide valuable insight into the inner workings of cells.

    Results: We performed a systematic graph theory based analysis of this PPI network to construct computational models for describing and predicting the properties of lethal mutations and proteins participating in genetic interactions, functional groups, protein complexes, and signaling pathways. Our analysis suggests that lethal mutations are not only highly connected within the network, but they also satisfy an additional property: their removal causes a disruption in network structure. We also provide evidence for the existence of alternate paths that bypass viable proteins in PPI networks, while such paths do not exist for lethal mutations. In addition, we show that distinct functional classes of proteins have differing network properties. We also demonstrate a way to extract and iteratively predict protein complexes and signaling pathways. We evaluate the power of predictions by comparing them to a random model, and assess the accuracy of predictions by analyzing their overlap with the MIPS database.

    Conclusions: Our models provide a means for understanding the complex wiring underlying cellular function, and enable us to predict essentiality, genetic interaction, function, protein complexes and cellular pathways. This analysis uncovers structure-function relationships observable in a large PPI network.

Cumbaa, C., Lauricella, A., Fehrman, N., Veatch, C., Collins, R., Luft, J., DeTitta, G., Jurisica, I. Automatic classification of sub-microlitre protein crystallization trials in 1536-well plates, Acta Crystallographica Section D-Biological Crystallography D59(9):1619-1627, 2003.

    A technique for automatically evaluating microbatch (400 nL) protein crystallization trials is described. This method addresses analysis problems introduced at the sub-microlitre scale, including non-uniform lighting and irregular droplet boundaries. The droplet is segmented from the well using a loopy probabilistic graphical model with a two-layered grid topology. A vector of 23 features is extracted from the droplet image using the Radon transform for straight-edge features and a bank of correlation filters for microcrystalline features. Image classification is achieved by linear discriminant analysis of its feature vector. The results of the automatic method are compared to those of a human expert on 32 1536-well plates. Using the human-labeled images as ground truth, this method classifies images with 85% accuracy and a ROC score of 0.84. This result compares well with the experimental repeatability rate assessed at 87%. Images falsely classified as crystal-positive variously contain speckled precipitate resembling microcrystals, skin effects, or genuine crystals falsely labeled by the human expert. Many images falsely classified as crystal-negative variously contain very fine crystal features or dendrites lacking straight edges. A characterization of these misclassifications suggests directions for improving the method.

Breitkreutz, A., Boucher, L., Breitkreutz, B.J., Sultan, M., Jurisica, I., Tyers, M. Phenotypic and transcriptional plasticity directed by a yeast MAPK network. Genetics, 165(3):997-1015, 2003.

    The yeast pheromone/filamentous growth MAPK pathway mediates both mating and invasive-growth responses. The interface between this MAPK module and the transcriptional machinery consists of a network of two MAPKs, Fus3 and Kss1, two regulators, Rst1 and Rst2 (a.k.a. Dig1 and Dig2) and two transcription factors, Ste12 and Tec1. Of sixteen possible combinations of gene deletions in FUS3, KSS1, RST1, and RST2 in the Σ1278 background, ten exhibited constitutive invasive growth. Rst1 was the primary negative regulator of invasive growth, while other components either attenuated or enhanced invasive growth, depending on the genetic context. Despite activation of the invasive response by lesions at the same level in the MAPK pathway, transcriptional profiles of different invasive mutant combinations did not exhibit a unified program of gene expression. The distal MAPK regulatory network is thus capable of generating phenotypically similar invasive-growth states (an attractor) from different molecular architectures (trajectories) that can functionally compensate for one another. This systems-level robustness may also account for the observed diversity of signals that trigger invasive growth.

Janice Glasgow, Igor Jurisica and Burkhard Rost. Introduction to Special Issue on AI and Bioinformatics, Artificial Intelligence Magazine, 25(1):7-8, 2004.

I. Jurisica and J. Glasgow, Application of case-based reasoning in molecular biology. Artificial Intelligence Magazine, Special issue on Bioinformatics. 25(1):85-95, 2004.

    Case-Based Reasoning (CBR) is a computational reasoning paradigm that involves the storage and retrieval of past experiences to solve novel problems. It is an approach that is particularly relevant in scientific domains, where there is a wealth of data, but often a lack of theories or general principles. This paper describes several CBR systems that have been developed to carry out planning, analysis and prediction in the domain of molecular biology.

Jurisica, I., J. Mylopoulos, E. Yu. Ontologies for knowledge management: An information systems perspective. An International Journal of Knowledge and Information Systems, Special issue on Ontologies, 6(4):380-401, 2004.

    Knowledge management research focuses on concepts, methods, and tools supporting the management of human knowledge. The main objective of this paper is to survey basic concepts that have been used in Computer Science for the representation of knowledge and summarize some of their advantages and drawbacks. A secondary objective is to relate these techniques to Information Science theory and practice.

    The survey classifies the concepts used for knowledge representation into four broad ontological categories. Static ontologies describe static aspects of the world, i.e., what things exist, their attributes and relationships. A dynamic ontology, on the other hand, describes the changing aspects of the world in terms of states, state transitions and processes. Intentional ontologies encompass the world of things agents believe in, want, prove or disprove, and argue about. Finally, social ontologies cover social settings - agents, positions, roles, authority, permanent organizational structures or shifting networks of alliances and interdependencies.

Evangelou, A., Letarte, M., Jurisica, I., Sultan, M., Murphy, K., Rosen, B., Brown, T. Loss of coordinated androgen regulation in non-malignant ovarian epithelial cells with BRCA1/2 mutations and ovarian cancer cells. Cancer Research, 63:2416-2424, 2003.

    Epidemiological studies have implicated androgens in the etiology/progression of epithelial ovarian cancer. Because normal and malignant ovarian epithelial cells are growth inhibited by transforming growth factor (TGF) beta, we tested the ability of 5alpha-dihydrotestosterone (DHT) to modulate this response and the expression of TGF-beta receptor types I and II. Cells derived from the ovarian surface epithelium of women undergoing oophorectomy (n = 7) for nonovarian indications or with a germ-line BRCA1 or 2 mutation (n = 9), and from the ascitic fluid of patients with primary ovarian cancer (n = 8) were cultured with and without DHT. Cell proliferation after TGF-beta1 or vehicle treatment was determined, and transcripts for TGF-beta receptors were measured by quantitative reverse transcription-PCR. As low levels of androgen receptor were observed in the cultures, we also measured transcript levels for steroid receptor coactivators SRC-1, ARA70, and AIB1. TGF-beta1 inhibited growth in 12 of 13 cultures tested, and DHT generally reversed this effect, demonstrating that androgens can block TGF-beta-induced growth inhibition in both malignant and nonmalignant ovarian epithelial cells. Transcripts for TGF-beta receptors, SRC-1, and ARA70 were found to be coordinately regulated by androgen in control cells, but not in either malignant or BRCA1/2-positive cell cultures. These findings raise the possibility that by modulating steroid receptor coactivator expression, androgen might affect other hormonal responses and contribute to the initiation of ovarian cancer.

Jurisica, I. and Wigle, D. Understanding biology through intelligent systems. Genome Biology, 3(11):Reports 4035.1-4035.4, 2002.

    A report on the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB), Edmonton, Canada, 3-7 August 2002.

Wigle, D., Jurisica, I., N. Radulovich, M. Pintilie, J. Rossant, N. Liu, C. Lu, J. Woodgett, I. Seiden, M. Johnston, S. Keshavjee, G. Darling, T. Winton, B. Breitkreutz, P. Jorgenson, M. Tyers, F. A. Shepherd, M.S. Tsao. Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Research, 62(11):3005-3008, 2002.

    Recent studies have suggested that information from gene expression profiles could be used to develop molecular classifications of cancer. We hypothesized that expression levels of specific genes in operative specimens could be correlated to recurrence risk in non-small cell lung cancer (NSCLC). We performed expression profiling using 19.2K cDNA microarrays on tumor specimens from a total of 39 NSCLC patients with known clinical follow-up information. Statistical analysis and clustering approaches were used to determine patterns of gene expression segregating with clinical outcome. The results provide evidence that molecular subtyping of NSCLC can identify distinct profiles of gene expression correlating with disease-free survival.

Sultan, M., Wigle, D., Cumbaa, C., Maziarz, M., Glasgow, J., M.-S. Tsao, Jurisica, I. Binary tree-structured vector quantization approach to clustering and visualizing microarray data. Bioinformatics. Special Issue of ISMB'02, 18(Suppl. 1):S111-S119. 2002.

    Motivation: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into meaningful groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified.

    Results: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.

    Availability: The BTSVQ system is implemented in Matlab R12 using the SOM toolbox for the visualization and preprocessing of the data. BTSVQ is available for non-commercial use (http://www.uhnres.utoronto.ca/ta3/BTSVQ).

Luft, J., Wolfley, J., Jurisica, I., Glasgow, J., Fortier, S., DeTitta, G.T. Macromolecular crystallization in a high throughput laboratory - the search phase. Journal of Crystal Growth, 232: 591-595, 2001.

    Macromolecular crystallization efforts are frequently divided into a search phase, during which approximate conditions are sought, and an optimization phase, when the approximate conditions are optimized to yield crystals of sufficient quality for diffraction work. Faced with the possibility that, on a yearly basis, many hundreds of proteins might be generated, both in our laboratories and at the laboratories of our collaborators, we have recently designed and commissioned a high throughput robotics lab designed for the search phase. The lab is capable of setting up and photographically evaluating over 60,000 microbatch crystallization experiments per week. In the first four months of operation we have set up crystallization experiments for more than one hundred proteins.

Jurisica, I., Rogers, P., Glasgow, J., Collins, R., Wolfley, J., Luft, J., DeTitta, G.T. Improving Objectivity and Scalability in Protein Crystallization: Integrating Image Analysis With Knowledge Discovery. IEEE Intelligent Systems Journal, Special issue on Intelligent Systems in Biology, 16(6): 26-34, 2001.

    This paper describes issues related to integrating image analysis techniques with knowledge discovery and case-based reasoning. Although the work is applicable to a number of problem domains, here we focus on the problem of analyzing and classifying outcomes of protein crystallization experiments in high-throughput structural genomics. We apply fast Fourier transform to analyze image content in order to extract important features of the spectrum. A combination of these features is used to classify crystallization experiments' outcomes. Although humans can analyze images more flexibly, a computational approach makes the process scalable and more objective. We evaluate the classification process and present results on how the automatically-extracted features can be combined to discover important crystallographic knowledge.

Wigle, D., Rossant, J., Jurisica, I. Mining mouse microarray data, Genome Biology, 2(7): 1019.1-1019.4, 2001.

    Microarrays of mouse genes are now available from several sources, and they have so far given new insights into gene expression in embryonic development, regions of the brain and during apoptosis. Microarray data posted on the internet can be reanalyzed to study a range of questions.

Jurisica, I., Rogers, P., Glasgow, J., Fortier, S., Luft, J., Wolfley, J., Bianca, M., Weeks, D., DeTitta, G.T. Intelligent Decision Support for Protein Crystal Growth. IBM Systems Journal, Special issue on Deep Computing for Life Sciences, 40(2): 394-409, 2001.

Genomic projects are producing hundreds of proteins a year for structural analysis. The challenge of the research described in this paper is to remove crystal growth experiments as a rate-limiting step in the enterprise of structure determination of proteins. We meet this challenge by combining a high-throughput crystallization setup and evaluation in the wet lab with a sophisticated algorithmic analysis of the outcomes in the computer lab. Furthermore, we apply techniques from knowledge management and artificial intelligence to develop an automated system that assists expert crystallographers in planning and evaluating novel crystal growth experiments. Fundamental to our computational approach to crystallization is a comprehensive information repository for crystal growth experiments. This stored information will be used to discover general rules or principles underlying the growth process for crystals, as well as to guide the reasoning algorithm for planning experiments.

The paper reports on the preliminary results in the wet lab and computation lab respectively. We define the problem, propose an architecture for intelligent decision support in the crystallization domain, and report on the status of the individual components of the architecture.

Jurisica, I., J. Glasgow, and J. Mylopoulos. Incremental Iterative Retrieval and Browsing for Efficient Conversational CBR Systems. International Journal of Applied Intelligence. 12(3): 251-268, 2000.

    A case base is a repository of past experiences that can be used for problem solving. Given a new problem, expressed in the form of a query, the case base is browsed in search of "similar" or "relevant" cases. Conversational case-based reasoning (CBR) systems generally support user interaction during case retrieval and adaptation. Here we focus on case retrieval where users initiate problem solving by entering a partial problem description. During an interactive CBR session, a user may submit additional queries to provide a "focus of attention". These queries may be obtained by relaxing or restricting the constraints specified for a prior query. Thus, case retrieval involves the iterative evaluation of a series of queries against the case base, where each query in the series is obtained by restricting or relaxing the preceding query.

    This paper considers alternative approaches for implementing iterative browsing in conversational CBR systems. First, we discuss a naive algorithm, which evaluates each query independently of earlier evaluations. Second, we introduce an incremental algorithm, which reuses the results of past query evaluations to minimize the computation required for subsequent queries. In particular, the paper proposes an efficient algorithm for case base browsing and retrieval using database techniques for incremental view maintenance. In addition, the paper evaluates the performance of the proposed algorithm with respect to alternative approaches considering two perspectives: (i) experimental efficiency evaluation using diverse application domains, and (ii) scalability evaluation using the performance model of the proposed system.

Jurisica, I., Mylopoulos, J., Glasgow, J., Shapiro, H., and Casper, R. F. Case-based reasoning in IVF: Prediction and knowledge mining. Artificial Intelligence in Medicine, 12(1):1-24, 1998.

In vitro fertilization (IVF) is a medically-assisted reproduction technique, enabling infertile couples to achieve successful pregnancy. Given the unpredictability of the task, we propose to use a case-based reasoning system that exploits past experiences to suggest possible modifications to an IVF treatment plan in order to improve overall success rates. Once the system's knowledge base is populated with a sufficient number of past cases, it can be used to explore and discover interesting relationships among data, thereby achieving a form of knowledge mining. The article describes the TA3IVF system -- a case-based reasoning system which relies on context-based relevance assessment to assist in knowledge visualization, interactive data exploration and discovery in this domain. The system can be used as an advisor to the physician during clinical work and during research to help determine what knowledge sources are relevant for a treatment plan.

Jurisica, I. and J. Glasgow. Improving performance of case-based classification using context-based relevance. International Journal of Artificial Intelligence Tools. Special Issue of IEEE International Conf. on Tools with AI (ICTAI-96) Best Papers. 6(4):511-536, 1997.

Classification involves associating instances with particular classes by maximizing intra-class similarities and minimizing inter-class similarities. Thus, the way similarity among instances is measured is crucial for the success of the system. In case-based reasoning, it is assumed that similar problems have similar solutions. The case-based approach to classification is founded on retrieving cases from the case base that are similar to a given problem, and associating the problem with the class containing the most similar cases.

Similarity-based retrieval tools can advantageously be used in building flexible retrieval and classification systems. Case-based classification uses previously classified instances to label unknown instances with proper classes. Classification accuracy is affected by the retrieval process -- the more relevant the instances used for classification, the greater the accuracy.

The paper presents a novel approach to case-based classification. The algorithm is based on a notion of similarity assessment and was developed for supporting flexible retrieval of relevant information. Case similarity is assessed with respect to a given context that defines constraints for matching. Context relaxation and restriction are used for controlling the classification accuracy. The validity of the proposed approach is tested on real-world domains, and the system's performance, in terms of accuracy and scalability, is compared to that of other machine learning algorithms.

Updated January 2008