UofT Database Group LogoUofT Database Group Logo
UofT Database Group LogoUofT Database
Group Logo

2002-2003 Database Seminar Series

September 19, 2002 Mariano P. Consens, University of Waterloo
September 19, 2002 Luc Segoufin, INRIA
September 23, 2002 Victor Vianu, University of California, San Diego
September 24, 2002 Victor Vianu, University of California, San Diego
September 30, 2002 Yannis Papakonstantinou, University of California, San Diego
October 29, 2002 Ronald Fagin, IBM Almaden Research Center, San Jose
November 12, 2002 James Hendler, University of Maryland, College Park
December 3, 2002 Michael Kifer, State University of New York, Stony Brook
January 27, 2003 Juliana Freire, OGI School of Science and Engineering, Portland
February 14, 2003 Mario Nascimento, University of Alberta, Edmonton
April 1, 2003 Lise Getoor, University of Maryland

[Return to DB Group Seminar Page]

Thursday, September 19, 2002

Title: Making Fully-Indexed Data Pervasive
Speaker: Mariano P. Consens
University of Waterloo
Abstract: The information the world produces each year is growing exponentially, going from one billion gigabytes a year in 2000 to over 50 billion gigabytes by 2004. By then, 75% of the total information will be digital assets that must be preserved unaltered. Increasingly, this fixed content is stored in magnetic media and is available on-line. At the top of the list of challenges created by this data explosion are finding/analyzing information and managing the storage required at an acceptable cost. In this talk we introduce fully-indexed data systems (or FIDS), a class of data management software that addresses these challenges. The four key characteristics of FIDS are that they:
(1) Provide index-based retrieval of all data
(2) Fully exploit indexing during query evaluation (including arbitrary joins and higher order queries)
(3) Index multiple content sources and facilitate the pervasive dissemination of indexes
(4) Fully self-manage

We will also describe a novel index structure (Spindex) and associated algorithms suitable for implementing FIDS, and we will survey a commercial system based on Spindex.

Bio: Mariano Consens research interests are in the areas of Data Management Systems and the Web, Wireless Software and Application Infrastructure, Pervasive Computing, and Trusted Data Management. He has over 20 publications and two patents granted by the US Patent Office. Mariano received his PhD (1994) and MSc (1989) degrees in Computer Science from the University of Toronto. He also holds a Computer Systems Engineer degree from the Universidad de la Republica, Uruguay. Mariano has been a Research Assistant Professor at the School of Computer Science, University of Waterloo, until 1999. He has been active in the software industry as a founder of Freedom Intelligence (a query engine provider), CTO of Classwave Wireless (a Bluetooth software infrastructure company) and technology advisor for Xign (an electronic payment systems supplier), OpenText (a knowledge management software vendor) and others.
  • Location: BA1180
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Thursday, September 19, 2002

Title: Parameterized Complexity and Conjunctive Queries
Speaker: Luc Segoufin
INRIA
Abstract: Evaluating a conjunctive query against a database is NP-complete, but still it works well in practice. In this talk we will give a theoretical explanation for this miracle.

In order to achieve this, we will briefly introduce parameterized complexity, define a notion of tree likeness also known as tree width and then prove that the more the query and|or the database looks like a tree, the most efficient the evaluation will be.

Bio: Dr. Segoufin is an INRIA researcher and a Verso member. He received his PhD from University of Paris in 1999. His research interests include database theory, logic, finite model theory and automata.
  • Location: PT378
  • Time: 12:00AM
[Return to Toronto DB Seminar Index]



Monday, September 23, 2002

Title: Validating Streaming XML Documents
Speaker: Victor Vianu
University of California, San Diego
Abstract: This talk discusses some results on the on-line validation of streaming XML documents with respect to a DTD, under memory constraints. We first consider validation using constant memory, formalized by a finite-state automaton ( FSA). We examine two flavors of the problem, depending on whether or not the XML document is assumed to be well-formed. The main results of the paper provide conditions on the DTDs under which validation of either flavor can be done using an FSA. For DTDs that cannot be validated by an FSA, we investigate two alternatives. The first relaxes the constant memory requirement by allowing a stack bounded in the depth of the XML document, while maintaining the deterministic, one-pass requirement. The second approach consists in refining the DTD to provide additional information that allows validation by an FSA.

Bio: Victor Vianu received his PhD in Computer Science from USC in 1983. Since then, he has been on the faculty at UC San Diego and is now Professor of Computer Science. His interests include database theory, logic and complexity, and Web data. His most recent research focuses on static analysis of XML queries, specification and verification of ecommerce protocols, and spatial databases. Vianu's publications include numerous research articles and a graduate textbook on database theory. He has served as General Chair of SIGMOD and Program Chair of the PODS and ICDT conferences, and is an editor of JACM and of the ACM Transactions on Computational Logic.
  • Location: PT378
  • Time: 12:00AM
[Return to Toronto DB Seminar Index]



Tuesday, September 24, 2002

Title: Databases and the Web: From Codd to XML
Speaker: Victor Vianu
University of California, San Diego
Abstract: Databases and the Web are organically connected at many levels. Web sites are increasingly powered by databases. Collections of linked Web pages distributed across the Internet are themselves targets for databases. The emergence of XML as the lingua franca of the Web holds tremendous potential for using database techniques to manage Web information.

This talk discusses some of the developments related to the Web from the viewpoint of database theory. As will be argued, the Web scenario requires revisiting some of the basic assumptions of the area. We discuss some of the challenges and contributions of database theory to the formal foundations of the Web, with special focus on XML. In particular, we exhibit a powerful (and elegant) connection between XML Schemas and tree automata, and between XML queries and tree transducers, and show how this can be used in the static analysis of XML queries in the context of data exchange and data integration.

Bio: Victor Vianu received his PhD in Computer Science from USC in 1983. Since then, he has been on the faculty at UC San Diego and is now Professor of Computer Science. His interests include database theory, logic and complexity, and Web data. His most recent research focuses on static analysis of XML queries, specification and verification of ecommerce protocols, and spatial databases. Vianu's publications include numerous research articles and a graduate textbook on database theory. He has served as General Chair of SIGMOD and Program Chair of the PODS and ICDT conferences, and is an editor of JACM and of the ACM Transactions on Computational Logic.
  • Location: BA1180
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Monday, September 30, 2002

Title: Issues in XML Mediators
Speaker: Yannis Papakonstantinou
University of California, San Diego
Abstract: Business and scientific applications often require access, filtering, integration and transformation of the data of multiple distributed and heterogeneous information sources. Mediator systems address the need by providing query-able integrated views of the source data. We focus on mediators providing virtual XML views. When such a mediator receives an XQuery on the XML view, it decomposes it into queries and requests that are sent to the underlying sources and are compatible with their abilities. The mediator presents a virtual query result to the client. Consequently, client navigations in the virtual query result are reduced into queries and navigations in the sources. We present a mediator architecture, built around a tuple-oriented XML algebra. We discuss issues related to
1. the navigation-driven evaluation of XML queries,
2. algebraic optimization of queries and the challenges of integration,
3. handling the limited and different capabilities of the various sources.
Finally, we briefly overview a set of related issues that we have researched at the database lab of UCSD:
* Inferring and Using XML Schema in Mediation
* Interfaces for Querying XML & Semistructured Data/Views
* Going Beyond Structured Queries: Fuzzy (Ranked) Queries & Information Discovery on the Semistructured Graph

Bio: Yannis Papakonstantinou serves on the Faculty of Computer Science and Engineering at the University of California, San Diego, since 1996. His research is in the intersection of database and Internet technologies. Yannis has published over forty research articles in scientific conferences and journals, given tutorials at major conferences, and served on journal editorial boards and program committees for numerous international conferences and symposiums. He was the co-Chair of WebDB 2002, is the General Chair of ACM SIGMOD 2003 and the Vice PC Chair for the ``XML, Metadata and Semistructured Data'' track of IEEE ICDE 2004. In 1998, Yannis received the NSF CAREER award for his work on integrating heterogeneous data. In 2000 Yannis founded Enosys Software, which provides software for XML-based querying and integration of distributed sources. Yannis holds a Diploma of Electrical Engineering from the National Technical University of Athens and MS and Ph.D. in Computer Science from Stanford University (1997).
  • Location: BA5256
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Tuesday, October 29, 2002

Title: Optimal Aggregation Algorithms
Speaker: Ronald Fagin
IBM Almaden Research Center, San Jose
Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). Each object is assigned an overall grade, that is obtained by combining the attribute grades using a fixed monotone aggregation function (or combining rule), such as min or average. Our goal is to find the top k objects, while minimizing access to the lists. The problem arises in several contexts. In particular, the speaker originally considered it for the purpose of combining fuzzy information in a multimedia database system.

We define a very strong notion of optimality, which we call "instance optimality". An algorithm is instance optimal if, up to a constant factor, its performance is as good as every other (correct) algorithm, on every instance (in our case, on every database). We give an elegant and remarkably simple algorithm that is instance optimal for our problem (under some natural restrictions).

This research, which is joint with Amnon Lotem and Moni Naor, won the Best Paper Award for the 2001 ACM Symposium on Principles on Database Systems. The talk will be completely self-contained.

Bio: Ronald Fagin is manager of the Foundations of Computer Science group at the IBM Almaden Research Center in San Jose, California. He received his B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics from the University of California at Berkeley. Much of his research has focused on applications of logic to computer science.

He has published around 100 papers, and has coauthored a book on "Reasoning about Knowledge". He is a Fellow of the Institute of Electrical and Electronic Engineers, and a Fellow of the Association for Computing Machinery. He was named Docteur Honoris Causa by the University of Paris.

  • Location: BA1180
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Tuesday, November 12, 2002

Title: The Semantic Web: What can it do?
Speaker: James Hendler
University of Maryland, College Park
Abstract: The World Wide Web is often referred to as a web of information, but is it?  When you ask a query on the web you get pointers to pages, not answers.  If you're looking for something beyond text, you're often unable to find it.  The next generation of the Web, already in the works, aims to fix this by making more of the content on the web "understandable" to the programs that help us find, filter and use what is out there.  In this talk, I will describe this new generation of the web, discuss some of the technologies that will help to power it, and consider some of the ways in which it may be used to create new and powerful web applications beyond the capabilities of the current web.

Bio: James Hendler is a Professor at the University of Maryland where he is the Director for Semantic Web and Agent Technology at the Maryland Information and Network Dynamics Laboratory. He has joint appointments in the Department of Computer Science, the Institute for Advanced Computer Studies and the Institute for Systems Research, and he is also an affiliate of the Electrical Engineering Department. He has authored close to 150 technical papers in areas including artificial intelligence, robotics, agent-based computing and high performance processing. Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence. He is also the former Chief Scientist for Information Systems at the US Defense Advanced Research Projects Agency (DARPA), and chairs the W3C's Web Ontology Working Group.
  • Location: BA1180
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Tuesday, December 3, 2002

Title: On the Semantics of Anonymous Identity and Reification
Speaker: Michael Kifer
State University of New York, Stony Brook
Abstract: Reification and anonymous resources are two of the more interesting features of the Resource Description Framework (RDF) --- an emerging standard for representing semantic information on the Web. Ironically, when RDF was standardized by W3C over three years ago, it came without a semantics. There is now growing understanding that a Semantic Web language without a semantics is an oxymoron, and a number of efforts are directed towards giving RDF a precise semantics.

In this paper we propose a simple semantics for reification and anonymous resources in F-logic --- a frame-based logic language, which is a popular formalism for representing and reasoning about semantic information on the Web.

The choice of F-logic (over RDF) as a basis for our semantics is motivated by the fact that F-logic provides a comprehensive solution for the problem of integrating frames, rules, and deduction, and it has been shown to provide an effective inference service for RDF.

Bio: Michael Kifer is a Professor with the Department of Computer Science, State University of New York at Stony Brook (USA). He received his Ph.D. in Computer Science in 1985 from the Hebrew University of Jerusalem, Israel, and the M.S. degree in Mathematics in 1976 from Moscow University, Russia.

Dr. Kifer's interests include database systems, knowledge representation, and Web information systems. He has published two text books and numerous articles in these areas. In 1999 and 2002 he was a recipient of the ACM-SIGMOD "Test of Time" awards for his works on object-oriented database languages.

  • Location: BA5256
  • Time: 2:00PM
[Return to Toronto DB Seminar Index]



Monday, January 27, 2003

Title: Bridging the XML-Relational Divide with LegoDB
Speaker: Juliana Freire
OGI School of Science and Engineering
Abstract:

XML is becoming the predominant data exchange format in a variety of application domains (supply-chain, scientific data processing, telecommunication infrastructure, etc.). By relying on relational engines for storage purposes, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors. However, due to the mismatch between the XML and the relational models and the many different ways to map an XML document into relations, it is very hard to tune a relational engine and ensure that XML queries will be evaluated efficiently.

LegoDB is a cost-based XML storage mapping engine that automatically explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML-Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this talk, I will give an overview of the LegoDB storage engine and present experimental results that demonstrate the effectiveness of this approach. I will also discuss recent work on two important facets of the LegoDB system: building statistical summaries of XML documents; and new efficient algorithms for searching the space of XML-to-relational mappings.

Joint work with Philip Bohannon, Jayant Haritsa, Maya Ramanath, Prasan Roy and Jerome Simeon

Bio: Juliana Freire is an Assistant Professor at the Department of Computer Science at the OGI School of Science and Engineering. Before joining OGI, Juliana was a Member of the Technical Staff in the Network Data and Services Department at Bell Laboratories, Lucent Technologies. She received a BS from Federal University of Ceara (Brazil), and MS and Ph.D. from the University at Stony Brook, all in Computer Science. Her research interests span the areas of databases and Web applications, and include data management issues related to Web data (in particular XML), information integration, and ubiquitous access to the Web.
  • Location: BA5256
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]



Friday, February 14, 2003

Title: Effective and Efficient Region-based Image Retrieval
Speaker: Mario Nascimento
University of Alberta
Abstract:

Content Based Image Retrieval (CBIR) is a challenging task. Current research works attempt to obtain and use the semantics of image to perform better retrieval. Towards this goal, segmentation of an image into regions has been used in recent years, since local properties of regions can help matching objects between images and thereby contribute towards a more effective CBIR. Our work improves on a CBIR technique, called SNL, that utilizes the regional properties of the images. In SNL each image is segmented and features including the color, shape, size and spatial position of the obtained regions are extracted. Regions are then compared using the Integrated Region Matching (IRM) distance measure, which is not a metric, which prevents the use of metric access structures or filtering techniques based on the triangle inequality. We overcome this issue, by using MiCRoM, a true metric distance to compare segmented images. This resulting approach, called SNL*, can be used in conjunction with a filtering technique to reduce substantially the number of images compared. Albeit metric-based, SNL* is computationally expensive. We address this drawback, in the SNL+ approach, where we replace the expensive metric distance in SNL* by the inexpensive original (non-metric) IRM distance. We found that one can still make use of the same filtering technique, at the expense of little loss in retrieval effectiveness. Thus, the main contribution of this work is SNL+, a very effective and highly efficient region-based image retrieval technique.

Joint work with Veena Sridhar and Xiaobo Li

Bio: Mario A. Nascimento obtained his Ph.D. degree in Computer Science at Southern Methodist University's School of Engineering in 1996. Between 1989 and 1999 he was a researcher with the Brazilian Agency for Agricultural Research (Information Technology Center) and, between 1997 and 1999, he was also associated with the Institute of Computing of the State University of Campinas (Brazil). Since then he has been with the Department of Computing Science of the University of Alberta. He has published over forty papers in international conferences, journals and workshops, has served or is serving as Program Committee member of several conferences and as Program Chair or Co-chair for ACM MM 2001 MIR, IDEAS'02 and SPIRE'2003. He is also currently serving as member of the ACM SIGMOD DiSC Editorial Board and ACM SIGMOD Executive Board (as its Information Director). His main research interests lie in the area of Content-based Image Retrieval and Access Structures for Databases, and he is a member of ACM, SIGMOD, SIGIR, and IEEE Computer Society.
  • Location: BA5256
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]

Tuesday, April 1, 2003

Title: Selectivity Estimation using Probabilistic Models
Speaker: Lise Getoor
University of Maryland
Abstract:

Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this work, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Statistical Relational Models (SRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing an SRM from a database, and show how an SRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time.

Joint work with Daphne Koller and Benjamin Taskar.

Bio: Lise Getoor recently joined the University of Maryland, College Park as an assistant professor in the computer science department. She received her PhD from Stanford University in December, 2001. Her research interests include probabilistic models, link analysis, machine learning and data mining. She has published papers on a variety of topics including learning probabilistic models, utility elicitation, on-line scheduling, constraint-based planning and machine learning. Before pursuing her PhD at Stanford, she worked at NASA-Ames Research Center as a research associate. She received her M.S. in Computer Science from UC Berkeley in 1989 and her B.S. in Computer Science from UC Santa Barbara in 1986. She is the recipient of a National Physical Sciences Consortium fellowship and member of ACM, AAAI and Tau Beta Pi.
  • Location: BA5256
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]




Last modified: Tue Mar 25 18:53:46 EST 2003