


| September 19, 2002 | Mariano P. Consens, University of Waterloo |
| September 19, 2002 | Luc Segoufin, INRIA |
| September 23, 2002 | Victor Vianu, University of California, San Diego |
| September 24, 2002 | Victor Vianu, University of California, San Diego |
| September 30, 2002 | Yannis Papakonstantinou, University of California, San Diego |
| October 29, 2002 | Ronald Fagin, IBM Almaden Research Center, San Jose |
| November 12, 2002 | James Hendler, University of Maryland, College Park |
| December 3, 2002 | Michael Kifer, State University of New York, Stony Brook |
| January 27, 2003 | Juliana Freire, OGI School of Science and Engineering, Portland |
| February 14, 2003 | Mario Nascimento, University of Alberta, Edmonton |
| April 1, 2003 | Lise Getoor, University of Maryland |
| Title: | Making Fully-Indexed Data Pervasive |
| Speaker: |
Mariano P. Consens
University of Waterloo |
| Abstract: |
The information the world produces each year is growing exponentially,
going from one billion gigabytes a year in 2000 to over 50 billion
gigabytes by 2004. By then, 75% of the total information will be digital
assets that must be preserved unaltered. Increasingly, this fixed
content is stored in magnetic media and is available on-line. At the top
of the list of challenges created by this data explosion are
finding/analyzing information and managing the storage required at an
acceptable cost. In this talk we introduce fully-indexed data systems (or
FIDS), a class of data management software that addresses these
challenges. The four key characteristics of FIDS are that they:
(1) Provide index-based retrieval of all data (2) Fully exploit indexing during query evaluation (including arbitrary joins and higher order queries) (3) Index multiple content sources and facilitate the pervasive dissemination of indexes (4) Fully self-manage
We will also describe a novel index structure (Spindex) and associated
algorithms suitable for implementing FIDS, and we will survey a commercial
system based on Spindex.
|
| Bio: |
Mariano Consens research interests are in the areas of Data Management
Systems and the Web, Wireless Software and
Application Infrastructure, Pervasive Computing, and Trusted Data
Management. He has over 20 publications and two patents
granted by the US Patent Office. Mariano received his PhD (1994) and MSc
(1989) degrees in Computer Science from the
University of Toronto. He also holds a Computer Systems Engineer degree
from the Universidad de la Republica, Uruguay.
Mariano has been a Research Assistant Professor at the School of Computer
Science, University of Waterloo, until 1999. He
has been active in the software industry as a founder of Freedom
Intelligence (a query engine provider), CTO of Classwave
Wireless (a Bluetooth software infrastructure company) and technology
advisor for Xign (an electronic payment systems
supplier), OpenText (a knowledge management software vendor) and others.
|
| Title: | Parameterized Complexity and Conjunctive Queries |
| Speaker: |
Luc Segoufin
INRIA |
| Abstract: |
Evaluating a conjunctive query against a database is NP-complete, but
still it works well in practice. In this talk we will give a
theoretical explanation for this miracle.
In order to achieve this, we will briefly introduce parameterized
complexity, define a notion of tree likeness also known as tree width
and then prove that the more the query and|or the database looks
like a tree, the most efficient the evaluation will be.
|
| Bio: |
Dr. Segoufin is an INRIA researcher and a Verso member. He received his
PhD from University of Paris in 1999. His research interests include
database theory, logic, finite model theory and automata.
|
| Title: | Validating Streaming XML Documents |
| Speaker: |
Victor Vianu
University of California, San Diego |
| Abstract: |
This talk discusses some results on the on-line validation of
streaming XML documents with respect to a DTD, under memory
constraints. We first consider validation using constant memory,
formalized by a finite-state automaton ( FSA). We examine two flavors
of the problem, depending on whether or not the XML document is
assumed to be well-formed. The main results of the paper provide
conditions on the DTDs under which validation of either flavor can be
done using an FSA. For DTDs that cannot be validated by an FSA, we
investigate two alternatives. The first relaxes the constant memory
requirement by allowing a stack bounded in the depth of the XML
document, while maintaining the deterministic, one-pass
requirement. The second approach consists in refining the DTD to
provide additional information that allows validation by an FSA.
|
| Bio: |
Victor Vianu received his PhD in Computer Science from USC in 1983. Since
then, he has been on the faculty at UC San Diego and is now Professor of
Computer Science. His interests include database theory, logic and
complexity, and Web data. His most recent research focuses on static
analysis of XML queries, specification and verification of ecommerce
protocols, and spatial databases. Vianu's publications include numerous
research articles and a graduate textbook on database theory. He has
served as General Chair of SIGMOD and Program Chair of the PODS and ICDT
conferences, and is an editor of JACM and of the ACM Transactions on
Computational Logic.
|
| Title: | Databases and the Web: From Codd to XML |
| Speaker: |
Victor Vianu
University of California, San Diego |
| Abstract: |
Databases and the Web are organically connected at many levels. Web sites
are increasingly powered by databases. Collections of linked Web pages
distributed across the Internet are themselves targets for databases. The
emergence of XML as the lingua franca of the Web holds tremendous
potential for using database techniques to manage Web information.
This talk discusses some of the developments related to the Web from the
viewpoint of database theory. As will be argued, the Web scenario requires
revisiting some of the basic assumptions of the area. We discuss some of
the challenges and contributions of database theory to the formal
foundations of the Web, with special focus on XML. In particular, we
exhibit a powerful (and elegant) connection between XML Schemas and tree
automata, and between XML queries and tree transducers, and show how this
can be used in the static analysis of XML queries in the context of data
exchange and data integration.
|
| Bio: |
Victor Vianu received his PhD in Computer Science from USC in 1983. Since
then, he has been on the faculty at UC San Diego and is now Professor of
Computer Science. His interests include database theory, logic and
complexity, and Web data. His most recent research focuses on static
analysis of XML queries, specification and verification of ecommerce
protocols, and spatial databases. Vianu's publications include numerous
research articles and a graduate textbook on database theory. He has
served as General Chair of SIGMOD and Program Chair of the PODS and ICDT
conferences, and is an editor of JACM and of the ACM Transactions on
Computational Logic.
|
| Title: | Issues in XML Mediators |
| Speaker: |
Yannis Papakonstantinou
University of California, San Diego |
| Abstract: |
Business and scientific applications often require access, filtering,
integration and transformation of the data of multiple distributed and
heterogeneous information sources. Mediator systems address the need by
providing query-able integrated views of the source data. We focus on
mediators providing virtual XML views. When such a mediator receives an
XQuery on the XML view, it decomposes it into queries and requests that
are sent to the underlying sources and are compatible with their
abilities. The mediator presents a virtual query result to the client.
Consequently, client navigations in the virtual query result are reduced
into queries and navigations in the sources. We present a mediator
architecture, built around a tuple-oriented XML algebra. We discuss issues
related to
1. the navigation-driven evaluation of XML queries, 2. algebraic optimization of queries and the challenges of integration, 3. handling the limited and different capabilities of the various sources. Finally, we briefly overview a set of related issues that we have researched at the database lab of UCSD: * Inferring and Using XML Schema in Mediation * Interfaces for Querying XML & Semistructured Data/Views * Going Beyond Structured Queries: Fuzzy (Ranked) Queries & Information Discovery on the Semistructured Graph |
| Bio: |
Yannis Papakonstantinou serves on the Faculty of Computer Science and
Engineering at the University of California, San Diego, since 1996. His
research is in the intersection of database and Internet technologies.
Yannis has published over forty research articles in scientific
conferences and journals, given tutorials at major conferences, and served
on journal editorial boards and program committees for numerous
international conferences and symposiums. He was the co-Chair of WebDB
2002, is the General Chair of ACM SIGMOD 2003 and the Vice PC Chair for
the ``XML, Metadata and Semistructured Data'' track of IEEE ICDE 2004. In
1998, Yannis received the NSF CAREER award for his work on integrating
heterogeneous data. In 2000 Yannis founded Enosys Software, which provides
software for XML-based querying and integration of distributed sources.
Yannis holds a Diploma of Electrical Engineering from the National
Technical University of Athens and MS and Ph.D. in Computer Science from
Stanford University (1997).
|
| Title: | Optimal Aggregation Algorithms |
| Speaker: |
Ronald Fagin
IBM Almaden Research Center, San Jose |
| Abstract: |
Assume that each object in a database has m grades, or scores, one for
each of m attributes. For example, an object can have a color grade, that
tells how red it is, and a shape grade, that tells how round it is. For
each attribute, there is a sorted list, which lists each object and its
grade under that attribute, sorted by grade (highest grade first). Each
object is assigned an overall grade, that is obtained by combining the
attribute grades using a fixed monotone aggregation function (or combining
rule), such as min or average. Our goal is to find the top k objects,
while minimizing access to the lists. The problem arises in several
contexts. In particular, the speaker originally considered it for the
purpose of combining fuzzy information in a multimedia database system.
We define a very strong notion of optimality, which we call "instance optimality". An algorithm is instance optimal if, up to a constant factor, its performance is as good as every other (correct) algorithm, on every instance (in our case, on every database). We give an elegant and remarkably simple algorithm that is instance optimal for our problem (under some natural restrictions).
This research, which is joint with Amnon Lotem and Moni Naor, won the Best
Paper Award for the 2001 ACM Symposium on Principles on Database Systems.
The talk will be completely self-contained.
|
| Bio: |
Ronald Fagin is manager of the Foundations of Computer Science group at
the IBM Almaden Research Center in San Jose, California. He received his
B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics
from the University of California at Berkeley. Much of his research has
focused on applications of logic to computer science.
He has published around 100 papers, and has coauthored a book on "Reasoning about Knowledge". He is a Fellow of the Institute of Electrical and Electronic Engineers, and a Fellow of the Association for Computing Machinery. He was named Docteur Honoris Causa by the University of Paris.
|
| Title: | The Semantic Web: What can it do? |
| Speaker: |
James Hendler
University of Maryland, College Park |
| Abstract: |
The World Wide Web is often referred to as a web of information, but is
it? When you ask a query on the web you get pointers to pages, not
answers. If you're looking for something beyond text, you're often unable
to find it. The next generation of the Web, already in the works, aims to
fix this by making more of the content on the web "understandable" to the
programs that help us find, filter and use what is out there. In this
talk, I will describe this new generation of the web, discuss some of the
technologies that will help to power it, and consider some of the ways in
which it may be used to create new and powerful web applications beyond
the capabilities of the current web.
|
| Bio: |
James Hendler is a Professor at the University of Maryland where he is the
Director for Semantic Web and Agent Technology at the Maryland Information
and Network Dynamics Laboratory. He has joint appointments in the
Department of Computer Science, the Institute for Advanced Computer
Studies and the Institute for Systems Research, and he is also an
affiliate of the Electrical Engineering Department. He has authored close
to 150 technical papers in areas including artificial intelligence,
robotics, agent-based computing and high performance processing. Hendler
was the recipient of a 1995 Fulbright Foundation Fellowship, is a member
of the US Air Force Science Advisory Board, and is a Fellow of the
American Association for Artificial Intelligence. He is also the former
Chief Scientist for Information Systems at the US Defense Advanced
Research Projects Agency (DARPA), and chairs the W3C's Web Ontology
Working Group.
|
| Title: | On the Semantics of Anonymous Identity and Reification |
| Speaker: |
Michael Kifer
State University of New York, Stony Brook |
| Abstract: |
Reification and anonymous resources are two of the more interesting
features of the Resource Description Framework (RDF) --- an emerging
standard for representing semantic information on the Web. Ironically,
when RDF was standardized by W3C over three years ago, it came without
a semantics. There is now growing understanding that a Semantic Web
language without a semantics is an oxymoron, and a number of efforts
are directed towards giving RDF a precise semantics.
In this paper we propose a simple semantics for reification and anonymous resources in F-logic --- a frame-based logic language, which is a popular formalism for representing and reasoning about semantic information on the Web.
The choice of F-logic (over RDF) as a basis for our semantics is
motivated by the fact that F-logic provides a comprehensive solution
for the problem of integrating frames, rules, and deduction, and it
has been shown to provide an effective inference service for RDF.
|
| Bio: |
Michael Kifer is a Professor with the Department of Computer Science,
State University of New York at Stony Brook (USA). He received his Ph.D.
in Computer Science in 1985 from the Hebrew University of Jerusalem,
Israel, and the M.S. degree in Mathematics in 1976 from Moscow University,
Russia.
Dr. Kifer's interests include database systems, knowledge representation, and Web information systems. He has published two text books and numerous articles in these areas. In 1999 and 2002 he was a recipient of the ACM-SIGMOD "Test of Time" awards for his works on object-oriented database languages.
|
| Title: | Bridging the XML-Relational Divide with LegoDB |
| Speaker: |
Juliana Freire
OGI School of Science and Engineering |
| Abstract: |
XML is becoming the predominant data exchange format in a variety of application domains (supply-chain, scientific data processing, telecommunication infrastructure, etc.). By relying on relational engines for storage purposes, XML developers can benefit from a complete set of data management services (including concurrency control, crash recovery, and scalability) and from the highly optimized relational query processors. However, due to the mismatch between the XML and the relational models and the many different ways to map an XML document into relations, it is very hard to tune a relational engine and ensure that XML queries will be evaluated efficiently. LegoDB is a cost-based XML storage mapping engine that automatically explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML-Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this talk, I will give an overview of the LegoDB storage engine and present experimental results that demonstrate the effectiveness of this approach. I will also discuss recent work on two important facets of the LegoDB system: building statistical summaries of XML documents; and new efficient algorithms for searching the space of XML-to-relational mappings. Joint work with Philip Bohannon, Jayant Haritsa, Maya Ramanath, Prasan Roy and Jerome Simeon |
| Bio: |
Juliana Freire is an Assistant Professor at the Department of Computer
Science at the OGI School of Science and Engineering. Before joining
OGI, Juliana was a Member of the Technical Staff in the Network Data
and Services Department at Bell Laboratories, Lucent Technologies.
She received a BS from Federal University of Ceara (Brazil), and MS
and Ph.D. from the University at Stony Brook, all in Computer Science.
Her research interests span the areas of databases and Web
applications, and include data management issues related to Web data
(in particular XML), information integration, and ubiquitous access to
the Web.
|
| Title: | Effective and Efficient Region-based Image Retrieval |
| Speaker: |
Mario Nascimento
University of Alberta |
| Abstract: |
Content Based Image Retrieval (CBIR) is a challenging task. Current research works attempt to obtain and use the semantics of image to perform better retrieval. Towards this goal, segmentation of an image into regions has been used in recent years, since local properties of regions can help matching objects between images and thereby contribute towards a more effective CBIR. Our work improves on a CBIR technique, called SNL, that utilizes the regional properties of the images. In SNL each image is segmented and features including the color, shape, size and spatial position of the obtained regions are extracted. Regions are then compared using the Integrated Region Matching (IRM) distance measure, which is not a metric, which prevents the use of metric access structures or filtering techniques based on the triangle inequality. We overcome this issue, by using MiCRoM, a true metric distance to compare segmented images. This resulting approach, called SNL*, can be used in conjunction with a filtering technique to reduce substantially the number of images compared. Albeit metric-based, SNL* is computationally expensive. We address this drawback, in the SNL+ approach, where we replace the expensive metric distance in SNL* by the inexpensive original (non-metric) IRM distance. We found that one can still make use of the same filtering technique, at the expense of little loss in retrieval effectiveness. Thus, the main contribution of this work is SNL+, a very effective and highly efficient region-based image retrieval technique. Joint work with Veena Sridhar and Xiaobo Li |
| Bio: |
Mario A. Nascimento obtained his Ph.D. degree in Computer Science at Southern Methodist University's School of Engineering in 1996. Between 1989 and 1999 he was a researcher with the Brazilian Agency for Agricultural Research (Information Technology Center) and, between 1997 and 1999, he was also associated with the Institute of Computing of the State University of Campinas (Brazil). Since then he has been with the Department of Computing Science of the University of Alberta. He has published over forty papers in international conferences, journals and workshops, has served or is serving as Program Committee member of several conferences and as Program Chair or Co-chair for ACM MM 2001 MIR, IDEAS'02 and SPIRE'2003. He is also currently serving as member of the ACM SIGMOD DiSC Editorial Board and ACM SIGMOD Executive Board (as its Information Director). His main research interests lie in the area of Content-based Image Retrieval and Access Structures for Databases, and he is a member of ACM, SIGMOD, SIGIR, and IEEE Computer Society.
|
| Title: | Selectivity Estimation using Probabilistic Models |
| Speaker: |
Lise Getoor
University of Maryland |
| Abstract: |
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this work, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Statistical Relational Models (SRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing an SRM from a database, and show how an SRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time. Joint work with Daphne Koller and Benjamin Taskar. |
| Bio: |
Lise Getoor recently joined the University of Maryland, College Park
as an assistant professor in the computer science department. She
received her PhD from Stanford University in December, 2001. Her
research interests include probabilistic models, link analysis,
machine learning and data mining. She has published papers on a
variety of topics including learning probabilistic models, utility
elicitation, on-line scheduling, constraint-based planning and machine
learning. Before pursuing her PhD at Stanford, she worked at
NASA-Ames Research Center as a research associate. She received her
M.S. in Computer Science from UC Berkeley in 1989 and her B.S. in
Computer Science from UC Santa Barbara in 1986. She is the recipient
of a National Physical Sciences Consortium fellowship and member of
ACM, AAAI and Tau Beta Pi.
|
Last modified: Tue Mar 25 18:53:46 EST 2003