UofT Database Group

2000 Database Seminar Series

August 30, 2000 Hans-Peter Kriegel, University of Munich
September 19, 2000 Joseph Hellerstein, University of California, Berkeley
October 5, 2000 John Mylopoulos, University of Toronto
October 31, 2000 Prabhakar Raghavan, Verity
November 7, 2000 Hector Garcia-Molina, Stanford University
November 21, 2000 Avi Gal, Rutgers University
December 5, 2000 Jan Chomicki, SUNY Buffalo
March 13, 2001 Mitch Cherniack, Brandeis University
May 1, 2001 Peter Buneman, University of Pennsylvania

[Return to DB Group Seminar Page]

Wednesday, August 30, 2000

Title: Knowledge Discovery and Similarity Search in Databases
Speaker: Hans-Peter Kriegel
University of Munich
Abstract: Both the number and the size of spatial and multimedia databases, such as geographic and image databases, are growing rapidly because of the large amount of data obtained from satellite images, X-ray crystallography, computer tomography and other scientific equipment. This growth far exceeds the human capacity to analyze the databases in order to find implicit similarities, regularities, rules or clusters hidden in the data. Automated knowledge discovery and efficient similarity search are therefore becoming more and more important.

The major difference between knowledge discovery in relational databases and in spatial databases is that attributes of the neighbors of some object of interest may have an influence on the object itself. Therefore, spatial data mining algorithms heavily depend on the efficient processing of neighborhood relations since the neighbors of many objects have to be investigated in a single run of a typical algorithm. We define a small set of database primitives and we demonstrate that typical spatial data mining algorithms such as clustering, characterization and trend detection are well supported by the proposed database primitives.

In the second part of the talk we focus on similarity search in multimedia databases, which is highly application- and user-dependent. We therefore derive similarity models that can be adapted to application-specific requirements and individual user preferences. Examples include flexible pixel-based shape similarity, 3D shape histograms and quadratic forms resulting in ellipsoid queries in high-dimensional data spaces. We show that these ellipsoid queries can be processed efficiently. Additionally, these queries support modification of the similarity matrix at query time. This is shown using some snapshots of our similarity search system.

The talk concludes with some considerations of future research topics.
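
As a rough illustration of the ellipsoid queries mentioned in the abstract (a sketch, not the speaker's system): a quadratic form distance compares two feature vectors x and y as sqrt((x - y)^T A (x - y)), where A is a similarity matrix. All objects within distance eps of a query point form an ellipsoid, and changing A at query time changes its shape. The function names, the 3-bin histograms and both matrices below are hypothetical and chosen only for the example.

```python
import math

def quadratic_form_distance(x, y, A):
    """Return sqrt((x - y)^T A (x - y)) for a similarity matrix A.

    A is assumed to be symmetric and positive semi-definite, so the
    value under the square root is never negative.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for i in range(len(d)):
        for j in range(len(d)):
            total += d[i] * A[i][j] * d[j]
    return math.sqrt(total)

def ellipsoid_query(points, query, A, eps):
    """Return all points within quadratic form distance eps of the query."""
    return [p for p in points if quadratic_form_distance(p, query, A) <= eps]

# Hypothetical 3-bin histograms; each object falls entirely into one bin.
histograms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
query = (1.0, 0.0, 0.0)

# Identity matrix: plain Euclidean distance, all bins are unrelated.
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Hypothetical matrix in which neighbouring bins count as partly similar.
similar_bins = [[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.5],
                [0.0, 0.5, 1.0]]

strict = ellipsoid_query(histograms, query, identity, 1.2)
relaxed = ellipsoid_query(histograms, query, similar_bins, 1.2)
# strict  == [(1.0, 0.0, 0.0)]
# relaxed == [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```

With the identity matrix only the exact match falls inside the query region; with the second matrix the object in the neighbouring bin moves to distance 1.0 and is returned as well. This linear scan only shows the semantics of the query and says nothing about how such queries are processed efficiently.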

Bio: Hans-Peter Kriegel is a full professor for database systems in the Institute for Computer Science at the University of Munich. His research interests are in database systems, particularly in query processing, performance issues, similarity search, high-dimensional indexing, and parallel systems. Data exploration using visualization led him to the area of knowledge discovery and data mining. Kriegel received his MS and Ph.D. in 1973 and 1976, respectively, from the University of Karlsruhe, Germany. He has been chairman and program committee member of many international database conferences and has published over 150 refereed conference and journal papers.
  • Location: D.L. Pratt, Room 266
  • Time: 11:00AM
[Return to Toronto DB Seminar Index]

Tuesday, September 19th, 2000

Title: Adaptive Dataflow: Eddies and Rivers
Speaker: Joseph Hellerstein
Abstract: Data movement is a central theme in modern computer systems. One tradition in data movement comes from the networking community, which has been responsible for the low-level delivery of uninterpreted messages across unpredictable wide-area networks. Another tradition comes from the database community, which developed architectures to flow data through high-level operators on "shared-nothing" clusters of computers, and to minimize data movement across the wide area. Traditionally, the networking solutions have been very adaptive to changes in performance of the network, but have paid little attention to the contents of the data or the software using it. By contrast, the database solutions have taken great advantage of understanding both the data and the operators through which it flows, but have been relatively rigid in adapting to volatility in performance.

At Berkeley, we have been working on new architectures for adaptive dataflow, which intelligently route data through high-level operators on a network, and adapt to performance volatility at a very fine grain. In this talk, I present Eddies and Rivers, two adaptive dataflow mechanisms we are using in the Telegraph system at Berkeley. Eddies adapt to volatility by flexibly routing data through operators in a dataflow; Rivers adapt by flexibly balancing data loads across machines in a network. I will discuss how these ideas are being used in the Telegraph system we are developing, for scenarios including sensor networks and a novel web query system. I will also raise a number of research challenges that we face in achieving fully adaptive dataflow systems.

Bio: Joseph Hellerstein is an associate professor of computer science at the University of California, Berkeley. He is an Alfred P. Sloan Research Fellow, a recipient of the NSF CAREER award, and was recently named one of the top 100 young technology innovators by MIT's Technology Review magazine. Prof. Hellerstein's research centers on database and information management systems, with a focus on federated and adaptive information systems, interactive analysis and transformation of massive datasets, and generalized indexing and indexability. Hellerstein is the Chief Scientist and co-founder of Cohera Corporation, a vendor of electronic catalog software based on federated data management technology. He received his Ph.D. from the University of Wisconsin, Madison, a master's degree from UC Berkeley, and a bachelor's degree from Harvard.
  • Location: SF 1105
  • Time: 11am
Roundtable: Meet and talk with the speaker at an informal roundtable discussion focussed on pedagogic issues in database instruction.
  • Location: DL Pratt 266
  • Time: 2pm
[Return to Toronto DB Seminar Index]

October 5, 2000

Title: Technologies for Managing Knowledge: Premises, Principles and a Case Study
Speakers: John Mylopoulos, Attila Barta, Raoul Jarvis, Patricia Rodrigues-Gianolli, Shun Zhou
Abstract: The speakers will review some of the basic premises and principles of Knowledge Management and relevant technologies including data management, data mining and warehousing, document management and text processing and search engines. The presentation will then focus on a particular case study involving the development of tools to support strategic business analysis.

Strategic business analysts keep track of trends relevant to their organization and its strategic objectives. To accomplish this mission, they follow news stories and other reports as they become available, to ensure that organizational objectives remain on track. This seminar presents the basic features of a prototype knowledge management system under development that is intended to support such analysts in their daily work.

In particular, the system offers tools for building and analyzing models of strategic objectives and their inter-relationships. The system also offers facilities to assist analysts as they search for relevant material and to classify and annotate this material in a document repository. The seminar includes a demonstration of the current version of the prototype system under development.

  • Location: Architecture, Landscape and Design
    Second Floor Theatre, Room 103
    230 College Street (at Huron Street, one light east of Spadina)
  • Time: 4:30 to 6:00 p.m.
[Return to Toronto DB Seminar Index]

October 31, 2000

Title: Networks and Sub-Networks in the World-Wide Web
Speaker: Prabhakar Raghavan
Chief Scientist and Vice President of Emerging Technologies
Verity, Inc.
Abstract: We discuss several structural properties of graphs arising from the world-wide web including the graph of hyperlinks, and the graph induced by connections between distributed search servents. We review a number of algorithmic investigations of the structure of these networks, and conclude by proposing stochastic models for the evolution of these networks.
Bio: Dr. Prabhakar Raghavan is the Chief Scientist and Vice President of Emerging Technologies of Verity, Inc., a leading provider of enterprise and Internet knowledge retrieval solutions.

Most recently the head of the Computer Science Principles department at IBM's Almaden Research Center (San Jose, CA), Dr. Raghavan has 14 years of extensive experience in creating innovative technologies in Web, security and cryptography, and storage architecture. Prior to joining Verity, Dr. Raghavan headed the Computer Science Principles (CSP) department at IBM's Almaden Research Center. Under Raghavan's leadership, CSP achieved global recognition for its application of fundamental ideas in creating new technologies for the Web, security and cryptography, and storage architecture. Raghavan also led IBM's Clever project on Web search and mining, highly touted in the press and scientific literature for its influential ideas in Web information and community discovery. He also worked at IBM's T.J. Watson Research Center.

Dr. Raghavan is Consulting Professor of Computer Science at Stanford University, and has also taught at Yale. He has authored over 80 papers in various fields including algorithms, optimization, Web search and databases, a recent article on Web Search in Scientific American, a popular textbook on Randomized Algorithms, and several patents. He serves on the editorial boards of several prestigious scientific journals including the Journal of the ACM and the SIAM Journal on Computing, on the external advisory boards of leading universities, on National Science Foundation panels, and on the executive board of ACM's Special Interest Group on Algorithms and Computation. He is a frequent speaker at conferences and technical events, and will give the plenary talk at the prestigious ACM SIGMOD Symposium on Principles of Database Systems, May 2000 in Dallas, where he is also co-winner of the Best Paper prize for his work on privacy in databases. Dr. Raghavan holds a PhD from the University of California at Berkeley and an undergraduate degree in electrical engineering from IIT in Madras.

  • Location: SF 1105
  • Time: 11:00am
Roundtable: Meet and talk with the speaker at an informal roundtable discussion.
  • Location: Pratt 266
  • Time: 2:00pm
[Return to Toronto DB Seminar Index]

Tuesday, November 7, 2000

Title: How to Crawl the Web
Speaker: Hector Garcia-Molina
Stanford University
Abstract: A crawler collects large numbers of web pages, to be used for building an index or for data mining. Crawlers consume significant network and computing resources, both at the visited web servers and at the site(s) collecting the pages, and thus it is critical to make them efficient and well behaved. In this talk I will discuss how to build a "good" crawler, addressing questions such as:
  • How can a crawler gather "important" pages only?
  • How can a crawler efficiently keep its collection "fresh"?
  • How can a crawler be parallelized?

I will also summarize results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time.

Joint work with Junghoo Cho

Bio: Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. From August 1994 to December 1997 he was the Director of the Computer Systems Laboratory at Stanford. From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. His research interests include distributed computing systems and database systems. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford University he received an MS in electrical engineering in 1975 and a PhD in computer science in 1979. Garcia-Molina is a Fellow of the ACM, received the 1999 ACM SIGMOD Innovations Award, and is a member of the President's Information Technology Advisory Committee (PITAC).
  • Location: SF 1105
  • Time: 11am
[Return to Toronto DB Seminar Index]

Tuesday, November 21, 2000

Title: An Authorization Model for Temporal Data
Speaker: Avi Gal
Rutgers University
Abstract: The use of temporal data has become widespread in recent years, within applications such as data warehouses and spatiotemporal databases. In this paper, we extend the basic authorization model with the capability to express authorizations based on the temporal attributes associated with data, such as transaction time and valid time. In particular, a subject can specify authorizations based on data validity or data update time, using either absolute or relative time references. Such a specification is essential in providing access control for predictive data, or in constraining access to data based on currency considerations. We provide an expressive language for specifying such access control to temporal data, using a variation of temporal logic for specifying complex temporal constraints. We also introduce an easy-to-use access control mechanism for stream data.

Joint work with Vijay Atluri, Rutgers University.

  • Location: GB 221
  • Time: 10am
[Return to Toronto DB Seminar Index]

December 5, 2000

Title: Consistent Query Answers in Inconsistent Databases
Speaker: Jan Chomicki, SUNY Buffalo
Abstract: As the amount of information available in online data sources explodes, there is a growing concern about the consistency and quality of answers to user queries. This talk addresses the issue of using logical integrity constraints to gauge the consistency and quality of query answers. Although it is impractical to enforce global integrity constraints across different data sources and correct integrity violations by updating individual sources, integrity constraints are still useful because they express important semantic properties of data.

This talk introduces formal notions of database repair and consistent query answer: A consistent answer is true in every minimal repair of the database. The information about answer consistency serves as an important indication of its quality and reliability.

I will show a general method for transforming first-order queries in such a way that the transformed query computes consistent answers to the original query. I will characterize the scope of this method: the classes of queries and integrity constraints it handles. For functional dependencies, I will provide a complete characterization of the computational complexity of computing consistent query answers to scalar aggregation queries (those involving the SQL operators COUNT, MIN, MAX, SUM, or AVG). I will also show that if the set of functional dependencies is in Boyce-Codd Normal Form, the boundary between tractable and intractable can be pushed further using the tools of perfect graph theory.

(Joint work with Marcelo Arenas and Leopoldo Bertossi, with a contribution from Vijay Raghavan and Jeremy Spinrad.)
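
As a rough illustration of the definitions in the abstract (a sketch, not the query-transformation method presented in the talk): the code below enumerates the repairs of a small relation that violates a key constraint and keeps only the answers returned in every repair. It assumes that a repair is a maximal consistent subset of the relation, which for a key constraint means keeping exactly one row per key value. The function names and the employee data are hypothetical.

```python
from itertools import product

def repairs(rows, key):
    """Enumerate the repairs of `rows` under a key constraint.

    Assumption: a repair is a maximal consistent subset, so it keeps
    exactly one row for each distinct value of key(row).
    """
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return [set(choice) for choice in product(*groups.values())]

def consistent_answers(rows, key, query):
    """Return the answers that `query` produces in every repair."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets)

# Hypothetical relation (name, salary); name should determine salary,
# but the two rows for Ann conflict.
employees = [("Ann", 50), ("Ann", 60), ("Bob", 40)]

def by_name(row):
    return row[0]

def all_rows(relation):
    return set(relation)

def names_earning_over_45(relation):
    return {name for name, salary in relation if salary > 45}

certain_rows = consistent_answers(employees, by_name, all_rows)
certain_names = consistent_answers(employees, by_name, names_earning_over_45)
# certain_rows  == {("Bob", 40)}
# certain_names == {"Ann"}
```

The second query shows why consistent answers carry more information than simply discarding the conflicting rows: Ann's exact salary is uncertain, but she earns more than 45 in every repair. Enumerating repairs takes exponential time in general and is used here only to make the definition concrete.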

  • Location: SF 1105
  • Time: 11:00am
[Return to Toronto DB Seminar Index]

March 13, 2001 (Tuesday)

Title: Profile-Driven Data Management
Speaker: Mitch Cherniack
Abstract: Large amounts of data populate the Internet, yet this data is largely unmanaged. Access to this data requires a wide array of network resources including server capacity, bandwidth, and cache space. Applications compete for these resources with little overarching support for their intelligent allocation.

The database community has seen this problem before. Data management techniques (e.g., indexing, data organization, buffer management) have long been used in DBMSs to offset the demands of limited resources made by competing applications' requests for data. Such techniques are application-driven; a database administrator (DBA) consults with users to determine their data needs, and tweaks the knobs of data management tools accordingly. But the Internet environment, consisting of autonomous data sources and global in scale, cannot be tamed by a DBA. We take the view that in this environment, the DBA role must be automated, and the specification of application-level data requirements must be made formal by way of processible user profiles.

In this talk, I will discuss the Profile-Driven Data Management project: a multi-institution endeavor funded by the National Science Foundation in October, 2001. I will demonstrate the goals and scope of the project, present some early results, and pose some of the many challenges that lie ahead.

Joint Work with: Stan Zdonik (Brown University) and Michael J. Franklin (UC Berkeley)

  • Location: SF 1105
  • Time: noon
[Return to Toronto DB Seminar Index]

Last modified: Mon Dec 31 10:02:56 EST 2001