
1999-2000 Database Seminar Series

October 8, 1999

Title: Seamless Integration of Biological Applications into a Database Framework
Speaker: Thodoros Topaloglou
Data Logic, a division of Gene Logic Inc.
Abstract: There are more than two hundred biological data repositories available for public access, and a vast number of applications to process and interpret biological data. A major challenge for bioinformaticians is to extract and process data from multiple data sources using a variety of query interfaces and analytical tools. In this talk, I will describe tools that respond to this challenge by providing support for cross-database queries and for integrating analytical tools in a query processing environment. In particular, I will describe two alternative methods for integrating biological data processing within traditional database queries: (a) "light-weight" application integration based on Application Specific Data Types (ASDTs) and (b) "heavy-duty" integration of analytical tools based on mediators and wrappers. These methods are supported by the Object-Protocol Model (OPM) suite of tools for managing biological databases. This is joint work with A. Kosky and V. Markowitz.
  • Location: DL Pratt 266
  • Time: 1:00pm
[Return to Toronto DB Seminar Index]

Monday, October 18, 1999

Title: The Design and Implementation of Microsoft Repository
Speaker: Philip A. Bernstein, Microsoft Research
Philip A. Bernstein is a Senior Researcher in the database group of Microsoft Research and a contributor to the Microsoft Repository product group, where he was Architect from 1994 to 1998. He has published over 90 articles and 3 books on database systems and related topics, and has contributed to many database system products, prototypes, and standards. His latest book is "Principles of Transaction Processing" (Morgan Kaufmann Publishers, 1996). He received his Ph.D. from the University of Toronto in 1975.

Phil will also be giving a Colloquium on Tuesday, October 19th.

Abstract: Microsoft Repository is an object-oriented meta-data management facility that ships in Microsoft SQL Server and Visual Studio. It includes a repository engine that implements a set of object-oriented interfaces on top of a SQL database system. A developer can use these interfaces to define information models (i.e., schemas) and manipulate instances of the models. It also includes the Open Information Model, which is a set of information models that cover object modeling, database modeling, and component reuse. This talk presents an overview of the interfaces, implementation and applications of Microsoft Repository. We also focus on two interesting aspects of the latest release: version and configuration management and prefetching based on the context of recent navigational operations.
  • Location: Roseburgh (RS) 211
  • Time: 10:00am
Roundtable: Meet and talk with the speaker at an informal roundtable discussion.
  • Location: DL Pratt 378
  • Time: Tuesday, Oct 19th 2pm

October 21, 1999

Title: WHOWEDA - Warehouse of Web Data
Speaker: Sanjay Kumar Madria, Purdue University
Abstract: The growth of the Internet has dramatically changed the way in which information is managed and accessed. Information on the WWW is important not only to individual users, but also to business organizations, especially where decision making is concerned. This information is placed independently by different organizations, so documents containing related information may appear at different web sites. It is well established that search engines have several limitations in retrieving useful information. To overcome the limitations of search engines and provide the user with a powerful and friendly query mechanism for accessing information on the web, the critical problem is to find effective ways to build web data models and query languages, and to provide effective mechanisms for manipulating the information of interest so as to garner additional useful information. The talk deals with the web data model, web algebra, query language and knowledge discovery in the context of the WHOWEDA (Warehouse of Web Data) project. The key objective is to design and implement a web warehouse that materializes and manages useful information from the web. In particular, we discuss building a web warehouse using a database approach to managing and manipulating the strategic information culled from the web. Depending on time, the talk may also touch on web change management, web data mining and a discussion of important open issues.
  • Location: GB 248
  • Time: 10:00am

Monday, October 25, 1999

Title: Document resemblance and related issues
Speaker: Andrei Broder
Andrei Broder is chief technology officer of the AltaVista Search division of the AltaVista Company. Previously he was a senior member of the research staff at Compaq's Systems Research Center in Palo Alto, California. He graduated from the Technion, Israel Institute of Technology, and did his Ph.D. in Computer Science at Stanford University under Don Knuth. He has written or co-authored more than 60 scientific papers and holds numerous patents. His main research interests are the design, analysis, and implementation of probabilistic algorithms and supporting data structures, in particular in the context of web-scale applications.

Andrei will also be giving a Colloquium on Tuesday, October 26th.

Abstract: People often claim that two web pages are "the same" or "roughly the same", even though classic distances on strings (Hamming, Levenshtein, etc.) might indicate that the two pages are far apart.

To formalize these intuitive ideas we defined the mathematical concept of document resemblance. The resemblance can be estimated using a fixed-size "sketch" for each document. For a large collection of documents (say 200 million) the size of this sketch is of the order of a few hundred bytes per document. However, for efficient large-scale web indexing it is not necessary to determine the actual resemblance value: it suffices to determine whether newly encountered documents are duplicates or near-duplicates of documents already indexed; in other words, it suffices to determine whether the resemblance is above a certain threshold. We show how this determination can be made using a "sample" of less than 50 bytes per document.

The basic approach for computing resemblance has two aspects: first, resemblance is expressed as a set intersection problem, and second, the relative size of intersections is evaluated by a process of random sampling that can be done independently for each document. The process of estimating the relative size of intersection of sets and the threshold test discussed above can be applied to arbitrary sets, and thus might be of independent interest.
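The sampling scheme can be sketched in a few lines of Python. Everything below (word shingles, MD5 as the random hash, and the sketch size k) is an illustrative choice for exposition, not AltaVista's production design:

```python
import hashlib

def shingles(text, w=4):
    """The set of contiguous w-word sequences (shingles) in a document."""
    words = text.split()
    return {" ".join(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def sketch(doc, k=20, w=4):
    """A fixed-size, min-wise sketch: the k smallest hash values of the shingles."""
    hashes = sorted(int(hashlib.md5(s.encode()).hexdigest(), 16)
                    for s in shingles(doc, w))
    return set(hashes[:k])

def estimated_resemblance(a, b, k=20, w=4):
    """Estimate the relative size of the shingle-set intersection from sketches alone."""
    sa, sb = sketch(a, k, w), sketch(b, k, w)
    smallest = set(sorted(sa | sb)[:k])   # the k smallest hashes of the union
    return len(smallest & sa & sb) / len(smallest)
```

Because the k smallest hashes of the union behave like a uniform random sample of the union, the fraction of them that appear in both sketches estimates the resemblance, which is why each document's sample can be computed independently.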

The ideas for filtering near-duplicate documents discussed here have been successfully implemented and are in current use in the context of the AltaVista search engine.

This talk is tilted towards "algorithm engineering" rather than "algorithm analysis", and very little mathematical background is required.

  • Location: Room 130, Wallberg Building, 184-200 College St.
    (College and St. George, half a block east of the Fields Institute)
  • Time: 3:40pm
Roundtable: Meet and talk with the speaker at an informal roundtable discussion.
  • Location: DL Pratt 378
  • Time: Tuesday, October 26th, 2:00pm

October 28, 1999

Title: Personalization of push technology
Speaker: Opher Etzion
Opher Etzion is an IBM Research staff member. He joined the IBM Research Laboratory in Haifa in 1997, where he leads the activities on reactive systems. In parallel, he is an adjunct senior lecturer at the Technion, where he also supervises Ph.D. and M.Sc. theses. Prior to joining IBM, he was a full-time faculty member at the Technion, where he was the founding head of the information systems engineering area and graduate program. Prior to his graduate studies, he held professional and managerial positions in industry and in the Israel Air Force. His research interests include active technology (active databases and beyond), temporal databases, middleware systems and rule-based systems. He is a member of the editorial board of the IIE Transactions Journal, was a guest editor of the Journal of Intelligent Information Systems in 1994, served as a coordinating organizer of the Dagstuhl seminar on temporal databases in 1997, and is co-editor of the book "Temporal Databases - Research and Practice", published by Springer-Verlag. He has also served on many conference program committees as well as national committees, and has been program and general chair in the NGITS workshop series. He is the program chair of CoopIS'2000.
Abstract: Personalization of knowledge distribution in response to new events reported by various sources is one of the major challenges in contemporary Web-based information systems. According to the personalization paradigm, each user should get the right information at the right time in the right form, where the notion of "right" is user dependent. Current technologies support the publish/subscribe paradigm, by which a subscriber may subscribe to the events it is interested in, among those broadcast by various channels. The major weakness of this approach is that in many cases the user is not interested in the basic events, but in a combination of events that may be quite complex. Example: "If, within a period of two hours from the start of the trading day, either the IBM stock or the Microsoft stock is up at least 3% more than the change in the Dow Jones index, then send an immediate alert." The talk describes the "Amit" (Active Middleware Technology) project of the IBM Research Laboratory in Haifa, and concentrates on the concept of a reactive situation, the language for defining such situations, the architecture, and the utilization of this concept for personalization of push technology for system management, electronic commerce, awareness systems and other applications.
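Amit's situation-definition language is not shown in the abstract; purely as a toy illustration of a composite situation, the stock example can be phrased as a Python predicate over timestamped (time, symbol, percent-change) events. The function name, event format, and parameters are all hypothetical:

```python
from datetime import datetime, timedelta

def check_alert(events, trading_start, window=timedelta(hours=2), margin=3.0):
    """Composite-event check: within `window` of the trading start, is IBM
    or MSFT up at least `margin` points more than the Dow Jones change?"""
    dow, stocks = None, {}
    for ts, symbol, pct_change in events:
        if ts - trading_start > window:
            continue                       # event falls outside the temporal window
        if symbol == "DJI":
            dow = pct_change
        elif symbol in ("IBM", "MSFT"):
            stocks[symbol] = pct_change
    if dow is None:
        return None                        # situation undefined without the index
    return any(chg - dow >= margin for chg in stocks.values())
```

The point of a reactive middleware is precisely that subscribers state such compositions declaratively instead of hand-coding predicates like this one against raw event streams.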
  • Location: GB 248
  • Time: 10:00am

November 9, 1999

Title: Binary String Relations: A Foundation for Spatiotemporal Knowledge Representation
Speaker: Thanasis Hadzilacos, Univ. of Patras
Abstract: The paper is concerned with the qualitative representation of spatiotemporal relations. We initially propose a multiresolution framework for the representation of relations among 1D intervals, based on a binary string encoding. We subsequently extend this framework to multiple dimensions, thus allowing the description of spatiotemporal relations at various contexts. The feasible relations at a particular resolution level are inherently permeated by a poset structure, called conceptual neighbourhood, upon which we propose efficient relation inferencing mechanisms. Finally, we discuss the application of our model to spatiotemporal reasoning, which refers to the classic problems of satisfiability and deductive closure of a set of spatiotemporal assertions.
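The abstract leaves the encoding itself unstated; as one illustrative possibility (ours, not necessarily the paper's), a qualitative 1D relation between intervals a and b can be written as a 5-bit string recording which of the five zones induced by b's endpoints a occupies:

```python
def zone_code(a, b):
    """5-character string over the zones induced by b = (b1, b2):
    strictly left of b, b's start point, b's open interior, b's end
    point, strictly right of b. Distinct Allen relations between the
    closed intervals a and b yield distinct strings."""
    a1, a2 = a
    b1, b2 = b
    zones = [a1 < b1,               # a starts strictly left of b
             a1 <= b1 <= a2,        # a covers b's start point
             a1 < b2 and a2 > b1,   # a intersects b's open interior
             a1 <= b2 <= a2,        # a covers b's end point
             a2 > b2]               # a ends strictly right of b
    return "".join("1" if z else "0" for z in zones)
```

Dropping bits of such a string is one natural way to coarsen the relation, which hints at how a binary encoding supports multiple resolution levels.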

Keywords: Spatiotemporal relations, spatiotemporal reasoning, conceptual neighbourhoods.

Joint work with Vasilis Delis.

  • Location: DL Pratt 378
  • Time: 4:00pm

November 12, 1999

Title: Techniques for Online Exploration of Large Object-Relational Datasets
Speaker: Peter Haas, IBM Almaden Research Labs
Peter Haas received an S.B. in Engineering and Applied Mathematics from Harvard University in 1978. In 1979 he received an M.S. in Environmental Engineering from Stanford University. From 1979 to 1981 he was a Staff Scientist at Radian Corporation, where he performed air quality modeling studies for the EPA, Texas Air Control Board, and corporate clients. Three years later, he received an M.S. in Statistics from Stanford University. He received a Ph.D. in Operations Research from Stanford University in 1986 and, after a brief stint as an Assistant Professor in the Department of Decision and Information Sciences at Santa Clara University, he became a Research Staff Member at IBM Almaden Research Center in 1987, where he has been ever since. While working for IBM, he has conducted basic research on modeling and simulation of discrete-event stochastic systems. In the course of this work, he has developed a framework for comparing the modeling power of different formalisms for discrete-event systems, provided techniques for checking applicability of regenerative simulation methods and standardized-time-series methods to specific computer, manufacturing, and telecommunication models, and developed new procedures for measurement and estimation of random delays. Other research projects have included development of sampling-based estimates of join selectivity in relational database management systems, stochastic models of load-balancing in parallel database systems, and techniques for powering down disk drives in PCs. In 1992-93 he spent a sabbatical year at the University of Wisconsin, where he was an Honorary Fellow at the Center for the Mathematical Sciences. In 1999, he was a visiting lecturer in the Department of Engineering Economic Systems and Operations Research at Stanford University, where he taught a graduate-level course in Simulation Methodology.
Abstract: We review techniques for exploring large object-relational datasets in an interactive online manner. The idea is to provide continuously updated running estimates of the final query results to the user, along with an indication of the precision of the estimates. The user can then halt the query as soon as the answer is sufficiently precise---the time required to obtain an acceptable approximate answer can be orders of magnitude shorter than the time needed to completely process the query. We describe methods for online processing of aggregation queries (SUM, AVG, VARIANCE, etc.), online visualization, and online display of a set of result records. By way of comparison, we also give a brief review of methods that use precomputed results to rapidly obtain approximate query answers.
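A minimal Python sketch of the running-estimate idea for AVG (the batch size, confidence level, and stopping threshold are illustrative parameters, not the talk's): scan rows in random order, maintain running moments, and stop as soon as a normal-approximation confidence interval is tight enough.

```python
import math
import random

def online_avg(values, batch=100, target_halfwidth=0.5, z=1.96):
    """Online AVG: scan rows in random order, periodically reporting a
    running mean with a normal-approximation confidence interval, and
    halt early once the interval half-width is below the target."""
    order = list(values)
    random.shuffle(order)              # random access order keeps estimates unbiased
    n, total, total_sq = 0, 0.0, 0.0
    for x in order:
        n += 1
        total += x
        total_sq += x * x
        if n % batch == 0 and n > 1:
            mean = total / n
            var = (total_sq - n * mean * mean) / (n - 1)   # sample variance
            halfwidth = z * math.sqrt(max(var, 0.0) / n)
            if halfwidth <= target_halfwidth:
                return mean, halfwidth, n   # early answer: mean +/- halfwidth
    return total / n, 0.0, n               # fell through: exact answer
```

On constant data the interval collapses after the first batch; on real data, the caller would display each intermediate mean and half-width as the scan proceeds.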
  • Location: GB 248
  • Time: 10:00am

Monday, November 15, 1999

Title: Que Sera Sera: The Coincidental Confluence of Economics, Business, and Collaborative Computing
Speaker: Michael L. Brodie, GTE Labs
Dr. Michael L. Brodie is Sr. Technologist, GTE Technology Organization, Waltham, MA, and Chief Scientist (SAP Program) at GTE Corporation. He works on large-scale strategic Information Technology (IT) challenges for GTE Corporation's senior executives. His industrial and research focus is on large-scale information systems - their total life cycle, business and technical contexts, core technologies, and "integration" within a large-scale, operational telecommunications environment. Dr. Brodie has authored over 120 books, chapters, journal articles and conference papers. He has presented keynote talks, invited lectures and short courses on many topics in over twenty-five countries.
Abstract: "The World Wide Web (WWW) changes everything" but how? Amazon.com and eBay as exemplars of e-business do not begin to suggest what is possible. Technologists often think of technology change in technological terms. Technology serves to realize more significant changes such as the way business is conducted. More radical and more fundamental changes are those related to new economic models that underlie, predict, and enable new business models and which define technology requirements. The potential offered by the next generation of technology will take at least a decade to understand and realize, since it involves fundamental change not only in computing models and practice but also in business and economics. The current industrial revolution will also lead to significant social and political change.

This is a time of radical change in what appears to be the parallel worlds of technology, business, and economics. They are not parallel. The intimate relationship of these domains has previously resulted in collateral homeostasis due in part to their interdependence. Current changes in these domains are leading to collateral change. This presentation focuses on the confluence of these changes. As technologists, we see technology and related business change daily. Economic change is less obvious but more radical. Current technology is designed to support 400-year-old economic models, which involve managing within organizational boundaries. It is inadequate to support new economic models, which involve going beyond those boundaries.

This presentation explores the next generation of computing based on the confluence of radical and coincidental changes in economics, business, and technology. Whereas technology is a key enabler of change, it is the servant, not the master. Without a depth of understanding of this enabling role and the context in which technology serves, technology can be misguided and its developers can lose perspective. This presentation outlines a proposal made to the US President's Office of Science and Technology for technology research for the next decade, which calls for new computational models and infrastructure to support the next generation of computing, collaborative computing. A major focus is to reconsider the role of data in computing. This is only one view. Que Sera, Sera.

This presentation is in support of the Networking and Information Technology Research and Development Act, an Act being introduced in the House of Representatives to fund basic research for the fiscal years 2000-2004.

  • Location: MB (Mining Building, 170 College) 128
  • Time: 10:00am
Roundtable: Meet and talk with the speaker at an informal roundtable discussion.
  • Location: DL Pratt 378
  • Time: 11am

February 10th, 2000

Title: RDBMS: From Fantasy to Infrastructure
Speaker: Bruce Lindsay
Dr. Bruce Lindsay is an expert on most aspects of Relational Database engines. He has contributed to many areas of database technology, including concurrency control and recovery technology, distributed and parallel architectures, and Object Relational extensions to SQL. Dr. Lindsay is an IBM Fellow and has participated in the development of Relational Database systems such as SystemR (1st full function RDBMS), R* (distributed RDBMS), Starburst (extensible RDBMS), and DB2 UDB.
Details: Sponsored by the IBM Technical Innovation Speaker Series
  • Location: SF1101
  • Time: 12:30pm

March 23rd, 2000

Title: What Next? A Few Remaining Problems in Information Technology
Speaker: Jim Gray, Microsoft Research
Jim Gray is a specialist in database and transaction processing computer systems. At Microsoft his research focuses on scaleable computing: building super-servers and workgroup systems from commodity software and hardware. Prior to joining Microsoft, he worked at Digital, Tandem, IBM and AT&T on database and transaction processing systems including Rdb, ACMS, NonStopSQL, Pathway, System R, SQL/DS, and DB2. He is editor of The Benchmark Handbook for Database and Transaction Processing Systems, and coauthor of Transaction Processing: Concepts and Techniques. He did his PhD dissertation at Berkeley, and is a Member of the National Academy of Engineering, a Fellow of the ACM, a Trustee of the VLDB Foundation, Editor of the Morgan Kaufmann series on Data Management, a member of the National Research Council's Computer Science and Telecommunications Board, and a member of the President's Information Technology Advisory Committee. He received the 1998 ACM A.M. Turing Award.
Abstract: Babbage's vision of computing has largely been realized. We are on the verge of realizing Bush's Memex. But, we are some distance from passing the Turing Test. These three visions and their associated problems have provided long-range research goals for many of us. For example, the Scaleability problem has motivated me for several decades. This talk defines a set of fundamental research problems that broaden the Babbage, Bush, and Turing visions. They extend Babbage's computational goal to include highly-secure, highly-available, self-programming, self-managing, and self-replicating systems. They extend Bush's Memex vision to include a system that automatically organizes, indexes, digests, evaluates, and summarizes information (as well as a human might). Another group of problems extends Turing's vision to include prosthetic vision, speech, hearing, and other senses. Each problem is simply stated and each is orthogonal to the others, though they share some common core technologies.
  • Location: MC 102 (Mechanical Engineering Building, 5 King's College Road)
    Campus Map
  • Time: 11:00am
Seminar: Jim Gray will be giving an additional seminar talk for the Database Research Group.
  • Abstract: Four topics selected by the audience on the spot.
  • Location: DL Pratt 266
  • Time: 2pm

March 30, 2000

Title: Architecture-Conscious Database Systems
Speaker: Natassa Ailamaki, University of Wisconsin
Abstract: Modern high-performance processors employ sophisticated techniques to overlap and simultaneously execute multiple computation and memory operations. Intuitively, these techniques should help database applications, which are becoming increasingly compute and memory bound. Unfortunately, recent research indicates that, unlike scientific workloads, database systems' performance has not improved commensurately with increases in processor speeds. As the gap between memory and processor speed widens, research on database systems has focused on minimizing memory latencies for isolated algorithms. However, in order to design high-performance database systems it is important to carefully evaluate and understand the interaction between the database software and the underlying hardware.

The first part of this talk introduces a framework for analyzing query execution time on a database system running on a server with a modern processor and memory architecture. Experiments with a variety of benchmarks show that database developers should (a) optimize data placement for the second-level data cache, (b) optimize instruction placement to reduce first-level instruction cache stalls, but (c) not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues (e.g., branch prediction).

The second part of the talk focuses on optimizing data placement for access to the second-level cache. Most commercial DBMSs store records contiguously on disk pages, using the slotted-page approach (NSM). During a single-attribute scan, NSM exhibits poor spatial locality and has a negative impact on cache performance. The decomposition storage model (DSM) has better spatial locality, but incurs a high record reconstruction cost. We introduce Partition Attributes Across (PAX), a new layout for data records that is applied orthogonally to NSM pages and offers optimized cache utilization with no extra space or time penalty.
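The difference between the two layouts can be mimicked in a few lines of Python. This is a toy rendering only: real pages are byte arrays with slot directories and minipage offsets, which this sketch ignores.

```python
# Records with three attributes; a "page" holds some number of records.
records = [(i, f"name{i}", i * 1.5) for i in range(8)]

def nsm_page(recs):
    """Slotted-page (NSM) layout: whole records stored contiguously, so a
    scan of one attribute strides across every record's fields."""
    return [field for rec in recs for field in rec]

def pax_page(recs):
    """PAX layout: the same page partitioned into one 'minipage' per
    attribute, so a single-attribute scan reads one contiguous run."""
    return [[rec[i] for rec in recs] for i in range(len(recs[0]))]

# Scanning attribute 2 under PAX touches one contiguous minipage...
salaries_pax = pax_page(records)[2]
# ...while under NSM the same values are strided through the page.
salaries_nsm = nsm_page(records)[2::3]
assert salaries_pax == salaries_nsm        # same values, different locality
```

The cache benefit comes from the contiguous minipage: consecutive values of the scanned attribute share cache lines instead of dragging the other attributes' bytes through the cache.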

  • Location: GB 248
  • Time: 11:00am

Friday, April 14, 2000

Title: Safe Deals Between Strangers
Speaker: Henry Gladney
IBM Almaden Research Center
Abstract: E-business, information serving, and ubiquitous computing will create heavy request traffic from strangers or even incognitos. Such requests must be managed automatically. Two ways of doing this are well known: giving every incognito consumer the same treatment, and rendering service in return for money. However, different behavior will often be wanted, e.g., for a university library with different access policies for undergraduates, graduate students, faculty, alumni, citizens of the same state, and everyone else. For a data or process server contacted by client machines on behalf of users not previously known, we show how to provide reliable automatic access administration conforming to service agreements. Implementations scale well from very small collections of consumers and producers to immense client/server networks. Servers can deliver information, effect state changes, and control external equipment.

Consumer privacy is easily addressed by the same protocol. We support consumer privacy, but allow servers to deny their resources to incognitos. A protocol variant even protects against statistical attacks by consortia of service organizations. One e-commerce application would put the consumer's tokens on a smart card whose readers are in vending kiosks. In e-business we can simplify supply chain administration. Our method can also be used in sensitive networks without introducing new security loopholes.

  • Location: DL Pratt 266
  • Time: 11:00am