


| August 30, 2000 | Hans-Peter Kriegel, University of Munich |
| September 19, 2000 | Joseph Hellerstein, University of California, Berkeley |
| October 5, 2000 | John Mylopoulos, University of Toronto |
| October 31, 2000 | Prabhakar Raghavan, Verity |
| November 7, 2000 | Hector Garcia-Molina, Stanford University |
| November 21, 2000 | Avi Gal, Rutgers University |
| December 5, 2000 | Jan Chomicki, SUNY Buffalo |
| March 13, 2001 | Mitch Cherniack, Brandeis University |
| May 1, 2001 | Peter Buneman, University of Pennsylvania |
| Title: | Knowledge Discovery and Similarity Search in Databases |
| Speaker: | Hans-Peter Kriegel, University of Munich |
| Abstract: |
Both the number and the size of spatial and multimedia
databases, such as geographic and image databases
respectively, are growing rapidly because of the large
amounts of data obtained from satellite images, X-ray
crystallography, computer tomography or other scientific
equipment. This growth far exceeds human capacity to
analyze the databases in order to find implicit
similarities, regularities, rules or clusters hidden in the
data. Therefore, automated knowledge discovery and efficient
similarity search are becoming increasingly important.
The major difference between knowledge discovery in relational databases and in spatial databases is that attributes of the neighbors of an object of interest may influence the object itself. Spatial data mining algorithms therefore depend heavily on the efficient processing of neighborhood relations, since a single run of a typical algorithm must examine the neighbors of many objects. We define a small set of database primitives and demonstrate that typical spatial data mining algorithms, such as clustering, characterization and trend detection, are well supported by these primitives. In the second part of the talk, we focus on similarity search in multimedia databases, which is highly application- and user-dependent. We therefore derive similarity models that can be adapted to application-specific requirements and individual user preferences. Examples include flexible pixel-based shape similarity, 3D shape histograms, and quadratic forms, which result in ellipsoid queries in high-dimensional data spaces. We show that these ellipsoid queries can be processed efficiently and that they support modification of the similarity matrix at query time, illustrating this with snapshots of our similarity search system. The talk concludes with a discussion of future research topics. |
| Bio: |
Hans-Peter Kriegel is a full professor of database systems in the
Institute for Computer Science at the University of Munich. His
research interests are in database systems, particularly query
processing, performance issues, similarity search, high-dimensional
indexing, and parallel systems. Data exploration using
visualization led him to the area of knowledge discovery and data
mining. Kriegel received his MS and Ph.D. in 1973 and 1976,
respectively, from the University of Karlsruhe, Germany. He has
served as chairman and program committee member of many
international database conferences. He has published over 150 refereed
conference and journal papers.
|
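The ellipsoid queries mentioned in Kriegel's abstract arise from quadratic-form distance functions. As a minimal sketch, assuming the common definition d_A(x, q) = sqrt((x - q)^T A (x - q)) with a symmetric positive-definite similarity matrix A, the Python code below answers an ellipsoid query (all points with d_A(x, q) <= eps) by a simple linear scan. The function names are hypothetical, and the scan merely stands in for the efficient index-based processing the talk describes. Passing A as a query argument mirrors the point that the similarity matrix can be modified at query time.

```python
import math


def quadratic_form_distance(x, q, A):
    """Return sqrt((x - q)^T A (x - q)); A should be symmetric positive definite."""
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    s = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(s)


def ellipsoid_query(points, q, A, eps):
    """Linear scan (hypothetical helper): return every point with d_A(x, q) <= eps."""
    return [p for p in points if quadratic_form_distance(p, q, A) <= eps]


points = [(0, 0), (1, 0), (0, 1), (2, 0)]

# Weighting the second coordinate by 4 makes differences along it count twice as much.
print(ellipsoid_query(points, (0, 0), [[1, 0], [0, 4]], 1.5))  # [(0, 0), (1, 0)]

# The same query with the identity matrix, i.e. plain Euclidean distance.
print(ellipsoid_query(points, (0, 0), [[1, 0], [0, 1]], 1.5))  # [(0, 0), (1, 0), (0, 1)]
```

Changing only the matrix changes the shape of the query region: the point (0, 1) falls outside the stretched ellipse but inside the Euclidean circle.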
| Title: | Adaptive Dataflow: Eddies and Rivers |
| Speaker: | Joseph Hellerstein, University of California, Berkeley |
| Abstract: |
Data movement is a central theme in modern computer systems. One
tradition in data movement comes from the networking community, which has
been responsible for the low-level delivery of uninterpreted messages
across unpredictable wide-area networks. Another tradition comes from the
database community, which developed architectures to flow data through
high-level operators on "shared-nothing" clusters of computers, and to
minimize data movement across the wide area. Traditionally, the
networking solutions have been very adaptive to changes in performance of
the network, but have paid little attention to the contents of the data or
the software using it. By contrast, the database solutions have taken
great advantage of understanding both the data and the operators through
which it flows, but have been relatively rigid in adapting to volatility
in performance.
At Berkeley, we have been working on new architectures for adaptive dataflow, which intelligently route data through high-level operators on a network and adapt to performance volatility at a very fine grain. In this talk, I will present Eddies and Rivers, two adaptive dataflow mechanisms in the Telegraph system we are developing at Berkeley. Eddies adapt to volatility by flexibly routing data through the operators of a dataflow; Rivers adapt by flexibly balancing data loads across the machines of a network. I will discuss how these ideas are being used in Telegraph for scenarios including sensor networks and a novel web query system, and I will raise a number of research challenges that we face in achieving fully adaptive dataflow systems. |
| Bio: |
Joseph Hellerstein is an associate professor of
computer science at the University of California, Berkeley. He is an
Alfred P. Sloan Research Fellow, a recipient of the NSF CAREER award,
and was recently named one of the top 100 young technology innovators
by MIT's Technology Review magazine. Prof. Hellerstein's research
centers on database and information management systems, with a focus
on federated and adaptive information systems, interactive analysis
and transformation of massive datasets, and generalized indexing and
indexability. Hellerstein is the Chief Scientist and co-founder of
Cohera Corporation, a vendor of electronic catalog software based on
federated data management technology. He received his Ph.D. from the
University of Wisconsin, Madison, a master's degree from UC Berkeley,
and a bachelor's degree from Harvard.
|
| Roundtable: | Meet and talk with the speaker
at an informal roundtable discussion focused on pedagogical issues in
database instruction.
|
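An eddy, as described in Hellerstein's abstract, routes each tuple through operators in an order chosen at run time rather than fixed by a query plan. The Python code below is a minimal, simplified sketch, assuming only commutative selection predicates and a hypothetical greedy policy that sends each tuple next to the not-yet-applied operator with the lowest observed pass rate. The original eddies work used a randomized lottery-scheduling policy and also handled joins, both of which this sketch omits; the class and method names are hypothetical and are not Telegraph's API.

```python
class Eddy:
    """Hypothetical eddy: routes tuples through commutative filters, reordering adaptively."""

    def __init__(self, predicates):
        self.predicates = predicates
        # Counts start at 1 so every operator begins with a pass rate of 1.0.
        self.routed = [1] * len(predicates)
        self.passed = [1] * len(predicates)

    def pass_rate(self, i):
        return self.passed[i] / self.routed[i]

    def process(self, tup):
        """Return True if tup satisfies every predicate."""
        pending = set(range(len(self.predicates)))
        while pending:
            # Greedy policy: route next to the most selective operator observed so far.
            i = min(pending, key=self.pass_rate)
            pending.remove(i)
            self.routed[i] += 1
            if not self.predicates[i](tup):
                return False  # drop the tuple as soon as any filter rejects it
            self.passed[i] += 1
        return True

    def run(self, stream):
        return [t for t in stream if self.process(t)]


eddy = Eddy([lambda x: x % 2 == 0, lambda x: x % 5 == 0])
print(eddy.run(range(50)))  # [0, 10, 20, 30, 40]
```

Because the filters commute, every routing order yields the same output; the routing statistics only affect how much work is spent rejecting tuples, and they shift toward the more selective filter as the stream is processed.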
| Title: | Technologies for Managing Knowledge: Premises, Principles and a Case Study |
| Speaker: | John Mylopoulos, Attila Barta, Raoul Jarvis, Patricia Rodrigues-Gianolli, Shun Zhou |
| Abstract: |
The speakers will review some of the basic premises and principles of
Knowledge Management and relevant technologies, including data
management, data mining and warehousing, document management, text
processing, and search engines. The presentation will then focus on a
particular case study involving the development of tools to support
strategic business analysis.
Strategic business analysts keep track of trends relevant to their organization and its strategic objectives. To do so, they follow news stories and other reports as they become available, to ensure that organizational objectives remain on track. This seminar presents the basic features of a prototype knowledge management system, now under development, that is intended to support such analysts in their daily work. In particular, the system offers tools for building and analyzing models of strategic objectives and their inter-relationships. It also helps analysts search for relevant material and classify and annotate this material in a document repository. The seminar includes a demonstration of the current version of the prototype.
|
| Title: | Networks and Sub-Networks in the World-Wide Web |
| Speaker: | Prabhakar Raghavan, Chief Scientist and Vice President of Emerging Technologies, Verity, Inc. |
| Abstract: | We discuss several structural properties of graphs arising from the world-wide web, including the graph of hyperlinks and the graph induced by connections between distributed search servents. We review a number of algorithmic investigations of the structure of these networks, and conclude by proposing stochastic models for their evolution. |
| Bio: |
Dr. Prabhakar Raghavan is the Chief Scientist and Vice President of
Emerging Technologies of Verity, Inc., a leading provider of
enterprise and Internet knowledge retrieval solutions.
Dr. Raghavan has 14 years of experience in creating innovative technologies in Web, security and cryptography, and storage architecture. Prior to joining Verity, he headed the Computer Science Principles (CSP) department at IBM's Almaden Research Center (San Jose, CA). Under his leadership, CSP achieved global recognition for applying fundamental ideas to create new technologies for the Web, security and cryptography, and storage architecture. Raghavan also led IBM's Clever project on Web search and mining, highly touted in the press and scientific literature for its influential ideas in Web information and community discovery. He also worked at IBM's T.J. Watson Research Center. Dr. Raghavan is Consulting Professor of Computer Science at Stanford University, and has also taught at Yale. He has authored over 80 papers in various fields including algorithms, optimization, Web search and databases, a recent article on Web search in Scientific American, a popular textbook on Randomized Algorithms, and several patents. He serves on the editorial boards of several scientific journals, including the Journal of the ACM and the SIAM Journal on Computing, on the external advisory boards of leading universities, on National Science Foundation panels, and on the executive board of ACM's Special Interest Group on Algorithms and Computation Theory (SIGACT). He is a frequent speaker at conferences and technical events, and gave the plenary talk at the ACM Symposium on Principles of Database Systems (PODS) in May 2000 in Dallas, where he was also co-winner of the Best Paper prize for his work on privacy in databases. Dr. Raghavan holds a PhD from the University of California at Berkeley and an undergraduate degree in electrical engineering from IIT Madras.
|
| Roundtable: |
Meet and talk with the speaker at an informal roundtable
discussion.
|
| Title: | How to Crawl the Web |
| Speaker: | Hector Garcia-Molina, Stanford University |
| Abstract: |
A crawler collects large numbers of web pages, to be used for
building an index or for data mining. Crawlers consume significant
network and computing resources, both at the visited web servers and
at the site(s) collecting the pages, and thus it is critical to make
them efficient and well behaved. In this talk I will discuss how to
build a "good" crawler, addressing questions such as:
I will also summarize results from an experiment, conducted on more than half a million web pages over four months, that estimates how web pages evolve over time. Joint work with Junghoo Cho. |
| Bio: |
Hector
Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in
the Departments of Computer Science and Electrical Engineering at
Stanford University, Stanford, California. From August 1994 to
December 1997 he was the Director of the Computer Systems Laboratory
at Stanford. From 1979 to 1991 he was on the faculty of the Computer
Science Department at Princeton University, Princeton, New Jersey.
His research interests include distributed computing systems and
database systems. He received a BS in electrical engineering from the
Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford
University, Stanford, California, he received an MS in electrical
engineering in 1975 and a PhD in computer science in 1979.
Garcia-Molina is a Fellow of the ACM, received the 1999 ACM SIGMOD
Innovations Award, and is a member of the President's Information
Technology Advisory Committee (PITAC).
|
| Title: | An Authorization Model for Temporal Data |
| Speaker: | Avi Gal, Rutgers University |
| Abstract: |
The use of temporal data has become widespread in recent years,
within applications such as data warehouses and spatiotemporal
databases. In this work, we extend the basic authorization model with
the capability to express authorizations based on the
temporal attributes associated with data, such as transaction time and
valid time. In particular, a subject can specify authorizations based on
data validity or data update time, using either absolute or relative time
references. Such a specification is essential in providing access control
for predictive data, or in constraining access to data based on currency
considerations. We provide an expressive language for specifying such
access control to temporal data, using a variation of temporal logic for
specifying complex temporal constraints. We also introduce an easy-to-use
access control mechanism for stream data.
Joint work with Vijay Atluri, Rutgers University.
|
| Title: | Consistent Query Answers in Inconsistent Databases |
| Speaker: | Jan Chomicki, SUNY Buffalo |
| Abstract: |
As the amount of information available in online data sources
explodes, there is a growing concern about the consistency and quality
of answers to user queries. This talk addresses the issue of using
logical integrity constraints to gauge the consistency and quality of
query answers. Although it is impractical to enforce global integrity
constraints across different data sources or to correct integrity
violations by updating individual sources, integrity constraints are
still useful because they express important semantic properties of
data.
This talk introduces formal notions of database repair and consistent query answer: an answer is consistent if it is true in every minimal repair of the database. Information about answer consistency is an important indication of the answer's quality and reliability. I will show a general method for transforming first-order queries so that the transformed query computes the consistent answers to the original query, and I will characterize the scope of this method: the classes of queries and integrity constraints it handles. For functional dependencies, I will provide a complete characterization of the computational complexity of computing consistent answers to scalar aggregation queries (those involving the SQL operators COUNT, MIN, MAX, SUM, or AVG). I will also show that if the set of functional dependencies is in Boyce-Codd Normal Form, the boundary between tractable and intractable can be pushed further using the tools of perfect graph theory. (Joint work with Marcelo Arenas and Leopoldo Bertossi, with a contribution from Vijay Raghavan and Jeremy Spinrad.) |
|
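The definition of a consistent answer in Chomicki's abstract can be illustrated by brute force. As a minimal sketch, assuming a single key constraint (a special case of a functional dependency), each minimal repair keeps exactly one tuple from every group of tuples that share a key value, and an answer is consistent if the query returns it in every repair. The Python code below enumerates the repairs explicitly, which takes exponential time in general; it is not the query-transformation method described in the talk. The relation `emp`, its data, and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from operator import itemgetter


def repairs(tuples, key):
    """Yield every repair under a key constraint: one tuple kept per key value."""
    groups = defaultdict(list)
    for t in tuples:
        groups[key(t)].append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(tuples, key, query):
    """Return the answers that the query produces in every repair."""
    answer_sets = [set(query(r)) for r in repairs(tuples, key)]
    return set.intersection(*answer_sets)


# Hypothetical Emp(name, salary) relation with key "name"; the two alice tuples violate it.
emp = [("alice", 50), ("alice", 60), ("bob", 40)]
by_name = itemgetter(0)

print(consistent_answers(emp, by_name, lambda r: {n for n, s in r if s > 45}))  # {'alice'}
print(consistent_answers(emp, by_name, lambda r: {n for n, s in r if s > 55}))  # set()
```

The database has two repairs, one keeping each alice tuple. alice is a consistent answer to the first query because both repairs give her a salary above 45; the second query has no consistent answer, since only one repair supports it.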
| Title: | Profile-Driven Data Management |
| Speaker: | Mitch Cherniack, Brandeis University |
| Abstract: |
Large amounts of data populate the Internet, yet it is largely
unmanaged. Access to this data requires a wide array of network
resources including server capacity, bandwidth, and cache space.
Applications compete for these resources with little overarching
support for their intelligent allocation.
The database community has seen this problem before. Data management techniques (e.g., indexing, data organization, buffer management) have long been used in DBMSs to reconcile competing applications' demands for data with limited resources. Such techniques are application-driven: a database administrator (DBA) consults with users to determine their data needs and tunes the knobs of data management tools accordingly. But the Internet environment, global in scale and consisting of autonomous data sources, cannot be tamed by a DBA. We take the view that in this environment the DBA role must be automated, and application-level data requirements must be specified formally through processable user profiles. In this talk, I will discuss the Profile-Driven Data Management project, a multi-institution endeavor funded by the National Science Foundation in October, 2001. I will describe the goals and scope of the project, present some early results, and pose some of the many challenges that lie ahead. Joint work with Stan Zdonik (Brown University) and Michael J. Franklin (UC Berkeley). |
|
Last modified: Mon Dec 31 10:02:56 EST 2001