Active Research Projects

Clio

Clio is a tool for generating mappings (queries) between relational and XML schemas. The user is presented with the structure and constraints of two schemas and is asked to draw correspondences between the parts of the schemas that represent the same real-world entity. Correspondences can also be inferred by Clio and verified by the user. Given the two schemas and the set of correspondences between them, Clio can generate the queries (in SQL, XSLT, or XQuery) that drive the translation of data conforming to the first (source) schema into data conforming to the second (target) schema.
Clio technology forms the foundation for IBM's new Rational Data Architect.

ConQuer

Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. ConQuer is a system for efficient and scalable answering of SQL queries on databases that may violate a set of constraints. ConQuer permits users to postulate a set of key constraints together with their queries. The system rewrites the queries to retrieve all (and only) data that is consistent with respect to the constraints. The rewriting is into SQL, so the rewritten queries can be efficiently optimized and executed by commercial database systems.
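
As an illustration of what "consistent with respect to the constraints" can mean, the following minimal Python sketch assumes the usual repair-based reading: a repair keeps exactly one tuple per key value, and an answer is consistent if the query returns it on every repair. This is not how ConQuer works internally (ConQuer rewrites the query into SQL rather than enumerating repairs), and the names `repairs`, `consistent_answers`, and the `customers` data are hypothetical.

```python
from itertools import product

def repairs(rows, key):
    """Yield every repair of `rows`: one row kept for each distinct key value."""
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in key), []).append(row)
    # Pick one row from each group of rows that share a key value.
    for choice in product(*groups.values()):
        yield list(choice)

def consistent_answers(rows, key, query):
    """Return the answers that `query` produces on every repair."""
    result = None
    for repair in repairs(rows, key):
        answers = set(query(repair))
        result = answers if result is None else result & answers
    return result if result is not None else set()

# Hypothetical data: customer 1 violates the key constraint on "id".
customers = [
    {"id": 1, "city": "Toronto"},
    {"id": 1, "city": "Ottawa"},
    {"id": 2, "city": "Toronto"},
]

def in_toronto(rows):
    """Query: ids of customers located in Toronto."""
    return {r["id"] for r in rows if r["city"] == "Toronto"}

print(consistent_answers(customers, ["id"], in_toronto))  # {2}
```

Customer 1 is dropped because one of the two repairs places it in Ottawa. Enumerating repairs is exponential in the number of conflicting keys, which is why a rewriting into a single SQL query is attractive.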

Data Exchange

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this project, we address foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem, or none at all.
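
The point that a source instance may admit many target instances can be made concrete with a small Python sketch. It assumes a hypothetical mapping in which every source tuple Emp(name, dept) requires target tuples Works(name, dept) and Dept(dept, m) for some manager m that the source does not record. The functions `exchange` and `is_solution` and the placeholder names `N1`, `N2` are illustrative only.

```python
from itertools import count

def exchange(emp_rows):
    """Build one target instance for the hypothetical mapping
    Emp(name, dept) -> exists m. Works(name, dept) and Dept(dept, m).
    Unknown managers are filled with placeholder values N1, N2, ..."""
    fresh = count(1)
    placeholder = {}
    works, dept = set(), set()
    for name, d in emp_rows:
        works.add((name, d))
        if d not in placeholder:
            placeholder[d] = f"N{next(fresh)}"
        dept.add((d, placeholder[d]))
    return works, dept

def is_solution(emp_rows, works, dept):
    """Check that a target instance satisfies the mapping for every source row."""
    departments = {d for d, _ in dept}
    return all((name, d) in works and d in departments for name, d in emp_rows)

source = [("ann", "db"), ("bob", "db"), ("eve", "ai")]
works, dept = exchange(source)

# Replacing a placeholder with any concrete value yields another solution.
other_dept = {(d, "carol" if m == "N1" else m) for d, m in dept}
print(is_solution(source, works, dept), is_solution(source, works, other_dept))
```

Both instances satisfy the mapping, so the semantics of a query over the target (for example, "who manages db?") depends on which solutions are taken into account.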

Hyperion

The Hyperion project investigates the data management issues that are raised by the peer-to-peer paradigm.

LIMBO

The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. LIMBO is a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering. We use the IB framework to define a distance measure for categorical tuples. LIMBO handles large data sets by producing a memory-bounded summary model for the data.
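
A distance of this kind can be sketched in Python as the information lost when two clusters are merged, which in the agglomerative IB setting is the combined cluster mass times the weighted Jensen-Shannon divergence of their value distributions. The sketch assumes each tuple is represented as a uniform distribution over the attribute values it contains; the function names and the example vocabulary are hypothetical, and this is not the LIMBO implementation.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(mass1, dist1, mass2, dist2):
    """Information loss incurred by merging two clusters.

    mass1, mass2: probabilities p(c1), p(c2) of the clusters.
    dist1, dist2: conditional distributions over attribute values.
    """
    total = mass1 + mass2
    w1, w2 = mass1 / total, mass2 / total
    mix = [w1 * a + w2 * b for a, b in zip(dist1, dist2)]
    js = w1 * kl(dist1, mix) + w2 * kl(dist2, mix)
    return total * js

# Hypothetical vocabulary: [director=Scorsese, director=Coppola, genre=Crime, genre=Drama]
t1 = [0.5, 0.0, 0.5, 0.0]  # tuple (Scorsese, Crime)
t2 = [0.0, 0.5, 0.5, 0.0]  # tuple (Coppola, Crime)
t3 = [0.0, 0.5, 0.0, 0.5]  # tuple (Coppola, Drama)

# Each tuple carries mass 1/4 in an assumed data set of four tuples.
print(merge_cost(0.25, t1, 0.25, t2), merge_cost(0.25, t1, 0.25, t3))
```

Tuples that share a value (t1 and t2) are cheaper to merge than tuples that share none (t1 and t3), which is the behaviour one wants from a distance over categorical data.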

ToMAS

To achieve interoperability, modern information systems and e-commerce applications use mappings to translate data from one representation to another. In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. Such changes must be reflected in the mappings. Mappings left inconsistent by a schema change have to be detected and updated. Manually maintaining large numbers of mappings (even simple mappings like view definitions) is becoming impractical. ToMAS investigates methods to automatically and incrementally adapt the mappings that are affected when schemas evolve. In this process, it tries to preserve as many as possible of the design decisions that are encoded in the mappings themselves.

ToX

ToX is a repository for XML data and metadata that supports both real and virtual XML documents.

ToXgene

ToXgene is a template-based generator for large, consistent collections of synthetic XML documents, developed as part of the ToX (the Toronto XML Server) project. ToXgene was designed to be declarative, and thus speed up the data generation cycle; general enough to produce fairly complex XML content; and powerful enough to capture the most common kinds of integrity constraints in popular benchmarks.