Active Research Projects
|
Clio
Clio is a tool for generating mappings (queries) between relational
and XML schemas. The user is presented with the
structure and constraints of two schemas and is asked
to draw correspondences between the parts of the
schemas that represent the same real-world
entity. Correspondences can also be inferred by Clio
and verified by the user. Given the two schemas and
the set of correspondences between them, Clio can
generate the queries (SQL, XSLT, or XQuery) that
drive the translation of data conforming to the first
(source) schema into data conforming to the second
(target) schema.
Clio technology forms the foundation for IBM's new Rational Data Architect.
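As a rough illustration of how correspondences can drive query generation, the sketch below turns a list of attribute correspondences into one SQL statement and runs it on a tiny in-memory database. It is not Clio's algorithm or output: the helper generate_mapping_sql and the employee and staff schemas are hypothetical, and the toy ignores the schema constraints, joins and nesting that real mapping generation has to handle.

```python
import sqlite3


def generate_mapping_sql(source_table, target_table, correspondences):
    """Build an INSERT ... SELECT from (source_column, target_column) pairs.

    Hypothetical helper for illustration only; identifiers are trusted as-is.
    """
    if not correspondences:
        raise ValueError("at least one correspondence is required")
    source_cols = ", ".join(src for src, _ in correspondences)
    target_cols = ", ".join(tgt for _, tgt in correspondences)
    return (
        f"INSERT INTO {target_table} ({target_cols}) "
        f"SELECT {source_cols} FROM {source_table}"
    )


# Hypothetical schemas: employee(ename, sal) is the source, staff(name, salary) the target.
sql = generate_mapping_sql(
    "employee", "staff", [("ename", "name"), ("sal", "salary")]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ename TEXT, sal INTEGER)")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES ('Ann', 100)")
conn.execute(sql)  # translate the source data into the target schema
rows = conn.execute("SELECT name, salary FROM staff").fetchall()
print(sql)
print(rows)
```

The point of the sketch is only that the correspondences, not hand-written code, determine the shape of the generated query.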
|
|
|
ConQuer
Although integrity constraints have long been used to
maintain data consistency, there are situations in which
they may not be enforced or satisfied. ConQuer is a system
for efficient and scalable answering of SQL queries on
databases that may violate a set of constraints. ConQuer
permits users to postulate a set of key constraints together
with their queries. The system rewrites the queries to
retrieve all (and only) data that is consistent with respect
to the constraints. The rewriting is into SQL, so the
rewritten queries can be efficiently optimized and executed
by commercial database systems.
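The idea of rewriting a query so that it returns only consistent data can be sketched on a small example. The code below assumes a hypothetical table emp(name, salary) in which name is postulated as a key but not enforced; the rewritten query is written by hand for this one case and is not the output of ConQuer itself. A name is kept only if every tuple sharing that key value satisfies the selection condition, so the answer holds no matter which of the conflicting tuples is the correct one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so the table may violate the postulated key on "name".
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ann", 2000), ("ann", 500), ("bob", 3000), ("cy", 800)],
)

# Original query: who earns more than 1000?
original = "SELECT DISTINCT name FROM emp WHERE salary > 1000 ORDER BY name"

# Hand-written rewriting under the assumed key emp(name): drop any name
# that has at least one conflicting tuple failing the condition.
rewritten = """
SELECT DISTINCT e.name
FROM emp e
WHERE e.salary > 1000
  AND NOT EXISTS (
      SELECT 1 FROM emp v
      WHERE v.name = e.name AND NOT (v.salary > 1000)
  )
ORDER BY e.name
"""

original_answers = [row[0] for row in conn.execute(original)]
consistent_answers = [row[0] for row in conn.execute(rewritten)]
print(original_answers)
print(consistent_answers)
```

Here "ann" appears in the original answer but not in the consistent one, because one of her two conflicting tuples has a salary of 500. Since the rewriting is plain SQL, it runs unchanged on an ordinary database engine.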
| |
|
Data Exchange
Data exchange is the problem of taking data structured under a source
schema and creating an instance of a target schema that reflects the
source data as accurately as possible. In this project, we address
foundational and algorithmic issues related to the semantics of data
exchange and to the query answering problem in the context of data
exchange. These issues arise because, given a source instance, there
may be many target instances that satisfy the constraints of the data
exchange problem, or none at all.
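A toy example may help to see why there can be many target instances or none. The sketch below assumes a hypothetical setting: a source relation Emp(name, dept), target relations Works(name, dept) and Dept(dept, mgr), a constraint saying that every Emp(n, d) requires Works(n, d) and some Dept(d, m), and a key on Works.name. Unknown managers are filled with labeled nulls (N1, N2, ...); replacing a null by any constant yields another valid target instance, which is where the multiplicity comes from. This is a simplified illustration, not a general algorithm.

```python
from itertools import count


def exchange(emp_rows):
    """Return one target instance for the source Emp(name, dept), or None.

    Assumed (hypothetical) constraints:
      Emp(n, d) -> Works(n, d) and there exists m with Dept(d, m)
      Works.name is a key in the target.
    """
    fresh = count(1)
    works = {}  # name -> dept
    dept = {}   # dept -> manager (a labeled null here)
    for name, d in emp_rows:
        if works.setdefault(name, d) != d:
            # The key would equate two distinct constants: no solution exists.
            return None
        if d not in dept:
            dept[d] = f"N{next(fresh)}"  # labeled null for the unknown manager
    return {"Works": sorted(works.items()), "Dept": sorted(dept.items())}


solution = exchange([("ann", "sales"), ("bob", "sales"), ("cy", "hr")])
no_solution = exchange([("ann", "sales"), ("ann", "hr")])
print(solution)
print(no_solution)
```

In the first call any choice of managers for "sales" and "hr" gives a valid target instance, so the semantics of a query that asks for managers has to be defined with care. In the second call the source data conflicts with the target key, and no target instance exists.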
|
|
|
Hyperion
The Hyperion project investigates the data management issues that are raised by the peer-to-peer paradigm.
|
|
|
Limbo
The problem of clustering becomes more challenging when the data is
categorical, that is, when there is no inherent distance
measure between data values. LIMBO is a scalable hierarchical
categorical clustering algorithm that builds on the
Information Bottleneck (IB) framework for quantifying the
relevant information preserved when clustering. We use the IB
framework to define a distance measure for categorical
tuples. LIMBO handles large data sets by producing a
memory-bounded summary model for the data.
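To make the distance measure concrete, the sketch below computes the information loss incurred by merging two clusters, using the usual agglomerative IB formulation: the combined cluster probability times the weighted Jensen-Shannon divergence of the clusters' distributions over attribute values. The function names, the sample tuples and the uniform representation of a tuple are assumptions made for illustration; the summary model that bounds memory use is not shown.

```python
from math import log2


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must cover p's support."""
    return sum(pv * log2(pv / q[a]) for a, pv in p.items() if pv > 0)


def information_loss(p1, d1, p2, d2):
    """Information lost by merging clusters with probabilities p1, p2 and
    conditional distributions d1, d2 over attribute values."""
    total = p1 + p2
    w1, w2 = p1 / total, p2 / total
    values = set(d1) | set(d2)
    merged = {a: w1 * d1.get(a, 0.0) + w2 * d2.get(a, 0.0) for a in values}
    # (p1 + p2) times the weighted Jensen-Shannon divergence.
    return total * (w1 * kl(d1, merged) + w2 * kl(d2, merged))


def tuple_distribution(row):
    """Represent a categorical tuple as a uniform distribution over its values."""
    values = [f"{attr}={val}" for attr, val in row.items()]
    return {v: 1.0 / len(values) for v in values}


# Hypothetical data set of three tuples, each with probability 1/3.
t1 = tuple_distribution({"director": "Scorsese", "genre": "Crime"})
t2 = tuple_distribution({"director": "Scorsese", "genre": "Crime"})
t3 = tuple_distribution({"director": "Coppola", "genre": "Crime"})

same = information_loss(1 / 3, t1, 1 / 3, t2)
different = information_loss(1 / 3, t1, 1 / 3, t3)
print(same, different)
```

Merging identical tuples loses no information, while tuples that differ on one of two attributes are at a positive distance, even though no distance between the individual values "Scorsese" and "Coppola" was ever defined.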
|
|
|
ToMAS
To achieve interoperability, modern information systems and e-commerce applications use mappings to translate data from one
representation to another. In dynamic environments like the Web, data sources may change not only their data but also their
schemas, their semantics, and their query capabilities. Such changes must be reflected in the mappings. Mappings left inconsistent
by a schema change have to be detected and updated. Manually maintaining large numbers of mappings (even simple mappings like
view definitions) is becoming impractical. ToMAS investigates methods to automatically and incrementally adapt the mappings
that are affected when schemas evolve. In this process, it tries to preserve as many as possible of the design
decisions that are encoded in the mappings themselves.
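The simplest case of this process, a renamed source attribute, can be sketched as follows. The representation of a mapping as a list of attribute correspondences, the function adapt_to_rename and the sample schemas are all hypothetical; the sketch only shows the two steps of detecting the affected mappings and updating them in place while leaving everything else untouched, and it says nothing about harder changes such as restructuring or changed constraints.

```python
def adapt_to_rename(mappings, table, old, new):
    """Adapt mappings after the source attribute table.old is renamed to table.new.

    mappings maps a mapping name to a list of
    ((source_table, source_column), (target_table, target_column)) pairs.
    Returns the adapted mappings and the names of the mappings that changed.
    """
    adapted, affected = {}, []
    for name, pairs in mappings.items():
        new_pairs, changed = [], False
        for src, tgt in pairs:
            if src == (table, old):
                src, changed = (table, new), True
            new_pairs.append((src, tgt))
        adapted[name] = new_pairs
        if changed:
            affected.append(name)
    return adapted, affected


# Hypothetical mappings over a source table "employee".
mappings = {
    "staff_view": [
        (("employee", "ename"), ("staff", "name")),
        (("employee", "sal"), ("staff", "salary")),
    ],
    "dept_view": [(("department", "dname"), ("unit", "title"))],
}

adapted, affected = adapt_to_rename(mappings, "employee", "sal", "wage")
print(affected)
print(adapted["staff_view"])
```

Only staff_view is rewritten; dept_view and the untouched correspondence in staff_view are carried over as they were, which is the incremental behaviour the paragraph describes.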
|
|
|
ToX
ToX is a repository for XML data and metadata that supports both real and virtual XML documents.
|
|
|
ToXgene
ToXgene is a template-based generator for large, consistent collections of synthetic XML documents, developed as part of the
ToX (the Toronto XML Server) project. ToXgene was designed to be declarative, and thus speed up the data generation cycle;
general enough to produce fairly complex XML content; and powerful enough to capture the most common kinds of integrity constraints
in popular benchmarks.
|
|
|