Dynamic Schema Mapping and Data Integration
To achieve interoperability, information systems, e-commerce applications,
as well as program comprehension and reengineering tools use mappings to
translate data from one schema to another. Similarly, it is envisioned that
interoperability between data sources on the Semantic Web can be achieved by
defining mappings from data sources to ontologies. But both the generation and
maintenance of mappings in a dynamic environment like the Web are laborious,
error-prone, and require expertise. We are attacking the schema mapping problem
from two angles.
In one approach, we have constructed two tools that deal with mappings
directly between schemas. One tool, Clio, generates such mappings, and another
tool, ToMAS, manages them. Both tools focus on mappings between any combination
of XML and relational schemas. Clio allows non-expert users to provide a high-level
specification of how two schemas correspond and automatically translates these
specifications into semantically meaningful queries that transform data
conforming to the source schema into data conforming to the target schema. The
process consists of two phases. In the first phase, the high-level
specifications, expressed as a set of inter-schema correspondences, are
translated into a set of mappings that capture the design choices made in the
two schemas. The design choices include the hierarchical organization of the
data as well as schema constraints (e.g., foreign key constraints). During the
second phase, these mappings are translated into queries (SQL, XQuery, or XSLT)
over the source schema that generate data to populate the target schema. An
important feature of the mapping algorithm is that it takes into consideration
target schema constraints in order to guarantee that the generated data will
not violate the integrity of the target schema.
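
The two phases described above can be illustrated with a minimal Python sketch. It is not Clio's algorithm: the class names, the functions `build_mapping` and `mapping_to_sql`, and the example schema (`employee`, `department`, `staff`) are all hypothetical, and the sketch handles only flat relational tables joined by foreign keys, ignoring XML nesting and target constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """A user-supplied link from one source column to one target column."""
    source_table: str
    source_column: str
    target_column: str


@dataclass(frozen=True)
class ForeignKey:
    """A source-schema constraint: child_table.child_column references parent."""
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


def build_mapping(correspondences, foreign_keys):
    """Phase 1 (simplified assumption): group the correspondences into one
    mapping and keep the foreign keys that connect the tables involved."""
    tables = sorted({c.source_table for c in correspondences})
    joins = [fk for fk in foreign_keys
             if fk.child_table in tables and fk.parent_table in tables]
    return {"tables": tables, "joins": joins, "columns": list(correspondences)}


def mapping_to_sql(mapping, target_table):
    """Phase 2 (simplified assumption): turn the mapping into an
    INSERT ... SELECT statement over the source schema."""
    target_cols = ", ".join(c.target_column for c in mapping["columns"])
    select_cols = ", ".join(f"{c.source_table}.{c.source_column}"
                            for c in mapping["columns"])
    sql = (f"INSERT INTO {target_table} ({target_cols}) "
           f"SELECT {select_cols} FROM {', '.join(mapping['tables'])}")
    if mapping["joins"]:
        conditions = " AND ".join(
            f"{fk.child_table}.{fk.child_column} = "
            f"{fk.parent_table}.{fk.parent_column}"
            for fk in mapping["joins"])
        sql += f" WHERE {conditions}"
    return sql


# Hypothetical example: two correspondences and one foreign key.
correspondences = [
    Correspondence("employee", "name", "emp_name"),
    Correspondence("department", "title", "dept_name"),
]
foreign_keys = [ForeignKey("employee", "dept_id", "department", "id")]
sql = mapping_to_sql(build_mapping(correspondences, foreign_keys), "staff")
print(sql)
```

The point of the sketch is that the user states only which columns correspond, while the join condition is derived from the source schema's foreign key rather than written by hand.
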
In dynamic environments like the Web, not only may the data maintained by
information sources change, but so may their schemas, semantics, and query
capabilities. These changes must be reflected in the mappings. Mappings left
invalid or inconsistent by such changes must be detected and updated. As large,
complicated schemas become more prevalent, and as data is reused in more and
more applications, manually maintaining mappings (even simple mappings like
view definitions) is becoming impractical. ToMAS is a novel framework and tool
for automatically adapting mappings as schemas evolve. It continuously
monitors mappings and automatically detects mappings that are affected by
modifications of schemas. Such mappings are then rewritten in accordance with
the modified schemas. An important feature of ToMAS is that it treats mappings
and schemas as first-class citizens of a repository and a query language. This
means that schemas and mappings can be used in queries, thus enabling their
management.
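
The detect-then-rewrite cycle described above can be sketched in Python for the simplest kind of schema change, a column rename. This is an illustration under stated assumptions, not ToMAS itself: the `Mapping` and `RenameColumn` classes and the `affected` and `adapt` functions are hypothetical names, and real schema evolution covers many more kinds of change than a rename.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    """A stored mapping: which source columns feed which target column."""
    name: str
    source_columns: tuple  # entries of the form "table.column"
    target_column: str


@dataclass(frozen=True)
class RenameColumn:
    """A schema modification: table.old becomes table.new."""
    table: str
    old: str
    new: str


def affected(mappings, change):
    """Detect the mappings that reference the renamed column."""
    old_ref = f"{change.table}.{change.old}"
    return [m for m in mappings if old_ref in m.source_columns]


def adapt(mapping, change):
    """Rewrite one mapping so that it agrees with the modified schema."""
    old_ref = f"{change.table}.{change.old}"
    new_ref = f"{change.table}.{change.new}"
    columns = tuple(new_ref if c == old_ref else c
                    for c in mapping.source_columns)
    return replace(mapping, source_columns=columns)


# Hypothetical repository of mappings and one schema change.
mappings = [
    Mapping("m1", ("employee.name",), "emp_name"),
    Mapping("m2", ("department.title",), "dept_name"),
]
change = RenameColumn("employee", "name", "full_name")

hit = affected(mappings, change)
adapted = [adapt(m, change) if m in hit else m for m in mappings]
print([m.name for m in hit], adapted[0].source_columns)
```

Because mappings are plain data in this sketch, they can be filtered and rewritten like any other records, which is a small-scale analogue of treating mappings as queryable objects.
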
In the second approach, the focus is on semantic mappings from database
schemas to ontologies. Such mappings will help make the Semantic Web a reality
by facilitating access to the content of legacy databases made available on the
Web (part of the "deep web"). Although ontologies with rich semantics provide
a way to improve interoperability between heterogeneous data sources,
constructing the mappings is difficult, error-prone, and may require both
technical and domain expertise. As part of the MAPONTO project, we are
developing a prototype interactive tool to address the problem of finding
logical connections between entire database tables and domain ontologies. The
tool works by first finding and expressing correspondences between table
columns and ontology components, and then using these correspondences and
heuristics to find logical formulas that are reasonable candidates to express
the semantic connections between the ontology components and the table. We have
evaluated our tool over existing relational schemas and ontologies, with good
results.
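
The idea of going from column-level correspondences to a candidate formula can be sketched in Python as follows. The heuristic used here, connecting the matched concepts by a shortest path in the ontology graph, is an assumption made for illustration and is not claimed to be the heuristic used in MAPONTO. The ontology, the table `staff`, and all function and variable names are hypothetical.

```python
from collections import deque

# Hypothetical ontology: (concept, relationship, concept) triples.
ONTOLOGY_EDGES = [
    ("Employee", "worksIn", "Department"),
    ("Department", "locatedIn", "City"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over the ontology, treating relationships as
    undirected for connectivity. Returns the list of edges on the path."""
    adjacency = {}
    for s, r, t in edges:
        adjacency.setdefault(s, []).append((t, (s, r, t)))
        adjacency.setdefault(t, []).append((s, (s, r, t)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, edge in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [edge]))
    return None


def variable(concept):
    """Name the logical variable that stands for an instance of a concept."""
    return "x_" + concept.lower()


def candidate_formula(table, correspondences, edges):
    """Build one candidate formula from column -> (concept, attribute)
    correspondences, linking the concepts along shortest paths."""
    atoms = [f"{attribute}({variable(concept)}, {column})"
             for column, (concept, attribute) in correspondences.items()]
    concepts = []
    for concept, _ in correspondences.values():
        if concept not in concepts:
            concepts.append(concept)
    for a, b in zip(concepts, concepts[1:]):
        path = shortest_path(edges, a, b)
        if path is None:
            return None  # no connection found, so no candidate
        atoms += [f"{r}({variable(s)}, {variable(t)})" for s, r, t in path]
    return f"{table}({', '.join(correspondences)}) -> " + " AND ".join(atoms)


# Hypothetical correspondences for a two-column table.
correspondences = {
    "emp_name": ("Employee", "hasName"),
    "city": ("City", "hasName"),
}
formula = candidate_formula("staff", correspondences, ONTOLOGY_EDGES)
print(formula)
```

In an interactive setting, a formula like this would be only one of several candidates offered to the user for confirmation or rejection.
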