A Logic-Based Approach to XML Data Integration
Habilitation thesis, Wolfgang May, Universität Freiburg, April 2001
Abstract:
In this work, a logic-based framework for XML data integration is
proposed. XPath-Logic extends the XPath language with variable
bindings and embeds it into first-order logic, interpreted over an
edge-labeled graph-based data model. XPathLog is then the
Horn fragment of XPath-Logic, providing a Datalog-style, rule-based
language for manipulating and integrating XML data. In contrast to
other approaches, the XPath syntax and semantics are also used for a
declarative specification of how the database should be updated:
when used in rule heads, XPath filters are interpreted as
specifications of elements and properties which should be added to
the database.
Due to the close relationship with XPath, the semantics of rules is
easy to grasp. In addition to the logic-based semantics of
XPath-Logic, we give an algebraic semantics for evaluating XPathLog
queries based on answer-sets. The formal semantics is defined with
respect to a graph-based model which covers the XML data model, tailored to the
requirements of XML data integration. The model is not based on the
notion of XML trees; instead, it describes an XML-style database (i.e.,
one based on elements and attributes) in which individual, overlapping
XML trees are simultaneously represented as views of the database.
The "pure" XPathLog data model is extended with expressive
modeling concepts such as a class hierarchy, nonmonotonic
inheritance, and a lightweight signature concept. Information
integration in this approach is based on linking elements
from the sources into one or more result trees, creating
elements, fusing elements, and defining access paths by
synonyms. By these operations, the separate source trees are
developed into a multiply linked graph database in which one
or more result tree views can be distinguished by
projections. The combination of data and metadata reasoning is
supported by seamlessly adding XML Schema trees and even ontology
descriptions to the internal database.
XPathLog has been implemented in LoPiX. The practicability of the
approach is demonstrated by a case study which also serves as a
running example. The first part of the essay is dedicated to an
overview of the development of XML-related concepts which also
motivates the design decisions of the XPathLog framework.