
A Logic-Based Approach to XML Data Integration

Habilitation thesis, Wolfgang May, Universität Freiburg, April 2001

Abstract:

In this work, a logic-based framework for XML data integration is proposed. XPath-Logic extends the XPath language with variable bindings and embeds it into first-order logic, interpreted over an edge-labeled graph-based data model. XPathLog is then the Horn fragment of XPath-Logic, providing a Datalog-style, rule-based language for manipulating and integrating XML data. In contrast to other approaches, the XPath syntax and semantics are also used to specify declaratively how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties that should be added to the database.
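The rule-based reading described above can be illustrated with a small sketch. It is written in Python rather than XPathLog, and it only mimics the idea: a rule body produces variable bindings from path expressions, and a rule head is read as a description of what should be added to the database. The sample data, the function names (`body_bindings`, `apply_head`, `run_rule`), and the `size` attribute are all hypothetical and are not taken from the thesis or from LoPiX.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the thesis.
SOURCE = """<world>
  <country code="D"><name>Germany</name><population>83000000</population></country>
  <country code="F"><name>France</name><population>68000000</population></country>
</world>"""

def body_bindings(root):
    """Rule body: find every country C with a name N and a population P.
    Each match yields one variable binding, as in a Datalog-style rule body."""
    for country in root.iter("country"):
        name = country.findtext("name")
        population = country.findtext("population")
        if name is not None and population is not None:
            yield {"C": country, "N": name, "P": int(population)}

def apply_head(binding):
    """Rule head, read as an update: 'C has attribute size = "large"'.
    The property is added to the element bound to C."""
    binding["C"].set("size", "large")

def run_rule(root, threshold=70_000_000):
    """Apply the head once for every body binding that satisfies P > threshold."""
    for binding in body_bindings(root):
        if binding["P"] > threshold:
            apply_head(binding)
    return root

root = run_rule(ET.fromstring(SOURCE))
large = [c.findtext("name") for c in root.iter("country") if c.get("size") == "large"]
print(large)
```

The point of the sketch is the division of labor: the body only reads and binds, while the head only states what must hold afterwards, so the update follows from the bindings rather than from explicit insert commands.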

Due to the close relationship with XPath, the semantics of rules is easy to grasp. In addition to the logic-based semantics of XPath-Logic, we give an algebraic semantics for evaluating XPathLog queries based on answer sets. The formal semantics is defined with respect to a graph-based model which covers the XML data model and is tailored to the requirements of XML data integration. This model is not based on the notion of XML trees; instead, it represents an XML-style database (i.e., one based on elements and attributes) which simultaneously represents individual, overlapping XML trees as views of the database.

The "pure" XPathLog data model is extended with expressive modeling concepts such as a class hierarchy, nonmonotonic inheritance, and a lightweight signature concept. Information integration in this approach is based on linking elements from the sources into one or more result trees, creating elements, fusing elements, and defining access paths by synonyms. By these operations, the separate source trees are developed into a multiply linked graph database in which one or more result tree views can be distinguished by projections. The combination of data and metadata reasoning is supported by seamlessly adding XML Schema trees and even ontology descriptions to the internal database.
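The linking and projection operations mentioned above can be pictured with a minimal Python sketch of an edge-labeled graph in which one element is shared by a source tree and a result tree. The `Node` class, the `link` and `project` helpers, and the element names are hypothetical illustrations, not the data structures of the thesis; the sketch also assumes an acyclic graph, since `project` would not terminate on a cycle.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    """An element with labeled outgoing edges to its subelements."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)  # list of (edge label, Node)

    def link(self, child, label=None):
        """Add an edge to an existing node; the node is shared, not copied.
        A label different from the child's name acts as a synonym."""
        self.children.append((label or child.name, child))
        return child

def project(node, label=None):
    """Return the tree view reachable from node as nested tuples."""
    tag = label or node.name
    return (tag, node.text, [project(c, l) for l, c in node.children])

# Source tree (hypothetical data).
source = Node("countries")
germany = source.link(Node("country"))
germany.link(Node("name", "Germany"))

# Result tree: the same element is linked in under the synonym "state".
result = Node("result")
result.link(germany, label="state")

# A later change to the shared element is visible in both views.
germany.link(Node("capital", "Berlin"))

source_view = project(source)
result_view = project(result)
print(result_view)
```

Because the result tree holds a reference to the source element instead of a copy, the two views overlap: each projection yields an ordinary tree, while the underlying database is a multiply linked graph.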

XPathLog has been implemented in LoPiX. The practicability of the approach is demonstrated by a case study which also serves as a running example. The first part of the essay is dedicated to an overview of the development of XML-related concepts which also motivates the design decisions of the XPathLog framework.