Section: New Results

Ontology-Based Query Answering with Existential Rules

Participants : Jean-François Baget, Fabien Garreau, Mélanie König, Michel Leclère, Marie-Laure Mugnier, Swan Rocher, Federico Ulliana.

Ontology-based query answering (and more generally Ontology-Based Data Access, OBDA) is a new paradigm in data management, which takes into account the inferences enabled by an ontology when querying data. In other words, the notion of a database is replaced by that of a knowledge base, composed of data (also called facts) and of an ontology. In this context, existential rules (also called Datalog+) have been proposed to represent the ontological component [59], [58]. This expressive formalism generalizes both the description logics used in OBDA (such as ℰℒ and DL-Lite, which form the cores of the so-called tractable profiles of the Semantic Web ontological language OWL2) and Datalog, the language of deductive databases. For about five years, we have been studying the theoretical foundations of this framework (mainly concerning decidability and complexity) and developing associated algorithmic techniques. We have also started the development of a platform dedicated to OBDA with existential rules (see section 5.2).

Before presenting this year's results, we recall the two classical ways of processing rules, namely forward chaining and backward chaining, also known as “materialization” and “query rewriting” in the OBDA setting. In forward chaining, the rules are applied to enrich the initial data, and query answering can then be solved by evaluating the query against the “saturated” database (as in a classical database system, i.e., forgetting the rules). The backward chaining process can be divided into two steps: first, the initial query is rewritten using the rules into a first-order query (typically a union of conjunctive queries, UCQ); then the rewritten query is evaluated against the initial database (again, as in a classical database system). Since entailment is not decidable with general existential rules, neither the forward nor the backward process is guaranteed to halt.
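As a toy illustration of the forward-chaining side (a deliberately minimal sketch, not the platform's actual code; all names are ours), consider rules whose body and head are single atoms and that carry no existential variables, so that saturation is guaranteed to terminate:

```python
# Minimal forward-chaining ("materialization") sketch.
# A fact is a tuple such as ('cat', 'felix'); a rule is a (body, head)
# pair of atom patterns, where strings starting with '?' are variables.
# With no existential variables in rule heads, the fixpoint is finite.

def match(pattern, fact):
    """Return a variable binding if `pattern` matches `fact`, else None."""
    if len(pattern) != len(fact):
        return None
    binding = {}
    for p, f in zip(pattern, fact):
        if p.startswith('?'):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def substitute(pattern, binding):
    """Instantiate a pattern with a variable binding."""
    return tuple(binding.get(t, t) for t in pattern)

def saturate(facts, rules):
    """Apply all rules until fixpoint (the chase, restricted here to
    rules without existential variables)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for fact in list(facts):
                binding = match(body, fact)
                if binding is not None:
                    new = substitute(head, binding)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

rules = [(('cat', '?x'), ('animal', '?x'))]   # cat(x) -> animal(x)
saturated = saturate({('cat', 'felix')}, rules)
# ('animal', 'felix') is now materialized, so the query can be
# evaluated against the saturated data with the rules forgotten.
```

Backward chaining would instead rewrite the query atom `animal(?x)` into the UCQ `animal(?x) ∨ cat(?x)` and evaluate it over the untouched data.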

Improvement of Query Rewriting Algorithms

Over the last two years, we designed and implemented a query rewriting algorithm that takes as input a set of existential rules and a UCQ q, and outputs a UCQ that is a sound and complete rewriting of q, whenever such a rewriting exists [60], [61], [62]. This year's main improvement to this algorithm is the implementation of a unifier able to process rules without decomposing their head into single atoms. This improvement proved to have a very high impact on the efficiency of query rewriting (up to 274 times faster on an ontology in which 32% of the rules have a head composed of two atoms instead of a single one). Besides, much effort has been devoted to experiments: finding appropriate benchmarks, building a translator from the Semantic Web formats OWL/OWL2 to our existential rule format dlgp (since most existing ontologies are available in OWL/OWL2 format), selecting existing tools to compare with, running them, and finally comparing the tools on several criteria.

  • Results partially published in the Semantic Web Journal [22] .

Query rewriting techniques have the advantage of being independent from the data. However, a main bottleneck is that the size of the rewritten query can be exponential in the size of the original query, hence the produced rewriting may not be usable in practice. A well-known source of combinatorial explosion is the set of very simple rules that form the core of any ontology, typically expressing concept and relation hierarchies, concept properties and relation signatures. We have proposed a rewriting technique that consists in compiling these rules into a preorder on atoms and embedding this preorder into the rewriting process. This allows us to compute compact rewritings that can be considered as “pivotal” representations, in the sense that they can easily be translated into different kinds of queries evaluable by different kinds of database systems. The provided algorithm computes a sound, complete and minimal UCQ rewriting, if one exists. Experiments show that this technique leads to substantial gains in the query rewriting process, in terms of size and runtime, and scales to very large ontologies (several tens of thousands of rules).

  • Results not published yet. Reported in Mélanie König's PhD thesis [17] .
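To give a rough flavour of the compilation step (a toy sketch on predicate names only; the actual technique operates on atoms and is embedded in the rewriting process itself, and all names below are ours), hierarchy rules can be compiled into a preorder by taking a reflexive-transitive closure:

```python
# Hypothetical sketch: compile hierarchy rules such as
# cat(x) -> mammal(x) into a preorder on predicates, so the rewriting
# step can test subsumption directly instead of producing one
# rewriting per rule application.

def compile_preorder(hierarchy_rules):
    """hierarchy_rules: iterable of (sub, super) predicate pairs.
    Returns the reflexive-transitive closure as a dict mapping each
    predicate to the set of more general predicates."""
    preds = {p for pair in hierarchy_rules for p in pair}
    closure = {p: {p} for p in preds}          # reflexivity
    changed = True
    while changed:                             # naive transitive closure
        changed = False
        for sub, sup in hierarchy_rules:
            before = len(closure[sub])
            closure[sub] |= closure[sup]
            changed |= len(closure[sub]) != before
    return closure

order = compile_preorder([('cat', 'mammal'), ('mammal', 'animal')])
# 'animal' is in order['cat']: a query atom animal(?x) is matched by
# cat(?x) without generating a separate rewriting for each rule.
```

The compiled preorder is computed once per ontology; the exponential blow-up caused by enumerating hierarchy rules during rewriting is thus avoided.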

A Better Approximation of Chase Termination for Existential Rules and their Extension to Non-monotonic Negation

Forward chaining with existential rules is known as the chase in databases. Various acyclicity notions ensuring chase termination have been proposed in the knowledge representation and database literature. These conditions can be classified into two main families: the first one constrains the way existential variables are propagated during the chase, while the second one constrains dependencies between rules, i.e., the fact that applying one rule may trigger another. These conditions are based on different graphs, but all of them can be seen as forbidding “dangerous” cycles in the considered graph. We defined a new family of graphs that allows us to unify and strictly generalize these acyclicity notions without increasing worst-case complexity.

Second, we considered the extension of existential rules with nonmonotonic negation under stable model semantics, and further extended the acyclicity results obtained in the positive case by exploiting negative information.

  • This work is part of Fabien Garreau and Swan Rocher's PhD theses. Results published at the European Conference on Artificial Intelligence (ECAI 2014) [30] (long version as an arXiv report) and at the Workshop on Non-monotonic Reasoning (NMR 2014) [31].
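The second family of conditions can be caricatured as follows (a deliberately coarse sketch at the level of predicate names; the actual acyclicity notions in the cited papers are much finer-grained, and all names below are ours): build a graph with an edge from rule r1 to rule r2 whenever a head predicate of r1 occurs in the body of r2, and forbid cycles.

```python
# Over-approximate rule dependencies at the predicate level and
# reject cycles: if no rule can (transitively) retrigger itself,
# the chase terminates under this crude criterion.

def dependency_graph(rules):
    """rules: list of (body_preds, head_preds) pairs, each a set of
    predicate names. Returns adjacency sets over rule indices."""
    graph = {i: set() for i in range(len(rules))}
    for i, (_, head_i) in enumerate(rules):
        for j, (body_j, _) in enumerate(rules):
            if head_i & body_j:        # r_i may trigger r_j
                graph[i].add(j)
    return graph

def has_cycle(graph):
    """Standard depth-first search cycle detection."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False
    return any(color[v] == WHITE and visit(v) for v in graph)

cyclic = [({'p'}, {'q'}), ({'q'}, {'p'})]      # p -> q and q -> p
acyclic = [({'p'}, {'q'}), ({'q'}, {'r'})]     # p -> q -> r, no cycle
```

Refined notions (based on positions, existential variables, or actual unifiability rather than mere predicate sharing) accept strictly more rule sets; this is precisely the kind of generalization the new family of graphs unifies.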

Detailed Results and Complements on Query Answering under Greedy Bounded-Treewidth Sets of Existential Rules

The family of greedy bounded-treewidth sets of existential rules (gbts) is an expressive class of rules for which entailment is decidable. This decidability property relies on a structural property of the saturation produced by the chase (i.e., the set of inferred facts): for any initial set of facts, their saturation has bounded treewidth (where the treewidth is computed on a graph associated with a set of atoms). Moreover, a tree decomposition of bounded width can be incrementally built during the chase. This family generalizes the important family of guarded existential rules, which itself generalizes the Horn description logics used in OBDA.

In papers published at IJCAI 2011 and KR 2012, we studied the complexity of entailment under gbts rules, as well as under known subclasses of gbts (with respect to data, combined and query complexity), and provided a generic algorithm with optimal worst-case complexity. This year, we completed a long report (75 pages) containing the detailed proofs of these results, some of them very technical; in this report, we also clarified and reformulated the description of the generic algorithm, following Michael Thomazo's PhD thesis (defended in October 2013); finally, we complemented the landscape of gbts classes by studying the complexity of all subclasses obtained by combining the syntactic criteria that define the already known classes.

  • Results available as an arXiv report [56]. Submitted to a major journal in Artificial Intelligence. In collaboration with Sebastian Rudolph (TU Dresden) and Michael Thomazo (now a postdoctoral researcher in Sebastian Rudolph's group).

Extracting Bounded-level Modules from Deductive RDF Triplestores

The Semantic Web is consolidating a legacy of well-established knowledge bases, spanning from the life sciences to geographic data and encyclopedic repositories. Today, reusing knowledge and data available online is vital to ensure a coherent development of the Semantic Web, thereby capitalizing on the efforts made in recent years by many institutions and domain experts to publish quality information.

In this work, we studied how to extract modules from RDF knowledge bases equipped with Datalog inference rules, which we call Deductive RDF Triplestores. A module is a Deductive RDF Triplestore entailed by the reference system and defined over a restricted vocabulary (or signature). We proposed a new semantics for bounded-level modules, which allows their size to be controlled, and then presented extraction algorithms compliant with this novel semantics. This feature is helpful since many ontologies are extremely large, while users often need to reuse only a small part of their resources.

This work was partially carried out before the arrival of Federico Ulliana at GraphIK. For the future, we plan to study module extraction for knowledge bases equipped with existential rules, which extend the rules considered here.

  • Results published at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI 15) [44] . In collaboration with Marie-Christine Rousset from LIG (University of Grenoble).
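The bounded-level intuition can be sketched as follows (a hypothetical, predicate-level toy; the actual algorithms operate on triplestores and Datalog rules, and all names below are ours): starting from the signature of interest, follow rule dependencies backwards for at most a fixed number of levels, so the extracted module cannot grow beyond that bound.

```python
# Toy bounded-level module extraction: keep only the rules reachable
# from the signature within `level` backward steps.

def extract_module(rules, signature, level):
    """rules: list of (body_preds, head_preds) pairs (sets of
    predicate names). Returns (module, sig): the indices of the kept
    rules and the expanded signature."""
    sig = set(signature)
    module = set()
    for _ in range(level):
        # rules whose head produces a predicate we currently care about
        new = {i for i, (body, head) in enumerate(rules)
               if head & sig and i not in module}
        if not new:
            break
        module |= new
        for i in new:
            sig |= rules[i][0]      # their body predicates become relevant
    return module, sig

rules = [({'b'}, {'a'}),            # b(x) -> a(x)
         ({'c'}, {'b'})]            # c(x) -> b(x)
# With level 1, only the first rule is extracted; with level 2, both.
```

The level parameter is what keeps the module small even when the full closure of the signature would drag in most of a very large ontology.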

Axiomatisation of Consistent Query Answering via Belief Revision

This work takes place in the OBQA setting, where a query is asked over a set of knowledge bases defined over a common ontology. When the union of the knowledge bases along with the ontology is inconsistent, several inconsistency-tolerant semantics have been defined. These semantics all rely on computing repairs, i.e., maximal (in terms of set inclusion) consistent subsets of the data set, and have been studied from both a productivity and a complexity point of view. We take a new point of view and define axiomatic characterisations of two such semantics, namely IAR (Intersection of All Repairs) and ICR (Intersection of Closed Repairs). We argue that such characterisations provide an alternative way of comparing the semantics and new insights into their properties. Furthermore, such an axiomatisation can be used when proposing generalisations of inconsistency-tolerant semantics. In order to provide the axiomatic characterisations, we define belief revision operators that correspond to IAR and ICR.

  • Work published in [43]. In collaboration with Ricardo Rodriguez from the University of Buenos Aires.
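To make the repair-based definitions concrete (a brute-force toy sketch; the contribution above is axiomatic, and the consistency check below abstracts away entailment with respect to the ontology; all names are ours), the IAR semantics can be computed by enumerating repairs and intersecting them:

```python
# Toy computation of IAR: enumerate all maximal consistent subsets
# of the data (the repairs), then keep the facts common to all of them.

from itertools import combinations

def repairs(facts, consistent):
    """All subset-maximal consistent subsets of `facts`.
    `consistent` is a predicate on frozensets of facts."""
    facts = list(facts)
    found = []
    # enumerate subsets from largest to smallest, so maximality
    # reduces to not being included in an already found repair
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            s = frozenset(subset)
            if consistent(s) and not any(s < r for r in found):
                found.append(s)
    return found

def iar_core(facts, consistent):
    """Intersection of All Repairs: the facts every repair keeps."""
    rs = repairs(facts, consistent)
    return frozenset.intersection(*rs) if rs else frozenset()

# Example: facts 'a' and 'b' contradict each other; 'c' is harmless.
# The repairs are {a, c} and {b, c}, so only 'c' survives under IAR.
consistent = lambda s: not {'a', 'b'} <= s
```

Under this sketch, a query holds under IAR iff it is entailed by the core alone; ICR would instead intersect the deductive closures of the repairs, which can accept strictly more answers.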