

Section: New Results

Ontology-Mediated Query Answering

Participants : Jean-François Baget, Meghyn Bienvenu, Efstathios Delivorias, Michel Leclère, Marie-Laure Mugnier, Olivier Rodriguez, Federico Ulliana.

Ontology-mediated query answering (OMQA) is the problem of querying data while taking into account inferences enabled by ontological knowledge. From an abstract viewpoint, this gives rise to knowledge bases, composed of an ontology and a factbase (in database terms: a database instance under the incomplete-data assumption). Answers to queries are those logically entailed by the knowledge base.

This year, we obtained two kinds of results: theoretical results on fundamental issues raised by OMQA, and practical algorithms for OMQA on key-value stores and RDF integration systems.

Fundamental issues on OMQA with existential rules

Existential rules (a.k.a. datalog+, as this framework generalizes the deductive database language datalog) have emerged as a new ontological language in the OMQA context. Techniques for query answering under existential rules mostly rely on the two classical ways of processing rules, namely forward chaining and backward chaining. In forward chaining, known as the chase in database theory, the rules are applied to enrich the factbase, and query answering is then solved by evaluating the query against the saturated factbase (as in a classical database system, i.e., without further reference to the ontological knowledge). The backward chaining process is divided into two steps: first, the query is rewritten using the rules into a first-order query (typically a union of conjunctive queries, but it can take a more compact form) or into a datalog query; then the rewritten query is evaluated against the factbase (again, as in a classical database system). Depending on the considered class of existential rules, the chase and/or query rewriting may or may not terminate.
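The two strategies can be illustrated on a toy knowledge base. The following is a minimal sketch, not the algorithms used in our tools: atoms are encoded as tuples, variables as strings starting with "?", and nulls as strings starting with "_:" (all of these encodings, as well as the predicate names, are assumptions made for the example). The chase shown is a naive breadth-first restricted chase, and the query rewriting is written by hand rather than computed.

```python
from itertools import count

def is_var(term):
    return term.startswith("?")

def match(atoms, facts, subst=None):
    """Yield each substitution that maps all atoms into the set of facts."""
    subst = dict(subst or {})
    if not atoms:
        yield subst
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        s = dict(subst)
        # setdefault binds an unbound variable, or returns its existing binding
        if all((s.setdefault(a, v) if is_var(a) else a) == v
               for a, v in zip(args, fact[1:])):
            yield from match(rest, facts, s)

def chase(facts, rules, max_rounds=10):
    """Forward chaining: apply rules (body, head) until nothing new is produced."""
    facts, fresh = set(facts), count()
    for _ in range(max_rounds):
        new = set()
        for body, head in rules:
            for s in list(match(body, facts)):
                if next(match(head, facts | new, s), None) is not None:
                    continue  # head already satisfied: nothing to add
                for atom in head:
                    for t in atom[1:]:
                        if is_var(t) and t not in s:
                            s[t] = f"_:n{next(fresh)}"  # fresh null for an existential variable
                new |= {(a[0], *(s.get(t, t) for t in a[1:])) for a in head}
        if new <= facts:
            return facts
        facts |= new
    raise RuntimeError("no fixpoint within max_rounds; the chase may not terminate")

# researcher(x) -> ∃y memberOf(x, y) ∧ team(y)
rules = [([("researcher", "?x")], [("memberOf", "?x", "?y"), ("team", "?y")])]
facts = {("researcher", "alice")}
query = [("memberOf", "?x", "?t"), ("team", "?t")]

# Forward chaining: saturate the factbase, then evaluate the original query.
forward = {s["?x"] for s in match(query, chase(facts, rules))}

# Backward chaining: evaluate a (hand-written) union of conjunctive queries
# on the unmodified factbase.
rewriting = [query, [("researcher", "?x")]]
backward = {s["?x"] for q in rewriting for s in match(q, facts)}
```

Both strategies return alice, whereas evaluating the original query directly on the factbase returns nothing, since the membership fact is only implied by the rule.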

Decidability of chase termination for linear existential rules.

Several chase variants have long been studied in database theory. These chase variants yield logically equivalent results, but differ in their ability to detect redundancies possibly caused by the introduction of unknown individuals (nulls, blank nodes). Given a chase variant, the chase termination problem takes as input a set of existential rules and asks whether this set of rules ensures the termination of the chase for any factbase. It is well known that this problem is undecidable for all known chase variants. Hence, a crucial issue is whether chase termination becomes decidable for some known subclasses of existential rules. We considered linear existential rules, a simple yet important subclass of existential rules that generalizes database inclusion dependencies. We showed the decidability of the chase termination problem on linear rules for three main chase variants, namely the skolem (a.k.a. semi-oblivious), restricted (a.k.a. standard) and core chase. The restricted chase is the most used in practice; however, its study is notoriously tricky because the order in which rule applications are performed matters. Indeed, for the same factbase, some restricted chase sequences may terminate, while others may not. To obtain our results, we introduced a novel approach based on so-called derivation trees and a single notion of forbidden pattern. The simplicity of these structures makes them amenable to implementation. Besides the theoretical interest of a unified approach and new proofs, we provided the first positive decidability results (and complexity upper bounds) concerning the termination of the restricted chase, proving that chase termination on linear existential rules is decidable for both versions of the problem: Does every chase sequence terminate? Does some chase sequence terminate?

  • ICDT 2019 [29]. In collaboration with Michael Thomazo (Inria VALDA).
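The sensitivity of the restricted chase to the order of rule applications can be observed on a textbook-style example with two linear rules. The sketch below is only an illustration under stated assumptions: the two rules are hard-coded, facts p(x, y) are encoded as pairs, the rule names "exists" and "loop" are made up for the example, and the round limit is an arbitrary cut-off. It does not implement the derivation-tree technique mentioned above.

```python
from itertools import count

def restricted_chase(facts, rule_order, max_rounds=5):
    """Breadth-first restricted chase for two hard-coded linear rules over p.

    "exists": p(x, y) -> ∃z p(y, z)
    "loop":   p(x, y) -> p(y, y)
    Within each round, rules are tried in the order given by rule_order.
    Returns (facts, reached_fixpoint).
    """
    facts, fresh = set(facts), count()
    for _round in range(max_rounds):
        snapshot = sorted(facts)  # triggers available at the start of the round
        changed = False
        for rule in rule_order:
            for _x, y in snapshot:
                if rule == "loop" and (y, y) not in facts:
                    facts.add((y, y))
                    changed = True
                elif rule == "exists" and not any(a == y for a, _b in facts):
                    # applied only if no fact p(y, _) exists yet (restricted chase test)
                    facts.add((y, f"_:n{next(fresh)}"))
                    changed = True
        if not changed:
            return facts, True
    return facts, False  # cut off: no fixpoint within max_rounds

start = {("a", "b")}
loop_first = restricted_chase(start, ["loop", "exists"])
exists_first = restricted_chase(start, ["exists", "loop"])
```

When "loop" is tried first, p(b, b) is added and already satisfies the head of "exists", so a fixpoint with two facts is reached. When "exists" is tried first, each round creates a new null before the loop is added, and the cut-off is hit. Reaching the cut-off is of course not a proof of non-termination; here it simply matches the fact that this application order keeps producing new nulls.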

Boundedness: Enforcing both chase termination and first-order rewritability.

We carried out the first studies on the boundedness problem for existential rules. This problem asks whether a given set of existential rules is bounded, i.e., whether there is a predefined bound on the “depth” of the chase that holds independently of the factbase (for breadth-first chase versions, the depth corresponds to the number of breadth-first steps). Boundedness has been extensively studied in the context of datalog, where it is key to query optimization, although it is undecidable in general. For datalog rules, boundedness is equivalent to a desirable property, namely first-order rewritability: a set of rules is called first-order rewritable if any conjunctive query can be rewritten into a union of conjunctive queries whose evaluation on any factbase yields the expected answers (i.e., the relevant part of the ontology can be compiled into the rewritten query, which reduces query answering to a simple query evaluation task). This equivalence does not hold for existential rules. Moreover, the notion of boundedness has to be parametrized by the chase variant, as chase variants behave differently with respect to termination. Besides its potential practical use, the notion of boundedness is closely related to an interesting theoretical question on existential rules: what are the relationships between chase termination and first-order query rewritability? With respect to this question, we obtained the following salient result: for the oblivious and skolem (semi-oblivious) chase variants, a set of existential rules is bounded if and only if it ensures both chase termination for any factbase and first-order rewritability for any conjunctive query.

  • IJCAI 2019 [22]. In collaboration with Pierre Bourhis (Inria SPIRALS) and Sophie Tison (Inria LINKS).
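The notion of depth can be made concrete in the simpler datalog setting. The following sketch (the rule sets, predicate names and helper functions are chosen for the example only) counts breadth-first saturation rounds for a transitivity rule, whose depth grows with the factbase, and for a two-rule hierarchy, whose depth does not.

```python
def depth(facts, step):
    """Number of breadth-first rounds needed before `step` adds nothing new."""
    facts, rounds = set(facts), 0
    while True:
        new = step(facts) - facts
        if not new:
            return rounds
        facts |= new
        rounds += 1

def transitivity(facts):
    # e(x, y) ∧ e(y, z) -> e(x, z)
    return {("e", x, z)
            for (_p, x, y) in facts
            for (_q, y2, z) in facts if y == y2}

def hierarchy(facts):
    # a(x) -> b(x)   and   b(x) -> c(x)
    successor = {"a": "b", "b": "c"}
    return {(successor[p], x) for (p, x) in facts if p in successor}

def chain(n):
    """Factbase e(0, 1), e(1, 2), ..., e(n-1, n)."""
    return {("e", i, i + 1) for i in range(n)}

unbounded = [depth(chain(n), transitivity) for n in (2, 4, 8, 16)]
bounded = [depth({("a", i) for i in range(n)}, hierarchy) for n in (1, 10, 100)]
```

On chains of length 2, 4, 8 and 16 the transitivity rule needs 1, 2, 3 and 4 rounds, so no bound is independent of the factbase, while the hierarchy always saturates in 2 rounds. For existential rules the same measurement additionally depends on the chase variant, which this datalog-only sketch does not capture.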

Practical Algorithms for OMQA on key-value stores and RDF integration systems

Ontology-mediated query answering on top of key-value stores.

Ontology-mediated query answering has so far been investigated mainly under the assumption that data conforms to relational structures (we include RDF here) and that the paradigm can be deployed on top of relational databases with conjunctive queries at the core (e.g., in SQL or SPARQL). However, this is not the predominant way in which data is stored and exchanged today, especially on the Web. Whether OMQA can be developed for non-relational structures, like those shared by the increasingly popular NoSQL languages that sustain Big Data analytics, has only begun to be investigated. Since 2016, we have been studying OMQA for key-value stores, which are systems providing fast and scalable access to JSON records. We proposed a rule language to express domain knowledge, with rules being directly applicable to key-value stores, without any translation of JSON into another data model (results published at AAAI 2016 and IJCAI 2017). In 2018-2019, we implemented a prototype for MongoDB, with a restricted part of this rule language (featuring key inclusions and mandatory keys) and tree-pattern queries, and devised optimization techniques based on parallelizing query rewriting and query answering. This work is being pursued in a PhD thesis that has just started (Olivier Rodriguez).

  • Rule-ML 2019 [31]. In collaboration with Reza Akbarinia (Inria ZENITH).
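To give an intuition of query rewriting over JSON records, here is a deliberately simplified sketch. It is not the rule language of the cited papers nor the MongoDB prototype: it assumes that a key inclusion (sub, sup) means "whatever is stored under key sub also counts as stored under key sup", it restricts queries to a single path of keys (a degenerate tree pattern), it ignores mandatory keys, and records are plain Python dictionaries. All key names are invented for the example.

```python
def rewrite(path, inclusions):
    """All key paths whose presence implies `path`, under key inclusions (sub, sup)."""
    results = {tuple(path)}
    frontier = [tuple(path)]
    while frontier:
        current = frontier.pop()
        for i, key in enumerate(current):
            for sub, sup in inclusions:
                if sup == key:
                    candidate = current[:i] + (sub,) + current[i + 1:]
                    if candidate not in results:
                        results.add(candidate)
                        frontier.append(candidate)
    return results

def has_path(record, path):
    """True if the nested dictionary contains the given sequence of keys."""
    node = record
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True

def answer(records, path, inclusions):
    """Evaluate the rewritten paths directly on the stored records."""
    paths = rewrite(path, inclusions)
    return [r for r in records if any(has_path(r, p) for p in paths)]

inclusions = [("phone", "contact"), ("email", "contact")]
records = [
    {"id": 1, "author": {"phone": "555-0100"}},
    {"id": 2, "author": {"contact": "a@example.org"}},
    {"id": 3, "title": "untitled"},
]
ids = sorted(r["id"] for r in answer(records, ("author", "contact"), inclusions))
```

The query path author/contact is rewritten into three paths (ending in contact, phone or email), and records 1 and 2 are returned without modifying the stored data. Since each rewritten path can be evaluated independently, this kind of rewriting lends itself to parallel evaluation.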

Ontology-mediated query answering in RDF integration systems.

Within the iCODA project devoted to data journalism and the co-supervision of Maxime Buron's PhD thesis, we are considering the so-called Ontology-Based Data Access framework, which is composed of three components: the data level, the ontological level, and mappings that relate data to facts described in the vocabulary of the ontology. More precisely, our framework considers heterogeneous data sources integrated through mappings into a (possibly virtual) RDF graph, provided with an RDFS ontology and RDFS entailment rules. The innovative aspects with respect to the state of the art are (i) SPARQL queries that extend classical conjunctive queries with the ability to query data and ontological triples together, and (ii) Global-Local-As-View (GLAV) mappings, which can be seen as source-to-target existential rules. GLAV mappings enable the creation of unknown entities (blank nodes), which increases the amount of information accessible through the integration system. In particular, they make it possible to compensate for missing data values by stating the existence of data whose values are not known in the sources. We devised, implemented and experimentally compared several query answering techniques in this setting.

  • ESWC 2019 [23], technical report [36], which is the basis of a paper accepted at EDBT 2020. In collaboration with Maxime Buron and Ioana Manolescu (Inria CEDAR), and François Goasdoué (IRISA).
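The role of blank nodes in GLAV mappings can be sketched as follows. This is an illustrative toy under stated assumptions, not the system evaluated in the cited papers: source rows are Python dictionaries, a mapping head is a list of triple templates in which "?name" refers to a column of the source row and "_:name" to an existential variable, and the ex: vocabulary is invented for the example. RDFS entailment and SPARQL evaluation are left out.

```python
from itertools import count

def apply_glav(rows, head, fresh):
    """Instantiate the triple templates in `head` once per source row."""
    triples = set()
    for row in rows:
        bnodes = {}  # one fresh blank node per existential variable and per row

        def value(term):
            if term.startswith("?"):    # variable bound by the source row
                return row[term[1:]]
            if term.startswith("_:"):   # existential variable of the mapping head
                if term not in bnodes:
                    bnodes[term] = f"_:b{next(fresh)}"
                return bnodes[term]
            return term                 # constant (IRI or literal)

        for s, p, o in head:
            triples.add((value(s), value(p), value(o)))
    return triples

# The source only records that a person is employed, not by which company.
rows = [{"person": "ex:alice"}, {"person": "ex:bob"}]
head = [("?person", "ex:worksFor", "_:c"),
        ("_:c", "rdf:type", "ex:Company")]
graph = apply_glav(rows, head, count())

# Who works for some company, even if the company is unknown?
employed = {s for (s, p, o) in graph
            if p == "ex:worksFor" and (o, "rdf:type", "ex:Company") in graph}
```

Each source row yields its own blank node standing for an unknown company, so the query asking who works for some company returns both persons although no company identifier exists in the source.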