Section: New Results

Logics and Graph-Based Languages for Ontology-Mediated Query Answering

Participants : Jean-François Baget, Meghyn Bienvenu, Efstathios Delivorias, Michel Leclère, Marie-Laure Mugnier, Federico Ulliana, Arthur Boixel, Marin Julien, Benjamin Boisson, Thibault Bondetti.

Ontology-mediated query answering (OMQA) is the problem of querying data while taking into account the inferences enabled by an ontology. In other words, the notion of a database is replaced by that of a knowledge base, composed of data (also called facts) and an ontology. Two families of formalisms for representing and reasoning with the ontological component are considered in this context: description logics (DLs) and existential rules (aka Datalog+). Both frameworks correspond to fragments of first-order logic that are incomparable in general but closely related in the context of OMQA: indeed, the DLs considered for OMQA (so-called Horn-DLs) naturally translate into specific classes of existential rules. Compared to existential rules, Horn-DLs have lower complexity and allow for specific algorithmic techniques. A well-known Horn-DL is the lightweight description logic family DL-Lite. Importantly, the foundational work carried out by the KR community led to the definition of several W3C standards for Semantic Web languages, namely the family of OWL languages. For example, DL-Lite corresponds to OWL 2 QL, a profile of OWL 2 for which conjunctive query answering is polynomial-time in data complexity (conjunctive queries are the basic and most frequent relational database queries). Furthermore, the ontology-based paradigm for data access is also supported by commercial systems, such as Oracle 11g, which offers a module dedicated to Semantic Web technologies (https://docs.oracle.com/cd/B28359_01/appdev.111/b28397/toc.htm).
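To make the notion of a knowledge base concrete, here is a minimal Python sketch of OMQA by forward chaining (a bounded restricted chase) with existential rules. The atom encoding, the "?" prefix for variables, the "_n" prefix for labelled nulls, the round bound and the example facts and rule are all illustrative assumptions; since the chase may not terminate, this bounded version returns sound but possibly incomplete answers.

```python
from itertools import count

# Atoms are tuples (predicate, term1, ..., termN). Terms starting with "?" are
# variables; terms starting with "_n" are labelled nulls created by the chase
# (assumption: no constant in the data starts with "_n").

def match(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        s, ok = dict(subst), True
        for term, value in zip(first[1:], fact[1:]):
            if term.startswith("?"):
                if s.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, s)

def chase(facts, rules, max_rounds=10):
    """Saturate facts with rules (body, head); head-only variables are existential."""
    facts, fresh = set(facts), count()
    for _ in range(max_rounds):
        new = set()
        for body, head in rules:
            for s in list(match(body, facts, {})):
                # Restricted chase: skip triggers whose head is already satisfied.
                if next(match(head, facts | new, s), None) is not None:
                    continue
                ext = dict(s)
                for atom in head:
                    for t in atom[1:]:
                        if t.startswith("?") and t not in ext:
                            ext[t] = f"_n{next(fresh)}"
                new |= {(a[0],) + tuple(ext.get(t, t) for t in a[1:]) for a in head}
        if not new:
            break
        facts |= new
    return facts

def certain_answers(query, answer_vars, facts, rules):
    """Answers of the CQ over the saturated facts, excluding tuples with nulls."""
    answers = set()
    for s in match(query, chase(facts, rules), {}):
        tup = tuple(s[v] for v in answer_vars)
        if not any(t.startswith("_n") for t in tup):
            answers.add(tup)
    return answers

FACTS = {("Professor", "alice"), ("teaches", "bob", "db101")}
# Professor(x) -> exists y. teaches(x, y) and Course(y)
RULES = [([("Professor", "?x")], [("teaches", "?x", "?y"), ("Course", "?y")])]
```

Here the query teaches(x, y) and Course(y) has the single answer alice: the rule entails that alice teaches some unknown course, whereas db101 is not known to be a Course, so bob is not returned.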

This year, we further investigated OMQA with both description logics and existential rules. We also broadened this research line by investigating ontological languages for non-relational data, thereby continuing the work initiated last year on OMQA for key-value stores.

Ontology-Mediated Query Answering in the Description Logics Framework

The OWL 2 QL profile, based on the DL-Lite family, is a popular ontology language for applications involving large amounts of data. OWL 2 QL enjoys the first-order rewritability property, meaning that conjunctive query answering can be reduced to database query evaluation by means of query rewriting. However, query rewriting can be costly and/or produce rewritten queries that are hard to evaluate, so it is important to understand when and how small and efficient rewritings can be constructed and, more generally, under which conditions OWL 2 QL can be queried effectively. Building upon our earlier work, we explored these questions together with colleagues from Birkbeck College and the Free University of Bozen-Bolzano.
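First-order rewritability can be illustrated on a deliberately tiny fragment. The Python sketch below is a simplification and not one of the rewriting algorithms studied in this work: it handles only atomic queries A(x) and hypothetical inclusions of the forms B ⊑ A and ∃R ⊑ A, rewrites the query into a union of SQL queries, and evaluates that union with sqlite3 on the raw data, with no reasoning at query time. The one-table-per-predicate layout and the example TBox are assumptions.

```python
import sqlite3

# Hypothetical TBox: (lhs, rhs) with lhs = ("concept", B) for B ⊑ rhs,
# or ("exists", R) for ∃R ⊑ rhs (anything with an R-successor is an rhs).
TBOX = [
    (("concept", "FullProfessor"), "Professor"),
    (("concept", "Professor"), "Faculty"),
    (("exists", "teaches"), "Faculty"),
]

def rewrite_atom(concept, tbox):
    """All left-hand sides from which concept(x) follows, including concept itself."""
    result, frontier = {("concept", concept)}, [concept]
    while frontier:
        c = frontier.pop()
        for lhs, rhs in tbox:
            if rhs == c and lhs not in result:
                result.add(lhs)
                if lhs[0] == "concept":
                    frontier.append(lhs[1])
    return result

def to_sql(concept, tbox):
    """Union of SELECTs over one table per predicate (assumed layout)."""
    parts = []
    for kind, name in sorted(rewrite_atom(concept, tbox)):
        col = "ind" if kind == "concept" else "subj"
        parts.append(f"SELECT {col} FROM {name}")
    return "\nUNION\n".join(parts)

def evaluate(concept, tbox, concepts, roles):
    """Load (A, ind) and (R, subj, obj) facts into SQLite and run the rewriting."""
    rw = rewrite_atom(concept, tbox)
    cnames = {n for k, n in rw if k == "concept"}
    rnames = {n for k, n in rw if k == "exists"}
    con = sqlite3.connect(":memory:")
    for n in cnames:
        con.execute(f"CREATE TABLE {n} (ind TEXT)")
    for n in rnames:
        con.execute(f"CREATE TABLE {n} (subj TEXT, obj TEXT)")
    for a, i in concepts:
        if a in cnames:
            con.execute(f"INSERT INTO {a} VALUES (?)", (i,))
    for r, s, o in roles:
        if r in rnames:
            con.execute(f"INSERT INTO {r} VALUES (?, ?)", (s, o))
    rows = con.execute(to_sql(concept, tbox)).fetchall()
    con.close()
    return {row[0] for row in rows}
```

With the facts FullProfessor(carol), Faculty(dan) and teaches(bob, db101), evaluating the rewriting of Faculty(x) returns bob, carol and dan, although only dan is stored as Faculty. Real OWL 2 QL rewriting must also handle role inclusions, inverse roles and non-atomic CQs, which is where the size issues discussed above arise.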

First, we studied the overhead of answering ontology-mediated queries (OMQs) in ontology-based data access compared to evaluating their underlying tree-shaped and bounded-treewidth conjunctive queries (CQs). We showed that OMQs with bounded-depth ontologies have nonrecursive datalog (NDL) rewritings that can be constructed and evaluated in LOGCFL (a class contained in PTIME) for combined complexity, and even in NL if their CQs are tree-shaped with a bounded number of leaves; such OMQs therefore incur no overhead in complexity-theoretic terms. For OMQs with arbitrary ontologies and bounded-leaf CQs, NDL-rewritings can be constructed and evaluated in LOGCFL. We conducted experiments that demonstrate the feasibility and scalability of our rewritings compared to standard NDL-rewritings.

  • These results were published at PODS 2017 [22]
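The CQ shapes mentioned above are easy to test. In this Python sketch (an illustration under stated assumptions, not code from the work reported here), a CQ is a list of unary and binary atoms whose terms are all variables; it is tree-shaped when its undirected graph on variables (one edge per binary atom, self-loops R(x, x) ignored) is a tree, and its leaves are the variables of degree 1.

```python
def gaifman_edges(cq):
    """Undirected edges {x, y} for binary atoms R(x, y) with x != y."""
    return {frozenset(atom[1:]) for atom in cq if len(atom) == 3 and atom[1] != atom[2]}

def variables(cq):
    return {t for atom in cq for t in atom[1:]}

def is_tree_shaped(cq):
    """True if the variable graph is connected and has |V| - 1 edges."""
    vs, edges = variables(cq), gaifman_edges(cq)
    if not vs or len(edges) != len(vs) - 1:
        return False
    adj = {v: set() for v in vs}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    # Depth-first search to check connectivity.
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vs)

def num_leaves(cq):
    """Number of variables of degree 1 in the variable graph."""
    degree = {v: 0 for v in variables(cq)}
    for e in gaifman_edges(cq):
        for v in e:
            degree[v] += 1
    return sum(1 for d in degree.values() if d == 1)

PATH = [("R", "x", "y"), ("S", "y", "z"), ("A", "z")]
STAR = [("R", "x", "y1"), ("R", "x", "y2"), ("R", "x", "y3")]
CYCLE = [("R", "x", "y"), ("R", "y", "z"), ("R", "z", "x")]
```

PATH is tree-shaped with 2 leaves, STAR with 3, and CYCLE is not tree-shaped; a class of bounded-leaf CQs fixes an upper bound on num_leaves.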

We investigated the parameterised complexity of answering tree-shaped ontology-mediated queries in OWL 2 QL under various restrictions on their ontologies and CQs. We proved that answering OMQs with tree-shaped CQs is not fixed-parameter tractable if the ontology depth is regarded as the parameter, and that answering OMQs with a fixed ontology (of infinite depth) is NP-complete for tree-shaped CQs and in LOGCFL for bounded-leaf CQs. Moreover, we constructed an ontology T such that answering OMQs (T, q) with tree-shaped CQs q is W[1]-hard if the number of leaves in q is regarded as the parameter. The number of leaves had previously been identified as an important characteristic of CQs, since bounding it leads to tractable OMQ answering. Our result shows that treating it as a parameter does not make the problem fixed-parameter tractable, even for a fixed ontology.

  • These results were published at DL 2017 [23]
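The ontology depth used as a parameter above can be pictured with a simplified computation. The Python sketch below considers only hypothetical axioms of the form A ⊑ ∃R.B and ignores every other inclusion (so it does not compute entailment); it measures the longest chain of anonymous elements such axioms generate, or infinity when a cycle makes these chains unbounded, which is the infinite-depth case mentioned above.

```python
import math

def depth(generating):
    """generating: list of (A, R, B) standing for axioms A ⊑ ∃R.B.
    Returns the longest chain of anonymous elements these axioms generate,
    or math.inf if a cycle makes such chains unbounded."""
    succ = {}
    for a, _role, b in generating:
        succ.setdefault(a, set()).add(b)
    memo, on_path = {}, set()

    def longest(c):
        if c in on_path:  # back to a concept on the current path: cycle
            return math.inf
        if c not in memo:
            on_path.add(c)
            memo[c] = max((1 + longest(b) for b in succ.get(c, ())), default=0)
            on_path.discard(c)
        return memo[c]

    concepts = set(succ) | {b for _a, _r, b in generating}
    return max((longest(c) for c in concepts), default=0)

FINITE = [("Professor", "teaches", "Course"), ("Course", "usesBook", "Book")]
CYCLIC = FINITE + [("Book", "cites", "Book")]
```

Here depth(FINITE) is 2 (a professor generates an anonymous course, which generates an anonymous book), while depth(CYCLIC) is infinite.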

Ontology-Mediated Query Answering in the Existential Rule Framework

The class of existential rules that naturally generalizes OWL 2 QL is called linear existential rules. Such rules have a body restricted to a single atom. Linear existential rules are in turn generalized by guarded existential rules, one of the main classes of existential rules.
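These syntactic conditions can be checked directly. In this Python sketch, a rule is a pair (body, head) of atom lists and variables are strings starting with "?" (both encoding assumptions for illustration); a rule is linear if its body has a single atom and guarded if some body atom, the guard, contains all body variables, so every linear rule is guarded.

```python
def variables(atoms):
    """Variables (terms starting with '?') occurring in a list of atoms."""
    return {t for atom in atoms for t in atom[1:] if t.startswith("?")}

def is_linear(rule):
    body, _head = rule
    return len(body) == 1

def is_guarded(rule):
    body, _head = rule
    body_vars = variables(body)
    # A guard is a body atom that contains every body variable.
    return any(body_vars <= variables([atom]) for atom in body)

LINEAR = ([("Professor", "?x")], [("teaches", "?x", "?y")])
GUARDED = ([("teaches", "?x", "?y"), ("Course", "?y")], [("hasRoom", "?y", "?z")])
TRANSITIVE = ([("partOf", "?x", "?y"), ("partOf", "?y", "?z")], [("partOf", "?x", "?z")])
```

LINEAR is both linear and guarded, GUARDED is guarded by teaches(x, y) but not linear, and the transitivity rule TRANSITIVE is not guarded, since no body atom contains x, y and z.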

Building upon our work on OWL 2 QL (reported in Section 7.1.1), we developed optimal rewriting-based methods for answering ontology-mediated queries (O, q) where O is a set of linear existential rules and q is a CQ of bounded hypertree width. Assuming that the arity of predicates is bounded, we showed that polynomial-size nonrecursive Datalog rewritings can be constructed and executed (i) in LOGCFL for OMQs with ontologies of bounded existential depth; (ii) in NL for OMQs with ontologies of bounded depth and CQs whose hypertree decompositions have a bounded number of leaves; (iii) in LOGCFL for OMQs with acyclic CQs whose join trees have a bounded number of leaves.

  • These results were published at DL 2017 [24]
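Acyclic CQs, which are exactly the CQs of hypertree width 1, can be recognized with the classical GYO reduction, sketched here in Python. Each atom is treated as the hyperedge of its terms, and all terms are assumed to be variables; the query is acyclic iff repeatedly removing variables that occur in a single hyperedge, and hyperedges contained in another one, empties the hypergraph.

```python
def is_acyclic(cq):
    """GYO reduction on the hypergraph whose hyperedges are the atoms' term sets."""
    edges = [set(atom[1:]) for atom in cq]
    changed = True
    while changed:
        changed = False
        # Remove variables that occur in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # Remove one hyperedge that is empty or contained in another hyperedge.
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

PATH = [("R", "x", "y"), ("S", "y", "z")]
TRIANGLE = [("R", "x", "y"), ("R", "y", "z"), ("R", "z", "x")]
COVERED_TRIANGLE = TRIANGLE + [("T", "x", "y", "z")]
```

The triangle is cyclic, but adding a ternary atom covering its three variables makes the query acyclic: this is the hypergraph notion of acyclicity used for join trees, not acyclicity of the underlying graph.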

While most work on ontology-mediated query answering considers conjunctive queries, navigational queries are gaining increasing attention. Last year, we conducted a first study of such queries in the setting of existential rules, focusing on linear rules and regular path queries. This year, in a continued collaboration with Michaël Thomazo (Inria CEDAR), we significantly extended these results by considering the problem of answering two-way conjunctive regular path queries (CRPQs) over knowledge bases whose ontology is given by a set of guarded existential rules. We first showed that for the subclass of linear existential rules, CRPQ answering is EXPTIME-complete in combined complexity and NL-complete in data complexity, matching the recently established bounds for answering non-conjunctive RPQs. For guarded rules, we gave a non-trivial reduction to the linear case, which allowed us to show that the complexity of CRPQ answering is the same as for CQs, namely 2EXPTIME-complete in combined complexity and PTIME-complete in data complexity.

  • These results were published at IJCAI 2017 [20]
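Regular path queries are typically evaluated by exploring the product of the data graph with an automaton for the regular expression. The Python sketch below does this for two-way RPQs over plain graph data only (no rules), encoding a backward traversal of an edge labelled R as the symbol "R-"; the encoding, the automata and the example graph are assumptions. A CRPQ would join several such binary relations, as a CQ joins atoms.

```python
from collections import deque

def rpq_answers(edges, delta, start, finals):
    """Pairs (x, y) linked by a path whose label word the NFA accepts.
    edges: set of (subject, label, object); delta: set of (q, symbol, q2),
    where symbol is a label R (forward) or "R-" (backward traversal)."""
    adj = {}
    for s, r, o in edges:
        adj.setdefault(s, []).append((r, o))
        adj.setdefault(o, []).append((r + "-", s))
    answers = set()
    for x in adj:
        # Breadth-first search over the product of the graph and the NFA.
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, q = queue.popleft()
            if q in finals:
                answers.add((x, node))
            for sym, nxt in adj[node]:
                for q1, a, q2 in delta:
                    if q1 == q and a == sym and (nxt, q2) not in seen:
                        seen.add((nxt, q2))
                        queue.append((nxt, q2))
    return answers

GRAPH = {("ann", "parentOf", "bob"), ("bob", "parentOf", "cat"), ("ann", "parentOf", "dan")}
# parentOf+ : descendants
DESC = {(0, "parentOf", 1), (1, "parentOf", 1)}
# parentOf- parentOf : go up to a parent, then down to one of its children
SIBLING_OR_SELF = {(0, "parentOf-", 1), (1, "parentOf", 2)}
```

With DESC (start state 0, final state 1) the answers are the ancestor/descendant pairs, and the two-way query SIBLING_OR_SELF (final state 2) relates bob and dan through their common parent ann.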

In addition, three internships (at the L3, Master 1 and Master 2 levels) explored different aspects of existential rules.

Ontology-Mediated Query Answering on top of Key-Value Stores

Ontology-mediated query answering has so far mainly been investigated under the assumption that data conforms to relational structures (including RDF) and that the paradigm can be deployed on top of relational databases with conjunctive queries at the core (e.g., in SQL or SPARQL). However, this is not the predominant way in which data is stored and exchanged today, especially on the Web. Whether OMQA can be developed for non-relational structures, like those supported by the increasingly popular NoSQL languages that sustain Big Data analytics, has only begun to be investigated. Last year, we carried out the first study of OMQA for key-value stores, which are systems providing fast and scalable access to JSON records [46]. We proposed a rule language to express domain knowledge, with rules being directly applicable to key-value stores, without any translation of JSON into another data model. However, our proposal had two limitations: (1) the absence of a correspondence with logic, the semantics remaining operational, and (2) the need to drastically restrict the rules to ensure decidability.
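The operational flavour of rules applied directly to JSON records can be conveyed with a small Python sketch. The rule shape used here ("every sub-record reached by a key path that has key k1 also has key k2") is a hypothetical simplification, not the rule language of [46], and None is used as an assumed marker for a value that is known to exist but whose content is unknown.

```python
import copy

def targets(node, path):
    """Yield the sub-records (dicts) reached by following the keys in path;
    lists are traversed element-wise."""
    if isinstance(node, list):
        for item in node:
            yield from targets(item, path)
    elif isinstance(node, dict):
        if not path:
            yield node
        elif path[0] in node:
            yield from targets(node[path[0]], path[1:])

def apply_mandatory_key(record, path, if_key, then_key):
    """Hypothetical rule: every sub-record at path having if_key also has then_key.
    Works on a copy, directly on the nested JSON structure."""
    result = copy.deepcopy(record)
    for sub in targets(result, path):
        if if_key in sub and then_key not in sub:
            sub[then_key] = None  # assumed marker: the value exists but is unknown
    return result

RECORD = {"name": "Ann", "courses": [{"code": "db101", "lecturer": "Bob"}, {"code": "kr102"}]}
```

Applying the rule "every course with a code has a lecturer" to RECORD leaves the first course unchanged and adds a lecturer key with an unknown value to the second one, without converting the record into another data model.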

Building on this previous work, we pursued the investigation of a rule language for JSON records, together with colleagues from Inria Lille. This yielded a novel rule language with a natural translation into first-order logic, and more precisely into guarded existential rules. From known results on existential rules, we obtained the decidability of query answering in our framework, but only rough complexity bounds. By establishing an interesting and non-trivial connection to word rewriting, we were able to pinpoint the exact combined complexity of query answering in our framework and to obtain promising tractability results for data complexity. The upper bounds were proven using a query reformulation technique, which can be implemented on top of key-value stores, thereby exploiting their querying facilities.

  • These results were published at IJCAI 2017 [21]
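Query reformulation, the technique behind the upper bounds above, can be illustrated in a much simpler setting. In this Python sketch, rules are hypothetical pairs (a, b) meaning "a (sub-)record with key a also has key b", applied at every level, and a query asks for the records having a given key path. Only the last key of the path can be obtained through the rules, since the value of an inferred key is unknown; the query is rewritten into a union of plain key-path queries that are then evaluated without the rules.

```python
def implying_keys(key, rules):
    """All keys whose presence entails key under the rules (reflexive-transitive closure)."""
    result, frontier = {key}, [key]
    while frontier:
        k = frontier.pop()
        for a, b in rules:
            if b == k and a not in result:
                result.add(a)
                frontier.append(a)
    return result

def reformulate(path, rules):
    """Rewrite 'records having key path path' into a union of plain key paths."""
    *prefix, last = path
    return sorted(tuple(prefix) + (k,) for k in implying_keys(last, rules))

def has_path(record, path):
    """Plain key-path check on nested dicts, with no reasoning."""
    node = record
    for k in path:
        if not isinstance(node, dict) or k not in node:
            return False
        node = node[k]
    return True

def answer(records, path, rules):
    queries = reformulate(path, rules)
    return [r for r in records if any(has_path(r, q) for q in queries)]

RULES = [("isbn", "publisher"), ("issn", "publisher")]
RECORDS = [{"book": {"isbn": "123"}}, {"book": {"title": "x"}}, {"book": {"publisher": "P"}}]
```

The query book.publisher is reformulated into the union of book.isbn, book.issn and book.publisher, which selects the first and third records.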

A Master's student project led to an implementation of OMQA for MongoDB tree-pattern queries and a subset of the proposed rule language featuring key inclusions and mandatory keys. The system contains query rewriting procedures for data access as well as an optimization module that parallelizes the query reformulation process.

  • Demo paper at BDA 2017 [31]
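A union of key-path queries, such as one produced by a reformulation step, maps naturally onto a MongoDB filter built with dot notation and the $exists and $or query operators. The Python sketch below only constructs the filter as a plain dict, so it runs without a database; the input paths are illustrative and this is not the code of the system described above.

```python
def to_mongo_filter(paths):
    """MongoDB filter matching documents that contain at least one key path.
    Paths use dot notation; $exists and $or are standard query operators."""
    if not paths:
        raise ValueError("at least one key path is required")
    clauses = [{".".join(p): {"$exists": True}} for p in paths]
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}

# Illustrative union of key paths, e.g. the output of a reformulation step.
UNION = [("book", "isbn"), ("book", "issn"), ("book", "publisher")]
FILTER = to_mongo_filter(UNION)
# With PyMongo, FILTER could then be passed to collection.find(FILTER).
```

Keeping the rewriting independent of the store and emitting a native filter at the end is one way to exploit the store's own query engine, as the reformulation approach intends.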

Applications to Computer Aided Design

Participant : Federico Ulliana.

Complementing the theoretical work on OMQA, the team also participated in building OMQA-based systems for the field of CAD (Computer Aided Design). We developed a system for querying and exploring complex 3D CAD models corresponding to assemblies of manufactured products (for example, an airplane wing). Our system features a pipeline of two modules: a geometric analysis module, which reasons on numerical features of the CAD model, and a knowledge-based module, which reasons on the symbolic information extracted by the former. The knowledge-based module exploits a geometry ontology for manufactured assemblies that we developed in collaboration with an expert. This allows for the automatic classification of the solids appearing in a 3D scene (for example, labelling screws and bolts), and also for associating them with their functional role (for example, planar supports, seals, rotating guides). By annotating objects automatically, we minimize the errors usually introduced by manual annotation. Complex CAD models can therefore be queried by selecting objects and components based on their types (e.g., select all bolts of an airplane wing) or functions (e.g., planar supports), with query results visualized in a 3D browser. This work is performed in the context of our collaboration with the Inria Imagine team. The website http://3dassblyanlysis.gforge.inria.fr/3d/ provides public access to a knowledge-based assembly example.

  • These results were published at EGC 2017 and received the Best Application Paper award
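The knowledge-based module can be pictured as rules that derive types and roles from the symbolic features extracted by the geometric module. The Python sketch below computes such labels by a simple fixpoint; the feature names, classes and rules are hypothetical and far simpler than the geometry ontology described above (in particular, functional roles such as planar supports really depend on relations between solids, reduced here to a per-solid feature).

```python
# Hypothetical rules: (required labels, derived label).
RULES = [
    ({"cylindrical", "threaded"}, "ThreadedPart"),
    ({"ThreadedPart", "has_head"}, "Screw"),
    ({"ThreadedPart", "hexagonal_outline"}, "Nut"),
    ({"planar_contact"}, "PlanarSupport"),
]

def classify(features, rules=RULES):
    """Forward-chain the rules to a fixpoint; return only the derived labels."""
    labels = set(features)
    changed = True
    while changed:
        changed = False
        for required, derived in rules:
            if required <= labels and derived not in labels:
                labels.add(derived)
                changed = True
    return labels - set(features)

def select(scene, label):
    """Identifiers of the solids in scene ({id: features}) classified with label."""
    return sorted(sid for sid, feats in scene.items() if label in classify(feats))

SCENE = {
    "s1": {"cylindrical", "threaded", "has_head"},
    "s2": {"cylindrical", "threaded", "hexagonal_outline"},
    "s3": {"planar_contact"},
}
```

A query such as "select all screws" then reduces to select(SCENE, "Screw"), which returns the identifiers of the solids to highlight in the 3D browser.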