Section: New Results

Querying Heterogeneous Linked Data


Aggregation refers to the computation of aggregates in databases, that is, the computation of a function of the answer set of a query, such as counting the number of answers, finding the optimal one for a given objective function, or enumerating all of them with a small delay between two distinct answers. The goal of aggregation is typically to compute such aggregates without explicitly generating the whole set of answers. We study aggregation problems within the ANR project Aggreg, coordinated by Niehren.
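As an illustration of computing an aggregate without generating the answers, the following minimal sketch counts the answers of a two-relation join. The query, the relation names `r` and `s`, and both function names are assumptions made for this example only; they are not taken from the project's work.

```python
from collections import Counter


def count_join_answers(r, s):
    """Count triples (a, b, c) with (a, b) in r and (b, c) in s.

    r and s are sets of pairs. The count is obtained by grouping on the
    join value b, so no answer triple is ever built.
    """
    r_by_b = Counter(b for _, b in r)   # number of R-tuples per join value
    s_by_b = Counter(b for b, _ in s)   # number of S-tuples per join value
    return sum(n * s_by_b[b] for b, n in r_by_b.items())


def enumerate_join_answers(r, s):
    """Naive baseline: materialise every answer triple."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]


r = {(1, "x"), (2, "x"), (3, "y")}
s = {("x", 10), ("x", 20), ("y", 30)}
print(count_join_answers(r, s))
```

The counting function only touches each input tuple once, whereas the baseline does work proportional to the number of answers, which can be quadratic in the input size.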

At ICALP, Bourhis (with Amarilli, Jachiet and Mengel) [13] developed a new algorithm that efficiently enumerates the solutions of certain types of circuits. They apply their result to give new proofs of previous results on efficient enumeration for queries defined by tree automata or for FO queries over structures of bounded treewidth: the circuits serve as aggregates that represent the set of all solutions of a query, which are then enumerated.

Again at ICALP [15], Bacquey, in a collaboration with Caen and Marseille (Grandjean and Olive), proved that linear time complexity on cellular automata is exactly characterized by inductive first-order Horn formulas. The method of proof also implies the following result: the enumeration of the ground atoms that are consequences of any inductive first-order Horn formula on a given structure can be performed in linear time (in the cardinality of the domain of the structure) by a cellular automaton (of appropriate dimension).


Provenance is a type of aggregate that aims to exhibit the contributions of the tuples of a database to a query answer. This makes it possible to explain the query answers, which can help to judge their reliability. Provenance is studied within the ANR project Aggreg.
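To make the notion concrete, here is a minimal sketch of Boolean provenance for a small join query. It stores the provenance of each answer as a flat disjunction of conjunctions of tuple identifiers, which is a simplification chosen for readability and not the circuit representation studied in the project. The query, the identifiers `r1`, `s1`, and the function names are hypothetical.

```python
def provenance(r, s):
    """Provenance of each answer a of the query  Q(a) :- R(a, b), S(b, c).

    r and s map tuple identifiers to tuples. The result maps each answer
    to a set of clauses; each clause is the set of tuple identifiers
    that jointly produce the answer.
    """
    prov = {}
    for rid, (a, b) in r.items():
        for sid, (b2, _c) in s.items():
            if b == b2:
                prov.setdefault(a, set()).add(frozenset({rid, sid}))
    return prov


def holds(clauses, present):
    """True if the answer survives when only the tuples in `present` exist."""
    return any(clause <= present for clause in clauses)


r = {"r1": (1, "x"), "r2": (1, "y")}
s = {"s1": ("x", 10), "s2": ("y", 20)}
prov = provenance(r, s)
print(holds(prov[1], {"r1", "s1"}))
```

With such a representation one can ask which input tuples an answer depends on, for example to check whether it still holds once an unreliable tuple is removed.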

In a paper at ICDT [14], Bourhis (with Amarilli, Monet and Senellart) studies the combined complexity of computing circuit representations of the provenance, which are used to efficiently evaluate aggregation tasks. In particular, they exhibit a recursive query language, capturing path queries, for which a compact representation of the provenance can be computed.

Recursive Queries

At PODS [21], P. Bourhis proposed a formalisation of JSON documents, query languages and schemas. This work is a collaboration with Chile. After having defined a clean theoretical framework to study JSON documents, Bourhis and his co-authors study the decidability and complexity of navigational query answering for different languages, relating each of them to existing implementations. Finally, they extend the documents with recursion together with a suitable query language, and study the complexity of query evaluation and query answering in this case.
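The kind of navigation involved can be sketched as follows. This is only an illustration under simple assumptions (documents as parsed Python values, paths as lists of keys and array indices); the function names are hypothetical and the code does not reproduce the query languages formalised in [21].

```python
def navigate(doc, path):
    """Follow a sequence of object keys or array indices from the root.

    Returns the value reached, or None if some step cannot be taken
    (so a stored JSON null is not distinguished from a failed step).
    """
    node = doc
    for step in path:
        if isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(node, list) and isinstance(step, int) and 0 <= step < len(node):
            node = node[step]
        else:
            return None
    return node


def descendants_with_key(doc, key):
    """Recursive navigation: yield every value stored under `key` at any depth."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == key:
                yield v
            yield from descendants_with_key(v, key)
    elif isinstance(doc, list):
        for v in doc:
            yield from descendants_with_key(v, key)


doc = {"name": {"first": "Ada"}, "tags": ["a", "b"], "child": {"name": "Bob"}}
print(navigate(doc, ["name", "first"]))
print(list(descendants_with_key(doc, "name")))
```

The first function corresponds to deterministic, key-by-key navigation, while the second adds a recursive step that explores the document to arbitrary depth.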

At ICALP [17], P. Bourhis studied, in a collaboration with Oxford, the problem of definability in decidable fixpoint logics. Bourhis and his co-authors give new characterisations of the formulas that can be expressed in decidable logics with fixpoints. One of their main results is an effective characterisation of the formulas of the guarded negation fragment with fixpoints that can be expressed in the guarded fragment with fixpoints. Their techniques are then extended to effectively characterise the first-order formulas that can be defined in the guarded fragment.

A. Lemay contributed at ICDE [16] the gMark benchmark, a tool to generate large graph databases and associated sets of queries. This work was done in cooperation with Eindhoven and with former members of Links who are now in Lyon and Clermont-Ferrand. Its main strengths are great flexibility (the graph can be generated from a simple schema, but generation can also incorporate elaborate parameters), the ability to generate recursive queries, and the possibility of generating large sets of queries of a desired selectivity. This benchmark made it possible, for instance, to highlight the difficulties that existing query engines have with recursive queries of high selectivity.

Data Integration

P. Bourhis and S. Tison presented at IJCAI [18] (the top conference in Artificial Intelligence) a new ontology-mediated query answering (OMQA) system for JSON documents. This work is a collaboration with researchers from the University of Montpellier. The strength of their contribution lies in the fact that their ontology language is very expressive and yet yields a tractable query answering system. Moreover, they establish a non-trivial connection between their query answering system and term rewriting, allowing them to pinpoint the exact complexity of query answering and to evaluate queries directly over KV-stores.

Also at IJCAI [20], P. Bourhis studied guarded ontology languages that are compatible with cross products. This work was done in cooperation with Edinburgh and Vienna. The cross product is a useful modelling tool that allows one to connect every element of one relation to every element of another relation. However, in this paper, Bourhis and his co-authors show that its introduction into guarded ontologies, even when it is limited to two relations, quickly leads to the undecidability of query evaluation and query answering. Nevertheless, they isolate fragments where one can add cross products without losing the decidability of these problems, by restricting either the queries or the ontology.

Schema Validation

I. Boneva presented at ISWC [19] her work on ShEx 2.0 (Shape Expressions Language 2.0), a language to describe the vocabulary and the structure of an RDF graph. This work is a collaboration with Oviedo and MIT. The language is based on the notion of shapes, a typing system supporting algebraic operations, recursive references to other shapes, and Boolean combinations. In the paper, Boneva and her co-authors give efficient algorithms to test whether an RDF graph satisfies a shapes schema, together with implementation guidelines. Her research on the topic has also led to the publication of a book [25] on the validation of RDF data, containing among other things her contribution to ShEx.

JSON documents are basically unordered data trees. Schemas for unordered data trees can thus be defined by appropriate notions of tree automata for unordered trees, as studied in a systematic manner by Boiret, Hugot, and Niehren [11] in cooperation with Treinen from Paris 7. Alternatively, schemas can be defined by closed logic formulas in the logics proposed by the same authors in [12]. They showed that logics for unordered data trees with equality tests between the data values of sibling nodes remain decidable, and thus so do the equivalence problems of the corresponding tree automata. In contrast, the problem becomes undecidable when cousins are compared for equality of data values.
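The difference between sibling and cousin comparisons can be illustrated on parsed JSON values. The sketch below is an assumption-laden example, not the automata or logics of [11] and [12]: it treats the members of an object or array as the children of a node and tests whether two sibling leaves carry the same data value. The function name is hypothetical.

```python
import json


def has_equal_siblings(node):
    """True if some node has two children that are leaves with equal data values."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return False  # a leaf has no children
    seen = set()
    for child in children:
        if not isinstance(child, (dict, list)):
            # Keep the type in the key so that, e.g., true and 1 stay distinct.
            key = (type(child).__name__, child)
            if key in seen:
                return True
            seen.add(key)
    return any(has_equal_siblings(child) for child in children)


siblings = json.loads('{"name": "x", "alias": "x", "child": {"a": 1, "b": 2}}')
cousins = json.loads('{"p": {"a": 1}, "q": {"b": 1}}')
print(has_equal_siblings(siblings), has_equal_siblings(cousins))
```

In the second document the two equal values sit under different parents, so a test restricted to siblings does not relate them; comparing them would require the cousin tests for which the problem is stated above to become undecidable.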