##### BOREAL - 2022

2022
Activity report
Project-Team
BOREAL
RNSR: 202224285F
Research center
In partnership with:
Université de Montpellier, INRAE
Team name:
Knowledge Representation and Rule-Based Languages for Reasoning on Data
In collaboration with:
Laboratoire d'informatique, de robotique et de microélectronique de Montpellier (LIRMM), Ingénierie des Agropolymères et Technologies Emergentes (IATE)
Domain
Perception, Cognition and Interaction
Theme
Data and Knowledge Representation and Processing
Creation of the Project-Team: 2022 June 01

# Keywords

• A3. Data and knowledge
• A7.2. Logic in Computer Science
• A9. Artificial intelligence
• A9.8. Reasoning
• B3.5. Agronomy
• B6.5. Information systems

# 1 Team members, visitors, external collaborators

## Research Scientists

• Federico Ulliana [Team leader, Inria (on leave from Montpellier University), Researcher]
• Jean-Francois Baget [Inria, Researcher]
• Pierre Bisquert [INRAE, Researcher]
• Nofar Carmeli [Inria, from Oct 2022]
• David Carral Martinez [Inria, Researcher]

## Faculty Members

• Madalina Croitoru [Montpellier University, Professor, HDR]
• Michel Leclère [Montpellier University, Associate Professor]
• Marie-Laure Mugnier [Montpellier University, Professor, HDR]

## Post-Doctoral Fellow

• Maxime Buron [Inria, from Mar 2022 until Aug 2022]

## PhD Students

• Martin Jedwabny [Montpellier University]
• Elie Najm [Inria]
• Guillaume Perution-Kihli [Inria]
• Olivier Rodriguez [Inria]
• Mohamed Aziz Sfar Gandoura [EDF, CIFRE]

## Technical Staff

• Lucas Rouquette [CNRS, from Sep 2022 until Nov 2022]
• Florent Tornil [Inria, Engineer]

## Interns and Apprentices

• David Camarazo [Montpellier University, from Feb 2022 until Jul 2022]
• Paul Fontaine [Montpellier University, from May 2022 until Jul 2022]
• Laura Gruson [ENS, from Jun 2022 until Jul 2022]
• Lucas Rouquette [Montpellier University, until Jul 2022]
• Sandra Victor [Montpellier University, until Jul 2022]

## Administrative Assistant

• Annie Aliaga [Inria]

## External Collaborators

• Patrice Buche [INRAE, HDR]
• Alain Gutierrez [CNRS]

# 2 Overall objectives

Current information systems are grounded in the exploitation of data coming from an increasing number of heterogeneous sources. Today, coping with the variety of data requires novel paradigms for effectively accessing and querying information that adapt to the different types of sources, as well as declarative high-level languages to drive data processing and data quality tasks.

BOREAL is a team working at the crossroads of knowledge representation and reasoning and database theory. The team focuses on the study of foundational and applied issues of reasoning in a context of data variety. More specifically, the team aims at a better understanding of the logical fragments that are at the foundations of the frameworks used for exploiting corporate and Web data, and in particular of rule-based languages. This will pave the way to novel automated-reasoning and graph-based techniques that can be put at the service of data-centric applications exploiting heterogeneous and federated data. The team also aims at combining solid foundational and algorithmic work with software development and applications, with an emphasis on the field of agronomy.

# 3 Research program

The BOREAL team pursues a knowledge-based data management (KBDM) approach to tackle the grand challenges posed by data variety, with an important focus on the framework of existential rules. The idea of knowledge-based data management is to orchestrate access to a complex information system made of federated databases through a three-layer architecture, which is also common to data integration and ontology-based data access (OBDA). In this architecture, a set of heterogeneous data sources is connected to a knowledge base via a layer of mappings. KBDM defines the business logic of data-centric applications at the level of the knowledge base, and then automatically translates data services towards the heterogeneous sources through reasoning. This approach paves the way to a more principled use of complex information systems, with benefits for data scientists, data curators, and administrators alike. What really characterizes the KBDM approach is that it leverages formalized domain-specific knowledge, to abstract over heterogeneous data and achieve high-quality data integration, and expressive rule-based languages such as existential rules (and extensions thereof), to drive the effective exploitation of data through reasoning.

Our project focuses on the following issues related to knowledge-based data management.

#### Foundations of rule languages

A great deal of the power of a KBDM system comes from its rule base. A prominent research direction for the team is the analysis and design of rule languages for reasoning on data. It is well understood that enriching a language with novel features can significantly increase the complexity of reasoning tasks. Our goal is hence to identify rule languages featuring decidable query answering and static analysis while offering good tradeoffs between expressivity and complexity, so as to devise novel and practically useful rule-based frameworks.

#### Fine-grained complexity of query answering

The query answering problem is at the heart of many reasoning tasks in KBDM. From a complexity analysis point of view, since the database to query can be voluminous, it is not always enough to know that a certain task can be done in polynomial time. Hence, an important goal for us is to study the fine-grained complexity (that is, to find the degree of the polynomial that bounds the number of operations required) as well as the enumeration complexity of the query answering problem. The aim of this research direction is to obtain the theoretical knowledge required for practical query optimization.

#### Algorithms and optimizations for query answering

Reasoning-driven data management needs optimization to effectively exploit large volumes of data. We target the design of efficient and scalable algorithms for query answering. Our goal is to devise novel hybrid approaches that combine materialization and virtualization strategies and account for the interplay between the components of the KBDM system (data, mappings, rules). Our ambition is also to build new bridges between knowledge representation and data management by exploring the possibilities opened by the reuse of existing database technology to develop new reasoning systems.

#### Architectures for knowledge-based data integration

The wide range of possibilities in heterogeneous data integration gives rise to a family of KBDM architectures, one for each application context. Our goal is to study architectures inspired by emerging practical use cases, including federations of independent sources as well as multi-level architectures where KBDM systems are stacked to progressively distill information and obtain high-value data. We also focus on the types of mappings required to cope with heterogeneity, since data may differ along several dimensions such as format, refinement, dynamicity, and certainty; such mappings are required to build a unified view of a complex information system.

#### Quality of knowledge-based data integration

Knowledge-based data management systems can deliver high-quality data to users and applications. Yet, they also need mechanisms that assist data curators in constantly evaluating and improving all of their components, towards the ultimate goal of reaching the desired level of data integration. Our aim is to investigate explanation mechanisms able to justify answers to queries and to point out inconsistencies in the data. We are also interested in techniques for deriving, within a knowledge base, equivalent formulations of queries that are expressed outside of it, at the source level; these are critical for the verification of mappings and rules.

# 4 Application domains

## 4.1 Agronomy and Agroecology

Agronomy is increasingly at the center of important debates on the environmental impact of intensive agriculture, especially at large scale. Through our research collaborations with INRAE (National Research Institute for Agriculture, Food and Environment) and DFKI (German Institute for Artificial Intelligence), our goal is to contribute to defining new models, techniques, and applications that enable a better exploitation of the data generated in these fields, so as to put it at the service of decision-making processes.

Agronomy is a domain of strong expertise in the Montpellier area. Indeed, BOREAL is a joint team with INRAE, and we closely collaborate with two Montpellier research laboratories (UMR, “Unités Mixtes de Recherche”), namely IATE and ABSys. These collaborations can also reach further: for example, in the context of #DigitAg (Institute Convergences Agriculture Numérique, Section 9.3), our team participated in the joint Inria-INRAE “White Book” on digital agriculture, which can be considered a manifesto of the current challenges posed by digital agriculture 16.

A major issue for IATE (Engineering of Agro-polymers and Emerging Technologies) is to model the transformation of products in agrifood chains (i.e., the chain of all processes leading from some raw material, such as plants, to the final products, including waste treatment). This modeling has several objectives. It provides a better understanding of the processes from beginning to end, which aids decision making, with the aim of improving the quality of the products and decreasing the environmental impact (e.g., reducing waste, choosing the right food packaging). Tools are needed to integrate the heterogeneous data resulting from agrifood chains and to make it easier for data scientists to acquire these data for analysis.

A major issue for ABSys (Biodiversified Agrosystems) is the study of sustainable farming systems. It is now established that the restoration of sustainable farming systems requires the adoption of agroecological practices supporting the reintroduction of biodiversity in agroecosystems. Indeed, an agroecosystem should provide not only cash crops but also ecosystem services that support the durability of the farming system itself. This leads to more complex agroecosystems including a higher number of plant species. There is thus a crucial need for tools that would assist users in the design of such new agroecosystems, from researchers in agronomy to agricultural advisors and farmers.

Besides INRAE, our team also collaborates with two DFKI teams located in Osnabrück and Kaiserslautern in the context of a bilateral Inria-DFKI project (“R4Agri”, Section 9). From an application perspective, the major issue targeted by this project is the development of reasoning-based monitoring tools that can equip robotic or mechanical devices used on farms. These tools can be used to enhance agricultural processes but also to enforce regulations, for instance by checking that pesticide spraying remains at a safe distance from riverbanks. In this context, there is a need for tools allowing one to interpret and analyze the many types of sensor data generated on farms.

# 5 Highlights of the year

• The BOREAL Inria team was officially created on June 1st, 2022.
• The team had 5 conference papers in top venues (A*) of knowledge representation and reasoning and databases (PODS, VLDB, LICS, KR, IJCAI).
• In 2022, one of our 2021 papers 23 was recognized as a highlight of the year by the INRAE department Transform.

## 5.1 Awards

One of our papers at KR 2021 22 received a best paper award at BDA 2022 (the 38th conference on data management), the annual event of the French research community in data management.

# 6 New software and platforms

Participants: Jean-François Baget, Pierre Bisquert, Michel Leclère, Marie-Laure Mugnier, Guillaume Pérution-Kihli, Olivier Rodriguez, Federico Ulliana.

## 6.1 Software of the Team

### InteGraal

BOREAL is committed to a team effort to develop a platform for knowledge-based data management on heterogeneous and federated data, called InteGraal. The development of the tool started towards the end of 2020 with the goal of providing a major new version of the former Graal software.

The main developer of the InteGraal library is Florent Tornil (Inria engineer), who closely collaborates with several permanent and non-permanent members of the team. Since fall 2022, the work of Florent Tornil has also been supervised by the SED (Service d'Expérimentation et de Développement) of the Inria centre at Université Côte d’Azur. The role of the SED is to complement the team's activity by structuring software development cycles according to standard practices and guidelines on matters such as project management, continuous integration, and software distribution.

The InteGraal tool is available online on the Inria GitLab (gitlab.inria.fr/rules/integraal) under the Apache 2 licence. Overall, InteGraal aims, among other uses, at accelerating the implementation and testing of results from fundamental research. The tool is also a concrete means to structure discussions and collaborations with other academic and industrial partners.

#### InteGraal for Research Collaborations.

This year, the tool has been used extensively in the context of Elie Najm's PhD thesis 13, 19, 20 to define a knowledge-based data-integration system. InteGraal also proved useful for the two Master 2 internships of Lucas Rouquette and David Camarazo (Section 10.2.2). First, our interns used the tool to implement and test novel reasoning algorithms. Then, the tool was used to consolidate their research outcomes and integrate them within the platform, so as to make them reusable by the team. The tool has also been used to build two prototype demonstrators in the context of research projects with other teams. The first stems from a collaboration with INRAE on the integration of heterogeneous data to support machine learning tasks for decision making in food packaging selection (Section 10.1.3). The second stems from a collaboration with DFKI (German Institute for Artificial Intelligence) in the context of the R4Agri project (Section 9) on knowledge-based integration and reasoning on sensor data. In addition, the genericity and extensibility of the tool have been tested by bridging it with two other Inria software tools, namely CORESE (developed by WIMMICS) and Tatooine (developed by CEDAR). Finally, the tool served as the basis for demonstrations and discussions between the team and companies such as Anabasis Assets and, more recently, Naval Group.

### Other Tools

In the context of Olivier Rodriguez's PhD, we started developing a prototype system for reasoning on NoSQL systems, notably document-oriented key-value stores (or simply, document stores). This led to the creation in 2021 and 2022 of the TreeForce library, detailed next.

## 6.2 New software

### 6.2.1 InteGraal

• Name:
InteGraal : Knowledge-Representation and Reasoning for Data Integration
• Keywords:
Knowledge Bases, Data integration, Knowledge representation, Automated Reasoning, Heterogeneous Data, Knowledge Graphs
• Scientific Description:
InteGraal is a tool for integrating and reasoning on heterogeneous and federated data. The tool embodies algorithms and techniques developed at the crossroads between the fields of knowledge representation and reasoning and data management. Historically, this tool is the result of a complete re-engineering of the Graal tool, whose API and functionalities have been completely updated. Also, compared with Graal, the tool is much more oriented towards data integration.
• Functional Description:
InteGraal has been designed in a modular way, in order to facilitate software reuse and extension. It should make it easy to test new scenarios and techniques, in particular by combining algorithms. The main features of InteGraal are currently the following: (1) internal storage of data using an SQL or RDF representation (Postgres, MySQL, HSQL, SQLite, remote SPARQL endpoints, local in-memory triplestores) as well as a native in-memory representation; (2) data-integration capabilities for exploiting federated heterogeneous data sources through mappings able to target systems such as SQL, RDF, and black-box sources (e.g., Web APIs); (3) algorithms for query answering over heterogeneous and federated data based on query rewriting and/or forward chaining (or chase).
• Release Contributions:
2022: First release, software deposit under the Apache 2 licence. 2021: Functional specification, design and development of a major improved version of the tool. Started refactoring the API and several modules for knowledge base representation, data storage, query answering, and forward-chaining reasoning (chase). Started the development of new modules for handling heterogeneous data: mappings and federations.
• News of the Year:
2022: Finalized novel modules, namely (1) storage and mapping, (2) forward chaining. Started a novel backward chaining module (completion planned for the first half of 2023). Bidirectional bridges with the CORESE platform (https://project.inria.fr/corese/) and the query mediator Tatooine (https://hal.inria.fr/hal-01321201v2). The tool is already used by the team for PhD theses and master internships, and to build demonstrators for research projects.
• URL:
• Authors:
Florent Tornil, Guillaume Perution-Kihli, Clément Sipieter, Federico Ulliana, Jean-Francois Baget, Pierre Bisquert, David Carral Martinez, Michel Leclère, Marie-Laure Mugnier
• Contact:
Federico Ulliana

### 6.2.2 TreeForce

• Keywords:
JSON, Databases, Knowledge Bases, Automated Reasoning, Rewriting, NoSQL, Data integration, Knowledge representation, Heterogeneous Data
• Functional Description:
TreeForce is a Java tool for reasoning on tree data. It leverages query rewriting techniques and NoSQL document-oriented key-value stores. This library can be seen as a general toolbox for implementing reasoning techniques tailored for tree-shaped data and rules. It is composed of two main modules. The first includes generic data structures and algorithms for trees and tree automata. The second includes automata-based query rewriting techniques as well as efficient evaluation techniques for large sets of rewritings.
• Release Contributions:
2021: First version of TreeForce. Automata for unordered tree languages. Automata-based query-rewriting algorithms. MongoDB wrapper.
• News of the Year:
2022: Added novel instance-aware rewriting and evaluation algorithms. Introduced summarization, partitioning, and parallelization techniques.
• Contact:
Federico Ulliana

# 7 New results

Before presenting this year's results, we introduce some general notions about our main research focus, namely Knowledge-Based Data Management with existential rules (Section 7.1). This allows us to put into context our results on foundations and algorithms for rule-based reasoning (Sections 7.2 and 7.3).

## 7.1 Knowledge-Based Data Management with Existential Rules

Given the expressivity of existential rule languages and the complexity of the integration architectures it covers, this broad topic encompasses research areas such as ontology-mediated query answering (OMQA), data integration (DI), and ontology-based data access (OBDA).

#### Existential Rules.

Existential rules are first-order logic formulas representing implications of the form $\forall X \forall Y\, [\mathit{Body}(X,Y) \rightarrow \exists Z.\, \mathit{Head}(X,Z)]$, where Body and Head are positive conjunctions of atoms without function symbols, and Head can have existentially quantified variables. These rules allow one to model complex relationships over the domains of interest, and at the same time provide a value-invention mechanism through existentially quantified variables. This makes them suitable for many data and knowledge tasks on both open and closed domains. As a result, existential rules are ubiquitous in many fields. In databases, they are used to model dependencies, schema mappings, and expressive queries. As ontological languages, they are a valid complement to Description Logics and, at the same time, a generalization of the so-called Horn Description Logics, which lie at the foundations of important Semantic Web standards.
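To make the shape of such rules concrete, the Python sketch below stores a rule as a body and a head and recovers the three groups of variables in the formula: the frontier $X$ (shared by body and head), the body-only variables $Y$, and the existential variables $Z$. The encoding (variables as strings starting with `?`, atoms as `(predicate, arguments)` pairs) and the class name `ExistentialRule` are illustrative assumptions, not the API of any team tool.

```python
from dataclasses import dataclass

# Illustrative encoding (an assumption): terms are strings, variables start
# with "?", and an atom is a (predicate, arguments) pair.
Atom = tuple[str, tuple[str, ...]]


def is_var(term: str) -> bool:
    return term.startswith("?")


def vars_of(atoms: tuple[Atom, ...]) -> set[str]:
    return {t for _, args in atoms for t in args if is_var(t)}


@dataclass(frozen=True)
class ExistentialRule:
    body: tuple[Atom, ...]
    head: tuple[Atom, ...]

    def frontier(self) -> set[str]:
        """X: variables shared by the body and the head."""
        return vars_of(self.body) & vars_of(self.head)

    def body_only(self) -> set[str]:
        """Y: universally quantified variables occurring only in the body."""
        return vars_of(self.body) - vars_of(self.head)

    def existentials(self) -> set[str]:
        """Z: head variables absent from the body (value invention)."""
        return vars_of(self.head) - vars_of(self.body)


# hasParent(x, y) -> exists z. hasParent(y, z) /\ Person(z)
rule = ExistentialRule(
    body=(("hasParent", ("?x", "?y")),),
    head=(("hasParent", ("?y", "?z")), ("Person", ("?z",))),
)
print(sorted(rule.frontier()), sorted(rule.body_only()), sorted(rule.existentials()))
# prints: ['?y'] ['?x'] ['?z']
```

Here the variable `?z` is the one for which reasoning must invent a new value, which is exactly what distinguishes existential rules from Datalog rules.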

Given a query $Q$, a database $D$, and a set of rules $R$, query answering asks to determine whether $D, R \models Q$ (where $\models$ denotes standard first-order logic entailment), that is, whether the query $Q$ is a logical consequence of the knowledge base made of the database $D$ and the rules $R$. In the field of knowledge representation and reasoning, rule-based query answering is studied for rules expressing ontologies and is referred to as ontology-mediated query answering (OMQA). Formalisms such as Description Logics and existential rules (a.k.a. tuple-generating dependencies, or Datalog$^{\pm}$) are typically used to express ontologies. Overall, the main emphasis of this topic is on the study of rule languages and the role they play in query answering.
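When the rule set is empty, entailment of a Boolean conjunctive query reduces to a classical test: $D \models Q$ holds exactly when there is a homomorphism from $Q$ to $D$, i.e., a mapping of the query variables to database terms that sends every query atom onto a fact. The backtracking sketch below illustrates only this base case; the encoding and the function names `homomorphisms` and `entails` are assumptions made for the example, and taking rules into account requires one of the reasoning strategies discussed in this section.

```python
from __future__ import annotations

from collections.abc import Collection, Iterator

Atom = tuple[str, tuple[str, ...]]  # (predicate, arguments); variables start with "?"


def is_var(term: str) -> bool:
    return term.startswith("?")


def homomorphisms(query: list[Atom], facts: Collection[Atom],
                  sub: dict[str, str] | None = None) -> Iterator[dict[str, str]]:
    """Yield every substitution that maps all query atoms onto facts."""
    sub = {} if sub is None else sub
    if not query:
        yield sub
        return
    (pred, args), rest = query[0], query[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        ext = dict(sub)  # try to extend the current substitution with this fact
        if all(ext.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, fact_args)):
            yield from homomorphisms(rest, facts, ext)


def entails(facts: Collection[Atom], query: list[Atom]) -> bool:
    """D |= Q for a Boolean conjunctive query Q and an empty rule set."""
    return next(homomorphisms(query, facts), None) is not None


D = {("hasParent", ("alice", "bob")), ("Person", ("bob",))}
q1 = [("hasParent", ("alice", "?y")), ("Person", ("?y",))]  # exists y. hasParent(alice,y) /\ Person(y)
q2 = [("hasParent", ("bob", "?y"))]                        # exists y. hasParent(bob,y)
print(entails(D, q1), entails(D, q2))  # prints: True False
```

With the rule $\mathit{Person}(x) \rightarrow \exists z.\, \mathit{hasParent}(x,z)$ added, $q_2$ would become entailed, which this base case cannot detect on its own.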

#### Rule-based Query Answering over Heterogeneous and Federated Data

In this context, the problem formulation remains similar; however, the database $D$ is replaced by a more complex notion of federation $(\mathcal{D}, \mathcal{M}, \mathcal{S})$, where $\mathcal{D}$ is a collection of heterogeneous data sources, $\mathcal{S}$ is a global integration schema, and $\mathcal{M}$ is a set of mappings linking the data sources in $\mathcal{D}$ to the global schema $\mathcal{S}$. This framework is at the foundations of data integration (DI) in databases and of ontology-based data access (OBDA) in knowledge representation and reasoning. OBDA focuses on global integration schemas and on rules built on ontologies that enable query rewriting, while DI is more concerned with rules representing data dependencies. Overall, both put more emphasis on heterogeneous and federated data in rule-based query answering.
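As a toy illustration of the mapping layer $\mathcal{M}$, the sketch below computes the facts over a global schema that a set of simple mappings produce from in-memory sources, in the style of global-as-view (GAV) mappings. The `Mapping` class, the source and predicate names, and the data are hypothetical; actual systems such as InteGraal target real SQL, RDF, or Web-API sources through their own mapping formats.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass

Row = dict[str, object]
Fact = tuple[str, tuple[str, ...]]


@dataclass(frozen=True)
class Mapping:
    """Hypothetical GAV-style mapping: each row of `source` satisfying
    `condition` yields one fact target(row[c1], ..., row[cn])."""
    source: str
    target: str
    columns: tuple[str, ...]
    condition: Callable[[Row], bool] = lambda row: True


def virtual_facts(sources: dict[str, list[Row]], mappings: list[Mapping]) -> set[Fact]:
    """Facts over the global schema induced by the mappings on the sources."""
    facts: set[Fact] = set()
    for m in mappings:
        for row in sources[m.source]:
            if m.condition(row):
                facts.add((m.target, tuple(str(row[c]) for c in m.columns)))
    return facts


# Toy sources with heterogeneous column names (illustrative data only).
sources = {
    "plants_csv": [{"species": "vetch", "family": "Fabaceae"},
                   {"species": "oat", "family": "Poaceae"}],
    "traits_db": [{"sp": "vetch", "fixes_n": True},
                  {"sp": "oat", "fixes_n": False}],
}
mappings = [
    Mapping("plants_csv", "inFamily", ("species", "family")),
    Mapping("traits_db", "NitrogenFixer", ("sp",), lambda row: bool(row["fixes_n"])),
]
for fact in sorted(virtual_facts(sources, mappings)):
    print(fact)
```

Computing these facts and then applying the rules corresponds to a materialization approach; a virtualization approach would instead translate a rewritten query through the mappings into queries over the sources.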

#### Reasoning Strategies for Query Answering

The two prominent strategies for rule-based query answering are materialization (also known as saturation, or forward chaining) and virtualization (also known as query rewriting, or backward chaining). Both can be seen as ways of reducing query answering (which involves reasoning) to classical query evaluation. Materialization amounts to storing the inferences enabled by the rules, thereby obtaining an extended database on which queries are evaluated. Query rewriting amounts to compiling the relevant rules into the query, thereby obtaining a rewritten query (usually a union of queries), which is evaluated on the (unaltered) database. Both approaches have their own strengths; at the basis of this duality is the fact that materialization is independent of queries, whereas rewriting is independent of the database. Hence, each strategy better suits certain application scenarios, and both can be combined into hybrid approaches.
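The duality can be observed on a deliberately small fragment: unary Datalog rules $A(x) \rightarrow B(x)$ (a class hierarchy) and atomic queries $B(x)$. In the sketch below, whose predicates and data are toy choices for illustration, `materialize` never looks at the query and `rewrite` never looks at the database, yet both strategies return the same answers. Existential rules need far more machinery (joins, fresh values, piece-unifiers), so this only illustrates the two strategies rather than implementing either for existential rules.

```python
# Toy knowledge base (illustrative): unary facts (predicate, constant) and
# rules body(x) -> head(x), encoded as (body, head) pairs.
RULES = [("NitrogenFixer", "Legume"), ("Legume", "Plant"), ("Plant", "Organism")]
DATABASE = {("NitrogenFixer", "vetch"), ("Plant", "oat")}


def materialize(db, rules):
    """Forward chaining: add rule consequences until a fixpoint is reached."""
    facts = set(db)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, const in list(facts):
                if pred == body and (head, const) not in facts:
                    facts.add((head, const))
                    changed = True
    return facts


def rewrite(query_pred, rules):
    """Backward chaining: all predicates P such that P(x) implies query_pred(x).
    The rewriting is the union of atomic queries P(x) for P in the result."""
    rewriting, todo = {query_pred}, [query_pred]
    while todo:
        target = todo.pop()
        for body, head in rules:
            if head == target and body not in rewriting:
                rewriting.add(body)
                todo.append(body)
    return rewriting


def answers_by_materialization(query_pred, db, rules):
    return {c for p, c in materialize(db, rules) if p == query_pred}


def answers_by_rewriting(query_pred, db, rules):
    preds = rewrite(query_pred, rules)
    return {c for p, c in db if p in preds}  # evaluated on the unaltered database


print(sorted(answers_by_materialization("Organism", DATABASE, RULES)))  # ['oat', 'vetch']
print(sorted(answers_by_rewriting("Organism", DATABASE, RULES)))        # ['oat', 'vetch']
```

Materialization pays its cost once per database update, while rewriting pays it once per query, which is one way to see why hybrid approaches are attractive.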

#### Contributions

This year, we studied a number of theoretical, algorithmic, and applied questions of knowledge-based data management with existential rules. Our main contributions cover the following topics:

• foundational issues related to the termination of reasoning strategies;
• novel rule-transformation procedures leading to efficient reasoning algorithms;
• the use of rules and graphs to model and reason on complex systems (notably in agronomy).

This work led to the publications presented next.

Besides our main publications, it is worth noting that the team also supervised a number of student internships (Section 10.2.2) investigating other foundational and applied issues of reasoning. Finally, complementing methodological work, the team made an important effort in the development of the InteGraal tool for rule-based query answering over heterogeneous and federated data (Section 6.2).

#### Note

In what follows, we do not present our IJCAI 2022 publication on the expressivity of chase-terminating rule sets 10, since it is an extended abstract of a 2021 paper 21 that received the best paper award at the international KR Conference (A*). We also do not present our paper on the composition of mapping rules 22, published at the international KR Conference in 2021, which received the best paper award at the 2022 French Database Conference (BDA).

## 7.2 Foundations of Rule-based Reasoning

Participants: Jean-François Baget, Pierre Bisquert, Maxime Buron, David Carral, Michel Leclère, Marie-Laure Mugnier, Guillaume Pérution-Kihli, Olivier Rodriguez, Federico Ulliana.

Our main contributions this year revolved around static analysis problems related to the termination of reasoning strategies. Indeed, since both materialization and virtualization rely on a fixpoint operator, they may not terminate. It thus becomes essential to be able to decide whether a given set of rules ensures the termination of a given reasoning strategy before the strategy is actually executed.

#### Chase Termination

The most widespread materialization procedure for existential rules is called the chase. There are several variants of the chase, which differ in terms of their computational cost and termination behavior. The most powerful variant is the core-chase, which is known to terminate if and only if the knowledge base admits a finite universal model (a model that maps homomorphically into every other model of the knowledge base, and hence suffices to answer unions of conjunctive queries). Other variants include the restricted and (semi-)oblivious chases. These are less aggressive in suppressing the redundancies produced by materialization and may not terminate even if a finite universal model of the knowledge base exists. A ruleset for which the core-chase terminates on every possible database instance is called a finite expansion set (FES).
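To make the role of termination concrete, here is a minimal sketch of the restricted chase, using an illustrative encoding of atoms as `(predicate, arguments)` pairs with `?`-prefixed variables. A trigger (a rule together with a match of its body) fires only when its head is not already satisfied, and each existential variable receives a fresh null. Because a chase run may be infinite, the function stops after `max_rounds` rounds, a parameter introduced here for illustration; the sketch neither implements the core-chase nor decides termination.

```python
from itertools import count

Atom = tuple[str, tuple[str, ...]]
Rule = tuple[list[Atom], list[Atom]]  # (body, head)


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(atoms, facts, sub):
    """Yield extensions of `sub` that map every atom of `atoms` onto a fact."""
    if not atoms:
        yield sub
        return
    pred, args = atoms[0]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        ext = dict(sub)
        if all(ext.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, fact_args)):
            yield from matches(atoms[1:], facts, ext)


def restricted_chase(database, rules, max_rounds=10):
    """Return (facts, terminated); terminated is False if max_rounds was hit."""
    facts, fresh = set(database), count()
    for _ in range(max_rounds):
        fired = False
        for body, head in rules:
            for h in list(matches(body, list(facts), {})):
                if next(matches(head, list(facts), h), None) is not None:
                    continue  # trigger not active: the head is already satisfied
                ext = dict(h)
                for _pred, args in head:
                    for t in args:
                        if is_var(t) and t not in ext:
                            ext[t] = f"_n{next(fresh)}"  # fresh null for an existential
                facts.update((p, tuple(ext.get(t, t) for t in a)) for p, a in head)
                fired = True
        if not fired:
            return facts, True
    return facts, False


db = {("Person", ("alice",))}
has_parent: Rule = ([("Person", ("?x",))], [("hasParent", ("?x", "?z"))])
infinite_ancestry: Rule = ([("Person", ("?x",))],
                           [("hasParent", ("?x", "?z")), ("Person", ("?z",))])
print(restricted_chase(db, [has_parent])[1])                       # True
print(restricted_chase(db, [infinite_ancestry], max_rounds=5)[1])  # False
```

The second rule set has no finite universal model on this database (every person needs a new parent who is a person), so no chase variant can terminate on it, whereas the first one stops after a single rule application.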

#### Rewriting Termination

First-order rewritability is a property of the sets of existential rules for which every input conjunctive query (CQ) admits a finite equivalent first-order logic formula, called its rewriting. Here, equivalence is meant in the sense of query answering: for every database, the database together with the rules entails the input query if and only if the same database, without the rules, entails the rewriting. This notion coincides with that of finite unification set (FUS), i.e., the sets of existential rules for which breadth-first query rewriting based on piece-unifiers terminates. Interestingly, this query-rewritability property has a dual formulation in terms of materialization, called the bounded derivation depth property (BDD).
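A small worked example may help; the rules, predicates, and query below are illustrative and not taken from the team's papers. Each breadth-first step unifies a query atom with a rule head and replaces it with the corresponding body, until every newly produced query is subsumed by one already produced.

$$
\begin{aligned}
\mathcal{R} &= \{\, A(x) \rightarrow \exists z.\, r(x,z),\;\; r(x,y) \rightarrow B(x) \,\}, \qquad Q(x) = B(x)\\
Q_1(x) &= \exists y.\, r(x,y) && \text{(unify $B(x)$ with the head of the second rule)}\\
Q_2(x) &= A(x) && \text{(unify $r(x,y)$ with $r(x,z)$; allowed since $y$ is not an answer variable)}\\
Q^{*}(x) &= B(x) \lor \exists y.\, r(x,y) \lor A(x)
\end{aligned}
$$

For $D = \{A(a)\}$ we have $D, \mathcal{R} \models B(a)$, and indeed $D \models Q^{*}(a)$ through the disjunct $A(a)$. By contrast, with the transitivity rule $r(x,y) \land r(y,z) \rightarrow r(x,z)$, the query $r(a,b)$ yields $r(a,b)$, $\exists y_1.\, r(a,y_1) \land r(y_1,b)$, $\exists y_1 y_2.\, r(a,y_1) \land r(y_1,y_2) \land r(y_2,b)$, and so on: paths of every length, none subsumed by a shorter one, so no finite rewriting exists and this rule set is not FUS.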

### Normalisation of Rules

In the literature, it is often assumed that existential rules are in some normal form that simplifies technical developments. For instance, a common assumption is that rule heads are atomic, i.e., restricted to a single atom. Such assumptions are usually considered to be made without loss of generality as long as all sets of rules can be normalised while preserving logical entailment. However, an important question is whether the properties that ensure the decidability of reasoning are preserved as well. We carried out a systematic study of the impact of normalisation procedures on two properties that underpin most known classes of decidable rules: the termination of the chase (and, actually, of different variants of the chase) and the first-order rewritability of conjunctive queries.

• Published at KR 2022 11 with Michael Thomazo (Inria Paris, Valda Team).
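One common way to obtain single-head rules introduces a fresh predicate over all head variables: only the first produced rule keeps existential variables, while the others are Datalog rules. The sketch below illustrates this kind of procedure with the same illustrative atom encoding as before; the `Aux` naming scheme is an assumption, and the sketch does not reproduce the specific procedures compared in the paper.

```python
from itertools import count

Atom = tuple[str, tuple[str, ...]]
Rule = tuple[list[Atom], list[Atom]]  # (body, head)
_fresh_ids = count()


def is_var(term: str) -> bool:
    return term.startswith("?")


def single_head(rule: Rule) -> list[Rule]:
    """Replace B -> H1 /\\ ... /\\ Hn by B -> Aux(V) and Aux(V) -> Hi for each i,
    where V lists every variable of the head (frontier and existential)."""
    body, head = rule
    if len(head) <= 1:
        return [rule]
    head_vars: list[str] = []
    for _pred, args in head:
        for t in args:
            if is_var(t) and t not in head_vars:
                head_vars.append(t)
    aux: Atom = (f"Aux{next(_fresh_ids)}", tuple(head_vars))  # fresh predicate
    return [(body, [aux])] + [([aux], [atom]) for atom in head]


rule: Rule = ([("Person", ("?x",))],
              [("hasParent", ("?x", "?z")), ("Person", ("?z",))])
for b, h in single_head(rule):
    print(b, "->", h)
```

The transformation preserves entailment over the original predicates, but the extra derivation step through the fresh predicate changes how a chase proceeds, which is why the preservation of termination properties has to be studied separately.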

### Query Rewritability

The expressivity of existential rules enjoying first-order rewritability is still not fully understood. An open conjecture on this matter states that any theory that is both a finite expansion set (FES) and admits finite query rewriting (FUS) must be uniformly bounded (i.e., there is a bound on the depth of reasoning that is independent of the input data). Starting from this FES/FUS conjecture, we showed that it holds for a large class of FUS theories, which we call “local”. Investigating how “non-local” FUS theories can actually get, we discovered that they can lead to rewritings containing elements that are exponentially larger than the input query. This unexpected phenomenon is, we think, at odds with prevailing intuitions on FUS rulesets.

• Published at PODS 2022 14 with Piotr Ostropolski-Nalewaja (University of Wrocław), Jerzy Marcinkowski (University of Wrocław), and Sebastian Rudolph (TU Dresden).

As a side topic, this year we also investigated the satisfiability problem for expressive temporal logic formalisms.

### Satisfiability of HyperLTL

Among the most studied fragments of first-order logic is LTL (Linear Temporal Logic). This logic is used to specify properties of programs and to reason about them, more precisely about their execution traces. The fundamental problem in this context is the satisfiability problem, which asks whether there exists a model in which a formula holds. We studied the satisfiability problem for HyperLTL, which extends LTL with so-called trace quantification, allowing one, for instance, to express the non-interference of two simultaneous execution traces. We characterized the complexity of the satisfiability problem for several fragments of HyperLTL, thereby also identifying new interesting decidable fragments.

• Published at LICS 2022 9 with Raven Beutner (CISPA Helmholtz Center for Information Security), Bernd Finkbeiner (CISPA Helmholtz Center for Information Security), Jana Hofmann (CISPA Helmholtz Center for Information Security), and Markus Krötzsch (TU Dresden).
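For illustration, trace quantification lets one state information-flow requirements over pairs of traces. The formula below is a standard example of this kind (observational determinism) rather than one taken from the paper, with hypothetical atomic propositions $\mathit{in}$ and $\mathit{out}$: any two traces $\pi$ and $\pi'$ that always agree on the public input must always agree on the public output.

$$
\forall \pi.\, \forall \pi'.\; \mathbf{G}\,(\mathit{in}_{\pi} \leftrightarrow \mathit{in}_{\pi'}) \;\rightarrow\; \mathbf{G}\,(\mathit{out}_{\pi} \leftrightarrow \mathit{out}_{\pi'})
$$

Plain LTL cannot express such a requirement, since an LTL formula constrains each trace in isolation.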

## 7.3 Algorithms for Rule-based Reasoning

Participants: Jean-François Baget, Pierre Bisquert, Maxime Buron, David Carral, Michel Leclère, Marie-Laure Mugnier, Guillaume Pérution-Kihli, Olivier Rodriguez, Federico Ulliana.

An existential rule that has no existentially quantified variable in its head is called a Datalog rule. This language has long been studied, and several efficient Datalog engines are available nowadays. This year, we studied new rule transformation procedures whose output targets the Datalog language. Our results show that they can lead to novel efficient algorithms for reasoning tasks in both Description Logics (precisely, EL with nominal schemas) and existential rules (precisely, guarded rules), which can be efficiently implemented by leveraging existing Datalog reasoners.
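Since Datalog rules never invent values, bottom-up evaluation only produces facts over the constants of the input and always reaches a fixpoint. A classic evaluation technique, semi-naive evaluation, joins at each round only the facts derived in the previous round. The sketch below applies it to a transitive-closure program; it is a textbook illustration, not the algorithm of any particular engine used by the team.

```python
def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Semi-naive evaluation of the Datalog program
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    Each round joins only the newly derived path facts (delta) with edge."""
    successors: dict[str, set[str]] = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)
    path = set(edges)   # facts derived by the first rule
    delta = set(edges)  # facts derived in the last round
    while delta:        # terminates: no fresh values, so finitely many possible facts
        derived = {(x, z) for x, y in delta for z in successors.get(y, ())}
        delta = derived - path
        path |= delta
    return path


print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Restricting joins to the delta avoids re-deriving, at every round, the facts already obtained in earlier rounds.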

### EL Ontologies with Nominal Schemas

Nominal schemas have been proposed as an extension to Description Logics (DL), the knowledge representation paradigm underlying the Web Ontology Language (OWL). They provide a very tight integration of DL and rules. Nominal schemas can be understood as syntactic sugar on top of OWL; however, this naive perspective leads to inefficient reasoning procedures. To develop an efficient reasoning procedure for the language ELV++, which results from extending the OWL profile OWL EL with nominal schemas, we propose a transformation from ELV++ ontologies into Datalog-like rule programs that can be used for satisfiability checking and assertion retrieval. This transformation makes it possible to use powerful Datalog engines to solve reasoning tasks over ELV++ ontologies. We implemented and evaluated our approach on several real-world, data-intensive ontologies, and found that it can outperform state-of-the-art DL reasoners such as Konclude and ELK.

• Published in the Journal of Logic and Computation 5 with Joseph Zalewski (Kansas State University) and Pascal Hitzler (Kansas State University).

### Guarded Existential Rules

Guarded existential rules, also known as guarded tuple-generating dependencies (GTGDs), are a natural extension of Description Logics and referential constraints. It has long been known that queries over GTGDs can be answered by a variant of the chase. However, there has been little work on concrete algorithms and even less on implementation. To address this gap, we revisit Datalog rewriting approaches to query answering, where GTGDs are transformed into a Datalog program that entails the same base facts on each base instance. We show that the rewriting can be seen as containing "shortcut" rules that circumvent certain chase steps. We present several algorithms that compute the rewriting by simulating specific types of chase steps, and we discuss important implementation issues. Finally, we show empirically that our techniques can process complex GTGDs derived from synthetic and real benchmarks and are thus suitable for practical use.

• Published at VLDB 2022 8 with Michael Benedikt (Oxford University), Stefano Germano (Oxford University), Kevin Kappelmann (Technical University of Munich) and Boris Motik (Oxford University).
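The idea of "shortcut" rules can be sketched on a toy set of three guarded rules. The Python code below is an illustration under our own assumptions, not the algorithms of the paper: it implements a small oblivious chase that invents labelled nulls for existential variables, and compares the base facts it derives with those obtained by plain Datalog evaluation of a hand-written rewriting. In that rewriting, the shortcut A(x) → C(x) replaces the chase steps that go through the null introduced by A(x) → ∃y R(x, y). The predicate names A, R, B and C are hypothetical.

```python
from itertools import count

# An atom is a tuple (predicate, term, ...); terms starting with "?" are variables.
# A rule is (body_atoms, head_atoms); head variables absent from the body are existential.


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(body, facts, subst=None):
    """Yield every substitution that maps all body atoms onto facts."""
    subst = {} if subst is None else subst
    if not body:
        yield subst
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        s, ok = dict(subst), True
        for term, value in zip(first[1:], fact[1:]):
            bound = s.setdefault(term, value) if is_var(term) else term
            if bound != value:
                ok = False
                break
        if ok:
            yield from matches(rest, facts, s)


def chase(facts, rules):
    """Oblivious chase: apply each trigger once, inventing a fresh null (_n0, _n1, ...)
    for every existential variable. With Datalog rules only, this is plain
    bottom-up evaluation."""
    facts, applied, fresh = set(facts), set(), count()
    changed = True
    while changed:
        changed = False
        for i, (body, head) in enumerate(rules):
            for s in list(matches(body, facts)):
                trigger = (i, tuple(sorted(s.items())))
                if trigger in applied:
                    continue
                applied.add(trigger)
                for atom in head:
                    for term in atom[1:]:
                        if is_var(term) and term not in s:
                            s[term] = f"_n{next(fresh)}"
                new = {(a[0], *(s.get(t, t) for t in a[1:])) for a in head}
                if not new <= facts:
                    facts |= new
                    changed = True
    return facts


def base_facts(facts):
    """Keep only the facts that mention no labelled null."""
    return {f for f in facts if not any(t.startswith("_n") for t in f[1:])}


GTGDS = [
    ([("A", "?x")], [("R", "?x", "?y")]),               # A(x) → ∃y R(x, y)
    ([("R", "?x", "?y")], [("B", "?y")]),               # R(x, y) → B(y)
    ([("R", "?x", "?y"), ("B", "?y")], [("C", "?x")]),  # R(x, y) ∧ B(y) → C(x)
]
# Hand-derived Datalog rewriting: the two full rules plus the shortcut A(x) → C(x).
REWRITING = GTGDS[1:] + [([("A", "?x")], [("C", "?x")])]
```

Because the rewriting contains no existential variables, the same `chase` function degenerates into naive bottom-up Datalog evaluation, which is why such rewritings can be handed to a Datalog engine.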

## 7.4 Applications of Rule-based Languages

Participants: Jean-François Baget, Pierre Bisquert, Madalina Croitoru, Martin Jedwabny, Marie-Laure Mugnier, Elie Najm, Mohammed Aziz Sfar Gandoura.

The team's main contributions on this topic include the use of rule-based languages to solve problems arising in agronomy and agroecology, stemming from the collaboration with INRAE carried out in the context of Elie Najm's PhD thesis (funded by #DigitAg). Furthermore, again in the context of agronomy, Marie-Laure Mugnier and Pierre Bisquert participated in the joint Inria-INRAE “White Book” on digital agriculture 16. In the context of Mohammed Aziz Sfar Gandoura's PhD thesis, carried out in collaboration with EDF research, we took some steps towards the development of graph-based knowledge representation tools for modeling and reasoning on power plants. Finally, as part of Martin Jedwabny's PhD thesis, we also investigated the use of ontologies and rules in the context of planning.

### Integrating Data and Knowledge for Decision Making in Agroecology

As part of Elie Najm's PhD thesis and in collaboration with the INRAE ABSys research lab in agroecology, we considered the problem of selecting service plant species according to their potential to provide ecosystem services. To tackle it, we adopted an approach based both on a formal representation of domain knowledge as logical rules and on the exploitation of available data, which was typically collected independently of the targeted application. The architecture of our system follows the one for rule-based query answering over heterogeneous and federated data that we presented previously. An experimental evaluation of the system on this use case showed that very satisfactory results can be obtained as long as the proportion of missing data values is not too high.

• Published at the RuleML+RR 2022 International Rule Challenge 13 and as part of Elie Najm's PhD thesis 19.
• Preprint submitted for publication 20, joint work with Christian Gary (INRAE), Raphaël Metral (INRAE) and Léo Garcia (INRAE).
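The effect of missing values on rule-based data access can be illustrated with a deliberately tiny sketch: a hypothetical source table is mapped to facts, two illustrative domain rules derive ecosystem services, and a query asks which species provide a given service. Species identifiers, field names, thresholds and rules are all assumptions made for this example; they are not the rule base or the data used in the project.

```python
# Hypothetical source table; None marks a missing value.
RECORDS = [
    {"species": "sp1", "family": "Fabaceae", "root_depth_cm": 60},
    {"species": "sp2", "family": "Poaceae", "root_depth_cm": None},
    {"species": "sp3", "family": None, "root_depth_cm": 20},
]


def map_to_facts(records):
    """Hypothetical mapping: each non-missing field becomes a fact."""
    facts = set()
    for r in records:
        if r["family"] is not None:
            facts.add(("family", r["species"], r["family"]))
        if r["root_depth_cm"] is not None:
            facts.add(("rootDepth", r["species"], r["root_depth_cm"]))
    return facts


def apply_rules(facts):
    """Two illustrative domain rules (not the project's rule base):
    family(p, Fabaceae)         → provides(p, nitrogen_fixation)
    rootDepth(p, d) ∧ d ≥ 50    → provides(p, soil_structuring)"""
    derived = set()
    for pred, species, value in facts:
        if pred == "family" and value == "Fabaceae":
            derived.add(("provides", species, "nitrogen_fixation"))
        elif pred == "rootDepth" and value >= 50:
            derived.add(("provides", species, "soil_structuring"))
    return derived


def species_providing(service, records):
    """Answer the query provides(p, service) over the mapped data."""
    derived = apply_rules(map_to_facts(records))
    return sorted(p for _, p, s in derived if s == service)
```

A species whose family value is missing (`sp3` here) cannot be returned for nitrogen fixation, even if it is a legume; filling in that value makes it appear. This is the kind of recall loss that grows with the proportion of missing values.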

### Graph-based Approaches for Monitoring Power-plants Controllers

As part of Mohammed Aziz Sfar Gandoura's PhD thesis, in collaboration with EDF research, we considered the problem of monitoring power-plant controllers. The function of power-plant controllers is typically described by functional specifications given as logical diagrams (LDs), which are used for verification and testing purposes. Motivated by a scenario from a real-world power plant specification, our work consisted in defining a formal structure that explicitly encodes the semantics and behavior of an LD. We designed a complete procedure that transforms non-formal LD specifications into a directed state graph, so that properties such as oscillatory behavior become formally verifiable on LDs.

• Published at FoIKS 2022 15 with Dina Irofiti (EDF).
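A drastically simplified sketch of the state-graph idea, assuming a purely Boolean, synchronous diagram with no timers: each valuation of the diagram's signals is a node, each node has exactly one successor computed by the diagram's logic, and the behavior starting from a given valuation is oscillatory when the run ends in a cycle of length greater than one. The two one-gate diagrams `toggle` and `latch` are hypothetical examples; the LD semantics defined in the paper is richer than this.

```python
from itertools import product
from typing import Callable

State = tuple[bool, ...]


def state_graph(n_signals: int, update: Callable[[State], State]) -> dict[State, State]:
    """Directed state graph of a synchronous diagram: every valuation of the
    n signals is a node, with one edge to the valuation computed at the next cycle."""
    return {s: update(s) for s in product((False, True), repeat=n_signals)}


def oscillates(graph: dict[State, State], start: State) -> bool:
    """Follow edges from `start`; the finite run must end in a cycle. The behavior
    is oscillatory iff that cycle has length > 1 (no stable state is reached)."""
    first_seen: dict[State, int] = {}
    state, step = start, 0
    while state not in first_seen:
        first_seen[state] = step
        state, step = graph[state], step + 1
    return step - first_seen[state] > 1


# Two hypothetical one-gate diagrams over signals (a, s), with input a held constant.
def toggle(st: State) -> State:  # s := (NOT s) AND a  (feedback through an inverter)
    a, s = st
    return (a, (not s) and a)


def latch(st: State) -> State:  # s := s OR a  (set-only latch)
    a, s = st
    return (a, s or a)
```

With input `a` set, `toggle` alternates between two states forever, whereas `latch` reaches a fixed point after one step.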

### Ontology-based Explanations for Planning

As part of Martin Jedwabny's PhD thesis, we investigated the use of ontologies for the explanation and scrutability of an artificial agent's decisions in the context of planning. In a nutshell, the idea is to provide a framework for selecting the level of precision a user needs to understand the agent's sequence of actions: the more knowledgeable the user, the more precisely the plan can be laid out, by unfolding actions into several layers of complexity. We provided a method to adjust on the fly the level of detail of actions (and of the corresponding fluents) using the concept of "transitional description". This work is relevant for user-adaptable explainable AI (XAI) that exploits both knowledge of the world and the user's level of expertise.

• Published at ICCS 2022 12 and as part of Martin Jedwabny's PhD thesis 18.
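The unfolding of actions into layers of detail can be sketched as a recursive expansion over a hypothetical action hierarchy, where `depth` stands in for the user's level of expertise. This only illustrates the layering idea under our own assumptions: it does not model fluents or the "transitional description" mechanism of the paper, and the action names are invented.

```python
# Hypothetical action hierarchy: abstract action -> finer-grained sub-actions.
HIERARCHY: dict[str, list[str]] = {
    "deliver(box)": ["fetch(box)", "bring(box)"],
    "fetch(box)": ["goto(shelf)", "grasp(box)"],
    "bring(box)": ["goto(table)", "release(box)"],
}


def unfold(plan: list[str], depth: int) -> list[str]:
    """Expand every action into its sub-actions, at most `depth` times.
    Primitive actions (absent from HIERARCHY) are kept as they are."""
    if depth == 0:
        return list(plan)
    result: list[str] = []
    for action in plan:
        children = HIERARCHY.get(action)
        result.extend(unfold(children, depth - 1) if children else [action])
    return result
```

For instance, `unfold(["deliver(box)"], 1)` yields the two mid-level actions `fetch(box)` and `bring(box)`, while a depth of 2 or more reaches the primitive actions.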

# 8 Bilateral contracts and grants with industry

## 8.1 Bilateral Grants with Industry

Participants: Madalina Croitoru, Mohammed Aziz Sfar Gandoura.

Madalina Croitoru supervises a CIFRE PhD thesis funded by EDF Research (Paris). The thesis started in January 2022. Its topic is the automatic generation of test scenarios for the verification of complex power plant systems.

# 9 Partnerships and cooperations

## 9.1 International initiatives

#### R4Agri

Participants: Pierre Bisquert, David Carral, Marie-Laure Mugnier, Federico Ulliana.

• Title:
“R4Agri” - Reasoning on Agricultural Data: Integrating Metrics and Qualitative Perspectives
• Partner Institution(s):
• Inria
• DFKI, Germany
• Date/Duration:
42 months, started on 01/01/2022

AI tools supporting competitive and sustainable agriculture need to exploit highly diverse kinds of data and knowledge, from raw sensor data to high-level expert knowledge. Taking digital agriculture as the targeted application domain, the overall goal of the R4Agri project is to provide a framework for reasoning about knowledge based on heterogeneous data, with a focus on multi-modal and multi-scale sensor data. The main challenges include the context-dependent interpretation of sensor data, which involves reasoning about prior knowledge, and query answering techniques that exploit domain knowledge and flexibly accommodate the specificities of data sources. The application potential in this field, which has worldwide societal and ecological impact, will be demonstrated on realistic use cases. In 2022, Maxime Buron was recruited as a post-doc (from March to August), funded by this project, to work on optimizing query answering based on mappings to heterogeneous data sources and on implementing these optimizations within the InteGraal tool developed by the team.

## 9.2 International research visitors

### 9.2.1 Visits of international scientists

#### Other international visits to the team

##### Nofar Carmeli
• Status:
Postdoctoral Researcher
• Institution of origin:
ENS Paris; Valda Inria Research Team
• Country:
France
• Dates:
three days, June 2021
• Context of the visit:
Research stay

##### Lucía Gómez Álvarez
• Status:
Postdoctoral Researcher
• Institution of origin:
TU Dresden
• Country:
Germany
• Dates:
two weeks, June 2022
• Context of the visit:
Research stay

### 9.2.2 Visits to international teams

##### David Carral
• Visited institution:
TU Dresden
• Country:
Germany
• Dates:
three weeks, December 2022
• Context of the visit:
Research collaboration
• Mobility program/type of mobility:
research stay

## 9.3 National initiatives

#### CQFD (ANR PRC, Jan. 2019-Dec. 2024)

Participants: Jean-François Baget, Pierre Bisquert, Nofar Carmeli, David Carral, Michel Leclère, Marie-Laure Mugnier, Guillaume Pérution-Kihli, Olivier Rodriguez, Florent Tornil, Federico Ulliana.

CQFD (Complex ontological Queries over Federated heterogeneous Data), coordinated by Federico Ulliana (BOREAL), involves participants from Inria Saclay (CEDAR team), Inria Paris (VALDA team), Inria Nord Europe (SPIRALS team), IRISA, LIG, LTCI, and LaBRI. The aim of this project is to tackle two crucial challenges in OMQA (Ontology-Mediated Query Answering): heterogeneity, that is, the ability to deal with multiple types of data sources and database management systems, and federation, that is, the ability to cross-query a collection of heterogeneous data sources. By bringing together 8 partners across France, the project aims at consolidating a national community of researchers around OMQA.

#### Convergence institute #DigitAg (2017-2023)

Participants: Jean-François Baget, Patrice Buche, Madalina Croitoru, Marie-Laure Mugnier, Elie Najm, Federico Ulliana.

Located in Montpellier, #DigitAg (for Digital Agriculture) gathers 17 founding members: research institutes, including Inria, the University of Montpellier and higher-education institutes in agronomy, transfer structures and companies. Its objective is to support the development of digital agriculture. BOREAL is involved in this project on the issues of designing data and knowledge management systems adapted to agricultural information systems, and of developing methods for integrating different types of information and knowledge (generated from data, experts, models). A PhD thesis (Elie Najm) investigates knowledge representation and reasoning for the design of new agroecological systems, in collaboration with the research laboratory ABSys - Biodiversified Agrosystems (formerly UMR SYSTEM).

# 10 Dissemination

## 10.1 Promoting Scientific Activities

### 10.1.1 Scientific Events: Organization

At LIRMM, Madalina Croitoru is co-chair of the transversal axis and of the workshop on Artificial Intelligence for human-robot interaction.

### 10.1.2 Scientific Events: Selection

#### Chair of conference program committees

Marie-Laure Mugnier has been program co-chair of the 35th International Workshop on Description Logics (DL 2022), part of the Federated Logic Conference (FLoC 2022). DL is the main international event of the description logic research community.

#### Scientific chair

Team members have served in the following roles:

• AAMAS 2022 (21st International Conference on Autonomous Agents and Multiagent Systems): Area chair - Madalina Croitoru
• ICCS 2022 (27th International Conference on Conceptual Structures): Steering Committee member - Madalina Croitoru

#### Member of International conference program committees

• KR 2022 (19th International Conference on Principles of Knowledge Representation and Reasoning): PC member - Michel Leclère, Marie-Laure Mugnier
• AAAI 2023 (37th AAAI Conference on Artificial Intelligence): PC member - Federico Ulliana
• PODS 2023 (42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems): PC member - Nofar Carmeli
• IJCAI 2022 (31st International Joint Conference on Artificial Intelligence): PC member - Madalina Croitoru, Marie-Laure Mugnier
• ICCS 2022 (27th International Conference on Conceptual Structures): PC member - Madalina Croitoru

#### Member of National conference program committees

• IC 2022 (Ingénierie des Connaissances 2022): PC member - Michel Leclère
• JIAF 2022 (Journées d’Intelligence Artificielle Fondamentale): PC member - Marie-Laure Mugnier

### 10.1.3 Invited Talks

• Nofar Carmeli - Invited talk Enumeration and Related Problems in Query Answering - WEPA 2022 (Workshop on Enumeration Problems and Applications), 22-25/11/2022.
• Marie-Laure Mugnier - Invited talk Intégration de données hétérogènes dirigée par des ontologies (Ontology-driven integration of heterogeneous data) - plenary days (Journées plénières) of the GDR IA, 14/10/2022.
• Federico Ulliana - Invited talk An Introduction to the Boundedness Problem for Existential Rules - LaHDAK team seminar (online), LRI, 14/02/2022.
• Federico Ulliana - Invited talk Rule-based Languages for Reasoning on Data: The Virtual Approach for Key-Value Stores - SPARKS team seminar (online), I3S, 30/03/2022.
• Florent Tornil - Invited talk T-CALIS-FAIR: Un exemple d'Intégration de données hétérogènes dans l'environnement InteGraal (an example of heterogeneous data integration in the InteGraal environment) - CATI DIISCICO (online), INRAE, UMET, Villeneuve d’Ascq, 09/11/2022.

### 10.1.4 Scientific expertise

• Federico Ulliana was a member of the jury for the best PhD thesis award of the national BDA 2022 conference.
• Marie-Laure Mugnier was a member of the jury for the 2022 SIF Gilles Kahn best PhD thesis award.
• Marie-Laure Mugnier was a member of the recruitment committee for a Professorship position, Université Toulouse Capitole.

• Madalina Croitoru has been deputy member of the CNU section 27 (Computer Science) since September 2019.
• Madalina Croitoru has been deputy director of the Computer Science Department at the Faculty of Science, University of Montpellier since September 2021.
• Marie-Laure Mugnier has been president of the “Section 27 Committee” (Computer Science) of the University of Montpellier since July 2021.
• Marie-Laure Mugnier has been a member of the Council of the Scientific Department MIPS (Mathematics Informatics Physics and Systems) of the University of Montpellier since 2016.
• Federico Ulliana has been a member of the Inria CDT (Commission Développement Technologique, technological development committee) since May 2022. This committee is in charge of reviewing and monitoring software development projects within the Inria centre at Université Côte d’Azur.

## 10.2 Teaching - Supervision - Juries

### 10.2.1 Teaching

Madalina Croitoru, Michel Leclère, and Marie-Laure Mugnier teach an average of 200 hours per year in the Computer Science department of the Faculty of Science. They are in charge of courses in Logic (Licence), Artificial Intelligence (Master), Knowledge Representation and Reasoning (Master), Theory of Data and Knowledge Bases (Master), Social and Semantic Web (Master) and Multi-Agent Systems (Master). Among the full-time researchers, in 2022 Jean-François Baget, David Carral, and Federico Ulliana (on secondment from Montpellier University) all taught in the Computer Science Master; topics include Theory of Data and Knowledge Bases, Data Warehouses, Big Data and NoSQL systems.

Madalina Croitoru has also been in charge of international relations for the Computer Science department of the Science Faculty since September 2019.

### 10.2.2 Supervision

#### PhD

The following PhD thesis started in 2022:

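• Mohammed Aziz Sfar Gandoura, “Génération de scénarios de tests pour les systèmes de contrôle-commande logique : une application pour les centrales nucléaires palier N4” (Generation of test scenarios for logic control-command systems: an application to N4-series nuclear power plants). Supervisor: Madalina Croitoru. Started January 2022.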

The following PhD theses were defended in December 2022:

• Martin Jedwabny, “Argumentation and ethical decision making”. Supervisors: Madalina Croitoru and Pierre Bisquert. Defended on 02/12/2022.
• Elie Najm, “Knowledge Representation and Reasoning for innovating agroecological systems”. Supervisors: Marie-Laure Mugnier, Christian Gary (INRAE, UMR ABSys), Jean-François Baget and Raphaël Metral (Supagro, UMR ABSys). Defended on 13/12/2022.

The following PhD theses are in progress:

• Olivier Rodriguez, “Querying key-value store under semantic constraints”. Supervisors: Federico Ulliana and Marie-Laure Mugnier. Started February 2019.
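• Guillaume Pérution-Kihli, “Des données aux connaissances : un cadre unifié pour l’intégration sémantique de données hétérogènes et l’amélioration de leur qualité” (From data to knowledge: a unified framework for the semantic integration of heterogeneous data and the improvement of their quality). Supervisors: Michel Leclère and Marie-Laure Mugnier. Started September 2020.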

#### Postdoc

Marie-Laure Mugnier supervised Maxime Buron's postdoc from March to August (6 months). This postdoc took place within the R4Agri project and was devoted to the optimization of query answering based on mappings to heterogeneous data sources.

#### Engineers

Several team members (Federico Ulliana, Michel Leclère, Pierre Bisquert, Marie-Laure Mugnier, and Jean-François Baget) regularly follow the development of the InteGraal software and jointly supervise Florent Tornil's work as an engineer.

#### Interns

This year, the team has welcomed a number of interns.

• David Camarazo (Master 2 U. Montpellier, 6 months) worked on a characterization of hybrid reasoning for existential rules. Supervisors: Federico Ulliana, Michel Leclère, Pierre Bisquert.
• Lucas Rouquette (Master 2 U. Montpellier, 6 months) worked on an optimization procedure for disjunctive Datalog reasoning. Supervisors: David Carral, Pierre Bisquert, Federico Ulliana.
• Sandra Victor (Master 2 U. Montpellier, 6 months) worked on a computational model for human / robot negotiation. Supervisors: Madalina Croitoru, Pierre Bisquert, Ganesh Gowrishankar.
• Sara Gruson (L3 ENS ULM, 2 months) worked on a characterization of bounded existential rules. Supervisors: Michel Leclère, Federico Ulliana.
• Paul Fontaine (L2 ENS U. Montpellier, 1 month) worked on an algorithm for forgetting predicates for a fragment of existential rules. Supervisors: Michel Leclère, Federico Ulliana.

### 10.2.3 Articles and contents

The BOREAL team participated in the event organized in September 2022 for the 30th anniversary of the LIRMM laboratory, with three stands.

Marie-Laure Mugnier and Pierre Bisquert contributed to the writing of a “white book” on digital agriculture resulting from joint work between Inria and INRAE 16.

# 11 Scientific production

## 11.1 Major publications

• 1 Capturing Homomorphism-Closed Decidable Queries with Existential Rules (Extended Abstract). Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-ECAI 2022 - 31st International Joint Conference on Artificial Intelligence - 25th European Conference on Artificial Intelligence, Vienna, Austria, July 2022, 5269-5273.
• 2 David Carral, Lucas Larroque, Marie-Laure Mugnier and Michaël Thomazo. Normalisations of Existential Rules: Not so Innocuous! KR 2022 - 19th International Conference on Principles of Knowledge Representation and Reasoning, Haifa, Israel, 2022, 102-111.
• 3 Piotr Ostropolski-Nalewaja, Jerzy Marcinkowski, David Carral and Sebastian Rudolph. A Journey to the Frontiers of Query Rewritability. PODS 2022 - 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Philadelphia, United States, June 2022, 359-367.

## 11.2 Publications of the year

### International journals

• 4 Cedric Baudrit, Patrice Buche, Nadine Leconte, Christopher Fernandez, Maëllis Belna and Geneviève Gésan-Guiziou. Decision support tool for the agri-food sector using data annotated by ontology and Bayesian network: a proof of concept applied to milk microfiltration. International Journal of Agricultural and Environmental Information Systems 13(1), 2022.
• 5 An efficient algorithm for reasoning over OWL EL ontologies with nominal schemas. Journal of Logic and Computation, May 2022, exac032.
• 6 Combining ontology and probabilistic models for the design of bio-based product transformation processes. Expert Systems with Applications 203, October 2022, 117406.
• 7 Rallou Thomopoulos, Pierre Bisquert, Bart van Der Burg and Erwan Engel. Good practices and ethical issues in food safety related research. Global Pediatrics 2, December 2022, 100016.

### International peer-reviewed conferences

• 8 Rewriting the Infinite Chase. VLDB 2022 - 48th International Conference on Very Large Databases, 15(11), Sydney, Australia, 2022, 3045-3057.
• 9 Raven Beutner, David Carral, Bernd Finkbeiner, Jana Hofmann and Markus Krötzsch. Deciding Hyperproperties Combined with Functional Specifications. LICS 2022 - 37th Annual ACM/IEEE Symposium on Logic in Computer Science, A56, Haifa, Israel, ACM, 2022, 1-13.
• 10 Capturing Homomorphism-Closed Decidable Queries with Existential Rules (Extended Abstract). Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-ECAI 2022 - 31st International Joint Conference on Artificial Intelligence - 25th European Conference on Artificial Intelligence, Vienna, Austria, July 2022, 5269-5273.
• 11 David Carral, Lucas Larroque, Marie-Laure Mugnier and Michaël Thomazo. Normalisations of Existential Rules: Not so Innocuous! KR 2022 - 19th International Conference on Principles of Knowledge Representation and Reasoning, Haifa, Israel, 2022, 102-111.
• 12 Scrutable Robot Actions Using a Hierarchical Ontological Model. ICCS 2022 - 27th International Conference on Conceptual Structures, Graph-Based Representation and Reasoning, Lecture Notes in Computer Science LNCS-13403, Münster, Germany, Springer International Publishing, September 2022, 11-24.
• 13 Elie Najm, Jean-François Baget and Marie-Laure Mugnier. Rule-Based Data Access: A Use Case in Agroecology. RuleML+RR 2022 - 16th International Rule Challenge, RuleML+RR-Companion 2022 International Rule Challenge and Doctoral Consortium, CEUR-WS.org vol. 3229, Berlin, Germany, September 2022. http://ceur-ws.org/Vol-3229/
• 14 Piotr Ostropolski-Nalewaja, Jerzy Marcinkowski, David Carral and Sebastian Rudolph. A Journey to the Frontiers of Query Rewritability. PODS 2022 - 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Philadelphia, United States, June 2022, 359-367.
• 15 A graph based semantics for Logical Functional Diagrams in power plant controllers. FoIKS 2022 - 12th International Symposium on Foundations of Information and Knowledge Systems, Lecture Notes in Computer Science, Helsinki, Finland, Springer International Publishing, July 2022, 55-74.

### Scientific book chapters

• 16 Foundations and state of the art. In Agriculture and Digital Technology: Getting the most out of digital technology to contribute to the transition to sustainable agriculture and food systems, White book Inria 6, INRIA, 2022, 30-75.

### Edition (books, proceedings, special issue of a journal)

• 17 Offer Arieli, Martin Homola, Jean Jung and Marie-Laure Mugnier (eds.). Proceedings of the 35th International Workshop on Description Logics (DL 2022) co-located with Federated Logic Conference (FLoC 2022). CEUR Workshop Proceedings 3263, 2022.

### Doctoral dissertations and habilitation theses

• 18 A preference-based approach to machine ethics for automated planning. PhD thesis, Université de Montpellier, December 2022.
• 19 Elie Najm. Reasoning on data in agroecology: application to the selection of service plant species. PhD thesis, Université de Montpellier, December 2022.

### Reports & preprints

• 20 Integrating Data and Knowledge to Support the Selection of Service Plant Species in Agroecology. Preprint, November 2022.

## 11.3 Cited publications

• 21 Camille Bourgaux, David Carral, Markus Krötzsch, Sebastian Rudolph and Michaël Thomazo. Capturing Homomorphism-Closed Decidable Queries with Existential Rules. KR 2021 - 18th International Conference on Principles of Knowledge Representation and Reasoning, Virtual, Vietnam, November 2021, 141-150.
• 22 Maxime Buron, Marie-Laure Mugnier and Michaël Thomazo. Parallelisable Existential Rules: a Story of Pieces. Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, KR 2021, Online event, November 3-12, 2021, 162-173.
• 23 Joshua Sohn, Pierre Bisquert, Patrice Buche, Abdelraouf Hecham, Pradip P. Kalbar, Ben Goldstein, Morten Birkved and Stig Irving Olsen. Argumentation Corrected Context Weighting-Life Cycle Assessment: A Practical Method of Including Stakeholder Perspectives in Multi-Criteria Decision Support for LCA. Sustainability 12(6), March 2020, 2170.