TYREX - 2024

2024 Activity report, Project-Team TYREX

RNSR: 201221059T
  • Research center: Inria Centre at Université Grenoble Alpes
  • In partnership with: CNRS, Université de Grenoble Alpes
  • Team name: Types and Reasoning for the Web
  • In collaboration with: Laboratoire d'Informatique de Grenoble (LIG)
  • Domain: Perception, Cognition and Interaction
  • Theme: Data and Knowledge Representation and Processing

Keywords

Computer Science and Digital Science

  • A2.1.1. Semantics of programming languages
  • A2.1.4. Functional programming
  • A2.1.10. Domain-specific languages
  • A2.2.8. Code generation
  • A2.4. Formal method for verification, reliability, certification
  • A3.1.1. Modeling, representation
  • A3.1.2. Data management, querying and storage
  • A3.1.4. Uncertain data
  • A3.1.6. Query optimization
  • A3.1.9. Database
  • A3.1.11. Structured data
  • A3.2.1. Knowledge bases
  • A3.2.2. Knowledge extraction, cleaning
  • A3.2.3. Inference
  • A3.2.5. Ontologies
  • A3.2.6. Linked data
  • A3.3.3. Big data analysis
  • A3.4. Machine learning and statistics
  • A3.4.1. Supervised learning
  • A6.3.3. Data processing
  • A7. Theory of computation
  • A7.1. Algorithms
  • A7.2. Logic in Computer Science
  • A9.1. Knowledge
  • A9.2. Machine learning
  • A9.7. AI algorithmics
  • A9.8. Reasoning
  • A9.10. Hybrid approaches for AI

Other Research Topics and Application Domains

  • B2. Health
  • B6.1. Software industry
  • B6.5. Information systems
  • B9.5.1. Computer science
  • B9.5.6. Data science
  • B9.7.2. Open data

1 Team members, visitors, external collaborators

Research Scientists

  • Pierre Genevès [Team leader, CNRS, Researcher]
  • Nabil Layaïda [INRIA, Senior Researcher]
  • Chandan Sharma [INRIA, Starting Research Position, from Nov 2024]

Faculty Members

  • Ugo Comignani [GRENOBLE INP, Associate Professor]
  • Nils Gesbert [GRENOBLE INP, Associate Professor]

Post-Doctoral Fellows

  • Chandan Sharma [INRIA, Post-Doctoral Fellow, from Feb 2024 until Sep 2024]
  • Chandan Sharma [CNRS, Post-Doctoral Fellow, until Jan 2024]

PhD Students

  • Richard Casetta [BNP PARIBAS, CIFRE, from Sep 2024]
  • Guillaume Delplanque [UGA]
  • Luisa Werner [UGA, until Oct 2024]
  • Maroua Zeblah [OPENSEE SAS, CIFRE]

Technical Staff

  • Sarah Chlyah [INRIA, Engineer]
  • Luisa Werner [INRIA, Engineer, from Nov 2024]

Interns and Apprentices

  • Alexandre Lagier [Inria, Intern, from Jun 2024 until Jul 2024]
  • Thomas Valentin [ENS PARIS-SACLAY, Intern, from Jun 2024 until Aug 2024]

Administrative Assistant

  • Helen Pouchot-Rouge-Blanc [INRIA]

External Collaborator

  • Laurent Carcone [W3C (ERCIM)]

2 Overall objectives

2.1 Objectives

We develop the foundations for the next generation of information extraction, data analysis and neuro-symbolic programming systems. Our research extends ideas from data management, artificial intelligence, programming languages and logic.

Extracting value from data increasingly requires sophisticated algorithms to represent, query, process, analyze and interpret data. We develop the foundations of data processing systems and neuro-symbolic programming, with a focus on extracting information from graph structures. These graph structures are obtained from raw data that may be more or less structured, noisy, uncertain or incomplete. Challenges include robust, efficient and scalable processing of large graphs obtained from such data. We study and build new information extraction methods, as well as new robust and scalable programming methods for rich graph data structures.

3 Research program

3.1 Algebraic Foundations for Robust Expressive and Efficient Information Extraction

We investigate intermediate languages based on algebraic foundations for the representation, characterization, transformations and compilation of queries. We develop the algebraic and logical foundations of advanced data programming languages (extended relational algebras, algorithms, compilers) for more expressive and efficient query languages, in particular through aspects such as recursion, types, analytics, and provenance.

3.2 Neuro-Symbolic Programming

We investigate neuro-symbolic programming methods with graphs. This includes studying the integration between neural networks and symbolic logic and/or algebra. Challenges include bridging the gap between neural networks and symbolic logic, injecting knowledge in learning processes, supporting rich knowledge and property graphs, and dealing with scalability issues for large graphs.

4 Application domains

4.1 Querying Large Graphs

The increasing availability of large-scale graph-structured data presents both opportunities and challenges. Our research focuses on efficient methods for evaluating graph queries at scale, particularly in knowledge graphs structured in the Resource Description Framework (RDF) and property graphs.

We design advanced query languages to extract insights from these graphs and compile queries into algebraic representations. These representations are then translated into executable low-level code, optimized for various backends, including relational database management systems like PostgreSQL, and big data frameworks like Apache Spark.
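As a rough illustration of the last step of this pipeline, the sketch below compiles a single recursive reachability query into a recursive SQL common table expression. It is a minimal sketch under stated assumptions, not the team's compiler: the function names, the `edge(src, dst)` table layout, and the use of SQLite as a stand-in for a PostgreSQL backend are all hypothetical choices made for the example.

```python
import sqlite3


def compile_reachability(edge_table: str, start: int) -> str:
    """Compile 'all nodes reachable from start' into a recursive SQL CTE.

    Hypothetical helper: assumes a table with integer columns (src, dst).
    """
    return (
        "WITH RECURSIVE reach(node) AS ("
        f" SELECT dst FROM {edge_table} WHERE src = {int(start)}"
        " UNION"  # UNION (not UNION ALL) removes duplicates, so cycles terminate
        f" SELECT e.dst FROM {edge_table} AS e JOIN reach AS r ON e.src = r.node"
        ") SELECT node FROM reach ORDER BY node"
    )


def reachable(edges, start):
    """Load the edges into an in-memory database and run the compiled query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(compile_reachability("edge", start)).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For instance, on the edges (1,2), (2,3), (3,1), (4,5), calling `reachable(edges, 1)` explores the cycle once and stops, because the set-based `UNION` discards nodes already found.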

Graph querying has applications across diverse domains, including large knowledge bases, social networks, road networks, trust and fraud detection in cryptocurrencies, citation and web graphs, and recommendation systems.

4.2 Predictive Analytics for Healthcare

A major expectation of data science in healthcare is the ability to leverage digitized health information and computer systems to better understand and improve care. The availability of clinical data, and in particular of electronic health records, opens the way to the development of patient models that can be used to predict health status, as well as to help prevent disease and adverse effects.

In collaboration with the Grenoble University Hospital (CHUGA), we explore solutions to the problem of predicting important clinical outcomes such as risks of adverse effects, nosocomial infections or inpatient mortality, based on large amounts of clinical data.

5 Social and environmental responsibility

5.1 Impact of research results

Our work on graph query optimization helps in reducing resource consumption in information extraction. Our work in neuro-symbolic programming helps in reducing the amount of data required when training accurate artificial intelligence models, thanks to the integration of symbolic concepts and reasoning rules.

6 New software, platforms, open data

6.1 New software

6.1.1 MuIR

  • Name:
    Mu Intermediate Representation System
  • Keywords:
    Optimizing compiler, Querying
  • Functional Description:
    This is a prototype of an intermediate language representation, i.e. an implementation of algebraic terms, rewrite rules, query plans, cost model, query optimizer, and query evaluators. This includes query evaluators for a variety of RDBMS backends including PostgreSQL, as well as a distributed evaluator of algebraic terms using Apache Spark. This also includes an implementation of an efficient enumerator for recursive query plans, cost estimations, and compilers for recursive graph queries. The overall system is described in the CIKM 2023 demonstration paper.
  • Publications:
  • Contact:
    Pierre Genevès

6.1.2 KeGNN

  • Name:
    Knowledge Enhanced Graph Neural Networks
  • Functional Description:
    We propose KeGNN, a neuro-symbolic framework for learning on graph data that combines both paradigms and allows for the integration of prior knowledge into a graph neural network model. In essence, KeGNN consists of a graph neural network as a base on which knowledge enhancement layers are stacked with the objective of refining predictions with respect to prior knowledge. We instantiate KeGNN in conjunction with two standard graph neural networks: Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.
  • URL:
  • Publication:
  • Contact:
    Pierre Genevès
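To give an idea of what a knowledge enhancement layer may compute, the sketch below applies one logical clause to the class preactivations of a single node, in the style of the clause enhancers used by Knowledge Enhanced Neural Networks (KENN). This is a minimal sketch and not the KeGNN implementation: the function names, the clause encoding, and the assumption that KeGNN layers take exactly this form are illustrative choices.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def enhance(preacts, clause, weight):
    """Nudge preactivations so that a clause (a disjunction) is better satisfied.

    preacts: preactivations (logits) produced by the base network for one node.
    clause:  list of (index, sign) pairs; sign is +1 for a positive literal
             and -1 for a negated literal (hypothetical encoding).
    weight:  clause weight; in a trained model this would be learned.
    """
    # Truth-like value of each literal: a negated literal flips the sign.
    literal_values = [sign * preacts[i] for i, sign in clause]
    # The literal closest to being true receives the largest share of the boost.
    shares = softmax(literal_values)
    out = list(preacts)
    for (i, sign), share in zip(clause, shares):
        out[i] += sign * weight * share
    return out
```

With the clause "not A or B" (that is, A implies B) encoded as `[(0, -1), (1, +1)]`, the layer lowers the preactivation of A and raises that of B, and the total change is bounded by the clause weight.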

6.1.3 Reproducibility-aaai24

  • Functional Description:
    This is a re-implementation of the experiments conducted with Knowledge Enhanced Neural Networks (KENN) on the Citeseer Dataset, including the re-implementation of the Experiments in PyTorch and PyTorch Geometric. We also extended the experiments to the datasets Cora and PubMed.
  • URL:
  • Publication:
  • Contact:
    Pierre Genevès

6.1.4 MedAnalytics

  • Keywords:
    Big data, Predictive analytics, Distributed systems
  • Functional Description:
    We implemented a method for the automatic detection of at-risk profiles based on a fine-grained analysis of prescription data at the time of admission. The system relies on an optimized distributed architecture adapted for processing very large volumes of medical records and clinical data. We conducted practical experiments with real data of millions of patients and hundreds of hospitals. We demonstrated how the various perspectives of big data improve the detection of at-risk patients, making it possible to construct predictive models that benefit from volume and variety.
  • Publications:
  • Contact:
    Pierre Genevès
  • Partner:
    CHU Grenoble

7 New results

7.1 Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers

Participants: Amela Fejza, Sarah Chlyah, Nils Gesbert, Pierre Genevès, Nabil Layaïda.

Query optimizers built on the transformation-based Volcano/Cascades framework are used in many database systems. Transformations proposed earlier on the logical query dag (LQDAG) data structure, which is key in such a framework, focus only on recursion-free queries. We propose the recursive logical query dag (RLQDAG), which extends the LQDAG with the ability to capture and transform recursive queries, leveraging recent developments in recursive relational algebra. Specifically, this extension includes: (i) the ability to capture and transform sets of recursive relational terms; (ii) annotated equivalence nodes that guide transformations, which are more complex in the presence of recursion; (iii) RLQDAG rewrite rules that transform sets of subterms in a grouped manner, instead of transforming individual terms sequentially; and (iv) incremental updates of the necessary annotations by these rules. Core concepts of the RLQDAG are formalized using a syntax and formal semantics with a particular focus on subterm sharing and recursion. The result is a clean generalization of the LQDAG transformation-based approach, enabling more efficient explorations of plan spaces for recursive queries 4. An implementation of the proposed approach shows significant performance gains compared to the state-of-the-art 5 [6.1.1].
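For readers unfamiliar with the LQDAG, the sketch below shows the basic idea of equivalence nodes and subterm sharing in the recursion-free case only. It is a minimal sketch under stated assumptions: the class and function names are hypothetical, only join commutativity is implemented, and the annotations and recursive terms that characterize the RLQDAG are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    """Operation node: an operator applied to child equivalence nodes."""
    name: str             # e.g. "scan:R" or "join"
    children: tuple = ()  # ids of child equivalence nodes


@dataclass
class Memo:
    """Equivalence nodes: each group holds alternative, equivalent operations."""
    groups: list = field(default_factory=list)  # group id -> set of Op
    index: dict = field(default_factory=dict)   # Op -> group id (sharing)

    def add(self, op, group=None):
        if op in self.index:          # already known: share the existing node
            return self.index[op]
        if group is None:             # new term: open a new equivalence node
            group = len(self.groups)
            self.groups.append(set())
        self.groups[group].add(op)
        self.index[op] = group
        return group


def apply_join_commutativity(memo):
    """Add join(B, A) next to every join(A, B); return the number of additions."""
    added = 0
    for gid, ops in enumerate(memo.groups):
        for op in list(ops):
            if op.name == "join":
                swapped = Op("join", (op.children[1], op.children[0]))
                if swapped not in memo.index:
                    memo.add(swapped, gid)
                    added += 1
    return added


def count_plans(memo, gid):
    """Number of distinct plans represented below an equivalence node."""
    total = 0
    for op in memo.groups[gid]:
        plans = 1
        for child in op.children:
            plans *= count_plans(memo, child)
        total += plans
    return total
```

On the term join(join(R, S), T), one pass of the rule adds only two operation nodes, yet the structure then represents four plans, because the alternatives for the inner join are shared by both alternatives of the outer join.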

7.2 Schema-Based Query Optimisation for Graph Databases

Participants: Chandan Sharma, Nils Gesbert, Pierre Genevès, Nabil Layaïda.

Recursive graph queries are increasingly popular for extracting information from interconnected data found in various domains such as social networks, life sciences, and business analytics. Graph data often come with schema information that describes how nodes and edges are organized. We propose a type inference mechanism that enriches recursive graph queries with relevant structural information contained in a graph schema. We show that this schema information can be used to improve performance when evaluating acyclic recursive graph queries. Furthermore, we prove that the proposed method is sound and complete, ensuring that the semantics of the query is preserved during the schema-enrichment process 6. Experimental results with a complete implementation of the approach show drastic performance gains for query evaluations over property graphs 6 [6.1.1].
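The intuition behind schema-based enrichment can be illustrated with a toy inference over path expressions. The sketch below is an assumption-laden simplification, not the published type system: the schema is reduced to a set of (source label, edge label, target label) triples, path expressions are nested tuples, and all names are hypothetical.

```python
def infer(expr, schema):
    """Return the (source_label, target_label) pairs compatible with a path expression.

    expr is one of (hypothetical encoding):
      ("edge", label)        a single edge with the given label
      ("seq", left, right)   concatenation of two path expressions
      ("plus", inner)        one or more repetitions (transitive closure)
    schema is a set of (source_label, edge_label, target_label) triples.
    """
    kind = expr[0]
    if kind == "edge":
        return {(s, t) for (s, e, t) in schema if e == expr[1]}
    if kind == "seq":
        left = infer(expr[1], schema)
        right = infer(expr[2], schema)
        return {(s, t) for (s, m) in left for (m2, t) in right if m == m2}
    if kind == "plus":
        pairs = infer(expr[1], schema)
        closure = set(pairs)
        while True:  # iterate until no new label pair is found
            new = {(s, t) for (s, m) in closure for (m2, t) in pairs if m == m2}
            new -= closure
            if not new:
                return closure
            closure |= new
    raise ValueError(f"unknown expression kind: {kind!r}")
```

With a schema stating that Person nodes know Person nodes, live in City nodes, and that City nodes are located in Country nodes, the query knows+/livesIn/locatedIn can only go from Person to Country, so an evaluator may restrict its starting nodes to Person. An empty result, as for locatedIn/knows, indicates a query that cannot match any graph conforming to the schema.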

7.3 Efficient Iterative Programs with Distributed Data Collections

Participants: Sarah Chlyah, Nils Gesbert, Nabil Layaïda, Pierre Genevès.

Big data programming frameworks have become increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows for analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first class citizen and show how it enables new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations 3.
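The kind of rewriting that a first-class fixpoint operator makes possible can be illustrated on plain Python sets, which form a monoid under union. The sketch below is only an illustration under stated assumptions: the names `fixpoint` and `paths` are hypothetical, evaluation is local rather than distributed, and the step function is assumed to distribute over union.

```python
def fixpoint(base, step):
    """Least set X such that X = base ∪ step(X).

    Computed semi-naively: each round only extends the elements found in the
    previous round. This is valid when step distributes over union.
    """
    result = set(base)
    delta = set(base)
    while delta:
        delta = step(delta) - result
        result |= delta
    return result


def paths(edges, seeds):
    """All pairs obtained by extending the seed pairs with edges on the right."""
    def extend(pairs):
        return {(a, c) for (a, b) in pairs for (b2, c) in edges if b == b2}
    return fixpoint(seeds, extend)


edges = {(1, 2), (2, 3), (3, 4), (5, 6)}

# Unoptimized: compute every path, then keep those starting at node 1.
filtered_after = {p for p in paths(edges, edges) if p[0] == 1}

# Optimized: push the filter into the base case of the fixpoint.
filtered_before = paths(edges, {e for e in edges if e[0] == 1})
```

Both variants yield the same set, but the second never materializes paths that start elsewhere. The rewrite is sound here because extending a path on the right never changes its source node.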

7.4 Reproduce, Replicate, Reevaluate. The Long but Safe Way to Extend Machine Learning Methods

Participants: Luisa Werner, Nabil Layaïda, Pierre Genevès.

Reproducibility is a desirable property of scientific research. On the one hand, it increases confidence in results. On the other hand, reproducible results can be extended on a solid basis. In rapidly developing fields such as machine learning, the latter is particularly important to ensure the reliability of research. We present a systematic approach to reproducing (using the available implementation), replicating (using an alternative implementation) and reevaluating (using different datasets) state-of-the-art experiments. This approach enables the early detection and correction of deficiencies and thus the development of more robust and transparent machine learning methods. We detail the independent reproduction, replication, and reevaluation of initially published experiments with a method that we want to extend. For each step, we identify issues and draw lessons learned. We further discuss solutions that have proven effective in overcoming the encountered problems. This work can serve as a guide for further reproducibility studies and generally improve reproducibility in machine learning 7 [6.1.3].

7.5 Approximate weighted model counting for neural probabilistic reasoning

Participants: Thomas Valentin, Pierre Genevès, Luisa Werner.

Neural probabilistic reasoning is a neuro-symbolic artificial intelligence method that has shown promising results, especially with systems like Scallop, which have achieved state-of-the-art results on various tasks such as visual question answering. During the probabilistic reasoning phase, computing the probabilities of logical provenance formulas is a major bottleneck. To address this problem, this work proposes a new approximation algorithm based on DPLL. Despite admitting an exponential complexity lower bound and being closely related to commonly used knowledge compilation methods, this algorithm performs better in practice. In addition, this algorithm makes it possible to avoid the complexity of the logical provenance computation phase, enabling new possibilities 9.
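To make the underlying problem concrete, the sketch below computes lower and upper bounds on a weighted model count by DPLL-style branching with a depth budget. It is not the algorithm proposed in this work: the CNF input format, the independent per-variable probabilities, the depth-based cutoff, and the function name are all assumptions chosen for illustration.

```python
def wmc_bounds(clauses, prob, depth):
    """Bound the probability that a CNF formula is true.

    clauses: list of clauses, each a list of non-zero ints (negative = negated).
    prob:    dict mapping each variable to its independent probability of being true.
    depth:   maximum number of branching decisions; unexplored branches
             contribute the trivial bounds [0, 1].
    Returns (lower, upper); the two coincide once depth is large enough.
    """
    if any(len(clause) == 0 for clause in clauses):
        return (0.0, 0.0)   # an empty clause cannot be satisfied
    if not clauses:
        return (1.0, 1.0)   # nothing left to satisfy; free variables sum to 1
    if depth == 0:
        return (0.0, 1.0)   # budget exhausted: give up on this branch
    var = abs(clauses[0][0])
    lower = upper = 0.0
    for literal, weight in ((var, prob[var]), (-var, 1.0 - prob[var])):
        # Condition on the literal: drop satisfied clauses, shrink the others.
        reduced = [
            [l for l in clause if l != -literal]
            for clause in clauses
            if literal not in clause
        ]
        lo, hi = wmc_bounds(reduced, prob, depth - 1)
        lower += weight * lo
        upper += weight * hi
    return (lower, upper)
```

For the single clause "x1 or x2" with probabilities 0.5 and 0.4, one branching decision already gives the interval [0.5, 1.0], and two decisions give the exact value 0.7. The interval width is a simple measure of how much accuracy is traded for a smaller search.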

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

Participants: Pierre Genevès, Maroua Zeblah, Richard Casetta, Nils Gesbert, Sarah Chlyah, Nabil Layaïda.

We collaborate with the French fintech startup Opensee, based in Paris, on query optimization for multidimensional data through a CIFRE-funded PhD thesis.

Additionally, we work with BNP Paribas, a major financial group, on logical and algebraic methods to assist in the development and verification of more robust software architectures, also through a CIFRE-funded PhD thesis.

9 Partnerships and cooperations

9.1 National initiatives

9.1.1 ANR

GraphRec

Participants: Pierre Genevès, Nabil Layaïda, Nils Gesbert, Sarah Chlyah, Ugo Comignani, Luisa Werner, Chandan Sharma.

  • Title: GraphRec: Efficient and Scalable Recursive Programming with Graphs
  • ANR, Appel à projets générique 2023 – CE23 – Intelligence artificielle et science des données, PRME
  • Coordinator: Pierre Genevès
  • Abstract: This project seeks to design and develop novel methods for expressive and efficient information extraction from graphs, based on recursive graph queries and neuro-symbolic programming.
  • GraphRec website

Newcare

Participants: Pierre Genevès, Nabil Layaïda, Luisa Werner.

  • Title: Network for hEalth Workers : Covid And oRganization of Emergency teams – NEWCARE
  • Duration: January 2021 – December 2024
  • Coordinator: Marie-Estelle BINET (Laboratoire d'Economie Appliquée de Grenoble)
  • Abstract: This research project has several objectives. The first one is to create an original database describing the characteristics of, and interactions between, caregivers working in healthcare teams in the emergency department. These data will be extracted (or de-siloed) from the PREDIMED clinical data warehouse (CDW), which gathers health and administrative data from patients and healthcare professionals working at Grenoble University Hospital. Then, the analysis of social networks will allow us to identify the modes of collaboration in place between caregivers and their ability to adapt to their environment. Impact evaluation methods will allow us to estimate the impact of the organizational changes caused by the COVID-19 health crisis on the quality of work and the well-being of healthcare professionals.

Participation in MIAI Chairs

Participants: Pierre Genevès, Nabil Layaïda, Amela Fejza, Luisa Werner.

P. Genevès is a member of the board of the DeepCare MIAI Chair. A. Fejza has participated in the DeepCare MIAI Chair. N. Layaïda, L. Werner and P. Genevès also participate in the Knowledge communication and evolution MIAI Chair.

10 Dissemination

Participants: Sarah Chlyah, Nils Gesbert, Ugo Comignani, Pierre Genevès, Nabil Layaïda.

10.1 Promoting scientific activities

10.1.1 Scientific events: selection

Member of the conference program committees
  • Pierre Genevès has been a PC member of SIGMOD 2025.
  • Ugo Comignani has been a PC member of SIGMOD 2025.

10.1.2 Scientific expertise

Pierre Genevès has been a referee for the Agence Nationale de la Recherche (ANR), in charge of reviewing ANR project proposals.

Pierre Genevès has been an expert reviewer for the Qatar National Research Fund.

10.1.3 Research administration

Pierre Genevès is co-responsible for the Computer Science Specialty at the MSTII Doctoral School of University Grenoble Alpes (ED 217).

Pierre Genevès is a member of the board at Grenoble Informatics Laboratory (LIG), responsible for the research axis on formal methods, models and languages.

Nabil Layaïda is a member of the scientific committee of the LabEx PERSYVAL-lab (Pervasive Systems and Algorithms).

Nabil Layaïda is a member of the Scientific Board of Digital League, the digital cluster of Auvergne-Rhône-Alpes.

10.2 Teaching - Supervision - Juries

10.2.1 Teaching

  • Master: P. Genevès is co-responsible for and teaches the M2-level course “Fundamentals of Data Processing and Distributed Knowledge” of the MOSIG program at UGA (36h)
  • Master: P. Genevès is co-responsible for and teaches the M2-level course “Accès à l'information: du web des données au web sémantique” in the ENSIMAG ISI 3A program at Grenoble-INP (30h)
  • Master: N. Gesbert, “Analyse et Conception Objet de Logiciels”, 30 h eq TD, M1, Grenoble INP
  • Master: N. Gesbert, “Construction d'applications Web”, 27 h eq TD, M1, Grenoble INP
  • Master: N. Gesbert, “Principes des systèmes de gestion des bases de données”, 54 h eq TD, M1, Grenoble INP
  • Licence: N. Gesbert, “Logique pour l’informatique”, 45 h eq TD, L3, Grenoble INP
  • N. Gesbert is in charge of the L3-level course “logique pour l'informatique” at Grenoble INP Ensimag.
  • N. Gesbert is responsible for the pedagogical team “Gestion de données” at Grenoble INP Ensimag.
  • Master: U. Comignani is co-responsible for the “BigData” master, co-accredited between Grenoble Ecole de Management and Grenoble INP
  • Master: U. Comignani is in charge of the “Projets fil rouge”, 10 h eq TD, MS BigData, Grenoble INP
  • Master: U. Comignani, “Principes des systèmes de gestion de bases de données”, 99.5 h eq TD, M1, Grenoble INP
  • Master: U. Comignani is in charge of the “Projet BD”, 64 h eq TD, M1, Grenoble INP
  • Master: U. Comignani, “Stockage et traitement de données à grande échelle”, 34 h eq TD, M2, Grenoble INP
  • Master: U. Comignani, academic tutorship of an apprentice, 10 h eq TD, M1, Grenoble INP

10.2.2 Supervision

Nabil Layaïda and Pierre Genevès have co-supervised Luisa Werner's PhD thesis entitled “Neural Symbolic Integration for Knowledge Graphs”, defended in December 2024.

PhD in progress: Maroua Zeblah, Query Optimisation for column oriented databases, PhD started in April 2023, co-supervised by Pierre Genevès and Nabil Layaïda.

PhD in progress: Guillaume Delplanque, Differentiable programming for Knowledge Graphs, PhD started in September 2023, co-supervised by Pierre Genevès and Nabil Layaïda.

PhD in progress: Richard Casetta, Formal verification of cloud applications, PhD started in 2024, co-supervised by Pierre Genevès and Nils Gesbert.

Pierre Genevès has supervised the internship of Thomas Valentin (ENS Paris-Saclay).

10.2.3 Juries

Pierre Genevès has been a referee for the PhD thesis of Lyes Attouche at Université Paris-Dauphine, defended in December 2024, on the topic of Data Generation for JSON Schema.

10.3 Popularization

10.3.1 Productions (articles, videos, podcasts, serious games, ...)

Nabil Layaïda contributed to the writing of a popular science book entitled “Interrogation du web sémantique. Du big data à l'IA: 60 ans d'expérience en traitement des données, des informations et des connaissances à Grenoble” 8.

11 Scientific production

11.1 Major publications

11.2 Publications of the year

International journals

International peer-reviewed conferences

Scientific book chapters

  • 8 Jérôme David, Jérôme Euzenat, Nabil Layaïda and Marie-Christine Rousset. Interrogation du web sémantique. In: Du big data à l'IA: 60 ans d'expérience en traitement des données, des informations et des connaissances à Grenoble. UGA Éditions, 2024, 603-623. HAL

Other scientific publications

Scientific popularization