VALDA

VALDA - 2025

2025 Activity report, Project-Team VALDA

RNSR: 201622223R

Research center Inria Paris Centre
In partnership with: Ecole normale supérieure de Paris, CNRS
Team name: Value from Data
In collaboration with: Département d'Informatique de l'Ecole Normale Supérieure

Creation of the Project-Team: 2018 January 01

Each year, Inria research teams publish an Activity Report presenting their work and results over the reporting period. These reports follow a common structure, with some optional sections depending on the specific team. They typically begin by outlining the overall objectives and research programme, including the main research themes, goals, and methodological approaches. They also describe the application domains targeted by the team, highlighting the scientific or societal contexts in which their work is situated.

The reports then present the highlights of the year, covering major scientific achievements, software developments, or teaching contributions. When relevant, they include sections on software, platforms, and open data, detailing the tools developed and how they are shared. A substantial part is dedicated to new results, where scientific contributions are described in detail, often with subsections specifying participants and associated keywords.

Finally, the Activity Report addresses funding, contracts, partnerships, and collaborations at various levels, from industrial agreements to international cooperations. It also covers dissemination and teaching activities, such as participation in scientific events, outreach, and supervision. The document concludes with a presentation of scientific production, including major publications and those produced during the year.

Keywords

Computer Science and Digital Science

A3.1. Data
A3.1.1. Modeling, representation
A3.1.2. Data management, querying and storage
A3.1.3. Distributed data
A3.1.4. Uncertain data
A3.1.5. Control access, privacy
A3.1.6. Query optimization
A3.1.7. Open data
A3.1.8. Big data (production, storage, transfer)
A3.1.9. Database
A3.1.10. Heterogeneous data
A3.1.11. Structured data
A3.2. Knowledge
A3.2.1. Knowledge bases
A3.2.2. Knowledge extraction, cleaning
A3.2.3. Inference
A3.2.4. Semantic Web
A3.2.5. Ontologies
A3.2.6. Linked data
A3.3. Data and knowledge analysis
A3.3.1. On-line analytical processing
A3.3.2. Data mining
A3.3.3. Big data analysis
A3.5.1. Analysis of large graphs
A4.7. Access control
A7.2. Logic in Computer Science
A7.3. Calculability and computability
A9.1. Knowledge
A9.2.3. Reinforcement learning
A9.2.5. Bayesian methods
A9.8. Reasoning

1 Team members, visitors, external collaborators

Research Scientists

Serge Abiteboul [Inria, Emeritus, HDR]
Paul Boniol [Inria, ISFP]
Camille Bourgaux [CNRS, Researcher]
Luc Segoufin [Inria, Senior Researcher, HDR]
Michael Thomazo [Inria, Researcher, HDR]

Faculty Member

Pierre Senellart [Team leader, ENS-PSL, Professor, HDR]

PhD Students

Felix Chavelli [Inria]
Anatole Dahan [Université Paris-Cité, until Jul 2025]
Antoine Gauquier [ENS-PSL]
Robin Jean [CNRS]
Lucas Larroque [ENS-PSL]
Magali Parrino [EDF, CIFRE, from Jul 2025]
Aryak Sen [CNRS & Université de Grenoble]
Marijan Soric [Inria, from Mar 2025]
Emmanouil Sylligardos [ENS-PSL]

Technical Staff

Louis Chanaron [Inria, Engineer, from Oct 2025]

Interns and Apprentices

Arushi Goyal [IIT Delhi & ENS-PSL, Intern, until May 2025]
Adam Rozzio [ENS Paris-Saclay & ENS-PSL, Intern, from Feb 2025 until Jul 2025]
Marijan Soric [Centrale Lyon & Inria, Intern, until Feb 2025]

Administrative Assistant

Meriem Guemair [Inria]

Visiting Scientist

Victor Vianu [UC San Diego, from Jun 2025]

2 Overall objectives

Valda's focus is on both foundational and systems aspects of complex data management, especially human-centric data. The data we are interested in is typically heterogeneous, massively distributed, rapidly evolving, intensional, and often subjective, possibly erroneous, imprecise, or incomplete. In this setting, Valda is in particular concerned with the optimization of complex resources such as computer time and space, communication, monetary, and privacy budgets. The goal is to extract value from data, beyond simple query answering.

Data management 50, 52 is now an old, well-established field, for which many scientific results and techniques have been accumulated since the sixties. Originally, most works dealt with static, homogeneous, and precise data. Later, works were devoted to heterogeneous data 49, 51, and to possibly distributed data 56, but at a small scale.

However, these classical techniques are poorly adapted to handle the new challenges of data management. Consider human-centric data, which is either produced by humans, e.g., emails, chats, recommendations, or produced by systems when dealing with humans, e.g., geolocation, business transactions, results of data analysis. When dealing with such data, and to accomplish any task to extract value from such data, we rapidly encounter the following facets:

Heterogeneity: data may come in many different structures such as unstructured text, graphs, data streams, complex aggregates, etc., using many different schemas or ontologies.
Massive distribution: data may come from a large number of autonomous sources distributed over the web, with complex access patterns.
Rapid evolution: many sources may be producing data in real time, even if little of it is perhaps relevant to the specific application. Typically, recent data is of particular interest and changes have to be monitored.
Intensionality¹: in a classical database, all the data is available. In modern applications, the data is increasingly available only intensionally, possibly at some cost, with the difficulty of discovering which source can contribute towards a particular goal, and this with some uncertainty.
Confidentiality and security: some personal data is critical and needs to remain confidential. Applications manipulating personal data must take this into account and must be secure against linking.
Uncertainty: modern data, and in particular human-centric data, typically includes errors, contradictions, imprecision, and incompleteness, all of which complicate reasoning. Furthermore, the subjective nature of the data, with opinions, sentiments, or biases, also makes reasoning harder since one has, for instance, to consider different agents with distinct, possibly contradicting knowledge.

These problems have already been studied individually and have led to techniques such as query rewriting 54 or distributed query optimization 55.

Among all these aspects, intensionality is perhaps the one that has been least studied, so let us expand a bit on it. Consider a user's query, taken in a very broad sense: it may be a classical database query, some information retrieval search, a clustering or classification task, or some more advanced knowledge extraction request. Because of the intensionality of data, solving such a query is typically a dynamic task: each time new data is obtained, the partial knowledge a system has of the world is revised, and query plans need to be updated, as in adaptive query processing 53 or aggregated search 59. The system then needs to decide, based on this partial knowledge, on the best next access to perform. This is reminiscent of the central problem of reinforcement learning 58 (train an agent to accomplish a task in a partially known world based on rewards obtained) and of active learning 57 (decide which action to perform next in order to optimize a learning strategy), and we intend to explore this connection further.

Uncertainty of the data interacts with its intensionality: efforts are required to obtain more precise, more complete, sounder results, which yields a trade-off between processing cost and data quality.

Other aspects, such as heterogeneity and massive distribution, are of major importance as well. A standard data management task, such as query answering, information retrieval, or clustering, may become much more challenging when taking into account the fact that data is not available in a central location, or in a common format. We aim to take these aspects into account, to be able to apply our research to real-world applications.

3 Research program

3.1 Research axis 1: Foundations of data management

This axis covers the theory of data management, broadly taken, and in particular the fields of database theory, knowledge representation, and some symbolic aspects of artificial intelligence (especially, reasoning on data).

The goal is to define solid and high-level foundations of data management tasks (query evaluation and optimization of various forms of queries, counting, reasoning, verification of data-centric processes, etc.) through formal tools, such as logics (esp., finite model theory), automata theory, and complexity theory; we occasionally have contributions in these areas as well, though most of our work is motivated by data applications. We are especially interested in clean specifications of key aspects of database systems and data management tasks (e.g., confidentiality, access control, robustness), whether they are properties of the data or appropriate (query) languages for these tasks. We study the expressive power of languages, the computability and complexity of deciding or computing results, as well as the design of appropriate structures (e.g., indexes) to optimize these tasks.

3.2 Research axis 2: Uncertainty, provenance, and explainability in data management

This research axis deals with the modeling and efficient management of data that come with some uncertainty (probabilistic distributions, logical incompleteness, missing values, inconsistencies, open-world assumption, etc.) and with provenance information (indicating where the data originates from), as well as with the extraction of uncertainty and provenance annotations from real-world data. Provenance is also linked to explainability: determining where the result of a data management task comes from, how and why it was produced, helps explain it. Interestingly, the foundations and tools for uncertainty management often rely on provenance annotations. For example, a typical way to compute the probability of query results in probabilistic databases is the so-called intensional approach: first generate the provenance of these query results (in some appropriate framework, e.g., that of Boolean functions or of provenance semirings), and then compute the probability of the resulting provenance annotation. For this reason, we deal with uncertainty and provenance in a unified manner, and with explainability as an application thereof.
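To make the second step of the intensional approach concrete, here is a minimal Python sketch. It assumes tuple-independent probabilities and a provenance already given as a Boolean formula in disjunctive normal form, with one set of tuple identifiers per derivation of the answer; the identifiers r1, r2, s1 and their probabilities are hypothetical. It enumerates all possible worlds, which is exponential in the number of tuples, so it only illustrates the principle on toy inputs; practical systems rely on more scalable techniques.

```python
from itertools import product


def dnf_probability(dnf, prob):
    """Probability that a DNF provenance formula is true.

    dnf:  list of clauses; each clause is a set of tuple identifiers that must
          all be present for that derivation of the query answer to hold.
    prob: maps each tuple identifier to its (independent) probability.
    Brute force over all possible worlds: exponential, for illustration only.
    """
    variables = sorted({v for clause in dnf for v in clause})
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        # The answer holds in this world if some derivation is fully present.
        if any(all(world[v] for v in clause) for clause in dnf):
            weight = 1.0
            for v, present in world.items():
                weight *= prob[v] if present else 1.0 - prob[v]
            total += weight
    return total


# Hypothetical provenance of one answer: (r1 AND s1) OR (r2 AND s1).
provenance = [{"r1", "s1"}, {"r2", "s1"}]
probabilities = {"r1": 0.5, "r2": 0.5, "s1": 0.8}
print(dnf_probability(provenance, probabilities))
```

The result agrees, up to floating-point rounding, with the direct calculation 0.8 × (1 − 0.5 × 0.5) = 0.6, obtained by factoring out s1.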

3.3 Research axis 3: Knowledge discovery at scale

Our final axis deals with knowledge discovery at scale. The goal is to use techniques such as data mining, information extraction, data cleaning, information integration, and machine learning to derive knowledge from raw, dirty, inconsistent, heterogeneous, rapidly changing data from real-world application scenarios.

We intend to leverage our expertise on data management to focus on the scalability of the approaches and tools developed. This is also in some sense an application axis for techniques developed in the other two axes; in particular, we have a focus on intensionality of data (i.e., the cost of data access), on the trade-off between data uncertainty and its cost, and on data provenance and explanations.

The subtopics of this axis typically vary considerably, depending on projects, collaborations, and application partners.

4 Application domains

A large part of Valda's research is foundational in nature and not tailored to any specific application domain. Some applied work, however, targets specific application domains:

Web data
in a broad sense (semi-structured, structured, or unstructured content extracted from Web databases; knowledge bases from the Semantic Web; social networks; Web archives and Web crawls; Web applications and deep Web databases; crowdsourcing platforms). This is a historical domain of interest of Valda researchers, and we have expertise in the acquisition, extraction, and management of this kind of data.
Open science
(publication databases, scientific publications, open-source software).
Clinical data
(notably inconsistent or incomplete hospital records).
Energy
(notably data from power stations, in collaboration with industrial partners).
Geoscience
(seismology or volcanology time series, structured data about geological campaigns).
Data journalism
(statistical datasets, fact-checking data).

Finally, transversal concerns that occur in different application areas and motivate some of our theory work are the ethics of data management and privacy.

5 Highlights of the year

The Inria–BRGM Géolaug challenge, which Valda contributes to, was launched in September 2025.

5.1 Awards

Camille Bourgaux, Anton Gnatenko (Free University of Bozen–Bolzano), and Michael Thomazo have received a best contribution award at DL 2025 and an outstanding paper award at ECAI 2025 for their work on analysing temporal reasoning in description logics using formal grammars 27, 28.

6 Latest software developments, platforms, open data

6.1 Latest software developments

6.1.1 ProvSQL

Keywords:
Databases, Provenance, Probability
Scientific Description:
ProvSQL is a general and easy-to-deploy provenance tracking and probabilistic database system implemented as a PostgreSQL extension. ProvSQL’s data and query models closely reflect those of a large core of SQL, including multiset semantics, the full relational algebra, and aggregation. A key part of its implementation relies on generic provenance circuits stored in memory-mapped files.
Functional Description:
The goal of the ProvSQL project is to add support for (m-)semiring provenance and uncertainty management to PostgreSQL databases, in the form of a PostgreSQL extension/module/plugin.
News of the Year:
Compatibility with PostgreSQL 18. Support for PROV-XML output. Partial support for HAVING queries. Support for compiled semirings, including the counting, Boolean, and Why semirings. Basic documentation infrastructure. Temporal semiring and temporal database support. Various minor enhancements and bug fixes.
URL:
https://github.com/PierreSenellart/provsql
Publications:
hal-05037471, hal-05072212, hal-04930705, hal-04911715, hal-04561331, hal-04393781, hal-01672566, hal-01851538
Contact:
Pierre Senellart
Participants:
Aryak Sen, Pierre Senellart
Partners:
Université Grenoble Alpes, CNRS, National University of Singapore
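ProvSQL computes such annotations inside PostgreSQL. The following stand-alone Python sketch only illustrates the underlying idea of semiring provenance on a single join query, using the counting, Boolean, and Why semirings mentioned above. The relations, tuple identifiers, and function names are hypothetical and do not reflect ProvSQL's SQL interface or its internal circuit representation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Why-provenance: a set of witnesses, each witness being a set of tuple ids.
WHY = Semiring(
    frozenset(),
    frozenset([frozenset()]),
    lambda a, b: a | b,
    lambda a, b: frozenset(x | y for x in a for y in b),
)


def why_token(tid):
    """Why-provenance of a base tuple: a single witness containing itself."""
    return frozenset([frozenset([tid])])


def annotate(relation, token):
    """Attach to every tuple the semiring value obtained from its identifier."""
    return {t: token(tid) for t, tid in relation.items()}


def exists_join(r, s, k):
    """Annotation of the Boolean query  EXISTS x, y, z: R(x, y) AND S(y, z)."""
    result = k.zero
    for (_, y), a in r.items():
        for (y2, _), b in s.items():
            if y == y2:  # join condition on the shared attribute
                result = k.plus(result, k.times(a, b))
    return result


# Hypothetical relations: tuple -> tuple identifier.
R = {(1, "a"): "r1", (2, "a"): "r2", (3, "b"): "r3"}
S = {("a", 10): "s1", ("c", 20): "s2"}

print(exists_join(annotate(R, lambda _: 1), annotate(S, lambda _: 1), COUNTING))
print(exists_join(annotate(R, lambda _: True), annotate(S, lambda _: True), BOOLEAN))
print(exists_join(annotate(R, why_token), annotate(S, why_token), WHY))
```

The same evaluation code yields the number of derivations (2), whether the answer exists (True), and the witnesses {r1, s1} and {r2, s1}, depending only on the semiring passed in. Separating query evaluation from the choice of annotation semantics in this way is the main appeal of the semiring framework.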

6.1.2 VUS

Name:
Volume Under the Surface
Keywords:
Time Series, Anomaly detection, Measures, Performance measure, Python
Scientific Description:
Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD for time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional point-based information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding the anomaly score to mark each point as an anomaly or not. However, mapping discrete labels into continuous data introduces unavoidable shortcomings, complicating the evaluation of range-based anomalies. Notably, the choice of evaluation measure may significantly bias the experimental outcome. Despite over six decades of attention, there had never been a large-scale systematic quantitative and qualitative analysis of time-series AD evaluation measures. This work extensively evaluates quality measures for time-series AD to assess their robustness under noise, misalignments, and different anomaly cardinality ratios. Our results indicate that measures producing quality values independently of a threshold (i.e., AUC-ROC and AUC-PR) are more suitable for time-series AD. Motivated by this observation, we first extend the AUC-based measures to account for range-based anomalies. Then, we introduce a new family of parameter-free and threshold-independent measures, VUS (Volume Under the Surface), to evaluate methods while varying parameters. Our findings demonstrate that our four measures are significantly more robust in assessing the quality of time-series AD methods.
Functional Description:
The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are widely used to compare the performance of different anomaly detectors. They mainly focus on point-based detection. However, the detection of collective anomalies concerns two factors: whether the outlier is detected and what percentage of the outlier is detected. The first factor is not reflected in the AUC. Another problem is the possible shift between the anomaly score and the real outlier due to the application of the sliding window. To tackle these problems, we incorporate the idea of range-based precision and recall, and suggest the range-based ROC and its counterpart in the precision-recall space, which provides a new evaluation for the collective anomalies. We finally introduce a new measure, VUS (Volume Under the Surface), which corresponds to the averaged range-based measure when we vary the range size. We demonstrate in a large experimental evaluation that the proposed measures are significantly more robust to important criteria (such as lag and noise) and also significantly more useful to correctly separate accurate from inaccurate methods.
News of the Year:

In 2025, we published a new paper introducing two optimized implementations of VUS that significantly reduce the execution time of the initial implementation.

Publication: https://inria.hal.science/hal-05076186
URL:
https://github.com/TheDatumOrg/VUS
Publication:
hal-05076186
Contact:
Paul Boniol
Participants:
Paul Boniol, Emmanouil Sylligardos, 9 anonymous participants
Partners:
Ohio State University, Université Paris-Descartes
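The idea of averaging a range-based AUC over several range sizes can be sketched in Python as follows. This is a deliberately simplified version written for illustration: it extends each labeled anomaly by a hard buffer of b points on each side, computes the point-based AUC-ROC (via the Mann–Whitney pairwise formulation) against the extended labels, and averages over b = 0..max_buffer. The actual VUS measures use soft labels within the buffer region and are implemented in the VUS repository above; the function names here are hypothetical.

```python
def auc_roc(labels, scores):
    """Point-based AUC-ROC: probability that a positive point outscores a
    negative one (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need both anomalous and normal points")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def extend_labels(labels, buffer):
    """Hard buffer: mark as anomalous every point within `buffer` of an anomaly."""
    anomalous = [i for i, y in enumerate(labels) if y]
    return [1 if any(abs(i - j) <= buffer for j in anomalous) else 0
            for i in range(len(labels))]


def vus_roc_sketch(labels, scores, max_buffer):
    """Average the buffer-extended AUC-ROC over buffer sizes 0..max_buffer."""
    aucs = [auc_roc(extend_labels(labels, b), scores) for b in range(max_buffer + 1)]
    return sum(aucs) / len(aucs)


# A detector whose score peaks one step after the labeled anomaly (lag).
labels = [0, 0, 0, 1, 1, 0, 0, 0]
scores = [0.1, 0.1, 0.1, 0.2, 0.5, 0.9, 0.2, 0.1]
print(auc_roc(labels, scores), vus_roc_sketch(labels, scores, 1))
```

On this lagged example, the buffer of one point raises the AUC from 9.5/12 to 13/16, illustrating why tolerating small misalignments matters for range-based anomalies. A buffer large enough to cover the whole series leaves no normal point, which the sketch reports as an error.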

6.1.3 TSB-UAD

Keywords:
Time Series, Anomaly detection, Python, Library
Scientific Description:
The detection of anomalies in time series has gained ample academic and industrial attention. However, no comprehensive benchmark exists to evaluate time-series anomaly detection methods. It is common to use (i) proprietary or synthetic data, often biased to support particular claims, or (ii) a limited collection of publicly available datasets. Consequently, we often observe methods performing exceptionally well in one dataset but surprisingly poorly in another, creating an illusion of progress. To address the issues above, we thoroughly studied over one hundred papers to identify, collect, process, and systematically format datasets proposed in the past decades. We summarize our effort in TSB-UAD, a new benchmark to ease the evaluation of univariate time-series anomaly detection methods. Overall, TSB-UAD contains 13766 time series with labeled anomalies spanning different domains with high variability of anomaly types, ratios, and sizes. TSB-UAD includes 18 previously proposed datasets containing 1980 time series, and we contribute two collections of datasets. Specifically, we generate 958 time series using a principled methodology for transforming 126 time-series classification datasets into time series with labeled anomalies. In addition, we present data transformations with which we introduce new anomalies, resulting in 10828 time series with varying complexity for anomaly detection. Finally, we evaluate 12 representative methods, demonstrating that TSB-UAD is a robust resource for assessing anomaly detection methods. TSB-UAD provides a valuable, reproducible, and frequently updated resource to establish a leaderboard of univariate time-series anomaly detection methods.
Functional Description:
TSB-UAD is a new open, end-to-end benchmark suite to ease the evaluation of univariate time-series anomaly detection methods. Overall, TSB-UAD contains 13766 time series with labeled anomalies spanning different domains with high variability of anomaly types, ratios, and sizes. Specifically, TSB-UAD includes 18 previously proposed datasets containing 1980 time series from real-world data science applications. Motivated by flaws in certain datasets and evaluation strategies in the literature, we study anomaly types and data transformations to contribute two collections of datasets. Specifically, we generate 958 time series using a principled methodology for transforming 126 time-series classification datasets into time series with labeled anomalies. In addition, we present a set of data transformations with which we introduce new anomalies in the public datasets, resulting in 10828 time series (92 datasets) with varying difficulty for anomaly detection.
URL:
https://tsb-uad.readthedocs.io/en/latest/
Contact:
Paul Boniol
Participants:
Paul Boniol, Emmanouil Sylligardos, 5 anonymous participants
Partners:
Université Paris-Descartes, Ohio State University
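The following Python sketch illustrates one plausible way of turning a labeled classification dataset into a time series with labeled anomalies: the most frequent class is treated as normal, its instances are concatenated, and instances of the other classes are injected at regular intervals and labeled as anomalous. This procedure, the `anomaly_every` parameter, and the function name are assumptions for illustration; the exact TSB-UAD methodology (class selection, injection positions, and anomaly ratios) may differ.

```python
from collections import Counter


def classification_to_anomaly_series(samples, anomaly_every=5):
    """samples: list of (series, class_label) pairs.

    Hypothetical transformation: the majority class is 'normal'; after every
    `anomaly_every` normal instances, one instance of another class is
    inserted and its points are labeled 1 (anomalous).
    """
    counts = Counter(label for _, label in samples)
    normal_class = counts.most_common(1)[0][0]
    normals = [x for x, y in samples if y == normal_class]
    anomalies = [x for x, y in samples if y != normal_class]
    series, labels = [], []
    for i, x in enumerate(normals, start=1):
        series.extend(x)
        labels.extend([0] * len(x))
        if i % anomaly_every == 0 and anomalies:
            a = anomalies.pop(0)
            series.extend(a)
            labels.extend([1] * len(a))
    return series, labels


# Six instances of class "A" (normal) and two of class "B".
samples = [([0, 0, 0], "A")] * 6 + [([5, 5, 5], "B")] * 2
series, labels = classification_to_anomaly_series(samples, anomaly_every=3)
print(len(series), sum(labels))
```

Reusing classification datasets in this way yields anomalies whose shape is known to differ from the normal pattern, at the cost of anomalies that may be easier or harder to detect than those found in real applications.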

6.1.4 ADecimo

Name:
A Web-app for the Evaluation of Model selection for Anomaly Detection in Time Series
Keywords:
Time Series, Anomaly detection, Web Application
Scientific Description:
Anomaly detection is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmark and evaluation studies demonstrated that there exists no single best anomaly detection method when applied to heterogeneous time series datasets. Therefore, the only scalable and viable solution to solve anomaly detection over very different time series collected from diverse domains is to propose a model selection method that will choose, based on time series characteristics, the best anomaly detection method to run. This paper describes ADecimo, a modular and extensible web application that helps users understand the performance of time series classification algorithms used as model selection methods for time series anomaly detection. Overall, our system enables users to compare 17 different classifiers over 1980 time series, and decide on the most suitable time series classification method for their own time series and use cases.
Functional Description:
We present here ADecimo, a modular and extensible web application that helps users understand the performance of time series classification algorithms used as model selection methods for time series anomaly detection. Overall, our system enables users to compare 17 different classifiers over 1980 time series, and decide on the most suitable time series classification method for their own time series and use cases.
URL:
https://adecimots.streamlit.app/
Publication:
hal-04590326
Contact:
Paul Boniol
Participants:
Paul Boniol, Emmanouil Sylligardos, 3 anonymous participants

6.1.5 MSAD

Name:
Model Selection for Anomaly Detection
Keywords:
Time Series, Machine learning, Classification, Ensemble classifier, Python
Scientific Description:
Anomaly detection is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmark and evaluation studies demonstrated that no single anomaly detection method is best overall when applied to very heterogeneous time series datasets. Therefore, the only scalable and viable solution to solve anomaly detection over very different time series collected from diverse domains is to propose a model selection method that will select, based on time series characteristics, the best anomaly detection method to run. Existing AutoML solutions are, unfortunately, not directly applicable to time series anomaly detection, and no evaluation of time series-based approaches for model selection exists. Towards that direction, this paper studies the performance of time series classification methods used as model selection for anomaly detection. Overall, we compare 17 different classifiers over 1800 time series, and we propose the first extensive experimental evaluation of time series classification as model selection for anomaly detection. Our results demonstrate that model selection methods outperform every single anomaly detection method while being in the same order of magnitude regarding execution time. This evaluation is the first step to demonstrate the accuracy and efficiency of time series classification algorithms for anomaly detection, and represents a strong baseline that can then be used to guide the model selection step in general AutoML pipelines.
Functional Description:
MSAD proposes a pipeline for model selection based on time series classification and an extensive experimental evaluation of existing classification algorithms for this new pipeline. Our results demonstrate that model selection methods outperform every single anomaly detection method while being in the same order of magnitude regarding execution time.
News of the Year:

In 2025, we published a new paper that extends the model selection pipeline, improving performance in Out-of-Distribution (OoD) settings.

Paper: https://inria.hal.science/hal-05343228
URL:
https://github.com/boniolp/MSAD
Publication:
hal-05343228
Contact:
Paul Boniol
Participants:
Emmanouil Sylligardos, Paul Boniol, Pierre Senellart, 2 anonymous participants
Partners:
Ohio State University, Université Paris-Descartes
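The core of the pipeline, a classifier that chooses per time series which detector to run, can be sketched in a few lines of Python. Here the features (mean and standard deviation), the 1-nearest-neighbour classifier, and the detector names detector_A and detector_B are all hypothetical stand-ins; MSAD itself evaluates much richer time series classifiers.

```python
import math
import statistics


def features(series):
    """Toy feature vector: mean and population standard deviation."""
    return (statistics.fmean(series), statistics.pstdev(series))


def train_selector(training_set):
    """training_set: list of (series, name of the detector that worked best on it)."""
    return [(features(series), best) for series, best in training_set]


def select_detector(model, series):
    """1-nearest-neighbour model selection in feature space."""
    f = features(series)
    _, best = min(model, key=lambda item: math.dist(item[0], f))
    return best


model = train_selector([
    ([1, 1, 1, 1], "detector_A"),    # flat series
    ([0, 10, 0, 10], "detector_B"),  # highly oscillating series
])
print(select_detector(model, [1, 1, 2, 1]), select_detector(model, [0, 9, 1, 10]))
```

Only the selected detector is then run on the new series, which is why the cost of model selection stays close to that of running a single detector.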

6.1.6 apxproof

Keyword:
LaTeX
Functional Description:
apxproof is a LaTeX package facilitating the typesetting of research articles with proofs in appendix, a common practice in database theory and theoretical computer science in general. The appendix material is written in the LaTeX code along with the main text which it naturally complements, and it is automatically deferred. The package can automatically send proofs to the appendix, can repeat in the appendix the theorem environments stated in the main text, can section the appendix automatically based on the sectioning of the main text, and supports a separate bibliography for the appendix material.
Release Contributions:
Fix forward linking when used in conjunction with aliascnt (e.g., in Springer classes), compatibility with recent versions of acmart.cls
News of the Year:
- Fix forward linking when used in conjunction with aliascnt (e.g., in Springer classes)
- Compatibility with recent versions of acmart.cls
- Support for user-defined claimproof environments
- Remove forward linking command from PDF bookmarks
URL:
https://github.com/PierreSenellart/apxproof
Contact:
Pierre Senellart
Participant:
Pierre Senellart

7 New results

7.1 Research axis 1: Foundations of data management

Participants: Camille Bourgaux, Anatole Dahan, Robin Jean, Lucas Larroque, Arthur Lombardo, Michaël Thomazo, Luc Segoufin.

Knowledge representation and knowledge bases

In 28, 27, we establish a correspondence between (fragments of) $\mathcal{TEL}^{\bigcirc}$, a temporal extension of the $\mathcal{EL}$ description logic with the LTL operator $\bigcirc^{k}$, and some specific kinds of formal grammars, in particular, conjunctive grammars (context-free grammars equipped with the operation of intersection). This connection implies that $\mathcal{TEL}^{\bigcirc}$ does not possess the property of ultimate periodicity of models, and further leads to undecidability of query answering in $\mathcal{TEL}^{\bigcirc}$, closing a question left open since the introduction of $\mathcal{TEL}^{\bigcirc}$. Moreover, it also allows us to establish decidability of query answering for some new interesting fragments of $\mathcal{TEL}^{\bigcirc}$, and to reuse for this purpose existing tools and algorithms for conjunctive grammars.
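To illustrate what the extra intersection operation of conjunctive grammars adds, the following Python sketch recognizes words of a classic conjunctive grammar for the non-context-free language a^n b^n c^n: S → AB & DC, where AB generates a* b^m c^m and DC generates a^n b^n c*, so that their intersection forces equal counts. The recognizer is a naive memoized substring search written for this example; it assumes that every recursive rule consumes a terminal before recursing (true for this grammar), and it does not reflect the reduction from $\mathcal{TEL}^{\bigcirc}$ used in the papers.

```python
from functools import lru_cache

# Conjunctive grammar: nonterminal -> list of rules; a rule is a list of
# conjuncts; a conjunct is a tuple of symbols. Symbols that are not keys are
# terminals, and the empty tuple () stands for the empty word.
GRAMMAR = {
    "S": [[("A", "B"), ("D", "C")]],  # S -> AB & DC
    "A": [[("a", "A")], [()]],        # A -> aA | eps
    "B": [[("b", "B", "c")], [()]],   # B -> bBc | eps
    "C": [[("c", "C")], [()]],        # C -> cC | eps
    "D": [[("a", "D", "b")], [()]],   # D -> aDb | eps
}


def recognizes(grammar, start, word):
    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """Does `symbol` derive word[i:j]?"""
        if symbol not in grammar:  # terminal symbol
            return j == i + 1 and word[i] == symbol
        # A rule applies only if *all* its conjuncts derive the same substring.
        return any(all(seq_derives(conj, i, j) for conj in rule)
                   for rule in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq_derives(seq, i, j):
        """Does the symbol sequence `seq` derive word[i:j]?"""
        if not seq:
            return i == j
        return any(derives(seq[0], i, k) and seq_derives(seq[1:], k, j)
                   for k in range(i, j + 1))

    return derives(start, 0, len(word))


print(recognizes(GRAMMAR, "S", "aabbcc"), recognizes(GRAMMAR, "S", "aabbc"))
```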

Consistent query answering

In 17, we consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query $q$, testing whether $q$ is certain (i.e., whether it evaluates to true over all repairs of a given inconsistent database) is either in polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of facts of the database, each of size at most $k$, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as follows: (1) initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S \models q$; (2) add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a \in B$ there is a set $S' \subseteq S \cup \{a\}$ such that $S' \in \Delta$, and repeat until no new set can be added. For an input database $D$, the algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certainty when the query $q$ falls in the polynomial-time cases of the known dichotomies for self-join-free queries and path queries. For arbitrary Boolean conjunctive queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial-time certain queries (with self-joins) which are not identified as such by the algorithm.
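The fixpoint algorithm above is short enough to transcribe literally. The following Python sketch assumes that facts are tuples whose first component is the relation name and whose second component is the primary key, and it takes the query as a Python predicate over a set of facts: here the hypothetical query ∃x, y, z R(x, y) ∧ S(y, z), of size k = 2. It enumerates all subsets of at most k facts, so it is polynomial for fixed k but only meant for toy inputs.

```python
from itertools import combinations


def certain(db, satisfies, k):
    """Inflationary fixpoint: q is reported certain iff the empty set enters Delta."""
    db = list(db)
    candidates = [frozenset(c) for n in range(k + 1) for c in combinations(db, n)]
    # (1) Initialize Delta with the sets of at most k facts that satisfy q.
    delta = {s for s in candidates if satisfies(s)}
    # Blocks: maximal sets of facts sharing relation name and key.
    blocks = {}
    for fact in db:
        blocks.setdefault((fact[0], fact[1]), []).append(fact)
    # (2) Add S if some block B has, for every a in B, a member of Delta
    #     contained in S | {a}; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s in delta:
                continue
            for block in blocks.values():
                if all(any(d <= s | {a} for d in delta) for a in block):
                    delta.add(s)
                    changed = True
                    break
    return frozenset() in delta


def q(facts):
    """Hypothetical query: exists x, y, z . R(x, y) and S(y, z)."""
    return any(r[0] == "R" and t[0] == "S" and r[2] == t[1]
               for r in facts for t in facts)


# R(1, a) and R(1, b) conflict on key 1; both repairs satisfy q in db_certain,
# whereas the repair keeping R(1, b) does not satisfy q in db_not_certain.
db_certain = [("R", 1, "a"), ("R", 1, "b"), ("S", "a", 1), ("S", "b", 2)]
db_not_certain = [("R", 1, "a"), ("R", 1, "b"), ("S", "a", 1)]
print(certain(db_certain, q, 2), certain(db_not_certain, q, 2))
```

Since the update rule only ever adds sets and depends monotonically on $\Delta$, the order in which candidates are visited does not affect the final answer.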

The Chase and Existential Rules

In 29, we study the chase, a fundamental algorithm with ubiquitous uses in database theory. Given a database and a set of existential rules (aka tuple-generating dependencies), it iteratively extends the database to ensure that the rules are satisfied in a most general way. This process may not terminate, and a major problem is to decide whether it does. This problem has been studied for a large number of chase variants, which differ by the conditions under which a rule is applied to extend the database. Surprisingly, the complexity of the universal termination of the restricted (aka standard) chase was not fully understood. We close this gap by placing universal restricted chase termination in the analytical hierarchy. This high hardness is due to the fairness condition, and we propose an alternative condition to reduce the hardness of universal termination.
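For readers less familiar with the restricted chase, here is a minimal Python sketch. It assumes rules given as (body, head) lists of atoms whose variables start with "?", and labelled nulls named _n0, _n1, and so on; these conventions are hypothetical. A trigger (a match of the body) is applied only if its head cannot already be satisfied by extending the match, which is what distinguishes the restricted chase from the oblivious one. The sketch applies the first active trigger it finds and stops after max_steps applications: it makes no attempt at fairness, and running out of steps says nothing definite about non-termination.

```python
from itertools import count


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def homomorphisms(atoms, instance, sub):
    """Yield every extension of `sub` mapping all `atoms` into `instance`."""
    if not atoms:
        yield dict(sub)
        return
    first = atoms[0]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new, ok = dict(sub), True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from homomorphisms(atoms[1:], instance, new)


def find_active_trigger(instance, rules):
    for body, head in rules:
        for match in homomorphisms(body, instance, {}):
            # Restricted chase: skip triggers whose head is already satisfied.
            if next(homomorphisms(head, instance, match), None) is None:
                return head, match
    return None


def restricted_chase(instance, rules, max_steps=100):
    """Return (instance, True) if no active trigger remains, else (instance, False)."""
    instance = set(instance)
    fresh = count()
    for _ in range(max_steps):
        trigger = find_active_trigger(instance, rules)
        if trigger is None:
            return instance, True
        head, match = trigger
        ext = dict(match)
        for atom in head:  # existential variables get fresh labelled nulls
            for term in atom[1:]:
                if is_var(term) and term not in ext:
                    ext[term] = f"_n{next(fresh)}"
        instance |= {(atom[0], *(ext[t] if is_var(t) else t for t in atom[1:]))
                     for atom in head}
    return instance, False


terminating = [([("Person", "?x")], [("hasParent", "?x", "?y")])]
infinite = [([("Person", "?x")], [("hasParent", "?x", "?y"), ("Person", "?y")])]
print(restricted_chase({("Person", "alice")}, terminating))
print(restricted_chase({("Person", "alice")}, infinite, max_steps=10)[1])
```

With the second rule, every new person needs a new parent who is also a person, so each step creates a fresh null and an active trigger always remains.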

In 34, we address one of the fundamental open questions in the realm of existential rules: the conjecture on the finite controllability of bounded derivation depth rule sets ($\mathsf{bdd} \Rightarrow \mathsf{fc}$). We take a step toward a positive resolution of this conjecture by demonstrating that universal models generated by bdd rule sets cannot contain arbitrarily large tournaments (arbitrarily directed cliques) without entailing a loop query, $\exists x\, E(x, x)$. This simple yet elegant result narrows the space of potential counterexamples to the ($\mathsf{bdd} \Rightarrow \mathsf{fc}$) conjecture.
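The two notions in this statement are easy to state operationally. The following Python sketch, on a directed graph given as a set of edges, checks the loop query $\exists x\, E(x, x)$ and searches by brute force for a tournament of a given size. Following the parenthetical above, a tournament is taken here to be a set of vertices in which every pair of distinct vertices is linked by an edge in at least one direction. This only illustrates the definitions; the result itself concerns universal models of bdd rule sets, which this code does not compute.

```python
from itertools import combinations


def has_loop(edges):
    """Loop query: exists x . E(x, x)."""
    return any(u == v for u, v in edges)


def has_tournament(edges, size):
    """Is there a set of `size` vertices pairwise linked in at least one direction?"""
    vertices = {v for edge in edges for v in edge}
    return any(
        all((u, v) in edges or (v, u) in edges for u, v in combinations(group, 2))
        for group in combinations(vertices, size)
    )


# Transitive tournament on 4 vertices: i -> j whenever i < j.
transitive = {(i, j) for i in range(4) for j in range(4) if i < j}
path = {(0, 1), (1, 2), (2, 3)}
print(has_tournament(transitive, 4), has_loop(transitive), has_tournament(path, 3))
```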

Other aspects of theoretical computer science

Our research occasionally touches on other aspects of theoretical computer science not related to data management.

In 31, we introduce an extension of fixed-point logic (FP) with a group-order operator (ord), which computes the size of a group generated by a definable set of permutations. This operation is a generalization of the rank operator (rk). We show that FP + ord constitutes a new candidate logic for the class of polynomial-time computable queries (P). As was the case for FP + rk, the model-checking of FP + ord formulae is polynomial-time computable. Moreover, the query separating FP + rk from P exhibited by Lichter in his recent breakthrough is definable in FP + ord. Precisely, we show that FP + ord canonizes structures with Abelian colors, a class of structures which contains Lichter's counter-example. This proof involves expressing a fragment of the group-theoretic approach to graph canonization in the logic FP + ord.
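To give a feel for the quantity that the ord operator returns, the following Python sketch computes the order of the group generated by a set of permutations, given as tuples, by exploring its elements from the identity. This naive closure can take time proportional to the group size, which may be exponential in the number of points; polynomial-time algorithms such as Schreier–Sims exist for this problem. The sketch concerns only the computed value, not its definability in FP + ord.

```python
def compose(p, q):
    """Permutation composition: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def group_order(generators):
    """Size of the group generated by `generators` (permutations of 0..n-1)."""
    n = len(generators[0])
    identity = tuple(range(n))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in seen:  # a new group element: explore it later
                seen.add(gh)
                frontier.append(gh)
    return len(seen)


three_cycle = (1, 2, 0)
transposition = (1, 0, 2)
print(group_order([three_cycle]), group_order([three_cycle, transposition]))
```

For a finite permutation group, closing the identity under right multiplication by the generators already yields the whole group, since every inverse is a positive power of its element.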

7.2 Research axis 2: Uncertainty, provenance, and explainability in data management

Participants: Camille Bourgaux, Robin Jean, Pierre Senellart, Aryak Sen.

Inconsistent knowledge bases

Repair-based semantics have been extensively studied as a means of obtaining meaningful answers to queries posed over inconsistent knowledge bases (KBs). While several works have considered how to exploit a priority relation between facts to select optimal repairs, the question of how to specify such preferences remains largely unaddressed. This motivates us, in 23, 22, to introduce a declarative rule-based framework for specifying and computing a priority relation between conflicting facts. As the expressed preferences may contain undesirable cycles, we consider the problem of determining when a set of preference rules always yields an acyclic relation, and we also explore a pragmatic approach that extracts an acyclic relation by applying various cycle removal techniques. Towards an end-to-end system for querying inconsistent KBs, we present a preliminary implementation and experimental evaluation of the framework, which employs answer set programming to evaluate the preference rules, apply the desired cycle resolution techniques to obtain a priority relation, and answer queries under prioritized-repair semantics.
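The following Python sketch illustrates the two steps above on a toy example; it is not the answer-set-programming implementation of 23, 22. It assumes binary conflicts, a priority given as pairs (f, g) meaning that f is preferred over g, and one hypothetical cycle removal technique chosen for illustration: dropping every priority edge that lies on a cycle. A repair is then built greedily by scanning facts in a topological order of the remaining priority.

```python
from graphlib import TopologicalSorter

# Illustrative sketch: hypothetical cycle removal + greedy repair under binary conflicts.


def reachable(start, edges):
    """Nodes reachable from start by following directed edges."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for a, b in edges:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def drop_cyclic_edges(priority):
    """Hypothetical technique: keep only edges (u, v) with no path from v back to u."""
    return {(u, v) for u, v in priority if u not in reachable(v, priority)}


def greedy_repair(facts, conflicts, priority):
    """Add facts best-first, skipping those in conflict with facts already kept.

    Raises graphlib.CycleError if the priority is still cyclic."""
    preds = {f: {g for g, h in priority if h == f} for f in facts}
    repair = set()
    for f in TopologicalSorter(preds).static_order():
        if all(frozenset((f, g)) not in conflicts for g in repair):
            repair.add(f)
    return repair


conflicts = {frozenset({"a", "b"}), frozenset({"b", "c"})}
```

With the priority {(b, a), (b, c)}, the preferred fact b is kept and both facts conflicting with it are discarded; with {(a, b)}, the repair keeps a and c. Dropping all edges on cycles is only one of several possible removal strategies, chosen here because it is simple and always yields an acyclic relation.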

In 25, 24, we explore the issue of inconsistency handling in DatalogMTL, an extension of Datalog with metric temporal operators. Since facts are associated with time intervals, there are different ways to restore consistency when they contradict the rules, such as removing facts or modifying their time intervals. Our first contribution is the definition of relevant notions of conflicts (minimal explanations for inconsistency) and repairs (possible ways of restoring consistency) for this setting, together with a study of the properties of these notions and of the associated inconsistency-tolerant semantics. Our second contribution is a data complexity analysis of the tasks of generating a single conflict or repair, and of query entailment under repair-based semantics.
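The Python sketch below is a toy illustration of the two ways of restoring consistency mentioned above. Its simplifying assumptions are ours, not those of 25, 24: a single negative constraint stating that p(x) and q(x) cannot hold at the same time, closed intervals on an integer timeline, and facts written as (predicate, constant, (start, end)). In this setting, a conflict is a pair of facts whose intervals overlap. Consistency can be restored either by removing one of the facts or by cutting the overlap out of its interval.

```python
# Illustrative sketch: one hypothetical constraint "p(x) and q(x) never hold together".


def overlap(i, j):
    """Intersection of two closed integer intervals, or None if they are disjoint."""
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None


def conflicts(facts, p, q):
    """Pairs p(c)@I, q(c)@J with I ∩ J non-empty, together with the overlap."""
    return [(f, g, overlap(f[2], g[2]))
            for f in facts if f[0] == p
            for g in facts if g[0] == q and g[1] == f[1] and overlap(f[2], g[2])]


def subtract(interval, cut):
    """Remove `cut` (assumed inside `interval`) from it; returns 0, 1 or 2 pieces."""
    (s, e), (lo, hi) = interval, cut
    pieces = []
    if s < lo:
        pieces.append((s, lo - 1))
    if hi < e:
        pieces.append((hi + 1, e))
    return pieces


facts = [("Open", "door", (0, 10)), ("Closed", "door", (8, 12))]
```

Here Open(door)@[0, 10] and Closed(door)@[8, 12] conflict on [8, 10]. Cutting that overlap out of the Open fact keeps Open(door) on [0, 7], which retains more information than deleting the fact altogether. Choices of this kind are what the repair notions above make precise.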

Provenance and probability management

Ensemble methods aggregate the predictions of multiple models by some form of weighted voting. In 33, we study how the assignment of voting power to each individual model affects the performance of ensemble methods. We empirically and comparatively evaluate the accuracy and running time of the different power-voting ensemble methods using standard classifiers and mainstream classification benchmarks. The results show that power ensemble voting outperforms the equal-power baseline, and that unsupervised learning of the voting power can be competitive with supervised learning; among supervised approaches, learning voting power through Shapley values and regression outperforms simply using accuracy.
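As an illustration of Shapley-based voting power, the Python sketch below treats each model as a player. The value of a coalition is the validation accuracy of its equal-weight vote, an assumption of this sketch and not necessarily the utility used in 33. The sketch computes exact Shapley values by enumerating coalitions and reuses them as voting weights. Exact enumeration is exponential in the number of models, so it only suits small ensembles.

```python
from itertools import combinations
from math import factorial

# Illustrative sketch: Shapley values of models, game = accuracy of equal-weight vote.


def vote(preds, weights, members, i):
    """Weighted vote of the member models on sample i (ties go to the smallest label)."""
    tally = {}
    for m in members:
        tally[preds[m][i]] = tally.get(preds[m][i], 0.0) + weights[m]
    return max(sorted(tally), key=tally.get) if tally else None


def accuracy(preds, labels, weights, members):
    """Validation accuracy of the weighted ensemble restricted to `members`."""
    hits = sum(vote(preds, weights, members, i) == y for i, y in enumerate(labels))
    return hits / len(labels)


def shapley(n, value):
    """Exact Shapley value of each of n players for a coalition function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical predictions of three models on four validation samples.
preds = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]
labels = [0, 1, 1, 0]
phi = shapley(3, lambda s: accuracy(preds, labels, [1.0] * 3, s))
```

Here the Shapley weights are 0.5, 0.25, and 0.25. Model 1 is more accurate on its own than model 2 (0.75 vs. 0.5), yet both receive the same weight: Shapley-based power rewards marginal contribution to the ensemble rather than stand-alone accuracy.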

The Shapley value provides a principled framework for attributing marginal contributions to players in coalitional games. While its axiomatic fairness guarantees have made it a cornerstone of value distribution in economics and multi-agent systems, recent computational advances have extended its applicability to data-driven domains. 32 bridges game-theoretic foundations with probabilistic reasoning by studying Shapley-like scores in stochastic environments. We prove that the expected Shapley value (EShap), i.e., a player's average impact in a game with an independent probabilistic setting, coincides with the Shapley value of the game whose utility is the expected utility of the original game (ShapE). This equality, however, fails for other power indices, such as the Banzhaf index, underscoring that this consistency in uncertain settings is specific to the Shapley value. We further identify a class of coefficients (including normalized Banzhaf indices) for which the equality persists, broadening the scope of reliable attribution mechanisms.

ProvSQL is a PostgreSQL extension implementing provenance management and probabilistic database features. ProvSQL seamlessly extends relational database functionality to support the storage, tracking through derivations and transformations, and querying of metadata that explain and qualify the data and query results. In 40, ProvSQL is used to implement a content-based image retrieval system. A deep learning object detection model identifies objects of selected classes located within the images of a large-scale image data set, and the uncertainty associated with each detection is recorded. ProvSQL's provenance model incorporates this uncertainty into the retrieval process, thus facilitating the generation of accurate and reliable results and supporting decision-making in scenarios with incomplete or uncertain information. The demonstration illustrates how ProvSQL handles query processing, uncertainty tracking, and probability computation. It highlights the utility of a probabilistic database for applications dealing with uncertain data, compared to traditional threshold-based approaches.
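To see why recording detection uncertainty helps, the Python sketch below computes the exact probability of a Boolean lineage formula over independent events by enumerating possible worlds. The detection identifiers and confidence values are hypothetical. Enumeration is exponential in the number of events; the sketch only shows the semantics and does not reflect how ProvSQL evaluates probabilities.

```python
from itertools import product

# Illustrative sketch: probability of a lineage formula under tuple independence.


def probability(formula, probs):
    """P(formula) when each event e holds independently with probability probs[e]."""
    events = sorted(probs)
    total = 0.0
    for world in product([False, True], repeat=len(events)):
        valuation = dict(zip(events, world))
        weight = 1.0
        for e in events:
            weight *= probs[e] if valuation[e] else 1.0 - probs[e]
        if formula(valuation):
            total += weight
    return total


# Hypothetical detections in one image: two "cat" boxes and one "dog" box.
confidence = {"cat1": 0.9, "cat2": 0.3, "dog1": 0.6}
# Lineage of "the image shows a cat and a dog": (cat1 OR cat2) AND dog1.
lineage = lambda v: (v["cat1"] or v["cat2"]) and v["dog1"]
p = probability(lineage, confidence)  # (1 - 0.1 * 0.7) * 0.6 = 0.558
```

A threshold-based system discarding detections below 0.7 would drop dog1 and miss the image entirely. The probabilistic answer instead returns it with probability 0.558, which can be used for ranking.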

In 39, we further enhance ProvSQL by enabling provenance tracking for update operations (DELETE, INSERT, UPDATE). We illustrate the practical utility of update provenance by implementing a temporal database capable of standard operations, including time travel (inspecting past database states), history tracking (monitoring tuple states over time), and undo (reversing previous updates). These features rely on a provenance formalism based on the union-of-intervals m-semiring. Additionally, we emphasize a key advantage of using a semiring-based provenance model: its generality allows the same semiring-based machinery to seamlessly support various applications, such as probabilistic databases, simply by modifying the semiring definition.
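The Python sketch below shows annotations in the spirit of the union-of-intervals m-semiring, under our own assumptions: annotations are sorted lists of disjoint half-open intervals [start, end) of timestamps. Union plays the role of ⊕, intersection of ⊗, and set difference of the monus ⊖; time travel then reduces to a membership test. The exact definitions used in 39 may differ in details.

```python
# Illustrative sketch: interval-set annotations (half-open intervals are an assumption).
INF = float("inf")


def normalize(intervals):
    """Sort and merge half-open intervals [start, end) into a canonical union."""
    out = []
    for s, e in sorted(i for i in intervals if i[0] < i[1]):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def plus(a, b):
    """⊕: union of validity periods (e.g., alternative derivations)."""
    return normalize(a + b)


def times(a, b):
    """⊗: intersection of validity periods (e.g., a join)."""
    return normalize([(max(s1, s2), min(e1, e2)) for s1, e1 in a for s2, e2 in b])


def monus(a, b):
    """⊖: periods of a not covered by b (e.g., the effect of a deletion)."""
    result = a
    for s2, e2 in b:
        pieces = []
        for s1, e1 in result:
            pieces += [(s1, min(e1, s2)), (max(s1, e2), e1)]
        result = normalize(pieces)
    return result


def valid_at(a, t):
    """Time travel: was the annotated tuple present at time t?"""
    return any(s <= t < e for s, e in a)


# Tuple inserted at time 1, deleted at time 5, re-inserted at time 8.
history = plus(monus([(1, INF)], [(5, INF)]), [(8, INF)])  # [(1, 5), (8, inf)]
```

Swapping these four functions for, e.g., probability-based operations is what the generality argument above refers to: the query-side propagation of annotations stays the same, and only the semiring changes.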

7.3 Research axis 3: Knowledge discovery at scale

Participants: Paul Boniol, Felix Chavelli, Antoine Gauquier, Magali Parrino, Pierre Senellart, Marijan Soric, Emmanouil Sylligardos.

Mining time series

Recent advances in data collection technology, accompanied by the ever-rising volume and velocity of streaming data, underscore the vital need for time series analytics. In this regard, time-series anomaly detection has been an important activity, entailing various applications in fields such as cyber security, financial markets, law enforcement, and health care. While traditional literature on anomaly detection is centered on statistical measures, the increasing number of machine learning algorithms in recent years calls for a structured, general characterization of the research methods for time-series anomaly detection. In 36, we present a process-centric taxonomy for time-series anomaly detection methods, systematically categorizing traditional statistical approaches and contemporary machine learning techniques. Beyond this taxonomy, we conduct a meta-analysis of the existing literature to identify broad research trends. Given the absence of a one-size-fits-all anomaly detector, we also introduce emerging trends for time-series anomaly detection. Furthermore, we review commonly used evaluation measures and benchmarks, followed by an analysis of benchmark results to provide insights into the impact of different design choices on model performance. Through these contributions, we aim to provide a holistic perspective on time-series anomaly detection and highlight promising avenues for future investigation.

Anomaly detection is a fundamental task for time series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmarks and evaluation studies demonstrated that no single anomaly detection method performs best overall when applied to very heterogeneous time series datasets. Therefore, the only scalable and viable way to perform anomaly detection over very different time series collected from diverse domains is a model selection method that selects, based on time series characteristics, the best anomaly detection methods to run. Existing AutoML solutions are, unfortunately, not directly applicable to time series anomaly detection, and no evaluation of time series-based approaches for model selection exists. Towards that direction, 19 studies the performance of time series classification methods used as model selection for anomaly detection. In total, we evaluate 234 model configurations derived from 16 base classifiers across more than 1980 time series, which constitutes the first extensive experimental evaluation of time series classification as model selection for anomaly detection. Our results demonstrate that model selection methods outperform every single anomaly detection method while remaining in the same order of magnitude regarding execution time. This evaluation is a first step toward demonstrating the accuracy and efficiency of time series classification algorithms for anomaly detection, and represents a strong baseline that can guide the model selection step in general AutoML pipelines.
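The model selection principle can be sketched as a classification problem. Each training series is labelled with the detector that performed best on it, and a classifier predicts that label for a new series. The Python sketch below uses three hand-crafted features and a 1-nearest-neighbour rule. The features, the detector names, and the classifier are hypothetical simplifications, far from the 234 configurations evaluated in 19.

```python
from statistics import mean, pstdev

# Illustrative sketch: model selection for anomaly detection as 1-NN classification.


def features(series):
    """Hypothetical summary features: level, spread, and mean absolute increment."""
    increments = [abs(b - a) for a, b in zip(series, series[1:])]
    return (mean(series), pstdev(series), mean(increments))


def train_selector(training):
    """training: list of (series, name of the best detector on that series)."""
    return [(features(series), best) for series, best in training]


def select_detector(model, series):
    """Return the detector label of the nearest training series in feature space."""
    f = features(series)

    def distance(g):
        return sum((x - y) ** 2 for x, y in zip(f, g))

    return min(model, key=lambda item: distance(item[0]))[1]


# Hypothetical training set: detector names stand for whatever won on each series.
model = train_selector([
    ([0, 0, 0, 0, 10, 0, 0], "detector_A"),  # isolated spike
    ([0, 1, 0, 1, 0, 1, 0], "detector_B"),   # regular oscillation
])
```

A new oscillating series is routed to detector_B and a new spiky one to detector_A. Only the selected detector then needs to run, which is why the cost stays in the same order of magnitude as running a single method.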

In contrast to other domains where anomaly detection (AD) mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD for time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional point-based information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding the anomaly score to mark each point as an anomaly or not. However, mapping discrete labels onto continuous data introduces unavoidable shortcomings, complicating the evaluation of range-based anomalies. Notably, the choice of evaluation measure may significantly bias the experimental outcome. Despite over six decades of attention, there has never been a large-scale systematic quantitative and qualitative analysis of time-series AD evaluation measures. 15 extensively evaluates quality measures for time-series AD to assess their robustness under noise, misalignments, and different anomaly cardinality ratios. Our results indicate that measures producing quality values independently of a threshold (i.e., AUC-ROC and AUC-PR) are more suitable for time-series AD. Motivated by this observation, we first extend the AUC-based measures to account for range-based anomalies. Then, we introduce a new family of parameter-free and threshold-independent measures, Volume Under the Surface (VUS), to evaluate methods while varying parameters. We also introduce two optimized implementations of VUS that significantly reduce the execution time of the initial implementation. Our findings demonstrate that our four measures are significantly more robust in assessing the quality of time-series AD methods.
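The threshold-free idea can be sketched in a few lines of Python. AUC-ROC is computed through its rank (Mann–Whitney) formulation. As a deliberately crude stand-in for the range-based idea behind VUS, labels are dilated by a buffer of b points on each side, and the AUC is averaged over buffer sizes 0 to max_buffer. The actual Range-AUC and VUS measures of 15 are defined differently; this sketch only illustrates why tolerating small misalignments changes the score.

```python
# Illustrative sketch: rank-based AUC-ROC and a simplified buffer-averaged variant.


def auc_roc(scores, labels):
    """Probability that an anomalous point outscores a normal one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # needs at least one point of each class


def dilate(labels, buffer):
    """Mark every point within `buffer` positions of a labelled anomaly as anomalous."""
    n = len(labels)
    return [int(any(labels[max(0, i - buffer):i + buffer + 1])) for i in range(n)]


def averaged_range_auc(scores, labels, max_buffer):
    """Average AUC over buffer sizes 0..max_buffer (a simplified VUS-like score)."""
    return sum(auc_roc(scores, dilate(labels, b))
               for b in range(max_buffer + 1)) / (max_buffer + 1)


labels = [0, 0, 0, 1, 1, 0, 0, 0]
scores = [0, 0, 0.1, 0.2, 0.3, 0.9, 0.2, 0]  # peak one step after the labelled range
```

Without a buffer, the detector whose peak lands just after the labelled range gets an AUC of about 0.79. A one-point buffer raises it to about 0.91, and averaging over buffer sizes avoids committing to a single, arbitrary tolerance parameter.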

Motif Discovery involves identifying recurring patterns and locating their occurrences within a time series without prior knowledge about their shape or location. In practice, Motif Discovery faces several data-related challenges, leading to various definitions of the problem and multiple algorithms addressing these challenges to different extents. However, there has been no systematic evaluation and comparison of these diverse approaches. Consequently, 18 presents a comprehensive literature review covering data-related challenges, motif definitions, and algorithms. We also analyze the strengths and limitations of algorithms carefully chosen to represent the diversity of the literature. The analysis is structured around key research questions identified from our review. Our experimental findings provide practical guidelines for selecting Motif Discovery algorithms suitable for a given task and suggest directions for future research.
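For concreteness, the simplest motif definition can be computed by brute force, as in the Python sketch below: the closest pair of non-overlapping subsequences of a given length under z-normalized Euclidean distance. This is only one of the many definitions surveyed in 18. Practical algorithms, for example those based on the matrix profile, are far more efficient than this O(n²m) loop.

```python
from math import sqrt
from statistics import mean, pstdev

# Illustrative sketch: brute-force top-1 motif pair.


def znorm(x):
    """Z-normalize a subsequence (constant subsequences map to all zeros)."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s if s > 0 else 0.0 for v in x]


def top_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping length-m windows."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (float("inf"), None, None)
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):  # j >= i + m rules out trivial matches
            d = sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best


series = [0, 0, 5, 1, 0, 0, 0, 0, 5, 1, 0, 0]  # the same bump occurs twice
distance, i, j = top_motif(series, 3)
```

Excluding windows that overlap is what prevents a subsequence from trivially matching its own shifted copy, one of the data-related challenges that motivate the different motif definitions.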

Time series clustering poses a significant challenge with diverse applications across domains. A prominent drawback of existing solutions lies in their limited interpretability, often confined to presenting users with centroids. To address this gap, 16 presents k-Graph, an unsupervised method explicitly designed to improve interpretability in time series clustering. Leveraging a graph representation of time series subsequences, k-Graph constructs multiple graph representations based on different subsequence lengths. This feature accommodates variable-length time series without requiring users to predetermine subsequence lengths. Our experimental results reveal that k-Graph outperforms current state-of-the-art time series clustering algorithms in accuracy, while providing users with meaningful explanations and interpretations of the clustering outcomes.

Time series clustering is important for identifying patterns in temporal datasets. However, prevailing methods often encounter obstacles in maintaining data relationships and ensuring interpretability. In 26, we present Graphint, an innovative system based on the k-Graph methodology that addresses these challenges. Graphint integrates a robust time series clustering algorithm with an interactive tool for comparison and interpretation. More precisely, our system allows users to compare results against competing approaches, identify discriminative subsequences within specified datasets, and visualize the critical information utilized by k-Graph to generate outputs. Overall, Graphint offers a comprehensive solution for extracting actionable insights from complex temporal datasets.

Time series segmentation is a fundamental task in analyzing temporal data across various domains, from human activity recognition to energy monitoring. While numerous state-of-the-art methods have been developed to tackle this problem, the evaluation of their performance remains critically limited. Existing measures predominantly focus on change point accuracy or rely on point-based measures such as the Adjusted Rand Index (ARI), which fail to capture the quality of the detected segments, ignore the nature of errors, and offer limited interpretability. In 30, we address these shortcomings by introducing two novel evaluation measures: WARI (Weighted Adjusted Rand Index), which accounts for the position of segmentation errors, and SMS (State Matching Score), a fine-grained measure that identifies and scores four fundamental types of segmentation errors while allowing error-specific weighting. We empirically validate WARI and SMS on synthetic and real-world benchmarks, showing that they not only provide a more accurate assessment of segmentation quality but also uncover insights, such as error provenance and type, that are inaccessible with traditional measures.
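To make the limitation of point-based measures concrete, the Python sketch below computes the standard Adjusted Rand Index over per-timestamp state labels, using the usual contingency-table formula. ARI assigns exactly the same score to a slightly shifted boundary and to a spurious segment in the middle of a state, since only the counts of co-labelled pairs matter. WARI and SMS themselves are not reproduced here.

```python
from collections import Counter
from math import comb

# Illustrative sketch: standard ARI (not WARI or SMS) on per-timestamp state labels.


def ari(truth, pred):
    """Adjusted Rand Index between two labelings of the same timestamps."""
    n = len(truth)
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / comb(n, 2)
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single state everywhere
        return 1.0
    return (same_both - expected) / (maximum - expected)


truth = [0] * 5 + [1] * 5
shifted = [0] * 4 + [1] * 6                 # boundary detected one step early
spurious = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]   # short spurious segment inside state 0
```

Both predictions yield the same contingency table and hence the same ARI (about 0.6), although the first makes a benign boundary error and the second invents an extra segment. Distinguishing such cases is precisely what position- and error-type-aware measures are for.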

In recent years, electricity suppliers have installed millions of smart meters worldwide to improve the management of the smart grid system. These meters collect a large amount of electrical consumption data to produce valuable information that helps consumers reduce their electricity footprint. However, helping non-expert users (e.g., consumers or sales advisors) understand these data and derive usage patterns for different appliances has become a significant challenge for electricity suppliers, because these data record the aggregated behavior of all appliances. At the same time, ground-truth labels (which could be used to train appliance detection and localization models) are expensive to collect and extremely scarce in practice. 37 introduces DeviceScope, an interactive tool designed to help users understand smart meter data by detecting and localizing individual appliance patterns within a given time period. Our system is based on CamAL (Class Activation Map-based Appliance Localization), a novel weakly supervised approach for appliance localization that only requires knowledge of the existence of an appliance in a household to be trained.

Improving smart grid system management is crucial in the fight against climate change, and enabling consumers to play an active role in this effort is a significant challenge for electricity suppliers. In this regard, millions of smart meters have been deployed worldwide in the last decade, recording the aggregate electricity consumption of individual households. These data provide valuable information that can help consumers reduce their electricity footprint; nevertheless, the collected signal aggregates the consumption of the different appliances running simultaneously in the house, making it difficult to interpret. Non-Intrusive Load Monitoring (NILM) refers to the challenge of estimating the power consumption, pattern, or on/off state activation of individual appliances using the aggregate smart meter signal. Recent methods proposed to tackle this task are based on a fully supervised deep-learning approach that requires both the aggregate signal and the ground truth of individual appliance power. However, such labels are expensive to collect and extremely scarce in practice, as they require conducting intrusive surveys in households to monitor each appliance. In 38, we introduce CamAL, a weakly supervised approach for appliance pattern localization that only requires information on the presence of an appliance in a household to be trained. CamAL combines an ensemble of deep-learning classifiers with an explainable classification method to localize appliance patterns. Our experimental evaluation, conducted on 4 real-world datasets, demonstrates that CamAL significantly outperforms existing weakly supervised baselines and that current state-of-the-art fully supervised NILM approaches require significantly more labels to reach CamAL's performance.
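The class activation map behind CamAL can be illustrated with its textbook formula. For a convolutional classifier ending with global average pooling, the map for a class at timestamp t is the sum over channels k of the class weight w_k times the activation A_k(t). The Python sketch below assumes the feature maps and class weights have already been extracted from trained classifiers (hypothetical inputs). It averages maps across an ensemble and flags timestamps whose min-max normalized activation exceeds a threshold. CamAL's actual aggregation and post-processing, described in 38, may differ.

```python
# Illustrative sketch: CAM computation and thresholded localization (hypothetical inputs).


def class_activation_map(feature_maps, class_weights):
    """CAM(t) = sum_k w_k * A_k(t) for K feature maps A_k of length T."""
    T = len(feature_maps[0])
    return [sum(w * fmap[t] for w, fmap in zip(class_weights, feature_maps))
            for t in range(T)]


def ensemble_cam(members):
    """Average the CAMs of several (feature_maps, class_weights) ensemble members."""
    cams = [class_activation_map(f, w) for f, w in members]
    return [sum(values) / len(cams) for values in zip(*cams)]


def localize(cam, threshold=0.5):
    """Min-max normalize the CAM and mark timestamps above threshold as 'appliance on'."""
    lo, hi = min(cam), max(cam)
    span = (hi - lo) or 1.0
    return [int((v - lo) / span >= threshold) for v in cam]


# Hypothetical activations of two channels over five timestamps.
maps = [[0, 0, 1, 1, 0], [0, 1, 1, 0, 0]]
```

This is why weak supervision suffices: the classifier only learns whether an appliance is present in a window, and the class activation map then indicates where in the window that decision came from.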

Information Extraction

35, part of the TheoremKB 41 project, presents TheoremView, a novel framework for extracting proofs and theorems from raw PDF scientific papers without requiring LaTeX source files. Our approach combines three modalities (font, text, and vision) with sequential modeling to capture long-term dependencies and layout information. By eliminating OCR preprocessing, TheoremView reduces computational overhead for real-time applications while providing robust automated theorem extraction.

Graphs, and notably RDF graphs, are a prominent way of sharing data. As data usage democratizes, users need help figuring out the useful content of a graph dataset. In particular, journalists with whom we collaborate are interested in identifying, in a graph, the connections between entities, e.g., people, organizations, emails, etc. In 14, we present a novel method for exploring data graphs through their data paths connecting Named Entities (NEs, in short); each data path leads to a tabular-looking set of results. NEs are extracted from the data through dedicated Information Extraction modules. Our method builds upon the pre-existing ConnectionLens platform and follow-up work in the Abstra project, which builds simple, visual ER-style summaries of semi-structured data. The contribution of the present work is twofold. First, we propose a novel analysis of entity-to-entity paths contained in datasets of any nature, together with a new method for ranking paths that leverages an Information Extraction module we built on top of ChatGPT. Second, we present an efficient approach to enumerate and compute NE paths, based on an algorithm that automatically recommends sub-paths to materialize and rewrites the path queries using these sub-paths. Our experiments demonstrate the interest of NE paths and the efficiency of our method for computing and ranking them.
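Enumerating NE paths can be sketched as a bounded depth-first search between entity nodes, as in the Python code below. Several choices are assumptions made for illustration: the toy graph and its edge labels, the treatment of edges as undirected, and stopping a path as soon as it reaches another entity. The ranking of paths and the sub-path materialization of 14 are not shown.

```python
from collections import defaultdict

# Illustrative sketch: bounded enumeration of entity-to-entity paths in a labelled graph.


def ne_paths(edges, entities, max_len):
    """Simple paths of at most max_len edges between pairs of named-entity nodes.

    Paths alternate nodes and edge labels; each unordered pair is reported once."""
    adjacency = defaultdict(list)
    for source, label, target in edges:
        adjacency[source].append((label, target))
        adjacency[target].append((label, source))  # edges traversed both ways
    found = []

    def walk(start, node, path, visited):
        for label, nxt in adjacency[node]:
            if nxt in visited:
                continue
            step = path + [label, nxt]
            if nxt in entities:
                if start < nxt:  # report each pair of entities once
                    found.append(tuple(step))
            elif len(step) // 2 < max_len:  # len(step) == 2 * edges + 1
                walk(start, nxt, step, visited | {nxt})

    for entity in sorted(entities):
        walk(entity, entity, [entity], {entity})
    return found


edges = [("Alice", "sent", "email1"), ("email1", "to", "Bob"),
         ("Alice", "worksFor", "ACME"), ("Bob", "worksFor", "ACME")]
paths = ne_paths(edges, {"Alice", "Bob", "ACME"}, max_len=2)
```

The three paths found (Alice sent email1 to Bob, and ACME's two worksFor edges) correspond to the kind of entity-to-entity connections journalists look for. On large graphs the number of such paths grows quickly, which is what motivates materializing frequently used sub-paths.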

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

Participants: Paul Boniol, Magali Parrino, Pierre Senellart.

Magali Parrino started her PhD in 2025, under a CIFRE agreement between Valda (Paul Boniol and Pierre Senellart) and EDF (Chatou lab).

9 Partnerships and cooperations

9.1 International initiatives

9.1.1 Participation in other International Programs

DesCartes
Participants: Pierre Senellart.
- Title:
  Intelligent Modelling for Decision-making in Critical Urban Systems
- Partner Institution(s):
  CNRS@CREATE, National University of Singapore
- Duration:
  2021–2026
- Additional info:
  DesCartes is a project managed by CNRS@CREATE, a CNRS subsidiary in Singapore, and funded by Singapore’s National Research Foundation, with a total budget of 50 million. Pierre Senellart is involved in the project as one of the French PIs, and in 2025 became Lead PI for one of the work packages.
International ANR project EQUUS
Participants: Luc Segoufin.
- Title:
  Efficient query answering under updates
- Partner Institution(s):
  TU Ilmenau, Uni. Bayreuth, HU Berlin, CNRS
- Duration:
  2020–2025

9.2 International research visitors

9.2.1 Visits of international scientists

Other international visits to the team

Anton Gnatenko

Status:
PhD student
Institution of origin:
Free University of Bozen–Bolzano
Country:
Italy
Dates:
December 2024 to May 2025
Mobility program:
PhD research visit

Amélie Marian

Status:
Professor
Institution of origin:
Rutgers University
Country:
USA
Dates:
March 2026 to April 2026
Mobility program:
ENS Visiting Professor

Victor Vianu

Status:
Professor
Institution of origin:
UC San Diego
Country:
USA
Dates:
June 2025 to January 2026
Mobility program:
Sabbatical

9.2.2 Visits to international teams

Research stays abroad

Pierre Senellart was an invited participant in the Logic and Algorithms in DB Theory and AI Reunion seminar at UC Berkeley, CA, USA (January 2025)
Camille Bourgaux was an invited participant in the Semirings in Databases, Automata, and Logic seminar in Dagstuhl, Germany (February 2025)

9.3 National initiatives

9.3.1 ANR

PRC EXPAND (coordinator)
Participants: Michael Thomazo, Camille Bourgaux.
- Title:
  Expanding the reach of ontology-based data access: EXpressivity, exPlanation, and Algorithms
- Partner Institution(s):
  LIRMM, LaBRI, LIMOS, Inria Lille (SPIRALS & D-DAL), IRISA
- Duration:
  2025–2030
- Funding for Valda:
  55 k€ (Inria budget)
PR[AI]RIE-PSAI AI Cluster
Participants: Pierre Senellart, Camille Bourgaux.
- Title:
  Paris Artificial Intelligence Research Institute – Paris School of AI
- Duration:
  2025–2029
- Funding for Valda:
  575 k€ (ENS budget)
Meghyn Bienvenu's INTENDED Chair in Artificial Intelligence
Participants: Camille Bourgaux.
- Title:
  Intelligent handling of imperfect data
- Partner Institution(s):
  LaBRI
- Duration:
  2020–2026

9.3.2 Others

France 2030 i-Demo Cyberté project
Participants: Paul Boniol.
- Partner institution(s):
  Scality, Inria Rennes (CIDRE)
- Duration:
  2025–2030
- Funding for Valda:
  499 k€ (Inria budget)
CNRS MITI nanoNet project
Participants: Paul Boniol.
- Title:
  Méthodologie avancée pour la détection des nanoparticules dans les séries temporelles spICP-ToF-MS (Advanced methodology for the detection of nanoparticles in spICP-ToF-MS time series)
- Partner Institution(s):
  IPGP
- Duration:
  2025–2026

10 Dissemination

10.1 Promoting scientific activities

10.1.1 Scientific events: organisation

General chair, scientific chair

Paul Boniol, IEEE BigData 2025, Chair of the Industrial & Government track
Paul Boniol, International Workshop on Multivariate Time Series Analytics (MulTiSA) 2025, Panel Chair

Member of the organizing committees

Camille Bourgaux, member of the DL steering committee
Camille Bourgaux, co-leader of the MaDICS/RADIA RECAST working group (organization of a thematic day in November, and of two sessions of the GDR MaDICS symposium in May)
Luc Segoufin, member of the STACS steering committee
Pierre Senellart, editorial board of the LIPIcs series of conference proceedings

10.1.2 Scientific events: selection

Member of the conference program committees

Paul Boniol, VLDB 2025, EDBT 2025, ICDE 2025 (Industry & applications track), MulTiSA Workshop of ICDE 2025, BDA 2025, BERT2S Workshop of NeurIPS 2025
Camille Bourgaux, IJCAI 2025, KR 2025, DL 2025
Antoine Gauquier, WASP 2025
Pierre Senellart, SIGMOD 2026, Provenance Week 2025, SDProc 2025
Michael Thomazo, KR 2025, RuleML+RR 2025, IJCAI 2025

10.1.3 Journal

Member of the editorial boards

Luc Segoufin, associate editor, ACM Transactions on Computational Logic
Victor Vianu, editor, Database Theory Column, SIGACT News

Reviewer - reviewing activities

Pierre Senellart, reviews for Transactions on Graph Data and Knowledge

10.1.4 Invited talks

Paul Boniol, Anomaly Detection in Time Series: Overview and New Trends, Invited speaker at Orange Innovation
Paul Boniol, An introduction to Time series anomaly detection (a data-driven perspective), Invited speaker for SIDOS at EGC 2025, Strasbourg, France (January 2025)
Pierre Senellart, Quels horizons de pratique pour la recherche en IA? (What Horizons of Practice for AI Research?), Invited speaker at Printemps Couperin, Paris, France (March 2025)
Silviu Maniu (Univ. Grenoble–Alpes) & Pierre Senellart, Making Provenance and Probabilistic Database Theory Work in Practice, Invited talk at ICDT 2025 (Database Theory in Practice), Barcelona, Spain (March 2025) 21
Pierre Senellart, Qualitative Evaluation of Academic Careers in Computer Science at CNRS, Invited keynote speaker at the Global Forum on Development of Computer Science, Tsinghua University, Beijing, China (April 2025) 44
Pierre Senellart, Artificial Intelligence: A Personal View, Invited speaker at INSP Days, Paris, France (July 2025)
Pierre Senellart, Les BD pourront-elles sauver l'IA? (Will Databases Be Able to Save AI?), Panel participant, BDA 2025, Toulouse, France (October 2025)
Pierre Senellart, Intelligence artificielle: Concepts, modèles et enjeux (Artificial Intelligence: Concepts, Models, and Challenges), Invited speaker at the Séminaire scientifique et technique de l'Inrap, Chartres, France (November 2025)
Paul Boniol, Time Series Anomaly Detection: The Road to Automatic Solutions, Invited speaker at the 3rd Macau Symposium on Data Science, Macau SAR, China (December 2025)

10.1.5 Leadership within the scientific community

Serge Abiteboul is a member of the French Academy of Sciences, of the Academia Europaea, and an ACM Fellow.
Pierre Senellart was a junior member of the Institut Universitaire de France until August 2025.

10.1.6 Research administration

Serge Abiteboul is a member of the scientific committee of the Programme Inria Quadrant (PIQ).
Antoine Gauquier is an elected member of the Conseil d'Administration of ENS-PSL.
Antoine Gauquier is an elected member of the DIENS lab council.
Luc Segoufin is a member of the Formation Spécialisée de Site (FSS) of the Inria Paris research centre.
Pierre Senellart is Vice-President of PSL University in charge of Digital infrastructure and IT convergence. 48
Pierre Senellart was until August 2025 the president of section 6 of the National Committee for Scientific Research. 43 As a representative of CoNRS, Pierre Senellart was a member of the Hcéres evaluation committee of the IRIT research unit, and president of the evaluation committee of the LIRMM research unit.
Pierre Senellart was until August 2025 a member of the board of the conference of presidents of the national committee (CPCN) and as such a member of the coordination of managing parties of the national committee (C3N).
Pierre Senellart is deputy director of the DI ENS laboratory, joint between ENS, CNRS, and Inria.
Pierre Senellart is the scientific resource person for Scientific information & edition of the Inria Paris centre.
Pierre Senellart is the vice-president of the Gilles Kahn PhD award of Société Informatique de France.
Pierre Senellart is a member of the strategic orientation committee of ISIMA.
Michael Thomazo is a deputy director of the École Doctorale Sciences Mathématiques de Paris-Centre (ED386).
We participated in the following hiring committees within universities:
- Camille Bourgaux, Maître de conférences, ENSEIRB-MATMECA-Bordeaux INP

10.2 Teaching - Supervision - Juries - Educational and pedagogical outreach

Licence: The Art of Computer Programming, L1, International Bachelor of Science in Artificial Intelligence, PSL – Pierre Senellart
Licence: Algorithms, L1, CPES, PSL – Antoine Gauquier
Licence: Differential calculus, L2, CPES, PSL – Antoine Gauquier
Licence: Formal Languages, Computability, Complexity, L3, ENS – Michael Thomazo, Lucas Larroque
Licence: Databases, L3, ENS – Pierre Senellart, Paul Boniol, Lucas Larroque
Licence: Practical Computing, L3, École normale supérieure – Pierre Senellart
Master: Description Logics, M1, DCI – Camille Bourgaux
Master: Data acquisition, extraction, and storage, M2, IASD – Pierre Senellart
Master: Knowledge graphs, description logics, and reasoning on data, M2, IASD – Michael Thomazo
Master: NoSQL databases, M2, IASD – Paul Boniol
Professional training: Web Security, PESTO (Corps des Mines professional training) – Pierre Senellart

As a professor at ENS, Pierre Senellart held various teaching responsibilities there (M2 administration, entrance competition). Pierre Senellart is the academic director of the graduate program in Computer Science of PSL.

As an adjunct professor at PSL, Michaël Thomazo is in charge of PhD committees within DI ENS and is deputy director of the École doctorale.

We also gave invited courses in summer schools:

Camille Bourgaux, Inconsistency-Tolerant Semantics Based on (Preferred) Repairs, 21st Reasoning Web Summer School (RW 2025) – Istanbul, Turkey 20
Paul Boniol, Time Series Anomaly Detection, Summer school on Artificial Intelligence for Aerospace – GSSI, L’Aquila, Italy
Paul Boniol, Time Series Anomaly Detection: Foundations and Practice, TwinODIS 1st Summer School – FORTH-ICS, Heraklion, Greece

Most permanent members of the group are also involved in tutoring ENS students, advising them on their curriculum, their internships, etc. They are also occasionally involved with reviewing internship reports, supervising student projects, etc.

10.2.1 Supervision

PhD defended: Anatole Dahan, The Role of Permutation Groups in the Search for a Logic for Polynomial Time, 2020–2025, Arnaud Durand (Université Paris-Cité) & Luc Segoufin 42
PhD in progress: Antoine Gauquier, Intelligent construction of a multimodal and heterogeneous data warehouse, with data traceability, started in September 2023, Pierre Senellart & Ioana Manolescu (Inria Cedar)
PhD in progress: Lucas Larroque, Extension of rewriting procedures for reasoning using existential rules, started in September 2023, Michaël Thomazo
PhD in progress: Robin Jean, Integration of preferences and domain knowledge in inconsistency-tolerant ontology-based data access, started in October 2023, Meghyn Bienvenu (CNRS LaBRI) & Camille Bourgaux
PhD in progress: Aryak Sen, Scalability of a data provenance and probability management system, started in February 2024, Silviu Maniu (Université Grenoble Alpes) & Pierre Senellart
PhD in progress: Emmanouil Sylligardos, Accuracy and execution time trade-off in ensembling and model selection for time series analytics, started in February 2024, Paul Boniol & Pierre Senellart
PhD in progress: Felix Chavelli, Graph representations for multivariate time series analytics, started in October 2024, Paul Boniol & Michaël Thomazo
PhD in progress: Pratik Karmakar, Quality, uncertainty, and lineage of data, Stéphane Bressan (NUS, deceased), Tan Kian-Lee (NUS), & Pierre Senellart (as he is based in Singapore, he is not considered a Valda member)
PhD in progress: Marijan Soric, Exploitation et structuration des données et des connaissances géologiques hétérogènes (Exploitation and structuring of heterogeneous geological data and knowledge), started in March 2025, Pierre Senellart, Ioana Manolescu (Inria Cedar), & Cécile Gracianne (BRGM)
PhD in progress: Magali Parrino, Détection non-supervisée d’anomalies dans des flux continus de séries temporelles multivariées (Unsupervised anomaly detection in continuous streams of multivariate time series), started in July 2025, Paul Boniol, Emmanuel Remy (EDF), & Pierre Senellart
PhD in progress: Arthur Lombardo, started in October 2025, Pierre Senellart, Antoine Amarilli (Inria D-DAL) & Mikaël Monet (Inria D-DAL) (as he is based in Lille, he is considered a D-DAL member)
Master's internship: Arushi Goyal; Pierre Senellart
Master's internship: Marijan Soric; Pierre Senellart and Ioana Manolescu (Inria Cedar) 45
M1 research project: Jeanne Coschieri; Michael Thomazo & David Carral (Inria Boreal)
M1 research project: Paul Raphaël; Michael Thomazo & Lucas Larroque

10.2.2 Juries

PhD: François Amat [reviewer], Institut polytechnique de Paris, Pierre Senellart

10.3 Popularization

10.3.1 Specific official responsibilities in science outreach structures

Serge Abiteboul, President of the scientific steering committee of ANR
Serge Abiteboul, President of the AFNIC Foundation
Pierre Senellart is a scientific expert advising the Scientific and Ethical Committee of Parcoursup and MonMaster, the platforms used to select higher-education students at the first-year and Master's levels. In this capacity, he contributed to the committee's 7th annual report to the French parliament.

10.3.2 Productions (articles, videos, podcasts, serious games, ...)

Serge Abiteboul, editor of the binaire blog, which moved from the blog platform of Le Monde to that of La Recherche
Serge Abiteboul, co-director of the Parlez-moi d'IA podcast on Cause commune
Serge Abiteboul, co-author of articles on theatre and computer science [46, 47]

10.3.3 Participation in Live events

Serge Abiteboul, co-organizer with French Senator Ghislaine Senée of a colloquium at the Senate: Les données au service des territoires intelligents (Data in the service of smart territories)
Serge Abiteboul, co-organizer with Isabelle Hilali from Datacraft of the conference: Quantum & Intelligence artificielle : vers une convergence des ruptures technologiques ? (Quantum & Artificial Intelligence: towards a convergence of technological disruptions?)

11 Scientific production

11.1 Major publications

[1] Michael Benedikt, Pierre Bourhis, Georg Gottlob and Pierre Senellart. Monadic Datalog, Tree Validity, and Limited Access Containment. ACM Transactions on Computational Logic 21(1), 2020, 6:1–6:45.
[2] Meghyn Bienvenu, Quentin Manière and Michaël Thomazo. Answering Counting Queries over DL-Lite Ontologies. IJCAI 2020 - Twenty-Ninth International Joint Conference on Artificial Intelligence, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan, July 2020 (postponed from July 2020 to January 2021 due to COVID).
[3] Paul Boniol, Emmanouil Sylligardos, John Paparrizos, Panos Trahanias and Themis Palpanas. ADecimo: Model Selection for Time Series Anomaly Detection. ICDE 2024 - IEEE 40th International Conference on Data Engineering, Utrecht, Netherlands, May 2024.
[4] Camille Bourgaux, Pierre Bourhis, Liat Peterfreund and Michaël Thomazo. Revisiting Semiring Provenance for Datalog. KR 2022 - 19th International Conference on Principles of Knowledge Representation and Reasoning, Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, Haifa, Israel, July 2022, 91–101.
[5] Camille Bourgaux, David Carral, Markus Krötzsch, Sebastian Rudolph and Michaël Thomazo. Capturing Homomorphism-Closed Decidable Queries with Existential Rules. KR 2021 - 18th International Conference on Principles of Knowledge Representation and Reasoning, Virtual, Vietnam, November 2021, 141–150.
[6] Maxime Buron, Marie-Laure Mugnier and Michaël Thomazo. Parallelisable Existential Rules: a Story of Pieces. KR 2021 - 18th International Conference on Principles of Knowledge Representation and Reasoning, Virtual, Vietnam, November 2021.
[7] Nofar Carmeli and Luc Segoufin. Conjunctive Queries With Self-Joins, Towards a Fine-Grained Complexity Analysis. PODS'23, Seattle, United States, June 2023.
[8] Marco Console, Paolo Guagliardo, Leonid Libkin and Etienne Toussaint. Coping with Incomplete Data: Recent Advances. SIGMOD/PODS 2020 - International Conference on Management of Data, Portland / Virtual, United States, ACM, June 2020, 33–47.
[9] Nathan Grosshans, Pierre McKenzie and Luc Segoufin. Tameness and the power of programs over monoids in DA. Logical Methods in Computer Science 18(3), August 2022, 14:1–14:34.
[10] Pratik Karmakar, Mikaël Monet, Pierre Senellart and Stéphane Bressan. Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases. Proceedings of the ACM on Management of Data 2(2) (PODS), January 2024.
[11] Nicole Schweikardt, Luc Segoufin and Alexandre Vigny. Enumeration for FO Queries over Nowhere Dense Graphs. Journal of the ACM (JACM) 69(3), June 2022, 1–37.
[12] Pierre Senellart, Louis Jachiet, Silviu Maniu and Yann Ramusat. ProvSQL: Provenance and Probability Management in PostgreSQL. Proceedings of the VLDB Endowment (PVLDB) 11(12), August 2018, 2034–2037.
[13] Etienne Toussaint, Paolo Guagliardo, Leonid Libkin and Juan Sequeda. Troubles with nulls, views from the users. Proceedings of the VLDB Endowment (PVLDB) 15(11), July 2022, 2613–2625.

11.2 Publications of the year

International journals

[14] Nelly Barret, Antoine Gauquier, Jia-Jean Law and Ioana Manolescu. Finding meaningful paths in heterogeneous graphs with PathWays. Information Systems 127, January 2025, 102463.
[15] Paul Boniol, Ashwin K. Krishna, Marine Bruel, Qinghua Liu, Mingyi Huang, Themis Palpanas, Ruey S. Tsay, Aaron Elmore, Michael J. Franklin and John Paparrizos. VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection. The VLDB Journal 34(32), February 2025.
[16] Paul Boniol, Donato Tiano, Angela Bonifati and Themis Palpanas. k-Graph: A Graph Embedding for Interpretable Time Series Clustering. IEEE Transactions on Knowledge and Data Engineering, 2025, 1–14.
[17] Diego Figueira, Anantha Padmanabha, Luc Segoufin and Cristina Sirangelo. A Simple Algorithm for Consistent Query Answering under Primary Keys. Logical Methods in Computer Science 21(1), February 2025.
[18] Valerio Guerrini, Thibaut Germain, Charles Truong, Laurent Oudre and Paul Boniol. Time Series Motif Discovery: A Comprehensive Evaluation. Proceedings of the VLDB Endowment (PVLDB) 18(7), August 2025, 2226–2239.
[19] Emmanouil Sylligardos, John Paparrizos, Themis Palpanas, Pierre Senellart and Paul Boniol. MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection. The VLDB Journal 34(6), October 2025, 72.

Invited conferences

[20] Camille Bourgaux. Inconsistency-Tolerant Semantics Based on (Preferred) Repairs. RW 2025 - 21st Reasoning Web International Summer School, Reasoning Web Summer School (RW 2025), Istanbul, Turkey, September 2025.
[21] Silviu Maniu and Pierre Senellart. Database Theory in Action: Making Provenance and Probabilistic Database Theory Work in Practice (Invited Talk). ICDT 2025 - International Conference on Database Theory, Barcelona, Spain, March 2025.

International peer-reviewed conferences

[22] Meghyn Bienvenu, Camille Bourgaux, Katsumi Inoue and Robin Jean. A Rule-Based Approach to Specifying Preferences over Conflicting Facts and Querying Inconsistent Knowledge Bases (Extended Abstract). Proceedings of the 38th International Workshop on Description Logics (DL 2025), DL 2025 - 38th International Workshop on Description Logics, Opole, Poland, September 2025.
[23] Meghyn Bienvenu, Camille Bourgaux, Katsumi Inoue and Robin Jean. A Rule-Based Approach to Specifying Preferences over Conflicting Facts and Querying Inconsistent Knowledge Bases. Proceedings of the 22nd International Conference on Principles of Knowledge Representation and Reasoning (KR 2025), KR 2025 - 22nd International Conference on Principles of Knowledge Representation and Reasoning, Melbourne, Australia, November 2025.
[24] Meghyn Bienvenu, Camille Bourgaux and Atefe Khodadaditaghanaki. Inconsistency Handling in DatalogMTL (Extended Abstract). Proceedings of the 38th International Workshop on Description Logics (DL 2025), DL 2025 - 38th International Workshop on Description Logics, Opole, Poland, September 2025.
[25] Meghyn Bienvenu, Camille Bourgaux and Atefe Khodadaditaghanaki. Inconsistency Handling in DatalogMTL. IJCAI 2025 - Thirty-Fourth International Joint Conference on Artificial Intelligence, Montreal, Canada, International Joint Conferences on Artificial Intelligence Organization, August 2025.
[26] Paul Boniol, Donato Tiano, Angela Bonifati and Themis Palpanas. Graphint: Graph-based Time Series Clustering Visualisation Tool. ICDE 2025 - IEEE 41st International Conference on Data Engineering, Hong Kong, Hong Kong SAR China, March 2025.
[27] Camille Bourgaux, Anton Gnatenko and Michaël Thomazo. Analysing Temporal Reasoning in Description Logics Using Formal Grammars (Extended Abstract). Proceedings of the 38th International Workshop on Description Logics (DL 2025), DL 2025 - 38th International Workshop on Description Logics, Opole, Poland, September 2025.
[28] Camille Bourgaux, Anton Gnatenko and Michaël Thomazo. Analysing Temporal Reasoning in Description Logics Using Formal Grammars. Proceedings of the 28th European Conference on Artificial Intelligence, ECAI'25 - 28th European Conference on Artificial Intelligence, Bologna, Italy, October 2025.
[29] David Carral, Lukas Gerlach, Lucas Larroque and Michaël Thomazo. Restricted Chase Termination: You Want More than Fairness. Proceedings of the ACM on Management of Data 3(2), PODS 2025 - ACM SIGMOD/PODS International Conference on Management of Data, ACM Digital Library, Berlin, Germany, June 2025, 1–17.
[30] Félix Chavelli, Paul Boniol and Michaël Thomazo. Toward Interpretable Evaluation Measures for Time Series Segmentation. NeurIPS 2025 - 39th Annual Conference on Neural Information Processing Systems, San Diego, United States, arXiv, 2025.
[31] Anatole Dahan. Group Order Logic. LICS 2025 - Logic in Computer Science, Singapore, Singapore, arXiv, May 2025.
[32] Pratik Karmakar, Antoine Gauquier and Pierre Senellart. Expected Shapley Value is Shapley Value for Expected Utility Game. ECSQARU 2025 - 18th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Hagen, Germany, September 2025.
[33] Pratik Karmakar, Angelo Saadeh, Pierre Senellart and Stéphane Bressan. Discovering Voting Power for Ensemble Methods. DEXA 2025 - International Conference on Database and Expert Systems Applications, Bangkok, Thailand, August 2025.
[34] Lucas Larroque, Piotr Ostropolski-Nalewaja and Michaël Thomazo. No Cliques Allowed: The Next Step Towards BDD/FC Conjecture. Proceedings of the ACM on Management of Data 3, PODS 2025 - ACM SIGMOD/PODS International Conference on Management of Data, Berlin, Germany, June 2025, 1–20.
[35] Shrey Mishra, Neil Sharma, Antoine Gauquier and Pierre Senellart. TheoremView: A Framework for Extracting Theorem-Like Environments from Raw PDFs. ECIR 2025 - European Conference on Information Retrieval, Lucca, Italy, Springer, April 2025, 6.
[36] John Paparrizos, Paul Boniol, Qinghua Liu and Themis Palpanas. Advances in Time-Series Anomaly Detection: Algorithms, Benchmarks, and Evaluation Measures. KDD '25: The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto (ON), Canada, ACM, August 2025, 6151–6161.
[37] Adrien Petralia, Paul Boniol, Philippe Charpentier and Themis Palpanas. DeviceScope: An Interactive App to Detect and Localize Appliance Patterns in Electricity Consumption Time Series. ICDE 2025 - IEEE 41st International Conference on Data Engineering, Hong Kong, Hong Kong SAR China, May 2025.
[38] Adrien Petralia, Paul Boniol, Philippe Charpentier and Themis Palpanas. Few Labels are all you need: A Weakly Supervised Framework for Appliance Localization in Smart-Meter Series. ICDE 2025 - IEEE 41st International Conference on Data Engineering, Hong Kong, Hong Kong SAR China, May 2025.
[39] Albert Ariel Widiaatmaja, Belkis Djeffal, Ashish Dandekar and Pierre Senellart. Demonstration of ProvSQL Update Provenance through Temporal Databases. PW25 - ProvenanceWeek, Berlin, Germany, June 2025.
[40] Fajrian Yunus, Pratik Karmakar, Pierre Senellart, Talel Abdessalem and Stéphane Bressan. Using a Probabilistic Database in an Image Retrieval Application. EDBT 2025 - 28th International Conference on Extending Database Technology, Barcelona, Spain, March 2025.

National peer-reviewed Conferences

[41] Shrey Mishra, Antoine Gauquier and Pierre Senellart. Apprentissage multimodal modulaire pour l'extraction de théorèmes et de preuves dans des documents scientifiques longs [Modular multimodal learning for extracting theorems and proofs from long scientific documents]. Revue des Nouvelles Technologies de l'Information, Extraction et Gestion des Connaissances, EGC'2025, Strasbourg, France, January 2025.

Doctoral dissertations and habilitation theses

[42] Anatole Dahan. The Role of Permutation Groups in the Search for a Logic for Polynomial Time. Thesis, Université Paris Cité (UPC), July 2025.

Reports & preprints

[43] Nathalie Appel, Jérémie Bourdon, Nicolas Bousquet, Johanne Cohen, Antoine Genitrini, Patricia Georgeon, Yves Grandvalet, Katia Jaffrès-Runser, Arnaud Legrand, Damian Markham, Anca Muscholl, Anastasia Paparrizou, Loïc Paulevé, Michael Poss, Maria Gradinariu Potop-Butucaru, Jean-Florent Raymond, Romain Rouvoy, Yolande Sallent, Pierre Senellart, Thomas Seiller, Ye-Qiong Song, Alain Tchana and Hélène Waeselynck. Section 06 Sciences de l'information : fondements de l'informatique, calculs, algorithmes, représentations, exploitations: Rapport de conjoncture 2024 [Section 06 Information sciences: foundations of computer science, computation, algorithms, representations, applications: 2024 state-of-the-field report]. CNRS, 2025, 1–20.

Other scientific publications

[44] Pierre Senellart. Qualitative Evaluation of Academic Researchers in Computer Science: Practices and Reflections from CNRS. Computer Education 12(372), 2025, 36–41.
[45] Marijan Soric. Understanding and Extracting Table Information from BRGM Documents. Thesis, École centrale de Lyon, February 2025.

Scientific popularization

[46] Rémi Ronfard, Charlotte Truchet and Serge Abiteboul. Informatique théâtrale [Theatrical computing]. Binaire, January 2025, 1–12.
[47] Rémi Ronfard, Charlotte Truchet and Serge Abiteboul. Régie, captation, mise en scène… Quand l'informatique s'invite au théâtre [Stage management, recording, directing… When computer science enters the theatre]. The Conversation France, January 2025.
[48] Pierre Senellart and Nathalie Vieira. OnePSL30 : transformation. Collection numérique de l'AMUE, Agence de mutualisation des universités et établissements d'enseignement supérieur 39, December 2025.

11.3 Cited publications

[49] Serge Abiteboul, Peter Buneman and Dan Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 1999.
[50] Serge Abiteboul, Richard Hull and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/
[51] Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset and Pierre Senellart. Web Data Management. Cambridge University Press, 2011. URL: http://webdam.inria.fr/Jorge
[52] Michael Benedikt and Pierre Senellart. Databases. In Computer Science, The Hardware, Software and Heart of It, Springer, 2011, 169–229. URL: https://doi.org/10.1007/978-1-4614-1168-0_10
[53] Amol Deshpande, Zachary G. Ives and Vijayshankar Raman. Adaptive Query Processing. Foundations and Trends in Databases 1(1), 2007, 1–140. URL: https://doi.org/10.1561/1900000001
[54] Alon Y. Halevy. Answering queries using views: A survey. VLDB J. 10(4), 2001, 270–294. URL: https://doi.org/10.1007/s007780100054
[55] Donald Kossmann. The State of the art in distributed query processing. ACM Comput. Surv. 32(4), 2000, 422–469. URL: http://doi.acm.org/10.1145/371578.371598
[56] M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems, Third Edition. Springer, 2011. URL: https://doi.org/10.1007/978-1-4419-8834-8
[57] Burr Settles. Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2012. URL: https://doi.org/10.2200/S00429ED1V01Y201207AIM018
[58] Richard S. Sutton and Andrew G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine learning, MIT Press, 1998. URL: http://www.worldcat.org/oclc/37293240
[59] Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins and Joemon M. Jose. On the reliability and intuitiveness of aggregated search metrics. 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, 689–698. URL: http://doi.acm.org/10.1145/2505515.2505691
