CEDAR - 2025

2025 Activity Report: Project-Team CEDAR

RNSR: 201622056J

Creation of the​‌ Project-Team: 2018 April 01​​

Each year, Inria research​​​‌ teams publish an Activity​ Report presenting their work​‌ and results over the​​ reporting period. These reports​​​‌ follow a common structure,​ with some optional sections​‌ depending on the specific​​ team. They typically begin​​​‌ by outlining the overall​ objectives and research programme,​‌ including the main research​​ themes, goals, and methodological​​​‌ approaches. They also describe​ the application domains targeted​‌ by the team, highlighting​​ the scientific or societal​​​‌ contexts in which their​ work is situated.

The​‌ reports then present the​​ highlights of the year,​​​‌ covering major scientific achievements,​ software developments, or teaching​‌ contributions. When relevant, they​​ include sections on software,​​​‌ platforms, and open data,​ detailing the tools developed​‌ and how they are​​ shared. A substantial part​​​‌ is dedicated to new​ results, where scientific contributions​‌ are described in detail,​​ often with subsections specifying​​​‌ participants and associated keywords.​

Finally, the Activity Report​‌ addresses funding, contracts, partnerships,​​ and collaborations at various​​​‌ levels, from industrial agreements​ to international cooperations. It​‌ also covers dissemination and​​ teaching activities, such as​​​‌ participation in scientific events,​ outreach, and supervision. The​‌ document concludes with a​​ presentation of scientific production,​​​‌ including major publications and​ those produced during the​‌ year.

Keywords

Computer Science​​ and Digital Science

  • A3.3.​​​‌ Data and knowledge analysis​
  • A9.1. Knowledge
  • A9.2. Machine​‌ learning
  • A9.2.1. Supervised learning​​
  • A9.2.2. Unsupervised learning
  • A9.2.3.​​​‌ Reinforcement learning
  • A9.2.6. Neural​ networks
  • A9.2.8. Deep learning​‌
  • A9.4. Natural language processing​​
  • A9.13. Agentic AI
  • A9.15.​​​‌ Symbolic AI
  • A9.16. Societal​ impact of AI
  • A9.17.​‌ Cybersecurity and AI

Other​​ Research Topics and Application​​​‌ Domains

  • B2.3. Epidemiology
  • B6.5.​ Information systems
  • B8.5.1. Participative​‌ democracy
  • B9.5.6. Data science​​
  • B9.7.2. Open data
  • B9.10.​​​‌ Privacy
  • B9.11.2. Financial risks​

1 Team members, visitors,​‌ external collaborators

Research Scientists​​

  • Ioana Manolescu Goujot [​​​‌Team leader, INRIA​, Senior Researcher,​‌ HDR]
  • Oana-Denisa Balalau​​ [INRIA, ISFP​​​‌]
  • Oana Goga [​Inria, Senior Researcher​‌, HDR]
  • Madhulika​​ Mohanty [INRIA,​​​‌ Researcher]

Faculty Member​

  • Yanlei Diao [ECOLE​‌ POLY PALAISEAU, Professor​​]

Post-Doctoral Fellows

  • Garima​​​‌ Gaur [INRIA,​ Post-Doctoral Fellow]
  • Chadi​‌ Helwe [INRIA,​​ Post-Doctoral Fellow, until​​​‌ Apr 2025]
  • Guillaume​ Lachaud [ECOLE POLY​‌ PALAISEAU, Post-Doctoral Fellow​​]
  • Kun Zhang [​​​‌ECOLE POLY PALAISEAU,​ from Apr 2025 until​‌ May 2025]

PhD​​ Students

  • Ines Abdelaziz [​​INRIA, from Dec​​​‌ 2025]
  • Nardjes Amieur‌ [CNRS]
  • Gabriel‌​‌ Ben Zenou [Ministère​​ Armées]
  • Abir Benzaamia​​​‌ [CNRS]
  • Theo‌ Bouganim [INRIA,‌​‌ until Mar 2025]​​
  • Tom Calamai [INRIA​​​‌ & Amundi, CIFRE‌]
  • Salim Chouaki [‌​‌CNRS]
  • Przemyslaw Dominikowski​​ [ECOLE POLY PALAISEAU​​​‌, from Sep 2025‌]
  • Asmaa El Fraihi‌​‌ [CNRS]
  • Vincent​​ Jacob [ECOLE POLY​​​‌ PALAISEAU, until Mar‌ 2025]
  • Hritika Kathuria‌​‌ [INRIA]
  • Muhammad​​ Khan [INRIA,​​​‌ until Sep 2025]‌
  • Gabriel Lozano Pinzon [‌​‌ECOLE POLY PALAISEAU,​​ from Sep 2025]​​​‌
  • Mohamed Mezhoudi [BNP‌ PARIBAS , CIFRE]‌​‌
  • Kun Zhang [INRIA​​, until Mar 2025​​​‌]

Technical Staff

  • Ines‌ Abdelaziz [INRIA,‌​‌ Engineer, until Nov​​ 2025]
  • Simon Ebel​​​‌ [INRIA, Engineer‌, until Jun 2025‌​‌]
  • Theo Galizzi [​​INRIA, Engineer,​​​‌ until Jun 2025]‌
  • Ismail Hatim [ECOLE‌​‌ POLYTECHNIQUE, Engineer,​​ from Nov 2025]​​​‌
  • Aurelien Peden [INRIA‌, Engineer, from‌​‌ Mar 2025 until Oct​​ 2025]
  • Georgios Siachamis​​​‌ [INRIA, Engineer‌]

Interns and Apprentices‌​‌

  • Pablo Bertaud-Velten [INRIA​​, Intern, from​​​‌ Mar 2025 until Jul‌ 2025]
  • Nikola Dobricic‌​‌ [INRIA, Intern​​, until Mar 2025​​​‌]
  • Przemyslaw Dominikowski [‌INRIA, Intern,‌​‌ from Mar 2025 until​​ Aug 2025]
  • Paul​​​‌ Kronlund-Drouault [INRIA,‌ Intern, from Jun‌​‌ 2025 until Aug 2025​​]
  • Gabriel Lozano Pinzon​​​‌ [ECOLE POLY PALAISEAU‌, Intern, from‌​‌ Mar 2025 until Aug​​ 2025]
  • Maria-Justina-Adriana Mateescu​​​‌ [INRIA, Intern‌, from Jul 2025‌​‌ until Jul 2025]​​
  • Maria Jesus Mellado Tenorio​​​‌ [INRIA, Intern‌, from Mar 2025‌​‌ until May 2025]​​
  • Saba Shahsavari [INRIA​​​‌, Intern, from‌ Apr 2025 until Aug‌​‌ 2025]
  • Yanis Zaamoun​​ [ECOLE POLY PALAISEAU​​​‌, Intern, until‌ Mar 2025]

Administrative‌​‌ Assistant

  • Michael Barbosa [​​INRIA]

External Collaborators​​​‌

  • Alexandre Barlot [Radio‌ France]
  • Nelly Barret‌​‌ [ECOLE POLYT. MILAN​​, until Apr 2025​​​‌]
  • Antoine Deiana [‌Radio France, until‌​‌ May 2025]
  • Helena​​ Galhardas [Instituto Superior​​​‌ Técnico, University of Lisbon‌]
  • Emilie Gautreau [‌​‌Radio France, until​​ Apr 2025]
  • Remi​​​‌ Guillou [ECOLE POLY‌ PALAISEAU, from Jun‌​‌ 2025 until Aug 2025​​]
  • Samuel Guimaraes [​​​‌CNRS, until Mar‌ 2025]
  • Paul Kronlund-Drouault‌​‌ [ENS DE LYON​​, from Sep 2025​​​‌]
  • Chenghao Lyu [‌Univ Massachusetts Amherst,‌​‌ from Sep 2025]​​
  • Adrien Maumy [Radio​​​‌ France, until Apr‌ 2025]
  • Tobias Moller‌​‌ [TELECOM PARIS,​​ from Jul 2025 until​​​‌ Nov 2025]
  • Thomas‌ Pontillon [Radio France‌​‌, until Apr 2025​​]
  • Gerald Roux [​​​‌Radio France, until‌ Apr 2025]
  • Prajna‌​‌ Devi Upadhyay [BITS​​ PILANI HYDERABAD CAMPUS]​​​‌
  • Joanna Yakin [Radio‌ France, until Apr‌​‌ 2025]

2 Overall​​​‌ objectives

Our research aims​ at models, algorithms and​‌ tools for highly efficient,​​ easy-to-use data and knowledge​​​‌ management; throughout our​ research, performance at scale​‌ is a core concern,​​ which we address, among​​​‌ other techniques, by designing​ algorithms for a cloud​‌ (massively parallel) setting. In​​ addition, we explore and​​​‌ mine rich data via​ machine learning techniques. Our​‌ scientific contributions fall into​​ four interconnected areas:

  • Optimization​​​‌ and performance at scale.​
    We devise optimization techniques that make data processing at very large scale as efficient as possible. These efforts span relational, graph, and text-rich data, in centralized as well as distributed architectures.
  • Data discovery and​​​‌ exploration.
    Today's Big Data​ is complex; understanding and​‌ exploiting it is daunting,​​ especially to novice users​​​‌ such as journalists or​ domain scientists. We work​‌ to devise techniques for​​ allowing users to explore​​​‌ graph data, large, heterogeneous​ data lakes, as well​‌ as more subtle signals​​ hidden in the data,​​​‌ such as anomalies in​ time series and in​‌ dynamic graphs.
  • Natural language​​ understanding for analyzing and​​​‌ supporting digital arenas.
    In​ this area, we are​‌ interested in applications with​​ high social value, such​​​‌ as analysing public discourse​ with the goal of​‌ finding elements that could​​ bias the world view​​​‌ of citizens, such as​ false claims, fallacious arguments,​‌ propaganda, or greenwashing.
  • Safeguarding​​ information systems.
    Recent events have shown how easily current online systems can be used to propagate information (that is sometimes false), and that we are facing an information war. We create knowledge and technology in this area to make the online information space safer.

3 Research program​​

3.1 Multi-model querying

As the world's affairs become increasingly digital, a large and varied set of data sources becomes available. Some are structured databases, such as government-gathered data (demographics, economics, taxes, elections), legal records, or stock quotes for specific companies; others are unstructured or semi-structured, including in particular graph data, sometimes endowed with semantics (see, e.g., the Linked Open Data cloud). Modern data management applications, such as data journalism, are eager to combine in innovative ways both static and dynamic information coming from structured, semi-structured, and unstructured databases and social feeds. However, current content management tools are not suited for this task, in particular when they require a lengthy, rigid cycle of data integration and consolidation in a warehouse. Thus, we need flexible tools allowing us to interconnect various kinds of data sources and query them together.

3.2​‌ New methods for exploring​​ and querying data graphs​​​‌

Semantic graphs, including data and knowledge, are hard to apprehend for users due to the complexity of their structure and, often, to their large volume. To help tame this complexity, we seek new methods for exploring highly heterogeneous data graphs resulting from integrating structured, semi-structured, and unstructured (text) data. In this context, we study methods for automatically identifying, in a large corpus of data sources, interesting data paths that connect Named Entities (NE) to each other. Further, in some application contexts where RDF data graphs are collaboratively used, it is essential that access control methods be in place to guard access to the data. Query answers then need to be computed by taking into account access control restrictions, as well as ontologies that describe the data semantics.

3.3​​ Navigating the continuum between​​​‌ text and (semi) structured‌ data

In data journalism and fact-checking applications, useful information comes both in structured records and in natural language text.

3.4 A unified framework for optimizing data analytics

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. Our goal is to develop a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores and other runtime system parameters that best meet the task objectives. To achieve this, we also need to design a multi-objective optimizer that constructs a Pareto-optimal set of job configurations for task-specific objectives and recommends new job configurations to best meet these objectives.
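To make the notion of a Pareto-optimal set of job configurations concrete, here is a minimal Python sketch. It is an illustration only and not the team's optimizer: the `JobConfig` fields, the sample latency and cost figures, and the weighted-sum recommendation rule are all assumptions introduced for this example.

```python
from typing import NamedTuple


class JobConfig(NamedTuple):
    """Hypothetical job configuration with two objectives to minimize."""
    cores: int
    latency_s: float  # assumed predicted latency, in seconds
    cost_usd: float   # assumed predicted cost, in dollars


def dominates(a: JobConfig, b: JobConfig) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.latency_s <= b.latency_s and a.cost_usd <= b.cost_usd
    better = a.latency_s < b.latency_s or a.cost_usd < b.cost_usd
    return no_worse and better


def pareto_set(configs: list[JobConfig]) -> list[JobConfig]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def recommend(front: list[JobConfig], w_latency: float = 0.5,
              w_cost: float = 0.5) -> JobConfig:
    """Pick one Pareto point by a weighted sum of min-max scaled objectives
    (one simple recommendation rule among many possible ones)."""
    def scaled(value: float, values: list[float]) -> float:
        lo, hi = min(values), max(values)
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    latencies = [c.latency_s for c in front]
    costs = [c.cost_usd for c in front]
    return min(front, key=lambda c: w_latency * scaled(c.latency_s, latencies)
               + w_cost * scaled(c.cost_usd, costs))


# Made-up candidate configurations, for illustration only.
candidates = [
    JobConfig(4, 100.0, 1.0),
    JobConfig(8, 60.0, 1.6),
    JobConfig(16, 40.0, 3.0),
    JobConfig(16, 70.0, 3.2),  # dominated by the 8-core configuration
    JobConfig(8, 100.0, 1.5),  # dominated by the 4-core configuration
]
front = pareto_set(candidates)
choice = recommend(front)
```

Changing the weights moves the recommendation along the Pareto front, which is how different user preferences between latency and cost could be expressed in such a scheme.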

3.5 Elastic‌ resource management for virtualized‌​‌ database engines

Database engines​​ are migrating to the​​​‌ cloud to leverage the‌ opportunities for efficient resource‌​‌ management by adapting to​​ the variations and heterogeneity​​​‌ of the workloads. Resource‌ management in a virtualized‌​‌ setting, like the cloud,​​ must be enforced in​​​‌ a performance-efficient manner to‌ avoid introducing overheads to‌​‌ the execution. We design​​ elastic systems that change​​​‌ their configuration at runtime‌ with minimal cost to‌​‌ adapt to the workload​​ every time. Changes in​​​‌ the design include both‌ different resource allocations and‌​‌ different data layouts. We​​ consider different workloads, including​​​‌ transactional, analytical, and mixed,‌ and we study the‌​‌ performance implications on different​​ configurations to propose a​​​‌ set of adaptive algorithms.‌

3.6 Argumentation mining

Argumentation appears when we evaluate the validity of new ideas, convince an addressee, or solve a difference of opinion. An argument contains a statement to be validated (a proposition also called claim or conclusion), a set of backing propositions (called premises, which should be accepted ideas), and a logical connection between all the pieces of information presented that allows the inference of the conclusion. In our work, we focus on fallacious arguments, where the evidence neither proves nor disproves the claim. For example, in an "ad hominem" argument, a claim is declared false because the person making it has a character flaw. We study the impact of fallacies in online discussions and show the need for improving tools for their detection. In addition, we look into detecting verifiable claims made by politicians. We started a collaboration with Radio France and with Wikidébats, a debate platform focused on providing quality arguments for controversial topics.

3.7​ Measuring and mitigating risks​‌ of AI-driven information targeting​​

We are witnessing a massive shift in the way people consume information. In the past, people had an active role in selecting the news they read. More recently, information started to appear on people's social media feeds as a byproduct of their social relations. We see a new shift brought by the emergence of online advertising platforms, where third parties can pay ad platforms to show specific information to particular groups of people through paid targeted ads. AI-driven algorithms power these targeting technologies. Our goal is to study the risks of AI-driven information targeting at three levels: (1) human level: under which conditions targeted information can influence an individual's beliefs; (2) algorithmic level: under which conditions AI-driven targeting algorithms can exploit people's vulnerabilities; and (3) platform level: whether targeting technologies lead to biases in the quality of information that different groups of people receive and assimilate. Then, we will use this understanding to propose protection mechanisms for platforms, regulators, and users.

4 Application​​ domains

4.1 Cloud computing​​​‌

Cloud computing services are developing strongly, and more and more companies and institutions run their computations in the cloud to avoid the hassle of running their own infrastructure. Today's cloud service providers guarantee machine availability in their Service Level Agreements (SLAs), without any guarantee on performance measures for a specific cost budget. Running analytics on big data systems requires the user not only to reserve suitable cloud instances on which the big data system will run, but also to set many system parameters, such as the degree of parallelism and the granularity of scheduling. The choice of parameter values and of cloud instances needs to meet user objectives regarding latency, throughput and cost, which is a complex task if done manually by the user. Hence, the need arises to transform cloud service models from availability guarantees to user performance objectives, which leads to a multi-objective optimization problem. Research carried out in the team within the ERC project “Big and Fast Data Analytics” aims to develop a novel optimization framework for providing guarantees on performance while controlling the cost of data processing in the cloud.

4.2 Computational journalism​

Modern journalism increasingly relies​‌ on content management technologies​​ in order to represent,​​​‌ store, and query source​ data and media objects​‌ themselves. Writing news articles​​ increasingly requires consulting several​​​‌ sources, interpreting their findings​ in context, and crossing​‌ links between related sources​​ of information. Cedar research​​​‌ results directly applicable to​ this area provide techniques​‌ and tools for rich​​ Web content warehouse management.​​​‌ Within the SourcesSay AI​ Chair project, we work​‌ to devise concrete algorithms​​ and platforms to help​​​‌ journalists perform their work​ better and/or faster. This​‌ work is in collaboration​​ with the journalists from​​​‌ RadioFrance, the team Le​ vrai du faux.

4.3​‌ Computational social science

Political​​ discussions revolve around ideological​​ conflicts that often split​​​‌ the audience into two‌ opposing parties. Both parties‌​‌ try to win the​​ argument by bringing forward​​​‌ information. However, often this‌ information is misleading, and‌​‌ its dissemination employs propaganda​​ techniques. We investigate the​​​‌ impact of propaganda in‌ online forums and we‌​‌ study a particular type​​ of propagandist content, the​​​‌ fallacious argument. We show‌ that identifying such arguments‌​‌ remains a difficult task,​​ but one of high​​​‌ importance because of the‌ pervasiveness of this type‌​‌ of discourse. We also​​ explore trends around the​​​‌ diffusion and consumption of‌ propaganda and how this‌​‌ can impact or be​​ a reflection of society.​​​‌

4.4 Online targeted advertising‌

The enormous financial success of online advertising platforms is partially due to the precise targeting features they offer. Ad platforms collect large amounts of data on users and use powerful AI-driven algorithms to infer users' fine-grained interests and demographics, which they make available to advertisers to target users. For instance, advertisers can target groups of users as small as tens or hundreds and as specific as “people interested in anti-abortion movements that have a particular education level”. Ad platforms also employ AI-driven targeting algorithms to predict how “relevant” ads are to particular groups of people, in order to decide to whom to deliver them. While these targeting technologies create opportunities for businesses to reach interested parties and lead to economic growth, they also open the way for interested groups to use users' data to manipulate them by targeting messages that resonate with each user.

5‌ Social and environmental responsibility‌​‌

5.1 Contribution to Diversity,​​ Equity and Inclusion

Madhulika Mohanty co-led the SCOUT action of the Diversity, Equity and Inclusion initiative (website) for the DB research community from 2021 to 2025. This action provided a checklist of items to verify before submitting a paper, in order to promote and ensure more DEI-compliant papers. The checklist will be integrated within the standard submission systems for DB conferences. This work has led to the publication of 13.

6 Highlights​​​‌ of the year

6.1‌ Awards

The paper “RDF Query Answering in the Presence of Access Restrictions” by Maxime Buron, Hritika Kathuria, Ioana Manolescu Goujot and Georgios Siachamis won the CoopIS 2025 Best Paper Award 28.

7 Latest software‌​‌ developments, platforms, open data​​

7.1 Latest software developments​​​‌

7.1.1 ConnectionLens

  • Name:
    Integration‌ of heterogeneous data using‌​‌ information extraction
  • Keyword:
    Data​​ analysis
  • Functional Description:
    ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which makes it possible to detect the people, organizations, etc. mentioned in text, whether full text or text snippets found, e.g., in RDF or XML. ConnectionLens is thus capable of finding and exploiting connections present across heterogeneous data sources without requiring the user to specify any join predicate.
  • URL:​​
  • Publications:
  • Contact:​
    Manolescu Ioana
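The virtual graph described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ConnectionLens code: the `VirtualGraph` class and its methods are hypothetical names, and sameAs links are created only between value nodes whose normalized text is identical, which is a much cruder test than real similarity comparison.

```python
from collections import defaultdict
from itertools import combinations


class VirtualGraph:
    """Toy graph: fine-grained nodes, 'structural' and 'sameAs' edges."""

    def __init__(self):
        self.labels = {}          # node id -> label
        self.value_nodes = set()  # ids of nodes holding atomic values
        self.edges = []           # (source id, target id, edge kind)

    def add_node(self, label, is_value=False):
        node_id = len(self.labels)
        self.labels[node_id] = label
        if is_value:
            self.value_nodes.add(node_id)
        return node_id

    def load_json(self, value, name):
        """One node per JSON node; each child is linked to its parent."""
        node = self.add_node(name)
        if isinstance(value, dict):
            children = list(value.items())
        elif isinstance(value, list):
            children = [(f"{name}[{i}]", v) for i, v in enumerate(value)]
        else:
            leaf = self.add_node(str(value), is_value=True)
            self.edges.append((node, leaf, "structural"))
            return node
        for key, child in children:
            self.edges.append((node, self.load_json(child, key), "structural"))
        return node

    def load_tuple(self, table, row):
        """One node per tuple, per attribute, and per attribute value."""
        tuple_node = self.add_node(f"{table} tuple")
        for attribute, value in row.items():
            attr_node = self.add_node(attribute)
            value_node = self.add_node(str(value), is_value=True)
            self.edges.append((tuple_node, attr_node, "structural"))
            self.edges.append((attr_node, value_node, "structural"))
        return tuple_node

    def link_same_as(self):
        """Assumption: values are 'similar' when equal after normalization."""
        by_text = defaultdict(list)
        for node_id in sorted(self.value_nodes):
            by_text[" ".join(self.labels[node_id].lower().split())].append(node_id)
        for nodes in by_text.values():
            for a, b in combinations(nodes, 2):
                self.edges.append((a, b, "sameAs"))


graph = VirtualGraph()
graph.load_json({"author": "Ada Lovelace", "year": 1843}, "document")
graph.load_tuple("people", {"name": "ada  lovelace"})
graph.link_same_as()
same_as = [e for e in graph.edges if e[2] == "sameAs"]
```

Once the sameAs edge exists, the JSON document and the relational tuple are connected through the shared value, which is why no join predicate has to be written by the user.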

7.1.2 Abstra​‌

  • Name:
    Abstra: Toward Generic​​ Abstractions for Data of​​​‌ Any Model
  • Keywords:
    Heterogeneous​ Data, Data Exploration, Data​‌ analysis, Databases, LOD -​​ Linked open data
  • Functional​​​‌ Description:
    Abstra computes a description meant for humans, based on the idea that, regardless of the syntax or the data model, any dataset holds some collections of entities/records that are possibly linked by relationships. Abstra relies on a common graph representation of any incoming dataset, leverages Information Extraction to detect what the dataset is about, and relies on an original algorithm for selecting the core entity collections and their relations. Abstractions are shown both as HTML text and as a lightweight Entity-Relationship diagram.
  • URL:​
  • Publications:
    hal-04131974, hal-03767967, hal-03774599
  • Contact:
    Madhulika Mohanty​​​‌
  • Participants:
    Ioana Manolescu Goujot,​ Madhulika Mohanty, Nelly Barret,​‌ Prajna Devi Upadhyay

7.1.3​​ StatCheck

  • Name:
    Fact-checking Multidimensional​​​‌ Statistic Claims in French​
  • Keywords:
    Machine learning, Databases,​‌ Natural language processing, Software​​ engineering
  • Scientific Description:
    To strengthen public trust and counter disinformation, computational fact-checking, leveraging digital data sources, attracts interest from journalists and the computer science community. A particular class of interesting data sources comprises statistics, that is, numerical data compiled mostly by governments, administrations, and international organizations. Statistics are often multidimensional datasets, where multiple dimensions characterize one value, and the dimensions may be organized in hierarchies. This paper describes STATCHECK, a statistic fact-checking system jointly developed by the authors, who are either computer science researchers or fact-checking journalists working for a French-language media outlet with a daily audience of more than 15 million (aud, 2022). The technical novelty of STATCHECK is twofold: (i) we focus on multidimensional, complex-structure statistics, which have received little attention so far, despite their practical importance, and (ii) we propose novel statistical claim extraction modules for French, an area where few resources exist. We validate the efficiency and quality of our system on large statistic datasets (hundreds of millions of facts), including the complete INSEE (French) and Eurostat (European Union) datasets, as well as French presidential election debates.
  • Functional​​​‌ Description:
    StatCheck first collects the data it needs to operate. Two types of data are collected, statistical tables and posts from social networks:
      - Acquisition of statistical files from the sites of reference organisations (INSEE, Eurostat)
      - Extraction of statistical tables from these files, and storage of the extracted tables
      - Acquisition of political tweets from a list of accounts
    The application then supports the detection, extraction and search of statistical facts:
      - Detection and extraction of statistical facts from Twitter posts (e.g. "Unemployment rate increased by 30% in 2023")
      - Search for statistical facts in our database, with display of the twenty most relevant statistical tables for a statistical fact
      - Automatic transcription of audio files, to detect and extract statistical facts from the transcripts
  • Release Contributions:
      - Redesign of the user interface
      - Modification of the software architecture
      - Addition of audio transcription
  • URL:​​
  • Publications:
    hal-01496700,​​​‌ hal-01745768, hal-02121389,‌ hal-01915148, hal-03767992,‌​‌ hal-03791175
  • Contact:
    Ioana Manolescu​​ Goujot
  • Participants:
    Antoine Gauquier,​​​‌ Tien Duc Cao, Ioana‌ Manolescu Goujot, Xavier Tannier,‌​‌ Oana-Denisa Balalau, Simon Ebel,​​ Theo Galizzi

7.1.4 ConnectionStudio​​​‌

  • Keywords:
    Heterogeneous Data, Data‌ Exploration
  • Functional Description:

    ConnectionStudio‌​‌ integrates highly heterogeneous data​​ into graphs, enriched with​​​‌ extracted entities. Studio users‌ can discover the entities‌​‌ in their data, navigate​​ across connections between datasets,​​​‌ explore and query the‌ data in many ways.‌​‌ The Studio currently supports:​​ CSV, JSON, XML, RDF,​​​‌ text, property graphs, all‌ Office formats, and PDF‌​‌ datasets.

    ConnectionStudio is a​​ novel front-end to ConnectionLens,​​​‌ Abstra and PathWays (see‌ also the respective Web‌​‌ sites). Its own novel​​ features are outlined in​​​‌ a CoopIS 2023 article.‌

  • URL:
  • Publications:
  • Contact:
    Ioana Manolescu Goujot​​​‌
  • Participants:
    Madhulika Mohanty, Simon‌ Ebel, Theo Galizzi

7.1.5‌​‌ FactSpotter

  • Keywords:
    Factual Faithfulness,​​ Text generation
  • Functional Description:​​​‌
    We propose a new metric that correctly identifies factual faithfulness, i.e., given a triple (subject, predicate, object), it decides if the triple is present in a generated text. We show that our metric FactSpotter achieves the highest correlation with human annotations on data correctness, data coverage, and relevance. In addition, FactSpotter can be used as a plug-in feature to improve the factual faithfulness of existing models.
  • Contact:‌
    Kun Zhang
  • Partner:
    Ecole‌​‌ Polytechnique

7.1.6 PathWays

  • Name:​​
    PathWays: finding entity paths​​​‌ in heterogeneous data graphs‌
  • Keywords:
    Named entities, Data‌​‌ Journalism, Heterogeneous Data
  • Functional​​ Description:
    PathWays models heterogeneous datasets as a graph (see ConnectionLens). To identify interesting paths in this graph, PathWays works on its (smaller) summary (see Abstra) for efficiency and optimisation. Then, it sorts paths by their potential interest (a metric based on the entities found and the information dilution along the path) before evaluating them with the help of a new multi-query optimisation algorithm. Finally, PathWays shows the most interesting (evaluated) paths in the form of tables, which are very easy to understand for journalists, who initiated this scenario.
  • URL:‌​‌
  • Publications:
  • Contact:​​​‌
    Ioana Manolescu Goujot

7.1.7‌ OpenIEEntity

  • Name:
    Open Information‌​‌ Extraction with Entity Focused​​ Constraints
  • Keyword:
    Information extraction​​​‌
  • Functional Description:
    This tool takes a sentence as input and outputs the facts contained in the sentence, in the format (subject, predicate, object).
  • Contact:
    Oana-Denisa Balalau‌​‌

7.1.8 FactCheckBureau

  • Name:
    FactCheckBureau:​​ Build Your Own Fact-Check​​​‌ Analysis Pipeline
  • Keywords:
    Fact Check Retrieval, Fact-checking
  • Functional‌​‌ Description:
    FactCheckBureau is an end-to-end solution that enables researchers to easily and interactively design and evaluate Fact Check retrieval pipelines. Further, it provides a query interface for non-technical users to find relevant Fact Checks for an input query in the form of a key phrase, a social media post, or an image.
  • URL:​​​‌
  • Publication:
  • Contact:​
    Ioana Manolescu Goujot

7.1.9​‌ FDSpotter

  • Name:
    Structured Discourse​​ Representation for Factual Consistency​​​‌ Verification
  • Keyword:
    LLM
  • Functional​ Description:
    The repository includes​‌ the tool to test​​ for factual consistency, but​​​‌ also all the code​ necessary to compare our​‌ tool with state of​​ the art methods for​​​‌ factual consistency.
  • Contact:
    Oana-Denisa​ Balalau

7.1.10 COI-OpenIE

  • Keywords:​‌
    Conflict Of Interest Mining,​​ Knowledge graph, Scientific Text,​​​‌ Information extraction
  • Functional Description:​
    This software expects as​‌ input a collection of​​ certain sections (Acknowledgment, Funding​​​‌ disclosure, and so on)​ of scientific publications, and​‌ produces a knowledge graph​​ that has information about​​​‌ the different interesting relations​ among Individuals and Organizations​‌ that were present in​​ the input text corpus.​​​‌
  • Contact:
    Oana-Denisa Balalau

7.1.11​ ClimateNLP toolbox

  • Name:
    Climate​‌ NLP toolbox
  • Keywords:
    Climate​​ change, Classification, Natural language​​​‌ processing
  • Functional Description:
    Python scripts to train or download models (BERT-based models, TF-IDF). The toolbox also contains scripts to run LLM pipelines that perform the same tasks.
  • Contact:
    Tom​​ Calamai

7.1.12 MultilingualPoliticalLLMs

  • Keywords:​​​‌
    LLM, Multilingual
  • Functional Description:​
    We test different scenarios,​‌ where we vary the​​ language of the prompt​​​‌ while also assigning a​ nationality to the model.​‌ We evaluate models on​​ the 50 most populous​​​‌ countries and their official​ languages.
  • URL:
  • Contact:​‌
    Oana-Denisa Balalau

8 New​​ results

8.1 Data management​​​‌ for analyzing and verifying​ digital arenas

8.1.1 Graph​‌ data lakes of heterogeneous​​ data sources for data​​​‌ journalism

Participants: Oana-Denisa Balalau​, Pablo Bertaud-Velten,​‌ Nikola Dobricic, Przemyslaw​​ Dominikowski, Simon Ebel​​​‌, Theo Galizzi,​ Garima Gaur, Ioana​‌ Manolescu, Maria Jesus​​ Mellado Tenorio, Madhulika​​​‌ Mohanty, Saba Shahsavari​, Georgios Siachamis.​‌

Work carried out within the ANR AI Chair SourcesSay project has focused on developing a platform, ConnectionLens, for integrating arbitrary heterogeneous data into a graph, then exploring and querying that graph using simple, intuitive query interfaces. The main technical challenges addressed were: (i) how to interconnect structured and semi-structured data sources? We address this through information extraction (when an entity appears in two data sources or two places in the same graph, we only create one node, thus interlinking the two locations) and through similarity comparisons (see 7.1.1); (ii) how to find all connections between nodes matching specific search criteria, or certain keywords? The question is particularly challenging in our context since ConnectionLens graphs can be quite large, and query answers can traverse edges in both directions; (iii) how to convert this graph into standard graph data models such as property graphs? ConnectionLens is available online at: ConnectionLens Gitlab repository, while ConnectionStudio, its GUI, is available at ConnectionStudio Gitlab repository.
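The remark that query answers can traverse edges in both directions can be illustrated with a small Python sketch. It is deliberately simplified and is not the ConnectionLens search algorithm: it handles only two keywords, returns a single shortest connecting path rather than all connecting trees, and the function name `find_connection` and the sample data are assumptions made for this example.

```python
from collections import defaultdict, deque


def find_connection(edges, labels, keyword_a, keyword_b):
    """Breadth-first search over an undirected view of a directed graph.

    Returns the node ids of one shortest path from a node matching
    keyword_a to a node matching keyword_b, or None if none exists.
    """
    # Each directed edge can be followed both ways.
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    starts = [n for n, text in labels.items() if keyword_a.lower() in text.lower()]
    goals = {n for n, text in labels.items() if keyword_b.lower() in text.lower()}

    parent = {n: None for n in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if node in goals:
            path = []
            while node is not None:  # walk back to a start node
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbours[node]):  # sorted for determinism
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Made-up data: two documents that both mention the same organization.
labels = {
    "t1": "Alice Martin",
    "d1": "contract.json",
    "org": "ACME Corp",
    "d2": "registry.csv",
    "t2": "Bob Durand",
}
edges = [("d1", "t1"), ("d1", "org"), ("d2", "org"), ("d2", "t2")]
path = find_connection(edges, labels, "alice", "bob")
```

If edges could only be followed from source to target, no path would lead from "Alice Martin" to "Bob Durand" here, since both are leaves; following edges backwards is what exposes the connection through the shared organization node.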

With the ANR TopOL project, we now extend our contributions to large-scale data lakes of heterogeneous data sources and explore novel ways of exploring them. In this context, the following new contributions have been made:

  1. Efficiently Profiling, Indexing and Querying Heterogeneous Datasets in Graph Data Lakes. Building on the ConnectionLens 7.1.1 and Abstra 7.1.2 frameworks, this work focuses on enabling natural language question answering over large-scale heterogeneous data lakes. In each dataset, we have formalized the concept of entities and their contexts, which serve as natural "anchors" of users' questions, e.g., which Person interacted with which Organization, and at what Location. To support efficient search over the set of entities-in-context, we developed an end-to-end system that ingests heterogeneous sources into a graph data lake (using ConnectionLens), abstracts them into collections (using Abstra), and finally builds and indexes the entities-in-context. The developed indexes include Locality Sensitive Hashing (LSH) for semantic similarity search and a trie-like structure for exact lookups.

    This work provides a foundation for future work (e.g., building a Retrieval-Augmented Generation system) that would allow non-technical users, such as journalists, to uncover interesting facts in large heterogeneous data lakes, in particular in domains such as investigative journalism (through the team's ongoing collaboration with ICIJ).

  2. Batch Generic Evaluation of Keyword Queries on Graphs. Keyword search is a popular paradigm for searching for information in graphs: users specify a few search terms (or keywords), and the system returns subtrees of the graph, where each keyword is matched by a node in each returned subtree. Because the problem is NP-hard in general, many keyword search algorithms consider a fixed score function, which is applied to rank result trees, and explore only part of the search space, pruning trees with low scores. In contrast, generic algorithms explore the complete search space (subject to space or time limits, due to the high complexity), but can be used with any score function. In this work, we consider the problem of simultaneously answering a set (batch) of keyword queries, in a way that is compatible with any score function. Building upon our recent one-query generic algorithm 36, we show that when graph nodes match keywords from multiple queries, graph exploration effort can be shared to speed up the evaluation of the query batch. We formally establish guarantees on the correctness and completeness of our algorithm, and demonstrate its efficiency through comprehensive experiments over synthetic and real-world graphs.
  3. Named Entity Cleaning and Enhancement with Human-in-the-loop. Named Entities (NEs, in short) are frequently encountered in datasets about varied topics, e.g., journalistic investigations (people, places, companies), market analysis (companies and officers), etc. NEs often appear under different forms within or across datasets, due to spelling variants or mistakes. To leverage NE-rich datasets, the NEs need to be clean (error-free), and possibly enriched with information from external sources. While numerous data cleaning solutions exist, in this work we focus on the specific challenges raised by the cleaning of NE sets, which we address (i) through a visual workflow interface, (ii) by leveraging old and new techniques (string distances, Knowledge Bases, and carefully controlled access to LLMs), and especially (iii) by enabling human inspection of and interaction with the NE cleaning process, down to the granularity of an individual attribute of a record. The latter need is crucial in order to capture advanced knowledge that only domain experts have, and which may be absent from all other sources of information (KB, LLM, etc.). We support this by gathering how-provenance that traces the numerous ways in which information is brought to clean NEs. We built NiceT, a system addressing these challenges, and tested it on a variety of real-life datasets.
  4. Named Entity Centric Querying over Heterogeneous Data. Integrating information from diverse sources, particularly in investigative journalism, often hinges on linking data through shared named entities (NEs). The same entity may appear across multiple sources, each providing a different contextual perspective. For instance, when combining U.S. financial and political datasets, Donald Trump may emerge as a common entity, associated with distinct roles such as businessperson and politician. From a journalistic standpoint, the ability to seamlessly integrate heterogeneous data sources and query entity roles or inter-entity relationships, without requiring advanced technical expertise, is critical.

    This project, centered on the problem of extracting and integrating information about a named entity (NE) that may appear across heterogeneous datasets within a data lake, gives rise to two concrete research challenges. First, given an input NE, identify the roles (contexts) it plays across different datasets and aggregate relevant information about the NE. We refer to the aggregated output as the Infocard of the NE. Second, given a collection of heterogeneous datasets and an NE, find its interesting relationships with other named entities. We leverage the capabilities of our in-house tools, ConnectionLens and Abstra, which can integrate structured, semi-structured, and unstructured datasets into a unified graph and create high-level semantic abstractions of complex datasets.
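
To illustrate the two kinds of index mentioned in the first contribution above (a trie-like structure for exact lookups and LSH for similarity search), here is a minimal Python sketch. It is not the team's implementation: the class names, the random-hyperplane LSH variant, and the toy 3-dimensional vectors standing in for entity embeddings are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class Trie:
    """Character trie mapping exact (case-insensitive) labels to entity ids."""

    def __init__(self):
        self.children = {}
        self.ids = set()

    def insert(self, label, entity_id):
        node = self
        for ch in label.lower():
            node = node.children.setdefault(ch, Trie())
        node.ids.add(entity_id)

    def lookup(self, label):
        node = self
        for ch in label.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.ids)


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors separated by a small angle
    are likely (not guaranteed) to fall in the same bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(set)

    def signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, vec, entity_id):
        self.buckets[self.signature(vec)].add(entity_id)

    def candidates(self, vec):
        return set(self.buckets.get(self.signature(vec), set()))


# Hypothetical entity-in-context "e1", indexed both ways.
trie = Trie()
trie.insert("ICIJ", "e1")
lsh = HyperplaneLSH(dim=3)
lsh.add([0.9, 0.1, 0.0], "e1")
```

In such a design, the trie answers "is this exact name known?" while the LSH buckets return a small candidate set that a more precise similarity measure would then re-rank.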

8.1.2 RDF​‌ Query Answering in the​​ Presence of Access Restrictions​​​‌

Participants: Maxime Buron,​ Hritika Kathuria, Ioana​‌ Manolescu, Georgios Siachamis​​.

In this work, we explore algorithms for answering conjunctive RDF queries in the presence of RDFS ontologies and access control. We consider an access control setting where by default all users have access to the complete graph, and a restriction can forbid a user's access to specific IRIs. Here, restricting for user u the access to an IRI i entails that: no answer to a query by u may contain the IRI i; no triple containing i can be used to compute an answer for a query by u, nor to entail such a triple via reasoning with the ontology. We present a set of query answering algorithms for this novel context, and prove that five among them are correct, i.e., sound and complete, with respect to both the ontology and the access restrictions in place. We have implemented all our algorithms and present experiments comparing their performance. This work was published in CoopIS 2025 28, where it won the Best Paper Award.
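
The restriction semantics described above can be made concrete with a small Python sketch. It shows one naive baseline, which is an assumption of this example and not necessarily one of the published algorithms: remove every triple mentioning a forbidden IRI, saturate the remainder with two RDFS rules (subclass transitivity and type propagation), then evaluate the conjunctive query. The function names and the toy graph are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def visible(triples, forbidden):
    """Keep only triples that mention no forbidden IRI."""
    return {t for t in triples if not (set(t) & forbidden)}


def saturate(triples):
    """Apply subclass transitivity and type propagation up to a fixpoint."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
        new |= {(s, RDF_TYPE, b) for s, p, o in facts if p == RDF_TYPE
                for a, b in sub if o == a}
        if new <= facts:
            return facts
        facts |= new


def _unify(term, value, binding):
    # Variables start with '?'; constants must match exactly.
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def evaluate(query, facts, binding=None):
    """Yield one variable binding (dict) per match of the conjunctive query."""
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    pattern, rest = query[0], query[1:]
    for triple in facts:
        candidate = dict(binding)
        if all(_unify(t, v, candidate) for t, v in zip(pattern, triple)):
            yield from evaluate(rest, facts, candidate)


def answer(query, triples, forbidden):
    """Answers for a user who is denied access to the IRIs in `forbidden`."""
    facts = saturate(visible(triples, forbidden))
    return {tuple(sorted(b.items())) for b in evaluate(query, facts)}


graph = {
    (":Hotel", SUBCLASS, ":Lodging"),
    (":Lodging", SUBCLASS, ":Service"),
    (":ritz", RDF_TYPE, ":Hotel"),
    (":inn", RDF_TYPE, ":Lodging"),
}
query = [("?x", RDF_TYPE, ":Service")]
```

With no restriction, both `:ritz` and `:inn` are answers. If `:Hotel` is forbidden, `:ritz` disappears: its only path to `:Service` goes through triples that contain the forbidden IRI.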

8.1.3​ FactCheck-KG: Towards LLM-backed FC​‌ Retrieval

Participants: Garima Gaur​​, Madhulika Mohanty.​​

There is an unprecedented rise in the volume and reach of disinformation due to the popularity of social media and the advent of generative AI models. Fact-checking, that is, checking the veracity of a certain claim, is infeasible at this scale by human effort alone. This is primarily due to the rise in the volume of claims requiring verification, and also to the number of documents that must be processed to verify a given claim. This process is further complicated by disinformation re-surfacing in paraphrased or incomplete forms, or in an altered or shifted context. Fact-checkers often find themselves re-assessing a previously evaluated claim, which wastes precious human effort. In order to tackle these challenges, fact-check retrieval (FCR) pipelines have been developed that, given a newly encountered claim, aim to identify the most relevant claims among a set of previously assessed claims. In this work, we leverage NLP techniques over a set of fact-checked claims and their related articles to build a Knowledge Graph (KG), FactCheck-KG, of named entities, topics, claims and articles, with edges capturing the connections across different fact-checks via common topics and named entities. This representation lays the foundation for more context-aware, fine-grained fact-check retrieval. For example, with the success of retrieval augmented generation (RAG) and its extension to a graph-based retrieval (GraphRAG) framework, our KG can form a starting point for applying such frameworks to the fact-check retrieval problem.
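
As a rough illustration of how a graph of claims, articles, named entities and topics can support retrieval, the Python sketch below links each fact-checked claim to its article, entities and topics, then ranks earlier claims by the number of entities and topics they share with a new claim. Everything here is an assumption made for the example: the class and method names are hypothetical, entity and topic extraction is taken as given input, and the shared-neighbor count is only a stand-in for whatever scoring a real FCR pipeline would use.

```python
from collections import defaultdict


class FactCheckGraph:
    """Undirected graph over typed nodes: (kind, identifier) pairs."""

    def __init__(self):
        self.neighbors = defaultdict(set)

    def _link(self, a, b):
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def add_fact_check(self, claim_id, article_id, entities, topics):
        claim = ("claim", claim_id)
        self._link(claim, ("article", article_id))
        for entity in entities:
            self._link(claim, ("entity", entity))
        for topic in topics:
            self._link(claim, ("topic", topic))

    def related_claims(self, entities, topics):
        """Rank known claims by how many entities/topics they share
        with the new claim; ties are broken by claim id."""
        scores = defaultdict(int)
        anchors = [("entity", e) for e in entities]
        anchors += [("topic", t) for t in topics]
        for anchor in anchors:
            for kind, ident in self.neighbors.get(anchor, ()):
                if kind == "claim":
                    scores[ident] += 1
        return sorted(scores, key=lambda c: (-scores[c], c))


# Toy data, invented for the example.
kg = FactCheckGraph()
kg.add_fact_check("c1", "a1", entities=["WHO"], topics=["vaccines"])
kg.add_fact_check("c2", "a2", entities=["WHO"], topics=["climate"])
kg.add_fact_check("c3", "a3", entities=["NASA"], topics=["climate"])
```

Calling `kg.related_claims(["WHO"], ["vaccines"])` ranks `c1` first (it shares both the entity and the topic) and `c2` second; `c3` shares nothing and is not returned.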

8.1.4​​​‌ Efficient and Scalable Search‌ for Statistics

Participants: Simon‌​‌ Ebel, Helena Galhardas​​, Theo Galizzi,​​​‌ Ioana Manolescu, Aurelien‌ Peden.

Informed public debate needs high-quality data. In this context, high-quality statistical data sources are a valuable category of reference information based on which a claim can be checked. To facilitate the work of journalists or other fact-checkers, users' questions about a specific claim should be automatically answered based on statistical tables. This task is complicated by the large number, size, and variety of statistical datasets. This work introduces the statistical table discovery problem (STD, in short), which aims, given a natural language question and a set of statistical datasets (multidimensional tables), to find the tables most relevant to the question. We then describe STAR, an algorithm for solving the STD problem. Unlike existing table discovery (TD) solutions aimed at relational tables, STAR is devised specifically for multidimensional ones. Further, STAR treats the space and time dimensions of statistical datasets separately. We experimentally show that these features, together, make STAR outperform state-of-the-art TD systems adapted to the STD problem, in terms of scalability, search quality, preprocessing and question answering time. This work has been informally presented at BDA 2025 19 and the code is available at its Gitlab repository.

8.1.5 Structured‌ Discourse Representation for Factual‌​‌ Consistency Verification

Participants: Oana-Denisa​​ Balalau, Ioana Manolescu​​​‌, Kun Zhang.‌

Analysing the differences in‌​‌ how events are represented​​ across texts, or verifying​​​‌ whether the language model‌ generations hallucinate, requires the‌​‌ ability to systematically compare​​​‌ their content. To support​ such a comparison, a​‌ structured representation that captures​​ fine-grained information plays a​​​‌ vital role. In particular,​ identifying distinct atomic facts​‌ and the discourse relations​​ connecting them enables deeper​​​‌ semantic comparison. Our proposed​ approach combines structured discourse​‌ information extraction with a​​ classifier, FDSpotter, for factual​​​‌ consistency verification. We show​ that adversarial discourse relations​‌ pose challenges for language​​ models, but fine-tuning on​​​‌ our annotated data, DiscInfer,​ achieves competitive performance. Our​‌ proposed approach advances factual​​ consistency verification by grounding​​​‌ in linguistic structure and​ decomposing it into interpretable​‌ components. We demonstrate the​​ effectiveness of our method​​​‌ on the evaluation of​ two tasks: data-to-text generation​‌ and text summarisation. This​​ work has been published​​​‌ in ACL (Findings) 2025​ 27 and the software​‌ is available on BIL​​ 7.1.9.

8.1.6 The​​​‌ Search for Conflicts of​ Interest: Open Information Extraction​‌ in Scientific Publications

Participants:​​ Oana-Denisa Balalau, Garima​​​‌ Gaur, Ioana Manolescu​, Prajna Upadhyay.​‌

A conflict of interest (COI) appears when a person or a company has two or more interests that may directly conflict. This happens, for instance, when a scientist whose research is funded by a company audits the same company. For transparency and to avoid undue influence, public repositories of relations of interest are increasingly recommended or mandated in various domains, and can be used to avoid COIs. In this work, we propose an LLM-based open information extraction (OpenIE) framework for extracting financial or other types of relations of interest from scientific text. We target scientific publications in which authors declare funding sources or collaborations in the acknowledgment section, in the metadata, or in the publication, following editors' requirements. We introduce an extraction methodology and present a knowledge base (KB) with a comprehensive taxonomy of COI-centric relations. Finally, we perform a comparative study of the disclosures of two journals in the field of toxicology and pharmacology. The work has been published in EMNLP (Findings) 2025 20 and the software is available on BIL 7.1.10.

8.2 Online targeted​‌ advertising

Participants: Ines Abdelaziz​​, Nardjes Amieur,​​​‌ Abir Benzaamia, Salim​ Chouaki, Asmaa El​‌ Fraihi, Oana Goga​​.

8.2.1 A Year​​​‌ Under the DSA: Ad​ Transparency's Uneven Landscape

The Digital Services Act (DSA) has put platform accountability on center stage, requiring online platforms to provide greater transparency into how advertisements are targeted and delivered to users. Central to these obligations are two mechanisms: user-facing ad explanations, which inform individuals why they were shown a given ad, and public ad repositories, which are intended to enable independent auditing of advertising practices. This study provides the first multi-platform evaluation of these two mechanisms across Facebook, Instagram, YouTube and X. Using 48,511 user-facing “Why am I seeing this ad?” (WAIST) notices, and a systematic analysis of each platform's public ad repository, we assess how well current implementations disclose the parameters and decision processes involved in targeting. To do so, we develop an operational framework based on Articles 26 and 39 of the DSA (capturing granularity, attribution of targeting and delivery choices, data source disclosures, and accuracy) and apply it to both user-facing notices and public ad repositories. Our findings show that transparency remains fragmented and inconsistent across platforms. User-facing explanations vary widely in precision and often omit key targeting information, while repositories provide incomplete, misattributed, and at times difficult-to-interpret targeting data. Moreover, discrepancies between explanations and repository entries undermine the reliability of both mechanisms. Overall, current transparency infrastructures fall short of the DSA's expectations and highlight the need for clearer and more enforceable standards for advertising transparency moving forward. This work has been accepted for publication in PETs/PoPETs 2026.

8.2.2 A Comparative‌ Study of News Exposure‌​‌ and Consumption On and​​ Off Facebook.

Social media​​​‌ giants like Meta, Google,‌ and X leverage powerful‌​‌ algorithms to personalize user​​ feeds, a practice now​​​‌ under intense public scrutiny.‌ These algorithms can inadvertently‌​‌ skew the information users​​ consume, potentially influencing political​​​‌ opinions and voting decisions.‌ This raises critical questions:‌​‌ Do social media platforms​​ foster misinformation and contribute​​​‌ to echo chambers? To‌ address this ongoing debate,‌​‌ our study directly compares​​ news exposure on Facebook​​​‌ (where algorithmic influence is‌ strong) with news consumption‌​‌ off-platform (where user behavior​​ plays a larger role).​​​‌ Specifically, we investigate: (1)‌ Are users exposed to‌​‌ more/less misinformation on Facebook​​ compared with their off-platform​​​‌ misinformation consumption? (2) Is‌ news exposure on Facebook‌​‌ more/less diverse than off-platform​​ news consumption? (3) To​​​‌ what extent do socio-demographic‌ and psychological factors influence‌​‌ misinformation exposure on Facebook​​ and consumption off Facebook?​​​‌ (4) Is there a‌ relationship between socio-demographic and‌​‌ psychological factors and news​​ diversity on and off​​​‌ Facebook? and (5) Is‌ users' exposure to misinformation‌​‌ on Facebook correlated to​​ off-platform news consumption?

Our study of 123,995 news-related posts on Facebook and 70,587 news article visits off Facebook, collected from 642 users over 12 weeks, reveals the following central findings: (1) Only a small fraction (4%) of users' news consumption off Facebook is driven by news exposure on Facebook, and only 5.7% of misinformation consumption off Facebook is driven by news exposure on Facebook. (2) There is a higher prevalence of misinformation in user-received content on Facebook compared to deliberately consumed content off-platform. On Facebook, 5.9% of our users' news exposure comes from sources known for spreading misinformation, while off-platform, only 2.6% of our users' news consumption is from misinformation sources. Conversely, Facebook presents more diverse content: 22% of users received content from only one political leaning on Facebook, compared to 36% of users who consumed content from only one political leaning off-platform. (3) Several socio-demographic and psychological factors showed a statistically significant correlation with misinformation exposure on Facebook but not with misinformation consumption off Facebook. (4) The proportion of misinformation consumed off Facebook emerged as a statistically significant predictor of users' exposure to misinformation on Facebook, independent of news consumption on Facebook.

This work has​ been published in CSCW​‌ 2025 15.

8.2.3​​ Privacy Settings and Ad​​​‌ Perception: The Shift from​ Third-Party Cookies to the​‌ Privacy Sandbox

Online behavioral​​ advertising, heavily reliant on​​​‌ privacy-invasive third-party cookie tracking,​ faces a significant shift​‌ as browsers like Safari,​​ Brave, and Firefox have​​​‌ already deprecated them. Google​ Chrome announced its parallel​‌ move with the "Privacy​​ Sandbox Initiative" in 2019,​​​‌ proposing privacy-preserving advertising mechanisms.​ The extent to which​‌ Privacy Sandbox can deliver​​ comparable ad relevance and​​​‌ purchase intent to the​ established third-party cookie ecosystem​‌ will likely determine its​​ adoption as a widespread​​​‌ alternative. This paper presents​ the first user study​‌ evaluating the impact of​​ Privacy Sandbox APIs on​​​‌ ad perception. Our findings​ show that users perceive​‌ Privacy Sandbox ads as​​ less relevant and exhibit​​​‌ lower purchase intent compared​ to third-party cookie–based ads,​‌ without a corresponding increase​​ in perceived privacy protection.​​​‌ These results contribute to​ the ongoing assessment of​‌ Privacy Sandbox as an​​ alternative to third-party cookies.​​​‌

8.2.4 Is Contextual Advertising​ Safe? Analyzing Systemic Risks​‌ with Ads on YouTube.​​

Contextual advertising is seeing a resurgence in popularity as a privacy-preserving alternative to behavioral targeting. While often regarded as a coarse-grained approach, advances in AI-driven content analysis have transformed it into a highly granular form of targeting. This work examines the safety risks of contextual targeting through a two-part empirical study, analyzing its potential to enable targeting of audiences with sensitive attributes and to expose users to harmful or exploitative ads. In controlled ad experiments, we show that advertisers can target audiences defined by sensitive attributes (e.g., religious belief, mental health condition, and political ideology) by strategically selecting contextual placements, thereby circumventing policies that prohibit such targeting through behavioral signals. To understand how this risk manifests in practice, we develop an automated measurement framework to collect contextual ads delivered in high-risk content environments, focusing on conspiracy videos. We find that contextual ads are highly prevalent in these environments, disproportionately deliver sensitive categories (e.g., alternative health, religion, and politics), and lack transparency. We argue that contextual ad systems require deeper empirical scrutiny and robust transparency mechanisms to prevent exploitation and abuse, and that regulators should extend behavioral advertising risk principles to the contextual domain.

8.2.5 A Framework for​​ Auditing Ad Delivery Responsiveness​​​‌ to Psychological Traits

Online advertising platforms increasingly personalize ad delivery using users' behavioral signals, even when advertisers cannot explicitly target many underlying user characteristics. Auditing delivery skews for traits that are latent, complex, or not directly targetable through advertiser-facing tools remains challenging. We propose an experimental framework for auditing ad delivery across latent traits by constructing trait-defined audiences and examining how delivery systems allocate ads to these audiences under controlled competitive conditions. We demonstrate this framework on Meta's advertising platform using extraversion as a case study. We construct trait-based audiences using two approaches: psychometric assessment combined with tracking-based retargeting, and behavioral profiling based on on-platform engagement. Under controlled delivery conditions, we examine how the platform allocates personality-aligned and misaligned ads across these audiences. We find a statistically significant alignment effect in ad delivery: ads are more likely to be delivered when their framing matches the personality of the target audience (β = 0.40, p < 0.001). This effect is strongest in behaviorally profiled segments, where misaligned ads also exhibit reduced reach relative to aligned ads. Our framework provides a general approach for auditing ad delivery behavior and personalization dynamics driven by latent user traits.

8.2.6 How Persuasive‌​‌ Are LLMs in the​​ Wild? Assessing Personalized Ads​​​‌ in Real-World Delivery

Large‌ language models (LLMs) have‌​‌ demonstrated persuasive potential in​​ controlled experiments and survey-based​​​‌ studies across commercial, political,‌ and social domains. However,‌​‌ their effectiveness in real-world​​ communication environments remains largely​​​‌ unexplored. This work addresses‌ this gap by evaluating‌​‌ LLM-generated personalized messages deployed​​ in controlled advertising experiments​​​‌ on Meta platforms. We‌ assess effectiveness along three‌​‌ complementary dimensions: (1) behavioral​​ user engagement measured through​​​‌ field experiments, (2) perceived‌ appeal captured via user‌​‌ surveys, and (3) platform-level​​ dynamics analyzed through algorithmic​​​‌ ad delivery patterns. Our‌ results show that LLM-based‌​‌ personalized messages do not​​ significantly improve user engagement​​​‌ compared to non-personalized messages.‌ We also show that‌​‌ user perceptions—measured through surveys—can​​ diverge significantly from observed​​​‌ behavioral outcomes online. This‌ highlights the limitations of‌​‌ relying on survey-based evaluations​​ alone to assess the​​​‌ persuasive capabilities of LLMs.‌ Finally, we show that‌​‌ LLM-generated personalization can influence​​ platform ad delivery—shifting impressions​​​‌ toward the intended audience‌ by up to 8%‌​‌ even without explicit targeting​​ instructions. These effects are​​​‌ often constrained by the‌ platform's relevance predictions, which‌​‌ may override the cues​​ embedded in the message.​​​‌ Together, these findings provide‌ a comprehensive real-world audit‌​‌ for the effectiveness and​​ limits of LLM-based persuasion​​​‌ in the wild. It‌ has been accepted for‌​‌ publication in AAAI ICWSM​​ 2026.

8.3 Bias and​​​‌ issues in LLMs and‌ Benchmarks

Participants: Oana-Denisa Balalau‌​‌, Tom Calamai,​​ Chadi Helwe.

8.3.1​​​‌ Navigating the Political Compass:‌ Evaluating Multilingual LLMs across‌​‌ Languages and Nationalities

Large Language Models (LLMs) have become ubiquitous in today's technological landscape, boasting a plethora of applications, and even endangering human jobs in complex and creative fields. One such field is journalism: LLMs are being used for summarization, generation and even fact-checking. However, in today's political landscape, LLMs could accentuate tensions if they exhibit political bias. In this work, we evaluate the political bias of 15 of the most widely used multilingual LLMs via the Political Compass Test. We test different scenarios, in which we vary the language of the prompt while also assigning a nationality to the model. We evaluate models on the 50 most populous countries and their official languages. Our results indicate that language has a strong influence on the political ideology displayed by a model. In addition, smaller models tend to display a more stable political ideology, i.e., an ideology that is less affected by variations in the prompt. The work has been published in ACL (Findings) 2025 21 and the tool is available on BIL 7.1.12.

8.3.2 Benchmarking​‌ the Benchmarks: Reproducing Climate-Related​​ NLP Tasks

Significant efforts have been made in the NLP community to facilitate the automatic analysis of climate-related corpora through tasks such as climate-related topic detection, climate risk classification, question answering over climate topics, and many more. In this work, we perform a reproducibility study on 8 tasks and 29 datasets, testing 6 models. We find that many tasks rely heavily on surface-level keyword patterns rather than deeper semantic or contextual understanding. Moreover, we find that 96% of the datasets contain annotation issues, with 16.6% of the sampled wrong predictions of a zero-shot classifier actually being clear annotation mistakes, and 38.8% being ambiguous examples. These results call into question the reliability of current benchmarks to meaningfully compare models and highlight the need for improved annotation practices. We conclude by outlining actionable recommendations to enhance dataset quality and evaluation robustness. The work has been published in ACL (Findings) 2025 18 and the tool is available on BIL 7.1.11.

8.4 Efficient Big​​ Data analytics

8.4.1 Graph​​​‌ Transformers for Query Plan​ Representation: Potentials and Challenges​‌

Participants: Yanlei Diao,​​ Guillaume Lachaud, Gabriel​​​‌ Lozano Pinzon, Chenghao​ Lyu.

Query Plan​‌ Representation (QPR) is central​​ to workload modeling, with​​​‌ various deep-learning based architectures​ proposed in the literature.​‌ Our work is motivated​​ by two key observations:​​​‌ (i) the research community​ still lacks clarity on​‌ which model, if any,​​ best suits the QPR​​​‌ problem; and (ii) while​ transformers have revolutionized many​‌ fields, their potential for​​ QPR remains largely underexplored.​​​‌ This study examines the​ strengths and challenges of​‌ Graph Transformers for QPR.​​ We introduce a new​​​‌ taxonomy that unifies deep-learning​ based QPR techniques along​‌ key design axes. Our​​ benchmark analysis of common​​​‌ QPR architectures reveals that​ Graph Transformer Networks (GTNs)​‌ consistently outperform alternatives, but​​ can degrade under limited​​​‌ training data. To address​ this, we propose novel​‌ data augmentation techniques to​​ enhance training diversity and​​​‌ refine GTN architectures by​ replacing ineffective language-model-inspired components​‌ with techniques better suited​​ for query plans. Evaluation​​​‌ on JOB, TPC-H, and​ TPC-DS benchmarks shows that​‌ with sufficient training data,​​ enhanced GTNs outperform existing​​​‌ models for capturing complex​ queries (JOB Full and​‌ TPC-DS) and enable the​​ query embedder trained on​​​‌ TPC-DS to generalize to​ TPC-H queries out of​‌ the box. The work​​ has been accepted in​​​‌ VLDB 2026.

8.4.2 Unsupervised​ Anomaly Detection in Multivariate​‌ Time Series across Heterogeneous​​ Domains

Participants: Yanlei Diao​​​‌, Vincent Jacob.​

The widespread adoption of​‌ digital services, along with​​ the scale and complexity​​​‌ at which they operate,​ has made incidents in​‌ IT operations increasingly more​​ likely, diverse, and impactful.​​​‌ This has led to​ the rapid development of​‌ a central aspect of​​ "Artificial Intelligence for IT​​​‌ Operations" (AIOps), focusing on​ detecting anomalies in vast​‌ amounts of multivariate time​​ series data generated by​​ service entities. In this​​​‌ paper, we begin by‌ introducing a unifying framework‌​‌ for benchmarking unsupervised anomaly​​ detection (AD) methods, and​​​‌ highlight the problem of‌ shifts in normal behaviors‌​‌ that can occur in​​ practical AIOps scenarios. To​​​‌ tackle anomaly detection under‌ domain shift, we then‌​‌ cast the problem in​​ the framework of domain​​​‌ generalization and propose a‌ novel approach, Domain-Invariant VAE‌​‌ for Anomaly Detection (DIVAD),​​ to learn domain-invariant representations​​​‌ for unsupervised anomaly detection.‌ Our evaluation results using‌​‌ the Exathlon benchmark show​​ that the two main​​​‌ DIVAD variants significantly outperform‌ the best unsupervised AD‌​‌ method in maximum performance,​​ with 20% and 15%​​​‌ improvements in maximum peak‌ F1-scores, respectively. Evaluation using‌​‌ the Application Server Dataset​​ further demonstrates the broader​​​‌ applicability of our domain‌ generalization methods. The work‌​‌ has been published in​​ VLDB 2025 22.​​​‌

8.4.3 Transactional Stateful Functions‌ on Streaming Dataflows

Participants:‌​‌ Georgios Siachamis.

Developing​​ stateful cloud applications, such​​​‌ as low-latency workflows and‌ microservices with strict consistency‌​‌ requirements, remains arduous for​​ programmers. The Stateful Functions-as-a-Service​​​‌ (SFaaS) paradigm aims to‌ serve these use cases.‌​‌ However, existing approaches provide​​ weak transactional guarantees or​​​‌ perform expensive external state‌ accesses requiring inefficient transactional‌​‌ protocols that increase execution​​ latency. In this paper,​​​‌ we present Styx, a‌ novel dataflow-based SFaaS runtime‌​‌ that executes serializable transactions​​ consisting of stateful functions​​​‌ that form arbitrary call-graphs‌ with exactly-once guarantees. Styx‌​‌ extends a deterministic transactional​​ protocol by contributing: i)​​​‌ a function acknowledgment scheme‌ to determine transaction boundaries‌​‌ required in SFaaS workloads,​​ ii) a function-execution caching​​​‌ mechanism, and iii) an‌ early commit-reply mechanism that‌​‌ substantially reduces transaction execution​​ latency. Experiments with the​​​‌ YCSB, TPC-C, and Deathstar‌ benchmarks show that Styx‌​‌ outperforms state-of-the-art approaches by​​ achieving at least one​​​‌ order of magnitude higher‌ throughput while exhibiting near-linear‌​‌ scalability and low latency.​​ This work has been​​​‌ published in SIGMOD 2025‌ 24 and demonstrated in‌​‌ VLDB 2025 25.​​

8.4.4 Dynamic Graph Databases​​​‌ with Out-of-order Updates

Participants:‌ Muhammad Khan, Ioana‌​‌ Manolescu.

Dynamic graphs are omnipresent in real-time applications that generate massive amounts of data. We consider dynamic graphs in which edges are continuously added to and deleted from a single graph, through multiple update streams. The dynamic graphs are stored in a transactional graph database. Each update (edge addition or deletion) carries a source (stream) time ST, assigned at the moment when it was emitted, and an arrival (or transaction) time WT, assigned when the graph database receives it. Updates may be received at the database out-of-order (ooo, in short), due to different latencies on the propagation paths between the data sources and the database. We proposed HAL, a novel in-memory dynamic graph database design addressing these challenges. HAL outperforms comparable systems by a factor of up to 73× in terms of update processing throughput and up to 357× for analytics, while being the first to support out-of-order updates. We have also extended it with support for node and edge properties, and for historical queries, in which queries are evaluated over the graph as it was at a specific moment in the past. This work has been accepted in VLDB 2025 12, VLDB 2025 Large-Scale Graph Data Analytics (LSGDA) workshop 16 and demonstrated in SIGMOD 2025 17. The code is available on Gitlab (code).
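
To make the out-of-order setting concrete, the following Python sketch keeps, for each edge, its updates sorted by source time, regardless of the order in which they arrive. It is a toy model under stated assumptions and does not reflect HAL's actual design: the class name `VersionedEdges`, its methods, and the tie-breaking rule (an addition wins over a deletion carrying the same source time) are all hypothetical.

```python
import bisect
from collections import defaultdict


class VersionedEdges:
    """Per-edge update history, ordered by source time (not arrival order)."""

    def __init__(self):
        # (u, v) -> sorted list of (source_time, is_add)
        self.history = defaultdict(list)

    def apply(self, source_time, op, u, v):
        """Record an update whenever it arrives; `op` is 'add' or 'delete'."""
        bisect.insort(self.history[(u, v)], (source_time, op == "add"))

    def exists(self, u, v, at_time):
        """Historical query: was edge (u, v) present at `at_time`?"""
        versions = self.history.get((u, v), [])
        # Index just past the last update with source_time <= at_time.
        idx = bisect.bisect_right(versions, (at_time, True))
        return idx > 0 and versions[idx - 1][1]

    def neighbors(self, u, at_time):
        return sorted(v for (a, v) in self.history
                      if a == u and self.exists(a, v, at_time))


g = VersionedEdges()
# The deletion (source time 5) arrives BEFORE the addition (source time 2).
g.apply(5, "delete", "a", "b")
g.apply(2, "add", "a", "b")
```

Because the history is ordered by source time, the late-arriving addition is placed before the deletion: the edge is absent at time 1, present at time 3, and absent again at time 6.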

Participants: Ioana​​ Manolescu, Oana Balalau​​​‌, Yanlei Diao,​ Ghufran Khan, Maxime​‌ Buron, Hritika Kathuria​​, Georgios Siachamis.​​​‌

9 Bilateral contracts and​ grants with industry

9.1​‌ Bilateral contracts with industry​​

The collaborative contract with​​​‌ RadioFrance in which Oana-Denisa​ Balalau and Ioana Manolescu​‌ Goujot participate has ended.​​ We have successfully transferred​​​‌ the StatCheck software to​ our RadioFrance partner.

The​‌ collaborative contract with Amundi​​ led by Oana-Denisa Balalau​​​‌ for the CIFRE project​ has ended, the PhD​‌ student will defend his​​ PhD in 2026.

9.2 Bilateral grants with industry

Ioana Manolescu Goujot is involved in the BPI-funded project CodeCommons, in collaboration with the Software Heritage Foundation (SWF). We work to generalize, enlarge, and enable the efficient processing of the world's largest repository of free software. The final part of Muhammad Khan's PhD contributed to the project.

Ioana Manolescu Goujot, Georgios Siachamis and Hritika Kathuria have been involved in the BPI-funded project DXP (Data Exchange Project), with Amadeus, the international tourism services operator. We participate in this project in collaboration with Maxime Buron, a former team member who is now an Assistant Professor at UCA. Our contribution is to devise an architecture for decentralized, access-controlled data sharing, allowing tourism service providers and clients to exchange their information via Amadeus' platform.

Participants: Ioana Manolescu Goujot​​​‌, Oana-Denisa Balalau,​ Oana Goga, Madhulika​‌ Mohanty, Garima Gaur​​, Yanlei Diao,​​​‌ Muhammad Khan, Maxime​ Buron, Hritika Kathuria​‌, Georgios Siachamis.​​

10 Partnerships and cooperations​​​‌

10.1 International initiatives

10.1.1​ Associate Teams in the​‌ framework of an Inria​​ International Lab or in​​​‌ the framework of an​ Inria International Program

MediumAI​‌
  • Title:
    Responsible AI for​​ Journalism
  • Duration:
    2024 -​​​‌ 2026
  • Coordinator:
    Davide Ceolin​ (Davide.Ceolin@cwi.nl)
  • Partners:
    • CWI Amsterdam (Netherlands)
  • Inria contact:
    Oana-Denisa​​ Balalau
  • Summary:
    From recommender systems to large language models, data-driven AI tools have shown different forms of limitations and bias. Bias in AI tools may stem from multiple factors, including bias in the input data the AI tools are trained on, the algorithm and the individuals responsible for designing the AI tools, and bias in the evaluation and interpretation of AI tool outputs. Limitations are due to technical difficulties in achieving specific tasks. Media outlets use different algorithmic aids in their workflow: keyword extraction, entity and relation extraction, event extraction, sentiment analysis, automatic summarization, newsworthy story detection, semi-automatic production of news using text generation models, and search, among others. Given the importance of the media sector for our democracies, shortcomings in the tools they use could have severe consequences. Both Inria and CWI have partnerships with large media groups and can help them address bias and limitations in their AI workflows.

10.2​​​‌ International research visitors

10.2.1‌ Visits of international scientists‌​‌

Other international visits to​​ the team
Benjamin Ocampo​​​‌
  • Status:
    PhD
  • Institution of‌ origin:
    Human-Centered Data Analytics‌​‌ team, University of Amsterdam​​
  • Country:
    Netherlands
  • Dates:
    October​​​‌ 13-17, 2025
  • Context of‌ the visit:
    Associated team‌​‌ MediumAI
  • Mobility program/type of​​ mobility:
    research stay
Davide​​​‌ Ceolin
  • Status:
    researcher
  • Institution‌ of origin:
    Human-Centered Data‌​‌ Analytics team, CWI
  • Country:​​
    Netherlands
  • Dates:
    October 16-17,​​​‌ 2025
  • Context of the‌ visit:
    Associated team MediumAI‌​‌
  • Mobility program/type of mobility:​​
    research stay
Mae Sosto​​​‌
  • Status:
    post-doc
  • Institution of‌ origin:
    Human-Centered Data Analytics‌​‌ team, CWI
  • Country:
    Netherlands​​
  • Dates:
    November 27-December 03,​​​‌ 2025
  • Context of the‌ visit:
    Associated team MediumAI‌​‌
  • Mobility program/type of mobility:​​
    research stay

10.2.2 Visits​​​‌ to international teams

Research‌ stays abroad
Tom Calamai
  • Visited‌​‌ institution:
    CWI, Amsterdam
  • Country:​​
    Netherlands
  • Context of the​​​‌ visit:
    Associated team MediumAI‌
  • Mobility program/type of mobility:‌​‌
    research stay

10.3 European​​ initiatives

10.3.1 Horizon Europe​​​‌

ELIAS

ELIAS project on‌ cordis.europa.eu

  • Title:
    European Lighthouse‌​‌ of AI for Sustainability​​
  • Duration:
    From September 1,​​​‌ 2023 to August 31,‌ 2027
  • Partners:
    • ECOLE POLYTECHNIQUE‌​‌ (EP), France
    • INSTITUT NATIONAL​​ DE RECHERCHE EN INFORMATIQUE​​​‌ ET AUTOMATIQUE (INRIA), France‌
    • ROBERT BOSCH KFT, Hungary‌​‌
    • BITDEFENDER SRL (Bitdefender), Romania​​
    • ETHNIKO KENTRO EREVNAS KAI​​​‌ TECHNOLOGIKIS ANAPTYXIS (CENTRE FOR‌ RESEARCH AND TECHNOLOGY HELLAS‌​‌ CERTH), Greece
    • THE UNIVERSITY​​ OF MANCHESTER (UNIVERSITY OF​​​‌ MANCHESTER), United Kingdom
    • ROBERT‌ BOSCH GMBH (BOSCH), Germany‌​‌
    • INSTITUT JOZEF STEFAN (JSI),​​ Slovenia
    • INSTITUT POLYTECHNIQUE DE​​​‌ PARIS, France
    • UNIVERSITAT DE‌ VALENCIA (UVEG), Spain
    • PROMETEIA‌​‌ SOCIETA PER AZIONI (Prometeia),​​ Italy
    • IBM IRELAND LIMITED,​​​‌ Ireland
    • KOBENHAVNS UNIVERSITET (UCPH),‌ Denmark
    • AALTO KORKEAKOULUSAATIO SR‌​‌ (AALTO), Finland
    • IDEAS NCBR​​ SP Z O.O., Poland​​​‌
    • UMEA UNIVERSITET, Sweden
    • INSTITUT‌ MINES-TELECOM, France
    • FONDAZIONE ISTITUTO‌​‌ ITALIANO DI TECNOLOGIA (IIT),​​ Italy
    • FONDATION DE L'INSTITUT​​​‌ DE RECHERCHE IDIAP (IDIAP),‌ Switzerland
    • UNIVERSITATEA NATIONALA DE STIINTA SI TEHNOLOGIE POLITEHNICA BUCURESTI (NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY POLITEHNICA BUCHAREST), Romania
    • EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH‌​‌ (ETH Zürich), Switzerland
    • CESKE​​ VYSOKE UCENI TECHNICKE V​​​‌ PRAZE (CVUT), Czechia
    • FUNDACION‌ DE LA COMUNITAT VALENCIANA‌​‌ UNIDAD ELLIS ALICANTE, Spain​​
    • FONDAZIONE BRUNO KESSLER (FBK),​​​‌ Italy
    • POLITECNICO DI MILANO‌ (POLIMI), Italy
    • LA COMMUNAUTE‌​‌ D UNIVERSITES ET ETABLISSEMENTS​​ DE TOULOUSE (LA COMMUNAUTE​​​‌ D UNIVERSITES ET ETABLISSEMENTS‌ DE TOULOUSE), France
    • UNIVERSITA‌​‌ DEGLI STUDI DI TRENTO​​ (UNITN), Italy
    • UNIVERSITA DEGLI​​​‌ STUDI DI MILANO (UMIL),‌ Italy
    • HASSO-PLATTNER-INSTITUT FUR DIGITAL‌​‌ ENGINEERING GGMBH (HPI), Germany​​
    • ENGINEERING - INGEGNERIA INFORMATICA​​​‌ SPA (ENG), Italy
    • EBERHARD‌ KARLS UNIVERSITAET TUEBINGEN (UT),‌​‌ Germany
    • UNIVERSITA DEGLI STUDI​​ DI GENOVA (UNIGE), Italy​​​‌
    • MAX-PLANCK-GESELLSCHAFT ZUR FORDERUNG DER‌ WISSENSCHAFTEN EV (MPG), Germany‌​‌
    • UNIVERSITA DEGLI STUDI DI​​ MODENA E REGGIO EMILIA​​​‌ (UNIMORE), Italy
    • UNIVERSITEIT VAN‌ AMSTERDAM (UvA), Netherlands
  • Inria‌​‌ contact:
    Ioana Manolescu
  • Coordinator:​​
  • Summary:

    We live in a crucial historical moment, with tremendous challenges ahead, from climate change to the energy crisis. ELIAS emerges from the belief that AI will be a key discipline to help us tackle these challenges. At the same time, the development of AI entails deep ethical and societal concerns that need to be addressed. As for fundamental research, ELIAS will address key scientific questions about how AI can reduce computational costs, how it serves to model the effects of policy decisions on society, and how it impacts individuals. ELIAS will strive for a deep integration of the fundamental research that takes place in academia and the more applications-focused research from industry.

    ELIAS builds on and expands the highly successful and internationally recognized European Laboratory for Learning and Intelligent Systems (ELLIS). ELIAS will further develop the excellence criteria and the pillars in ELLIS and implement actions that will support AI researchers and young talents at different stages of their careers. Furthermore, ELIAS will develop a Sciencentrepreneurship track, with the purpose of attracting and empowering talents at the interface of scientific innovation and business and establish original AI solutions that move towards a sustainable long-term future for our planet, contribute to a cohesive society, and respect individual rights.

    The​​ outcome of ELIAS will​​​‌ be to establish Europe​ as a leader in​‌ AI research in which​​ impact on the environment,​​​‌ society and the individual​ are integral considerations during​‌ development. We will measure​​ the success of this​​​‌ endeavor in terms of​ key indicators, including the​‌ number of new cross-institutional​​ collaborations, the number of​​​‌ cross-disciplinary collaborations, the number​ of industry-academic partnerships, publications​‌ in top conferences and​​ journals, patents, and the​​​‌ number of projects that​ have resulted in deployed​‌ technologies.

10.3.2 H2020 projects​​

Ioana Manolescu Goujot is​​​‌ the local PI for​ the Inria partner in​‌ the project "ELIAS -​​ European Lighthouse of AI​​​‌ for Sustainability" (2,800,000 euros).​ Madhulika Mohanty and Garima​‌ Gaur have also been​​ strongly involved.

Yanlei Diao has been awarded an ERC Proof of Concept grant for ExplainableAD: Explainable Anomaly Detection for Safeguarding and Enhancing Modern Data Industry.

10.4 National initiatives

10.4.1​​ ANR

  • Oana Goga is the local PI for the LIX partner in the ANR PRC 2022 - 2026 project “FeedingBias: A multi-platform mixed-methods approach to news exposure on social media” (our part: 128,000 euros).
  • Oana Goga is the local PI for the LIX partner in the ANR PRCE 2021 - 2025 project “PROPEOS: Privacy-oriented Personalization of Online Services” (our part: 202,720 euros).
  • The project "TopOL (Top of the Lake): discovery and exploitation of heterogeneous data lakes through graph models", coordinated by Ioana Manolescu Goujot, has been funded by the ANR. The project is a collaboration with U. Paris Saclay, U. Paris Dauphine, U. Blois and U. Tours; the International Consortium of Investigative Journalists (ICIJ) is a non-funded partner. Madhulika Mohanty also participates and is a Work Package co-leader.

10.5​‌ Regional initiatives

Ioana Manolescu​​ Goujot has been awarded​​​‌ a Fellowship of the​ Hi!Paris AI Cluster "PREDIAL:​‌ AI Data Dialogs for​​ the Press".

Yanlei Diao​​​‌ has been awarded an​ AAP Premat IP Paris​‌ 2025.

11 Dissemination

11.1​​ Promoting scientific activities

Chair​​​‌ of conference program committees​

Ioana Manolescu Goujot was​‌ the Demonstration chair at​​ EDBT 2025.

Madhulika Mohanty was the demonstration chair at the French database conference, BDA 2025.

Member of the​​ conference program committees

The​​​‌ team members have been‌ part of the following‌​‌ program committees:

  • Ioana Manolescu​​ Goujot : ACL Rolling​​​‌ Review 2025, IEEE ICDE‌ 2025, ACM PACMMOD (formerly‌​‌ SIGMOD) 2025, BDA 2025​​
  • Oana-Denisa Balalau : ACL​​​‌ Rolling Review February 2025‌
  • Madhulika Mohanty : VLDB‌​‌ 2025, ICDE 2025, EDBT​​ 2025 (Demo), ICDE 2025​​​‌ (Demo), VLDB 2025 (Demo),‌ CODS 2025, CMLS Workshop‌​‌ in ER 2025
  • Garima​​ Gaur : CIKM 2025,​​​‌ BDA 2025, CMLS Workshop‌ in ER 2025
  • Georgios‌​‌ Siachamis : ICDE 2025,​​ EDBT 2025 (Demo), DEBS​​​‌ 2025

11.1.1 Journal

Member‌ of the editorial boards‌​‌

Ioana Manolescu Goujot served​​ as an Associate Editor​​​‌ for PVLDB 2025.

Reviewer‌ - reviewing activities

Madhulika‌​‌ Mohanty reviewed for Transactions​​ on Graph Data and​​​‌ Knowledge (TGDK) and Georgios‌ Siachamis reviewed for the‌​‌ VLDB Journal (VLDBJ).

11.1.2​​ Invited talks

Ioana Manolescu Goujot delivered a keynote at the AFIA (French AI Research Association) workshop “Perspectives et Défis de l'IA” on “Désinformation, Démocratie et IA”, June 10, 2025.

Oana-Denisa Balalau delivered​​​‌ a talk at ESSEC‌ in the workshop Comprendre‌​‌ et Changer le Monde​​ (CCM), titled “Improving the​​​‌ quality of public debate‌ with AI”.

Madhulika Mohanty‌​‌ delivered the following talks:​​

  • “Intelligence Artificielle: un outil​​​‌ au service de l'investigation”‌ at VIGINUM in May‌​‌ 2025.
  • “Effective Exploration of​​ Graph-Structured Data” at LHC​​​‌ and IDIA Days 2025‌ in June 2025.

Tom Calamai delivered a workshop on “les applications de l'IA pour l'investissement responsable”, organised by the FIR (Forum pour l'Investissement Responsable).

11.1.3​​​‌ Leadership within the scientific‌ community

Ioana Manolescu Goujot‌​‌ has been the president​​ of the informal French​​​‌ Data Management Association (BDA).‌

11.1.4 Research administration

Ioana Manolescu Goujot represents Inria in the Comité Opérationnel of Hi!Paris, an AI center of excellence comprising IP Paris and HEC. She is also an elected member of IP Paris' Comité Académique and serves on its Scientific Committee.

11.2 Teaching -​​​‌ Supervision - Juries -‌ Educational and pedagogical outreach‌​‌

11.2.1 Teaching

Ioana Manolescu Goujot is a part-time professor (50%) at Ecole Polytechnique. She:

  • taught courses, labs and TDs in CSC_51053_EP (Database Management Systems);
  • is in charge of the M1 Internship program in Artificial Intelligence and Data Science (CSC_52992_EP);
  • is in charge of the Artificial Intelligence M1 program at Ecole Polytechnique.

Madhulika Mohanty​​​‌ has a 25% Chargée‌ d'Enseignement contract at Ecole‌​‌ Polytechnique for 10 months.​​ She taught:

  • Labs and​​​‌ TDs in CSC_51053_EP (Database‌ Management Systems)
  • Labs and‌​‌ TDs in CSC_52083_EP (Systems​​ for Big Data)
  • She​​​‌ also taught 3h of‌ CM and 3h of‌​‌ TP for ECE_5DA04_TP (Big​​ Graph Data Management) at​​​‌ Télécom for DATAAI Masters.‌

Oana-Denisa Balalau is a part-time (33%) assistant professor at Ecole Polytechnique, where she teaches “Mining, learning and reasoning on Web Graphs” (L3).

Przemyslaw Dominikowski​​ carries out a complementary​​​‌ teaching assignment (64h) at‌ Ecole Polytechnique. He teaches‌​‌ the labs in CSC_2F001_EP​​​‌ (Object Oriented Programming in​ C++).

Garima Gaur carried out the following teaching duties:

  • Course, Labs and TDs​​​‌ in CSC_52640_EP (Database Management​ Systems) offered by DMAP,​‌ Ecole Polytechnique
  • Labs and​​ TDs in CSC_51053_EP (Database​​​‌ Management Systems)
  • 3h of​ CM and 3h of​‌ TP for ECE_5D04_TP (Big​​ Graph Data Management) at​​​‌ Télécom for DATAAI Masters.​

Hritika Kathuria carries out​‌ a complementary teaching assignment​​ (64h) at Ecole Polytechnique​​​‌ and teaches 2 Labs​ in CSE_102.

Tom Calamai​‌ has a 30h teaching​​ assistant (Vacataire) contract at​​​‌ Télécom Paris and Ecole​ Polytechnique. He teaches:

  • INF473G​‌
  • Machine Learning for Text​​ Mining
  • Machine learning avancé​​​‌
  • Database
  • Language Modeling

Georgios​ Siachamis carried out 3h​‌ of CM and 3h​​ of TP for ECE_5D04_TP​​​‌ (Big Graph Data Management)​ at Télécom for DATAAI​‌ Masters.

Yanlei Diao holds a part-time (50%) full Professor position at Ecole Polytechnique. She teaches Systems for Big Data (CSC_52083_EP), M1, Ecole Polytechnique.

Guillaume​ Lachaud has a 58h​‌ teaching assistant position at​​ Ecole Polytechnique. He teaches:​​​‌

  • CSC_52087_EP - Advanced Deep Learning
  • CSC_41011_EP - Les bases​‌ de la programmation et​​ de l'algorithmique
  • CSC_43M02_EP (for​​​‌ one day) - Modal​ d'informatique - Exploration et​‌ apprentissage sur les graphes​​ du Web

11.2.2 Supervision​​​‌

The team supervised the​ following PhDs:

  1. Przemysław Dominikowski,​‌ Sep 2025 - Dec​​ 2025, advised by Ioana​​​‌ Manolescu Goujot and Madhulika​ Mohanty
  2. Kun Zhang, Jan​‌ 2025-April 2025, advised by​​ Ioana Manolescu Goujot and​​​‌ Oana-Denisa Balalau
  3. Tom Calamai,​ Jan 2025-Dec 2025, advised​‌ by Fabian Suchanek and​​ Oana-Denisa Balalau
  4. Hritika Kathuria,​​​‌ Jan 2025-Dec 2025, advised​ by Ioana Manolescu Goujot​‌ and Maxime Buron
  5. Ines​​ Abdelaziz, Dec 2025, advised​​​‌ by Oana Goga
  6. Nardjes​ Amieur, Jan 2025-Dec 2025,​‌ advised by Oana Goga​​
  7. Abir Benzaamia, Jan 2025-Dec​​​‌ 2025, advised by Oana​ Goga
  8. Asmaa El Fraihi,​‌ Jan 2025-Dec 2025, advised​​ by Oana Goga
  9. Gabriel​​​‌ Ben Zenou, Jan 2025-Dec​ 2025, advised by Oana​‌ Goga
  10. Gabriel Lozano, Sept​​ 2025-Dec 2025, advised by​​​‌ Yanlei Diao and Guillaume​ Lachaud
  11. Nazim Mezhoudi, Jan​‌ 2025-Dec 2025, advised by​​ Yanlei Diao and Mariam​​​‌ Barry (BNP Paribas)

The​ team supervised the following​‌ postdocs:

  1. Chadi Helwe, Jan​​ 2025-March 2025, advised by​​​‌ Oana-Denisa Balalau and Davide​ Ceolin
  2. Guillaume Lachaud, Jan​‌ 2025-Dec 2025, advised by​​ Yanlei Diao

The team​​​‌ supervised the following engineers:​

  1. Simon Ebel and Théo​‌ Galizzi (January to June​​ 2025), Aurélien Peden (March​​​‌ to August 2025): Oana-Denisa​ Balalau and Ioana Manolescu​‌ Goujot supervised them on​​ their collaboration project with​​​‌ RadioFrance.
  2. Georgios Siachamis: supervised by Ioana Manolescu Goujot and Madhulika Mohanty on efficient and expressive graph data management.
  3. Ines Abdelaziz (January to November 2025): supervised by Oana Goga.

The team supervised​​​‌ the following interns:

  1. Pablo Bertaud-Velten, M1 IP Paris, advised by Ioana Manolescu Goujot, Madhulika Mohanty, Garima Gaur and Georgios Siachamis
  2. Przemyslaw Dominikowski, M2 UP Saclay, advised by Ioana Manolescu Goujot, Madhulika Mohanty, Garima Gaur and Georgios Siachamis
  3. Nikola Dobriçic, X Bachelor 3rd year, advised by Ioana Manolescu Goujot, Madhulika Mohanty and Georgios Siachamis
  4. Joanne Jegou, X Bachelor 3rd year, co-advised by Ioana Manolescu Goujot and Michael Thy (APHP)
  5. Paul Kronlund-Drouault, ENS Lyon Bachelor 2nd year, advised by Ioana Manolescu Goujot
  6. Maria Mellado, M2 University of Chile, advised by Ioana Manolescu Goujot, Madhulika Mohanty and Garima Gaur
  7. Saba Shashsavari, M1 IP Paris, advised by Ioana Manolescu Goujot, Madhulika Mohanty, Garima Gaur and Georgios Siachamis
  8. Vlada Voronina, M1, advised by Oana-Denisa Balalau and Marine Le Morvan
  9. Rémi Guillou, X Bachelor 3rd year, advised by Yanlei Diao
  10. Yanis Zaamoun, X Bachelor 3rd year, advised by Yanlei Diao

The team supervised‌​‌ the following part-time projects:​​

  1. PSC "Analyse du discours médiatique autour du changement climatique", advised by Oana-Denisa Balalau and Etienne Ollion
  2. Léo Nivelle (X3A): "Automatic verbalisation of statistics", advised by Ioana Manolescu Goujot
  3. Yiheng Chen, Antoine Delacour and Elliot Thorel (X3A): "Natural language querying of large heterogeneous datasets", advised by Ioana Manolescu Goujot, Madhulika Mohanty, Garima Gaur and Georgios Siachamis
  4. Cédric Trinh and Tom Léon (X3A): "Building a Knowledge Graph for Fact-checks", advised by Madhulika Mohanty and Garima Gaur
  5. Moritz Sommer (X and RWTH Exchange Program): "Identification of Core Properties for Semantic Concepts in Universal Datasets", advised by Ioana Manolescu Goujot, Madhulika Mohanty and Garima Gaur
  6. Maximilien Rambaud, Nicolas Gromitsaris and Anthony Chassagne (X3A): "Anomaly detection and explanation in dynamic graphs, with applications in finance", advised by Yanlei Diao and Guillaume Lachaud
  7. Gabriel Cheval and Armand Vabre (X3A): "Detecting data drift in graphs for model retraining", advised by Yanlei Diao and Guillaume Lachaud
  8. Loric Roger, Joseph de Roffignac and Sylvain Dehayem (X3A): "Anomaly detection in dynamic graphs", advised by Yanlei Diao and Guillaume Lachaud
  9. Berthé Zié and Goly Kodia (X3A): "Explainable dynamic graph neural networks for anomaly detection", advised by Yanlei Diao and Guillaume Lachaud

11.2.3 Juries

Oana-Denisa Balalau has served as a:

  • member of the recruitment committee for an assistant professor position at Télécom Paris
  • member of the PhD defense committees of Jonathan Colin (Université Paris Saclay) and William Soto (Université de Lorraine)

Ioana Manolescu Goujot has served on the following juries:

  • Member of a Professor hiring committee at Université de Paris Dauphine (June 2025)
  • Reviewer of the PhD thesis of Yifan Wang, Université de Lille, defended in November 2025

11.3 Popularization‌

11.3.1 Specific official responsibilities‌​‌ in science outreach structures​​

Oana-Denisa Balalau is a member of Inria Saclay's Scientific Commission. She also led the foresight seminar on LLMs & Science at the "Data and Knowledge" Inria seminar in March 2025.

Ioana Manolescu Goujot is an elected member of Inria's Comité d'Evaluation.

11.3.2‌ Participation in Live events‌​‌

Ioana Manolescu Goujot made several appearances in national media:

  1. Participated in the ARTE "28 minutes" show on the impact of AI on society, December 24, 2025.
  2. Interviewed by Michaël‌ Szadkowsky (Le Monde) for‌​‌ the article "2025, l'année​​ où la vidéo par​​​‌ IA a envahi les‌ réseaux sociaux", December 22,‌​‌ 2025.
  3. Interviewed by Désirée​​ de Lamarzelle (Forbes Magazine)​​​‌ for the article "Future‌ of work: is AI‌​‌ a friend or a​​​‌ foe?", November 13, 2025.​
  4. Interviewed by Mélinée Le Priol (La Croix) for the article "Faut-il avoir peur de la 'superintelligence artificielle'?", October 30, 2025.
  5. Interviewed by Marina Alcaraz (Les Echos) on the frequency of fake news in chatbot responses, September 2025.
  6. Interviewed by Marina Alcaraz (Les Echos) on disinformation sometimes present in Mistral outputs, July 2025.
  7. Interviewed by Alexandre Capron (TF1) on fake AI videos, June 6, 2025.
  8. Guest on the radio show "Je pense donc j'agis", in the episode "Où vont nos données et comment les protéger?", hosted by Melchior Gormand, on RCF, April 3, 2025.
  9. Took part in a press conference organized as part of a "Stand Up for Science" day on April 3, 2025 (AEF dispatch, video recording).
  10. Member of a panel​​ about ethical and regulatory​​​‌ bounds on research in​ "Journée Sciences et Médias"​‌ (French Association of Science​​ Journalists), February 10, 2025.​​​‌
  11. Interviewed by Chloé Woitier​ for the article "C'est​‌ une nouvelle pollution numérique​​ : le Slop, ce​​​‌ raz-de-marée de contenus IA​ qui menace internet", Le​‌ Figaro, February 2, 2025.​​
  12. Authored an invited opinion​​​‌ piece in l'Humanité "Les​ réseaux sociaux nuisent-ils à​‌ la démocratie?" on January​​ 27, 2025.

11.3.3 Other relevant science outreach activities

Przemyslaw Dominikowski conducted an outreach session (1.5h) for high school students (stage de seconde), presenting the CEDAR team's research, in particular data lake indexing.

Ioana​​ Manolescu Goujot gave a​​​‌ presentation for CPES (1st​ year higher education) students​‌ at Lycée International de​​ Palaiseau Paris-Saclay.

12 Scientific​​​‌ production

12.1 Major publications​

  • [1] Rana Alotaibi, Damian Bursztyn, Alin Deutsch, Ioana Manolescu and Stamatis Zampetakis. Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue. SIGMOD 2019 - ACM SIGMOD International Conference on Management of Data, Amsterdam, Netherlands, June 2019.
  • [2] Oana Balalau, Simon Ebel, Théo Galizzi, Ioana Manolescu, Quentin Massonnat, Antoine Deiana, Emilie Gautreau, Antoine Krempf, Thomas Pontillon, Gérald Roux and Joanna Yakin. Fact-checking Multidimensional Statistic Claims in French. TTO 2022 - Truth and Trust Online, Boston [Hybrid Event], United States, October 2022.
  • [3] Oana Balalau and Roxana Horincar. From the Stage to the Audience: Propaganda on Reddit. EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Online, France, April 2021.
  • [4] Maxime Buron, François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier. Reformulation-based query answering for RDF graphs with RDFS ontologies. ESWC 2019 - European Semantic Web Conference, Portoroz, Slovenia, March 2019.
  • [5] Damian Bursztyn, François Goasdoué and Ioana Manolescu. Teaching an RDBMS about ontological constraints. Very Large Data Bases, New Delhi, India, September 2016.
  • [6] Sylvie Cazalens, Philippe Lamarre, Julien Leblay, Ioana Manolescu and Xavier Tannier. A Content Management Perspective on Fact-Checking. The Web Conference 2018 - alternate paper tracks "Journalism, Misinformation and Fact Checking", Lyon, France, April 2018, 565-574.
  • [7] Sejla Cebiric, François Goasdoué, Haridimos Kondylakis, Dimitris Kotzinos, Ioana Manolescu, Georgia Troullinou and Mussab Zneika. Summarizing Semantic Graphs: A Survey. The VLDB Journal, 2018.
  • [8] Yanlei Diao, Pawel Guzewicz, Ioana Manolescu and Mirjana Mazuran. Spade: A Modular Framework for Analytical Exploration of RDF Graphs. VLDB 2019 - 45th International Conference on Very Large Data Bases, Proceedings of the VLDB Endowment, Vol. 12, No. 12, Los Angeles, United States, August 2019.
  • [9] Enhui Huang, Liping Peng, Luciano Di Palma, Ahmed Abdelkafi, Anna Liu and Yanlei Diao. Optimization for active learning-based interactive database exploration. Proceedings of the VLDB Endowment (PVLDB) 12(1), September 2018, 71-84.
  • [10] Abhishek Roy, Yanlei Diao, Uday Evani, Avinash Abhyankar, Clinton Howarth, Rémi Le Priol and Toby Bloom. Massively Parallel Processing of Whole Genome Sequence Data: An In-Depth Performance Study. SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data, Chicago, Illinois, United States, ACM, May 2017, 187-202.
  • [11] Saumya Yashmohini Sahai, Oana Balalau and Roxana Horincar. Breaking Down the Invisible Wall of Informal Fallacies in Online Discussions. ACL-IJCNLP 2021 - Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, France, August 2021.

12.2 Publications of​​​‌ the year

International journals‌

International peer-reviewed conferences​​

Conferences‌ without proceedings

  • [28] Maxime Buron, Hritika Kathuria, Ioana Manolescu and George Siachamis. RDF Query Answering in the Presence of Access Restrictions. LNCS Series book, CoopIS 2025 - 31st International Conference on Cooperative Information Systems, Marbella, Spain, October 2025.

Reports​​ & preprints

Other scientific publications

Scientific popularization

  • [35] Ioana Manolescu and Patrick Valduriez. De nouvelles architectures pour les Big Data. Le calcul à découvert, CNRS Editions, January 2025.

12.3 Cited publications​‌

  • [36] Angelos Christos Anadiotis, Ioana Manolescu and Madhulika Mohanty. Integrating Connection Search in Graph Queries. ICDE 2023 - 39th IEEE International Conference on Data Engineering, Anaheim (CA), United States, April 2023.