SODA

SODA - 2025

2025 Activity Report, Project-Team SODA

RNSR: 202224249S

Research‌ center Inria Saclay Centre‌
Team name: Computational and‌‌ mathematical methods to understand health and society with‌ data

Creation of the‌ Project-Team: 2022 March 01‌‌

Each year, Inria research‌ teams publish an Activity Report presenting their work‌ and results over the reporting period. These reports‌ follow a common structure, with some optional sections‌ depending on the specific team. They typically begin‌ by outlining the overall objectives and research programme,‌ including the main research themes, goals, and methodological‌ approaches. They also describe the application domains targeted‌ by the team, highlighting the scientific or societal‌ contexts in which their work is situated.

The‌ reports then present the highlights of the year,‌ covering major scientific achievements, software developments, or teaching‌ contributions. When relevant, they include sections on software,‌ platforms, and open data, detailing the tools developed‌ and how they are shared. A substantial part‌ is dedicated to new results, where scientific contributions‌ are described in detail, often with subsections specifying‌ participants and associated keywords.

Finally, the Activity Report‌ addresses funding, contracts, partnerships, and collaborations at various‌ levels, from industrial agreements to international cooperations. It‌ also covers dissemination and teaching activities, such as‌ participation in scientific events, outreach, and supervision. The‌ document concludes with a presentation of scientific production,‌ including major publications and those produced during the‌ year.

Keywords

Computer Science and Digital Science

A3.3.‌ Data and knowledge analysis
A3.4. Machine learning and‌ statistics
A9.1. Knowledge
A9.2. Machine learning

Other Research‌ Topics and Application Domains

B2.3. Epidemiology
B9.1. Education‌
B9.5.6. Data science
B9.6.1. Psychology
B9.6.3. Economy, Finance‌

1 Team members, visitors, external collaborators

Research Scientists‌

Gael Varoquaux [Team leader, INRIA,‌ Senior Researcher, HDR]
Judith Abecassis [‌INRIA, ISFP]
David Holzmuller [INRIA‌, Starting Research Position, from Oct 2025‌]
Myung Kim [INRIA, Starting Research‌ Position]
Marine Le Morvan [INRIA,‌ Researcher]
Jill Jenn Vie [INRIA,‌ Researcher]

Post-Doctoral Fellows

Nicolas Hiebel [INRIA‌, Post-Doctoral Fellow, from Oct 2025]‌
Joel Mba Kouhoue [INRIA, Post-Doctoral Fellow‌, from Sep 2025]
Jingang Qu [‌INRIA, Post-Doctoral Fellow]
Clémence Reda [‌UNIV POTSDAM, Post-Doctoral Fellow, until Aug‌ 2025]

PhD Students

Julie Alberge [INRIA‌]
Gioia Blayer [INRIA, from Nov‌ 2025]
Emma Cussenot [INRIA, from‌ Dec 2025]
Marie Generali-Lince [INRIA]‌
Samuel Girard [INRIA]
Felix Lefebvre [‌INRIA]
Sebastien Melo [INRIA]
Jovan‌ Stojanovic [INRIA]

Technical Staff

Hiba Bederina‌ [INRIA, Engineer, until May 2025‌]
Riccardo Cappuzzo [INRIA, Engineer,‌ from Oct 2025]
Tristan Haugomat [INRIA‌, Engineer]
Eloi Massoulie [INRIA,‌ Engineer, from Dec 2025]

Interns and‌ Apprentices

Anav Agrawal [INRIA, Intern,‌ from May 2025 until Jul 2025]
Guillaume‌ Bertho [AP/HP, Intern, from May‌ 2025 until Nov 2025]
Emma Cussenot [‌INRIA, Intern, from May 2025 until‌ Oct 2025]
Dan Suissa [INRIA, Intern, from Nov‌ 2025]
Vlada Voronina‌ [INRIA, Intern‌‌, from May 2025 until Aug 2025]‌

Administrative Assistant

Ekaterina George‌ [INRIA]

Visiting‌‌ Scientist

Tomas Rigaux [UNIV KYOTO, from‌ Aug 2025 until Sep‌ 2025]

External Collaborators‌‌

Gaetan Brison [IP PARIS]
Lihu Chen‌ [IMPERIAL COLLEGE LDN‌]
Leo Dautun [‌‌AP/HP, from Apr 2025]
Lea Hoisnard‌ [AP/HP, until‌ Oct 2025]
Theo‌‌ Jolivet [AP/HP, from Feb 2025]‌
Elise Liu [ICM‌, from Dec 2025‌‌]
Louis Potier [AP/HP, from Dec‌ 2025]
Meilame Tayebjee‌ [ENSAE]

2‌‌ Overall objectives

2.1 Context

2.1.1 Application context: richer‌ data in health and‌ social sciences

Opportunistic data accumulations, often observational, bear great promise for social and health sciences. But the data are too big and complex for standard statistical methodologies in these sciences.

Health databases‌

Increasingly rich health data‌‌ is accumulated during routine clinical practice as well‌ as for research. Its‌ large coverage brings new‌‌ promises for public health and personalized medicine, but‌ it does not fit‌ easily in standard biostatistical‌‌ practice because it is not acquired and formatted‌ for a specific medical‌ question.

Social, educational, and‌‌ behavioral sciences

Better data sheds new light on‌ human behavior and psychology,‌ for instance with on-line‌‌ learning platforms. Machine learning can be used both‌ as a model for‌ human intelligence and as‌‌ a tool to leverage these data, for instance‌ improving education.

Likewise, activity traces can provide empirical evidence for economics or political science, but their complexity requires new statistical practices.

AI in‌‌ society

AI increasingly impacts multiple aspects of society.‌ As such, it calls‌ for rigorous evaluation, whether‌‌ it is a benchmark of its ability, or‌ a broader assessment of‌ its impacts.

2.1.2 Related‌‌ data-science challenges

Data management: preparing tabular data for‌ analytics

Assembling, curating, and‌ transforming data for data‌‌ analysis is very labor intensive. These data-preparation steps‌ are often considered the‌ number one bottleneck to‌‌ data-science. They mostly rely on data-management techniques. A‌ typical problem is to‌ establish correspondences between entries‌‌ that denote the same entities but appear in‌ different forms (entity linking,‌ including deduplication and record‌‌ linkage). Another time-consuming process is to join and‌ aggregate data across multiple‌ tables with repetitions at‌‌ different levels (as with panel data in econometrics‌ and epidemiology) to form‌ a unique set of‌‌ “features” to describe each individual. This process is‌ related to database denormalization‌ and might require schema‌‌ alignment when performed across multiple data sources with‌ imperfect correspondence in columns‌.

Progress in machine learning increasingly helps automate data preparation and process data with less curation.

From machine learning to‌‌ statistically-valid answers

Machine learning can be a tool to answer complex domain questions by providing non-parametric estimators. Yet, it still requires much work, e.g. to go beyond point estimators, to derive non-parametric procedures that account for a variety of biases (censoring, sampling biases, non-causal associations), or to provide theoretical and practical tools to assess the validity of estimates and conclusions in weakly-parametric settings.

A question that is increasingly‌ important in all applications of machine learning is‌ that of auditing the model used in practice.‌ This question arises in fundamental-research settings (medical research,‌ political science...) for statistical validity, and in applications‌ to assess societal biases, or safety of AI‌ systems.

3 Research program

3.1 Table representation learning‌

Soda develops deep-learning methodology for relational data, from tabular datasets to full relational databases. The stakes are i) to build machine-learning models that apply readily to the raw data so as to minimize manual cleaning, data formatting and integration, and ii) to extract reusable representations that reduce sample complexity on new databases by transforming the data into well-distributed vectors and bringing in background information. The success of embedding such background knowledge in foundation models such as large language models motivates a quest for table foundation models.

3.2‌ Mathematical aspects of statistical learning for data science‌

While complex models used in machine learning can be used as non-parametric estimators for a variety of statistical tasks or for decision making, the statistical procedures and validity criteria need to be reinvented. Soda contributes statistical tools and results for a variety of problems important to data science in health and social science (epidemiology, econometrics, education). Statistical topics of interest comprise:

Missing values and‌ survival analysis
Causal inference
Model validation and auditing‌
Uncertainty quantification

3.3 Machine learning for health and‌ social sciences

Soda targets applications in health and social sciences, as these can markedly benefit from advanced processing of richer datasets and can have a large societal impact, but fall outside mainstream machine-learning research, which focuses on processing natural images, language, and voice. Rather, data surveying humans needs another focus: it is most of the time tabular, sparse, with a time dimension, and missing values. In terms of application fields, we focus on the social sciences that rely on quantitative predictions or analysis across individuals, such as policy evaluation. Indeed, the same formal problems, addressed in the two research axes above, arise across various social sciences: epidemiology, education research, and economics. The challenge is to develop efficient and trustworthy machine learning methodology for these high-stakes applications.

3.4‌ Turn-key machine-learning tools for socio-economic impact

Societal and economic impact of machine learning requires easy-to-use practical tools that can be leveraged in non-specialized organizations such as hospitals or policy-making institutions.

Soda works on scikit-learn, one of the most popular machine-learning tools worldwide, as well as skrub, a younger project that specializes in machine learning for tables. Our goal is to transfer outside of the lab the understanding of machine learning and data science accumulated by the various research efforts.

Soda also works on other important software tools‌ to foster growth and health of the Python data ecosystem in which‌ scikit-learn is embedded.

4‌ Application domains

4.1 Precision‌‌ medicine, public health, and epidemiology

Data management is the focus of the field of medical informatics as it is notably challenging in healthcare settings, due to the multiplicity of sources and the richness of the data, which encompass many modalities. We apply our machine-learning techniques for statistical analysis, including causal inference, in medicine to facilitate clinical research and public-health evidence. The central questions are those of personalized medicine –prediction at the individual level, for diagnosis, prognosis, or drug recommendation– and of public health –evaluation of treatments and policy, estimation of risk factors. The data on which we work are patient history and claims databases: mid-dimensional data with longitudinal coverage (as opposed to “omics” or imaging data, which are high dimensional and much less frequently available in clinical settings).

We collaborate actively with AP-HP and Ministère de la Santé. AP-HP provides access to its very rich and complex data mart, with dozens of tables following millions of individuals, both a challenge and an opportunity, and we work with various medical specialists (neurology, diabetology, public health) on specific clinical questions related to prognosis, treatment evaluation, and risk factors. With Ministère de la Santé, we process the claims data from the national insurance database to establish trajectories of individuals as a function of their future health risks. The short-term goal is to find which medical conditions can be predicted and with what reliability. The longer-term goal is to define prevention strategies.

4.2‌‌ Educational data mining

In educational data mining, we are interested in developing mathematical methods of learning to personalize education through adaptive assessment (developing algorithms that select questions for measuring efficiently the latent knowledge of examinees or for optimizing learning), recommending learning resources, and generating exercises automatically. It is a challenging problem as it is hard to quantify learning, unlike in traditional reinforcement learning scenarios, and it is hard to measure the effect of courses on learning. This is why it is traditionally modeled as a partially-observable Markov decision process (POMDP). We are interested in modeling the evolution of uncertainty over the latent knowledge of examinees over time, for example using Bayesian approaches or model-based reinforcement learning.
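As an illustration of adaptive assessment with a Bayesian belief over latent knowledge, here is a minimal sketch. It assumes a Rasch (one-parameter logistic) response model, a fixed ability discretized on a grid, and a maximum-expected-information selection rule; these choices, and all function names, are assumptions made for the example rather than the methods used by the team.

```python
import math

# Candidate ability values (a discretized latent "knowledge" scale).
GRID = [-4.0 + 0.1 * i for i in range(81)]


def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def update_posterior(posterior, difficulty, correct):
    """Bayes update of the belief over ability after one observed answer."""
    unnormalized = []
    for ability, weight in zip(GRID, posterior):
        p = p_correct(ability, difficulty)
        unnormalized.append(weight * (p if correct else 1.0 - p))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_mean(posterior):
    """Point estimate of the ability under the current belief."""
    return sum(a * w for a, w in zip(GRID, posterior))


def pick_question(posterior, difficulties):
    """Pick the item with the largest expected Fisher information p(1 - p)."""
    def expected_information(difficulty):
        return sum(
            w * p_correct(a, difficulty) * (1.0 - p_correct(a, difficulty))
            for a, w in zip(GRID, posterior)
        )
    return max(difficulties, key=expected_information)


item_bank = [-2.0, -1.0, 0.0, 1.0, 2.0]          # hypothetical item difficulties
prior = [1.0 / len(GRID)] * len(GRID)            # uniform prior over ability
first_question = pick_question(prior, item_bank)
posterior = update_posterior(prior, first_question, correct=True)
next_question = pick_question(posterior, item_bank)
```

After a correct answer the belief shifts toward higher abilities, so the next selected item is harder. A POMDP formulation would additionally let the ability evolve over time as the examinee learns, which this static sketch does not model.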

Soda is actively collaborating with‌ the national platform Pix.fr‌ for certifying the digital‌‌ competencies of all French citizens. Jill-Jênn Vie is‌ one of the original‌ core developers and they‌‌ jointly received a Paris Region PhD grant in‌ 2023 allowing them to‌ co-supervise the PhD of‌‌ Samuel Girard about optimizing human learning. In 2023,‌ Jill-Jênn Vie joined the‌ scientific committee of the‌‌ French Ministry of Education (CSEN, conseil scientifique de‌ l'Éducation nationale), leading‌ to collaborations with Franck‌‌ Ramus and ongoing discussions with Camille Terrier, Marc‌ Gurgand, Hugo Gimbert via‌ the scientific committee of‌‌ MonProjetSup, a state startup about a study path‌ recommender system.

4.3 Data‌ management

Data preparation for analytics is intrinsically related to data management. For instance, linked open data provides consistent views on data across silos, but integrating these data into a statistical model to answer a given question still requires a lot of user effort. Database operation increasingly relies on machine learning. While Soda is in no way expert in database research, the analytic tools that we build for relational data are increasingly used for data management. We are collaborating with Paolo Papotti (Eurecom) on this topic.

4.4 Broader data‌ science

The tools, practical and theoretical, that we develop are central to many applications of data science. For instance, we often have discussions with banks and insurers, which use machine learning but face statistical problems that we tackle: censoring or other sampling biases, forecasting, uncertainty quantification. Marketing and business intelligence also face the exact same problems. Even more generally, data preparation from relational databases is a challenge in most data-science applications. We interact with data scientists in a broad set of applications via the user base of the software tools that we develop (e.g. scikit-learn) and the various courses and lectures that we give around these tools to industry audiences.

We have started a collaboration in economics (Margherita Comola, Paris School of Economics) on using machine learning to understand communication strategies of politicians from social-network data.

4.5 Behavioral sciences

A methodological challenge in health and educational sciences, common to behavioral science, is that the quantities of interest are difficult to measure, e.g. intelligence or progress of a student. Supervised machine learning can infer proxies from indirect signs, such as psychological traits from brain imaging, diagnosis from clinical traces, or socio-economic status from demographics. This notion of proxies is central in policy evaluation, serving as indirect signals in causal inference, to provide secondary outcomes for treatment effect estimation or to control confounders not directly observed.

An ongoing project with Pass Culture (via an Inria-Ministry of Culture convention) is to adapt the recommender system of the app to encourage diversity, i.e. not only optimizing click-through rate, but also making students discover new things. This is done by modeling the problem as a contextual bandit, where a diversity term acts as a regularizer in the objective function.
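To make the idea of a diversity-regularized bandit concrete, the sketch below scores each item by its estimated click reward plus a weighted novelty bonus. The epsilon-greedy policy, the tabular per-context reward estimates, the category-based novelty term, and all names (`DiverseBandit`, `diversity_weight`, the example items) are hypothetical choices for illustration and do not describe the Pass Culture system.

```python
import random
from collections import defaultdict


class DiverseBandit:
    """Epsilon-greedy contextual bandit with a diversity bonus (illustrative only)."""

    def __init__(self, arms, categories, diversity_weight=0.3, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.categories = categories            # arm -> category label
        self.diversity_weight = diversity_weight
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)          # (context, arm) -> number of pulls
        self.values = defaultdict(float)        # (context, arm) -> mean observed reward

    def score(self, context, arm, seen_categories):
        # Regularized objective: estimated reward + weight * novelty of the category.
        novelty = 0.0 if self.categories[arm] in seen_categories else 1.0
        return self.values[(context, arm)] + self.diversity_weight * novelty

    def select(self, context, seen_categories):
        # Explore uniformly with probability epsilon, otherwise exploit the score.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.score(context, a, seen_categories))

    def update(self, context, arm, reward):
        # Incremental update of the mean reward for this (context, arm) pair.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


bandit = DiverseBandit(
    arms=["concert", "museum"],
    categories={"concert": "music", "museum": "heritage"},
    diversity_weight=0.3,
    epsilon=0.0,  # no random exploration, to keep the example deterministic
)
bandit.update("student", "concert", 0.5)
bandit.update("student", "museum", 0.4)
choice = bandit.select("student", seen_categories={"music"})
```

Here the museum is recommended despite its lower estimated click rate, because the user has only seen music so far; setting `diversity_weight` to zero recovers the pure click-through objective.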

5 Social and environmental responsibility

5.1 Footprint of‌ research activities

The main footprint of Soda's activity‌ is the carbon footprint of our travels (surpassing‌ our compute cost, as we seldom run very‌ intensive computation). For this reason, we try to‌ be careful with our long-distance travel and try‌ to take the plane as little as possible.‌ Not flying at all is not possible, as‌ it would cut us off from the world-wide‌ research community sometimes mediated by crucial conferences in‌ North America. However, we favor online seminars, or‌ on-premise talks accessible by train.

Because of a race to scale, artificial intelligence is starting to have a large environmental footprint. As this is the result of collective action, as opposed to a single research group, we are trying to bring this problem to the attention of the community 42. Whenever possible, we also work on algorithms with small computational costs. For instance, using tree-based models instead of neural networks can sometimes bring sizable computational and statistical benefits 31. This work required solving fundamental challenges, as trees are not differentiable, and was difficult to get accepted because it was not fashionable. Another example is quantifying the uncertainty of large language models in order to call the smallest model that will give a good-enough answer 57.

5.2‌‌ Impact of research results

While data science can‌ improve health and education,‌ working with personal data‌‌ or providing decision tools that affect individuals comes‌ with responsibilities.

We make sure that work at Soda does not risk having a direct negative impact. All research on real-life health data (hospital-level or nation-wide) is started only after approval by the corresponding ethical board. Soda does not put any tools in production: none of the work of Soda directly leads to automated decisions. Consequently, none of our work has directly impacted individuals. Soda works on pseudonymized data, and we leave the –pseudonymized– electronic health data on servers inside the protected environment of the hospital where they have been acquired and are used. Going further, Soda runs research on privacy-preserving synthetic data generation, to provide open datasets for research and development without privacy concerns.

Soda is also active in assessing and discussing the broader impacts and risks associated with AI 11, participating in international efforts 49 to create consensus.

6 Highlights of the year‌

6.1 Awards

Doctor Honoris‌ Causa UC Louvain
Gael‌‌ Varoquaux
Ordre National du Mérite
Gael Varoquaux
Clarivate‌ highly cited researcher
Gael‌ Varoquaux
BFM Awards, section IA
Gael Varoquaux
Pedagogical Dynamics Prize from Fondation‌ de l'École polytechnique
Jill-Jênn‌ Vie
Sophie Germain Prize‌‌ from UK Embassy
Jill-Jênn Vie with Luc Rocher‌ from The University of‌ Oxford
ICLR 2025 spotlight‌‌ (top 300 of 11,600 submissions)
Marine Le Morvan‌ and Gael Varoquaux

7‌ Latest software developments, platforms,‌‌ open data

7.1 Latest software developments

7.1.1 Scikit-learn‌

Keywords:
Clustering, Classification, Regression,‌ Machine learning
Scientific Description:‌‌
Scikit-learn is a Python module integrating classic machine‌ learning algorithms in the‌ tightly-knit scientific Python world.‌‌ It aims to provide simple and efficient solutions‌ to learning problems, accessible‌ to everybody and reusable‌‌ in various contexts: machine-learning as a versatile tool‌ for science and engineering.‌
Functional Description:

Scikit-learn can be used as a middleware for prediction tasks. For example, many web startups adapt Scikit-learn to predict buying behavior of users, provide product recommendations, detect trends or abusive behavior (fraud, spam). Scikit-learn is used to extract the structure of complex data (text, images) and classify such data with techniques relevant to the state of the art.

Easy to use, efficient and accessible to non-experts in data science, Scikit-learn is an increasingly popular machine learning library in Python. In a data exploration step, the user can enter a few lines on an interactive (but non-graphical) interface and immediately see the results of their request. Scikit-learn is a prediction engine. Scikit-learn is developed in open source, and available under the BSD license.
URL:
http://scikit-learn.org
Publications:
hal-00650905‌, hal-00856511, hal-01093971
Contact:
Gael Varoquaux
Participant:‌
10 anonymous participants
Partners:
Axa, BNP Paribas Cardif, Dataiku, Nvidia, Chanel, Probabl

7.1.2 joblib

Keywords:
Parallel‌ computing, Cache
Functional Description:
Facilitate parallel computing and‌ caching in Python.
URL:
https://joblib.readthedocs.io/en/latest/
Contact:
Thomas Moreau‌
Participant:
an anonymous participant
Partner:
Probabl

7.1.3 skrub‌

Keyword:
Data analysis
Functional Description:
Joins, aggregates, and vectorizes tables to enable statistical learning, including with badly formatted entries
URL:
https://skrub-data.org
Contact:
Gael Varoquaux‌
Participant:
2 anonymous participants
Partner:
Probabl

8 New‌ results

8.1 Table representation learning

Participants: David Holzmuller‌, Marine Le Morvan, Gael Varoquaux.‌

TabICL: Table foundation models

A new wave of progress is pushing forward tabular learning. Recent models have been bringing better performance across the board. A prime example is the TabPFN series of models, which rely on pretrained transformers to bring excellent performance, originally in few-shot settings and, since the beginning of 2025, on tables up to moderate size with TabPFN2. This line of work has led to spinning off a startup in Germany. However, the quadratic complexity of the transformers is a bottleneck. With the TabICL model 7, we showed that a multi-stage architecture can build a pre-trained in-context predictor where the separation into stages decreases the quadratic cost. The model can be pretrained on larger datasets, and is thus the best performer on larger tables. The model is faster than alternatives, in particular when using a CPU rather than a GPU. In addition, we released all the code in open source, including the pretraining; this has spurred much downstream research for multiple applications and enhancements, such as privacy.

This result‌ is very significant as it pushes forward the‌ agenda of foundation models for tables. It is‌ giving birth to a very active line of‌ research. The paper has been cited 72 times‌ in less than a year.

Retrieve merge predict‌

A full data-science pipeline must often assemble data across multiple source tables. When the user is faced with a complex data lake, with many tables and few explicit links between them, it is difficult to find the best assembly for a given machine-learning task. This problem requires not only finding which tables must be joined to the main table of interest –a table retrieval problem–, but also how to aggregate multiple records when tables are linked through a many-to-one relation. While table retrieval is a classic problem of the data management literature, it had been understudied in the case of supervised machine learning. We assembled a systematic –and open– benchmark with data lakes and supervised-learning tasks 2. We found that supervised learning does change the picture compared to classic table-retrieval settings: for a fixed compute budget, it is worth avoiding fancy retrieval methods, which can be very computationally costly, and rather using better supervised learning methods, which can be comparatively less expensive while being able to extract the relevant information from a noisy retrieval.
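The merge step with a many-to-one relation can be sketched as an aggregate-then-join operation. The example below uses plain Python on two small hypothetical tables (`main_table`, `orders`); the table contents, column names, and the choice of count and mean as aggregates are assumptions for illustration, not the benchmark's pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical main table: one row per customer, with the prediction target.
main_table = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]

# Hypothetical candidate table retrieved from the data lake: several rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 5.0},
]


def aggregate_then_join(main_rows, aux_rows, key, value):
    """Aggregate aux_rows per key (count and mean), then left-join onto main_rows."""
    groups = defaultdict(list)
    for row in aux_rows:
        groups[row[key]].append(row[value])

    joined = []
    for row in main_rows:
        values = groups.get(row[key], [])
        features = dict(row)
        features[f"{value}_count"] = len(values)
        # None marks a missing aggregate when no record matches the key.
        features[f"{value}_mean"] = mean(values) if values else None
        joined.append(features)
    return joined


features = aggregate_then_join(main_table, orders, key="customer_id", value="amount")
```

Each row of the main table keeps its target and gains one feature per aggregate; rows without a match get a missing value, which the downstream learner then has to handle.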

TabArena

The progress in‌ tabular learning—using machine learning‌ to predict from rows‌‌ of a table—has been driven by empirical studies‌ over the last few‌ years. We have contributed‌‌ to building TabArena 4, a living benchmark‌ for machine learning on‌ tabular data. TabArena contains‌‌ 51 datasets that are carefully curated to represent‌ real-world tabular learning tasks,‌ avoiding pitfalls such as‌‌ duplicated datasets with different names, data leakage, inappropriate‌ train-test splits, datasets inappropriate‌ for tabular learning methods,‌‌ and so on. The first version of the‌ benchmark evaluates 16 tabular‌ learning methods, including recent‌‌ models and 3 table foundation models. TabArena aims‌ to evaluate models in‌ the settings that allow‌‌ them to achieve peak performance. This includes hyperparameter‌ tuning with well-designed search‌ spaces, cross-validation, and ensembling‌‌ different hyper-parameter configurations. Besides providing an up-to-date comparison‌ of models, TabArena provides‌ insights on the impact‌‌ of cross-validation, ensembling, tuning, and validation overfitting. Results,‌ updated on a regular‌ basis with new methods,‌‌ are presented on a leaderboard at http://tabarena.ai.‌

TabArena is reaching a‌ very broad visibility. Indeed,‌‌ while it went public only this summer, it‌ is cited 26 times‌ 6 months later and‌‌ received the spotlight distinction at NeurIPS, the largest‌ machine learning conference.

8.2‌ Statistical aspects of machine‌‌ learning

Participants: Judith Abécassis, Marine Le Morvan‌, Gael Varoquaux.‌

Learning with missing values‌‌

A common practice for handling missing values in tables consists in first imputing missing values, i.e., replacing them with plausible values, and then proceeding as if the data were complete. In this context, we asked a simple but fundamental question: is it worth investing effort and resources in better imputations to improve predictions? This work complements our previous asymptotic theoretical findings with a thorough empirical finite-sample study 5, providing useful conclusions for practitioners. Results show that better recovery of missing values leads to better prediction, but with diminishing returns: a large improvement in recovery quality –which typically comes at a sizable computational cost– leads to a small improvement in prediction accuracy. The effect is further reduced when using flexible learning algorithms and adding missing-value indicators. Overall, on real-world datasets with powerful models, improving imputation yields very limited benefits.
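A minimal sketch of the cheap baseline discussed above, simple imputation combined with missing-value indicators, is given below in plain Python. Mean imputation and the `None` encoding of missing entries are assumptions made for the example; the study compares many imputers.

```python
from statistics import mean


def impute_with_indicators(rows):
    """Mean-impute each column and append one missing-value indicator per column."""
    n_cols = len(rows[0])
    # Column means computed on observed entries only (None marks a missing value).
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)

    prepared = []
    for r in rows:
        imputed = [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        indicators = [1.0 if r[j] is None else 0.0 for j in range(n_cols)]
        prepared.append(imputed + indicators)
    return prepared


X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
X_prepared = impute_with_indicators(X)
```

The indicator columns let a flexible learner treat "was missing" as information in its own right. In a real pipeline the column means would be computed on the training set only; scikit-learn exposes the same idea through `SimpleImputer(add_indicator=True)`.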

Guidance for evaluation‌ of medical AI

We contributed to a guidance review on metrics to evaluate predictors in the context of medical practice 1. This guidance is aimed at practitioners and is important given the profusion of metrics applicable to classifiers, and the confusion about what they measure. The work outlines both the various aspects that the metrics probe –discrimination, calibration, overall performance, classification, and clinical utility–, as well as the desirable mathematical properties. For instance, we stress that a good metric should be proper: it should be optimal when the classifier outputs the true probability of events. The metrics are illustrated in the context of medical usage, with an analysis of the utility and benefit to the patient.
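The notion of a proper metric can be checked numerically on a toy case. The sketch below assumes a single binary event with true probability 0.7 and compares the expected Brier score, which is proper, with the expected error rate at a 0.5 threshold, which cannot tell a calibrated forecast from an overconfident one. The function names and the numbers are illustrative and not taken from the review.

```python
def expected_brier(q, p):
    """Expected Brier score of forecast q when the event occurs with probability p."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2


def expected_error_rate(q, p, threshold=0.5):
    """Expected misclassification rate when predicting the event iff q >= threshold."""
    return (1.0 - p) if q >= threshold else p


p_true = 0.7
grid = [i / 100 for i in range(101)]

# The expected Brier score is minimized by reporting the true probability.
best_q = min(grid, key=lambda q: expected_brier(q, p_true))

# The error rate is identical for a calibrated and an overconfident forecast.
calibrated_error = expected_error_rate(0.70, p_true)
overconfident_error = expected_error_rate(0.99, p_true)
```

Minimizing the expected Brier score recovers `best_q` equal to the true probability, whereas the error rate takes the same value for forecasts of 0.70 and 0.99, so it gives no incentive to report honest probabilities.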

Double Debiased Machine‌ Learning for Mediation Analysis with Continuous Treatments

We introduced double machine-learning estimators with better convergence properties 43 to conduct a mediation analysis, i.e. to quantify how much of the causal effect of a continuous treatment goes via an intermediate variable. We constructed a kernel-based, Neyman-orthogonal estimator that combines regression and inverse-probability-weighting ideas while avoiding explicit estimation of the mediator density, which is beneficial with high-dimensional or continuous mediators, which often occur in applications. We established key theoretical properties: asymptotic normality at a nonparametric rate and multiple robustness that tolerates some misspecified nuisance models. We also derived an asymptotically mean-squared-error-optimal bandwidth and associated confidence intervals for the mediated response curve. Simulation studies and an application to real-world medical data from the UK Biobank cohort (assessing the mediating role of brain-related variables in the effect of glycemic control on cognitive outcomes) demonstrate improved finite-sample performance over existing mediation estimators, highlighting the method’s practical relevance for complex observational studies.
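For readers unfamiliar with mediation analysis, the target quantities are commonly written with potential outcomes. The notation below is the standard textbook one, assumed here for illustration and not taken from the paper: $Y(t, m)$ is the outcome under treatment level $t$ and mediator value $m$, and $M(t)$ is the mediator value under treatment level $t$. Under this notation, a natural reading of the mediated response curve is the map $(t, t') \mapsto \mathbb{E}[Y(t, M(t'))]$.

```latex
% Decomposition of the total effect between two treatment levels t and t'
\begin{align*}
\underbrace{\mathbb{E}[Y(t, M(t))] - \mathbb{E}[Y(t', M(t'))]}_{\text{total effect}}
  &= \underbrace{\mathbb{E}[Y(t, M(t'))] - \mathbb{E}[Y(t', M(t'))]}_{\text{direct effect}} \\
  &\quad + \underbrace{\mathbb{E}[Y(t, M(t))] - \mathbb{E}[Y(t, M(t'))]}_{\text{indirect (mediated) effect}}
\end{align*}
```

The direct effect changes the treatment while holding the mediator at its value under $t'$; the indirect effect holds the treatment at $t$ and changes only the mediator.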

8.3 Bridging to health‌ and social sciences

Participants: Gael Varoquaux, Judith‌ Abécassis, Jill-Jênn Vie.

Emergence of maths‌ gender gap

Together with colleagues in cognitive psychology, we studied determinants of the gender gap in mathematics abilities 6. We analyzed four consecutive cohorts of nation-wide evaluation in France, on 5-to-7-year-old first graders. The data reveal the emergence of a gap in test results during the first grade: girls and boys start the year with almost equal test performance, but after one year of schooling the boys perform markedly better. This gender gap emerged across all types of schooling (including Montessori or other innovative pedagogy) and all family socio-economic statuses. The onset of the gap was related to the admission in first grade, and not the age of the children. In contrast to maths, the development of language skills follows different dynamics, with a gap favoring girls present before schooling and a different temporal evolution during schooling, narrowing this gap. The study concludes that the gender gap is unlikely to be due to fundamental gender differences in aptitudes, but is rather likely mediated by interactions with teachers and parents, with hypotheses such as transmission of anxiety or internalizing stereotypes.

Influence of training difficulty in learning outcomes‌ of medical students

The literature suggests that, to foster learning, tasks should be neither too difficult nor too easy. In a study in press, we attempted to identify that optimal level of difficulty using millions of student-question interactions of French students on the biggest medical training platform (Banque nationale d'entraînement, BNE), to determine how the difficulty of practice questions relative to student ability influences final exam performance. The best learning outcomes occur when students engage with questions that are, on average, slightly easier than their current proficiency level. This sweet spot for difficulty is not universal; it varies significantly across different medical specialties and individual student abilities. High-ability students, in particular, showed greater sensitivity to question difficulty. These results emphasize the need for adaptive learning systems that can personalize difficulty in real-time to match each student's evolving skills and the specific complexity of the subject matter.

Unpacking the scale narrative in AI

Plotting the increase of the scale of notable AI systems in the last years reveals a staggering explosion. AI's size has been growing super-exponentially on a variety of dimensions: training compute, training cost (Figure 1), inference cost, amount of data used. Studying the wording used in pivotal publications as well as company communications shows that it anchors AI success in this growth, thus setting implicit social norms around scale 8. But systematic analysis of benchmark results shows that scale does not always bring benefit. The narrative of scale is simplified and leaves aside many important ingredients of success of AI systems. In addition, the race for scale comes with planetary and societal consequences, which we study and document 8. Ever-increasing inference costs threaten economic and electricity sustainability. An unstoppable appetite for training data leads to fitting models on enormous datasets that elude quality control, engulfing undesirable facets of the internet (including child pornography) or eroding privacy. The race for scale has financial consequences, benefiting above all actors of compute, but also structuring an ecosystem where cash-rich and GPU-rich actors have leverage on priorities, industrial or academic. These actors sometimes have circular investment strategies: funding third parties that will spend all this funding in compute, which can fuel an investment bubble in AI.

Figure‌ 1: Evolution of‌ training cost of notable‌‌ AI systems

We conclude our study, published at FAccT 8, by underlining that academic research has a central role to play in these dynamics and must shape a healthy and grounded narrative. We recommend 1) pursuing basic AI research of interest independent of scale, e.g. uncertainty quantification, causality, etc.; 2) holding responsible norms, in particular avoiding asking for compute increases when editing or reviewing; and 3) always publishing measures of compute to document the tradeoffs.

This study has had considerable impact: it has been picked up by academics as well as policy-makers, due to its relevance to the current economy of innovation. It has been cited 48 times in less than a year.

Going from a theoretical causal‌ analysis framework to practical‌ guidance with health data‌‌

Many applications of machine learning, in particular in healthcare, need to lead to actionable conclusions and support decision-making processes. Thus, such applications must go beyond statistical associations and use a causal framework. This is challenging to implement in practice, particularly when dealing with noisy real-world observational data. We propose and document a practical, five-step framework to turn routine electronic health records (EHR) into reliable, causally-grounded evidence for treatment decisions 3, illustrated on the effect of albumin plus crystalloids versus crystalloids alone on 28-day mortality in sepsis. We emphasize that valid inference from observational ICU data hinges on: (1) careful study design using a target-trial emulation/PICOT formulation to avoid time-related biases such as immortal time bias; (2) explicit causal reasoning to identify confounders and define an estimand; (3) robust estimation using modern causal estimators, where doubly robust methods with flexible machine-learning nuisances (e.g. random forests) perform best; and (4) systematic “vibration” analyses to quantify how sensitive conclusions are to design, confounder, and model choices. Applying this pipeline to MIMIC-IV, we recover the “no average effect” of albumin seen in randomized controlled trials (RCTs), while revealing clinically meaningful treatment heterogeneity, with potential benefit in subgroups such as older patients, males, and those with septic shock, thereby showcasing how valid causal machine learning on EHRs can complement RCTs for individualized decision-making.
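To make the estimation step (3) more concrete, the following is a minimal sketch of a doubly robust (augmented inverse-probability weighting, AIPW) estimate of an average treatment effect. It is not the code of the study: the data are synthetic, the single binary confounder and the function names (`simulate`, `aipw_ate`, `naive_difference`) are hypothetical, and the nuisance models are plain per-stratum averages rather than the random forests mentioned above.

```python
import random


def simulate(n, seed=0):
    """Synthetic observational data: a binary confounder x drives both
    the treatment a and the outcome y. The true treatment effect is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = 1.0 * a + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference of mean outcomes between treated and untreated (confounded)."""
    treated = mean(y for _, a, y in rows if a == 1)
    untreated = mean(y for _, a, y in rows if a == 0)
    return treated - untreated


def aipw_ate(rows):
    """Doubly robust estimate: outcome-model contrast plus
    inverse-probability-weighted residual corrections."""
    propensity, mu1, mu0 = {}, {}, {}
    for s in {x for x, _, _ in rows}:
        in_s = [(a, y) for x, a, y in rows if x == s]
        propensity[s] = mean(a for a, _ in in_s)       # P(a = 1 | x = s)
        mu1[s] = mean(y for a, y in in_s if a == 1)    # E[y | a = 1, x = s]
        mu0[s] = mean(y for a, y in in_s if a == 0)    # E[y | a = 0, x = s]
    scores = []
    for x, a, y in rows:
        e, m1, m0 = propensity[x], mu1[x], mu0[x]
        scores.append(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e))
    return mean(scores)


rows = simulate(20000)
naive = naive_difference(rows)
ate = aipw_ate(rows)
print(f"naive difference: {naive:.2f}, AIPW estimate: {ate:.2f}")
```

In this toy setting the naive contrast is inflated by confounding, while the doubly robust estimate stays close to the simulated effect. With real EHR data, the stratum averages would be replaced by flexible models fitted with cross-fitting.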

8.4 Turn-key machine-learning tools for socio-economic‌ impact

Participants: Gael Varoquaux.

Releases of scikit-learn‌

2025 saw two major releases of scikit-learn (1.7 in June and 1.8 in December). Scikit-learn has kept improving, with both user-visible features and deep transformations of the technical stack. We list below a few highlights that are certainly not exhaustive but illustrate the continuous progress made.

Figure 2: Estimator displays. The screenshot shows the representation of a simple pipeline combining a standard scaler with a logistic regression. This representation appears in the user's environment (Jupyter notebook, Google Colab, VS Code) whenever the user prints the corresponding estimator.

Increasing‌ support of GPUs
We are progressively rewriting the underlying compute operations so that they can execute on GPUs. As of scikit-learn 1.8, full analyses can be run on GPUs, including cross-validation and model evaluation. On many workflows, running on the GPU leads to massive speedups (several-fold, up to 70x).
Linear model speedups
The algorithms underlying the linear models have been improved along many directions, leading to speedups of up to 10x in sparse regression cases.
Temperature scaling recalibration
Recalibration corrects systematic biases in predicted probabilities, e.g. over- or under-confident classifiers. The problem becomes much harder in settings with many classes, because each class comes with a probability that must be estimated. Temperature scaling is a recalibration method that is particularly well suited to such settings.
Estimator displays
For a user working interactively with scikit-learn, as most data scientists do, printing a model brings up a rich display that we have been improving over the last releases. The goal is to make the user more productive. As with all user-experience work, the challenge is to display the right information and make it understandable. In the last year, we have added a display of the hyper-parameters of the estimator, as well as the corresponding documentation, as illustrated in Figure 2.
Free threading‌
The Python virtual machine has historically had a central lock that prevented efficient thread-based parallel computing. However, it has recently become possible to build the virtual machine without this lock. We have adapted scikit-learn to make sure that it runs safely in heavily multi-threaded settings, opening the door to data science in Python with efficient native parallel computing.
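Among the highlights above, temperature scaling lends itself to a short illustration. The sketch below is a plain-Python rendering of the idea and does not use the scikit-learn API: a single temperature divides the classifier's logits before the softmax, and is chosen to minimize the log-loss on held-out data. The function names, the grid search, and the toy logits are assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of the logits divided by a temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_log_loss(logits_list, labels, temperature):
    """Average negative log-probability assigned to the true class."""
    losses = [
        -math.log(softmax(z, temperature)[y])
        for z, y in zip(logits_list, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logits_list, labels):
    """Pick the temperature minimizing the log-loss on a log-spaced grid
    (a simple stand-in for a proper one-dimensional optimizer)."""
    grid = [0.1 * 1.02 ** k for k in range(320)]
    return min(grid, key=lambda t: mean_log_loss(logits_list, labels, t))


# Toy held-out set: an over-confident 3-class model, right 4 times out of 6.
held_out_logits = [
    [8.0, 0.0, 0.0], [8.0, 0.0, 0.0], [0.0, 8.0, 0.0],
    [0.0, 0.0, 8.0], [8.0, 0.0, 0.0], [0.0, 8.0, 0.0],
]
held_out_labels = [0, 0, 1, 2, 1, 2]

temperature = fit_temperature(held_out_logits, held_out_labels)
print(f"fitted temperature: {temperature:.2f}")
print("recalibrated probabilities:", softmax(held_out_logits[0], temperature))
```

Because a single scalar rescales all logits, the predicted class never changes; only the confidence does. This is why the method remains tractable when the number of classes is large.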

skrub

Skrub is a package to facilitate machine learning on tables that was first released at the end of 2023. 2025 was a very active year for skrub, with three releases (0.5 in January, 0.6 in July, and 0.7 in December) and the following major features:

DataOps‌
skrub now comes with a new way of writing non-linear pipelines, dubbed DataOps, that combines multiple tables, tracks provenance through their transformations, and integrates machine learning. The DataOps can then be re-applied to new data, cross-validated, tuned, or extracted to be put in production.
Optuna
In skrub‌ 0.7, Optuna can be‌ used for hyper-parameter tuning‌‌ on the pipelines. It opens the door to‌ advanced hyper-parameter optimization algorithms.‌

While skrub is a fairly new package, it is increasingly well received by users. The uptake can be seen in the download numbers on pypistats.org/packages/skrub, with 6,000 downloads daily as of the end of December, and exponential growth.

joblib

joblib is a very‌ simple computation engine in‌ Python that is massively‌‌ used worldwide, including as a dependency of packages‌ such as scikit-learn for‌ parallel computing.

Release 1.5 brings many changes to follow the evolution of the ecosystem and improve behaviors. Major changes are:

Avoiding‌‌ collisions of cache when cache is stored on‌ a shared disk across‌ different nodes from a‌‌ cluster
Support of Python 3.14

9 Bilateral contracts‌ and grants with industry‌

Participants: Judith Abecassis,‌‌ Gael Varoquaux, Jill-Jênn Vie.

9.1 Bilateral‌ contracts with industry

Probabl‌

Probabl is an Inria spin-off in which Gaël Varoquaux has 30% of his time allocated and is Chief Science Officer. Probabl's mission is to develop and make sustainable an ecosystem of data-science commons. Probabl is the largest employer of scikit-learn maintainers. It builds a commercial offer around the scikit-learn ecosystem by augmenting scikit-learn with solutions and services for the enterprise. Gaël Varoquaux is the point of contact at Soda.

Pass Culture

Within the Ministry of Culture-Inria convention, Samuel Girard and Jill-Jênn Vie have been involved in a partnership with Pass Culture (used by 3 million students in France) to improve the diversity of their recommendations (12 months, started in June 2024). We hired an engineer, Hiba Bederina, from June 2024 to May 2025 and conducted a randomized controlled trial on 400,000 users, which led to a publication at a RecSys workshop on social good.

Collaboration with Ministère de la Santé

We‌ have a 4-year long collaboration with Ministère de‌ la Santé (HAS) on using the national healthcare‌ data for prevention and policy evaluation. Gaël Varoquaux‌ and Judith Abecassis are in charge at Soda.‌

9.2 Bilateral Grants with Industry

Collaboration with public‌ interest group Pix

Jill-Jênn Vie obtained Paris Region PhD 2023 funding with Pix (certification of digital competencies, 6 million active users), about optimizing human learning using reinforcement learning. Samuel Girard's PhD is currently on this funding (105,000 euros from région Île-de-France, 20,000 euros from Pix).

10‌ Partnerships and cooperations

10.1 International initiatives

10.1.1 Inria‌ associate team not involved in an IIL or‌ an international program

Title:
Recommendations Encouraging Diversity
Duration:‌
2024 -> 2026
Coordinator:
Jill-Jênn Vie and Koh Takeuchi (takeuchi@i.kyoto-u.ac.jp)
Partners:
- Kyoto University (Japan)
- CNRS‌
Inria contact:
Jill-Jênn Vie
Summary:
This project aims to create recommender systems that optimize for cultural diversity: finding items that not only optimize click-through rate or profit, but also encourage users to discover new things. The goal is to borrow methods from causal inference to measure the treatment effect of recommendations (defined by comparing the diversity after and before recommendation), and methods from reinforcement learning to optimize this treatment effect. One key asset for this project is that plenty of real data is available thanks to our current partnership with Pass Culture, an app used by the French government to provide a budget ranging from 20 to 300 euros to every 15- to 18-year-old in order to purchase cultural goods. This work will be carried out jointly by the Soda team and Kyoto University, with the help of CNRS.
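As an illustration of the treatment effect mentioned in the summary, the sketch below compares the change in diversity of a user's history between users who received a recommendation and users who did not. Everything here is an assumption made for the example: diversity is measured as the Shannon entropy of item categories, the function names are hypothetical, and the toy histories are invented. The project may well use other diversity measures and estimators.

```python
import math
from collections import Counter


def shannon_diversity(categories):
    """Shannon entropy (in nats) of the category mix; 0 for a single category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def diversity_change(before, after):
    """Diversity of the full history minus diversity of the history before."""
    return shannon_diversity(before + after) - shannon_diversity(before)


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy data: (categories booked before, categories booked after).
recommended = [
    (["book", "book", "book"], ["cinema", "music"]),
    (["cinema", "cinema"], ["book"]),
]
control = [
    (["book", "book", "book"], ["book"]),
    (["music", "music"], ["music", "cinema"]),
]

# Difference in mean diversity change between the two groups.
effect = mean([diversity_change(b, a) for b, a in recommended]) - mean(
    [diversity_change(b, a) for b, a in control]
)
print(f"estimated effect on diversity: {effect:.3f}")
```

Such a difference in means is only interpretable causally when the two groups are comparable, for instance when recommendations are assigned at random.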

10.2 International research visitors

10.2.1‌ Visits of international scientists

Other international visits to‌ the team

Tomas Rigaux

Status:
PhD student
Institution‌ of origin:
Kyoto University
Country:
Japan
Dates:
August‌ 2025
Context of the visit:
Work on reinforcement‌ learning in graph neural networks and applications to‌ recommender systems
Mobility program/type of mobility:
Research stay‌ within associate team RED

10.2.2 Visits to international‌ teams

Research stays abroad

Jill-Jênn Vie

Visited institution:‌
Kyoto University
Country:
Japan
Dates:
December 2024-February 2025‌
Context of the visit:
Work on applications to‌ education and recommender systems
Mobility program/type of mobility:‌
Research stay within associate team RED

10.3 European‌ initiatives

10.3.1 Horizon Europe

INTERCEPT-T2D

INTERCEPT-T2D project on‌ cordis.europa.eu

Title:
Early Interception of Inflammatory-mediated Type 2‌ Diabetes
Duration:
From January 1, 2023 to December‌ 31, 2027
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN‌ INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- UNIVERSITA DEGLI STUDI‌ DI VERONA (UNIVR), Italy
- INSTITUT NATIONAL DE LA‌ SANTE ET DE LA RECHERCHE MEDICALE (INSERM), France‌
- UNIVERSITAT BASEL, Switzerland
- ASSISTANCE PUBLIQUE HOPITAUX DE PARIS, France
- DEUTSCHE DIABETES FORSCHUNGSGESELLSCHAFT‌ EV (DDFG), Germany
- FEDERATION‌ FRANCAISE DES DIABETIQUES, France‌‌
- INSERM TRANSFERT SA, France
- Olatec Therapeutics, BV (Olatec‌ Therapeutics, BV), Netherlands
- CENTRE‌ HOSPITALIER UNIVERSITAIRE DE LIEGE‌‌ (CHUL), Belgium
- UNIVERSITE DE LA REUNION (UR), France‌
- KAROLINSKA INSTITUTET (KAROLINSKA INSTITUTE),‌ Sweden
- UNIVERSITATSSPITAL BASEL (KANTONSSPITAL‌‌ BASEL), Switzerland
- TECHNISCHE UNIVERSITAET DRESDEN (TUD), Germany
Inria‌ contact:
Gael Varoquaux
Coordinator:‌
Summary:

The overall concept‌‌ of INTERCEPT-T2D is to establish whether an inflammatory-mediated‌ profile contributes to the‌ onset of Type 2‌‌ Diabetes (T2D) complications, thus enabling the identification of‌ patients most at risk‌ of complications and the‌‌ design of personalized prevention measures.

T2D is a‌ heterogeneous disease, which is‌ an obstacle to the‌‌ delivery of an optimal tailored treatment. Consequently, patients’‌ individual trajectories of progressive‌ hyperglycemia and risk of‌‌ chronic complications are so far difficult to predict.‌ In this context, onset‌ of diabetic complications represents‌‌ the most important transitional phase of T2D development‌ toward premature disability and‌ mortality.

Chronic systemic inflammation‌‌ has been suggested to be a major contributor‌ to the onset and‌ progression of T2D complications.‌‌ INTERCEPT-T2D will bring a new and clinically relevant‌ dimension in T2D care‌ considering at diagnosis inflammatory‌‌ parameters that are of importance for the transition‌ to T2D-related complications. The‌ combination of state-of-the-art genomics‌‌ and cell-biology technologies with targeted clinical interventions should‌ lead to potent patients’‌ stratification. It should allow‌‌ the identification and prognosis of a novel class‌ or subclass of patients‌ characterized by an “Inflammatory-mediated‌‌ T2D” endotype.

The project has access to the‌ best-documented longitudinal human European‌ cohorts of patients with‌‌ T2D, with reliable clinical and biological data allowing‌ to trace the transition‌ and evolution towards organ‌‌ complications. This, added to the exploitation of an‌ extensive health data warehouse,‌ will enable us to‌‌ establish the inflammatory trajectory of citizens with T2D‌ from diagnosis to the‌ development of complications.

To‌‌ explore the ability to prevent the transition phase‌ of T2D towards organ‌ complications, INTERCEPT-T2D will conduct‌‌ a phase II clinical trial with an anti-inflammatory‌ therapy targeting NLRP3 Inflammasome‌ activity in patients with‌‌ T2D.

RECeSS

RECeSS project on cordis.europa.eu

Title:
Robust‌ Explainable Controllable Standard for‌ drug Screening
Duration:
From‌‌ May 1, 2023 to April 30, 2025
Partners:‌
- INSTITUT NATIONAL DE RECHERCHE‌ EN INFORMATIQUE ET AUTOMATIQUE‌‌ (INRIA), France
- UNIVERSITAET ROSTOCK (UROS), Germany
Inria contact:‌
Jill-Jênn Vie
Coordinator:
Summary:‌

In 2021, drug development pipelines last 10 years on average and cost around $2 billion, while facing high failure rates, as only around 10% of Phase 0 drug candidates reach the commercialization stage. These issues can be mitigated through drug repurposing, where existing compounds are systematically screened for new therapeutic indications. Collaborative filtering is a semi-supervised learning framework that leverages known drug-disease matchings to make novel recommendations. However, prior works cannot be leveraged because of their lack of focus on human oversight and robustness to biological data.

This project aims at bridging‌ the gap between drug‌ research and collaborative filtering‌‌ by implementing a RECeSS‌ classifier, that is

(1) Robust: deals with class‌ imbalance in drug-disease matchings, and missing drug/disease features,‌ by semi-supervised learning;

(2) Explainable: connects predicted matchings‌ to perturbed biological pathways through enrichment analyses, based‌ on the learnt importance of features in the‌ model;

(3) Controllable: guarantees a bound on the‌ false positive rate using an adaptive learning scheme;‌

(4) Standard: algorithms are trained and tested by‌ a standardized open-source pipeline.

Predicted matchings will be‌ independently validated by structure-based methods. This innovative interdisciplinary‌ project relies on a solid basis of newly‌ curated data (up to 1,386 drugs, 1,599 diseases,‌ 12 feature types). It is primarily supervised by‌ Pr. Olaf Wolkenhauer, at SBI Rostock, whose team‌ has an expertise in drug repurposing, in systems‌ biology and data imbalance in machine learning. This‌ project will help the fellow develop new skills,‌ and enhance her professional maturity in academia.

In‌ the short term, this would yield the first‌ method that fully integrates biological interpretation and risk‌ assessment to collaborative filtering-based repurposing. Long-term outcomes might‌ help define sustainable and transparent drug development for‌ rare diseases.

10.4 National initiatives

PEPR Santé Numérique‌

Soda is part of the “PEPR Santé Numérique” in the SMATCH subgroup that focuses on evidence of clinical efficacy. Soda will address two questions. The first question, addressed in collaboration with the PreMedical team, is that of external validity of randomized trials: how much is the treatment effect measured in a randomized clinical trial affected by the sampling bias of the trial, that is, the difference between the study population and the intended target population? The second question, addressed in collaboration with the Heka team, is that of defining guidelines to evaluate software as a medical device. One particular challenge that we will tackle is to give procedures and recommendations to evaluate an update to a software used in clinical decision making using historical data rather than a trial. The project started at the end of 2023. Gaël Varoquaux is in charge at Soda, and Judith Abecassis is also supervising.
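The external-validity question can be illustrated with a small worked example. The sketch below uses post-stratification, one simple way of re-weighting a trial towards a target population; it is not presented as the method of the project. All numbers and names are hypothetical, and the calculation assumes that the effect within each stratum is the same in the trial and in the target population.

```python
def weighted_effect(stratum_effects, stratum_shares):
    """Average the per-stratum effects using the given population shares."""
    assert abs(sum(stratum_shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(stratum_effects[s] * stratum_shares[s] for s in stratum_effects)


# Hypothetical numbers, for illustration only.
effects_in_trial = {"under_65": 2.0, "65_and_over": 6.0}  # effect per stratum
trial_shares = {"under_65": 0.8, "65_and_over": 0.2}      # who was enrolled
target_shares = {"under_65": 0.4, "65_and_over": 0.6}     # who would be treated

trial_ate = weighted_effect(effects_in_trial, trial_shares)
target_ate = weighted_effect(effects_in_trial, target_shares)
print(f"effect in the trial population:  {trial_ate:.1f}")
print(f"effect in the target population: {target_ate:.1f}")
```

When the treatment effect varies across strata and the trial over-represents some of them, the effect measured in the trial differs from the effect expected in the target population, even though the trial itself is unbiased for its own sample.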

Project Partages

“Partages” is a large project funded by BPI France to develop digital commons for medical text analysis. In particular, the project will create material suitable for fine-tuning or aligning language models to perform best on French medical texts. Beyond the medical terms, clinical texts raise specific challenges: they often result from scanning notes that have been taken fast, full of context-specific abbreviations and typos. The role of Soda is to design data-augmentation routines that help make language models robust to these challenges. The project started at the end of 2024. Gaël Varoquaux is in charge at Soda, and Judith Abecassis is also supervising.

ANR StatQA

Marine Le Morvan‌ obtained an ANR JCJC (2025-2029, 305 k€). LLMs‌ provide unprecedented access to information, but their statistical‌ reasoning abilities remain limited. We introduce the concept‌ of Statistical Question Answering (StatQA) to designate their‌ capacity to address quantitative, non-deterministic questions with calibrated uncertainty. Our objectives are‌ twofold: first, to assess‌ the statistical soundness of‌‌ LLMs’ responses using institutional datasets (INSEE, Eurostat); second,‌ to develop multimodal approaches‌ that integrate tabular models‌‌ with natural language. This work aims to enhance‌ the reliability and precision‌ of LLM outputs.

ANR‌‌ TaFoMo

Gaël Varoquaux obtained an ANR PRCE (2025-2029, 438 k€, partners Fabian Suchanek at Telecom Paris and Antoine Neuraz at Stane Group). The goal is to create Table Foundation Models, pre-trained on large collections of tables, embedding rich knowledge for subsequent machine-learning tasks. The project involves three axes: 1) developing new architectures that handle different data types and multiple tables, 2) pre-training models with diverse large data from sources like Wikidata and DBpedia, drawing on the PIs' expertise in databases and knowledge graphs, and 3) rigorously evaluating models across tasks, including health applications, to confirm their practical value and robustness to data variations.

ANR‌ ICPC

Jill-Jênn Vie obtained an ANR JCJC (2025-2029, 238 k€). The goal is to develop an assistant that helps students learn programming by solving algorithmic problems, as in coding contests. We plan to generate hints without revealing the solution, while exploring automatic test case generation to break an incorrect solution, to encourage robustness. We also plan to generate or recommend exercises within the zone of proximal development to keep students engaged. The project will feature actual experiments of the developed systems in classes, for example in high school.

10.5 Public policy‌‌ support

Conseil Scientifique CNIL

Gaël Varoquaux is a‌ scientific expert at the‌ scientific committee of CNIL,‌‌ the French data protection authority.

11 Dissemination

11.1‌ Promoting scientific activities

11.1.1‌ Scientific events: organisation

Member‌‌ of the organizing committees

Julie Alberge

NeurIPS in‌ Paris organizing committee

11.1.2‌ Scientific events: selection

Member‌‌ of the conference program committees

Gaël Varoquaux

AAAI‌ 2026 Conference Senior Program‌ Committee
ICLR 2026 Conference‌‌ Area Chairs
NeurIPS 2025 Conference Senior Area Chairs‌
ICML 2025 Meta reviewer‌

Jill-Jênn Vie

EDM 2025‌‌ Conference Senior Program Committee

Reviewer

Gaël Varoquaux

AISTATS‌ 2026 Reviewer
NeurIPS 2025‌ Workshop Reviewer

David Holzmüller‌‌

NeurIPS 2025 Reviewer
ICML 2025 Workshop Reviewer

Jill-Jênn‌ Vie

ICLR 2025 and‌ 2026 Reviewer

Judith Abécassis‌‌

ICML 2025 Reviewer
ICLR 2026 Reviewer
AISTATS 2026 Reviewer
NeurIPS 2025 Datasets‌ and Benchmarks Track Reviewer‌‌

Marine Le Morvan

ICML 2025 Reviewer
ICLR 2026‌ Reviewer

11.1.3 Journal

Member‌ of the editorial boards‌‌

Jill-Jênn Vie

STICEF – Cadre d'usage et de‌ fonctionnement des IA génératives‌ (IAG) en éducation

Reviewer‌‌ - reviewing activities

Judith Abécassis

TMLR Reviewer
special‌ issue TAL 66(2) Reviewer‌ (Traitement automatique des langues)‌‌
The Annals of Applied Statistics Reviewer

11.1.4 Invited‌ talks

Gaël Varoquaux

Académie Royale de Médecine, Belgique, day on AI and health, Brussels
ELLIS-Helmholtz workshop on Foundation models for science, Berlin
End-to-end data processing workshop, SIGMOD, Berlin
Entente CordIAle Franco-English‌ meetings, Cambridge
Journée de‌ la santé de Santé,‌‌ APHP, Paris
Indaba Chad, N'Djamena Chad (remote)
Isaac‌ Newton Institute, Cambridge
Criteo AI ethics days, Criteo, Paris
Dagstuhl workshop on‌ Table Representation Learning, Dagstuhl, Germany
Dali workshop, Sorrento,‌ Italy
Python Exchange, remote
ESMRMB keynote, Marseilles, France‌
EurIPS keynote, Copenhagen, Denmark
EurIPS benchmarking keynote, Copenhagen,‌ Denmark
Congrès de la Société Française de Physique,‌ Troyes, France
NeurIPS in Paris keynote, Paris
Séminaire‌ Owkin
Polish academy day on AI in science,‌ Paris
P16 annual days, Paris
PyData London Meeting,‌ London
Telecom Student association, Saclay, France
Teratec annual‌ event keynote, Paris
Journées de la SFDS sur‌ l'incertitude, Paris
VLDB panel on tabular foundation models,‌ London

David Holzmüller

Group seminar, University of Amsterdam,‌ Amsterdam, Netherlands
Tabular foundation models workshop, Freiburg, Germany‌
AutoML School 2025, Tübingen, Germany
PriorLabs reading group,‌ remote
Group seminar, RWTH Aachen, Aachen, Germany
Group‌ seminar, LMU München, Munich, Germany

Jill-Jênn Vie

A‌ Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational‌ Documents, Kyoto University, Japan, January 29, 2025
Efficiency‌ and environmental impact of LLMs, Inria Foresight Seminar,‌ Rungis, March 26, 2025
A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents, IRIT, Toulouse, July 7, 2025
Optimal Training Difficulty for Optimizing‌ Learning Outcomes, Saclay PhD students day in STIC,‌ Télécom Paris, Palaiseau, October 2, 2025

Judith Abécassis‌

VITE2025: Explainability for high-dimensional statistics, Montpellier
Group‌ Seminar, iPLesp, Paris
Medical interns' seminar in Neurology‌ at Lariboisière hospital, Paris
Introduction to AI with‌ EHR for anesthesia and intensive care residents, Paris‌
Paris Health AI Workshop, Paris

Marine Le Morvan‌

Keynote at EurIPS'25 Workshop on AI for Tabular‌ Data, Copenhagen, Denmark, December 2025
Keynote at Junior‌ Conference on Data Sciences and Engineering, Paris, France,‌ September 2025
Probabilities and statistics seminar, Laboratoire de‌ Mathématiques d’Orsay, France, June 2025
Table Representation Learning‌ (TRL) seminar, ELLIS Unit Amsterdam, Netherlands, April 2025‌

11.1.5 Leadership within the scientific community

Gaël Varoquaux‌

Expert on the International AI Safety Report 2025‌

11.1.6 Scientific expertise

Gaël Varoquaux

Reviewer for the‌ general funding call at ANR (AAPG)

Jill-Jênn Vie‌

Organisation internationale de la francophonie

Judith Abécassis

Reviewer‌ for the Messidore AAP (Inserm)

11.2 Teaching -‌ Supervision - Juries - Educational and pedagogical outreach‌

Courses

Gaël Varoquaux
- Preparing tabular data for machine‌ learning, tutorial, EU ADS summer school, 3h, Luxembourg‌
- Health AI summer school, Paris, France, 30 min
Marine Le Morvan
- APM_53441_EP - From Boosting to‌ Foundation Models: learning with Tabular Data, Ecole Polytechnique‌ (Master 2), 30h
- APM_51438_EP - Refresher Course in‌ Artificial Intelligence, Ecole Polytechnique (Master 1), 15h
- Learning‌ with missing values, Dauphine executive master, 6h
Jill-Jênn‌ Vie
- Deep Learning, ENS Paris, 27 h
- CSC_41M02_EP‌ Algorithms and Advanced Programming (ICPC training), École polytechnique,‌ 18 h
- SWERC training, ENS Paris-Saclay, 30 h‌ éq. TD
- Tabular Deep Learning, Institut Polytechnique de‌ Paris, 1 h
- Computer Vision, Ecole polytechnique, 45‌ h
Judith Abécassis
- Causal Inference DS-UA 9201, NYU‌ Paris, Spring 2025, 56h
- AI for Healthcare, Centrale‌ Supelec and Essec M2 (Data Sciences & Business‌ Analytics), 24h

11.2.1 Supervision

Gaël Varoquaux

PhD supervision‌
- Jovan Stojanovic (50%), co-supervised with Margherita Comola (Paris‌ School of Economics)
- Julie Alberge (30%), co-supervised with Judith Abecassis (Soda, Inria)‌
- Sebastien Melo (30%), co-supervised‌ with Marine Le Morvan‌‌ (Soda, Inria)
- Celestin Eve (50%), co-supervised with Thomas‌ Moreau (Mind, Inria)
- Meilame‌ Tayebjee (50%), co-supervised with‌‌ Guillaume Lecué (ENSAE)
- Félix Lefebvre
- Emma Cussenot, since‌ December 2025 (25%), co-supervised‌ with Judith Abecassis (Soda,‌‌ Inria) and Louis Potier (AP-HP, Université Paris-Cité)
- Gioia Blayer, since November 2025 (70%), co-supervised with Marine Le Morvan (Soda, Inria)
Internships
- Emma Cussenot (50%), co-supervised with Judith Abecassis (Soda, Inria)
- Dan Suissa (30%), co-supervised with Judith Abecassis (Soda, Inria)

Jill-Jênn‌ Vie

PhD supervision
- Jean‌ Vassoyan (33%), co-supervised with‌‌ Nicolas Vayatis
- Samuel Girard (33%), co-supervised with Amel‌ Bouzeghoub
- Marie Generali-Lince (33%),‌ co-supervised with Patrick Loiseau‌‌ (FairPlay) and Solenne Gaucher (École polytechnique)
Internships
- Anav‌ Agrawal (L2), IIT Delhi‌

Judith Abécassis

PhD supervision‌‌
- Julie Alberge (30%), co-supervised with Gaël Varoquaux (Soda,‌ Inria)
- Emma Cussenot, since‌ December 2025 (25%), co-supervised‌‌ with Louis Potier (AP-HP, Université Paris-Cité) and Gaël‌ Varoquaux (Soda, Inria)
- Thaïs‌ Walter, since September 2025‌‌ (50%), co-supervised with Jean-Damien Ricard (AP-HP, Paris University)‌
Internships
- Emma Cussenot (50%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Dan Suissa (70%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Guillaume‌ Bertho (33%), co-supervised with‌‌ Adrien Coulet (HeKA, Inria) and Eric Jouvent (AP-HP,‌ Université Paris-Cité)

Marine Le‌ Morvan

PhD supervision
- Sebastien‌‌ Melo (70%), co-supervised with Gaël Varoquaux (Soda, Inria)‌
- Gioia Blayer (10%), co-supervised‌ with Gaël Varoquaux (Soda,‌‌ Inria)
Internships
- Vlada Voronina (70%), co-supervised with Oana‌ Balalau (Cedar, INRIA)

11.2.2‌ Juries

Gaël Varoquaux

PhD‌‌ and HDR jury
- PhD Committee of Elena Albu,‌ KU Leuven, Belgium
- PhD‌ Committee of Arnaud Delaunoy,‌‌ Université de Liège, Belgium
- PhD Committee of Nicolas‌ Hiebel, LISN Saclay, France‌
- PhD Committee of Lawrence‌‌ Steward, Inria Paris, France
- PhD Committee of Charbel‌ Kindji, Inria Rennes, France‌
- HDR Committee of Cedric‌‌ Gouypailler, CEA, France
Jury of the DataE grants‌ from Ministère de la‌ Santé

Jill-Jênn Vie

PhD‌‌ midway committee
- Loris Gaven, Inria Bordeaux, France
- Badmavasan‌ Kirouchenassamy, Sorbonne University, France‌
- Anass El-Ayady, Université de‌‌ Lorraine, France
Jury of agrégation d'informatique
Jury of‌ École polytechnique entrance examinations‌

Judith Abécassis

PhD midway‌‌ committee
- Wassila Khatir, Université Côte d'Azur, France
- Ala‌ Eddine Boudemia, Sorbonne University,‌ France
PhD jury
- PhD‌‌ Committee of Yannis Lombardi (as examinatrice), Sorbonne University,‌ France

Marine Le Morvan‌

Jury for Associate Professor position in Statistics and Machine Learning, Sorbonne Université (Jussieu).

11.2.3 Educational and‌ pedagogical outreach

Gaël Varoquaux‌‌

Chroniqueur Les Échos: every 3 months, a short‌ article for the general‌ public around an AI‌‌ topic
Talk on AI at the “amicale du‌ corps des mines”
Panel‌ on AI and health‌‌ at the AI action summit in Grand Palais‌

Jill-Jênn Vie

Risques et opportunités de l'IA en éducation, teacher training, École supérieure d'ingénieurs Léonard de Vinci, Courbevoie, April 10, 2025
Research in‌‌ personalized education, teaching competitive programming, ENS Paris-Saclay, Gif-sur-Yvette,‌ April 11, 2025
Intelligence‌ artificielle, Algorithmique et‌‌ programmation, CIRM (50 prep school teachers in computer‌ science), Marseille, May 8,‌ 2025
Un système de‌‌ recommandation de problèmes d'algo‌ pour préparer Prologin et ICPC, Finals of Prologin‌ Programming Contest 2025 (100 students under 20 years‌ old), Le Kremlin-Bicêtre, May 31, 2025
Systèmes de‌ recommandation industriels et LLM pour la recommandation, Online‌ Pix Webinar, June 17, 2025
Apprendre à l'heure‌ de l'IA, Centre Teilhard de Chardin, Orsay‌ and online, November 27, 2025

11.3 Popularization

11.3.1‌ Participation in Live events

Gaël Varoquaux

Talk on AI at an event for IT professionals in Lyon (Generation IA, ADIRA)
Talk on AI and health at a general-public event organized in Antony
Talk on tabular AI at the dotAI tech‌ conference Antony
Talk at BNP Paribas's data and‌ AI annual event

Judith Abécassis

Public recording of an episode of the podcast "Nouvelles Héroïnes" at Inria Saclay, for the "Les Rendez-vous des Jeunes Mathématiciennes et Informaticiennes (RJMI)" days

11.3.2 Other science outreach relevant activities

Judith Abécassis

Organization of Inria Women‌ Lunches at Inria Saclay

12 Scientific production

12.1‌ Major publications

1 Ben van Calster, Gary S Collins, Andrew J Vickers, Laure Wynants, Kathleen F Kerr, Lasai Barreñada, Gael Varoquaux, Karandeep Singh, Karel Gm Moons, Tina Hernandez-Boussard, Dirk Timmerman, David J Mclernon, Maarten van Smeden and Ewout W Steyerberg. Evaluation of performance measures in predictive artificial intelligence models to support medical decisions: overview and guidance. The Lancet Digital Health, December 2025, 100916.
2 Riccardo Cappuzzo, Aimee Coelho, Félix Lefebvre, Paolo Papotti and Gaël Varoquaux. Retrieve, Merge, Predict: Augmenting Tables with Data Lakes. Transactions on Machine Learning Research Journal, May 2025.
3 Matthieu Doutreligne, Tristan Struja, Judith Abecassis, Claire Morgand, Leo Anthony Celi and Gaël Varoquaux. Step-by-step causal analysis of EHRs to ground decision-making. PLOS Digital Health 4(2), February 2025, e0000721.
4 Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas and Frank Hutter. TabArena: A Living Benchmark for Machine Learning on Tabular Data. The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, San Diego, United States, 2025.
5 Marine Le Morvan and Gaël Varoquaux. Imputation for prediction: beware of diminishing returns. ICLR 2025 - International Conference on Learning Representations, Singapore, Singapore, April 2025.
6 Pauline Martinot, Bénédicte Colnet, T. Breda, J. Sultan, L. Touitou, P. Huguet, Elizabeth Spelke, G. Dehaene-Lambertz, P. Bressoux and Stanislas Dehaene. Rapid emergence of a maths gender gap in first grade. Nature 643(8073), June 2025, 1020-1029.
7 Jingang Qu, David Holzmüller, Gaël Varoquaux and Marine Le Morvan. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. ICML 2025 - 42nd International Conference on Machine Learning, Vancouver, Canada, July 2025.
8 Gaël Varoquaux, Alexandra Sasha Luccioni and Meredith Whittaker. Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI. FAccT 2025 - ACM Conference on Fairness, Accountability, and Transparency, Athens, Greece, July 2025.
9 Houssam Zenati, Judith Abécassis, Julie Josse and Bertrand Thirion. Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments. Proceedings of Machine Learning Research (PMLR), AISTATS - 28th International Conference on Artificial Intelligence and Statistics, Mai Khao, Thailand, May 2025.

12.2 Publications of the year

International journals

10 article Judith Abécassis, Élise Dumas, Julie Alberge and Gaël Varoquaux. From prediction to prescription: Machine learning and Causal Inference. Annual Review of Biomedical Data Science, April 2025. HAL DOI
11 article Rishi Bommasani, Sanjeev Arora, Jennifer Chayes, Yejin Choi, Mariano-Florentino Cuéllar, Li Fei-Fei, Daniel E Ho, Dan Jurafsky, Sanmi Koyejo, Hima Lakkaraju, Arvind Narayanan, Alondra Nelson, Emma Pierson, Joelle Pineau, Scott Singer, Gaël Varoquaux, Suresh Venkatasubramanian, Ion Stoica, Percy Liang and Dawn Song. Advancing science- and evidence-based AI policy: Policy must be informed by, but also facilitate the generation of, scientific evidence. Science 389(6759), July 2025, 459-461. HAL DOI
12 article Ben van Calster, Gary S Collins, Andrew J Vickers, Laure Wynants, Kathleen F Kerr, Lasai Barreñada, Gaël Varoquaux, Karandeep Singh, Karel GM Moons, Tina Hernandez-Boussard, Dirk Timmerman, David J McLernon, Maarten van Smeden and Ewout W Steyerberg. Evaluation of performance measures in predictive artificial intelligence models to support medical decisions: overview and guidance. The Lancet Digital Health, December 2025, 100916. HAL DOI
13 article Riccardo Cappuzzo, Aimee Coelho, Félix Lefebvre, Paolo Papotti and Gaël Varoquaux. Retrieve, Merge, Predict: Augmenting Tables with Data Lakes. Transactions on Machine Learning Research Journal, May 2025. HAL
14 article Sylvie Delacroix, Diana Robinson, Umang Bhatt, Jacopo Domenicucci, Jessica Montgomery, Gaël Varoquaux, Carl Henrik Ek, Vincent Fortuin, Yulan He, Tom Diethe, Neill Campbell, Mennatallah El-Assady, Søren Hauberg, Ivana Dusparic and Neil D Lawrence. Beyond Quantification: Navigating Uncertainty in Professional AI Systems. RSS: Data Science and Artificial Intelligence 1(1), January 2025. HAL DOI
15 article Matthieu Doutreligne, Tristan Struja, Judith Abécassis, Claire Morgand, Leo Anthony Celi and Gaël Varoquaux. Step-by-step causal analysis of EHRs to ground decision-making. PLOS Digital Health 4(2), February 2025, e0000721. HAL DOI
16 article Matthieu Doutreligne and Gaël Varoquaux. How to select predictive models for decision making or causal inference? GigaScience 14, January 2025. HAL DOI
17 article Rosana El Jurdi, Gaël Varoquaux and Olivier Colliot. Confidence intervals for performance estimates in brain MRI segmentation. Medical Image Analysis 103, July 2025, 103565. HAL DOI
18 article Stéphan-Eloïse Gras and Gaël Varoquaux. Connaître avec les modèles de langage : une rupture paradigmatique. Intellectica - La revue de l’Association pour la Recherche sur les sciences de la Cognition (ARCo) 81, February 2025, 85-110. HAL
19 article Anne-Sophie Hamy, Agathe Chabassier, Clara Sebbag, Christine Rousset-Jablonski, Clémentine Berkach, Isabelle Ray-Coquard, Laura Sablone, Lauren Darrigues, Elise Dumas, Angélique Bobrie, William Jacot, Marc Espié, Sylvie Giacchetti, Floriane Jochum, Aullène Toussaint, Geneviève Plu-Bureau, Lorraine Maitrot-Mantelet, Anne Gompel, Paul Gougis, Raphaëlle Bas, Christine Decanter, Bernard Asselain, Charles Coutant, Lili Sohn, Guillemette Jacob, Claire Saule, Sophie Frank, Judith Abécassis, Florence Coussy, Fabien Reyal and Seintinelles Research Network. Time-to-pregnancy in patients with previous breast cancer and unexposed women: a prospective exposed-unexposed cohort study. EClinicalMedicine 86, July 2025, 103392. HAL DOI
20 article Jean-Baptiste Julla, Théo Jolivet, Candice Estellat, Gaël Varoquaux, Aurélie Carlier, Jean-François Gautier, Julie Alberge, Yawa Abouleka, Audrey Bergès, Elise Liu, Judith Abécassis, Florence Tubach and Louis Potier. Incidence of death and amputation in patients with a first diabetic foot ulcer: results from the CODIA cohort. Diabetes & Metabolism 51(6), September 2025, 101700. HAL DOI
21 article Marimuthu Kalimuthu, David Holzmüller and Mathias Niepert. LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators. Transactions on Machine Learning Research Journal, 2025. HAL DOI
22 article Erva Nihan Kandemir, Jill-Jênn Vie, Adam Sanchez-Ayte, Olivier Palombi and Franck Ramus. Investigating the Influence of Training Difficulty on the Learning Outcomes of Medical Students. Journal of Computer Assisted Learning 42(1), February 2026. HAL DOI
23 article Myung Jun Kim, Félix Lefebvre, Gaëtan Brison, Alexandre Perez-Lebel and Gaël Varoquaux. Table Foundation Models: on knowledge pre-training for tabular learning. Transactions on Machine Learning Research Journal, August 2025. HAL
24 article Pauline Martinot, Bénédicte Colnet, T. Breda, J. Sultan, L. Touitou, P. Huguet, Elizabeth Spelke, G. Dehaene-Lambertz, P. Bressoux and Stanislas Dehaene. Rapid emergence of a maths gender gap in first grade. Nature 643(8073), June 2025, 1020-1029. HAL DOI
25 article Clémence Réda, Jill-Jênn Vie and Olaf Wolkenhauer. Comprehensive evaluation of pure and hybrid collaborative filtering in drug repurposing. Scientific Reports 15(2711), January 2025, 2711. HAL DOI
26 article Clémence Réda, Jill-Jênn Vie and Olaf Wolkenhauer. Joint Embedding-Classifier Learning for Interpretable Collaborative Filtering. BMC Bioinformatics 26(1), January 2025, 26. HAL DOI
27 article Zacchary Sadeddine, Winston Maxwell, Gaël Varoquaux and Fabian M. Suchanek. Large Language Models as Search Engines: Societal Challenges. SIGIR Forum, 2025. HAL
28 article Taehwan Yun, Myung Jun Kim and Hyunjung Shin. Extremely fast graph integration for semi-supervised learning via Gaussian fields with Neumann approximation. Pattern Recognition 164, August 2025, 111495. HAL DOI

International peer-reviewed conferences

29 inproceedings Judith Abécassis, Houssam Zenati, Sami Boumaïza, Julie Josse and Bertrand Thirion. CO11.2 - Explorer les fonctions cognitives dans UK Biobank avec une analyse de médiation causale. EPICLIN 2025 - Conférence francophone d’EPIdémiologie CLINique 73, Bordeaux, France, May 2025, 203025. HAL DOI
30 inproceedings Anav Agrawal and Jill-Jênn Vie. AlgoAce: Retrieval-Augmented Generation for Assistance in Competitive Programming. Proceedings of 9th Educational Data Mining in Computer Science Education Workshop (CSEDM 2025), CSEDM 2025 - 9th Educational Data Mining in Computer Science Education Workshop, Palermo, Italy, July 2025. HAL DOI
31 inproceedings Julie Alberge, Vincent Maladière, Olivier Grisel, Judith Abécassis and Gaël Varoquaux. Survival Models: Proper Scoring Rule and Stochastic Optimization with Competing Risks. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025, Mai Khao, Thailand. PMLR: Volume 258. AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics, Phuket, Thailand, May 2025. HAL
32 inproceedings Pascaline André, Charles Heitz, Evangelia Christodoulou, Annika Reinke, Carole H Sudre, Michela Antonelli, M Jorge Cardoso, Antoine Gilson, Sophie Tezenas Du Montcel, Gaël Varoquaux, Lena Maier-Hein and Olivier Colliot. Some hidden traps of confidence intervals in medical image segmentation: coverage issues. Lecture Notes in Computer Science, BRIDGE 2025 - MICCAI Workshop Bridging Regulatory Science and Medical Imaging Evaluation, MICCAI Workshops, Daejeon, South Korea, September 2025. HAL
33 inproceedings Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan and Mikhail Belkin. xRFM: Accurate, scalable, and interpretable feature learning models for tabular data. AITD 2025 - Workshop on AI for Tabular Data, Copenhagen, Denmark, 2025. HAL DOI
34 inproceedings Samuel Girard, Juan D Pinto, Jill-Jênn Vie and Amel Bouzeghoub. RegKT: Interpretable and Robust Deep Knowledge Tracing With IRT-Regularizer. 2nd Human-Centric eXplainable AI in Education (HEXED) Workshop, Palermo, Italy, July 2025. HAL
35 inproceedings Carole Ibrahim, Hiba Bederina, Cuesta Daniel, Montier Laurent, Delabre Cyrille and Jill-Jênn Vie. Diversified recommendations of cultural activities with personalized determinantal point processes. RecSoGood 2025 - Second International Workshop on Recommender Systems for Sustainability and Social Good, Prague, Czech Republic, September 2025. HAL
36 inproceedings Marine Le Morvan and Gaël Varoquaux. Imputation for prediction: beware of diminishing returns. International Conference on Learning Representations, ICLR 2025 - International Conference on Learning Representations, Singapore, Singapore, April 2025. HAL
37 inproceedings Félix Lefebvre and Gaël Varoquaux. Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning. Neural Information Processing Systems, NeurIPS 2025 - 39th Annual Conference on Neural Information Processing Systems, San Diego (California), United States, December 2025. HAL
38 inproceedings Alexandre Perez-Lebel, Gaël Varoquaux, Sanmi Koyejo, Matthieu Doutreligne and Marine Le Morvan. Decision from Suboptimal Classifiers: Excess Risk Pre- and Post-Calibration. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025, Mai Khao, Thailand. PMLR: Volume 258. AISTATS 2025 - the 28th International Conference on Artificial Intelligence and Statistics, Mai Khao, Thailand, May 2025. HAL
39 inproceedings Roman Plaud, Alexandre Perez-Lebel, Matthieu Labeau, Antoine Saillenfest and Thomas Bonald. To Each Metric Its Decoding: Post-Hoc Optimal Decision Rules of Probabilistic Hierarchical Classifiers. ICML 2025 - 42nd International Conference on Machine Learning, Vancouver (CA), Canada, July 2025. HAL
40 inproceedings Jingang Qu, David Holzmüller, Gaël Varoquaux and Marine Le Morvan. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. ICML 2025 - 42nd International Conference on Machine Learning, Vancouver, Canada, July 2025. HAL
41 inproceedings Moreno Ursino, Sandrine Boulet, Corinne Collignon, Florence Francis-Oliviero, Edouard Lhomme, Raphaël Porcher, Florence Saillour, Gaël Varoquaux, Vincent Vercamer, Rodolphe Thiébaut and Sarah Zohar. Innovative clinical trial approach for evaluating digital medical devices under European HTA fast-Track frameworks. ISCB 2025 - 46th Annual Conference of the International Society for Clinical Biostatistics, Basel, Switzerland, August 2025. HAL
42 inproceedings Gaël Varoquaux, Alexandra Sasha Luccioni and Meredith Whittaker. Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI. FAccT 2025 - ACM Conference on Fairness, Accountability, and Transparency, Athens, Greece, July 2025. HAL
43 inproceedings Houssam Zenati, Judith Abécassis, Julie Josse and Bertrand Thirion. Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments. Proceedings of Machine Learning Research, AISTATS - 28th International Conference on Artificial Intelligence and Statistics, PMLR-, Mai Khao, Thailand, May 2025. HAL

Conferences without proceedings

44 inproceedings Sandrine Boulet, Vincent Thévenet, Judith Abécassis, Corinne Collignon, Bruno Giraudeau, Edouard Lhomme, Francois Petit, Raphaël Porcher, Laura Richert, Florence Saillour, Moreno Ursino, Gaël Varoquaux, Vincent Vercamer, Sarah Zohar and Rodolphe Thiébaut. Narrative review on the clinical evaluation of AI-based digital medical devices from a methodological perspective. PEPR-SanteNum 2025 - Journées Annuelles du PEPR Santé Numérique, Lille, France, October 2025. HAL
45 inproceedings Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas and Frank Hutter. TabArena: A Living Benchmark for Machine Learning on Tabular Data. NeurIPS 2025 - The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, San Diego, United States, 2025. HAL
46 inproceedings Félix Lefebvre, Myung Jun Kim and Gaël Varoquaux. Knowledge-Rich Embeddings for Tabular Learning. AITD 2025 - Workshop on AI for Tabular Data, Copenhagen, Denmark, December 2025. HAL
47 inproceedings Daniel Musekamp, Marimuthu Kalimuthu, David Holzmüller, Makoto Takamoto and Mathias Niepert. Active Learning for Neural PDE Solvers. ICLR 2025 - The Thirteenth International Conference on Learning Representations, Singapore, Singapore, 2025. HAL DOI

Reports & preprints

48 misc Judith Abécassis, Houssam Zenati, Sami Boumaïza, Julie Josse and Bertrand Thirion. Causal mediation analysis with one or multiple mediators: a comparative study. May 2025. HAL
49 report Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, Anson Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa Okolo, Deborah Raji, Girish Sastry, Elizabeth Seger, Theodora Skeadas, Tobin South, Emma Strubell, Florian Tramèr, Lucia Velasco, Nicole Wheeler, Daron Acemoglu, Olubayo Adekanmbi, David Dalrymple, Thomas Dietterich, Edward Felten, Pascale Fung, Pierre-Olivier Gourinchas, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Andreas Krause, Susan Leavy, Percy Liang, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Alice Oh, Gopal Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Dawn Song, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Fahad Albalawi, Marwan Alserkal, Olubunmi Ajala, Guillaume Avrin, Christian Busch, André Carlos Ponce de Leon Ferreira de Carvalho, Bronwyn Fox, Amandeep Singh Gill, Ahmet Halit Hatip, Juha Heikkilä, Gill Jolly, Ziv Katzir, Hiroaki Kitano, Antonio Krüger, Chris Johnson, Saif Khan, Kyoung Mu Lee, Dominic Vincent Ligot, Oleksii Molchanovskyi, Andrea Monti, Nusu Mwamanzi, Mona Nemer, Nuria Oliver, José Ramón López Portillo, Balaraman Ravindran, Raquel Pezoa Rivera, Hammam Riza, Crystal Rugege, Ciarán Seoighe, Jerry Sheehan, Haroon Sheikh, Denise Wong and Yi Zeng. International AI Safety Report. AI Safety Institute, 2025. HAL
50 misc Clément Berenfeld, Ahmed Boughdiri, Bénédicte Colnet, Wouter A. C. van Amsterdam, Aurélien Bellet, Rémi Khellaf, Erwan Scornet and Julie Josse. Causal Meta-Analysis: Rethinking the Foundations of Evidence-Based Medicine. May 2025. HAL
51 misc Eugène Berta, David Holzmüller, Michael I. Jordan and Francis Bach. Rethinking Early Stopping: Refine, Then Calibrate. January 2025. HAL
52 misc Eugène Berta, David Holzmüller, Michael I. Jordan and Francis Bach. Structured Matrix Scaling for Multi-Class Calibration. November 2025. HAL
53 misc Rahul Bordoloi, Clémence Réda, Saptarshi Bej and Olaf Wolkenhauer. Handling Missing Data in Downstream Tasks With Distribution-Preserving Guarantees. 2025. HAL
54 misc Sacha Braun, David Holzmüller, Michael I. Jordan and Francis Bach. Conditional Coverage Diagnostics for Conformal Prediction. December 2025. HAL
55 misc Fateme Ghayem, Raphael Meudec, Jérôme Dockès, Bertrand Thirion and Demian Wassermann. NeuroConText: Contrastive learning for neuroscience meta-analysis with rich text representation. May 2025. HAL DOI
56 misc David Holzmüller and Max Schölpple. Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks. June 2025. HAL DOI
57 misc Sébastien Melo, Gaël Varoquaux and Marine Le Morvan. Epistemic Uncertainty Quantification to Improve Decisions From Black-Box Models. December 2025. HAL
58 misc Raphael Meudec, Jérôme Dockès, Fateme Ghayem, Demian Wassermann and Bertrand Thirion. Peaks2Image: reconstructing fMRI maps from stereotactic coordinates to enhance cognitive meta-analysis. August 2025. HAL

Other scientific publications

59 inproceedings Julie Alberge, Vincent Maladière, Olivier Grisel, Judith Abécassis and Gaël Varoquaux. Survival models: Proper scoring rule and stochastic optimization with competing risks. EPICLIN 2025 - conférence francophone d’ÉPIdémiologie CLINique 73, Bordeaux, France, May 2025, 203083. HAL DOI
60 inproceedings Sandrine Boulet, Vincent Thévenet, Judith Abécassis, Corinne Collignon, Bruno Giraudeau, Edouard Lhomme, Francois Petit, Raphaël Porcher, Laura Richert, Florence Saillour, Moreno Ursino, Gaël Varoquaux, Vincent Vercamer, Sarah Zohar and Rodolphe Thiébaut. Narrative review on the clinical evaluation of AI-based digital medical devices from a methodological perspective. SbN 2025 - 10th Statistics & Biopharmacy Conference, Paris, France, October 2025. HAL
61 inproceedings Sandrine Boulet, Vincent Thévenet, Judith Abécassis, Corinne Collignon, Bruno Giraudeau, Edouard Lhomme, Francois Petit, Raphaël Porcher, Laura Richert, Florence Saillour, Moreno Ursino, Gaël Varoquaux, Vincent Vercamer, Sarah Zohar and Rodolphe Thiébaut. Narrative review on the clinical evaluation of AI-based digital medical devices from a methodological perspective. PEPR-SanteNum 2025 - Journées Annuelles du PEPR Santé Numérique, Lille, France, October 2025. HAL
62 inproceedings Moreno Ursino, Sandrine Boulet, Corinne Collignon, Florence Francis-Oliviero, Edouard Lhomme, Raphaël Porcher, Florence Saillour, Gaël Varoquaux, Rodolphe Thiébaut and Sarah Zohar. Innovative Clinical Trial Approach for Evaluating Digital Medical Devices under European Fast-Track Regulatory Frameworks. EPICLIN 2025 - Conférence francophone d’EPIdémiologie CLINique, Bordeaux, France, May 2025. HAL
