2025 Activity report Project-Team SODA
RNSR: 202224249S
- Research center: Inria Saclay Centre
- Team name: Computational and mathematical methods to understand health and society with data
Creation of the Project-Team: 2022 March 01
Each year, Inria research teams publish an Activity Report presenting their work and results over the reporting period. These reports follow a common structure, with some optional sections depending on the specific team. They typically begin by outlining the overall objectives and research programme, including the main research themes, goals, and methodological approaches. They also describe the application domains targeted by the team, highlighting the scientific or societal contexts in which their work is situated.
The reports then present the highlights of the year, covering major scientific achievements, software developments, or teaching contributions. When relevant, they include sections on software, platforms, and open data, detailing the tools developed and how they are shared. A substantial part is dedicated to new results, where scientific contributions are described in detail, often with subsections specifying participants and associated keywords.
Finally, the Activity Report addresses funding, contracts, partnerships, and collaborations at various levels, from industrial agreements to international cooperations. It also covers dissemination and teaching activities, such as participation in scientific events, outreach, and supervision. The document concludes with a presentation of scientific production, including major publications and those produced during the year.
Keywords
Computer Science and Digital Science
- A3.3. Data and knowledge analysis
- A3.4. Machine learning and statistics
- A9.1. Knowledge
- A9.2. Machine learning
Other Research Topics and Application Domains
- B2.3. Epidemiology
- B9.1. Education
- B9.5.6. Data science
- B9.6.1. Psychology
- B9.6.3. Economy, Finance
1 Team members, visitors, external collaborators
Research Scientists
- Gael Varoquaux [Team leader, INRIA, Senior Researcher, HDR]
- Judith Abecassis [INRIA, ISFP]
- David Holzmuller [INRIA, Starting Research Position, from Oct 2025]
- Myung Kim [INRIA, Starting Research Position]
- Marine Le Morvan [INRIA, Researcher]
- Jill Jenn Vie [INRIA, Researcher]
Post-Doctoral Fellows
- Nicolas Hiebel [INRIA, Post-Doctoral Fellow, from Oct 2025]
- Joel Mba Kouhoue [INRIA, Post-Doctoral Fellow, from Sep 2025]
- Jingang Qu [INRIA, Post-Doctoral Fellow]
- Clémence Reda [UNIV POTSDAM, Post-Doctoral Fellow, until Aug 2025]
PhD Students
- Julie Alberge [INRIA]
- Gioia Blayer [INRIA, from Nov 2025]
- Emma Cussenot [INRIA, from Dec 2025]
- Marie Generali-Lince [INRIA]
- Samuel Girard [INRIA]
- Felix Lefebvre [INRIA]
- Sebastien Melo [INRIA]
- Jovan Stojanovic [INRIA]
Technical Staff
- Hiba Bederina [INRIA, Engineer, until May 2025]
- Riccardo Cappuzzo [INRIA, Engineer, from Oct 2025]
- Tristan Haugomat [INRIA, Engineer]
- Eloi Massoulie [INRIA, Engineer, from Dec 2025]
Interns and Apprentices
- Anav Agrawal [INRIA, Intern, from May 2025 until Jul 2025]
- Guillaume Bertho [AP/HP, Intern, from May 2025 until Nov 2025]
- Emma Cussenot [INRIA, Intern, from May 2025 until Oct 2025]
- Dan Suissa [INRIA, Intern, from Nov 2025]
- Vlada Voronina [INRIA, Intern, from May 2025 until Aug 2025]
Administrative Assistant
- Ekaterina George [INRIA]
Visiting Scientist
- Tomas Rigaux [UNIV KYOTO, from Aug 2025 until Sep 2025]
External Collaborators
- Gaetan Brison [IP PARIS]
- Lihu Chen [IMPERIAL COLLEGE LDN]
- Leo Dautun [AP/HP, from Apr 2025]
- Lea Hoisnard [AP/HP, until Oct 2025]
- Theo Jolivet [AP/HP, from Feb 2025]
- Elise Liu [ICM, from Dec 2025]
- Louis Potier [AP/HP, from Dec 2025]
- Meilame Tayebjee [ENSAE]
2 Overall objectives
2.1 Context
2.1.1 Application context: richer data in health and social sciences
Opportunistic data accumulations, often observational, bear great promise for social and health sciences. But the data are too big and complex for the standard statistical methodologies in these sciences.
Health databases
Increasingly rich health data is accumulated during routine clinical practice as well as for research. Its large coverage brings new promises for public health and personalized medicine, but it does not fit easily in standard biostatistical practice because it is not acquired and formatted for a specific medical question.
Social, educational, and behavioral sciences
Better data sheds new light on human behavior and psychology, for instance with on-line learning platforms. Machine learning can be used both as a model for human intelligence and as a tool to leverage these data, for instance improving education.
Likewise, activity traces can provide empirical evidence for economics or political science, but their complexity requires new statistical practices.
AI in society
AI increasingly impacts multiple aspects of society. As such, it calls for rigorous evaluation, whether it is a benchmark of its ability, or a broader assessment of its impacts.
2.1.2 Related data-science challenges
Data management: preparing tabular data for analytics
Assembling, curating, and transforming data for data analysis is very labor intensive. These data-preparation steps are often considered the number one bottleneck to data-science. They mostly rely on data-management techniques. A typical problem is to establish correspondences between entries that denote the same entities but appear in different forms (entity linking, including deduplication and record linkage). Another time-consuming process is to join and aggregate data across multiple tables with repetitions at different levels (as with panel data in econometrics and epidemiology) to form a unique set of “features” to describe each individual. This process is related to database denormalization and might require schema alignment when performed across multiple data sources with imperfect correspondence in columns.
Progress in machine learning increasingly helps automate data preparation and process data with less curation.
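As an illustration of the join-and-aggregate step described above, here is a minimal sketch in plain Python. The tables, column names (`patient_id`, `cost`) and helper functions are hypothetical and only serve to show how a many-to-one table can be collapsed into one feature row per individual; it is not the team's tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one row per patient, several visit rows per patient.
patients = [
    {"patient_id": 1, "age": 54},
    {"patient_id": 2, "age": 37},
    {"patient_id": 3, "age": 61},
]
visits = [
    {"patient_id": 1, "cost": 120.0},
    {"patient_id": 1, "cost": 80.0},
    {"patient_id": 2, "cost": 45.0},
]


def aggregate_visits(visit_rows):
    """Collapse the many-to-one visit table into one summary per patient."""
    costs = defaultdict(list)
    for row in visit_rows:
        costs[row["patient_id"]].append(row["cost"])
    return {
        pid: {"n_visits": len(values), "mean_cost": mean(values)}
        for pid, values in costs.items()
    }


def build_features(patient_rows, visit_rows):
    """Left-join the aggregated visits onto the main table."""
    summaries = aggregate_visits(visit_rows)
    no_visit = {"n_visits": 0, "mean_cost": None}  # None marks a missing value
    return [
        {**row, **summaries.get(row["patient_id"], no_visit)}
        for row in patient_rows
    ]


features = build_features(patients, visits)
for row in features:
    print(row)
```

Note that the left join naturally produces missing values (here for the patient without visits), which is one reason why data preparation and learning with missing values are closely related.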
From machine learning to statistically-valid answers
Machine learning can be a tool to answer complex domain questions by providing non-parametric estimators. Yet, it still requires much work, e.g. to go beyond point estimators, to derive non-parametric procedures that account for a variety of biases (censoring, sampling biases, non-causal associations), or to provide theoretical and practical tools to assess the validity of estimates and conclusions in weakly-parametric settings.
A question that is increasingly important in all applications of machine learning is that of auditing the model used in practice. This question arises in fundamental-research settings (medical research, political science...) for statistical validity, and in applications to assess societal biases, or safety of AI systems.
3 Research program
3.1 Table representation learning
Soda develops deep-learning methodology for relational data, from tabular datasets to full relational databases. The stakes are i) to build machine-learning models that apply readily to the raw data so as to minimize manual cleaning, data formatting and integration, and ii) to extract reusable representations that reduce sample complexity on new databases by transforming the data into well-distributed vectors and bringing background information. The success of embedding such background knowledge in foundation models such as large language models motivates a quest for table foundation models.
3.2 Mathematical aspects of statistical learning for data science
While complex models used in machine learning can be used as non-parametric estimators for a variety of statistical tasks or for decision making, the statistical procedures and validity criteria need to be reinvented. Soda contributes statistical tools and results for a variety of problems important to data science in health and social science (epidemiology, econometrics, education). Statistical topics of interest comprise:
- Missing values and survival analysis
- Causal inference
- Model validation and auditing
- Uncertainty quantification
3.3 Machine learning for health and social sciences
Soda targets applications in health and social sciences, as these can markedly benefit from advanced processing of richer datasets and can have a large societal impact, but fall outside mainstream machine-learning research, which focuses on processing natural images, language, and voice. Rather, data surveying humans needs another focus: it is most of the time tabular and sparse, with a time dimension and missing values. In terms of application fields, we focus on the social sciences that rely on quantitative predictions or analysis across individuals, such as policy evaluation. Indeed, the same formal problems, addressed in the two research axes above, arise across various social sciences: epidemiology, education research, and economics. The challenge is to develop efficient and trustworthy machine-learning methodology for these high-stakes applications.
3.4 Turn-key machine-learning tools for socio-economic impact
Societal and economic impact of machine learning requires easy-to-use practical tools that can be leveraged in non-specialized organizations such as hospitals or policy-making institutions.
Soda works on scikit-learn, one of the most popular machine-learning tools worldwide, as well as skrub, a younger project that specializes in machine learning for tables. Our goal is to transfer outside of the lab the understanding of machine learning and data science accumulated by the various research efforts.
Soda also works on other important software tools to foster growth and health of the Python data ecosystem in which scikit-learn is embedded.
4 Application domains
4.1 Precision medicine, public health, and epidemiology
Data management is the focus of the field of medical informatics as it is notably challenging in healthcare settings, due to the multiplicity of sources and the richness of the data, which encompasses many modalities. We apply our machine-learning techniques for statistical analysis, including causal inference, in medicine to facilitate clinical research and public-health evidence. The central questions are that of personalized medicine –prediction at the individual level, for diagnosis, prognosis, or drug recommendation– and of public health –evaluation of treatments and policy, estimation of risk factors. The data on which we work are patient history and claims databases: mid-dimensional data with longitudinal coverage (as opposed to “omics” or imaging data, which is high dimensional and much less frequently available in clinical settings).
We collaborate actively with AP-HP and Ministère de la Santé. AP-HP provides access to its very rich and complex data mart, with dozens of tables following millions of individuals, both a challenge and an opportunity, and we work with various medical specialists (neurology, diabetology, public health) on specific clinical questions related to prognosis, treatment evaluation, and risk factors. With Ministère de la Santé, we process the claims data from the national insurance database to establish trajectories of individuals as a function of their future health risks. The short-term goal is to find which medical conditions can be predicted and with what reliability. The longer-term goal is to define prevention strategies.
4.2 Educational data mining
In educational data mining, we are interested in developing mathematical methods of learning to personalize education through adaptive assessment (developing algorithms that select questions for efficiently measuring the latent knowledge of examinees or for optimizing learning), recommending learning resources, and generating exercises automatically. It is a challenging problem as it is hard to quantify learning, unlike in traditional reinforcement learning scenarios, and it is hard to measure the effect of courses on learning. This is why it is traditionally modeled as a partially-observable Markov decision process (POMDP). We are interested in modeling the evolution of uncertainty over the latent knowledge of examinees over time, for example using Bayesian approaches, or model-based reinforcement learning.
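To make the idea of adaptive question selection concrete, here is a minimal sketch based on the classical Rasch (one-parameter logistic) item-response model. This is an assumption chosen for illustration, simpler than the POMDP and Bayesian formulations mentioned above, and the item bank and function names are hypothetical. Under this model the Fisher information of an item is p(1 - p), so the most informative question is the one whose difficulty is closest to the current ability estimate.

```python
import math


def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def fisher_information(ability, difficulty):
    """Information an item carries about the ability: p * (1 - p)."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)


def select_next_item(ability_estimate, difficulties, asked):
    """Pick the not-yet-asked item that is most informative at the current estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in asked]
    return max(
        candidates,
        key=lambda i: fisher_information(ability_estimate, difficulties[i]),
    )


# Hypothetical item bank (difficulties on the same scale as ability).
item_difficulties = [-2.0, -0.5, 0.3, 1.1, 2.5]
next_item = select_next_item(ability_estimate=0.4, difficulties=item_difficulties, asked={1})
print(next_item)
```

In a full adaptive test, the ability estimate would be updated after each answer before selecting the next question; that update step is left out of this sketch.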
Soda is actively collaborating with the national platform Pix.fr for certifying the digital competencies of all French citizens. Jill-Jênn Vie is one of the original core developers and they jointly received a Paris Region PhD grant in 2023 allowing them to co-supervise the PhD of Samuel Girard about optimizing human learning. In 2023, Jill-Jênn Vie joined the scientific committee of the French Ministry of Education (CSEN, conseil scientifique de l'Éducation nationale), leading to collaborations with Franck Ramus and ongoing discussions with Camille Terrier, Marc Gurgand, Hugo Gimbert via the scientific committee of MonProjetSup, a state startup about a study path recommender system.
4.3 Data management
Data preparation for analytics is intrinsically related to data management. For instance, linked open data provides consistent views on data across silos, but integrating these data into a statistical model to answer a given question still requires a lot of user effort. Database operation increasingly relies on machine learning. While Soda is in no way expert in database research, the analytic tools that we build for relational data are increasingly used for data management. We are collaborating with Paolo Papotti (Eurecom) on this topic.
4.4 Broader data science
The tools, practical and theoretical, that we develop are central to many applications of data science. For instance, we often discuss with banks and insurers, which use machine learning but face statistical problems that we tackle: censoring or other sampling biases, forecasting, uncertainty quantification. Marketing and business intelligence also face the exact same problems. Even more generally, data preparation from relational databases is a challenge in most data-science applications. We interact with data scientists in a broad set of applications via the user base of the software tools that we develop (e.g. scikit-learn) and the various courses and lectures that we give around these tools to industry audiences.
We have started a collaboration in economics (Margherita Comola, Paris School of Economics) on using machine learning to understand communication strategies of politicians from social-network data.
4.5 Behavioral sciences
A methodological challenge in health and educational sciences common to behavioral science is that the quantities of interest are difficult to measure, e.g. intelligence or progress of a student. Supervised machine learning can infer proxies from indirect signs, such as psychological traits from brain imaging, diagnosis from clinical traces, or socio-economic status from demographics. This notion of proxies is central in policy evaluation, serving as indirect signals in causal inference, to provide secondary outcomes for treatment effect estimation or to control confounders not directly observed.
An ongoing project with Pass Culture (via Inria-Ministry of Culture convention) is to adapt the recommender system of the app to encourage diversity, i.e. not only optimizing click-through rate but also helping students discover new things. This is done by modeling the problem as a contextual bandit, with a diversity term acting as a regularizer in the objective function.
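The role of a diversity term in the objective can be sketched as follows. This is only an illustration of the scoring step under simplifying assumptions (fixed click-rate estimates, a binary novelty bonus per category, no exploration or learning), not the actual Pass Culture recommender; all item names, categories and the `diversity_weight` parameter are hypothetical.

```python
def select_item(click_estimates, item_categories, seen_categories, diversity_weight=0.2):
    """Return the item maximizing estimated click rate plus a diversity bonus.

    The bonus is 1 for items whose category the user has not seen yet, 0 otherwise.
    """
    def objective(item):
        novelty = 0.0 if item_categories[item] in seen_categories else 1.0
        return click_estimates[item] + diversity_weight * novelty

    return max(click_estimates, key=objective)


# Hypothetical click-rate estimates for one user context.
click_estimates = {"concert": 0.30, "cinema": 0.25, "museum": 0.15}
item_categories = {"concert": "music", "cinema": "film", "museum": "heritage"}
seen = {"music", "film"}

greedy_choice = select_item(click_estimates, item_categories, seen, diversity_weight=0.0)
diverse_choice = select_item(click_estimates, item_categories, seen, diversity_weight=0.2)
print(greedy_choice, diverse_choice)
```

With a zero weight the choice is purely click-driven; a positive weight trades some expected click-through for exposure to an unseen category.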
5 Social and environmental responsibility
5.1 Footprint of research activities
The main footprint of Soda's activity is the carbon footprint of our travels (surpassing our compute cost, as we seldom run very intensive computation). For this reason, we try to be careful with our long-distance travel and try to take the plane as little as possible. Not flying at all is not possible, as it would cut us off from the world-wide research community sometimes mediated by crucial conferences in North America. However, we favor online seminars, or on-premise talks accessible by train.
Because of a race to scale, artificial intelligence is starting to have a large environmental footprint. As this is the result of collective action, as opposed to a single research group, we are trying to bring this problem to the attention of the community 42. Whenever possible, we also work on algorithms with small computational costs. For instance, using tree-based models instead of neural networks can sometimes bring sizable computational and statistical benefits 31. This work required solving fundamental challenges, as trees are not differentiable, and was difficult to get accepted because it was not fashionable. Another example is quantifying the uncertainty of large language models to call the smallest model that will give a good-enough answer 57.
5.2 Impact of research results
While data science can improve health and education, working with personal data or providing decision tools that affect individuals comes with responsibilities.
We make sure that work at Soda does not risk having a direct negative impact. All research on real-life health data (hospital-level or nationwide) is started only after approval by the corresponding ethical board. Soda does not put any tools in production: none of Soda's work directly leads to automated decisions. Consequently, none of our work has directly impacted individuals. Soda works on pseudonymized data, and we leave the –pseudonymized– electronic health data on servers inside the protected environment of the hospital where they have been acquired and are used. Going further, Soda runs research on privacy-preserving synthetic data generation, to provide open datasets for research and development without privacy concerns.
Soda is also active in assessing and discussing the broader impacts and risks associated with AI 11, participating in international efforts 49 to create consensus.
6 Highlights of the year
6.1 Awards
- Doctor Honoris Causa UC Louvain: Gael Varoquaux
- Ordre National du Mérite: Gael Varoquaux
- Clarivate highly cited researcher: Gael Varoquaux
- BFM Awards, AI section: Gael Varoquaux
- Pedagogical Dynamics Prize from Fondation de l'École polytechnique: Jill-Jênn Vie
- Sophie Germain Prize from UK Embassy: Jill-Jênn Vie with Luc Rocher from The University of Oxford
- ICLR 2025 spotlight (top 300 of 11,600 submissions): Marine Le Morvan and Gael Varoquaux
7 Latest software developments, platforms, open data
7.1 Latest software developments
7.1.1 Scikit-learn
- Keywords: Clustering, Classification, Regression, Machine learning
- Scientific Description: Scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit scientific Python world. It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering.
- Functional Description: Scikit-learn can be used as a middleware for prediction tasks. For example, many web startups adapt scikit-learn to predict buying behavior of users, provide product recommendations, detect trends or abusive behavior (fraud, spam). Scikit-learn is used to extract the structure of complex data (text, images) and classify such data with state-of-the-art techniques.
Easy to use, efficient and accessible to non-experts in data science, scikit-learn is an increasingly popular machine learning library in Python. In a data exploration step, users can enter a few lines in an interactive (but non-graphical) interface and immediately see the results of their request. Scikit-learn is a prediction engine. Scikit-learn is developed in open source, and available under the BSD license.
- URL:
- Publications:
- Contact: Gael Varoquaux
- Participant: 10 anonymous participants
- Partners: Axa, BNP Paribas Cardif, Dataiku, Nvidia, Chanel, Probabl
7.1.2 joblib
- Keywords: Parallel computing, Cache
- Functional Description: Facilitate parallel computing and caching in Python.
- URL:
- Contact: Thomas Moreau
- Participant: an anonymous participant
- Partner: Probabl
7.1.3 skrub
- Keyword: Data analysis
- Functional Description: Joins, aggregates, and vectorizes tables to enable statistical learning, including with badly formatted entries.
- URL:
- Contact: Gael Varoquaux
- Participant: 2 anonymous participants
- Partner: Probabl
8 New results
8.1 Table representation learning
Participants: David Holzmuller, Marine Le Morvan, Gael Varoquaux.
TabICL: Table foundation models
A new wave of progress is pushing forward tabular learning. Recent models have been bringing better performance across the board. A prime example is the TabPFN series of models, which rely on pretrained transformers to bring excellent performance, originally in few-shot settings and, since the beginning of 2025, up to moderate-size tables with TabPFN2. This line of work has led to spinning off a startup in Germany. However, the quadratic complexity of transformers is a bottleneck. With the TabICL model 7, we showed that a multi-stage architecture can build a pre-trained in-context predictor where the separation of states decreases the quadratic cost. The model can be pretrained on larger datasets, and is thus the best performer on larger tables. The model is faster than alternatives, in particular when using a CPU rather than a GPU. In addition, we released all the code in open source, including the pretraining code; this has spurred much downstream research on multiple applications and enhancements, such as privacy.
This result is very significant as it pushes forward the agenda of foundation models for tables. It is giving birth to a very active line of research. The paper has been cited 72 times in less than a year.
Retrieve merge predict
A full data-science pipeline must often assemble data across multiple source tables. When the user is faced with a complex data lake, with many tables and few explicit links between them, it is difficult to find the best assembly for a given machine-learning task. This problem requires not only finding which tables must be joined to the main table of interest –a table retrieval problem–, but also how to aggregate multiple records when tables are linked through a many-to-one relation. While table retrieval is a classic problem of the data management literature, it had been understudied in the case of supervised machine learning. We assembled a systematic –and open– benchmark with data lakes and supervised-learning tasks 2. We found that supervised learning does change the picture compared to classic table-retrieval settings: for a fixed compute budget, it is worth avoiding elaborate retrieval methods, which can be very computationally costly, and rather using better supervised learning methods, which can be comparatively less expensive while being able to extract the relevant information from a noisy retrieval.
TabArena
The progress in tabular learning—using machine learning to predict from rows of a table—has been driven by empirical studies over the last few years. We have contributed to building TabArena 4, a living benchmark for machine learning on tabular data. TabArena contains 51 datasets that are carefully curated to represent real-world tabular learning tasks, avoiding pitfalls such as duplicated datasets with different names, data leakage, inappropriate train-test splits, datasets inappropriate for tabular learning methods, and so on. The first version of the benchmark evaluates 16 tabular learning methods, including recent models and 3 table foundation models. TabArena aims to evaluate models in the settings that allow them to achieve peak performance. This includes hyperparameter tuning with well-designed search spaces, cross-validation, and ensembling different hyper-parameter configurations. Besides providing an up-to-date comparison of models, TabArena provides insights on the impact of cross-validation, ensembling, tuning, and validation overfitting. Results, updated on a regular basis with new methods, are presented on a leaderboard at http://tabarena.ai.
TabArena is reaching very broad visibility. Indeed, while it went public only this summer, it had been cited 26 times six months later and received the spotlight distinction at NeurIPS, the largest machine learning conference.
8.2 Statistical aspects of machine learning
Participants: Judith Abécassis, Marine Le Morvan, Gael Varoquaux.
Learning with missing values
A common practice for handling missing values in tables consists in first imputing missing values (i.e., replacing them with plausible values) and then proceeding as if the data were complete. In this context, we asked a simple but fundamental question: is it worth investing effort and resources in better imputations to improve predictions? This work complements our previous asymptotic theoretical findings with a thorough empirical finite-sample study 5, providing useful conclusions for practitioners. Results show that better recovery of missing values leads to better prediction, but with diminishing returns: a large improvement in recovery quality –which typically comes at a sizable computational cost– leads to a small improvement in prediction accuracy. The effect is further reduced when using flexible learning algorithms and adding missing-value indicators. Overall, on real-world datasets with powerful models, improving imputation yields very limited benefits.
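The "impute then predict" practice with missing-value indicators discussed above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming mean imputation and `None` as the missing-value marker; the function name and toy table are hypothetical, and this is not the experimental pipeline of the study.

```python
from statistics import mean


def impute_with_indicators(rows):
    """Mean-impute each column and append one 0/1 missingness flag per column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None)  # mean over observed values
        for j in range(n_cols)
    ]
    completed = []
    for r in rows:
        values = [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        completed.append(values + flags)
    return completed


# Hypothetical table with None marking missing entries.
table = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
completed = impute_with_indicators(table)
for row in completed:
    print(row)
```

The appended flags let a downstream learner distinguish imputed from observed entries, which is why even a crude imputation can be enough for a flexible model. The sketch assumes every column has at least one observed value.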
Guidance for evaluation of medical AI
We contributed to a guidance review on metrics to evaluate predictors in the context of medical practice 1. This guidance is aimed at practitioners and is important given the profusion of metrics applicable to classifiers, and the confusion about what they measure. The work outlines both the various aspects that the metrics probe –discrimination, calibration, overall performance, classification, and clinical utility– and their desirable mathematical properties. For instance, we stress that a good metric should be proper: it should be optimal when the classifier outputs the true probability of events. The metrics are illustrated in the context of medical usage, with an analysis of the utility and benefit to the patient.
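The notion of a proper metric can be checked numerically on a toy case. The sketch below, an illustration rather than material from the review, compares the expected Brier score (squared error, a proper score) with the expected absolute error (not proper) for a binary event whose true probability is assumed to be 0.7.

```python
def expected_brier(reported, true_prob):
    """Expected squared error of a reported probability for a binary event."""
    return true_prob * (1.0 - reported) ** 2 + (1.0 - true_prob) * reported ** 2


def expected_abs_error(reported, true_prob):
    """Expected absolute error, shown as a contrast: it is not proper."""
    return true_prob * (1.0 - reported) + (1.0 - true_prob) * reported


true_prob = 0.7
grid = [i / 100 for i in range(101)]  # candidate reported probabilities

best_brier = min(grid, key=lambda q: expected_brier(q, true_prob))
best_abs = min(grid, key=lambda q: expected_abs_error(q, true_prob))
print(best_brier, best_abs)
```

The Brier score is minimized by reporting the true probability, whereas the absolute error rewards an overconfident report of 1, which illustrates why properness matters when probabilities are used for clinical decisions.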
Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments
We introduced double machine-learning estimators with better convergence properties 43 to conduct a mediation analysis, i.e. to quantify how much of the causal effect of a continuous treatment goes via an intermediate variable. We constructed a kernel-based, Neyman-orthogonal estimator that combines regression and inverse-probability-weighting ideas while avoiding explicit estimation of the mediator density, which is beneficial with the high-dimensional or continuous mediators that often occur in applications. We established key theoretical properties: asymptotic normality at a nonparametric rate and multiple robustness that tolerates some misspecified nuisance models. We also derived an asymptotically mean-squared-error-optimal bandwidth and associated confidence intervals for the mediated response curve. Simulation studies and an application to real-world medical data from the UK Biobank cohort (assessing the mediating role of brain-related variables in the effect of glycemic control on cognitive outcomes) demonstrate improved finite-sample performance over existing mediation estimators, highlighting the method's practical relevance for complex observational studies.
8.3 Bridging to health and social sciences
Participants: Gael Varoquaux, Judith Abécassis, Jill-Jênn Vie.
Emergence of maths gender gap
Together with colleagues in cognitive psychology, we studied determinants of the gender gap in mathematics abilities 6. We analyzed four consecutive cohorts of nation-wide evaluation in France, on 5-to-7-year-old first graders. The data reveal the emergence of a gap in test results during the first grade: girls and boys start the year with almost equal test performance, but after one year of schooling the boys perform markedly better. This gender gap emerged across all types of schooling (including Montessori or other innovative pedagogy) and all family socio-economic statuses. The onset of the gap was related to the admission in first grade, and not the age of the children. In contrast to maths, the development of language skills follows different dynamics, with a gap favoring girls present before schooling and a different temporal evolution during schooling, narrowing this gap. The study concludes that the gender gap is unlikely to be due to fundamental gender differences in aptitudes, but rather likely mediated by interactions with teachers and parents, with hypotheses such as transmission of anxiety or internalizing stereotypes.
Influence of training difficulty in learning outcomes of medical students
The literature supports that, in order to learn, tasks should be neither too difficult nor too easy. In a study in press, we attempted to identify that optimal level of difficulty using millions of student-question interactions of French students on the biggest medical training platform (Banque nationale d'entraînement, BNE) to determine how the difficulty of practice questions relative to student ability influences final exam performance. The best learning outcomes occur when students engage with questions that are, on average, slightly easier than their current proficiency level. This sweet spot for difficulty is not universal; it varies significantly across different medical specialties and individual student abilities. High-ability students, in particular, showed greater sensitivity to question difficulty. These results emphasize the need for adaptive learning systems that can personalize difficulty in real-time to match each student's evolving skills and the specific complexity of the subject matter.
Unpacking the scale narrative in AI
Plotting the increase of the scale of notable AI systems in the last years reveals a staggering explosion. AI's size has been growing super-exponentially on a variety of dimensions: training compute, training cost (see the figure below), inference cost, amount of data used. Studying the wording used in pivotal publications as well as company communications shows that it anchors AI success in this growth, thus setting implicit social norms around scale 8. But systematic analysis of benchmark results shows that scale does not always bring benefit. The narrative of scale is simplified and leaves aside many important ingredients of success of AI systems. In addition, the race for scale comes with planetary and societal consequences, which we study and document 8. Ever-increasing inference costs threaten economic and electricity sustainability. An unstoppable appetite for training data leads to fitting models on enormous datasets that elude quality control, engulfing undesirable facets of the internet (including child pornography) or eroding privacy. The race for scale has financial consequences, benefiting above all actors of compute, but also structuring an ecosystem where cash-rich and GPU-rich actors have leverage on priorities, industrial or academic. These actors sometimes have circular investment strategies: funding third parties that will spend all this funding on compute, which can fuel an investment bubble in AI.
Evolution of the training cost (in dollars) of notable AI systems across the years
We conclude our study, published at FAccT 8, by underlining that academic research has a central role to play in these dynamics and must shape a healthy and grounded narrative. We recommend to 1) pursue basic AI research of interest independent of scale, e.g. uncertainty quantification, causality, etc., 2) hold responsible norms, in particular avoiding asking for compute increases when editing or reviewing, and 3) always publish measures of compute to document the tradeoffs.
This study has had much impact: it has been well picked up by academics as well as policy-makers, due to its relevance to the current economy of innovation. It has been cited 48 times in less than a year.
Going from a theoretical causal analysis framework to practical guidance with health data
Many applications of machine learning, in particular in healthcare, need to lead to actionable conclusions and support decision-making processes. Thus, such applications must go beyond statistical associations and use a causal framework. This is challenging to implement in practice, particularly when dealing with noisy real-world observational data. We propose and document a practical, five-step framework to turn routine electronic health records (EHR) into reliable, causally-grounded evidence for treatment decisions 3, illustrated on the effect of albumin plus crystalloids versus crystalloids alone on 28‑day mortality in sepsis. We emphasize that valid inference from observational ICU data hinges on: (1) careful study design using a target-trial emulation/PICOT formulation to avoid time-related biases such as immortal time bias; (2) explicit causal reasoning to identify confounders and define an estimand; (3) robust estimation using modern causal estimators, where doubly robust methods with flexible machine-learning nuisances (e.g. random forests) perform best; and (4) systematic “vibration” analyses to quantify how sensitive conclusions are to design, confounder, and model choices. Applying this pipeline to MIMIC‑IV, we recover the “no average effect” of albumin seen in randomized controlled trials (RCTs), while revealing clinically meaningful treatment heterogeneity, with potential benefit in subgroups such as older patients, males, and those with septic shock, thereby showcasing how valid causal machine learning on EHRs can complement RCTs for individualized decision-making.
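To make the estimation step (3) more concrete, the following is a minimal sketch of a doubly robust (AIPW) estimator with random-forest nuisance models and cross-fitting. It is not the pipeline of the published study: the data are synthetic (no MIMIC-IV access is assumed), the outcome is continuous rather than 28-day mortality, and the function name `aipw_ate`, the clipping thresholds, and all hyper-parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold


def aipw_ate(X, t, y, n_splits=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect.

    Returns the point estimate and a plug-in standard error.
    """
    n = len(y)
    scores = np.zeros(n)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Propensity model: probability of treatment given covariates.
        propensity = RandomForestClassifier(
            n_estimators=100, min_samples_leaf=20, random_state=seed
        ).fit(X[train], t[train])
        # Clip to avoid extreme inverse-probability weights (assumed thresholds).
        e = np.clip(propensity.predict_proba(X[test])[:, 1], 0.05, 0.95)
        # One outcome model per treatment arm.
        mu = []
        for arm in (0, 1):
            in_arm = train[t[train] == arm]
            outcome = RandomForestRegressor(
                n_estimators=100, min_samples_leaf=5, random_state=seed
            ).fit(X[in_arm], y[in_arm])
            mu.append(outcome.predict(X[test]))
        t_te, y_te = t[test], y[test]
        # Outcome-model contrast plus inverse-propensity-weighted residuals.
        scores[test] = (
            mu[1] - mu[0]
            + t_te * (y_te - mu[1]) / e
            - (1 - t_te) * (y_te - mu[0]) / (1 - e)
        )
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)


# Synthetic confounded data: the first covariate drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
true_effect = 1.0
y = 2.0 * X[:, 0] + true_effect * t + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()  # ignores confounding
ate_hat, se = aipw_ate(X, t, y)
print(f"naive difference: {naive:.2f}, AIPW: {ate_hat:.2f} (s.e. {se:.2f})")
```

In this toy setting the naive difference in means is inflated by confounding, while the doubly robust estimate should land close to the simulated effect. Re-running the estimator under different confounder sets, nuisance models, or clipping thresholds gives a rough flavor of the "vibration" analysis mentioned in step (4).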
8.4 Turn-key machine-learning tools for socio-economic impact
Participants: Gael Varoquaux.
Releases of scikit-learn
2025 saw two major releases of scikit-learn (1.7 in June and 1.8 in December). Scikit-learn has kept improving, with both user-visible features and deep transformations of the technical stack. We list below a few highlights; they are not exhaustive, but they illustrate the continuous progress made.
Here the user has expanded the display for the LogisticRegression to reveal all the parameter values. Hovering on a parameter name reveals the corresponding description.
HTML estimator display showing the parameter values
-
Increasing support of GPUs
We are progressively rewriting the underlying compute operations to be able to execute on GPUs. As of scikit-learn 1.8, full analyses can be run, including cross-validation and model evaluation. On many workflows, running on the GPU leads to massive speedups (multiple folds, up to 70x).
-
Linear model speed ups
The algorithmics of the linear models have been improved along many directions, leading to speed ups, up to 10x in the sparse regression cases.
-
Temperature scaling recalibration
Recalibration corrects systematic biases in predicted probabilities, e.g. over- or under-confident classifiers. The problem becomes much harder in many-class settings, because each class comes with a probability that must be estimated. Temperature scaling is a recalibration method that is particularly well suited to such settings.
-
Estimator displays
For a user working interactively with scikit-learn, as most data scientists do, printing a model brings up a rich display that we have been improving over the last releases. The aim is to make the user more productive. As with all user-experience work, the challenge is to display the right information and make it understandable. In the last year, we have added a display of the estimator's hyper-parameters, along with the corresponding documentation, as illustrated in the HTML estimator display figure.
-
Free threading
The Python virtual machine has historically had a central lock that prevented efficient thread-based parallel computing. However, this lock has recently been made optional, and the virtual machine can be built without it. We have adapted scikit-learn to make sure that it runs safely in heavily multi-threaded settings, opening the door to data science in Python with efficient native parallel computing.
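As an illustration of the temperature scaling recalibration listed above, here is a minimal from-scratch sketch using NumPy and SciPy. It is not scikit-learn's implementation: the helper names `negative_log_likelihood` and `fit_temperature` are hypothetical, and the overconfident logits are simulated. A single scalar temperature T divides the logits before the softmax, and T is chosen to minimize the negative log-likelihood on held-out data, which is why the method stays tractable with many classes.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax


def negative_log_likelihood(logits, labels, temperature):
    """Mean negative log-likelihood of the labels after scaling logits by 1/T."""
    log_probs = log_softmax(logits / temperature, axis=1)
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels):
    """Fit the single temperature parameter on calibration data.

    The search is done over log(T) so that T stays positive.
    """
    result = minimize_scalar(
        lambda log_t: negative_log_likelihood(logits, labels, np.exp(log_t)),
        bounds=(-3.0, 3.0),
        method="bounded",
    )
    return float(np.exp(result.x))


# Simulated 10-class problem: labels are drawn from softmax(true_logits)
# using the Gumbel-max trick, but the "classifier" reports logits that are
# three times too large, i.e. it is overconfident.
rng = np.random.default_rng(0)
n_samples, n_classes = 5000, 10
true_logits = rng.normal(size=(n_samples, n_classes))
labels = np.argmax(true_logits + rng.gumbel(size=true_logits.shape), axis=1)
logits = 3.0 * true_logits

temperature = fit_temperature(logits, labels)
nll_before = negative_log_likelihood(logits, labels, 1.0)
nll_after = negative_log_likelihood(logits, labels, temperature)
print(f"T = {temperature:.2f}, NLL before = {nll_before:.3f}, after = {nll_after:.3f}")
```

Because the simulated logits are inflated by a factor of three, the fitted temperature should come out close to 3. Dividing by a positive scalar leaves the argmax unchanged, so accuracy is unaffected and only the confidence is corrected.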
skrub
Skrub is a package to facilitate machine learning on tables that was first released at the end of 2023. 2025 was a very active year for skrub, with three releases (0.5 in January, 0.6 in July, and 0.7 in December) and the following major features:
-
DataOps
skrub now comes with a new way of writing non-linear pipelines (dubbed DataOps) that combines multiple tables, tracks provenance through their transformations, and integrates machine learning. The DataOps can then be re-applied to new data, cross-validated, tuned, or extracted to be put in production.
-
Optuna
In skrub 0.7, Optuna can be used for hyper-parameter tuning on the pipelines. It opens the door to advanced hyper-parameter optimization algorithms.
While skrub is a fairly new package, it is increasingly well received by users. The uptake in download numbers can be seen on pypistats.org/packages/skrub, with 6,000 downloads daily as of the end of December and a clear exponential growth.
joblib
joblib is a very simple computation engine in Python that is massively used worldwide, including as a dependency of packages such as scikit-learn for parallel computing.
Release 1.5. Many changes to follow evolutions of the ecosystem and improve behaviors. Major changes are:
- Avoiding collisions of cache when cache is stored on a shared disk across different nodes from a cluster
- Support of Python 3.14
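For readers unfamiliar with joblib, the short sketch below shows its two core usages, on-disk caching with `Memory` and parallel loops with `Parallel` and `delayed`. The function `slow_square`, the temporary cache directory, and the `calls` list are illustrative assumptions; the thread-based backend is chosen here only to keep the example lightweight.

```python
import tempfile

from joblib import Memory, Parallel, delayed

# On-disk cache stored in a throwaway directory (assumed location).
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records when the function body actually runs


@memory.cache
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(3)   # computed and written to the cache
second = slow_square(3)  # same argument: loaded from the cache, body not re-run

# Parallel loop over independent tasks, here using threads.
squares = Parallel(n_jobs=2, prefer="threads")(
    delayed(pow)(i, 2) for i in range(5)
)
print(first, second, calls, squares)
```

When the cache directory sits on a disk shared by several cluster nodes, many processes may read and write the same entries concurrently, which is the situation targeted by the cache-collision fix listed above.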
9 Bilateral contracts and grants with industry
Participants: Judith Abecassis, Gael Varoquaux, Jill-Jênn Vie.
9.1 Bilateral contracts with industry
Probabl
Probabl is an Inria spin-off in which Gaël Varoquaux has 30% of his time allocated and is Chief Science Officer. Probabl's mission is to develop and make sustainable an ecosystem of data-science commons. Probabl is the largest employer of scikit-learn maintainers. It builds a commercial offer around the scikit-learn ecosystem by augmenting scikit-learn with solutions and services for the enterprise. Gaël Varoquaux is the point of contact at Soda.
Pass Culture
Within the Ministry of Culture-Inria convention, Samuel Girard and Jill-Jênn Vie have been involved in a partnership with Pass Culture (used by 3 million students in France) to improve the diversity of their recommendations (12 months, started in June 2024). We hired an engineer, Hiba Bederina, from June 2024 to May 2025 and conducted a randomized controlled trial on 400,000 users, which led to a publication at a RecSys workshop on social good.
Collaboration with Ministère de la Santé
We have a 4-year long collaboration with Ministère de la Santé (HAS) on using the national healthcare data for prevention and policy evaluation. Gaël Varoquaux and Judith Abecassis are in charge at Soda.
9.2 Bilateral Grants with Industry
Collaboration with public interest group Pix
Jill-Jênn Vie obtained a Paris Region PhD 2023 funding with Pix (certification of digital competencies, 6 million active users), on optimizing human learning using reinforcement learning. Samuel Girard's PhD is currently supported by this funding (105,000 euros from région Île-de-France, 20,000 euros from Pix).
10 Partnerships and cooperations
10.1 International initiatives
10.1.1 Inria associate team not involved in an IIL or an international program
-
Title:
Recommendations Encouraging Diversity
-
Duration:
2024 -> 2026
-
Coordinator:
Jill-Jênn Vie and Koh Takeuchi (takeuchi@i.kyoto-u.ac.jp)
-
Partners:
- Kyoto University (Japan)
- CNRS
-
Inria contact:
Jill-Jênn Vie
-
Summary:
This project aims to create recommender systems that optimize for cultural diversity: finding items that not only optimize click-through rate or profit, but also encourage users to discover new things. The goal is to borrow methods from causal inference to measure the treatment effect of recommendations (defined as the diversity after and before recommendation), and methods from reinforcement learning to optimize this treatment effect. One key element of this project is that plenty of real data is available thanks to our current partnership with Pass Culture, an app used by the French government to provide a budget ranging from 20 to 300 euros to every 15-to-18-year-old for purchasing cultural goods. This work will be carried out by the Soda team and Kyoto University, with the help of CNRS.
10.2 International research visitors
10.2.1 Visits of international scientists
Other international visits to the team
Tomas Rigaux
-
Status:
PhD student
-
Institution of origin:
Kyoto University
-
Country:
Japan
-
Dates:
August 2025
-
Context of the visit:
Work on reinforcement learning in graph neural networks and applications to recommender systems
-
Mobility program/type of mobility:
Research stay within associate team RED
10.2.2 Visits to international teams
Research stays abroad
Jill-Jênn Vie
-
Visited institution:
Kyoto University
-
Country:
Japan
-
Dates:
December 2024-February 2025
-
Context of the visit:
Work on applications to education and recommender systems
-
Mobility program/type of mobility:
Research stay within associate team RED
10.3 European initiatives
10.3.1 Horizon Europe
INTERCEPT-T2D
INTERCEPT-T2D project on cordis.europa.eu
-
Title:
Early Interception of Inflammatory-mediated Type 2 Diabetes
-
Duration:
From January 1, 2023 to December 31, 2027
-
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- UNIVERSITA DEGLI STUDI DI VERONA (UNIVR), Italy
- INSTITUT NATIONAL DE LA SANTE ET DE LA RECHERCHE MEDICALE (INSERM), France
- UNIVERSITAT BASEL, Switzerland
- ASSISTANCE PUBLIQUE HOPITAUX DE PARIS, France
- DEUTSCHE DIABETES FORSCHUNGSGESELLSCHAFT EV (DDFG), Germany
- FEDERATION FRANCAISE DES DIABETIQUES, France
- INSERM TRANSFERT SA, France
- Olatec Therapeutics, BV (Olatec Therapeutics, BV), Netherlands
- CENTRE HOSPITALIER UNIVERSITAIRE DE LIEGE (CHUL), Belgium
- UNIVERSITE DE LA REUNION (UR), France
- KAROLINSKA INSTITUTET (KAROLINSKA INSTITUTE), Sweden
- UNIVERSITATSSPITAL BASEL (KANTONSSPITAL BASEL), Switzerland
- TECHNISCHE UNIVERSITAET DRESDEN (TUD), Germany
-
Inria contact:
Gael Varoquaux
- Coordinator:
-
Summary:
The overall concept of INTERCEPT-T2D is to establish whether an inflammatory-mediated profile contributes to the onset of Type 2 Diabetes (T2D) complications, thus enabling the identification of patients most at risk of complications and the design of personalized prevention measures.
T2D is a heterogeneous disease, which is an obstacle to the delivery of an optimal tailored treatment. Consequently, patients’ individual trajectories of progressive hyperglycemia and risk of chronic complications are so far difficult to predict. In this context, onset of diabetic complications represents the most important transitional phase of T2D development toward premature disability and mortality.
Chronic systemic inflammation has been suggested to be a major contributor to the onset and progression of T2D complications. INTERCEPT-T2D will bring a new and clinically relevant dimension in T2D care considering at diagnosis inflammatory parameters that are of importance for the transition to T2D-related complications. The combination of state-of-the-art genomics and cell-biology technologies with targeted clinical interventions should lead to potent patients’ stratification. It should allow the identification and prognosis of a novel class or subclass of patients characterized by an “Inflammatory-mediated T2D” endotype.
The project has access to the best-documented longitudinal human European cohorts of patients with T2D, with reliable clinical and biological data allowing to trace the transition and evolution towards organ complications. This, added to the exploitation of an extensive health data warehouse, will enable us to establish the inflammatory trajectory of citizens with T2D from diagnosis to the development of complications.
To explore the ability to prevent the transition phase of T2D towards organ complications, INTERCEPT-T2D will conduct a phase II clinical trial with an anti-inflammatory therapy targeting NLRP3 Inflammasome activity in patients with T2D.
RECeSS
RECeSS project on cordis.europa.eu
-
Title:
Robust Explainable Controllable Standard for drug Screening
-
Duration:
From May 1, 2023 to April 30, 2025
-
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- UNIVERSITAET ROSTOCK (UROS), Germany
-
Inria contact:
Jill-Jênn Vie
- Coordinator:
-
Summary:
In 2021, drug development pipelines lasted 10 years on average and cost around $2 billion, while facing high failure rates, as only around 10% of Phase 0 drug candidates reach the commercialization stage. These issues can be mitigated through drug repurposing, where existing compounds are systematically screened for new therapeutic indications. Collaborative filtering is a semi-supervised learning framework that leverages known drug-disease matchings to make novel recommendations. However, prior works cannot be leveraged because of their lack of focus on human oversight and robustness to biological data.
This project aims at bridging the gap between drug research and collaborative filtering by implementing a RECeSS classifier, that is
(1) Robust: deals with class imbalance in drug-disease matchings, and missing drug/disease features, by semi-supervised learning;
(2) Explainable: connects predicted matchings to perturbed biological pathways through enrichment analyses, based on the learnt importance of features in the model;
(3) Controllable: guarantees a bound on the false positive rate using an adaptive learning scheme;
(4) Standard: algorithms are trained and tested by a standardized open-source pipeline.
Predicted matchings will be independently validated by structure-based methods. This innovative interdisciplinary project relies on a solid basis of newly curated data (up to 1,386 drugs, 1,599 diseases, 12 feature types). It is primarily supervised by Pr. Olaf Wolkenhauer, at SBI Rostock, whose team has an expertise in drug repurposing, in systems biology and data imbalance in machine learning. This project will help the fellow develop new skills, and enhance her professional maturity in academia.
In the short term, this would yield the first method that fully integrates biological interpretation and risk assessment to collaborative filtering-based repurposing. Long-term outcomes might help define sustainable and transparent drug development for rare diseases.
10.4 National initiatives
PEPR Santé Numérique
Soda is part of the “PEPR Santé Numérique” in the SMATCH subgroup that focuses on evidence of clinical efficacy. Soda will address two questions. The first question, addressed in collaboration with the PreMedical team, is that of external validity of randomized trials: how much is the treatment effect measured in a randomized clinical trial affected by the sampling bias of the trial, i.e. the difference between the study population and the intended target population? The second question, addressed in collaboration with the Heka team, is that of defining guidelines to evaluate software as a medical device. One particular challenge that we will tackle is to give procedures and recommendations to evaluate an update to a software used in clinical decision making using historical data rather than a trial. The project started at the end of 2023. Gaël Varoquaux is in charge at Soda, and Judith Abecassis is also supervising.
Project Partages
“Partages” is a large project funded by BPI France to develop digital commons for medical text analysis. In particular, the project will create material suitable for fine-tuning or aligning language models to perform best on French medical texts. Beyond the medical terms, clinical texts pose specific challenges: they often result from scanning notes that have been taken fast, full of context-specific abbreviations and typos. The role of Soda is to design data-augmentation routines that help make language models robust to these challenges. The project started at the end of 2024. Gaël Varoquaux is in charge at Soda, and Judith Abecassis is also supervising.
ANR StatQA
Marine Le Morvan obtained an ANR JCJC (2025-2029, 305 k€). LLMs provide unprecedented access to information, but their statistical reasoning abilities remain limited. We introduce the concept of Statistical Question Answering (StatQA) to designate their capacity to address quantitative, non-deterministic questions with calibrated uncertainty. Our objectives are twofold: first, to assess the statistical soundness of LLMs’ responses using institutional datasets (INSEE, Eurostat); second, to develop multimodal approaches that integrate tabular models with natural language. This work aims to enhance the reliability and precision of LLM outputs.
ANR TaFoMo
Gaël Varoquaux obtained an ANR PRCE (2025-2029, 438 k€, partners Fabian Suchanek at Telecom Paris and Antoine Neuraz at Stane Group). The goal is to create Table Foundation Models, pre-trained on large collections of tables, embedding rich knowledge for subsequent machine-learning tasks. The project involves three axes: 1) developing new architectures that handle different data types and multiple tables, 2) pre-training models with diverse large data from sources like Wikidata and DBpedia, drawing on the PIs' expertise in databases and knowledge graphs, and 3) rigorously evaluating models across tasks, including health applications, to confirm their practical value and robustness to data variations.
ANR ICPC
Jill-Jênn Vie obtained an ANR JCJC (2025-2029, 238 k€). The goal is to develop an assistant for learning programming by solving algorithmic problems, as in coding contests. We plan to generate hints without revealing the solution, while exploring automatic test-case generation to break an incorrect solution, to encourage robustness. We also plan to generate or recommend exercises within the zone of proximal development to keep students engaged. The project will feature actual experiments of the developed systems in classes, for example in high school.
10.5 Public policy support
Conseil Scientifique CNIL
Gaël Varoquaux is a scientific expert at the scientific committee of CNIL, the French data protection authority.
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation
Member of the organizing committees
Julie Alberge
- NeurIPS in Paris organizing committee
11.1.2 Scientific events: selection
Member of the conference program committees
Gaël Varoquaux
- AAAI 2026 Conference Senior Program Committee
- ICLR 2026 Conference Area Chairs
- NeurIPS 2025 Conference Senior Area Chairs
- ICML 2025 Meta reviewer
Jill-Jênn Vie
- EDM 2025 Conference Senior Program Committee
Reviewer
Gaël Varoquaux
- AISTATS 2026 Reviewer
- NeurIPS 2025 Workshop Reviewer
David Holzmüller
- NeurIPS 2025 Reviewer
- ICML 2025 Workshop Reviewer
Jill-Jênn Vie
- ICLR 2025 and 2026 Reviewer
Judith Abécassis
- ICML 2025 Reviewer
- ICLR 2026 Reviewer
- AISTATS 2026 Reviewer
- NeurIPS 2025 Datasets and Benchmarks Track Reviewer
Marine Le Morvan
- ICML 2025 Reviewer
- ICLR 2026 Reviewer
11.1.3 Journal
Member of the editorial boards
Jill-Jênn Vie
- STICEF – Cadre d'usage et de fonctionnement des IA génératives (IAG) en éducation
Reviewer - reviewing activities
Judith Abécassis
- TMLR Reviewer
- special issue TAL 66(2) Reviewer (Traitement automatique des langues)
- The Annals of Applied Statistics Reviewer
11.1.4 Invited talks
Gaël Varoquaux
- Académie Royale de Médecine, Belgique, journée sur l'IA et la santé, Brussels
- ELLIS-Helmholtz workshop on Foundation models for science, Berlin
- End-to-end data processing workshop, SIGMOD, Berlin
- Entente CordIAle Franco-English meetings, Cambridge
- Journée de la santé de Santé, APHP, Paris
- Indaba Chad, N'Djamena Chad (remote)
- Isaac Newton Institute, Cambridge
- Criteo AI ethics days, Criteo, Paris
- Dagstuhl workshop on Table Representation Learning, Dagstuhl, Germany
- Dali workshop, Sorrento, Italy
- Python Exchange, remote
- ESMRMB keynote, Marseilles, France
- EurIPS keynote, Copenhagen, Denmark
- EurIPS benchmarking keynote, Copenhagen, Denmark
- Congrès de la Société Française de Physique, Troyes, France
- NeurIPS in Paris keynote, Paris
- Séminaire Owkin
- Polish academy day on AI in science, Paris
- P16 annual days, Paris
- PyData London Meeting, London
- Telecom Student association, Saclay, France
- Teratec annual event keynote, Paris
- Journées de la SFDS sur l'incertitude, Paris
- VLDB panel on tabular foundation models, London
David Holzmüller
- Group seminar, University of Amsterdam, Amsterdam, Netherlands
- Tabular foundation models workshop, Freiburg, Germany
- AutoML School 2025, Tübingen, Germany
- PriorLabs reading group, remote
- Group seminar, RWTH Aachen, Aachen, Germany
- Group seminar, LMU München, Munich, Germany
Jill-Jênn Vie
- A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents, Kyoto University, Japan, January 29, 2025
- Efficiency and environmental impact of LLMs, Inria Foresight Seminar, Rungis, March 26, 2025
- A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents, IRIT, Toulouse, July 7, 2025
- Optimal Training Difficulty for Optimizing Learning Outcomes, Saclay PhD students day in STIC, Télécom Paris, Palaiseau, October 2, 2025
Judith Abécassis
- VITE2025: Explainability for high-dimensional statistics, Montpellier
- Group Seminar, iPLesp, Paris
- Medical interns' seminar in Neurology at Lariboisière hospital, Paris
- Introduction to AI with EHR for anesthesia and intensive care residents, Paris
- Paris Health AI Workshop, Paris
Marine Le Morvan
- Keynote at EurIPS'25 Workshop on AI for Tabular Data, Copenhagen, Denmark, December 2025
- Keynote at Junior Conference on Data Sciences and Engineering, Paris, France, September 2025
- Probabilities and statistics seminar, Laboratoire de Mathématiques d’Orsay, France, June 2025
- Table Representation Learning (TRL) seminar, ELLIS Unit Amsterdam, Netherlands, April 2025
11.1.5 Leadership within the scientific community
Gaël Varoquaux
- Expert on the International AI Safety Report 2025
11.1.6 Scientific expertise
Gaël Varoquaux
- Reviewer for the general funding call at ANR (AAPG)
Jill-Jênn Vie
- Organisation internationale de la francophonie
Judith Abécassis
- Reviewer for the Messidore AAP (Inserm)
11.2 Teaching - Supervision - Juries - Educational and pedagogical outreach
Courses
-
Gaël Varoquaux
- Preparing tabular data for machine learning, tutorial, EU ADS summer school, 3h, Luxembourg
- Health AI summer school, Paris, France, 30 min
-
Marine Le Morvan
- APM_53441_EP - From Boosting to Foundation Models: learning with Tabular Data, Ecole Polytechnique (Master 2), 30h
- APM_51438_EP - Refresher Course in Artificial Intelligence, Ecole Polytechnique (Master 1), 15h
- Learning with missing values, Dauphine executive master, 6h
-
Jill-Jênn Vie
- Deep Learning, ENS Paris, 27 h
- CSC_41M02_EP Algorithms and Advanced Programming (ICPC training), École polytechnique, 18 h
- SWERC training, ENS Paris-Saclay, 30 h éq. TD
- Tabular Deep Learning, Institut Polytechnique de Paris, 1 h
- Computer Vision, Ecole polytechnique, 45 h
-
Judith Abécassis
- Causal Inference DS-UA 9201, NYU Paris, Spring 2025, 56h
- AI for Healthcare, Centrale Supelec and Essec M2 (Data Sciences & Business Analytics), 24h
11.2.1 Supervision
Gaël Varoquaux
- PhD supervision
- Jovan Stojanovic (50%), co-supervised with Margherita Comola (Paris School of Economics)
- Julie Alberge (30%), co-supervised with Judith Abecassis (Soda, Inria)
- Sebastien Melo (30%), co-supervised with Marine Le Morvan (Soda, Inria)
- Celestin Eve (50%), co-supervised with Thomas Moreau (Mind, Inria)
- Meilame Tayebjee (50%), co-supervised with Guillaume Lecué (ENSAE)
- Félix Lefebvre
- Emma Cussenot, since December 2025 (25%), co-supervised with Judith Abecassis (Soda, Inria) and Louis Potier (AP-HP, Université Paris-Cité)
- Gioia Blayer, since November 2025 (70%), co-supervised with Marine Le Morvan (Soda, Inria)
- Internships
- Emma Cussenot (50%), co-supervised with Judith Abecassis (Soda, Inria)
- Dan Suissa (30%), co-supervised with Judith Abecassis (Soda, Inria)
Jill-Jênn Vie
- PhD supervision
- Jean Vassoyan (33%), co-supervised with Nicolas Vayatis
- Samuel Girard (33%), co-supervised with Amel Bouzeghoub
- Marie Generali-Lince (33%), co-supervised with Patrick Loiseau (FairPlay) and Solenne Gaucher (École polytechnique)
- Internships
- Anav Agrawal (L2), IIT Delhi
Judith Abécassis
- PhD supervision
- Julie Alberge (30%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Emma Cussenot, since December 2025 (25%), co-supervised with Louis Potier (AP-HP, Université Paris-Cité) and Gaël Varoquaux (Soda, Inria)
- Thaïs Walter, since September 2025 (50%), co-supervised with Jean-Damien Ricard (AP-HP, Paris University)
- Internships
- Emma Cussenot (50%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Dan Suissa (70%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Guillaume Bertho (33%), co-supervised with Adrien Coulet (HeKA, Inria) and Eric Jouvent (AP-HP, Université Paris-Cité)
Marine Le Morvan
- PhD supervision
- Sebastien Melo (70%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Gioia Blayer (10%), co-supervised with Gaël Varoquaux (Soda, Inria)
- Internships
- Vlada Voronina (70%), co-supervised with Oana Balalau (Cedar, Inria)
11.2.2 Juries
Gaël Varoquaux
- PhD and HDR jury
- PhD Committee of Elena Albu, KU Leuven, Belgium
- PhD Committee of Arnaud Delaunoy, Université de Liège, Belgium
- PhD Committee of Nicolas Hiebel, LISN Saclay, France
- PhD Committee of Lawrence Steward, Inria Paris, France
- PhD Committee of Charbel Kindji, Inria Rennes, France
- HDR Committee of Cedric Gouypailler, CEA, France
- Jury of the DataE grants from Ministère de la Santé
Jill-Jênn Vie
- PhD midway committee
- Loris Gaven, Inria Bordeaux, France
- Badmavasan Kirouchenassamy, Sorbonne University, France
- Anass El-Ayady, Université de Lorraine, France
- Jury of agrégation d'informatique
- Jury of École polytechnique entrance examinations
Judith Abécassis
- PhD midway committee
- Wassila Khatir, Université Côte d'Azur, France
- Ala Eddine Boudemia, Sorbonne University, France
- PhD jury
- PhD Committee of Yannis Lombardi (as examinatrice), Sorbonne University, France
Marine Le Morvan
- Jury for Associate Professor position in Statistics and Machine Learning, Sorbonne Université (Jussieu)
11.2.3 Educational and pedagogical outreach
Gaël Varoquaux
- Chroniqueur Les Échos: every 3 months, a short article for the general public around an AI topic
- Talk on AI at the “amicale du corps des mines”
- Panel on AI and health at the AI action summit in Grand Palais
Jill-Jênn Vie
- Risques et opportunités de l'IA en éducation, teacher training, École supérieure d'ingénieurs Léonard de Vinci, Courbevoie, April 10, 2025
- Research in personalized education, teaching competitive programming, ENS Paris-Saclay, Gif-sur-Yvette, April 11, 2025
- Intelligence artificielle, Algorithmique et programmation, CIRM (50 prep school teachers in computer science), Marseille, May 8, 2025
- Un système de recommandation de problèmes d'algo pour préparer Prologin et ICPC, Finals of Prologin Programming Contest 2025 (100 students under 20 years old), Le Kremlin-Bicêtre, May 31, 2025
- Systèmes de recommandation industriels et LLM pour la recommandation, Online Pix Webinar, June 17, 2025
- Apprendre à l'heure de l'IA, Centre Teilhard de Chardin, Orsay and online, November 27, 2025
11.3 Popularization
11.3.1 Participation in Live events
Gaël Varoquaux
- Talk on AI at an event for IT professionals at Lyon (Generation IA, ADIRA)
- Talk on AI and health at a general-public event organized at Antony
- Talk on tabular AI at the dotAI tech conference Antony
- Talk at BNP Paribas's data and AI annual event
Judith Abécassis
- Public recording of an episode of the podcast "Nouvelles Héroïnes" at Inria Saclay, for the "Les Rendez-vous des Jeunes Mathématiciennes et Informaticiennes (RJMI)" days
11.3.2 Others science outreach relevant activities
Judith Abécassis
- Organization of Inria Women Lunches at Inria Saclay
12 Scientific production
12.1 Major publications
- 1 article. Evaluation of performance measures in predictive artificial intelligence models to support medical decisions: overview and guidance. The Lancet Digital Health, December 2025, 100916.
- 2 article. Retrieve, Merge, Predict: Augmenting Tables with Data Lakes. Transactions on Machine Learning Research Journal, May 2025.
- 3 article. Step-by-step causal analysis of EHRs to ground decision-making. PLOS Digital Health 4(2), February 2025, e0000721.
- 4 inproceedings. TabArena: A Living Benchmark for Machine Learning on Tabular Data. The Thirty-ninth Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track, San Diego, United States, 2025.
- 5 inproceedings. Imputation for prediction: beware of diminishing returns. ICLR 2025 - International Conference on Learning Representations, Singapore, April 2025.
- 6 article. Rapid emergence of a maths gender gap in first grade. Nature 643(8073), June 2025, 1020-1029.
- 7 inproceedings. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. ICML 2025 - 42nd International Conference on Machine Learning, Vancouver, Canada, July 2025.
- 8 inproceedings. Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI. FAccT 2025 - ACM Conference on Fairness, Accountability, and Transparency, Athens, Greece, July 2025.
- 9 inproceedings. Double Debiased Machine Learning for Mediation Analysis with Continuous Treatments. AISTATS - 28th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research (PMLR), Mai Khao, Thailand, May 2025.
12.2 Publications of the year
International journals
International peer-reviewed conferences
Conferences without proceedings
Reports & preprints
Other scientific publications