CELESTE - 2025

2025 Activity report, Project-Team CELESTE

RNSR: 201923222N
  • Research center​​ Inria Saclay Centre at​​​‌ Université Paris-Saclay
  • In partnership with: CNRS, Université Paris-Saclay
  • Team name: mathematical statistics​​ and learning
  • In collaboration with: Laboratoire de mathématiques d'Orsay de l'Université de Paris-Saclay (LMO)

Creation of​​ the Project-Team: 2019 June​​​‌ 01


Keywords

Computer​​​‌ Science and Digital Science​

  • A3.1.1. Modeling, representation
  • A3.1.8.​‌ Big data (production, storage,​​ transfer)
  • A3.3. Data and​​​‌ knowledge analysis
  • A3.3.3. Big​ data analysis
  • A3.4. Machine​‌ learning and statistics
  • A3.5.1.​​ Analysis of large graphs​​​‌
  • A6.1. Methods in mathematical​ modeling
  • A9.2. Machine learning​‌
  • A9.2.1. Supervised learning
  • A9.2.2.​​ Unsupervised learning
  • A9.2.3. Reinforcement​​​‌ learning
  • A9.2.4. Optimization and​ learning
  • A9.2.5. Bayesian methods​‌
  • A9.2.6. Neural networks
  • A9.2.7.​​ Kernel methods
  • A9.2.8. Deep​​​‌ learning

Other Research Topics​ and Application Domains

  • B1.1.4.​‌ Genetics and genomics
  • B1.1.7.​​ Bioinformatics
  • B2.2.4. Infectious diseases,​​​‌ Virology
  • B2.3. Epidemiology
  • B4.​ Energy
  • B4.4. Energy delivery​‌
  • B4.5. Energy consumption
  • B5.2.1.​​ Road vehicles
  • B5.2.2. Railway​​​‌
  • B5.5. Materials
  • B5.9. Industrial​ maintenance
  • B7.1. Traffic management​‌
  • B7.1.1. Pedestrian traffic and​​ crowds
  • B9.5.2. Mathematics
  • B9.8.​​ Reproducibility
  • B9.9. Ethics

1​​​‌ Team members, visitors, external‌ collaborators

Research Scientists

  • Kevin‌​‌ Bleakley [INRIA,​​ Researcher]
  • Etienne Boursier​​​‌ [INRIA, ISFP‌]
  • Gilles Celeux [‌​‌INRIA, Emeritus]​​
  • Evgenii Chzhen [CNRS​​​‌, Researcher]
  • Hugo‌ Cui [CNRS]‌​‌
  • Gilles Stoltz [CNRS​​, Senior Researcher,​​​‌ HDR]

Faculty Members‌

  • Sylvain Arlot [Team‌​‌ leader, UNIV PARIS​​ SACLAY, Professor,​​​‌ until Nov 2025]‌
  • Claire Boyer [Team‌​‌ leader, UNIV PARIS​​ SACLAY, Professor,​​​‌ from Dec 2025]‌
  • Sylvain Arlot [UNIV‌​‌ PARIS SACLAY, Professor​​, from Dec 2025​​​‌]
  • Claire Boyer [‌UNIV PARIS SACLAY,‌​‌ Professor, from Apr​​ 2025 until Nov 2025​​​‌]
  • Guillermo Durand [‌UNIV PARIS SACLAY,‌​‌ Associate Professor, from​​ Apr 2025]
  • Luca​​​‌ Ganassali [UNIV PARIS‌ SACLAY, Associate Professor‌​‌, from Apr 2025​​]
  • Christophe Giraud [​​​‌UNIV PARIS SACLAY,‌ Professor]
  • Christine Keribin‌​‌ [UNIV PARIS SACLAY​​, Professor]
  • Pascal​​​‌ Massart [UNIV PARIS‌ SACLAY, Professor]‌​‌
  • Patrick Pamphile [UNIV​​ PARIS SACLAY, Associate​​​‌ Professor]
  • Vincent Rivoirard‌ [LMO, Professor‌​‌ Delegation]

Post-Doctoral Fellows​​

  • Julien Aubert [INRIA​​​‌, Post-Doctoral Fellow,‌ from Feb 2025 until‌​‌ Oct 2025]
  • Margaux​​ Zaffran [INRIA,​​​‌ Post-Doctoral Fellow, from‌ Oct 2025]

PhD‌​‌ Students

  • Bertrand Even [​​UNIV PARIS SACLAY]​​​‌
  • Simone Maria Giancola [‌UNIV PARIS SACLAY,‌​‌ from Nov 2025]​​
  • Justine Lebrun [SNCF​​​‌, CIFRE, from‌ Feb 2025]
  • Leonardo‌​‌ Martins Bianco [LMO​​, until Sep 2025​​​‌]
  • Chiara Mignacco [‌UNIV PARIS SACLAY,‌​‌ until Sep 2025]​​
  • Pierre-Andre Mikem [UNIV​​​‌ PARIS SACLAY]
  • Dhia‌ Elhaq Ouerfelli [UNIV‌​‌ PARIS SACLAY]
  • Romain​​ Perier [UNIV PARIS​​​‌ SACLAY]
  • Guillaume Principato‌ [EDF]
  • Antoine‌​‌ Scheid [Ecole Polytechnique​​ and Inria Paris]​​​‌
  • Hanqi Sun [INRIA‌, from Sep 2025‌​‌]
  • Gayane Taturyan [​​IRT SYSTEM X]​​​‌
  • Daniil Tiapkin [ECOLE‌ POLY PALAISEAU]
  • Victor‌​‌ Turmel [UNIV PARIS​​ SACLAY]
  • Timothée Vincon​​​‌ [EDF R&D,‌ CIFRE, from Oct‌​‌ 2025]

Interns and​​ Apprentices

  • Shuailong Zhu [​​​‌INRIA, Intern,‌ until Mar 2025]‌​‌

Administrative Assistant

  • Laetitia Jubely​​ [INRIA, from​​​‌ May 2025]

External‌ Collaborators

  • Benjamin Auder [‌​‌CNRS]
  • Jean-Michel Poggi​​ [UNIV PARIS SACLAY​​​‌]

2 Overall objectives‌

2.1 Mathematical statistics and‌​‌ learning

Data science—a vast​​ field that includes statistics,​​​‌ machine learning, signal processing,‌ data visualization, and databases—has‌​‌ become front-page news due​​ to its ever-increasing impact​​​‌ on society, over and‌ above the important role‌​‌ it already played in​​ science over the last​​​‌ few decades. Within data‌ science, the statistical community‌​‌ has long-term experience in​​ how to infer knowledge​​​‌ from data, based on‌ solid mathematical foundations. The‌​‌ recent field of machine​​ learning has also made​​​‌ important progress by combining‌ statistics and optimization, with‌​‌ a fresh point of​​​‌ view that originates in​ applications where prediction is​‌ more important than building​​ models.

The Celeste project-team​​​‌ is positioned at the​ interface between statistics and​‌ machine learning. We are​​ statisticians in a mathematics​​​‌ department, with strong mathematical​ backgrounds, interested in interactions​‌ between theory, algorithms, and​​ applications. Indeed, applications are​​​‌ the source of many​ of our interesting theoretical​‌ problems, while the theory​​ we develop plays a​​​‌ key role in (i)​ understanding how and why​‌ successful statistical learning algorithms​​ work—hence improving them—and (ii)​​​‌ building new algorithms upon​ mathematical statistics-based foundations. Therefore,​‌ we tackle several major​​ challenges of machine learning​​​‌ with our mathematical statistics​ point of view (in​‌ particular the algorithmic fairness​​ issue), always having in​​​‌ mind that modern datasets​ are often high-dimensional and/or​‌ large-scale, which must be​​ taken into account at​​​‌ the building stage of​ statistical learning algorithms. For​‌ instance, there often are​​ trade-offs between statistical accuracy​​​‌ and complexity which we​ want to clarify as​‌ much as possible.

In addition, most theoretical guarantees that we prove are non-asymptotic, which is important because the number of features $p$ is often larger than the sample size $n$ in modern datasets; hence, asymptotic results with $p$ fixed and $n \to +\infty$ are not relevant. The non-asymptotic approach is also closer to the real world than specific asymptotic settings, since it is difficult to say whether $p = 1000$ and $n = 100$ corresponds to the setting $p = 10n$ or $p = n^{3/2}$.
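As a quick check of the example above, the values $n = 100$ and $p = 1000$ satisfy both candidate scalings exactly, which is why a single dataset cannot tell the two asymptotic regimes apart:

```latex
\[
  10\,n = 10 \times 100 = 1000 = p,
  \qquad
  n^{3/2} = \bigl(\sqrt{100}\bigr)^{3} = 10^{3} = 1000 = p .
\]
```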

Finally, a​ key ingredient in our​‌ research program is connecting​​ our theoretical and methodological​​​‌ results with (a great​ number of) real-world applications.​‌ This is the reason​​ why a large part​​​‌ of our work is​ devoted to industrial and​‌ medical data modeling on​​ a set of real-world​​​‌ problems coming from our​ long-term collaborations with several​‌ partners, as well as​​ various opportunistic one-shot collaborations.​​​‌

3 Research program

In​ 2025, the Celeste team​‌ pursued a coherent research​​ program at the intersection​​​‌ of statistical learning, optimization,​ and probabilistic modeling, with​‌ a constant emphasis on​​ three cross-cutting requirements: rigorous​​​‌ guarantees, scalability, and adaptation​ to structure (constraints, geometry,​‌ dependence, and dynamics). The​​ team’s contributions advance both​​​‌ foundational theory—clarifying what can​ be learned, under which​‌ assumptions, and at what​​ computational cost—and practical methodologies​​​‌ motivated by large-scale industrial​ and scientific applications.

3.1​‌ Uncertainty Quantification for Structured​​ Multivariate Outputs.

A first​​​‌ pillar of the program​ develops distribution-free uncertainty quantification​‌ tools that remain valid​​ in finite samples while​​​‌ adapting to multivariate structure.​ One contribution studies conformal​‌ prediction for hierarchical data,​​ where components satisfy known​​​‌ linear relations. By integrating​ a reconciliation (projection) step​‌ into split conformal prediction,​​ the method leverages hierarchy​​​‌ to build strictly more​ efficient prediction regions at​‌ the same coverage level,​​ including for the demanding​​​‌ goal of component-wise coverage.​ This work also forges​‌ links between conformal inference​​ and the literature on​​​‌ forecast reconciliation, thereby unifying​ perspectives from statistical learning​‌ and forecasting.

Complementing this​​ structural viewpoint, a second​​ contribution introduces optimal transport–based​​​‌ conformal prediction for multivariate‌ outputs. Using Monge–Kantorovich vector‌​‌ ranks and quantiles, it​​ constructs flexible (potentially non-convex)​​​‌ prediction regions that better‌ reflect the geometry of‌​‌ complex uncertainty patterns, while​​ preserving finite-sample, distribution-free coverage.​​​‌ Together, these works strengthen‌ the team’s capability to‌​‌ deliver uncertainty sets that​​ are both reliable and​​​‌ informative in high-dimensional settings.‌

3.2 Learning with Structure,‌​‌ Constraints, and Operational Dynamics.​​

A second major axis​​​‌ designs learning methods that‌ incorporate constraints, domain structure,‌​‌ and temporal/strategic dynamics, motivated​​ by operational problems.

On​​​‌ the applied side, the‌ team develops statistical models‌​‌ for passenger movements within​​ trains with communicating coaches,​​​‌ using infra-red door sensors‌ to infer within-train flows‌​‌ and improve coach-level occupancy​​ estimation. The proposed family​​​‌ of models—culminating in a‌ station-specific “local” modeling interpretable‌​‌ as a recurrent neural​​ architecture—yields both interpretable parameters​​​‌ and a substantial forecasting‌ benefit (about 15% improvement‌​‌ for alighting-count prediction), supporting​​ the operational upgrade of​​​‌ real-time crowding information in‌ the Greater Paris area.‌​‌

On the methodological side,​​ the team proposes unified​​​‌ frameworks for time series‌ forecasting under linear constraints,‌​‌ showing that constrained empirical​​ risk minimization can be​​​‌ solved exactly using only‌ linear algebra, enabling highly‌​‌ scalable GPU implementations and​​ strong performance on real​​​‌ forecasting tasks (e.g., energy‌ demand and tourism). In‌​‌ a related direction, the​​ team addresses multi-class classification​​​‌ under system-level constraints through‌ post-processing of randomized classifiers:‌​‌ by formulating the problem​​ as a constrained stochastic​​​‌ program and using entropic‌ regularization with dual optimization,‌​‌ the method enforces constraints​​ such as fairness, abstention,​​​‌ or churn without retraining,‌ while providing finite-sample guarantees.‌​‌
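To illustrate why linear constraints keep the problem within linear algebra, here is a minimal sketch of equality-constrained least squares solved through its KKT system. It is a generic textbook construction, not the team's framework: the function name `constrained_least_squares`, the toy data, and the constraint (coefficients summing to one) are all assumptions made for illustration.

```python
import numpy as np


def constrained_least_squares(X, y, C, d):
    """Minimize ||X b - y||^2 subject to C b = d by solving the KKT system.

    Assumes X has full column rank and C has full row rank (illustrative helper).
    """
    p, m = X.shape[1], C.shape[0]
    # KKT conditions: X^T X b + C^T lam = X^T y  and  C b = d.
    kkt = np.block([[X.T @ X, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([X.T @ y, d])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:p]  # drop the Lagrange multipliers


# Toy data (hypothetical): three features, coefficients constrained to sum to one.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=50)
C = np.ones((1, 3))
d = np.array([1.0])

beta = constrained_least_squares(X, y, C, d)
```

The whole fit is a single call to a dense linear solver, which is the kind of operation that maps well onto GPU linear algebra routines.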

This axis also includes​​ learning in interactive settings:​​​‌ work on prediction-aware learning‌ in multi-agent systems introduces‌​‌ a framework where agents​​ exploit forecasts of future​​​‌ payoffs to improve performance‌ in time-varying games. The‌​‌ proposed algorithm (POMWU) achieves​​ convergence and welfare guarantees​​​‌ close to static settings‌ when prediction errors are‌​‌ controlled, refining classical regret​​ analyses in dynamic environments.​​​‌
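The idea of exploiting payoff forecasts can be sketched with a generic optimistic multiplicative-weights step, in which the predicted next payoff is added to the cumulative payoffs before exponentiation. This is not POMWU itself, whose details are not given here; the function name `optimistic_hedge`, the step size `eta`, and the toy payoffs are assumptions.

```python
import numpy as np


def optimistic_hedge(cum_payoffs, predicted_next, eta):
    """Mixed strategy proportional to exp(eta * (past payoffs + forecast))."""
    logits = eta * (cum_payoffs + predicted_next)
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy example (hypothetical): three actions with equal past payoffs,
# and a forecast that favors the first action in the next round.
cum_payoffs = np.array([1.0, 1.0, 1.0])
predicted_next = np.array([1.0, 0.0, 0.0])
strategy = optimistic_hedge(cum_payoffs, predicted_next, eta=0.5)
```

With a zero forecast the step reduces to standard multiplicative weights, so the quality of the forecast is the only thing that separates the two updates.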

Finally, the team contributes to rare anomaly detection through supervised active-learning frameworks that combine expert labeling with both classifier-driven and active-learning selection of candidates. A distinctive aspect is to use the anomaly scores from an ensemble of unsupervised detectors as features, generalizing aggregation methods and extending them to ordered data such as time series; the resulting methodology is implemented in the open-source library acanag.

Across these contributions,‌ the overarching theme is‌​‌ the design of learning​​ procedures that are constraint-aware,​​​‌ structure-exploiting, and deployable at‌ scale, while remaining anchored‌​‌ in theoretical guarantees.

3.3​​ Optimization, Learning Dynamics, and​​​‌ Reinforcement Learning Foundations.

A‌ third axis investigates how‌​‌ optimization algorithms shape learned​​ solutions, with particular attention​​​‌ to overparameterized models and‌ sequential decision-making.

Three contributions‌​‌ provide a detailed theoretical​​ account of optimization dynamics​​​‌ in two-layer ReLU networks,‌ identifying an early alignment‌​‌ phase leading to sparse​​ representations and showing that​​​‌ this phenomenon can both‌ enable implicit compression and,‌​‌ in some regimes, prevent​​ interpolation even in large​​​‌ networks. Building on these‌ dynamics, the team explains‌​‌ a simplicity bias and​​​‌ an optimization threshold: with​ enough data, training may​‌ converge to non-interpolating solutions​​ that nonetheless generalize optimally,​​​‌ illuminating a principled transition​ from memorization to generalization.​‌

In parallel, the team​​ develops theoretical frameworks for​​​‌ grokking, for instance establishing​ a two-stage limit behavior​‌ (as weight decay vanishes):​​ an initial phase resembling​​​‌ unregularized gradient flow, followed​ by a slower phase​‌ governed by a Riemannian​​ norm-minimization flow along the​​​‌ manifold of critical points.​ This program clarifies the​‌ mechanism by which norm​​ reduction can occur without​​​‌ sacrificing training performance, yielding​ eventual generalization improvements.

In​‌ reinforcement learning, the team​​ revisits policy optimization for​​​‌ adversarial MDPs, showing that​ policy improvement can be​‌ framed as a generic​​ reduction to adversarial learning​​​‌ not only on Q-values​ but also on advantage​‌ functions, and not limited​​ to exponential weights. The​​​‌ work provides convergence results​ for last iterates under​‌ broad “monotone weight” strategies​​ and transfers stronger regret​​​‌ notions (e.g., strongly adaptive​ and tracking regret) into​‌ the MDP setting. It​​ also clarifies how these​​​‌ reductions inform practical policy​ optimization when models are​‌ unknown and value functions​​ must be estimated.

Collectively,​​​‌ this axis advances a​ principled understanding of the​‌ interaction between optimization, regularization​​ (explicit or implicit), and​​​‌ generalization, and delivers tools​ for sequential decision-making with​‌ stronger performance guarantees.

3.4​​ Generative Modeling: Score-Based Methods.​​​‌

A fourth axis develops​ theory for score-based and​‌ diffusion generative models, with​​ concrete guidance for practice.​​​‌ One contribution analyzes noise​ schedules in score-based generative​‌ modeling, deriving explicit KL​​ bounds (and improved Wasserstein​​​‌ bounds under additional regularity)​ that quantify how schedule​‌ choices impact learning accuracy.​​ Another explains why memorization​​​‌ is often limited in​ diffusion training: in denoising​‌ score matching, the empirical​​ optimum becomes irregular in​​​‌ the small-noise regime, but​ sufficiently large learning rates​‌ induce an implicit regularization​​ that prevents stable convergence​​​‌ to arbitrarily low-risk minima,​ thereby mitigating memorization. A​‌ third contribution studies optimal​​ stopping in latent diffusion​​​‌ models, showing that deterioration​ in the final steps​‌ can be intrinsic to​​ dimensionality reduction: optimal stopping​​​‌ depends systematically on latent​ dimension and interacts with​‌ other training constraints.

Together,​​ these results provide a​​​‌ unified theoretical view of​ training hyperparameters and dynamics​‌ in diffusion-type models, linking​​ generalization, memorization, and sample​​​‌ quality to principled quantitative​ criteria.

3.5 Computational Limits​‌ and Statistical–Computational Gaps.

A​​ core theoretical pillar of​​​‌ the program studies the​ boundary between what is​‌ statistically possible and what​​ is computationally achievable.

Using​​​‌ the low-degree polynomial framework,​ the team develops new​‌ tools to derive computational​​ lower bounds in latent​​​‌ models, improving sharpness and​ simplifying proofs by better​‌ leveraging latent structure; these​​ are instantiated for clustering,​​​‌ sparse clustering, and biclustering,​ with matching upper bounds​‌ and accompanying statistical results.​​ Extending beyond independence assumptions,​​​‌ the team has introduced​ cumulant-based techniques for weakly​‌ dependent structures such as​​ permutations and sampling without​​​‌ replacement, enabling evidence of​ statistical–computational gaps in permutation-based​‌ tasks including feature matching​​ and seriation.

The team also proposes a direct approach to low-degree lower bounds through almost orthonormal polynomial bases in random graph models, which both recovers known results and yields new lower bounds while identifying low-degree optimal polynomials, thereby informing algorithm design. Finally, the work on stochastic block models with many communities postulates and establishes a new threshold below Kesten–Stigum when the number of communities $K$ exceeds $\sqrt{n}$, showing that optimal polynomial-time recovery may require motif-counting strategies beyond classical spectral methods in denser regimes.

This axis strengthens the​​ team’s leadership on fine-grained​​​‌ complexity barriers in modern‌ inference and clarifies which‌​‌ algorithmic paradigms are necessary​​ to approach statistical limits.​​​‌

3.6 Attention Mechanisms: Theory‌ for Transformers

The team‌​‌ also puts forward rigorous​​ theory for attention mechanisms​​​‌ as computational and statistical‌ primitives. One contribution introduces‌​‌ the single-location regression task​​ and shows that a​​​‌ simplified nonlinear self-attention predictor‌ can achieve asymptotic Bayes‌​‌ optimality, despite non-convex training.​​ Another proves that simplified​​​‌ attention layers can perform‌ clustering in Gaussian mixtures,‌​‌ including an “in-context quantization”​​ phenomenon where even fixed​​​‌ identity projections can extract‌ structure. A third contribution‌​‌ provides a statistical-physics analysis​​ explaining the advantage of​​​‌ softmax attention over linear‌ attention: softmax achieves population‌​‌ Bayes optimality and remains​​ superior in finite-sample regimes,​​​‌ offering principled insight into‌ why softmax is central‌​‌ to large language models​​ and how activations interact​​​‌ with generalization.
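For readers less familiar with the two primitives being compared, the following sketch contrasts a single softmax attention layer with a linear one. The scaling conventions and the names `softmax_attention` and `linear_attention` are assumptions chosen for illustration; the simplified models analyzed in the cited works may be parameterized differently.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Each output row is a convex combination of the rows of V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


def linear_attention(Q, K, V):
    """Same bilinear scores, but without the softmax normalization."""
    return (Q @ K.T) @ V / K.shape[0]


# Toy inputs (hypothetical): 4 tokens with embedding dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

out_softmax = softmax_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
```

The softmax version reweights tokens with positive weights that sum to one, whereas the linear version can assign weights of any sign and magnitude.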

These works‌ collectively clarify the conditions‌​‌ under which attention architectures​​ provably recover latent structure​​​‌ and sparse information in‌ sometimes asymptotic regimes.

3.7‌​‌ Robust Statistical Inference, Multiple​​ Testing, Reliability, and Model​​​‌ Selection.

A final axis‌ addresses reliability in inference,‌​‌ both through error control​​ and through the selection​​​‌ of appropriate models.

In multiple testing, the team proposes a fast algorithm to compute an entire curve of confidence bounds for the false discovery proportion along nested selection paths, leveraging forest-structured reference families and incremental updates to reduce the computational cost to $\mathcal{O}(|\mathcal{K}| m)$. In network inference, robustness is tackled through SBM parameter estimation under misspecification, with error bounds extending beyond Erdős–Rényi settings and the proposal of SubSearch, a subgraph exploration procedure that both robustly estimates parameters and identifies outlying nodes responsible for departures from the SBM assumptions.
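The incremental-update idea can be sketched in the simplest special case, where the reference family consists of disjoint regions (a forest of depth one), each with a given bound on its number of false positives. Under that assumption, the bound for a selection is the sum over regions of the minimum of the region's bound and the number of selected hypotheses it contains, plus the selected hypotheses outside all regions. The function `fdp_bound_curve`, the region labels, and the values in `zeta` are hypothetical; the team's algorithm handles general forests.

```python
def fdp_bound_curve(path, region_of, zeta):
    """Upper bounds on the false discovery proportion along a nested path.

    path      : hypothesis indices in the order they enter the selection
    region_of : maps a hypothesis index to its (disjoint) region label
    zeta      : maps a region label to a bound on its number of false positives
    """
    counts = {label: 0 for label in zeta}
    false_positive_bound = 0
    curve = []
    for size, index in enumerate(path, start=1):
        label = region_of.get(index)
        if label is None:
            # Outside every region: nothing is known, count it as a false positive.
            false_positive_bound += 1
        else:
            counts[label] += 1
            # The region contributes min(zeta, count), so it grows only up to zeta.
            if counts[label] <= zeta[label]:
                false_positive_bound += 1
        curve.append(false_positive_bound / size)
    return curve


# Toy example (hypothetical regions and bounds).
region_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
zeta = {"A": 1, "B": 2}
path = [0, 1, 3, 5, 2, 4]
curve = fdp_bound_curve(path, region_of, zeta)
```

Each step costs constant time here because adding one hypothesis changes the count of at most one region; with nested regions, every ancestor of that region would need updating as well.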

Robustness and​​​‌ structured modeling also appear‌ in an applied multi-omics‌​‌ study of pink discoloration​​ defects in bloomy cheeses,​​​‌ combining microbial profiling and‌ metabolomics. By using Gaussian‌​‌ latent block model co-clustering​​ to uncover associations between​​​‌ microbial communities and metabolites,‌ and validating hypotheses through‌​‌ inoculation experiments, the study​​ provides strong evidence for​​​‌ a microbial driver of‌ the defect, illustrating the‌​‌ team’s capacity to deploy​​ modern statistical modeling to​​​‌ complex biological datasets.

Finally, the team contributes to model selection theory beyond the classical quadratic loss. One work studies penalized selection in the sequence model under sub-Gaussian noise for non-Euclidean losses (notably $\ell_p$ losses), deriving oracle inequalities via sub-Weibull concentration and establishing minimax rates over Besov bodies with applications to nonparametric regression. Another contribution revisits concentration tools in a basic Rademacher framework to illuminate cut-off phenomena in penalized model selection, linking ideas from concentration of product measures to statistical procedures in a conceptually streamlined setting.

3.8 Overall Positioning.

Across​​​‌ these axes, Celeste’s 2025​ program delivers a tightly​‌ connected set of advances:​​ reliable uncertainty quantification, constraint-aware​​​‌ and structure-exploiting learning, theory-driven​ understanding of optimization and​‌ generative modeling, and sharp​​ characterizations of computational feasibility.​​​‌ The year’s contributions combine​ foundational theory, scalable algorithmic​‌ design, and demonstrated relevance​​ to industrial and scientific​​​‌ applications (transportation, energy, anomaly​ detection, and multi-omics), reinforcing​‌ the team’s strategic positioning​​ at the interface of​​​‌ mathematical statistics and modern​ machine learning.

4 Application​‌ domains

4.1 Electricity load​​ consumption: forecasting and control​​​‌

Celeste has a long-term​ collaboration with EDF R&D​‌ on electricity consumption. An​​ important problem is to​​​‌ forecast consumption, e.g., for​ electric vehicles. We currently​‌ work on hierarchical consumption​​ data of electric vehicles,​​​‌ for which we aim​ to output probabilistic forecasts,​‌ e.g., through conformal inference​​ methods.

4.2 Electricity production:​​​‌ control

A new project​ started with EDF in​‌ 2025 involves improving production​​ control in nuclear plants,​​​‌ in particular, in terms​ of limiting effluents and​‌ with more reactive production​​ plans (required due to​​​‌ the increasing importance of​ renewable energy in the​‌ electricity mix).

4.3 Cytometry​​

Celeste collaborates with Metafora to explore the use of multiple instance learning in flow cytometry as a means of early detection of specific cancers. This collaboration involves Pascal Massart and Christine Keribin, in the context of Pierre-André Mikem's CIFRE PhD, which follows on from Louis Pujol's thesis defended in 2022.

4.4 Railway​​​‌ operation

Following the CIFRE PhD of Rémi Coulaud, we continue our ongoing collaboration with SNCF–Transilien to exploit large datasets on railway operation and passenger flows, obtained by automatic recording devices (for passenger flows, these correspond to sensors at the door level). We model and forecast passenger movement inside train coaches in order to provide incoming passengers with information on how crowded each coach is. We connect this problem to a neural network framework in order to improve performance. The next step is to take into account the behavior of passengers on platforms. This is part of a CIFRE PhD contract which started in 2025.

4.5 Anomaly​​ detection in industrial time​​​‌ series

Celeste works with IRT SystemX and IRT Saint Exupéry to create statistical and machine learning methods to detect rare anomalies in high-dimensional industrial time series.

4.6 Reliability​​

Data collected on the​​​‌ lifetime of complex systems​ is often non-homogeneous, affected​‌ by variability in component​​ production and differences in​​​‌ real-world system use. In​ general, this variability is​‌ neither controlled nor observed​​ in any way, but​​​‌ must be taken into​ account in reliability analysis.​‌ We use latent structure​​ models to identify the​​​‌ main causes of failure,​ and to predict system​‌ reliability as accurately as​​ possible.

4.7 Neglected tropical​​​‌ diseases

Celeste collaborates with​ researchers at Institut Pasteur​‌ on encephalitis in South-East​​ Asia, especially with Jean-David​​​‌ Pommier.

4.8 Explainability in change-point detection in high-dimensional multivariate time series

Detecting changes in time​​ series is essential in​​​‌ many areas, such as‌ identifying anomalies in industrial‌​‌ processes, monitoring medical conditions,​​ detecting variations in climatic​​​‌ conditions, or analyzing fluctuations‌ in financial markets. Numerous‌​‌ change-point detection approaches have​​ been developed, both offline​​​‌ and online, and applied‌ to univariate and multivariate‌​‌ series. In the multivariate​​ context, where the components​​​‌ of the series can‌ represent the measurements of‌​‌ thousands of sensors, an​​ important question remains after​​​‌ the change-point has been‌ estimated: which sensors are‌​‌ specifically involved in the​​ detected change? Dhia-Elhaq Ouerfelli's​​​‌ PhD thesis develops post-hoc‌ methods to identify the‌​‌ coordinates involved in a​​ detected change and to​​​‌ evaluate the quality of‌ this detection.

4.9 Education‌​‌ sciences

In collaboration with the EST laboratory at Université Paris-Saclay, the Celeste team conducts educational science research focusing on the adaptation of first-year university students to higher education. The team investigates learning and adaptation processes by analyzing highly heterogeneous data, such as questionnaire responses and verbatim texts. The latent structures underlying these data are not directly observable. Methodologically, the research relies on statistical and machine learning approaches to uncover these latent structures. These approaches combine factor analysis, unsupervised clustering methods, and large language models for semantic representation and analysis. Thus, this research contributes to a data-driven, structure-aware understanding of student success and teaching practices.

4.10​​​‌ Ancient materials

Celeste collaborates with CNRS-IPANEMA (Ancient Materials Research Platform). The goal is to propose a new image segmentation method based on a dissimilarity which is particularly well adapted to XRF images. This will allow less exposure to radiation, which is important when dealing with ancient artefacts.

5 Social​​​‌ and environmental responsibility

5.1‌ Footprint of research activities‌​‌

The carbon emissions of​​ Celeste team members related​​​‌ to their jobs were‌ very low and came‌​‌ essentially from:

  • limited levels​​ of transport to and​​​‌ from work, and a‌ small amount for essentially‌​‌ land travel to conferences​​ in France and Europe.​​​‌
  • electronic communication (email, Google‌ searches, Zoom meetings, online‌​‌ seminars, LLM requests, etc.).​​
  • the carbon emissions embedded​​​‌ in their personal computing‌ devices (construction), either laptops‌​‌ or desktops.
  • electricity for personal computing devices and for the workplace, plus water, heating, and maintenance for the latter. Note that only 7.1% (2018) of France's electricity is not sourced from nuclear energy or renewables, so team members' electricity-related carbon emissions are minimal.

In terms of magnitude, the largest per capita ongoing emissions (excluding flying) are likely to be those embodied in newly purchased computers, whose construction has a carbon footprint in the range of 100 kg CO2-e each. In contrast, typical email use per year is around 10 kg CO2-e per person, a Zoom call comes to around 10 g CO2-e per hour per person, and web browsing uses around 100 g CO2-e per hour. Consequently, 2025 was a low-carbon year for the Celeste team.

The approximate (rounded for simplicity) kg CO2-e values cited above come from the book “How Bad Are Bananas?” by Mike Berners-Lee (2020), which estimates the carbon emissions of everyday life.

5.2​​ Impact of research results​​​‌

In addition to the​ long-term impact of our​‌ theoretical work—which is of​​ course impossible to assess​​​‌ immediately—we are involved in​ several applied research projects​‌ which aim to have​​ short/mid-term positive impacts on​​​‌ society.

First, the broad​ use of artificial intelligence/machine​‌ learning/statistics nowadays comes with​​ several major ethical issues,​​​‌ one being to avoid​ making unfair or discriminatory​‌ decisions. Our theoretical work​​ on algorithmic fairness has​​​‌ already led to several​ “fair” algorithms that could​‌ be widely used in​​ the short term (one​​​‌ of them is already​ used for enforcing fair​‌ decision-making in student admissions​​ at the University of​​​‌ Genoa).

Second, Patrick Pamphile's​ collaboration with the EST​‌ laboratory led him to​​ join the SYREP (Synergie​​​‌ Réussite Étudiante et Pédagogie)​ working group at Université​‌ Paris-Saclay. There, research insights​​ contribute to institutional strategies​​​‌ aimed at improving student​ success and informing teaching​‌ practices (see Section 4.9​​).

Third, we expect​​​‌ short-term positive impact on​ society from our direct​‌ collaborations with companies such​​ as EDF (forecasting and​​​‌ control of electricity load​ consumption for electric vehicles),​‌ Metafora (early detection of​​ cancers), and SNCF (better​​​‌ forecasting the numbers of​ passengers in each coach​‌ so as to guide​​ boarding passengers to the​​​‌ coaches with most space​ available).

Last, we collaborate with biologists on neglected tropical diseases, encephalitis in particular, with implications for global health strategies.

6​‌ Highlights of the year​​

6.1 Awards

  • Margaux Zaffran​​​‌ (postdoctoral researcher) received the​ following distinctions:
    • Jacques Neveu​‌ PhD Thesis Prize 2024​​ (awarded in 2025 for​​​‌ a thesis defended in​ 2024),
    • PhD Thesis Prize​‌ in Mathematics, Industry, and​​ Society 2025,
    • Paul Caseau​​​‌ PhD Thesis Prize 2025.​

6.2 Grants

  • The Géné-Pi​‌ project (PI: Claire Boyer;​​ co-PIs: Gérard Biau, Francis​​​‌ Bach, and Pierre Marion)​ was awarded PEPR-IA funding​‌ for the amount of​​ 850,000 euros.

6.3 Selected​​​‌ publications

  • New computational barrier for stochastic block models (SBM) with many communities [34]. The cavity method from statistical physics predicts that community recovery in SBM is possible in polynomial time only above the Kesten–Stigum (KS) threshold. In collaboration with A. Carpentier (Potsdam University) and N. Verzelen (INRAE-Montpellier), C. Giraud has proven that this prediction breaks down in the many-communities regime. We have shown that community recovery is possible below the KS threshold by counting some specific blow-up motifs. In particular, the non-backtracking counts originating from message passing and Bethe free energy are sub-optimal in this case. By developing a new technique for proving low-degree lower bounds, we have also identified this new computational barrier for community recovery in SBM with many communities.

7​​ Latest software developments, platforms,​​​‌ open data

7.1 Latest​ software developments

7.1.1 acanag​‌

  • Keyword:
    Anomaly detection
  • Functional​​ Description:
    The Python library acanag (Active Anomaly Detection) learns to detect anomalies in multidimensional data organized as bags, batches, or time series.
  • Contact:
    Kevin Bleakley​‌
  • Partners:
    CNRS, IRT SystemX,​​ IRT Saint Exupéry

7.1.2​​ sanssouci

  • Keyword:
    Multiple testing​​​‌
  • Functional Description:
    In a‌ multiple testing context, sanssouci‌​‌ provides statistical guarantees on​​ possibly user-defined and/or data-driven​​​‌ sets of hypotheses. Typical‌ use cases include differential‌​‌ gene expression studies in​​ genomics and fMRI studies​​​‌ in neuroimaging. New contributions‌ include overall optimization and‌​‌ documentation improvements, and, above​​ all, the implementation of​​​‌ the new algorithms described‌ in 11.
  • Contact:‌​‌
    Guillermo Durand
  • Partner:
    Pierre​​ Neuvial (CNRS, Université de​​​‌ Toulouse)

7.1.3 KCPD

  • Name:‌
    Kernel Change Point Detection‌​‌
  • Keyword:
    Change-point detection
  • Functional​​ Description:
    The library is based on the kernel change-point detection methods described by Sylvain Arlot and co-authors (2012, 2017).
  • URL:​​​‌
  • Contact:
    Kevin Bleakley‌
  • Partner:
    IRT SystemX

7.2‌​‌ Open data

8 New​​ results

8.1 Uncertainty Quantification​​​‌ and Conformal Prediction

8.1.1‌ Conformal prediction for hierarchical‌​‌ data

Participants: Guillaume Principato​​, Gilles Stoltz,​​​‌ Jean-Michel Poggi.

In‌ collaboration with colleagues from‌​‌ EDF (Yvenn Amara-Ouali, Yannig​​ Goude, and Bachir Hamrouche)​​​‌ we study in 45‌ conformal prediction for multivariate‌​‌ data, and more precisely,​​ focus on hierarchical data,​​​‌ where some components are‌ linear combinations of others.‌​‌ Intuitively, the hierarchical structure​​ can be leveraged to​​​‌ reduce the size of‌ prediction regions for the‌​‌ same coverage level. We​​ implement this intuition by​​​‌ including a projection step‌ (also called a reconciliation‌​‌ step) in the split​​ conformal prediction (SCP) procedure,​​​‌ and prove that the‌ resulting prediction regions are‌​‌ indeed globally smaller. We​​ do so both under​​​‌ the classic goal of‌ joint coverage, and under‌​‌ a new and challenging​​ task: component-wise coverage, for​​​‌ which efficiency results are‌ more difficult to obtain.‌​‌ The associated strategies and​​ their analyses are based​​​‌ both on the literature‌ of SCP and of‌​‌ forecast reconciliation, which we​​ connect. We also illustrate​​​‌ the theoretical findings, for‌ different scales of hierarchies,‌​‌ on simulated data.
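The intuition behind the projection step can be sketched on a toy hierarchy. The following is only an illustration under stated assumptions, not the procedure analyzed in 45: a three-component vector (total, a, b) with total = a + b, an orthogonal projection as the reconciliation step, a Euclidean nonconformity score, and hypothetical calibration data. Because the observed vectors are coherent and the projection is non-expansive, projected scores can never exceed the raw ones, which is why the calibrated radius shrinks.

```python
import math

def project_coherent(y):
    """Orthogonal projection of (total, a, b) onto the plane total = a + b."""
    total, a, b = y
    gap = (total - a - b) / 3.0  # (n . y) / ||n||^2 with normal n = (1, -1, -1)
    return (total - gap, a + gap, b + gap)

def score(forecast, truth):
    """Euclidean nonconformity score (an illustrative choice)."""
    return math.dist(forecast, truth)

def conformal_radius(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

# Hypothetical calibration pairs: (base forecast, observed coherent vector).
calibration = [
    ((10.5, 4.0, 5.0), (9.0, 4.0, 5.0)),
    ((8.0, 3.0, 6.0), (8.5, 3.5, 5.0)),
    ((12.0, 6.0, 6.0), (11.0, 5.0, 6.0)),
    ((7.0, 2.0, 4.0), (6.0, 2.5, 3.5)),
]

raw_scores = [score(f, y) for f, y in calibration]
projected_scores = [score(project_coherent(f), y) for f, y in calibration]

alpha = 0.25
raw_radius = conformal_radius(raw_scores, alpha)
projected_radius = conformal_radius(projected_scores, alpha)
```

Here `projected_radius` is never larger than `raw_radius`; the ball of that radius around the reconciled forecast is the prediction region in this simplified setting.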

8.1.2​​ Optimal transport-based conformal prediction​​​‌

Participants: Claire Boyer.‌

This joint work 25‌​‌ with Gauthier Thurin (ENS​​ Paris) and Kimia Nadjahi​​​‌ (ENS Paris) proposes a‌ novel conformal prediction framework‌​‌ for multivariate outputs based​​ on optimal transport. By​​​‌ leveraging Monge–Kantorovich vector ranks‌ and quantiles, the method‌​‌ constructs flexible, potentially non-convex​​ prediction regions that better​​​‌ capture the geometry of‌ complex uncertainty patterns, while‌​‌ retaining finite-sample, distribution-free coverage​​ guarantees.

8.2 Learning with​​​‌ Structure, Constraints, and Dynamics‌

8.2.1 Modeling of passenger‌​‌ movements in trains with​​ communicating coaches

Participants: Christine​​​‌ Keribin, Gilles Stoltz‌.

In collaboration with colleagues from SNCF, namely, Mélissa Baietto and Rémi Coulaud, we model in 6 passenger movements within communicating coaches equipped with infra-red sensors at each door, counting the numbers of passengers boarding and alighting at that door. The business objective is to better estimate the real occupancy rate of each coach instead of solely using boarding counts and discarding passenger movements. To do so, we propose models based on stochastic transition matrices, which are specific to each station in the most complex model. The latter, called local modeling, also has to estimate alighting counts, which it does through data-based alighting rates rather than with origin-destination matrices. This piece of the methodology is of independent interest. The local modeling may actually be seen as a neural network (a recurrent neural network with a many-to-many architecture featuring one hidden layer). All models are fit through least-squares minimization. We evaluate them both qualitatively and quantitatively, on data from line H of the suburban railway network of the Greater Paris area. The qualitative evaluation consists of successfully interpreting the outcomes of the models (transition matrices, alighting rates) based on the geography of the platforms of the boarding or alighting stations. The quantitative evaluation consists of using the models constructed to forecast alighting counts: modeling passenger movements improves the forecasting performance by about 15% or more compared to ignoring the existence of such movements. All in all, this study supports upgrading the passenger-movement modeling layer of the real-time crowding information deployed in the Greater Paris area, from the global modeling currently used to local modeling.
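To make the role of a stochastic transition matrix concrete, here is a minimal sketch of one step between two stations. All numbers, the three-coach layout, and the function name are hypothetical; the actual models of 6 estimate such matrices and rates from sensor data by least squares, which is not reproduced here.

```python
def move_passengers(occupancy, transition):
    """Redistribute passengers between communicating coaches.

    transition[i][j] is the assumed fraction of passengers in coach i
    who walk to coach j before the next stop; each row sums to 1.
    """
    n = len(occupancy)
    return [
        sum(occupancy[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]

# Hypothetical occupancy of three coaches right after boarding.
occupancy_after_boarding = [80.0, 40.0, 20.0]

# Hypothetical station-specific transition matrix (rows sum to 1).
transition = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

occupancy = move_passengers(occupancy_after_boarding, transition)

# Hypothetical alighting rates at the next station, one per coach.
alighting_rates = [0.25, 0.5, 0.5]
predicted_alighting = [r * o for r, o in zip(alighting_rates, occupancy)]
```

Since each row of the matrix sums to one, the total number of passengers on board is preserved by the movement step; only its distribution across coaches changes.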

8.2.2 Forecasting time series​ with constraints

Participants: Claire​‌ Boyer.

The collaborative​​ work 36 with colleagues​​​‌ from EDF proposes a​ unified framework for time​‌ series forecasting that systematically​​ integrates linear constraints into​​​‌ learning algorithms. The framework​ encompasses and combines existing​‌ approaches such as generalized​​ additive models and hierarchical​​​‌ forecasting, and shows that​ the exact minimizer of​‌ the constrained empirical risk​​ can be computed efficiently​​​‌ using only linear algebra​ operations. This formulation enables​‌ highly scalable implementations optimized​​ for GPU architectures. Extensive​​​‌ empirical evaluations on real-world​ applications, including electricity demand​‌ and tourism forecasting, demonstrate​​ state-of-the-art performance of the​​​‌ proposed approach.
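As an illustration of why linear constraints keep the minimizer computable by linear algebra alone, consider the simplest instance: ridge-regularized least squares under an equality constraint. The notation below (design matrix $X$, responses $Y$, constraint matrix $C$ with full row rank, right-hand side $d$, regularization $\lambda > 0$) is an assumption made for this sketch and is not taken from 36, whose framework is more general.

```latex
\hat\theta \in \operatorname*{arg\,min}_{\theta \,:\, C\theta = d}
  \; \|Y - X\theta\|_2^2 + \lambda \|\theta\|_2^2 ,
\qquad
A = X^\top X + \lambda I ,
\qquad
\theta_0 = A^{-1} X^\top Y ,
\\[4pt]
\hat\theta = \theta_0 - A^{-1} C^\top \bigl( C A^{-1} C^\top \bigr)^{-1} \bigl( C\theta_0 - d \bigr) .
```

The unconstrained ridge solution $\theta_0$ is corrected by a term that vanishes exactly when $\theta_0$ already satisfies the constraint; every step is a matrix product or a linear solve.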

8.2.3 Randomized​ multi-class classification under system​‌ constraints: a unified approach​​ via post-processing

Participants: Evgenii​​​‌ Chzhen, Gayane Taturyan​.

In collaboration with​‌ M. Hebiri, in 35​​ we study the problem​​​‌ of multi-class classification under​ system-level constraints expressible as​‌ linear functionals over randomized​​ classifiers. We propose a​​​‌ post-processing approach that adjusts​ a given base classifier​‌ to satisfy general constraints​​ without retraining. Our method​​​‌ formulates the problem as​ a linearly constrained stochastic​‌ program over randomized classifiers,​​ and leverages entropic regularization​​​‌ and dual optimization techniques​ to construct a feasible​‌ solution. We provide finite-sample​​ guarantees for the risk​​​‌ and constraint satisfaction for​ the final output of​‌ our algorithm under minimal​​ assumptions. The framework accommodates​​​‌ a broad class of​ constraints, including fairness, abstention,​‌ and churn requirements.

8.2.4​​ Prediction-aware learning in multi-agent​​​‌ systems

Participants: Etienne Boursier​.

The work 19​‌ proposes a prediction-aware learning​​ framework for uncoupled online​​​‌ learning in time-varying multiplayer​ games, where agents exploit​‌ forecasts of future payoffs​​ to adapt their strategies.​​​‌ While classical regret guarantees​ degrade rapidly in dynamic​‌ environments, this approach explicitly​​ incorporates prediction to obtain​​​‌ tighter performance bounds when​ payoff variations are predictable.​‌ We introduce POMWU, a​​ contextual extension of the​​​‌ Optimistic Multiplicative Weight Update​ algorithm, and show that,​‌ under bounded prediction errors,​​ it achieves convergence and​​​‌ social welfare guarantees comparable​ to those in static​‌ games, up to terms​​ depending on the prediction​​ quality.
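The way forecasts enter a multiplicative-weights update can be sketched as follows. This is the generic optimistic multiplicative weights (optimistic Hedge) rule for a single agent with a payoff forecast used as a hint, not the contextual POMWU algorithm of 19; the step size `eta` and the toy payoff sequence are assumptions.

```python
import math

def optimistic_mwu(payoffs, predictions, eta=0.5):
    """Play optimistic multiplicative weights with payoff forecasts.

    payoffs[t][a]     : payoff of action a, revealed after round t
    predictions[t][a] : forecast of payoffs[t][a], available before round t
    Returns the list of mixed strategies played at each round.
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions
    strategies = []
    for observed, forecast in zip(payoffs, predictions):
        # Weights use past payoffs plus the forecast of the upcoming one.
        logits = [eta * (c + m) for c, m in zip(cumulative, forecast)]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        strategies.append([w / total for w in weights])
        cumulative = [c + g for c, g in zip(cumulative, observed)]
    return strategies

def expected_payoff(strategies, payoffs):
    return sum(
        sum(p * g for p, g in zip(strategy, observed))
        for strategy, observed in zip(strategies, payoffs)
    )

# Toy time-varying payoffs: the best action alternates every round.
payoffs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
no_forecast = [[0.0, 0.0] for _ in payoffs]

informed = expected_payoff(optimistic_mwu(payoffs, payoffs), payoffs)
blind = expected_payoff(optimistic_mwu(payoffs, no_forecast), payoffs)
```

With perfect forecasts the learner earns more than with uninformative ones on this alternating sequence, which is the qualitative effect the prediction-dependent bounds quantify.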

8.2.5 Detecting rare​​​‌ anomalies in multidimensional data‌ using active and supervised‌​‌ learning

Participants: Kevin Bleakley​​.

Detecting rare anomalies in batches of multidimensional data is challenging. We have proposed an original supervised active-learning framework 7 that sends a small number of data points from each batch to an expert for labeling as ‘anomaly’ or ‘nominal’ via two mechanisms: (i) points most likely to be anomalies in the eyes of a supervised classifier trained on previously labeled data; and (ii) points suggested by an active learner. Instead of training the supervised classifier directly on currently labeled raw data, we treat the scores calculated by an ensemble of M user-defined unsupervised anomaly detectors as if they were the learner’s input features. Our approach generalizes earlier attempts to linearly aggregate unsupervised anomaly detector scores, and broadens the scope of these methods from unordered bags of data to ordered data such as time series. Simulated and real data trials suggest that this method usually outperforms linear strategies, often significantly. The Python library acanag implements our proposed method. This 2025 work, in collaboration with Benjamin Auder (LMO Orsay), Martin Royer (IRT SystemX), and Mouhcine Mendhil (IRT Saint Exupéry), was subsequently published in early 2026 in TMLR.

8.2.6 Physics-informed‌ kernel learning

Participants: Claire‌​‌ Boyer.

The article​​ 37 introduces physics-informed kernel​​​‌ learning (PIKL), a principled‌ alternative to physics-informed neural‌​‌ networks that integrates physical​​ priors through a kernel-based​​​‌ formulation. By approximating the‌ underlying kernel using Fourier‌​‌ methods, the authors derive​​ a tractable estimator that​​​‌ minimizes a physics-informed risk‌ combining data fidelity and‌​‌ PDE constraints. The framework​​ comes with theoretical guarantees​​​‌ that quantify the impact‌ of the physical prior‌​‌ on convergence rates. Numerical​​ experiments demonstrate that PIKL​​​‌ can outperform physics-informed neural‌ networks in both accuracy‌​‌ and computational efficiency, and​​ in some settings even​​​‌ surpass classical PDE solvers,‌ particularly in the presence‌​‌ of noisy boundary conditions.​​ This is a joint​​​‌ work with Nathan Doumèche‌ (EDF & Sorbonne Université),‌​‌ Francis Bach (Inria), and​​ Gérard Biau (Sorbonne Université),​​​‌ accepted for publication in‌ JMLR in 2025.

8.2.7‌​‌ Fast kernel methods: Sobolev,​​ physics-informed, and additive models​​​‌

Participants: Claire Boyer.‌

The work 37 addresses the scalability limitations of kernel methods by introducing a GPU-accelerated framework for kernel regression with $O(n \log n)$ computational complexity. Leveraging Fourier representations of kernels together with non-uniform fast Fourier transforms (NUFFT), the proposed approach enables exact, fast, and memory-efficient computations at scale. The framework is instantiated for Sobolev kernel regression, physics-informed regression, and additive models, and the resulting estimators are shown, when applicable, to achieve minimax convergence rates consistent with classical kernel theory. Extensive experiments demonstrate the ability to process datasets with tens of billions of samples within minutes, combining strong statistical guarantees with unprecedented computational scalability.

8.3​​ Optimization, Learning Dynamics, and​​​‌ Reinforcement Learning

8.3.1 Early‌ alignment in two-layer networks‌​‌ training is a two-edged​​ sword

Participants: Etienne Boursier​​​‌.

The work 8‌ characterizes the early-stage optimization‌​‌ dynamics of two-layer neural​​​‌ networks with (leaky) ReLU​ activations. In a general​‌ setting, it provides a​​ precise description and quantitative​​​‌ analysis of an early​ alignment phase, during which​‌ neurons align with a​​ small number of key​​​‌ directions determined by the​ critical points of a​‌ data-dependent function. Throughout this​​ phase, the learned function​​​‌ remains close to zero,​ while the representation becomes​‌ increasingly sparse. This sparsification​​ is typically preserved throughout​​​‌ training and ultimately yields​ a final estimator that​‌ is effectively equivalent to​​ a much smaller network.​​​‌ Building on this alignment​ phenomenon, we also present​‌ an example with three​​ data points showing that,​​​‌ in the small-initialization regime,​ arbitrarily large overparameterized networks​‌ may fail to interpolate​​ the data. This result​​​‌ highlights that the seminal​ convergence guarantees for infinitely​‌ wide networks critically depend​​ on the smoothness of​​​‌ the activation function and​ do not extend to​‌ networks with ReLU activations.​​

8.3.2 Simplicity bias and​​​‌ optimization threshold in two-layer​ ReLU networks

Participants: Etienne​‌ Boursier.

Building on​​ the early alignment characterization​​​‌ of 8, the​ work 17 shows that,​‌ when sufficient data are​​ available, trained two-layer ReLU​​​‌ networks often converge to​ simpler solutions that do​‌ not fully interpolate the​​ training data yet generalize​​​‌ better. In particular, for​ a specific linear data​‌ model, we show that​​ the trained network converges​​​‌ to a solution that​ closely matches the least-squares​‌ linear estimator, and is​​ therefore optimal on unseen​​​‌ data. This simple example​ illustrates the transition from​‌ memorization to generalization—an effect​​ observed in in-context learning​​​‌ and diffusion model training—where,​ beyond a certain number​‌ of training samples, the​​ optimization dynamics fail to​​​‌ reach an interpolating global​ minimum. Instead, they converge​‌ to a spurious local​​ minimum of the training​​​‌ loss that nonetheless achieves​ minimal test error.

8.3.3​‌ A theoretical framework for​​ grokking: interpolation followed by​​​‌ Riemannian norm minimisation

Participants:​ Etienne Boursier.

Grokking is a training phenomenon characterized by two distinct phases: an initial overfitting regime with near-zero training loss and high test loss, followed, after a long delay, by a generalization phase in which both training and test losses become small. The work 18 provides a rigorous and general characterization of the two-stage optimization dynamics underlying the grokking phenomenon. In overparameterized settings, the critical points of the training loss form manifolds. Under suitable smoothness assumptions, we establish a two-stage convergence of the parameter trajectory as the weight-decay parameter $\lambda \to 0$. During the first phase, the dynamics follow the unregularized gradient flow, which may lead to poor generalization, for instance in large-initialization regimes. In the second phase, occurring on a time scale of order $1/\lambda$, the trajectory converges to a Riemannian flow that minimizes the parameter norm over the critical manifold of the training loss. This phase induces a decrease in parameter norm while preserving training performance, a mechanism typically associated with improved generalization and responsible for the emergence of grokking.
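The two time scales can be observed on a toy problem. The sketch below is an assumption-laden illustration, not an experiment from 18: a two-parameter model with training loss 0.5 (ab - 1)^2, whose zero-loss set {ab = 1} plays the role of the critical manifold, trained by Euler-discretized gradient flow with a small weight decay from the hypothetical start (3, 0).

```python
def simulate(weight_decay, total_time, dt=0.02, start=(3.0, 0.0)):
    """Euler-discretized gradient flow on
    F(a, b) = 0.5 * (a*b - 1)**2 + 0.5 * weight_decay * (a**2 + b**2).
    Returns the parameters (a, b) reached at time total_time."""
    a, b = start
    for _ in range(round(total_time / dt)):
        residual = a * b - 1.0
        grad_a = residual * b + weight_decay * a
        grad_b = residual * a + weight_decay * b
        a, b = a - dt * grad_a, b - dt * grad_b
    return a, b

def train_loss(a, b):
    return 0.5 * (a * b - 1.0) ** 2

def norm_sq(a, b):
    return a * a + b * b

lam = 0.01
early = simulate(lam, total_time=5.0)       # time much smaller than 1/lam
late = simulate(lam, total_time=6.0 / lam)  # time of order 1/lam
```

At the early time the training loss is already close to zero while the squared parameter norm is still large; at the late time the loss is still close to zero but the trajectory has slid along {ab = 1} toward the balanced, small-norm point near a = b = 1.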

8.3.4​ Policy optimization via adversarial​‌ learning on advantage functions​​

Participants: Chiara Mignacco,​​ Gilles Stoltz.

In collaboration with Matthieu Jonckheere (LAAS–CNRS, Toulouse), we revisit in 14 the reduction of learning in adversarial Markov decision processes (MDPs) to adversarial learning based on Q-values; this reduction has been considered in a number of recent articles as one building block to perform policy optimization. Namely, we first consider and extend this reduction in an ideal setting where an oracle provides value functions: it may involve any adversarial learning strategy (not just exponential weights) and it may be based either on Q-values or on advantage functions. We then present two extensions: first, convergence of the last iterate for a vast class of adversarial learning strategies (again, not just exponential weights), satisfying a property called monotonicity of weights; and second, stronger regret criteria for learning in MDPs, inherited from the stronger regret criteria of adversarial learning named strongly adaptive regret and tracking regret. Then, we demonstrate how adversarial learning, also referred to as aggregation of experts, relates to aggregation (orchestration) of expert policies: we obtain stronger forms of performance guarantees in this setting than existing ones, via yet another, simple reduction. Finally, we discuss the impact of the reduction of learning in adversarial MDPs to adversarial learning in practical scenarios where transition kernels are unknown and value functions must be learned. In particular, we review the literature and note that many strategies for policy optimization feature a policy-improvement step based on exponential weights with estimated Q-values. Our main message is that this step may be replaced by the application of any adversarial learning strategy on estimated Q-values or on estimated advantage functions.

The empirical‌​‌ evaluation of this methodology,​​ together with other twists,​​​‌ is conducted in the‌ companion article 42.‌​‌
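For reference, the baseline policy-improvement step mentioned above, exponential weights applied at a single state, can be sketched as follows. The policy, the Q-values, and the step size `eta` are hypothetical. The sketch also shows that, for this particular strategy, feeding Q-values or advantages gives the same update, because the state value is constant across actions and cancels in the normalization; other adversarial learning strategies need not have this invariance.

```python
import math

def exp_weights_step(policy, values, eta):
    """One exponential-weights update of a policy at a single state.

    policy: current action probabilities (all positive)
    values: one score per action (Q-values or advantages)
    """
    logits = [math.log(p) + eta * v for p, v in zip(policy, values)]
    top = max(logits)  # subtract the max for numerical stability
    unnormalized = [math.exp(l - top) for l in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

policy = [0.5, 0.3, 0.2]
q_values = [1.0, 2.0, 0.5]  # hypothetical estimated Q-values

# Advantage = Q-value minus the value of the state under the current policy.
state_value = sum(p * q for p, q in zip(policy, q_values))
advantages = [q - state_value for q in q_values]

from_q = exp_weights_step(policy, q_values, eta=0.7)
from_advantages = exp_weights_step(policy, advantages, eta=0.7)
```

Both calls return the same distribution, which shifts probability mass toward the action with the largest Q-value.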

8.4 Generative Models and​​ Score-Based Methods

8.4.1 An​​​‌ analysis of the noise‌ schedule for score-based generative‌​‌ models

Participants: Claire Boyer​​.

In collaboration with​​​‌ Stanislas Strasman (Sorbonne Université),‌ Antonio Ocello (ENSAE), Sylvain‌​‌ Le Corff (Sorbonne Université),​​ and Vincent Lemaire (Sorbonne​​​‌ Université), the article 15‌ provides a theoretical analysis‌​‌ of score-based generative models,​​ deriving explicit upper bounds​​​‌ on the Kullback–Leibler divergence‌ between the target and‌​‌ learned distributions that depend​​ on the noise schedule.​​​‌ Under additional regularity assumptions,‌ we obtain improved Wasserstein‌​‌ error bounds by exploiting​​ contraction properties of the​​​‌ underlying dynamics. These results‌ yield practical insights into‌​‌ the choice of training​​ hyperparameters, notably the noise​​​‌ schedule, and are illustrated‌ through numerical experiments on‌​‌ synthetic data and CIFAR-10,​​ highlighting an optimal regime​​​‌ within a parametric family‌ of schedules.

8.4.2 Taking‌​‌ a big step: large​​ learning rates in denoising​​​‌ score matching prevent memorization‌

Participants: Claire Boyer.‌​‌

The conference proceedings 27​​, conducted in collaboration​​​‌ with Yu-Han Wu (Google‌ DeepMind), Pierre Marion (Inria)‌​‌ and Gérard Biau (Sorbonne​​ Université), investigate the origin​​​‌ of memorization in diffusion-based‌ generative models and explain‌​‌ why this is often​​ limited in practice despite​​​‌ the absence of explicit‌ regularization. Focusing on denoising‌​‌ score matching, we show​​​‌ that the empirical optimal​ score is highly irregular​‌ in the small-noise regime​​ and leads to memorization​​​‌ of the training data.​ We then identify an​‌ implicit regularization mechanism induced​​ by sufficiently large learning​​​‌ rates in stochastic gradient​ descent, proving that such​‌ training dynamics prevent stable​​ convergence toward arbitrarily low-risk​​​‌ local minima. As a​ result, the learned score​‌ cannot closely match the​​ empirical optimum, thereby mitigating​​​‌ memorization. The theoretical analysis,​ conducted in a simplified​‌ one-dimensional setting with two-layer​​ neural networks, is supported​​​‌ by numerical experiments demonstrating​ the central role of​‌ the learning rate in​​ controlling memorization effects.

8.4.3​​​‌ Optimal stopping in latent​ diffusion models

Participants: Claire​‌ Boyer.

The collaborative work 46 with researchers from Google, Sorbonne Université, and Inria investigates an unexpected phenomenon in latent diffusion models (LDMs), namely that the final steps of the diffusion process can deteriorate sample quality. Going beyond standard numerical arguments for early stopping, the authors show that this effect is intrinsic to the dimensionality reduction inherent in LDMs. Within a Gaussian setting with linear autoencoders, they provide a theoretical characterization of the interplay between latent dimension and optimal stopping time, demonstrating that lower-dimensional latent representations benefit from earlier stopping, while higher-dimensional ones require later termination. The analysis further reveals interactions between latent dimensionality and other key hyperparameters, such as constraints in score matching. These findings are supported by experiments on both synthetic and real datasets, establishing early stopping as a critical hyperparameter for controlling generative quality in LDMs.

8.5​ Computational Limits and Statistical–Computational​‌ Gaps

8.5.1 Computational lower​​ bounds in latent models:​​​‌ clustering, sparse-clustering, biclustering

Participants:​ Bertrand Even, Christophe​‌ Giraud.

In collaboration with Bertrand Even and Nicolas Verzelen, we investigate in 39 computational lower bounds in latent models. In many high-dimensional problems, like sparse PCA, planted clique, and clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational constraints. This observation has given rise to the conjecture that, for some problems, there exist gaps (so-called statistical-computational gaps) between the best possible statistical performance achievable without computational constraints and the best performance achievable with poly-time algorithms. A powerful approach to assess the best performance achievable in poly-time is to investigate the best performance achievable by low-degree polynomials. We build on the seminal paper of Schramm and Wein 52 and propose a new scheme to derive lower bounds on the performance of low-degree polynomials in some latent space models. By better leveraging the latent structures, we obtain new and sharper results, with simplified proofs. We then instantiate our scheme to provide computational lower bounds for the problems of clustering, sparse clustering, and biclustering. We also prove matching upper bounds and some additional statistical results, in order to provide a comprehensive description of the statistical-computational gaps occurring in these three problems.

8.5.2 Computational barriers​ for permutation-based problems, and​‌ cumulants of weakly dependent​​ random variables

Participants: Bertrand​​ Even, Christophe Giraud​​​‌.

In collaboration with‌ Bertrand Even and Nicolas‌​‌ Verzelen, we investigate in​​ 38 computational barriers for​​​‌ permutation-based problems. In many‌ high-dimensional problems, polynomial-time algorithms‌​‌ fall short of achieving​​ the statistical limits attainable​​​‌ without computational constraints. A‌ powerful approach to probe‌​‌ the limits of polynomial-time​​ algorithms is to study​​​‌ the performance of low-degree‌ polynomials. Low-degree lower bounds‌​‌ are tightly related to​​ multivariate cumulants. Prior works​​​‌ leverage independence among latent‌ variables to bound cumulants.‌​‌ However, such approaches break​​ down for problems with​​​‌ latent structure lacking independence,‌ such as those involving‌​‌ random permutations. To address​​ this important restriction, we​​​‌ develop a technique to‌ upper-bound cumulants under weak‌​‌ dependencies—such as those arising​​ from sampling without replacement​​​‌ or random permutations. To‌ showcase the effectiveness of‌​‌ our approach, we uncover​​ evidence of statistical–computational gaps​​​‌ in multiple feature matching‌ and in seriation problems.‌​‌

8.5.3 Low-degree lower bounds​​ via almost orthonormal bases​​​‌

Participants: Simone Maria Giancola‌, Christophe Giraud.‌​‌

In collaboration with Alexandra Carpentier and Nicolas Verzelen, S.M. Giancola and C. Giraud investigate in 33 low-degree lower bounds via almost orthonormal bases. Low-degree polynomials have emerged as a powerful paradigm for providing evidence of statistical-computational gaps across a variety of high-dimensional statistical models. For detection problems, where the goal is to test a planted distribution $\mathbb{P}'$ against a null distribution $\mathbb{P}$ with independent components, the standard approach is to bound the advantage using an $L^2(\mathbb{P})$-orthonormal family of polynomials. However, this method breaks down for estimation tasks or more complex testing problems where $\mathbb{P}$ has some planted structure, so that no simple $L^2(\mathbb{P})$-orthogonal polynomial family is available. To address this challenge, several technical workarounds have been proposed, though their implementation can be tricky.

In this work, we propose a more direct proof strategy. Focusing on random graph models, we construct a basis of polynomials that is almost orthonormal under $\mathbb{P}$, in precisely those regimes where statistical-computational gaps arise. This almost orthonormal basis not only yields a direct route to establishing low-degree lower bounds, but also allows us to explicitly identify the polynomials that optimize the low-degree criterion. This, in turn, provides insights into the design of optimal polynomial-time algorithms. We illustrate the effectiveness of our approach by recovering known low-degree lower bounds, and establishing new ones for problems such as hidden subcliques, stochastic block models, and seriation models.

8.5.4 Phase transitions for stochastic block models with more than $\sqrt{n}$ communities

Participants: Christophe‌ Giraud.

In collaboration with Alexandra Carpentier and Nicolas Verzelen, C. Giraud investigated in 34 the problem of community recovery in stochastic block models (SBM) with many communities. Predictions from statistical physics postulate that recovery of the communities in SBM is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold. Failure of low-degree polynomials below the KS threshold was also proven, as long as the number $K$ of communities remains smaller than $\sqrt{n}$, where $n$ is the number of nodes in the observed graph.

In this work, we postulate a new threshold below the KS threshold for $K \geq \sqrt{n}$, and we prove that:

  1. For any graph density, low-degree polynomials fail to recover communities below the postulated threshold.
  2. Community recovery is possible in polynomial time above the postulated threshold, essentially by counting occurrences of some specific motifs, based on the blow-up of a cycle.

In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the non-backtracking operator, is optimal only in the sparse regime. Other motif counts, unrelated to spectral properties, must be considered in denser regimes.

8.6 Attention Mechanisms

8.6.1​ Attention layers provably solve​‌ single-location regression

Participants: Claire​​ Boyer.

The conference proceedings 20, conducted in collaboration with Pierre Marion (Inria), Gérard Biau (Sorbonne Université), and Raphaël Berthier (Inria), contribute to the theoretical understanding of attention-based models by analyzing their ability to recover sparse, token-level information and internal linear representations. We introduce the single-location regression task, in which only one token at a random and unknown location in a sequence determines the output. We propose a dedicated predictor, interpretable as a simplified non-linear self-attention mechanism, and establish its asymptotic Bayes optimality. Despite the non-convexity of the training problem, the analysis shows that the model successfully learns the underlying structure, providing theoretical insight into the effectiveness of attention mechanisms in settings with sparse and structured token dependencies.

8.6.2​ Attention-based clustering

Participants: Claire​‌ Boyer.

In collaboration​​ with Rodrigo Maulen (Sorbonne​​​‌ Université) and Pierre Marion​ (Inria), the conference proceedings​‌ 22 provides a theoretical​​ analysis of the ability​​​‌ of transformer architectures to​ uncover latent structure in​‌ data in an unsupervised​​ manner. Focusing on data​​​‌ generated from Gaussian mixture​ models, the authors show​‌ that a simplified two-head​​ attention layer can effectively​​​‌ perform clustering: by minimizing​ a suitably defined population​‌ risk using unlabeled data,​​ the attention head parameters​​​‌ provably align with the​ true mixture centroids. The​‌ study further demonstrates that​​ even an attention layer​​​‌ with fixed key, query,​ and value matrices set​‌ to the identity—thus involving​​ no trainable parameters—can perform​​​‌ in-context quantization. These results​ highlight the intrinsic capacity​‌ of attention mechanisms to​​ adapt to input-dependent distributions​​​‌ and to capture underlying​ structural properties of the​‌ data.
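The parameter-free case can be illustrated with a few lines of code. The sketch below is a loose illustration under assumptions, not the construction of 22: plain softmax self-attention with key, query, and value matrices equal to the identity, no temperature or scaling, applied to six hypothetical two-dimensional tokens forming two well-separated groups.

```python
import math

def identity_attention(tokens):
    """Softmax self-attention with key, query and value matrices set to the identity."""
    outputs = []
    for query in tokens:
        # Attention logits are plain inner products between tokens.
        logits = [sum(q * k for q, k in zip(query, key)) for key in tokens]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        outputs.append([
            sum(w * token[d] for w, token in zip(weights, tokens)) / total
            for d in range(len(query))
        ])
    return outputs

# Hypothetical tokens: one group around (5, 0), another around (-5, 0).
tokens = [
    (5.0, 1.0), (5.0, -1.0), (4.0, 0.0),
    (-5.0, 1.0), (-5.0, -1.0), (-4.0, 0.0),
]
outputs = identity_attention(tokens)
```

Each output is a convex combination of the input tokens in which tokens from the other group receive negligible weight, so every token is replaced by a representative of its own group without any trained parameter.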

8.6.3 Statistical advantage​​ of softmax attention: insights​​​‌ from single location regression​

Participants: Claire Boyer.​‌

The work 28,​​ conducted in collaboration with​​​‌ colleagues from Inria Paris,​ ENS and EPFL, provides​‌ a theoretical investigation of​​ attention mechanisms in large​​​‌ language models, with a​ particular focus on understanding​‌ the role of the​​ softmax activation. Through the​​​‌ study of a single-location​ regression task, the authors​‌ analyze attention-based predictors in​​ a high-dimensional regime using​​ tools from statistical physics.​​​‌ They show that, at‌ the population level, softmax‌​‌ attention achieves the Bayes-optimal​​ risk, whereas linear attention​​​‌ is intrinsically suboptimal. The‌ analysis further identifies key‌​‌ properties of activation functions​​ required for optimal performance.​​​‌ In the finite-sample regime,‌ the authors derive an‌​‌ asymptotic characterization of the​​ test error, demonstrating that​​​‌ although softmax is no‌ longer strictly Bayes-optimal, it‌​‌ consistently outperforms linear attention.​​ These results shed light​​​‌ on the fundamental advantages‌ of softmax attention and‌​‌ its connection to gradient-based​​ optimization dynamics.

8.7 Robust Statistical Inference, Multiple Testing, Reliability, and Model Selection

8.7.1 Fast confidence bounds​​ for the false discovery​​​‌ proportion over a path‌ of hypotheses

Participants: Guillermo‌​‌ Durand.

In the work 11, in a multiple testing context, we present a new algorithm (and an additional trick) that allows one to quickly compute an entire curve of confidence bounds for the false discovery proportion when the construction of the underlying bound $V^*_{\mathfrak{R}}$ is based on a reference family $\mathfrak{R}$ with a forest structure, as in 50. By an entire curve, we mean the values $V^*_{\mathfrak{R}}(S_1), \dots, V^*_{\mathfrak{R}}(S_m)$ computed on a path of increasing selection sets $S_1 \subsetneq \dots \subsetneq S_m$, with $|S_t| = t$. The new algorithm leverages the fact that going from $S_t$ to $S_{t+1}$ is done by adding only one hypothesis. Compared to a more naive approach, the new algorithm has a complexity in $O(|\mathcal{K}| m)$ instead of $O(|\mathcal{K}|^2)$, where $|\mathcal{K}|$ is the cardinality of the family.

8.7.2 Robust estimation and‌​‌ outlier detection for stochastic​​ block models

Participants: Leonardo​​​‌ Martins Bianco, Christine‌ Keribin.

In this joint work with Zacharie Naulet (INRAE-MaIAGE), we study robust estimation of graph clustering 21. We first prove a bound for the estimation error of stochastic block model (SBM) parameters which generalizes the bound appearing in Acharya et al. 49 for Erdős-Rényi graphs to the case of graphs with multiple communities. Interpreting this bound, we then propose SubSearch, an algorithm for robustly estimating SBM parameters by exploring the space of subgraphs in search of one that closely aligns with the model's assumptions. Our approach also functions as an outlier detection method, identifying nodes responsible for the graph's deviation from the model and going beyond simple techniques like pruning high-degree nodes. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method.
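As a small illustration of why restricting to a well-chosen subgraph can matter, the sketch below computes the usual plug-in estimate of the SBM connectivity matrix (edge density within and between groups) on a chosen subset of nodes. This is not SubSearch itself: the search over subgraphs is not shown, group labels are assumed to be known, and the function name and arguments are hypothetical.

```python
import numpy as np


def sbm_connectivity_estimate(adj, labels, n_groups, keep=None):
    """Edge densities between groups, restricted to the nodes in `keep`.

    adj:      symmetric 0/1 adjacency matrix with zero diagonal.
    labels:   group index (0 .. n_groups - 1) of every node.
    keep:     indices of the nodes to retain; all nodes if None.
    Blocks with no available node pair are left as NaN.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    if keep is None:
        keep = np.arange(labels.shape[0])
    keep = np.asarray(keep)
    sub_adj = adj[np.ix_(keep, keep)]
    sub_labels = labels[keep]

    estimate = np.full((n_groups, n_groups), np.nan)
    for a in range(n_groups):
        in_a = np.flatnonzero(sub_labels == a)
        for b in range(a, n_groups):
            in_b = np.flatnonzero(sub_labels == b)
            edges = sub_adj[np.ix_(in_a, in_b)].sum()
            if a == b:
                edges /= 2.0  # each within-group edge is counted twice
                pairs = in_a.size * (in_a.size - 1) / 2.0
            else:
                pairs = in_a.size * in_b.size
            if pairs > 0:
                estimate[a, b] = estimate[b, a] = edges / pairs
    return estimate


# Two groups of two nodes, plus node 4 (labelled 0) connected to everyone.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (2, 3), (4, 0), (4, 1), (4, 2), (4, 3)]:
    adj[i, j] = adj[j, i] = 1.0
labels = [0, 0, 1, 1, 0]

est_all = sbm_connectivity_estimate(adj, labels, n_groups=2)
est_clean = sbm_connectivity_estimate(adj, labels, n_groups=2, keep=[0, 1, 2, 3])
print(est_all[0, 1], est_clean[0, 1])
```

In this toy graph, dropping the single atypical node moves the estimated between-group density from one third to zero, which is the kind of sensitivity that motivates estimating parameters on a subgraph consistent with the model.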

8.7.3 Pink discoloration‌​‌ defects associated with microbial​​ structure and metabolome changes​​​‌ in commercial bloomy cheeses‌

Participants: Christine Keribin.‌​‌

In this joint work 13 with F. Irlinger, S. Helinck and A.-S. Sarthou (Paris-Saclay Food and Bioproduct Engineering) and B. Laroche (INRAE MaIAGE), we investigate pink discoloration defects in French bloomy rind soft cheeses, which can negatively affect product appearance and lead to economic losses. Two batches of cheese from the same processing plant were analyzed, one with pink defects and one without, allowing for comparative analysis. A multi-omics approach combining microbial profiling (16S rRNA and ITS2 sequencing) and metabolomics (GC-MS and LC-MS) was applied to identify the factors linked to the defect. We performed a Gaussian latent block model (LBM) co-clustering (Nadif and Govaert 51) in order to detect associations between groups of OTUs and groups of metabolites. Based on the LBM results, a multiblock sPLS-DA analysis was run to determine whether the observed associations were also related to the spoilage status. We found notable correlations, in particular with P. gangotriensis, which had never previously been detected in cheese. Its role was tested with a dedicated inoculation experiment. The results strongly suggest that P. gangotriensis is responsible for the pink defect.

8.7.4​ Model selection

Participants: Pascal​‌ Massart, Vincent Rivoirard​​.

In this joint work 40 with Claire Lacour, we addressed the problem of model selection in the sequence model $Y = \theta + \xi$, where $\xi$ is sub-Gaussian, for non-Euclidean loss functions. In this model, the penalized comparison to overfitting procedure was studied for the $\ell_p$-loss, $p \geq 1$. Several oracle inequalities were derived from concentration inequalities for sub-Weibull variables. Using judicious collections of models and penalty terms, minimax rates of convergence were stated for Besov bodies $\mathcal{B}^s_{r,\infty}$. These results were applied to the functional model of nonparametric regression.

8.7.5​​​‌ Concentration inequalities and cut-off​ phenomena for penalized model​‌ selection within a basic​​ Rademacher framework

Participants: Pascal​​​‌ Massart, Vincent Rivoirard​.

The work 41​‌ was conceived as a​​ tribute to Patrick Cattiaux.​​​‌ One of the authors​ has known Patrick Cattiaux​‌ for many years and​​ is deeply indebted to​​​‌ him. If one wished​ to illustrate the adage​‌ that life is shaped​​ by chance encounters, what​​​‌ better example could there​ be than the meeting,​‌ in the 1980s, of​​ two young people who​​​‌ both fell in love​ with the mathematics of​‌ randomness—one of whom profoundly​​ changed the other’s life​​​‌ by sharing a simple​ but decisive secret: if​‌ you truly believe in​​ it, a passion can​​​‌ become a profession. By​ another fortunate coincidence, this​‌ tribute appeared at a​​ particularly fitting moment, as​​​‌ Michel Talagrand has just​ been awarded the Abel​‌ Prize. The temptation to​​ pay a double homage​​​‌ was therefore irresistible. Following​ one of the many​‌ paths opened by mathematics,​​ we first established a​​​‌ connection between the work​ of Patrick Cattiaux and​‌ that of Michel Talagrand.​​ We then showed how​​​‌ the abstract probabilistic tools​ related to the concentration​‌ of product measures, revisited​​ in this light, can​​​‌ be used to illuminate​ cut-off phenomena in our​‌ own field of expertise,​​ namely mathematical statistics. There​​​‌ is nothing revolutionary here:​ the influence of Talagrand’s​‌ work on the development​​ of mathematical statistics since​​ the late 1990s is​​​‌ well known. Our contribution‌ rather lies in the‌​‌ choice of a very​​ simple framework, allowing the​​​‌ ideas to be presented‌ with minimal technicalities and‌​‌ letting the main concepts​​ stand out clearly.

9​​​‌ Bilateral contracts and grants‌ with industry

Participants: Christine‌​‌ Keribin, Jean-Michel Poggi​​, Gilles Stoltz,​​​‌ Claire Boyer.

9.1‌ Bilateral contracts with industry‌​‌

  • C. Keribin: Ongoing Cifre PhD contract with Metafora (30 k€) on machine learning in flow cytometry for early detection of cancers, started in March 2023.
  • C. Keribin: Ongoing Cifre PhD contract with SNCF (54 k€, to be shared equally between LMO and UGE/Grettia) on modeling, forecasting and managing passenger positioning on platforms and on board trains in densely populated areas, started in January 2025.
  • J.-M. Poggi: Analysis and modelling of NO2 numerical model biases for data fusion of heterogeneous measurement networks, ATMO NORMANDIE, 20 k€; started in December 2022, ended in December 2025.
  • J.-M. Poggi, G. Stoltz: Participation in the EDF-Inria Grand défi, including a Cifre PhD started in December 2023 and a postdoc started in February 2025.
  • G. Stoltz: Cifre PhD contract with EDF (55 k€) on reinforcement learning for optimizing the production of nuclear plants, started in autumn 2025.
  • C. Boyer: PhD contract with Google DeepMind on diffusion-based generative models, started in January 2025.

10 Partnerships and cooperations‌​‌

10.1 International research visitors​​

10.1.1 Visits to international​​​‌ teams

Research stays abroad‌
Claire Boyer
  • Visited institution:‌​‌ IPAM, UCLA
  • Country: USA​​
  • Dates: March-April 2025
  • Context​​​‌ of the visit: Thematic‌ program on optimal transport‌​‌
  • Mobility program/type of mobility:​​ Research stay
Claire Boyer​​​‌
  • Visited institution: CRM, Montreal‌
  • Country: Canada
  • Dates: May-June‌​‌ 2025
  • Context of the​​ visit: Spring school and​​​‌ thematic program on mathematics‌ of data science
  • Mobility‌​‌ program/type of mobility: Research​​ stay

10.2 National initiatives​​​‌

Participants: Sylvain Arlot,‌ Evgenii Chzhen, Christophe‌​‌ Giraud, Gilles Stoltz​​.

10.2.1 ANR

Sylvain Arlot, Evgenii Chzhen, Luca Ganassali, Christophe Giraud and Gilles Stoltz are part of the PEPR-IA grant CAUSALI-T-AI (CAUSALIty Teams up with Artificial Intelligence), which is led by Marianne Clausel (Univ. de Lorraine), during the period 2023-2028.

Sylvain Arlot, Christophe Giraud‌​‌ and Guillermo Durand are​​ part of the ANR​​​‌ Chair-IA grant Biscotte,‌ which is led by‌​‌ Gilles Blanchard (Université Paris​​ Saclay), for the period​​​‌ 2019-2026.

Guillermo Durand is‌ part of the ANR‌​‌ BACKUP: BAyesian nonparametrics,​​ Complex models and Kernels,​​​‌ Uncertainty quantification and deeP‌ methods, with Sorbonne Université‌​‌ and Université de Toulouse.​​ Period: 2023-2028. See here​​​‌.

Christophe Giraud and‌ Guillermo Durand are part‌​‌ of ANR ASCAI:​​ Active and batch segmentation,​​​‌ clustering, and seriation: toward‌ unified foundations in AI,‌​‌ with Potsdam University, Munich​​ University, Montpellier INRAE (Period​​​‌ 2022-2026). See here.‌

11 Dissemination

11.1 Promoting‌​‌ scientific activities

11.1.1 Scientific​​ events: organisation

General chair,​​​‌ scientific chair
  • J.-M. Poggi‌ is Past-President of ENBIS‌​‌ (European Network for Business​​ and Industrial Statistics)
  • C.​​​‌ Keribin is Vice-President of‌ the French Statistical Society‌​‌ (SFdS); member of the​​​‌ board of MALIA, SFdS​ specialized group in Machine​‌ Learning and AI.
  • V.​​ Rivoirard is a member​​​‌ of the Scientific Council​ of CIRM.
Member of​‌ the organizing committees
  • S. Arlot is a member of the scientific committee of the Séminaire Palaisien
  • S. Arlot, E. Chzhen, C. Keribin, V. Rivoirard are part of the organizing committee of the Celeste conference to be held in 2026 at CIRM
  • A. Janon is co-organizer of the UQSay seminar
  • E. Chzhen is co-organizer​​ of the DATAIA seminar​​​‌
  • J.-M. Poggi was chair​ of the ENBIS Nominations​‌ Committee 2025
  • J.-M. Poggi​​ was organizer of the​​​‌ ECAS-SFdS course 2025: Towards​ Reliable Machine Learning: Transfer​‌ & Physics Informed Learning,​​ and Conformal Prediction, Fréjus,​​​‌ France, December 1-5, 2025​
  • J.-M. Poggi was organizer​‌ of the ECAS-ENBIS course:​​ Statistical Process Monitoring of​​​‌ Functional Data, Piraeus, Greece,​ September 14, 2025
  • C.​‌ Keribin was co-organizer of​​ the AI4Maths workshop (November​​​‌ 18, 2025)
  • C. Keribin​ was co-organizer of the​‌ workshop Learning (and) statistics​​ with Talagrand (January 7-9,​​​‌ 2026)
  • C. Giraud is co-organizer of the biennial conference StatMathAppli in Fréjus
  • C. Giraud is co-organizer​​​‌ of the ASCAI final​ meeting in Orsay (June​‌ 2025)
  • G. Durand and​​ V. Rivoirard are co-organizers​​​‌ of the Séminaire Parisien​ de Statistique

11.1.2 Scientific​‌ events: selection

Member of​​ the conference program committees​​​‌
  • C. Giraud, Area chair​ for COLT since 2021​‌
  • C. Boyer, Member of the scientific committee of NeurIPS in Paris 2025
  • J.-M.​ Poggi, Member of the​‌ Scientific Program Committee of​​ the ENBIS-25 Conference

11.1.3​​​‌ Journal

Reviewer
  • We performed​ many reviews for various​‌ international journals.
Member of​​ the editorial boards
  • S.​​​‌ Arlot: Associate editor for​ Annales de l'Institut Henri​‌ Poincaré B – Probability​​ and Statistics
  • C. Boyer:​​​‌ Associate editor for Electronic​ Journal of Statistics
  • C.​‌ Boyer: Associate editor for​​ Information & Inference, Oxford​​​‌
  • C. Boyer: Associate editor for Journal of the Royal Statistical Society, Series B (JRSS-B)
  • C. Giraud: Action​​​‌ Editor for JMLR
  • C.​ Giraud: Associate Editor for​‌ JEMS
  • C. Giraud: Associate​​ Editor for ESAIM-proc
  • P.​​​‌ Massart: Associate Editor for​ Panoramas et Synthèses (SMF),​‌ Foundations and Trends in​​ Machine Learning, and Confluentes​​​‌ Mathematici
  • J.-M. Poggi: Associate​ Editor for Advances in​‌ Data Analysis and Classification​​
  • J.-M. Poggi: Associate Editor​​​‌ for JDSSV J. Data​ Science, Statistics and Visualization​‌
  • J.-M. Poggi: Guest editor for the Springer-Nature book “Methodological and Applied Statistics and Demography”, which gathers selected papers from the SIS 2024 conference.
  • G.​​​‌ Stoltz: Associate editor for​ Mathematics of Operations Research​‌
  • C. Keribin: Member of​​ the editorial board, Statistique​​​‌ et Société (SFdS).
  • V.​ Rivoirard: Associate editor for​‌ Annales de l’IHP (B)​​, ALEA, Bernoulli​​​‌ and Stochastic Processes and​ their Applications
Other reviewing​‌ activities
  • We performed many​​ reviews for various top​​​‌ ML conferences.
  • G. Stoltz,​ Top reviewer distinction for​‌ ICML 2025

11.1.4 Invited​​ talks

  • B. Even, ASCAI,​​​‌ Orsay, June 2025
  • C.​ Boyer, Data science seminar,​‌ Oxford Mathematical Institute (UK),​​ February 2025
  • C. Boyer,​​​‌ Seminar, Halicioglu Data Science​ Institute, San Diego, March​‌ 2025
  • C. Boyer, Summer​​ School CRM Montreal, May​​ 2025
  • C. Boyer, Summer​​​‌ School EDF-INRIA, June 2025‌
  • C. Boyer, LMS-Bath Symposia‌​‌ on Inverse Problems and​​ Artificial Intelligence in Medicine​​​‌ Bath (UK), June 2025‌
  • C. Boyer, 1W-MINDS seminar,‌​‌ September 2025
  • C. Boyer,​​ Colloquium, University of Vienna,​​​‌ October 2025
  • C. Boyer, Lecture on generative models, Université d'Aix-Marseille, November 2025
  • C. Giraud, NITMB, Chicago,​​​‌ April 2025
  • C. Giraud, Institute for Mathematical Sciences, Singapore, May 2025
  • C. Giraud,​​ ENSAE, November 2025
  • C.​​​‌ Giraud, LSE, London, December‌ 2025
  • E. Boursier, Inria Lille, March 2025
  • E. Boursier, 10e journée statistique, IHES, Bures-sur-Yvette, April 2025
  • E. Boursier, Inria Grenoble, March 2025
  • G. Durand,​​ CBIO team seminar, École​​​‌ des Mines de Paris,‌ March 2025
  • G. Durand,‌​‌ Séminaire de modélisation mathématique​​ en sciences de la​​​‌ vie et santé, Univ‌ Paris-Cité, November 2025
  • R.‌​‌ Périer, BACKUP meeting, Paris,​​ June 2025
  • R. Périer,​​​‌ ASCAI meeting, Orsay, June‌ 2025
  • R. Périer, Séminaire‌​‌ parisien de statistique, Paris,​​ September 2025
  • J.-M. Poggi,​​​‌ JDSSV Special Invited Paper‌ Session, ISI World Statistics‌​‌ Congress, The Hague, the​​ Netherlands, October 2025
  • H. Cui, Séminaire parisien de statistique, Paris, November 2025
  • H. Cui, ENS Paris,​​ November 2025
  • C. Keribin, CFE-CMStatistics, London, December 2025

11.1.5 Research administration

  • S.‌​‌ Arlot is a member​​ of the council of​​​‌ the Computer Science Graduate‌ School (GS ISN) of‌​‌ University Paris-Saclay.
  • S. Arlot​​ is a member of​​​‌ the council of the‌ Computer Science Doctoral School‌​‌ (ED STIC) of University​​ Paris-Saclay.
  • C. Boyer is​​​‌ a member of the‌ scientific committee of the‌​‌ PGMO (Programme Gaspard Monge​​ pour l'Optimisation) program.
  • C.​​​‌ Boyer is an elected‌ member of the liaison‌​‌ committee of SMAI-MODE group.​​
  • C. Giraud is a​​​‌ member of the Scientific‌ Committee of labex IRMIA+,‌​‌ Strasbourg.
  • C. Giraud is​​ deputy director of the​​​‌ Mathematics Graduate School of‌ University Paris-Saclay.
  • C. Giraud‌​‌ is in charge of​​ the whole Masters program​​​‌ in mathematics for University‌ Paris-Saclay.
  • C. Giraud is‌​‌ a member of the​​ local Scientific Committee of​​​‌ Institut Pascal.
  • C. Giraud‌ is a member of‌​‌ the council of the​​ Mathematics Doctoral School (EDMH)​​​‌ of Université Paris-Saclay.
  • C. Keribin is a member of the board of the Computer Science Doctoral School (ED MSTIC) of Paris-Est Sup.
  • C. Keribin is deputy director of Laboratoire de mathématiques d'Orsay, and director since 1/1/2026.
  • C. Keribin is in charge of the M2-Maths & IA master's program of the mathematics school.
  • P. Massart‌ is director of the‌​‌ Fondation Mathématique Jacques Hadamard​​.

11.1.6 Service to​​​‌ the academic community

  • K. Bleakley: maintains the English version of the LMO website dedicated to research activities
  • E. Boursier:‌ member of Inria Saclay‌​‌ scientific committee
  • E. Chzhen:​​ member of Bibliothèque Jacques​​​‌ Hadamard scientific committee
  • C.‌ Giraud: coordinator of computing‌​‌ resources at the Institut​​ Mathématiques d'Orsay (10 engineers)​​​‌
  • C. Giraud: senior member‌ of CCUPS (Commission Consultative‌​‌ Université Paris Saclay)
  • G. Durand: member of CCUPS
  • G. Durand: member of the Teaching Council of the maths department of Orsay
  • C. Giraud: recruiting committee for Data-AI associate professor positions
  • C. Keribin is co-president of the scholarship allocation committee MixtAI of the SaclAI school.
  • C. Keribin is a member of the committee for awarding the Sophie Germain excellence scholarships (FMJH)
  • C. Keribin: member​​ of the follow-up committee​​​‌ for PhD student Sara​ Madad (UTT)
  • C. Keribin:​‌ member of the follow-up​​ committee for PhD student​​​‌ Anderson Augusma (Laboratoire d'informatique​ de Grenoble)
  • C. Keribin:​‌ member of the follow-up​​ committee for PhD student​​​‌ Augustin Pion (Laboratoire des​ Signaux et Systèmes, CentraleSupelec)​‌
  • C. Keribin: member of​​ the follow-up committee for​​​‌ PhD student Lucie Arts​ (LPSM)
  • C. Keribin: member​‌ of the follow-up committee​​ for PhD student Samy​​​‌ Vilhes (Insa Rouen)
  • G.​ Durand: member of the​‌ follow-up committee for PhD​​ student Nicola De Simone​​​‌ (CEA Grenoble)

11.2 Teaching​ - Supervision - Juries​‌

11.2.1 Teaching

Most of​​ the team members (especially​​​‌ Professors, Associate Professors and​ Ph.D. students) teach several​‌ courses at University Paris-Saclay,​​ as part of their​​​‌ teaching duty. We mention​ below some of the​‌ classes in which we​​ teach.

  • Masters: S. Arlot,​​​‌ Statistical learning and resampling,​ 30h, M2, Université Paris-Saclay​‌
  • Masters: S. Arlot, Preparation​​ for French mathematics agrégation​​​‌ (statistics), 25h, M2, Université​ Paris-Saclay
  • Masters: C. Boyer,​‌ Refresher courses in statistics,​​ 15h, M2, Université Paris-Saclay​​​‌
  • Masters: C. Boyer, Optimization​ meets generalization, mathematics of​‌ neural networks, 20h, M2,​​ Université Paris-Saclay
  • Masters: C.​​​‌ Boyer, Guidelines in Machine​ Learning, 20h, M2, Université​‌ Paris-Saclay
  • Masters: C. Giraud,​​ High-Dimensional Probability and Statistics,​​​‌ 45h, M2, Université Paris-Saclay​
  • Masters: C. Giraud, Mathematics​‌ for AI, 75h, M1,​​ Université Paris-Saclay
  • Masters: C.​​​‌ Keribin, unsupervised and supervised​ learning, M1, 42h, Université​‌ Paris-Saclay/ENSTA
  • Masters: C. Keribin,​​ Unsupervised learning, M1, 15h,​​​‌ Université Paris-Saclay
  • Masters: C.​ Keribin, Advanced Unsupervised Learning,​‌ M2, 24h, Université Paris-Saclay​​
  • Masters: C. Keribin, Internship​​​‌ supervision for M2-Maths &​ IA, Université Paris-Saclay
  • Masters:​‌ G. Durand, Deep Learning​​ Project, 16h, M1-Maths &​​​‌ AI, Université Paris-Saclay
  • Masters:​ G. Durand, Internship supervision,​‌ M1-Maths & AI, Université​​ Paris-Saclay
  • Masters: G. Durand,​​​‌ pré-requis de statistique, 6h,​ M1-Maths & AI, Université​‌ Paris-Saclay
  • Licence: G. Durand,​​ Statistical inference, 30h, L3​​​‌ mathématiques, Université Paris-Saclay
  • Licence:​ G. Durand, Multivariate data​‌ analysis, 18h, L3 mathématiques,​​ Université Paris-Saclay
  • Licence: G.​​​‌ Durand, Statistical tests for​ biology, 38h, L2 biology,​‌ Université Paris-Saclay
  • Licence: G.​​ Durand, Probability and statistics,​​​‌ 24h, L2 mathématiques et​ sciences du vivant, Université​‌ Paris-Saclay
  • Masters: G. Durand,​​ Mathematical statistics, 18h, mastères​​​‌ spécialisées, ENSAE
  • Licence/Masters: E.​ Chzhen, PCC Polytechnique
  • Masters:​‌ E. Chzhen, Statistical Theory​​ of Algorithmic Fairness, 20h,​​​‌ M2 Université Paris-Saclay
  • Masters:​ E. Boursier, Sequential Learning,​‌ 24h, M2 Université Paris-Saclay​​

11.2.2 Supervision

PhD defenses​​​‌
  • 2025-09-16: Daniil Tiapkin, Sample-efficient​ reinforcement learning: exploration, imitation,​‌ and online learning, started​​ October 2023, co-advised by​​​‌ G. Stoltz and E.​ Moulines (Polytechnique).
  • 2025-12-04: Leo Martins Bianco, Outliers and Hallucinations: Contributions to Robust Community Detection and Language Model Alignment, started October 2022, co-advised by C. Keribin, Z. Naulet (AgroParisTech) and J. Hoffmann (Google DeepMind).
  • 2025-12-12: Chiara Mignacco,​‌ A mathematical study of​​ policy orchestration for reinforcement​​​‌ learning, started October 2022,​ co-advised by G. Stoltz​‌ and M. Jonckheere (LAAS–CNRS,​​ Toulouse).
Current PhD students​​
  • PhD in progress: Gayane​​​‌ Taturyan, Fairness and Robustness‌ in Machine Learning, started‌​‌ Nov. 2021, co-advised by​​ E. Chzhen, J.-M. Loubes​​​‌ (Univ. Toulouse Paul Sabatier)‌ and M. Hebiri (Univ.‌​‌ Gustave Eiffel)
  • PhD in​​ progress: Samy Clementz, Data-driven​​​‌ Early Stopping Rules for‌ saving computation resources in‌​‌ AI, started Sept. 2021,​​ co-advised by S. Arlot​​​‌ and A. Celisse
  • PhD in progress: Aymeric Capitaine, Incentivizing Federated and Decentralized Learning, started September 2023, co-advised by E. Boursier, M. Jordan (Inria Paris) and A. Durmus (Polytechnique)
  • PhD in progress: Antoine Scheid, Multi-agent bandits and Markovian games, started September 2023, co-advised by E. Boursier, M. Jordan (Inria Paris) and A. Durmus (Polytechnique)
  • PhD in progress: Guillaume Principato, Hierarchical conformal prediction for smart electric vehicle charging, started December 2023, co-advised by J.-M. Poggi and G. Stoltz, as well as Y. Amara-Ouali, Y. Goude, B. Hamrouche (EDF)
  • PhD in‌​‌ progress: Pierre-André Mikem, Multiple​​ instance learning for the​​​‌ detection of tumor cells,‌ started March 2023, co-advised‌​‌ by C. Keribin and​​ P. Massart (Univ. Paris-Saclay).​​​‌ Cifre contract with Metafora‌
  • PhD in progress: Romain‌​‌ Périer, Développement de nouvelles​​ méthodes post hoc pour​​​‌ données structurées, started October‌ 2023, co-advised by G.‌​‌ Durand and Gilles Blanchard​​ (Univ Paris-Saclay)
  • PhD in​​​‌ progress: Bertrand Even, Compromis‌ Statistique-Computationnel et équité en‌​‌ apprentissage non-supervisé, started September​​ 2024, co-advised by C.​​​‌ Giraud and N. Verzelen‌ (INRAE)
  • PhD in progress:‌​‌ Victor Turmel, Repeated Games​​ and Sequential Learning: Towards​​​‌ Fair and Efficient Algorithms,‌ started October 2024, co-advised‌​‌ by G. Stoltz and​​ E. Boursier
  • PhD in​​​‌ progress: Dhia-Elhaq Ouerfelli, Change-point‌ detection and explainability of‌​‌ high-dimensional time series, started​​ October 2024, co-advised by​​​‌ S. Arlot, K. Bleakley,‌ and P. Pamphile
  • PhD‌​‌ in progress: Justine Lebrun,​​ Modeling / forecasting /​​​‌ managing the passenger positioning‌ on platforms and on‌​‌ board trains in densely​​ populated areas, started January​​​‌ 2025, co-advised by C.‌ Keribin and E. Come‌​‌ (UGE/Grettia). Cifre contract with​​ SNCF
  • PhD in progress:​​​‌ Simone Maria Giancola, Computational‌ barriers for modern learning‌​‌ problems, started November 2025,​​ co-advised by C. Giraud​​​‌ and N. Verzelen (INRAE)‌
  • PhD in progress: Hanqi‌​‌ Sun, Causal inference through​​ multi-group learning, started September​​​‌ 2025, co-advised by E.‌ Chzhen, L. Ganassali, G.‌​‌ Stoltz
  • PhD in progress:​​ Timothée Vinçon, Optimization of​​​‌ the control of a‌ nuclear reactor by reinforcement‌​‌ learning techniques, advised by​​ G. Stoltz, as well​​​‌ as by G. Simonini‌ (EDF)

11.2.3 Juries

We‌​‌ participated in many PhD​​ committees (too many to​​​‌ keep an exact record),‌ at University Paris-Saclay as‌​‌ well as at other​​ universities, and we refereed​​​‌ several of these PhDs.‌

11.3 Popularization

11.3.1 Education‌​‌

  • Christophe Giraud produces educational​​ videos on his YouTube​​​‌ channel “High-dimensional probability and‌ statistics”: see here.‌​‌
  • Gilles Stoltz held a MATh.en.JEANS workshop in 2024-25 at Lycée Douanier Rousseau in Laval.

11.3.2 Interventions‌​‌

  • A perspective on statistics​​ and the summit for​​​‌ action on AI. See‌ here.

12 Scientific‌​‌ production

12.1 Major publications​​

  • 1 article: Etienne Boursier and Nicolas Flammarion. Early alignment in two-layer networks training is a two-edged sword. Journal of Machine Learning Research, July 2025. HAL.
  • 2 misc: Alexandra Carpentier, Christophe Giraud and Nicolas Verzelen. Phase Transition for Stochastic Block Model with more than √n Communities. September 2025. HAL.
  • 3 article: Nathan Doumèche, Gérard Biau and Claire Boyer. On the convergence of PINNs. Bernoulli, Vol. 31, 2025, pp. 2127-2151. HAL.
  • 4 inproceedings: Pierre Marion, Raphaël Berthier, Gérard Biau and Claire Boyer. Attention layers provably solve single-location regression. Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, Singapore, February 2025. HAL.

12.2 Publications​‌ of the year

International​​ journals

International peer-reviewed conferences​​​‌

  • 16 inproceedings: Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson and Etienne Boursier. Approximate information maximization for bandit games. Proceedings of Machine Learning Research, vol. 258, 28th International Conference on Artificial Intelligence and Statistics, Phuket, Thailand, May 2025, pp. 316-324. HAL.
  • 17 inproceedings: Etienne Boursier and Nicolas Flammarion. Simplicity bias and optimization threshold in two-layer ReLU networks. Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), PMLR 267, Vancouver, Canada, July 2025. HAL.
  • 18 inproceedings: Etienne Boursier, Scott Pesme and Radu-Alexandru Dragomir. A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), San Diego, United States, December 2025. HAL.
  • 19 inproceedings: Aymeric Capitaine, Etienne Boursier, Eric Moulines, Michael I. Jordan and Alain Durmus. Prediction-Aware Learning in Multi-Agent Systems. Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), PMLR 267, Vancouver, Canada, July 2025. HAL.
  • 20 inproceedings: Pierre Marion, Raphaël Berthier, Gérard Biau and Claire Boyer. Attention layers provably solve single-location regression. Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, Singapore, February 2025. HAL.
  • 21 inproceedings: Leonardo Martins Bianco, Christine Keribin and Zacharie Naulet. SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search. AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics, vol. 258, Mai Khao, Thailand, May 2025, pp. 1297-1305. HAL.
  • 22 inproceedings: Rodrigo Maulen-Soto, Pierre Marion and Claire Boyer. Attention-based clustering. Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, United States, 2025. HAL.
  • 23 inproceedings: Stanislas Strasman, Sobihan Surendran, Claire Boyer, Sylvain Le Corff, Vincent Lemaire and Antonio Ocello. Wasserstein Convergence of Critically Damped Langevin Diffusions. Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, United States, 2025. HAL.
  • 24 inproceedings: Victor Thuot, Alexandra Carpentier, Christophe Giraud and Nicolas Verzelen. Clustering with bandit feedback: breaking down the computation/information gap. 36th International Conference on Algorithmic Learning Theory (ALT 2025), Proceedings of Machine Learning Research, vol. 272, Milano, Italy, PMLR, February 2025, pp. 1221-1284. HAL.
  • 25 inproceedings: Gauthier Thurin, Kimia Nadjahi and Claire Boyer. Optimal Transport-based Conformal Prediction. Proceedings of the International Conference on Machine Learning (ICML 2025), Vancouver, Canada, 2025. HAL.
  • 26 inproceedings: Daniil Tiapkin, Evgenii Chzhen and Gilles Stoltz. Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization. The 28th International Conference on Artificial Intelligence and Statistics (AISTATS), Mai Khao, Thailand, May 2025. HAL.
  • 27 inproceedings: Yu-Han Wu, Pierre Marion, Gérard Biau and Claire Boyer. Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization. Proceedings of the Thirty Eighth Conference on Learning Theory (COLT 2025), PMLR, Lyon, France, 2025. HAL.

Conferences without proceedings

Reports & preprints

Other​​ scientific publications

  • 47 misc: Christophe Biernacki, Julien Jacques and Christine Keribin. Model based co-clustering: high dimension & estimation challenges. January 2025. HAL.
  • 48 misc: Leonardo Martins Bianco, Christine Keribin and Zacharie Naulet. Robust estimation and outlier detection for Stochastic Block Models via subgraph search. December 2025. HAL.

12.3‌​‌ Cited publications

  • 49 misc: Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh and Huanyu Zhang. Robust Estimation for Random Graphs. 2022. URL: https://arxiv.org/abs/2111.05320
  • 50 article: Guillermo Durand, Gilles Blanchard, Pierre Neuvial and Etienne Roquain. Post hoc false positive control for structured hypotheses. Scand. J. Stat. 47(4), 2020, pp. 1114-1148. URL: https://doi.org/10.1111/sjos.12453
  • 51 article: Mohamed Nadif and Gérard Govaert. Algorithms for Model-based Block Gaussian Clustering. DMIN 8, 2008, pp. 14-17.
  • 52 article: Tselil Schramm and Alexander S. Wein. Computational barriers to estimation from low-degree polynomials. The Annals of Statistics 50(3), June 2022. URL: http://dx.doi.org/10.1214/22-AOS2179