CELESTE

2025 Activity Report, Project-Team CELESTE

RNSR: 201923222N

Research center Inria Saclay Centre at‌ Université Paris-Saclay
In partnership with: CNRS, Université Paris-Saclay
Team name: mathematical statistics and learning
In collaboration with: Laboratoire de mathématiques d'Orsay de l'Université de Paris-Saclay (LMO)

Creation of the Project-Team: 2019 June‌ 01

Keywords

Computer‌ Science and Digital Science

A3.1.1. Modeling, representation
A3.1.8.‌ Big data (production, storage, transfer)
A3.3. Data and‌ knowledge analysis
A3.3.3. Big data analysis
A3.4. Machine‌ learning and statistics
A3.5.1. Analysis of large graphs‌
A6.1. Methods in mathematical modeling
A9.2. Machine learning‌
A9.2.1. Supervised learning
A9.2.2. Unsupervised learning
A9.2.3. Reinforcement‌ learning
A9.2.4. Optimization and learning
A9.2.5. Bayesian methods‌
A9.2.6. Neural networks
A9.2.7. Kernel methods
A9.2.8. Deep‌ learning

1‌ Team members, visitors, external‌ collaborators

Research Scientists

Kevin‌‌ Bleakley [INRIA, Researcher]
Etienne Boursier‌ [INRIA, ISFP‌]
Gilles Celeux [‌‌INRIA, Emeritus]
Evgenii Chzhen [CNRS‌, Researcher]
Hugo‌ Cui [CNRS]‌‌
Gilles Stoltz [CNRS, Senior Researcher,‌ HDR]

Faculty Members‌

Sylvain Arlot [Team‌‌ leader, UNIV PARIS SACLAY, Professor,‌ until Nov 2025]‌
Claire Boyer [Team‌‌ leader, UNIV PARIS SACLAY, Professor,‌ from Dec 2025]‌
Sylvain Arlot [UNIV‌‌ PARIS SACLAY, Professor, from Dec 2025‌]
Claire Boyer [‌UNIV PARIS SACLAY,‌‌ Professor, from Apr 2025 until Nov 2025‌]
Guillermo Durand [‌UNIV PARIS SACLAY,‌‌ Associate Professor, from Apr 2025]
Luca‌ Ganassali [UNIV PARIS‌ SACLAY, Associate Professor‌‌, from Apr 2025]
Christophe Giraud [‌UNIV PARIS SACLAY,‌ Professor]
Christine Keribin‌‌ [UNIV PARIS SACLAY, Professor]
Pascal‌ Massart [UNIV PARIS‌ SACLAY, Professor]‌‌
Patrick Pamphile [UNIV PARIS SACLAY, Associate‌ Professor]
Vincent Rivoirard‌ [LMO, Professor‌‌ Delegation]

Post-Doctoral Fellows

Julien Aubert [INRIA‌, Post-Doctoral Fellow,‌ from Feb 2025 until‌‌ Oct 2025]
Margaux Zaffran [INRIA,‌ Post-Doctoral Fellow, from‌ Oct 2025]

PhD‌‌ Students

Bertrand Even [UNIV PARIS SACLAY]‌
Simone Maria Giancola [‌UNIV PARIS SACLAY,‌‌ from Nov 2025]
Justine Lebrun [SNCF‌, CIFRE, from‌ Feb 2025]
Leonardo‌‌ Martins Bianco [LMO, until Sep 2025‌]
Chiara Mignacco [‌UNIV PARIS SACLAY,‌‌ until Sep 2025]
Pierre-Andre Mikem [UNIV‌ PARIS SACLAY]
Dhia‌ Elhaq Ouerfelli [UNIV‌‌ PARIS SACLAY]
Romain Perier [UNIV PARIS‌ SACLAY]
Guillaume Principato‌ [EDF]
Antoine‌‌ Scheid [Ecole Polytechnique and Inria Paris]‌
Hanqi Sun [INRIA‌, from Sep 2025‌‌]
Gayane Taturyan [IRT SYSTEM X]‌
Daniil Tiapkin [ECOLE‌ POLY PALAISEAU]
Victor‌‌ Turmel [UNIV PARIS SACLAY]
Timothée Vincon‌ [EDF R&D,‌ CIFRE, from Oct‌‌ 2025]

Interns and Apprentices

Shuailong Zhu [‌INRIA, Intern,‌ until Mar 2025]‌‌

Administrative Assistant

Laetitia Jubely [INRIA, from‌ May 2025]

External‌ Collaborators

Benjamin Auder [‌‌CNRS]
Jean-Michel Poggi [UNIV PARIS SACLAY‌]

2 Overall objectives‌

2.1 Mathematical statistics and‌‌ learning

Data science—a vast field that includes statistics,‌ machine learning, signal processing,‌ data visualization, and databases—has‌‌ become front-page news due to its ever-increasing impact‌ on society, over and‌ above the important role‌‌ it already played in science over the last‌ few decades. Within data‌ science, the statistical community‌‌ has long-term experience in how to infer knowledge‌ from data, based on‌ solid mathematical foundations. The‌‌ recent field of machine learning has also made‌ important progress by combining‌ statistics and optimization, with‌‌ a fresh point of‌ view that originates in applications where prediction is‌ more important than building models.

The Celeste project-team‌ is positioned at the interface between statistics and‌ machine learning. We are statisticians in a mathematics‌ department, with strong mathematical backgrounds, interested in interactions‌ between theory, algorithms, and applications. Indeed, applications are‌ the source of many of our interesting theoretical‌ problems, while the theory we develop plays a‌ key role in (i) understanding how and why‌ successful statistical learning algorithms work—hence improving them—and (ii)‌ building new algorithms upon mathematical statistics-based foundations. Therefore,‌ we tackle several major challenges of machine learning‌ with our mathematical statistics point of view (in‌ particular the algorithmic fairness issue), always having in‌ mind that modern datasets are often high-dimensional and/or‌ large-scale, which must be taken into account at‌ the building stage of statistical learning algorithms. For‌ instance, there often are trade-offs between statistical accuracy‌ and complexity which we want to clarify as‌ much as possible.

In addition, most theoretical guarantees that we prove are non-asymptotic, which is important because the number of features $p$ is often larger than the sample size $n$ in modern datasets, hence asymptotic results with $p$ fixed and $n \to +\infty$ are not relevant. The non-asymptotic approach is also closer to the real world than specific asymptotic settings, since it is difficult to say whether $p = 1000$ and $n = 100$ corresponds to the setting $p = 10 n$ or $p = n^{3/2}$.
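To make the ambiguity concrete, the following short computation checks that both asymptotic regimes are consistent with the same pair of values:

```latex
p = 1000,\quad n = 100:\qquad
10\,n = 10 \times 100 = 1000 = p,
\qquad
n^{3/2} = 100^{3/2} = 10^{3} = 1000 = p .
```

A single observed pair $(n, p)$ therefore cannot tell a linear regime $p \propto n$ apart from a polynomial regime $p = n^{3/2}$, whereas a non-asymptotic bound stated directly in terms of $n$ and $p$ applies in both cases.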

Finally, a key ingredient in our‌ research program is connecting our theoretical and methodological‌ results with (a great number of) real-world applications.‌ This is the reason why a large part‌ of our work is devoted to industrial and‌ medical data modeling on a set of real-world‌ problems coming from our long-term collaborations with several‌ partners, as well as various opportunistic one-shot collaborations.‌

3 Research program

In 2025, the Celeste team‌ pursued a coherent research program at the intersection‌ of statistical learning, optimization, and probabilistic modeling, with‌ a constant emphasis on three cross-cutting requirements: rigorous‌ guarantees, scalability, and adaptation to structure (constraints, geometry,‌ dependence, and dynamics). The team’s contributions advance both‌ foundational theory—clarifying what can be learned, under which‌ assumptions, and at what computational cost—and practical methodologies‌ motivated by large-scale industrial and scientific applications.

3.1‌ Uncertainty Quantification for Structured Multivariate Outputs.

A first‌ pillar of the program develops distribution-free uncertainty quantification‌ tools that remain valid in finite samples while‌ adapting to multivariate structure. One contribution studies conformal‌ prediction for hierarchical data, where components satisfy known‌ linear relations. By integrating a reconciliation (projection) step‌ into split conformal prediction, the method leverages hierarchy‌ to build strictly more efficient prediction regions at‌ the same coverage level, including for the demanding‌ goal of component-wise coverage. This work also forges‌ links between conformal inference and the literature on‌ forecast reconciliation, thereby unifying perspectives from statistical learning‌ and forecasting.

Complementing this structural viewpoint, a second contribution introduces optimal transport–based‌ conformal prediction for multivariate‌ outputs. Using Monge–Kantorovich vector‌‌ ranks and quantiles, it constructs flexible (potentially non-convex)‌ prediction regions that better‌ reflect the geometry of‌‌ complex uncertainty patterns, while preserving finite-sample, distribution-free coverage.‌ Together, these works strengthen‌ the team’s capability to‌‌ deliver uncertainty sets that are both reliable and‌ informative in high-dimensional settings.‌

3.2 Learning with Structure,‌‌ Constraints, and Operational Dynamics.

A second major axis‌ designs learning methods that‌ incorporate constraints, domain structure,‌‌ and temporal/strategic dynamics, motivated by operational problems.

On‌ the applied side, the‌ team develops statistical models‌‌ for passenger movements within trains with communicating coaches,‌ using infra-red door sensors‌ to infer within-train flows‌‌ and improve coach-level occupancy estimation. The proposed family‌ of models—culminating in a‌ station-specific “local” modeling interpretable‌‌ as a recurrent neural architecture—yields both interpretable parameters‌ and a substantial forecasting‌ benefit (about 15% improvement‌‌ for alighting-count prediction), supporting the operational upgrade of‌ real-time crowding information in‌ the Greater Paris area.‌‌

On the methodological side, the team proposes unified‌ frameworks for time series‌ forecasting under linear constraints,‌‌ showing that constrained empirical risk minimization can be‌ solved exactly using only‌ linear algebra, enabling highly‌‌ scalable GPU implementations and strong performance on real‌ forecasting tasks (e.g., energy‌ demand and tourism). In‌‌ a related direction, the team addresses multi-class classification‌ under system-level constraints through‌ post-processing of randomized classifiers:‌‌ by formulating the problem as a constrained stochastic‌ program and using entropic‌ regularization with dual optimization,‌‌ the method enforces constraints such as fairness, abstention,‌ or churn without retraining,‌ while providing finite-sample guarantees.‌‌
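As an illustration of why linear constraints can be handled with linear algebra alone, here is a minimal sketch for the simplest case: least-squares risk with linear equality constraints, solved through its KKT linear system. This is a textbook construction, not the team's framework; the function name, the simulated data, and the "weights sum to one" constraint are all assumptions made for the example.

```python
import numpy as np

def constrained_least_squares(X, y, A, b):
    """Minimize ||X w - y||^2 subject to A w = b by solving the KKT linear system."""
    d = X.shape[1]
    m = A.shape[0]
    # KKT conditions: 2 X^T X w + A^T lam = 2 X^T y  and  A w = b.
    kkt = np.block([[2.0 * X.T @ X, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([2.0 * X.T @ y, b])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:d]  # drop the Lagrange multipliers

# Simulated data (hypothetical): three features, noisy linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=200)

# Hypothetical constraint: the weights must sum to one.
A = np.ones((1, 3))
b = np.array([1.0])

w_hat = constrained_least_squares(X, y, A, b)
```

The whole computation is a single dense linear solve, which is the kind of operation that maps directly onto GPU linear-algebra routines.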

This axis also includes learning in interactive settings:‌ work on prediction-aware learning‌ in multi-agent systems introduces‌‌ a framework where agents exploit forecasts of future‌ payoffs to improve performance‌ in time-varying games. The‌‌ proposed algorithm (POMWU) achieves convergence and welfare guarantees‌ close to static settings‌ when prediction errors are‌‌ controlled, refining classical regret analyses in dynamic environments.‌
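The report does not spell out POMWU, so the sketch below shows only the classical optimistic multiplicative weights update on which prediction-aware schemes of this kind are commonly built: each round, the player exponentiates past cumulative payoffs plus a forecast of the next payoff. The function name, the step size `eta`, and the toy payoffs are assumptions for illustration.

```python
import numpy as np

def optimistic_mwu(payoffs, predictions, eta=0.5):
    """Classical optimistic multiplicative weights (not POMWU itself).

    payoffs[t] and predictions[t] have shape (n_actions,); predictions[t] is a
    forecast of payoffs[t] that is available before the round-t strategy is chosen.
    Returns the sequence of mixed strategies, one row per round.
    """
    n_rounds, n_actions = payoffs.shape
    cumulative = np.zeros(n_actions)
    strategies = np.empty((n_rounds, n_actions))
    for t in range(n_rounds):
        logits = eta * (cumulative + predictions[t])
        logits -= logits.max()              # numerical stability
        weights = np.exp(logits)
        strategies[t] = weights / weights.sum()
        cumulative += payoffs[t]            # payoff revealed after playing
    return strategies

# Toy example (hypothetical): action 0 always pays 1, forecasts are exact.
payoffs = np.tile(np.array([1.0, 0.0, 0.0]), (20, 1))
strategies = optimistic_mwu(payoffs, predictions=payoffs)
```

With accurate forecasts the strategy concentrates on the best action one round earlier than plain multiplicative weights would, which is the intuition behind guarantees that improve as prediction errors shrink.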

Finally, the team contributes‌ to rare anomaly detection‌‌ through supervised active-learning frameworks that combine expert labeling‌ with both classifier-driven and‌ active-learning selection of candidates.‌‌ A distinctive aspect is to use the anomaly‌ scores from an ensemble‌ of unsupervised detectors as‌‌ features, generalizing aggregation methods and extending them to‌ ordered data such as‌ time series; the resulting‌‌ methodology is implemented in the open-source library acanag‌.

Across these contributions,‌ the overarching theme is‌‌ the design of learning procedures that are constraint-aware,‌ structure-exploiting, and deployable at‌ scale, while remaining anchored‌‌ in theoretical guarantees.

3.3 Optimization, Learning Dynamics, and‌ Reinforcement Learning Foundations.

A‌ third axis investigates how‌‌ optimization algorithms shape learned solutions, with particular attention‌ to overparameterized models and‌ sequential decision-making.

Three contributions‌‌ provide a detailed theoretical account of optimization dynamics‌ in two-layer ReLU networks,‌ identifying an early alignment‌‌ phase leading to sparse representations and showing that‌ this phenomenon can both‌ enable implicit compression and,‌‌ in some regimes, prevent interpolation even in large‌ networks. Building on these‌ dynamics, the team explains‌‌ a simplicity bias and‌ an optimization threshold: with enough data, training may‌ converge to non-interpolating solutions that nonetheless generalize optimally,‌ illuminating a principled transition from memorization to generalization.‌

In parallel, the team develops theoretical frameworks for‌ grokking, for instance establishing a two-stage limit behavior‌ (as weight decay vanishes): an initial phase resembling‌ unregularized gradient flow, followed by a slower phase‌ governed by a Riemannian norm-minimization flow along the‌ manifold of critical points. This program clarifies the‌ mechanism by which norm reduction can occur without‌ sacrificing training performance, yielding eventual generalization improvements.

In‌ reinforcement learning, the team revisits policy optimization for‌ adversarial MDPs, showing that policy improvement can be‌ framed as a generic reduction to adversarial learning‌ not only on Q-values but also on advantage‌ functions, and not limited to exponential weights. The‌ work provides convergence results for last iterates under‌ broad “monotone weight” strategies and transfers stronger regret‌ notions (e.g., strongly adaptive and tracking regret) into‌ the MDP setting. It also clarifies how these‌ reductions inform practical policy optimization when models are‌ unknown and value functions must be estimated.

Collectively,‌ this axis advances a principled understanding of the‌ interaction between optimization, regularization (explicit or implicit), and‌ generalization, and delivers tools for sequential decision-making with‌ stronger performance guarantees.

3.4 Generative Modeling: Score-Based Methods.‌

A fourth axis develops theory for score-based and‌ diffusion generative models, with concrete guidance for practice.‌ One contribution analyzes noise schedules in score-based generative‌ modeling, deriving explicit KL bounds (and improved Wasserstein‌ bounds under additional regularity) that quantify how schedule‌ choices impact learning accuracy. Another explains why memorization‌ is often limited in diffusion training: in denoising‌ score matching, the empirical optimum becomes irregular in‌ the small-noise regime, but sufficiently large learning rates‌ induce an implicit regularization that prevents stable convergence‌ to arbitrarily low-risk minima, thereby mitigating memorization. A‌ third contribution studies optimal stopping in latent diffusion‌ models, showing that deterioration in the final steps‌ can be intrinsic to dimensionality reduction: optimal stopping‌ depends systematically on latent dimension and interacts with‌ other training constraints.

Together, these results provide a‌ unified theoretical view of training hyperparameters and dynamics‌ in diffusion-type models, linking generalization, memorization, and sample‌ quality to principled quantitative criteria.

3.5 Computational Limits‌ and Statistical–Computational Gaps.

A core theoretical pillar of‌ the program studies the boundary between what is‌ statistically possible and what is computationally achievable.

Using‌ the low-degree polynomial framework, the team develops new‌ tools to derive computational lower bounds in latent‌ models, improving sharpness and simplifying proofs by better‌ leveraging latent structure; these are instantiated for clustering,‌ sparse clustering, and biclustering, with matching upper bounds‌ and accompanying statistical results. Extending beyond independence assumptions,‌ the team has introduced cumulant-based techniques for weakly‌ dependent structures such as permutations and sampling without‌ replacement, enabling evidence of statistical–computational gaps in permutation-based‌ tasks including feature matching and seriation.

The team also proposes a direct approach to low-degree lower bounds through almost orthonormal polynomial bases in random graph models, which both recovers known results and yields new lower bounds while identifying low-degree optimal polynomials, thereby informing algorithm design. Finally, the work on stochastic block models with many communities postulates and establishes a new threshold below Kesten–Stigum when $K \geq \sqrt{n}$, showing that optimal polynomial-time recovery may require motif-counting strategies beyond classical spectral methods in denser regimes.
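For reference, the Kesten–Stigum condition can be written explicitly in the symmetric stochastic block model. The notation is introduced here for illustration and is not taken from the report: $n$ nodes, $K$ balanced communities, connection probability $p$ within a community and $q$ between communities.

```latex
\frac{n\,(p - q)^{2}}{K\,\bigl(p + (K - 1)\,q\bigr)} \;>\; 1 .
```

Classical predictions state that polynomial-time community recovery is possible only when this inequality holds; the result described above shows that, once $K \geq \sqrt{n}$, recovery remains possible in polynomial time for some parameters that violate it.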

This axis strengthens the team’s leadership on fine-grained‌ complexity barriers in modern‌ inference and clarifies which‌‌ algorithmic paradigms are necessary to approach statistical limits.‌

3.6 Attention Mechanisms: Theory‌ for Transformers

The team‌‌ also puts forward rigorous theory for attention mechanisms‌ as computational and statistical‌ primitives. One contribution introduces‌‌ the single-location regression task and shows that a‌ simplified nonlinear self-attention predictor‌ can achieve asymptotic Bayes‌‌ optimality, despite non-convex training. Another proves that simplified‌ attention layers can perform‌ clustering in Gaussian mixtures,‌‌ including an “in-context quantization” phenomenon where even fixed‌ identity projections can extract‌ structure. A third contribution‌‌ provides a statistical-physics analysis explaining the advantage of‌ softmax attention over linear‌ attention: softmax achieves population‌‌ Bayes optimality and remains superior in finite-sample regimes,‌ offering principled insight into‌ why softmax is central‌‌ to large language models and how activations interact‌ with generalization.
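To fix ideas about the two layers being compared, here is a minimal NumPy sketch of a single attention head with either softmax or linear weights. It is a generic illustration, not the simplified models analyzed in the cited works; the weight matrices, the dimensions, and the $1/L$ scaling used for the linear variant are assumptions.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def attention(X, W_q, W_k, W_v, kind="softmax"):
    """Single-head self-attention on a sequence X of shape (L, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    if kind == "softmax":
        weights = softmax(scores)        # each row is a probability vector
    else:
        weights = scores / X.shape[0]    # linear attention: raw scores, scaled by 1/L
    return weights @ V

# Hypothetical sequence of 5 tokens in dimension 4, with random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))

out_softmax = attention(X, W_q, W_k, W_v, kind="softmax")
out_linear = attention(X, W_q, W_k, W_v, kind="linear")
```

With softmax, every output token is a convex combination of the value vectors, so a sharply peaked score row can select a single location; linear attention has no such normalization.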

These works collectively clarify the conditions under which attention architectures provably recover latent structure and sparse information, in some cases in asymptotic regimes.

3.7‌‌ Robust Statistical Inference, Multiple Testing, Reliability, and Model‌ Selection.

A final axis‌ addresses reliability in inference,‌‌ both through error control and through the selection‌ of appropriate models.

In multiple testing, the team proposes a fast algorithm to compute an entire curve of confidence bounds for the false discovery proportion along nested selection paths, leveraging forest-structured reference families and incremental updates to reduce computational cost to $\mathcal{O}(|\mathcal{K}| m)$. In network inference, robustness is tackled through SBM parameter estimation under misspecification, with error bounds extending beyond Erdős–Rényi settings and the proposal of SubSearch, a subgraph exploration procedure that both robustly estimates parameters and identifies outlying nodes responsible for departures from the SBM assumptions.

Robustness and structured modeling also appear in an applied multi-omics study of pink discoloration defects in bloomy cheeses, combining microbial profiling and metabolomics. By using Gaussian latent block model co-clustering to uncover associations between microbial communities and metabolites, and validating hypotheses through inoculation experiments, the study provides strong evidence for a microbial driver of the defect, illustrating the team’s capacity to apply modern statistical modeling to complex biological datasets.

Finally, the team contributes to model selection theory beyond the classical quadratic loss. One work studies penalized selection in the sequence model under sub-Gaussian noise for non-Euclidean losses (notably $\ell_p$ losses), deriving oracle inequalities via sub-Weibull concentration and establishing minimax rates over Besov bodies with applications to nonparametric regression. Another contribution revisits concentration tools in a basic Rademacher framework to illuminate cut-off phenomena in penalized model selection, linking ideas from concentration of product measures to statistical procedures in a conceptually streamlined setting.

3.8 Overall Positioning.

Across‌ these axes, Celeste’s 2025 program delivers a tightly‌ connected set of advances: reliable uncertainty quantification, constraint-aware‌ and structure-exploiting learning, theory-driven understanding of optimization and‌ generative modeling, and sharp characterizations of computational feasibility.‌ The year’s contributions combine foundational theory, scalable algorithmic‌ design, and demonstrated relevance to industrial and scientific‌ applications (transportation, energy, anomaly detection, and multi-omics), reinforcing‌ the team’s strategic positioning at the interface of‌ mathematical statistics and modern machine learning.

4 Application‌ domains

4.1 Electricity load consumption: forecasting and control‌

Celeste has a long-term collaboration with EDF R&D‌ on electricity consumption. An important problem is to‌ forecast consumption, e.g., for electric vehicles. We currently‌ work on hierarchical consumption data of electric vehicles,‌ for which we aim to output probabilistic forecasts,‌ e.g., through conformal inference methods.

4.2 Electricity production:‌ control

A new project started with EDF in 2025 aims to improve production control in nuclear plants, in particular by limiting effluents and by making production plans more reactive (as required by the increasing importance of renewable energy in the electricity mix).

4.3 Cytometry

Celeste collaborates with Metafora to explore the use of multiple instance learning in flow cytometry as a means of early detection of specific cancers. This collaboration involves Pascal Massart and Christine Keribin, in the context of Pierre-André Mikem's CIFRE PhD, which follows on from Louis Pujol's thesis defended in 2022.

4.4 Railway‌ operation

Following the CIFRE PhD of Rémi Coulaud, we continue our collaboration with SNCF–Transilien to exploit large datasets on railway operation and passenger flows, obtained by automatic recording devices (for passenger flows, these correspond to sensors at the door level). We model and forecast passenger movement inside train coaches so as to be able to provide incoming passengers with information on how crowded coaches are. We connect this problem to a neural network framework in order to improve performance. The next step is to take into account the behavior of passengers on platforms. This is part of a CIFRE PhD contract which started in 2025.

4.5 Anomaly detection in industrial time‌ series

Celeste works with IRT SystemX and IRT Saint Exupéry to create statistical and machine learning methods to detect rare anomalies in high-dimensional industrial time series.

4.6 Reliability

Data collected on the‌ lifetime of complex systems is often non-homogeneous, affected‌ by variability in component production and differences in‌ real-world system use. In general, this variability is‌ neither controlled nor observed in any way, but‌ must be taken into account in reliability analysis.‌ We use latent structure models to identify the‌ main causes of failure, and to predict system‌ reliability as accurately as possible.

4.7 Neglected tropical‌ diseases

Celeste collaborates with researchers at Institut Pasteur‌ on encephalitis in South-East Asia, especially with Jean-David‌ Pommier.

4.8 Explainability in change-point detection in high-dimensional multivariate time series

Detecting changes in time series is essential in‌ many areas, such as‌ identifying anomalies in industrial‌‌ processes, monitoring medical conditions, detecting variations in climatic‌ conditions, or analyzing fluctuations‌ in financial markets. Numerous‌‌ change-point detection approaches have been developed, both offline‌ and online, and applied‌ to univariate and multivariate‌‌ series. In the multivariate context, where the components‌ of the series can‌ represent the measurements of‌‌ thousands of sensors, an important question remains after‌ the change-point has been‌ estimated: which sensors are‌‌ specifically involved in the detected change? Dhia-Elhaq Ouerfelli's‌ PhD thesis develops post-hoc‌ methods to identify the‌‌ coordinates involved in a detected change and to‌ evaluate the quality of‌ this detection.

4.9 Education‌‌ sciences

In collaboration with the EST laboratory at Université Paris-Saclay, the Celeste team conducts educational science research focusing on the adaptation of first-year university students to higher education. The team investigates learning and adaptation processes by analyzing highly heterogeneous data, such as questionnaire responses and verbatim texts. The latent structures underlying these data are not directly observable. Methodologically, the research relies on statistical and machine learning approaches to uncover these latent structures. These approaches combine factor analysis, unsupervised clustering methods, and large language models for semantic representation and analysis. Thus, this research contributes to a data-driven, structure-aware understanding of student success and teaching practices.

4.10‌ Ancient materials

Celeste collaborates‌ with CNRS-IPANEMA (Ancient Materials‌‌ Research Platform). The goal is to propose a‌ new image segmentation method‌ based on a dissimilarity‌‌ which is particularly well adapted to XRF images.‌ This will allow less‌ exposure to radiation, which‌‌ is important when dealing with antiques.

5 Social‌ and environmental responsibility

5.1‌ Footprint of research activities‌‌

The carbon emissions of Celeste team members related‌ to their jobs were‌ very low and came‌‌ essentially from:

- limited levels of transport to and from work, and a small amount of travel, essentially by land, to conferences in France and Europe.
- electronic communication (email, Google searches, Zoom meetings, online seminars, LLM requests, etc.).
- the carbon emissions embedded in their personal computing devices (construction), either laptops or desktops.
- electricity for personal computing devices and for the workplace, plus water, heating, and maintenance for the latter. Note that only 7.1% (2018) of France's electricity is not sourced from nuclear energy or renewables, so team members' carbon emissions related to electricity are minimal.

In terms of magnitude, the largest per capita ongoing emissions (excluding flying) are likely to be simply those from buying computers, whose construction has a carbon footprint in the range of 100 kg CO2-e each. In contrast, typical email use per year is around 10 kg CO2-e per person, a Zoom call comes to around 10 g CO2-e per hour per person, and web browsing uses around 100 g CO2-e per hour. Consequently, 2025 was a low-carbon year for the Celeste team.

The approximate (rounded for simplicity) kg CO2-e values cited above come from the book “How Bad are Bananas” by Mike Berners-Lee (2020), which estimates carbon emissions in everyday life.

5.2 Impact of research results‌

In addition to the long-term impact of our‌ theoretical work—which is of course impossible to assess‌ immediately—we are involved in several applied research projects‌ which aim to have short/mid-term positive impacts on‌ society.

First, the broad use of artificial intelligence/machine‌ learning/statistics nowadays comes with several major ethical issues,‌ one being to avoid making unfair or discriminatory‌ decisions. Our theoretical work on algorithmic fairness has‌ already led to several “fair” algorithms that could‌ be widely used in the short term (one‌ of them is already used for enforcing fair‌ decision-making in student admissions at the University of‌ Genoa).

Second, Patrick Pamphile's collaboration with the EST‌ laboratory led him to join the SYREP (Synergie‌ Réussite Étudiante et Pédagogie) working group at Université‌ Paris-Saclay. There, research insights contribute to institutional strategies‌ aimed at improving student success and informing teaching‌ practices (see Section 4.9).

Third, we expect‌ short-term positive impact on society from our direct‌ collaborations with companies such as EDF (forecasting and‌ control of electricity load consumption for electric vehicles),‌ Metafora (early detection of cancers), and SNCF (better‌ forecasting the numbers of passengers in each coach‌ so as to guide boarding passengers to the‌ coaches with most space available).

Last, we collaborate‌ with biologists on neglected tropical diseases; encephalitis in‌ particular, with implications in global health strategies.

6‌ Highlights of the year

6.1 Awards

Margaux Zaffran‌ (postdoctoral researcher) received the following distinctions:
- Jacques Neveu‌ PhD Thesis Prize 2024 (awarded in 2025 for‌ a thesis defended in 2024),
- PhD Thesis Prize‌ in Mathematics, Industry, and Society 2025,
- Paul Caseau‌ PhD Thesis Prize 2025.

6.2 Grants

The Géné-Pi‌ project (PI: Claire Boyer; co-PIs: Gérard Biau, Francis‌ Bach, and Pierre Marion) was awarded PEPR-IA funding‌ for the amount of 850,000 euros.

6.3 Selected‌ publications

New computational barrier for stochastic block models (SBM) with many communities [34]. The cavity method from statistical physics predicts that community recovery in SBM is possible in polynomial time only above the Kesten-Stigum (KS) threshold. In collaboration with A. Carpentier (Potsdam University) and N. Verzelen (INRAE-Montpellier), C. Giraud has proven that this prediction breaks down in the many-communities regime. We have shown that community recovery is possible below the KS threshold by counting some specific blow-up motifs. In particular, the non-backtracking counts originating from message passing and Bethe free energy are sub-optimal in this case. By developing a new technique for proving low-degree lower bounds, we have also identified this new computational barrier for community recovery in SBM with many communities.

7 Latest software developments, platforms,‌ open data

7.1 Latest software developments

7.1.1 acanag‌

Keyword:
Anomaly detection
Functional Description:
The Python library acanag (Active Anomaly Detection) learns to detect anomalies in multidimensional data organized as bags, batches, or time series.
Contact:
Kevin Bleakley‌
Partners:
CNRS, IRT SystemX, IRT Saint Exupéry

7.1.2 sanssouci

Keyword:
Multiple testing‌
Functional Description:
In a multiple testing context, sanssouci provides statistical guarantees on possibly user-defined and/or data-driven sets of hypotheses. Typical use cases include differential gene expression studies in genomics and fMRI studies in neuroimaging. New contributions include overall optimization and documentation improvements, and, above all, the implementation of the new algorithms described in [11].
Contact:‌‌
Guillermo Durand
Partner:
Pierre Neuvial (CNRS, Université de‌ Toulouse)

7.1.3 KCPD

Name:‌
Kernel Change Point Detection‌‌
Keyword:
Change-point detection
Functional Description:
The library is based on the kernel change point detection methods described in Sylvain Arlot and co-authors (2012, 2017).
URL:‌
https://github.com/etaia/kernel-change-point-detection
Contact:
Kevin Bleakley‌
Partner:
IRT SystemX

7.2‌‌ Open data

8 New results

8.1 Uncertainty Quantification‌ and Conformal Prediction

8.1.1‌ Conformal prediction for hierarchical‌‌ data

Participants: Guillaume Principato, Gilles Stoltz,‌ Jean-Michel Poggi.

In collaboration with colleagues from EDF (Yvenn Amara-Ouali, Yannig Goude, and Bachir Hamrouche), we study in [45] conformal prediction for multivariate data and, more precisely, focus on hierarchical data, where some components are linear combinations of others. Intuitively, the hierarchical structure can be leveraged to reduce the size of prediction regions for the same coverage level. We implement this intuition by including a projection step (also called a reconciliation step) in the split conformal prediction (SCP) procedure, and prove that the resulting prediction regions are indeed globally smaller. We do so both under the classic goal of joint coverage, and under a new and challenging task: component-wise coverage, for which efficiency results are more difficult to obtain. The associated strategies and their analyses are based on both the SCP literature and the forecast reconciliation literature, which we connect. We also illustrate the theoretical findings, for different scales of hierarchies, on simulated data.
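The effect of a projection step can be illustrated with a toy sketch. It assumes a three-component hierarchy (total = left + right), simulated base forecasts, an orthogonal projection as the reconciliation step, and the Euclidean norm of the residual as the conformity score; these choices are made for the example and are not necessarily those of [45].

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Order statistic of rank ceil((n + 1)(1 - alpha)) used by split conformal prediction."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

# Hypothetical hierarchy: component 0 is the sum of components 1 and 2.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])                   # summing matrix
P = S @ np.linalg.solve(S.T @ S, S.T)        # orthogonal projection onto coherent vectors

rng = np.random.default_rng(0)
Y_cal = rng.normal(size=(300, 2)) @ S.T      # coherent calibration observations
F_cal = Y_cal + rng.normal(scale=0.5, size=Y_cal.shape)  # simulated, incoherent base forecasts
F_rec = F_cal @ P                            # reconciled forecasts (P is symmetric)

alpha = 0.1
q_raw = conformal_quantile(np.linalg.norm(Y_cal - F_cal, axis=1), alpha)
q_rec = conformal_quantile(np.linalg.norm(Y_cal - F_rec, axis=1), alpha)
# Prediction region for a new base forecast f: a Euclidean ball of radius q_raw
# around f, or of radius q_rec around the reconciled forecast P @ f.
```

Because the observations are coherent, projecting a forecast can only shrink its residual, so `q_rec` is never larger than `q_raw` in this toy setting: the reconciled ball is smaller at the same nominal coverage level $1 - \alpha$.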

8.1.2 Optimal transport-based conformal prediction‌

Participants: Claire Boyer.‌

This joint work 25‌‌ with Gauthier Thurin (ENS Paris) and Kimia Nadjahi‌ (ENS Paris) proposes a‌ novel conformal prediction framework‌‌ for multivariate outputs based on optimal transport. By‌ leveraging Monge–Kantorovich vector ranks‌ and quantiles, the method‌‌ constructs flexible, potentially non-convex prediction regions that better‌ capture the geometry of‌ complex uncertainty patterns, while‌‌ retaining finite-sample, distribution-free coverage guarantees.

8.2 Learning with‌ Structure, Constraints, and Dynamics‌

8.2.1 Modeling of passenger‌‌ movements in trains with communicating coaches

Participants: Christine‌ Keribin, Gilles Stoltz‌.

In collaboration with colleagues from SNCF, namely, Mélissa Baietto and Rémi Coulaud, we model in 6 passenger movements within communicating coaches equipped with infra-red sensors at each door, counting the numbers of passengers boarding and alighting at that door. The business objective is to better estimate the real occupancy rate of each coach instead of solely using boarding counts and discarding passenger movements. To do so, we propose models based on stochastic transition matrices, which are specific to each station in the most complex one. The latter, called local modeling, also has to estimate alighting counts, which it does through data-based alighting rates rather than with origin-destination matrices. This piece of the methodology is of independent interest. The local modeling may actually be seen as a neural network (a recurrent neural network with a many-to-many architecture featuring one hidden layer). All models are fit through least-squares minimization. We evaluate them both qualitatively and quantitatively, on data from line H of the suburban railway network of the Greater Paris area. The qualitative evaluation consists of successfully interpreting the outcomes of the models (transition matrices, alighting rates) based on the geographies of the platforms of the boarding or alighting stations. The quantitative evaluation consists of using the models constructed to forecast alighting counts: modeling passenger movements improves the forecasting performance by at least roughly $15\%$ compared to ignoring the existence of such movements. All in all, this study supports upgrading the passenger-movement modeling layer in the real-time crowding information deployed in the Greater Paris area from the global modeling currently used to local modeling.

8.2.2 Forecasting time series with constraints

Participants: Claire‌ Boyer.

The collaborative work 36 with colleagues‌ from EDF proposes a unified framework for time‌ series forecasting that systematically integrates linear constraints into‌ learning algorithms. The framework encompasses and combines existing‌ approaches such as generalized additive models and hierarchical‌ forecasting, and shows that the exact minimizer of‌ the constrained empirical risk can be computed efficiently‌ using only linear algebra operations. This formulation enables‌ highly scalable implementations optimized for GPU architectures. Extensive‌ empirical evaluations on real-world applications, including electricity demand‌ and tourism forecasting, demonstrate state-of-the-art performance of the‌ proposed approach.

8.2.3 Randomized multi-class classification under system‌ constraints: a unified approach via post-processing

Participants: Evgenii‌ Chzhen, Gayane Taturyan.

In collaboration with‌ M. Hebiri, in 35 we study the problem‌ of multi-class classification under system-level constraints expressible as‌ linear functionals over randomized classifiers. We propose a‌ post-processing approach that adjusts a given base classifier‌ to satisfy general constraints without retraining. Our method‌ formulates the problem as a linearly constrained stochastic‌ program over randomized classifiers, and leverages entropic regularization‌ and dual optimization techniques to construct a feasible‌ solution. We provide finite-sample guarantees for the risk‌ and constraint satisfaction for the final output of‌ our algorithm under minimal assumptions. The framework accommodates‌ a broad class of constraints, including fairness, abstention,‌ and churn requirements.

8.2.4 Prediction-aware learning in multi-agent‌ systems

Participants: Etienne Boursier.

The work 19‌ proposes a prediction-aware learning framework for uncoupled online‌ learning in time-varying multiplayer games, where agents exploit‌ forecasts of future payoffs to adapt their strategies.‌ While classical regret guarantees degrade rapidly in dynamic‌ environments, this approach explicitly incorporates prediction to obtain‌ tighter performance bounds when payoff variations are predictable.‌ We introduce POMWU, a contextual extension of the‌ Optimistic Multiplicative Weight Update algorithm, and show that,‌ under bounded prediction errors, it achieves convergence and‌ social welfare guarantees comparable to those in static‌ games, up to terms depending on the prediction quality.
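To make the role of predictions concrete, here is a minimal sketch of one common form of an optimistic multiplicative-weights step, in which the next mixed strategy is proportional to the exponential of cumulative past payoffs plus a forecast of the next payoff vector. This is not POMWU as defined in 19; the function name `optimistic_mwu` and its arguments are hypothetical. Taking the forecast equal to the last observed payoff vector recovers the usual optimistic update.

```python
import math

def optimistic_mwu(cumulative_payoffs, predicted_payoffs, eta):
    """One optimistic multiplicative-weights step over a finite action set.

    cumulative_payoffs[i]: sum of past payoffs of action i.
    predicted_payoffs[i]: forecast of the next payoff of action i (assumption:
    supplied by some external predictor).
    eta: learning rate.
    """
    logits = [eta * (c + p) for c, p in zip(cumulative_payoffs, predicted_payoffs)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

An accurate forecast tilts the strategy toward actions that are about to pay off before the payoff is actually observed, which is the intuition behind bounds that depend on prediction quality.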

8.2.5 Detecting rare‌ anomalies in multidimensional data‌ using active and supervised‌‌ learning

Participants: Kevin Bleakley.

Detecting rare anomalies in batches of multidimensional data is challenging. We have proposed an original supervised active-learning framework 7 that sends a small number of data points from each batch to an expert for labeling as ‘anomaly’ or ‘nominal’ via two mechanisms: (i) points most likely to be anomalies in the eyes of a supervised classifier trained on previously-labeled data; and (ii) points suggested by an active learner. Instead of training the supervised classifier directly on currently-labeled raw data, we treat the scores calculated by an ensemble of $M$ user-defined unsupervised anomaly detectors as if they were the learner’s input features. Our approach generalizes earlier attempts to linearly aggregate unsupervised anomaly detector scores, and broadens the scope of these methods from unordered bags of data to ordered data such as time series. Simulated and real data trials suggest that this method usually outperforms linear strategies, often significantly. The Python library acanag implements our proposed method. This 2025 work, in collaboration with Benjamin Auder (LMO Orsay), Martin Royer (IRT SystemX), and Mouhcine Mendhil (IRT Saint Exupéry), was subsequently published in TMLR in early 2026.

8.2.6 Physics-informed‌ kernel learning

Participants: Claire‌‌ Boyer.

The article 37 introduces physics-informed kernel‌ learning (PIKL), a principled‌ alternative to physics-informed neural‌‌ networks that integrates physical priors through a kernel-based‌ formulation. By approximating the‌ underlying kernel using Fourier‌‌ methods, the authors derive a tractable estimator that‌ minimizes a physics-informed risk‌ combining data fidelity and‌‌ PDE constraints. The framework comes with theoretical guarantees‌ that quantify the impact‌ of the physical prior‌‌ on convergence rates. Numerical experiments demonstrate that PIKL‌ can outperform physics-informed neural‌ networks in both accuracy‌‌ and computational efficiency, and in some settings even‌ surpass classical PDE solvers,‌ particularly in the presence‌‌ of noisy boundary conditions. This is a joint‌ work with Nathan Doumèche‌ (EDF & Sorbonne Université),‌‌ Francis Bach (Inria), and Gérard Biau (Sorbonne Université),‌ accepted for publication in‌ JMLR in 2025.

8.2.7‌‌ Fast kernel methods: Sobolev, physics-informed, and additive models‌

Participants: Claire Boyer.‌

The work 37 addresses the scalability limitations of kernel methods by introducing a GPU-accelerated framework for kernel regression with $O(n \log n)$ computational complexity. Leveraging Fourier representations of kernels together with non-uniform fast Fourier transforms (NUFFT), the proposed approach enables exact, fast, and memory-efficient computations at scale. The framework is instantiated for Sobolev kernel regression, physics-informed regression, and additive models, and the resulting estimators are shown, when applicable, to achieve minimax convergence rates consistent with classical kernel theory. Extensive experiments demonstrate the ability to process datasets with tens of billions of samples within minutes, combining strong statistical guarantees with unprecedented computational scalability.

8.3 Optimization, Learning Dynamics, and‌ Reinforcement Learning

8.3.1 Early‌ alignment in two-layer networks‌‌ training is a two-edged sword

Participants: Etienne Boursier‌.

The work 8‌ characterizes the early-stage optimization‌‌ dynamics of two-layer neural‌ networks with (leaky) ReLU activations. In a general‌ setting, it provides a precise description and quantitative‌ analysis of an early alignment phase, during which‌ neurons align with a small number of key‌ directions determined by the critical points of a‌ data-dependent function. Throughout this phase, the learned function‌ remains close to zero, while the representation becomes‌ increasingly sparse. This sparsification is typically preserved throughout‌ training and ultimately yields a final estimator that‌ is effectively equivalent to a much smaller network.‌ Building on this alignment phenomenon, we also present‌ an example with three data points showing that,‌ in the small-initialization regime, arbitrarily large overparameterized networks‌ may fail to interpolate the data. This result‌ highlights that the seminal convergence guarantees for infinitely‌ wide networks critically depend on the smoothness of‌ the activation function and do not extend to‌ networks with ReLU activations.

8.3.2 Simplicity bias and‌ optimization threshold in two-layer ReLU networks

Participants: Etienne‌ Boursier.

Building on the early alignment characterization‌ of 8, the work 17 shows that,‌ when sufficient data are available, trained two-layer ReLU‌ networks often converge to simpler solutions that do‌ not fully interpolate the training data yet generalize‌ better. In particular, for a specific linear data‌ model, we show that the trained network converges‌ to a solution that closely matches the least-squares‌ linear estimator, and is therefore optimal on unseen‌ data. This simple example illustrates the transition from‌ memorization to generalization—an effect observed in in-context learning‌ and diffusion model training—where, beyond a certain number‌ of training samples, the optimization dynamics fail to‌ reach an interpolating global minimum. Instead, they converge‌ to a spurious local minimum of the training‌ loss that nonetheless achieves minimal test error.

8.3.3‌ A theoretical framework for grokking: interpolation followed by‌ Riemannian norm minimisation

Participants: Etienne Boursier.

Grokking is a training phenomenon characterized by two distinct phases: an initial overfitting regime with near-zero training loss and high test loss, followed, after a long delay, by a generalization phase in which both training and test losses become small. The work 18 provides a rigorous and general characterization of the two-stage optimization dynamics underlying the grokking phenomenon. In overparameterized settings, the critical points of the training loss form manifolds. Under suitable smoothness assumptions, we establish a two-stage convergence of the parameter trajectory as the weight-decay parameter $λ \to 0$. During the first phase, the dynamics follow the unregularized gradient flow, which may lead to poor generalization, for instance in large-initialization regimes. In the second phase, occurring on a time scale of order $1 / λ$, the trajectory converges to a Riemannian flow that minimizes the parameter norm over the critical manifold of the training loss. This phase induces a decrease in parameter norm while preserving training performance, a mechanism typically associated with improved generalization and responsible for the emergence of grokking.
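The two phases described above can be written schematically as follows. This is a sketch in assumed notation, not the precise statement of 18: $F$ denotes the training loss, weight decay is taken to be the penalty $\frac{λ}{2}\|θ\|^2$, $\mathcal{M}$ is the manifold of critical points reached at the end of the first phase, and $\mathrm{P}_{T_θ\mathcal{M}}$ is the orthogonal projection onto its tangent space at $θ$.

```latex
% Gradient flow with weight decay \lambda on a training loss F (assumed notation)
\dot{\theta}_\lambda(t) = -\nabla F\bigl(\theta_\lambda(t)\bigr) - \lambda\,\theta_\lambda(t)

% Phase 1 (times t of order 1): as \lambda \to 0, the unregularized gradient flow
\dot{\theta}(t) = -\nabla F\bigl(\theta(t)\bigr)

% Phase 2 (rescaled time s = \lambda t): drift along the critical manifold \mathcal{M}
\frac{d\tilde{\theta}}{ds}(s)
  = -\,\mathrm{P}_{T_{\tilde{\theta}(s)}\mathcal{M}}\,\tilde{\theta}(s)
  = -\operatorname{grad}_{\mathcal{M}} \tfrac{1}{2}\bigl\|\tilde{\theta}(s)\bigr\|^{2}
```

In this reading, the slow phase leaves the training loss essentially unchanged, since the trajectory stays on $\mathcal{M}$, while the parameter norm decreases.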

8.3.4 Policy optimization via adversarial‌ learning on advantage functions

Participants: Chiara Mignacco, Gilles Stoltz.

In collaboration with Matthieu Jonckheere (LAAS–CNRS, Toulouse), we revisit in 14 the reduction of learning in adversarial Markov decision processes (MDPs) to adversarial learning based on Q-values; this reduction has been considered in a number of recent articles as one building block to perform policy optimization. Namely, we first consider and extend this reduction in an ideal setting where an oracle provides value functions: it may involve any adversarial learning strategy (not just exponential weights) and it may be based indifferently on Q-values or on advantage functions. We then present two extensions: first, convergence of the last iterate for a vast class of adversarial learning strategies (again, not just exponential weights), satisfying a property called monotonicity of weights; and second, stronger regret criteria for learning in MDPs, inherited from the stronger regret criteria of adversarial learning named strongly adaptive regret and tracking regret. Then, we demonstrate how adversarial learning, also referred to as aggregation of experts, relates to aggregation (orchestration) of expert policies: we obtain stronger forms of performance guarantees in this setting than existing ones, via yet another, simple reduction. Finally, we discuss the impact of the reduction of learning in adversarial MDPs to adversarial learning in practical scenarios where transition kernels are unknown and value functions must be learned. In particular, we review the literature and note that many strategies for policy optimization feature a policy-improvement step based on exponential weights with estimated Q-values. Our main message is that this step may be replaced by the application of any adversarial learning strategy on estimated Q-values or on estimated advantage functions.

The empirical‌‌ evaluation of this methodology, together with other twists,‌ is conducted in the‌ companion article 42.‌‌
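For readers unfamiliar with the policy-improvement step mentioned above, the following minimal sketch shows an exponential-weights update at a single state, fed with estimated Q-values or advantages. The function name `exponential_weights_step` and the dictionary-based representation are illustrative assumptions, not the implementation used in 14 or 42.

```python
import math

def exponential_weights_step(policy, values, eta):
    """Reweight a policy at one state by exp(eta * value) and renormalize.

    policy: dict mapping each action to its current probability.
    values: dict mapping each action to an estimated Q-value or advantage.
    eta: learning rate.
    """
    unnormalized = {a: p * math.exp(eta * values[a]) for a, p in policy.items()}
    total = sum(unnormalized.values())
    return {a: w / total for a, w in unnormalized.items()}
```

For this particular strategy, subtracting a state-dependent baseline from the Q-values (for instance to obtain advantages) leaves the updated policy unchanged, because the common factor cancels in the normalization. Other adversarial learning strategies need not share this invariance.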

8.4 Generative Models and Score-Based Methods

8.4.1 An‌ analysis of the noise‌ schedule for score-based generative‌‌ models

Participants: Claire Boyer.

In collaboration with‌ Stanislas Strasman (Sorbonne Université),‌ Antonio Ocello (ENSAE), Sylvain‌‌ Le Corff (Sorbonne Université), and Vincent Lemaire (Sorbonne‌ Université), the article 15‌ provides a theoretical analysis‌‌ of score-based generative models, deriving explicit upper bounds‌ on the Kullback–Leibler divergence‌ between the target and‌‌ learned distributions that depend on the noise schedule.‌ Under additional regularity assumptions,‌ we obtain improved Wasserstein‌‌ error bounds by exploiting contraction properties of the‌ underlying dynamics. These results‌ yield practical insights into‌‌ the choice of training hyperparameters, notably the noise‌ schedule, and are illustrated‌ through numerical experiments on‌‌ synthetic data and CIFAR-10, highlighting an optimal regime‌ within a parametric family‌ of schedules.

8.4.2 Taking‌‌ a big step: large learning rates in denoising‌ score matching prevent memorization‌

Participants: Claire Boyer.‌‌

The conference proceedings 27, conducted in collaboration‌ with Yu-Han Wu (Google‌ DeepMind), Pierre Marion (Inria)‌‌ and Gérard Biau (Sorbonne Université), investigate the origin‌ of memorization in diffusion-based‌ generative models and explain‌‌ why this is often limited in practice despite‌ the absence of explicit‌ regularization. Focusing on denoising‌‌ score matching, we show‌ that the empirical optimal score is highly irregular‌ in the small-noise regime and leads to memorization‌ of the training data. We then identify an‌ implicit regularization mechanism induced by sufficiently large learning‌ rates in stochastic gradient descent, proving that such‌ training dynamics prevent stable convergence toward arbitrarily low-risk‌ local minima. As a result, the learned score‌ cannot closely match the empirical optimum, thereby mitigating‌ memorization. The theoretical analysis, conducted in a simplified‌ one-dimensional setting with two-layer neural networks, is supported‌ by numerical experiments demonstrating the central role of‌ the learning rate in controlling memorization effects.

8.4.3‌ Optimal stopping in latent diffusion models

Participants: Claire‌ Boyer.

The collaborative work 46 with researchers‌ from Google, Sorbonne Université and Inria, investigates an‌ unexpected phenomenon in latent diffusion models (LDMs), namely‌ that the final steps of the diffusion process‌ can deteriorate sample quality. Going beyond standard numerical‌ arguments for early stopping, the authors show that‌ this effect is intrinsic to the dimensionality reduction‌ inherent in LDMs. Within a Gaussian setting with‌ linear autoencoders, they provide a theoretical characterization of‌ the interplay between latent dimension and optimal stopping‌ time, demonstrating that lower-dimensional latent representations benefit from‌ earlier stopping, while higher-dimensional ones require later termination.‌ The analysis further reveals interactions between latent dimensionality‌ and other key hyperparameters, such as constraints in‌ score matching. These findings are supported by experiments‌ on both synthetic and real datasets, establishing early‌ stopping as a critical hyperparameter for controlling generative‌ quality in LDMs.

8.5 Computational Limits and Statistical–Computational‌ Gaps

8.5.1 Computational lower bounds in latent models:‌ clustering, sparse-clustering, biclustering

Participants: Bertrand Even, Christophe‌ Giraud.

In collaboration with Bertrand Even and Nicolas Verzelen, we investigate in 39 computational lower bounds in latent models. In many high-dimensional problems, like sparse-PCA, planted clique, and clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational constraints. This observation has given rise to the conjecture of the existence, for some problems, of gaps, so-called statistical-computational gaps, between the best possible statistical performance achievable without computational constraints, and the best performance achievable with poly-time algorithms. A powerful approach to assess the best performance achievable in poly-time is to investigate the best performance achievable by low-degree polynomials. We build on the seminal paper of Schramm and Wein 52 and propose a new scheme to derive lower bounds on the performance of low-degree polynomials in some latent space models. By better leveraging the latent structures, we obtain new and sharper results, with simplified proofs. We then instantiate our scheme to provide computational lower bounds for the problems of clustering, sparse clustering, and biclustering. We also prove matching upper bounds and some additional statistical results, in order to provide a comprehensive description of the statistical-computational gaps occurring in these three problems.

8.5.2 Computational barriers for permutation-based problems, and‌ cumulants of weakly dependent random variables

Participants: Bertrand Even, Christophe Giraud‌.

In collaboration with‌ Bertrand Even and Nicolas‌‌ Verzelen, we investigate in 38 computational barriers for‌ permutation-based problems. In many‌ high-dimensional problems, polynomial-time algorithms‌‌ fall short of achieving the statistical limits attainable‌ without computational constraints. A‌ powerful approach to probe‌‌ the limits of polynomial-time algorithms is to study‌ the performance of low-degree‌ polynomials. Low-degree lower bounds‌‌ are tightly related to multivariate cumulants. Prior works‌ leverage independence among latent‌ variables to bound cumulants.‌‌ However, such approaches break down for problems with‌ latent structure lacking independence,‌ such as those involving‌‌ random permutations. To address this important restriction, we‌ develop a technique to‌ upper-bound cumulants under weak‌‌ dependencies—such as those arising from sampling without replacement‌ or random permutations. To‌ showcase the effectiveness of‌‌ our approach, we uncover evidence of statistical–computational gaps‌ in multiple feature matching‌ and in seriation problems.‌‌

8.5.3 Low-degree lower bounds via almost orthonormal bases‌

Participants: Simone Maria Giancola‌, Christophe Giraud.‌‌

In collaboration with Alexandra Carpentier and Nicolas Verzelen, S.M. Giancola and C. Giraud investigate in 33 low-degree lower bounds via almost orthonormal bases. Low-degree polynomials have emerged as a powerful paradigm for providing evidence of statistical-computational gaps across a variety of high-dimensional statistical models. For detection problems, where the goal is to test a planted distribution $ℙ'$ against a null distribution $ℙ$ with independent components, the standard approach is to bound the advantage using an $L^{2}(ℙ)$-orthonormal family of polynomials. However, this method breaks down for estimation tasks or more complex testing problems where $ℙ$ has some planted structure, so that no simple $L^{2}(ℙ)$-orthogonal polynomial family is available. To address this challenge, several technical workarounds have been proposed, though their implementation can be tricky.

In this work, we propose a more direct proof strategy. Focusing on random graph models, we construct a basis of polynomials that is almost orthonormal under $ℙ$, in precisely those regimes where statistical-computational gaps arise. This almost orthonormal basis not only yields a direct route to establishing low-degree lower bounds, but also allows us to explicitly identify the polynomials that optimize the low-degree criterion. This, in turn, provides insights into the design of optimal polynomial-time algorithms. We illustrate the effectiveness of our approach by recovering known low-degree lower bounds, and establishing new ones for problems such as hidden subcliques, stochastic block models, and seriation models.

8.5.4 Phase transitions for stochastic block models with more than $\sqrt{n}$ communities

Participants: Christophe‌ Giraud.

In collaboration with Alexandra Carpentier and Nicolas Verzelen, C. Giraud investigated in 34 the problem of community recovery in stochastic block models (SBM) with many communities. Predictions from statistical physics postulate that recovery of the communities in SBM is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold. Failure of low-degree polynomials below the KS threshold was also proven, as long as the number $K$ of communities remains smaller than $\sqrt{n}$, where $n$ is the number of nodes in the observed graph.

In this work, we postulate a new threshold below the KS threshold for $K \geq \sqrt{n}$, and we prove that:

1. For any graph density, low-degree polynomials fail to recover communities below the postulated threshold.
2. Community recovery is possible in polynomial time above the postulated threshold, essentially by counting occurrences of some specific motifs, based on the blow-up of a cycle.

In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the non-backtracking operator, is optimal only in the sparse regime. Other motif counts, unrelated to spectral properties, must be considered in denser regimes.

8.6 Attention Mechanisms

8.6.1 Attention layers provably solve‌ single-location regression

Participants: Claire Boyer.

The conference proceedings 20, conducted in collaboration with Pierre Marion (Inria), Gérard Biau (Sorbonne Université), and Raphaël Berthier (Inria), contributes to the theoretical understanding of attention-based models by analyzing their ability to recover sparse, token-level information and internal linear representations. We introduce the single-location regression task, in which only one token at a random and unknown location in a sequence determines the output. We propose a dedicated predictor, interpretable as a simplified non-linear self-attention mechanism, and establish its asymptotic Bayes optimality. Despite the non-convexity of the training problem, the analysis shows that the model successfully learns the underlying structure, providing theoretical insight into the effectiveness of attention mechanisms in settings with sparse and structured token dependencies.

8.6.2 Attention-based clustering

Participants: Claire‌ Boyer.

In collaboration with Rodrigo Maulen (Sorbonne‌ Université) and Pierre Marion (Inria), the conference proceedings‌ 22 provides a theoretical analysis of the ability‌ of transformer architectures to uncover latent structure in‌ data in an unsupervised manner. Focusing on data‌ generated from Gaussian mixture models, the authors show‌ that a simplified two-head attention layer can effectively‌ perform clustering: by minimizing a suitably defined population‌ risk using unlabeled data, the attention head parameters‌ provably align with the true mixture centroids. The‌ study further demonstrates that even an attention layer‌ with fixed key, query, and value matrices set‌ to the identity—thus involving no trainable parameters—can perform‌ in-context quantization. These results highlight the intrinsic capacity‌ of attention mechanisms to adapt to input-dependent distributions‌ and to capture underlying structural properties of the‌ data.

8.6.3 Statistical advantage of softmax attention: insights‌ from single location regression

Participants: Claire Boyer.‌

The work 28, conducted in collaboration with‌ colleagues from Inria Paris, ENS and EPFL, provides‌ a theoretical investigation of attention mechanisms in large‌ language models, with a particular focus on understanding‌ the role of the softmax activation. Through the‌ study of a single-location regression task, the authors‌ analyze attention-based predictors in a high-dimensional regime using tools from statistical physics.‌ They show that, at‌ the population level, softmax‌‌ attention achieves the Bayes-optimal risk, whereas linear attention‌ is intrinsically suboptimal. The‌ analysis further identifies key‌‌ properties of activation functions required for optimal performance.‌ In the finite-sample regime,‌ the authors derive an‌‌ asymptotic characterization of the test error, demonstrating that‌ although softmax is no‌ longer strictly Bayes-optimal, it‌‌ consistently outperforms linear attention. These results shed light‌ on the fundamental advantages‌ of softmax attention and‌‌ its connection to gradient-based optimization dynamics.

8.7 Robust Statistical Inference, Multiple Testing, Reliability, and Model Selection

8.7.1 Fast confidence bounds for the false discovery‌ proportion over a path‌ of hypotheses

Participants: Guillermo‌‌ Durand.

In the work 11, in a multiple testing context, we present a new algorithm (and an additional trick) that allows one to quickly compute an entire curve of confidence bounds for the false discovery proportion when the construction of the underlying bound $V_{ℜ}^{*}$ is based on a reference family $ℜ$ with a forest structure, as in 50. By an entire curve, we mean the values $V_{ℜ}^{*}(S_{1}), \dots, V_{ℜ}^{*}(S_{m})$ computed on a path of increasing selection sets $S_{1} ⊊ \dots ⊊ S_{m}$, $|S_{t}| = t$. The new algorithm leverages the fact that going from $S_{t}$ to $S_{t+1}$ is done by adding only one hypothesis. Compared to a more naive approach, the new algorithm has a complexity in $𝒪(|𝒦| m)$ instead of $𝒪(|𝒦|^{2})$, where $|𝒦|$ is the cardinality of the family.
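The benefit of adding one hypothesis at a time can be seen in a deliberately simplified setting. The sketch below assumes a reference family made of pairwise disjoint regions $R_k$, each with a bound $ζ_k$ on its number of false positives, in which case the bound on the number of false positives in $S$ reduces to $\sum_k \min(ζ_k, |S \cap R_k|) + |S \setminus \bigcup_k R_k|$. It is not the forest algorithm of 11: the function name `fdp_bound_curve`, the data layout, and the disjointness assumption are illustrative only.

```python
def fdp_bound_curve(path, regions, zetas):
    """Bounds on the number of false positives along a nested path of selections.

    path: hypotheses in the order they enter the selection, so S_t = path[:t].
    regions: list of pairwise disjoint sets of hypotheses (assumption).
    zetas: zetas[k] bounds the number of false positives inside regions[k].
    Returns one value per step; dividing entry t by t + 1 (the size of the
    selection at that step) gives a bound on the false discovery proportion.
    """
    region_of = {}
    for k, region in enumerate(regions):
        for h in region:
            region_of[h] = k
    counts = [0] * len(regions)  # current values of |S_t ∩ R_k|
    bound = 0
    curve = []
    for h in path:
        k = region_of.get(h)
        if k is None:
            # h lies outside every region: it can only be counted as a false positive.
            bound += 1
        else:
            counts[k] += 1
            if counts[k] <= zetas[k]:
                # The cap zeta_k of region k has not been reached yet.
                bound += 1
        curve.append(bound)
    return curve
```

Each step touches a single region, so the whole curve costs time linear in the path length once the lookup table is built, instead of recomputing the sum over all regions at every step.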

8.7.2 Robust estimation and‌‌ outlier detection for stochastic block models

Participants: Leonardo‌ Martins Bianco, Christine‌ Keribin.

In this joint work with Zacharie Naulet (INRAE-MaIAGE), we study robust estimation of graph clustering 21. We first prove a bound for the estimation error of stochastic block model (SBM) parameters which generalizes the bound appearing in Acharya et al. 49 for Erdős–Rényi graphs to the case of graphs with multiple communities. Interpreting this bound, we then propose SubSearch, an algorithm for robustly estimating SBM parameters by exploring the space of subgraphs in search of one that closely aligns with the model's assumptions. Our approach also functions as an outlier detection method, identifying nodes responsible for the graph's deviation from the model and going beyond simple techniques like pruning high-degree nodes. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method.

8.7.3 Pink discoloration‌‌ defects associated with microbial structure and metabolome changes‌ in commercial bloomy cheeses‌

Participants: Christine Keribin.‌‌

In this joint work 13 with F. Irlinger, S. Helinck and A.-S. Sarthou (Paris-Saclay Food and Bioproduct Engineering) and B. Laroche (INRAE MaIAGE), we investigate pink discoloration defects in French bloomy rind soft cheeses, which can negatively affect product appearance and lead to economic losses. Two batches of cheese from the same processing plant were analyzed: one with pink defects and one without, allowing for comparative analysis. A multi-omics approach was applied combining microbial profiling (16S rRNA and ITS2 sequencing) and metabolomics (GC-MS and LC-MS) to identify the factors linked to the defect. We performed a Gaussian latent block model (LBM) co-clustering (Nadif and Govaert 51) in order to detect associations between groups of OTUs and groups of metabolites. Based on the LBM results, a Multiblock sPLS-DA analysis was run to determine if the observed associations were also related to the spoilage status. We found interesting correlations, notably with P. gangotriensis, which had never previously been detected in cheese. Its role was tested with a dedicated inoculation experiment. The results strongly suggest that P. gangotriensis is responsible for the pink defect.

8.7.4 Model selection

Participants: Pascal‌ Massart, Vincent Rivoirard.

In this joint work 40 with Claire Lacour, we addressed the problem of model selection in the sequence model $Y = θ + ξ$, when $ξ$ is sub-Gaussian, for non-Euclidean loss functions. In this model, the penalized comparison to overfitting procedure was studied for the $ℓ_{p}$-loss, $p \geq 1$. Several oracle inequalities were derived from concentration inequalities for sub-Weibull variables. Using judicious collections of models and penalty terms, minimax rates of convergence were stated for Besov bodies $ℬ_{r, \infty}^{s}$. These results were applied to the functional model of nonparametric regression.

8.7.5‌ Concentration inequalities and cut-off phenomena for penalized model‌ selection within a basic Rademacher framework

Participants: Pascal‌ Massart, Vincent Rivoirard.

The work 41‌ was conceived as a tribute to Patrick Cattiaux.‌ One of the authors has known Patrick Cattiaux‌ for many years and is deeply indebted to‌ him. If one wished to illustrate the adage‌ that life is shaped by chance encounters, what‌ better example could there be than the meeting,‌ in the 1980s, of two young people who‌ both fell in love with the mathematics of‌ randomness—one of whom profoundly changed the other’s life‌ by sharing a simple but decisive secret: if‌ you truly believe in it, a passion can‌ become a profession. By another fortunate coincidence, this‌ tribute appeared at a particularly fitting moment, as‌ Michel Talagrand has just been awarded the Abel‌ Prize. The temptation to pay a double homage‌ was therefore irresistible. Following one of the many‌ paths opened by mathematics, we first established a‌ connection between the work of Patrick Cattiaux and‌ that of Michel Talagrand. We then showed how‌ the abstract probabilistic tools related to the concentration‌ of product measures, revisited in this light, can‌ be used to illuminate cut-off phenomena in our‌ own field of expertise, namely mathematical statistics. There‌ is nothing revolutionary here: the influence of Talagrand’s‌ work on the development of mathematical statistics since the late 1990s is‌ well known. Our contribution‌ rather lies in the‌‌ choice of a very simple framework, allowing the‌ ideas to be presented‌ with minimal technicalities and‌‌ letting the main concepts stand out clearly.

9‌ Bilateral contracts and grants‌ with industry

Participants: Christine‌‌ Keribin, Jean-Michel Poggi, Gilles Stoltz,‌ Claire Boyer.

9.1‌ Bilateral contracts with industry‌‌

C. Keribin: Ongoing Cifre PhD contract with Metafora (30 k€) on machine learning in flow cytometry for early detection of cancers, started in March 2023.
C. Keribin: Ongoing Cifre PhD contract with SNCF (54 k€, to be equally shared between LMO and UGE/Grettia) on modeling/forecasting/managing passenger positioning on platforms and on board trains in densely populated areas, started in January 2025.
J.-M. Poggi: Analysis and modelling of NO2 numerical model biases for data fusion of heterogeneous measurement networks, ATMO NORMANDIE, 20 k€; started in December 2022, ended in December 2025.
J.-M. Poggi, G. Stoltz: Participation in the EDF-Inria Grand défi, including a CIFRE PhD that started in December 2023 and a postdoc that started in February 2025.
G. Stoltz: CIFRE PhD contract with EDF (55 k€) on reinforcement learning for optimizing the production of nuclear plants; started in autumn 2025.
C. Boyer: PhD‌‌ contract with Google DeepMind on diffusion-based generative models;‌ started in January 2025.‌

10 Partnerships and cooperations‌‌

10.1 International research visitors

10.1.1 Visits to international‌ teams

Research stays abroad‌

Claire Boyer

Visited institution:‌‌ IPAM, UCLA
Country: USA
Dates: March-April 2025
Context‌ of the visit: Thematic‌ program on optimal transport‌‌
Mobility program/type of mobility: Research stay

Claire Boyer‌

Visited institution: CRM, Montreal‌
Country: Canada
Dates: May-June‌‌ 2025
Context of the visit: Spring school and‌ thematic program on mathematics‌ of data science
Mobility‌‌ program/type of mobility: Research stay

10.2 National initiatives‌

Participants: Sylvain Arlot,‌ Evgenii Chzhen, Christophe‌‌ Giraud, Gilles Stoltz.

10.2.1 ANR

Sylvain Arlot, Evgenii Chzhen, Luca Ganassali, Christophe Giraud and Gilles Stoltz are part of the PEPR-IA grant CAUSALI-T-AI (CAUSALIty Teams up with Artificial Intelligence), led by Marianne Clausel (Univ. de Lorraine), for the period 2023-2028.

Sylvain Arlot, Christophe Giraud‌‌ and Guillermo Durand are part of the ANR‌ Chair-IA grant Biscotte,‌ which is led by‌‌ Gilles Blanchard (Université Paris Saclay), for the period‌ 2019-2026.

Guillermo Durand is‌ part of the ANR‌‌ BACKUP: BAyesian nonparametrics, Complex models and Kernels,‌ Uncertainty quantification and deeP‌ methods, with Sorbonne Université‌‌ and Université de Toulouse. Period: 2023-2028. See here‌.

Christophe Giraud and‌ Guillermo Durand are part‌‌ of ANR ASCAI: Active and batch segmentation,‌ clustering, and seriation: toward‌ unified foundations in AI,‌‌ with Potsdam University, Munich University, Montpellier INRAE (Period‌ 2022-2026). See here.‌

11 Dissemination

11.1 Promoting‌‌ scientific activities

11.1.1 Scientific events: organisation

General chair,‌ scientific chair

J.-M. Poggi‌ is Past-President of ENBIS‌‌ (European Network for Business and Industrial Statistics)
C.‌ Keribin is Vice-President of‌ the French Statistical Society‌‌ (SFdS); member of the‌ board of MALIA, SFdS specialized group in Machine‌ Learning and AI.
V. Rivoirard is a member‌ of the Scientific Council of CIRM.

Member of‌ the organizing committees

S. Arlot is a member of the scientific committee of the Séminaire Palaisien
S. Arlot, E. Chzhen, C. Keribin, V. Rivoirard are part of the organizing committee of the Celeste conference to be held in 2026 at CIRM
A. Janon is co-organizer of the UQSay seminar
E. Chzhen is co-organizer of the DATAIA seminar‌
J.-M. Poggi was chair of the ENBIS Nominations‌ Committee 2025
J.-M. Poggi was organizer of the‌ ECAS-SFdS course 2025: Towards Reliable Machine Learning: Transfer‌ & Physics Informed Learning, and Conformal Prediction, Fréjus,‌ France, December 1-5, 2025
J.-M. Poggi was organizer‌ of the ECAS-ENBIS course: Statistical Process Monitoring of‌ Functional Data, Piraeus, Greece, September 14, 2025
C.‌ Keribin was co-organizer of the AI4Maths workshop (November‌ 18, 2025)
C. Keribin was co-organizer of the‌ workshop Learning (and) statistics with Talagrand (January 7-9,‌ 2026)
C. Giraud is co-organizer of the biennial conference StatMathAppli in Fréjus
C. Giraud is co-organizer‌ of the ASCAI final meeting in Orsay (June‌ 2025)
G. Durand and V. Rivoirard are co-organizers‌ of the Séminaire Parisien de Statistique

11.1.2 Scientific‌ events: selection

Member of the conference program committees‌

C. Giraud, Area chair for COLT since 2021‌
C. Boyer, Member of the scientific committee of NeurIPS in Paris 2025
J.-M. Poggi, Member of the‌ Scientific Program Committee of the ENBIS-25 Conference

11.1.3‌ Journal

Reviewer

We performed many reviews for various‌ international journals.

Member of the editorial boards

S.‌ Arlot: Associate editor for Annales de l'Institut Henri‌ Poincaré B – Probability and Statistics
C. Boyer:‌ Associate editor for Electronic Journal of Statistics
C.‌ Boyer: Associate editor for Information & Inference, Oxford‌
C. Boyer: Associate editor for Journal of the Royal Statistical Society, Series B (JRSS-B)
C. Giraud: Action‌ Editor for JMLR
C. Giraud: Associate Editor for‌ JEMS
C. Giraud: Associate Editor for ESAIM-proc
P.‌ Massart: Associate Editor for Panoramas et Synthèses (SMF),‌ Foundations and Trends in Machine Learning, and Confluentes‌ Mathematici
J.-M. Poggi: Associate Editor for Advances in‌ Data Analysis and Classification
J.-M. Poggi: Associate Editor for JDSSV (J. Data Science, Statistics and Visualization)
J.-M. Poggi: Guest editor for the Springer-Nature book “Methodological and Applied Statistics and Demography”, which collects selected papers of the SIS 2024 conference.
G.‌ Stoltz: Associate editor for Mathematics of Operations Research‌
C. Keribin: Member of the editorial board, Statistique‌ et Société (SFdS).
V. Rivoirard: Associate editor for‌ Annales de l’IHP (B), ALEA, Bernoulli‌ and Stochastic Processes and their Applications

Other reviewing‌ activities

We performed many reviews for various top‌ ML conferences.
G. Stoltz, Top reviewer distinction for‌ ICML 2025

11.1.4 Invited talks

B. Even, ASCAI,‌ Orsay, June 2025
C. Boyer, Data science seminar,‌ Oxford Mathematical Institute (UK), February 2025
C. Boyer,‌ Seminar, Halicioglu Data Science Institute, San Diego, March‌ 2025
C. Boyer, Summer School CRM Montreal, May 2025
C. Boyer, Summer‌ School EDF-INRIA, June 2025‌
C. Boyer, LMS-Bath Symposia on Inverse Problems and Artificial Intelligence in Medicine, Bath (UK), June 2025
C. Boyer, 1W-MINDS seminar,‌‌ September 2025
C. Boyer, Colloquium, University of Vienna,‌ October 2025
C. Boyer, Lecture on generative models, Université d'Aix-Marseille, November 2025
C. Giraud, NITMB, Chicago,‌ April 2025
C. Giraud, Institute for Mathematical Sciences, Singapore, May 2025
C. Giraud, ENSAE, November 2025
C.‌ Giraud, LSE, London, December‌ 2025
E. Boursier, Inria Lille, March 2025
E. Boursier, 10e journée statistique, IHES, Bures-sur-Yvette, April 2025
E. Boursier, Inria Grenoble, March 2025
G. Durand, CBIO team seminar, École‌ des Mines de Paris,‌ March 2025
G. Durand,‌‌ Séminaire de modélisation mathématique en sciences de la‌ vie et santé, Univ‌ Paris-Cité, November 2025
R.‌‌ Périer, BACKUP meeting, Paris, June 2025
R. Périer,‌ ASCAI meeting, Orsay, June‌ 2025
R. Périer, Séminaire‌‌ parisien de statistique, Paris, September 2025
J.-M. Poggi,‌ JDSSV Special Invited Paper‌ Session, ISI World Statistics‌‌ Congress, The Hague, the Netherlands, October 2025
H. Cui, Séminaire parisien de statistique, Paris, November 2025
H. Cui, ENS Paris, November 2025
C. Keribin, CFE-CMStatistics, London, December 2025

11.1.5 Research administration

S.‌‌ Arlot is a member of the council of‌ the Computer Science Graduate‌ School (GS ISN) of‌‌ University Paris-Saclay.
S. Arlot is a member of‌ the council of the‌ Computer Science Doctoral School‌‌ (ED STIC) of University Paris-Saclay.
C. Boyer is‌ a member of the‌ scientific committee of the‌‌ PGMO (Programme Gaspard Monge pour l'Optimisation) program.
C.‌ Boyer is an elected‌ member of the liaison‌‌ committee of SMAI-MODE group.
C. Giraud is a‌ member of the Scientific‌ Committee of labex IRMIA+,‌‌ Strasbourg.
C. Giraud is deputy director of the‌ Mathematics Graduate School of‌ University Paris-Saclay.
C. Giraud‌‌ is in charge of the whole Masters program‌ in mathematics for University‌ Paris-Saclay.
C. Giraud is‌‌ a member of the local Scientific Committee of‌ Institut Pascal.
C. Giraud‌ is a member of‌‌ the council of the Mathematics Doctoral School (EDMH)‌ of Université Paris-Saclay.
C. Keribin is a member of the board of the Computer Science Doctoral School (ED MSTIC) of Paris-Est Sup.
C. Keribin is deputy director of Laboratoire de mathématiques d'Orsay, and director since January 1, 2026.
C. Keribin is in charge of the M2 Maths & IA master's program of the mathematics graduate school.
P. Massart‌ is director of the‌‌ Fondation Mathématique Jacques Hadamard.

11.1.6 Service to‌ the academic community

K. Bleakley: maintains the English version of the LMO's website dedicated to research activities
E. Boursier:‌ member of Inria Saclay‌‌ scientific committee
E. Chzhen: member of Bibliothèque Jacques‌ Hadamard scientific committee
C.‌ Giraud: coordinator of computing‌‌ resources at the Institut Mathématiques d'Orsay (10 engineers)‌
C. Giraud: senior member‌ of CCUPS (Commission Consultative‌‌ Université Paris Saclay)
G. Durand: member of CCUPS
G. Durand: member of the Teaching Council of the maths department of Orsay
C. Giraud: recruiting committee for Data-AI associate professor positions
C. Keribin is co-president‌ of the scholarship allocation committee MixtAI of the‌ SaclAI school.
C. Keribin is a member of the committee for awarding the Sophie Germain excellence scholarships (FMJH)
C. Keribin: member of the follow-up committee‌ for PhD student Sara Madad (UTT)
C. Keribin:‌ member of the follow-up committee for PhD student‌ Anderson Augusma (Laboratoire d'informatique de Grenoble)
C. Keribin:‌ member of the follow-up committee for PhD student‌ Augustin Pion (Laboratoire des Signaux et Systèmes, CentraleSupelec)‌
C. Keribin: member of the follow-up committee for‌ PhD student Lucie Arts (LPSM)
C. Keribin: member‌ of the follow-up committee for PhD student Samy‌ Vilhes (Insa Rouen)
G. Durand: member of the‌ follow-up committee for PhD student Nicola De Simone‌ (CEA Grenoble)

11.2 Teaching - Supervision - Juries‌

11.2.1 Teaching

Most of the team members (especially‌ Professors, Associate Professors and Ph.D. students) teach several‌ courses at University Paris-Saclay, as part of their‌ teaching duty. We mention below some of the‌ classes in which we teach.

Masters: S. Arlot,‌ Statistical learning and resampling, 30h, M2, Université Paris-Saclay‌
Masters: S. Arlot, Preparation for French mathematics agrégation‌ (statistics), 25h, M2, Université Paris-Saclay
Masters: C. Boyer,‌ Refresher courses in statistics, 15h, M2, Université Paris-Saclay‌
Masters: C. Boyer, Optimization meets generalization, mathematics of‌ neural networks, 20h, M2, Université Paris-Saclay
Masters: C.‌ Boyer, Guidelines in Machine Learning, 20h, M2, Université‌ Paris-Saclay
Masters: C. Giraud, High-Dimensional Probability and Statistics,‌ 45h, M2, Université Paris-Saclay
Masters: C. Giraud, Mathematics‌ for AI, 75h, M1, Université Paris-Saclay
Masters: C. Keribin, Unsupervised and supervised learning, M1, 42h, Université Paris-Saclay/ENSTA
Masters: C. Keribin, Unsupervised learning, M1, 15h,‌ Université Paris-Saclay
Masters: C. Keribin, Advanced Unsupervised Learning,‌ M2, 24h, Université Paris-Saclay
Masters: C. Keribin, Internship‌ supervision for M2-Maths & IA, Université Paris-Saclay
Masters:‌ G. Durand, Deep Learning Project, 16h, M1-Maths &‌ AI, Université Paris-Saclay
Masters: G. Durand, Internship supervision,‌ M1-Maths & AI, Université Paris-Saclay
Masters: G. Durand, Statistics prerequisites, 6h, M1-Maths & AI, Université Paris-Saclay
Licence: G. Durand, Statistical inference, 30h, L3‌ mathématiques, Université Paris-Saclay
Licence: G. Durand, Multivariate data‌ analysis, 18h, L3 mathématiques, Université Paris-Saclay
Licence: G.‌ Durand, Statistical tests for biology, 38h, L2 biology,‌ Université Paris-Saclay
Licence: G. Durand, Probability and statistics,‌ 24h, L2 mathématiques et sciences du vivant, Université‌ Paris-Saclay
Masters: G. Durand, Mathematical statistics, 18h, mastères‌ spécialisées, ENSAE
Licence/Masters: E. Chzhen, PCC Polytechnique
Masters:‌ E. Chzhen, Statistical Theory of Algorithmic Fairness, 20h,‌ M2 Université Paris-Saclay
Masters: E. Boursier, Sequential Learning,‌ 24h, M2 Université Paris-Saclay

11.2.2 Supervision

PhD defenses‌

2025-09-16: Daniil Tiapkin, Sample-efficient reinforcement learning: exploration, imitation,‌ and online learning, started October 2023, co-advised by‌ G. Stoltz and E. Moulines (Polytechnique).
2025-12-04: Leo Martins Bianco, Outliers and Hallucinations: Contributions to Robust Community Detection and Language Model Alignment, started October 2022, co-advised by C. Keribin, Z. Naulet (AgroParisTech) and J. Hoffmann (Google DeepMind).
2025-12-12: Chiara Mignacco,‌ A mathematical study of policy orchestration for reinforcement‌ learning, started October 2022, co-advised by G. Stoltz‌ and M. Jonckheere (LAAS–CNRS, Toulouse).

Current PhD students

PhD in progress: Gayane‌ Taturyan, Fairness and Robustness‌ in Machine Learning, started‌‌ Nov. 2021, co-advised by E. Chzhen, J.-M. Loubes‌ (Univ. Toulouse Paul Sabatier)‌ and M. Hebiri (Univ.‌‌ Gustave Eiffel)
PhD in progress: Samy Clementz, Data-driven‌ Early Stopping Rules for‌ saving computation resources in‌‌ AI, started Sept. 2021, co-advised by S. Arlot‌ and A. Celisse
PhD in progress: Aymeric Capitaine, Incentivizing Federated and Decentralized Learning, started September 2023, co-advised by E. Boursier, M. Jordan (Inria Paris) and A. Durmus (Polytechnique)
PhD in progress: Antoine Scheid, Multi-agent bandits and Markovian games, started September 2023, co-advised by E. Boursier, M. Jordan (Inria Paris) and A. Durmus (Polytechnique)
PhD in progress: Guillaume Principato, Hierarchical conformal prediction for smart electric vehicle charging, started December 2023, co-advised by J.-M. Poggi and G. Stoltz, as well as Y. Amara-Ouali, Y. Goude, B. Hamrouche (EDF)
PhD in‌‌ progress: Pierre-André Mikem, Multiple instance learning for the‌ detection of tumor cells,‌ started March 2023, co-advised‌‌ by C. Keribin and P. Massart (Univ. Paris-Saclay).‌ Cifre contract with Metafora‌
PhD in progress: Romain‌‌ Périer, Développement de nouvelles méthodes post hoc pour‌ données structurées, started October‌ 2023, co-advised by G.‌‌ Durand and Gilles Blanchard (Univ Paris-Saclay)
PhD in‌ progress: Bertrand Even, Compromis‌ Statistique-Computationnel et équité en‌‌ apprentissage non-supervisé, started September 2024, co-advised by C.‌ Giraud and N. Verzelen‌ (INRAE)
PhD in progress:‌‌ Victor Turmel, Repeated Games and Sequential Learning: Towards‌ Fair and Efficient Algorithms,‌ started October 2024, co-advised‌‌ by G. Stoltz and E. Boursier
PhD in‌ progress: Dhia-Elhaq Ouerfelli, Change-point‌ detection and explainability of‌‌ high-dimensional time series, started October 2024, co-advised by‌ S. Arlot, K. Bleakley,‌ and P. Pamphile
PhD‌‌ in progress: Justine Lebrun, Modeling / forecasting /‌ managing the passenger positioning‌ on platforms and on‌‌ board trains in densely populated areas, started January‌ 2025, co-advised by C.‌ Keribin and E. Come‌‌ (UGE/Grettia). Cifre contract with SNCF
PhD in progress:‌ Simone Maria Giancola, Computational‌ barriers for modern learning‌‌ problems, started November 2025, co-advised by C. Giraud‌ and N. Verzelen (INRAE)‌
PhD in progress: Hanqi‌‌ Sun, Causal inference through multi-group learning, started September‌ 2025, co-advised by E.‌ Chzhen, L. Ganassali, G.‌‌ Stoltz
PhD in progress: Timothée Vinçon, Optimization of‌ the control of a‌ nuclear reactor by reinforcement‌‌ learning techniques, advised by G. Stoltz, as well‌ as by G. Simonini‌ (EDF)

11.2.3 Juries

We‌‌ participated in many PhD committees (too many to‌ keep an exact record),‌ at University Paris-Saclay as‌‌ well as at other universities, and we refereed‌ several of these PhDs.‌

11.3 Popularization

11.3.1 Education‌‌

Christophe Giraud produces educational videos on his YouTube‌ channel “High-dimensional probability and‌ statistics”: see here.‌‌
Gilles Stoltz ran a MATh.en.JEANS workshop in 2024-25 at Lycée Douanier Rousseau in Laval.

11.3.2 Interventions‌‌

A perspective on statistics and the summit for‌ action on AI. See‌ here.

12 Scientific‌‌ production

12.1 Major publications

1 [article] Etienne Boursier and Nicolas Flammarion. Early alignment in two-layer networks training is a two-edged sword. Journal of Machine Learning Research, July 2025. HAL
2 [misc] Alexandra Carpentier, Christophe Giraud and Nicolas Verzelen. Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities. September 2025. HAL
3 [article] Nathan Doumèche, Gérard Biau and Claire Boyer. On the convergence of PINNs. Bernoulli, Vol. 31, 2025, pp. 2127-2151. HAL
4 [inproceedings] Pierre Marion, Raphaël Berthier, Gérard Biau and Claire Boyer. Attention layers provably solve single-location regression. Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, Singapore, February 2025. HAL

12.2 Publications‌ of the year

International journals

5 [article] Fahd Al Qureshah, Jérémie Le Pen, Nicole de Weerd, Marcela Moncada-Velez, Marie Materna, Daniel Lin, Baptiste Milisavljevic, Fernanda Vianna, Lucy Bizien, Lazaro Lorenzo, Marc Lecuit, Jean-David Pommier, Sevgi Keles, Tayfun Ozcelik, Sigifredo Pedraza-Sanchez, Nicolas de Prost, Loubna El Zein, Hassan Hammoud, Lisa F.P. Ng, Rabih Halwani, Narjes Saheb Sharif-Askari, Yu Lung Lau, Anthony Tam, Neha Singh, Sagar Bhattad, Yackov Berkun, Wasun Chantratita, Raúl Aguilar-López, Mohammad Shahrooei, Laurent Abel, Alessandro Aiuti, Saleh Al-Muhsen, Ana Bertha Alcántara-Garduño, Evangelos Andreakos, Andrés Arias, Hagit Baris Feldman, Paul Bastard, Alexandre Bolze, Alessandro Borghesi, Ahmed Bousfiha, Petter Brodin, John Christodoulou, Aurélie Cobat, Roger Colobran, Antonio Condino-Neto, Sotirija Duvlis, Xavier Duval, Munis Dündar, Soha Fakhreddine, Jacques Fellay, Carlos Flores, José Luis Franco, Guy Gorochov, Peter Gregersen, David Hagin, Rabih Halwani, María Teresa Herrera, Ivan Fan-Ngai Hung, Emmanuelle Jouanguy, Yu-Lung Lau, Daniel Leung, Tom Le-Voyer, Davood Mansouri, Jesús Mercado-García, Isabelle Meyts, Trine Mogensen, Lisa F.P. Ng, Antonio Novelli, Giuseppe Novelli, Satoshi Okada, Firat Ozcelik, Tayfun Ozcelik, Rebeca Perez de Diego, Jordi Perez-Tur, Graziano Pesole, Anne Puel, Laurent Renia, Igor Resnick, Carlos Rodríguez-Gallego, Manal Sbeity, Sahar Sedighzadeh, Mohammad Shahrooei, Pere Soler-Palacín, András Spaan, Stuart Tangye, Ahmad Abou Tayoun, Şehime Gülsün Temel, Christian Thorball, Ibrahim Torktaz, Sophie Trouillet-Assant, Stuart Turvey, Furkan Uddin, Fernanda Sales Luiz Vianna, Donald Vinh, Oscar Zabaleta-Martínez, Qian Zhang, Shen-Ying Zhang, Jean-Laurent Casanova, Chanreaksmey Eng, Kimrong Bun, Mengheng Oum, Patrice Piola, Arnaud Tarantola, Mey Channa, Veasna Duong, Philippe Buchy, Chris Gorman, Jean-David Pommier, Yoann Crabol, Philippe Dussart, M. Bunleat, M. Panha, M. Kanarith Sim, Em Bunnakea, Denis Laurent, Heng Sothy, Ky Santy, Anousone Douangnouvong, Danoy Chommanam, Khansoudaphone Phakhounthong, Manivanh Vongsouvath, Malee Seephone, Bountoy Sibounheunang, Sayaphet Rattanavong, Viengmon Davong, Malavanh Vongsouvath, Mayfong Mayxay, Audrey Dubot-Pérès, Paul Newton, Sommanikhone Phangmanixay, Khounthavy Phongsavath, Dang Duc Anh, Do Quyen, Tran Thi Mai Hung, Nguyen Thi Thu Thuy, Luong Minh Tan, Anh Tuan Pham, Nguyen Hien, Do Thu Huong, Le Thanh Hai, Nguyen Van Lam, Pham Nhat An, Phan Huu Phuc, Phung Bich Thuy, Tran Thi Thu Huong, Chaw Su Hlaing, Aye Mya Min Aye, Cho Thair, Kyaw Linn, May July, Win Thein, Latt Latt Kyaw, Htay Htay Tin, Ommar Swe Tin, Khin Yi Oo, Yoann Crabol, Magali Herrant, Magali Lago, Maud Seguy, Marc Jouan, Lukas Hafner, Philippe Pérot, Marc Eloit, Marc Lecuit, Olivier Lortholary, Julien Capelle, Bruno Rosset, Veronique Chevalier, Jérôme Honnorat, Anne Laurie Pinto, Auey Dubot-Peres, Xavier de Lamballerie, Kevin Bleakley, Bernadette Murgue, Catherine Ferrant, Christian Devaux, Hervé Tissot-Dupont, Jean-Paul Moatti, Mayfong Mayxay, Pascal Bonnet, Didier Fontenille, Jean-François Delfraissy, Patrice Debré, Benoit Durand, Laurent Abel, Paul Bastard, Emmanuelle Jouanguy, Vivien Béziat, Peng Zhang, Charles Rice, Aurélie Cobat, Shen-Ying Zhang, Paul Hertzog, Jean-Laurent Casanova and Qian Zhang. A common form of dominant human IFNAR1 deficiency impairs IFN-α and -ω but not IFN-β-dependent immunity. Journal of Experimental Medicine 222(2), February 2025, e20241413. HAL, DOI
6 [article] Mélissa Baietto, Rémi Coulaud, Christine Keribin and Gilles Stoltz. Improved real-time crowding information through the modeling of passenger movements in trains with communicating coaches. Data Science for Transportation, 2025. HAL
7 [article] Kevin Bleakley, Mouhcine Mendil, Martin Royer and Benjamin Auder. Supervised aggregation of anomaly score functions for active anomaly detection. Transactions on Machine Learning Research Journal, 2026. In press. HAL
8 [article] Etienne Boursier and Nicolas Flammarion. Early alignment in two-layer networks training is a two-edged sword. Journal of Machine Learning Research, July 2025. HAL
9 [article] Simon Briend, Christophe Giraud, Gábor Lugosi and Déborah Sulem. Estimating the history of a random recursive tree. Bernoulli, 2025. In press. HAL
10 [article] Nathan Doumèche, Gérard Biau and Claire Boyer. On the convergence of PINNs. Bernoulli, Vol. 31, 2025, pp. 2127-2151. HAL
11 [article] Guillermo Durand. Fast confidence bounds for the false discovery proportion over a path of hypotheses. Computo, October 2025. HAL, DOI
12 [article] Solenne Gaucher, Gilles Blanchard and Frédéric Chazal. Supervised Contamination Detection, with Flow Cytometry Application. Biometrika 112(4), August 2025. HAL, DOI
13 [article] Françoise Irlinger, Christine Keribin, Anne-Sophie Sarthou, Béatrice Laroche and Sandra Helinck. Pink discoloration defects associated with microbial structure and metabolome changes in commercial bloomy cheeses. International Journal of Food Microbiology 442, July 2025, 111363. HAL, DOI
14 [article] Matthieu Jonckheere, Chiara Mignacco and Gilles Stoltz. Policy Optimization via Adv2: Adversarial Learning on Advantage Functions. Transactions on Machine Learning Research Journal, May 2025. URL: https://openreview.net/forum?id=Oyueig10Ed. HAL
15 [article] Stanislas Strasman, Antonio Ocello, Claire Boyer, Sylvain Le Corff and Vincent Lemaire. An analysis of the noise schedule for score-based generative models. Transactions on Machine Learning Research Journal, January 2025. HAL

International peer-reviewed conferences‌

16 [inproceedings] Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson and Etienne Boursier. Approximate information maximization for bandit games. 28th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 258, Phuket, Thailand, May 2025, pp. 316-324. HAL
17 [inproceedings] Etienne Boursier and Nicolas Flammarion. Simplicity bias and optimization threshold in two-layer ReLU networks. Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), PMLR 267, Vancouver, Canada, July 2025. HAL
18 [inproceedings] Etienne Boursier, Scott Pesme and Radu-Alexandru Dragomir. A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation. Advances in Neural Information Processing Systems 38 (NeurIPS 2025), San Diego, United States, December 2025. HAL
19 [inproceedings] Aymeric Capitaine, Etienne Boursier, Eric Moulines, Michael I. Jordan and Alain Durmus. Prediction-Aware Learning in Multi-Agent Systems. Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), PMLR 267, Vancouver, Canada, July 2025. HAL
20 [inproceedings] Pierre Marion, Raphaël Berthier, Gérard Biau and Claire Boyer. Attention layers provably solve single-location regression. Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, Singapore, February 2025. HAL
21 [inproceedings] Leonardo Martins Bianco, Christine Keribin and Zacharie Naulet. SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search. AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics, vol. 258, Mai Khao, Thailand, May 2025, pp. 1297-1305. HAL
22 [inproceedings] Rodrigo Maulen-Soto, Pierre Marion and Claire Boyer. Attention-based clustering. Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, United States, 2025. HAL
23 [inproceedings] Stanislas Strasman, Sobihan Surendran, Claire Boyer, Sylvain Le Corff, Vincent Lemaire and Antonio Ocello. Wasserstein Convergence of Critically Damped Langevin Diffusions. Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, United States, 2025. HAL
24 [inproceedings] Victor Thuot, Alexandra Carpentier, Christophe Giraud and Nicolas Verzelen. Clustering with bandit feedback: breaking down the computation/information gap. 36th International Conference on Algorithmic Learning Theory (ALT 2025), Proceedings of Machine Learning Research, vol. 272, Milano, Italy, February 2025, pp. 1221-1284. HAL
25 [inproceedings] Gauthier Thurin, Kimia Nadjahi and Claire Boyer. Optimal Transport-based Conformal Prediction. Proceedings of the International Conference on Machine Learning (ICML 2025), Vancouver, Canada, 2025. HAL
26 [inproceedings] Daniil Tiapkin, Evgenii Chzhen and Gilles Stoltz. Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization. The 28th International Conference on Artificial Intelligence and Statistics (AISTATS), Mai Khao, Thailand, May 2025. HAL
27 [inproceedings] Yu-Han Wu, Pierre Marion, Gérard Biau and Claire Boyer. Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization. Proceedings of Thirty Eighth Conference on Learning Theory, PMLR (COLT 2025), Lyon, France, 2025. HAL

Conferences without proceedings

28 [inproceedings] Odilon Duranthon, Pierre Marion, Claire Boyer, Bruno Loureiro and Lenka Zdeborová. Statistical Advantage of Softmax Attention: Insights from Single-Location Regression. ICLR 2026 - Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil, April 2026. HAL
29 [inproceedings] Maxime Megel, Gilles Celeux, Serge X. Cohen, Agnès Grimaud and Christine Keribin. Clustering of XRF spectral images using a dissimilarity based on likelihood loss. JdS 2025 - 56ièmes Journées de Statistique, Marseille, France, June 2025. HAL
30 [inproceedings] Pierre-André Mikem, Christine Keribin and Pascal Massart. Weakly supervised learning methods for the classification of patients from flow cytometry data. JDS 2025 - 56ème édition des Journées de Statistiques, Marseille, France, June 2025. HAL
31 [inproceedings] Patrick Pamphile, Sonia Lefeuvre, Jacques Olivier Klein and Isabelle Bournaud. Transition to the Competency-Based Approach: Teachers' Perception in the Implementation of the BUT. Ecosystème de formations : pour quelles transformation(s) ?, Brest, France, May 2025. HAL

Reports & preprints

32 [misc] Gérard Biau and Claire Boyer. A Note on k-NN Gating in RAG. January 2026. HAL
33 [misc] Alexandra Carpentier, Simone Maria Giancola, Christophe Giraud and Nicolas Verzelen. Low-degree lower bounds via almost orthonormal bases. December 2025, 53 p. HAL
34 [misc] Alexandra Carpentier, Christophe Giraud and Nicolas Verzelen. Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities. September 2025. HAL
35 [misc] Evgenii Chzhen, Mohamed Hebiri and Gayane Taturyan. Randomized multi-class classification under system constraints: a unified approach via post-processing. December 2025. HAL
36 [misc] Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer and Yannig Goude. Forecasting time series with constraints. February 2025. HAL
37 [misc] Nathan Doumèche, Francis Bach, Gérard Biau and Claire Boyer. Fast kernel methods: Sobolev, physics-informed, and additive models. September 2025. HAL
38 [misc] Bertrand Even, Christophe Giraud and Nicolas Verzelen. Computational barriers for permutation-based problems, and cumulants of weakly dependent random variables. July 2025, 79 p. HAL
39 [misc] Bertrand Even, Christophe Giraud and Nicolas Verzelen. Computational lower bounds in latent models: clustering, sparse-clustering, biclustering. June 2025. HAL
40 [misc] Claire Lacour, Pascal Massart and Vincent Rivoirard. Is model selection possible for the $\ell_p$-loss? PCO estimation for regression models. April 2025. HAL
41 [misc] Pascal Massart and Vincent Rivoirard. Concentration inequalities and cut-off phenomena for penalized model selection within a basic Rademacher framework. April 2025. HAL
42 [misc] Chiara Mignacco, Matthieu Jonckheere and Gilles Stoltz. Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy. October 2025. HAL
43 [misc] Patrick Pamphile and Isabelle Bournaud. Appropriation d'un dispositif d'accompagnement méthodologique d'étudiants primoentrants à l'université : une lecture par profils et postures. September 2025. HAL
44 [misc] Patrick Pamphile. Comprendre la réussite étudiante : une comparaison méthodologique entre régression multiple et analyse par profils. June 2025. HAL
45 [misc] Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche and Jean-Michel Poggi. Conformal Prediction for Hierarchical Data. February 2025. HAL
46 [misc] Yu-Han Wu, Quentin Berthet, Gérard Biau, Claire Boyer, Elie Romuald and Pierre Marion. Optimal stopping in latent diffusion models. October 2025. HAL

Other scientific publications

47 misc Christophe Biernacki, Julien Jacques and Christine Keribin. Model based co-clustering: high dimension & estimation challenges. January 2025. HAL
48 misc Leonardo Martins Bianco, Christine Keribin and Zacharie Naulet. Robust estimation and outlier detection for Stochastic Block Models via subgraph search. December 2025. HAL

12.3 Cited publications

49 misc Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh and Huanyu Zhang. Robust Estimation for Random Graphs. 2022. URL: https://arxiv.org/abs/2111.05320
50 article Guillermo Durand, Gilles Blanchard, Pierre Neuvial and Etienne Roquain. Post hoc false positive control for structured hypotheses. Scand. J. Stat. 47(4), 2020, 1114--1148. URL: https://doi.org/10.1111/sjos.12453
51 article Mohamed Nadif and Gérard Govaert. Algorithms for Model-based Block Gaussian Clustering. DMIN 8, 2008, 14--17.
52 article Tselil Schramm and Alexander S. Wein. Computational barriers to estimation from low-degree polynomials. The Annals of Statistics 50(3), June 2022. URL: http://dx.doi.org/10.1214/22-AOS2179

