STATIFY

2025 Activity Report, Project-Team STATIFY

RNSR: 202023582A

Research center: Inria Centre at Université Grenoble Alpes
In partnership with: CNRS, Université de Grenoble Alpes
Team name: Bayesian and extreme value statistical models for structured and high dimensional data
In collaboration with: Laboratoire Jean Kuntzmann (LJK)

Creation of the Project-Team:‌ 2020 April 01


Keywords

Computer Science and Digital Science

A3. Data‌ and knowledge
A3.1. Data
A3.1.1. Modeling, representation
A3.1.4.‌ Uncertain data
A3.3. Data and knowledge analysis
A3.3.2.‌ Data mining
A3.3.3. Big data analysis
A5. Interaction,‌ multimedia and robotics
A5.3. Image processing and analysis‌
A5.3.3. Pattern recognition
A5.9. Signal processing
A5.9.2. Estimation,‌ modeling
A6. Modeling, simulation and control
A6.2. Scientific‌ computing, Numerical Analysis & Optimization
A6.2.3. Probabilistic methods‌
A6.2.4. Statistical methods
A6.3. Computation-data interaction
A6.3.1. Inverse‌ problems
A6.3.3. Data processing
A6.3.5. Uncertainty Quantification
A9.‌ Artificial intelligence
A9.2. Machine learning
A9.2.1. Supervised learning‌
A9.2.2. Unsupervised learning
A9.2.4. Optimization and learning
A9.2.5.‌ Bayesian methods
A9.2.7. Kernel methods
A9.3. Signal processing‌

1‌ Team members, visitors, external‌‌ collaborators

Research Scientists

Florence Forbes [Team leader‌, INRIA, Senior‌ Researcher, HDR]‌‌
Sophie Achard [CNRS, Senior Researcher,‌ HDR]
Julyan Arbel‌ [INRIA, HDR‌‌]
Pedro Luiz Coelho Rodrigues [INRIA,‌ ISFP]
Michel Dojat‌ [INSERM]
Henrique‌‌ Donancio [INRIA, Starting Research Position,‌ from Sep 2025 until‌ Nov 2025]
Stephane‌‌ Girard [INRIA, Senior Researcher, HDR‌]

Faculty Members

Julien‌ Chevallier [UGA,‌‌ Associate Professor, until Aug 2025]
Jean-Baptiste‌ Durand [CIRAD,‌ Associate Professor]
Jonathan‌‌ El Methni [UGA, Associate Professor]‌

Post-Doctoral Fellows

Loic Chalmandrier‌ [UGA, Post-Doctoral‌‌ Fellow]
Henrique Donancio [INRIA, Post-Doctoral‌ Fellow, until Aug‌ 2025]
Anton Francois‌‌ [CNRS, Post-Doctoral Fellow, until Sep‌ 2025]
Thomas Guilmeau‌ [UGA, Post-Doctoral‌‌ Fellow, from Jul 2025]
Yiye Jiang‌ [UGA, Post-Doctoral‌ Fellow]
Mamadou Kanoute‌‌ [INRIA, Post-Doctoral Fellow, from Sep‌ 2025]
Tam Le‌ Minh [INRIA,‌‌ Post-Doctoral Fellow]
Rafael Mouallem Rosa [INRIA‌, Post-Doctoral Fellow]‌
Paul-Gauthier Noe [INRIA‌‌, Post-Doctoral Fellow, until Jan 2025]‌

PhD Students

Arturo Cabrera‌ Vazquez [INRIA]‌‌
Alice Chevaux [UGA]
Isabella Costa Maia‌ [UGA]
Luben‌ Miguel Cruz Cabezas [‌‌FAPESP, from Oct 2025]
Antoine Franchini‌ [INRIA]
Jacopo‌ Iollo [INRIA,‌‌ until Sep 2025]
Pearl Laveur [UGA‌]
Brice Marc [‌CEREMA]
Razan Mhanna‌‌ [UGA]
Geoffroy Oudoumanessah [POLE EMPLOI‌, from Dec 2025‌]
Geoffroy Oudoumanessah [‌‌INRIA, from Oct 2025 until Nov 2025‌]
Geoffroy Oudoumanessah [‌INSERM, until Sep‌‌ 2025]
Pierre-Louis Ruhlmann [INRIA]
Ababacar‌ Sembene [INRIA,‌ from Nov 2025]‌‌
Camille Touron [UGA]

Technical Staff

Adam‌ Fragkiadakis [INRIA,‌ Engineer, from Sep‌‌ 2025]
Jacopo Iollo [INRIA, Engineer‌, from Oct 2025‌]
Marc Saghiah [‌‌INRIA, Engineer, from Jul 2025]‌
Laurent Vallet [INRIA‌, Engineer, from‌‌ Dec 2025]

Interns and Apprentices

Mateo Amazo‌ [UGA, Intern‌, from Feb 2025‌‌ until Feb 2025]
Mateo Amazo [INRIA‌, Intern, until‌ Jan 2025]
Hani‌‌ Anouar Bourrous [INRIA, Intern, until‌ May 2025]
Ba‌ Khuong Dang [INRIA‌‌, Intern, from‌ May 2025 until Oct 2025]
Quentin Faye‌ [INPG SA, Intern, from Apr‌ 2025 until Jul 2025]
Rithy Sochet [‌INRIA, Intern, from Apr 2025 until‌ Sep 2025]

Administrative Assistants

Luce Coelho [‌INRIA]
Diane Courtiol [INRIA]
Marie-Anne‌ Dauphin-Rizzi [INRIA]
Julia Di Toro [‌INRIA]
Myriam Etienne [INRIA]
Nathalie‌ Gillot [INRIA]
Laura Leone [Randstad‌, from Aug 2025]
Helen Pouchot-Rouge-Blanc [‌INRIA]
Maria Immaculada Presseguer [INRIA]‌
Annie Simon [INRIA]

Visiting Scientists

Luben‌ Miguel Cruz Cabezas [UFScar, Brazil, from‌ Aug 2025]
Patrycja Scislewska [UNIV VARSOVIE‌, from Mar 2025 until Jul 2025]‌
Darren Wraith [UNIV QUEENSLAND, from Aug‌ 2025]

2 Overall objectives

The statify team focuses on statistics. Statistics can be defined as a science of variation, where the main question is how to acquire knowledge in the face of variation. In the past, statistics was seen as an opportunity to play in various backyards. Today, statisticians see their own backyard invaded by data scientists, machine learners and other computer scientists of all kinds. Everyone wants to do data analysis, and some (but not all) do it very well. Generally, data analysis algorithms and the associated network architectures are validated empirically using domain-specific datasets and data challenges. While winning such challenges is certainly rewarding, statistical validation rests on more fundamental grounds and yields interesting theoretical, algorithmic and practical insights. Statistical questions can be converted into probability questions through the use of probability models. Once certain assumptions about the mechanisms generating the data are made, statistical questions can be answered using probability theory. However, the proper formulation and checking of these probability models is as important as, or even more important than, the subsequent analysis of the problem using these models. The first question is then how to formulate and evaluate probabilistic models for the problem at hand. The second question is how to obtain answers once a certain model has been assumed. This latter task is more a matter of applied probability theory and, in practice, involves optimization and numerical analysis.

The statify team aims to contribute its strengths at a time when the number of requests received by statisticians is increasing considerably because of the successive waves of big data, data science and deep learning. The difficulty is to back up our approaches with reliable mathematics when what we have is often only empirical observations that we are not able to explain. Guiding data analysis with statistical justification is a challenge in itself. The ambition of statify is to play a role in this task and to provide answers to questions about the appropriate use of statistics.

Often statistical assumptions do‌ not hold. Under what conditions then can we‌ use statistical methods to obtain reliable knowledge? These conditions are rarely the‌ natural state of complex‌ systems. The central motivation‌‌ of statify is to establish the conditions under‌ which statistical assumptions and‌ associated inference procedures approximately‌‌ hold and become reliable.

However, as George Box said, "Statisticians and artists both suffer from being too easily in love with their models". To moderate this risk, we choose to develop, within the team, expertise from different statistical domains so as to offer different solutions to a variety of problems. This is possible because these domains share the same mathematical food chain, from probability and measure theory to statistical modeling, inference and data analysis.

Our goal is‌ to exploit methodological resources‌‌ from statistics and machine learning to develop models‌ that handle variability and‌ that scale to high‌‌ dimensional data while maintaining our ability to assess‌ their correctness, typically the‌ uncertainty associated with the‌‌ provided solutions. To reach this goal, the team‌ offers a unique range‌ of expertise in statistics,‌‌ combining probabilistic graphical models and mixture models to‌ analyze structured data, Bayesian‌ analysis to model knowledge‌‌ and regularize ill-posed problems, non-parametric statistics, risk modeling‌ and extreme value theory‌ to face the lack,‌‌ or impossibility, of precise modeling information and data.‌ In the team, this‌ expertise is organized to‌‌ target five key challenges:

1. Models for high dimensional, multimodal, heterogeneous data;
2. Spatial (structured) data science;
3. Scalable Bayesian models and procedures;
4. Understanding mathematical properties of statistical and machine learning methods;
5. The big problem of small data.

The first two challenges address sources of complexity coming from data, namely the fact that observations can be: 1) high dimensional and collected from multiple sensors in varying conditions, i.e. multimodal and heterogeneous, and 2) inter-dependent, with a known structure between variables or with unknown interactions to be discovered. The other three challenges focus on providing reliable and interpretable models: 3) making the Bayesian approach scalable to handle large and complex data; 4) quantifying the information processing properties of machine learning methods; and 5) making it possible to draw reliable conclusions from datasets that are too small to be used for training machine/deep learning methods.

These challenges rely on our four‌ research axes:

1. Models for graphs and networks;
2. Dimension reduction and latent variable modeling;
3. Bayesian modeling;
4. Modeling and quantifying extreme risk.

In terms of applied work, we will target‌ high-impact applications in neuroimaging,‌ environmental and earth sciences.‌‌

3 Research program

3.1 Models for graphs and‌ networks

Participants: Jean-Baptiste Durand‌, Florence Forbes,‌‌ Julyan Arbel, Sophie Achard, Michel Dojat‌, Julien Chevallier.‌

Keywords: graphical models, Markov‌‌ properties, hidden Markov models, clustering, missing data, mixture‌ of distributions, EM algorithm,‌ image analysis, Bayesian inference.‌‌

Graphs arise naturally as versatile structures for capturing‌ the intrinsic organization of‌ complex datasets. The literature‌‌ on graphical modeling is growing rapidly and covers‌ a wide range of‌ applications, from bioinformatics to‌‌ document modeling, image analysis,‌ social network analysis, etc. When faced with multivariate,‌ possibly high dimensional, data acquired at different sites‌ (or nodes) and structured according to an underlying‌ network (or graph), the objective is generally to‌ understand the dependencies or associations present in the‌ data so as to provide a more accurate‌ statistical analysis and a better understanding of the‌ phenomenon under consideration.

Structure learning.

This refers to the inference of the existing dependences between variables from observed samples. The limits of obtaining graph edges from the sample correlation between nodes are well known. We have investigated alternative approaches, both Bayesian and frequentist: the former were mainly used to account for constraints on the structure, while for the latter we focused on robust modeling and estimation in the presence of outliers. We proposed a fast Bayesian structure learning method based on pre-screening of categorical variables, in the PhD thesis of T. Rahier with Schneider Electric. In the continuous variable case, we studied the design of tractable estimators and algorithms that can provide robust estimation of covariance structures. Many covariance estimation methods rely on the Gaussian graphical model, but a viable model for data contaminated by outliers requires more robust and complex procedures and is therefore more challenging to build. Moreover, the problem of robust structure learning is especially acute in the high-dimensional setting, in which the number of variables $p$ is of the same order as, or much larger than, the number of available observations $n$. We have investigated different ways to handle both of the above-mentioned issues, in order to provide models for applications such as modeling brain connectivity from functional magnetic resonance imaging (fMRI) data. Each brain region is associated with a time series, and the goal is to study the connectivity among these regions. Interactions between the regions can be described by covariance or precision matrices that quantify the links between time series and can then be represented as graphs. We first proposed an approach, initiated with the PhD of K. Ashurbekova, to generalize the Gaussian approach to multivariate heavy-tailed distributions with dimensionality relatively larger than the number of observations. This encompasses methods related to shrinkage and M-estimators, for which we aimed at designing algorithms with proved convergence results and optimal values for the shrinkage coefficients. Second, still motivated by the brain connectivity application, we investigated in the PhD of H. Lbath (QFunC project) the possibility of computing more subtle correlations between brain regions using a new notion of correlation of local averages. Finally, to go beyond the Gaussian assumption, we also investigated copula approaches and characterized graphical dependencies for multivariate counts, with potential applications to branching processes.
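The limitation of sample correlation mentioned above can be illustrated with a minimal sketch. It assumes a toy Gaussian chain x1 -> x2 -> x3 with no direct link between x1 and x3; the function names, coefficients and sample size are illustrative choices, not taken from the team's work. The marginal correlation between x1 and x3 is large, whereas the partial correlation given x2 (which is what a Gaussian graphical model encodes through the precision matrix) is close to zero.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after linearly adjusting both for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_chain(n, seed=0):
    """Draw n samples from a Gaussian chain x1 -> x2 -> x3 (hypothetical toy model)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
    x3 = [0.8 * b + rng.gauss(0, 0.6) for b in x2]
    return x1, x2, x3


if __name__ == "__main__":
    x1, x2, x3 = simulate_chain(5000)
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    print("marginal corr(x1, x3):", round(r13, 3))
    print("partial corr(x1, x3 | x2):", round(partial_corr(r13, r12, r23), 3))
```

Thresholding the marginal correlation would wrongly add an edge between x1 and x3 here, while the partial correlation correctly suggests its absence. The robust and high-dimensional estimators discussed above address the harder cases where outliers or $p \geq n$ make even this computation unreliable.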

Structure modelling.

Once the structure is identified, the next questions concern comparing the discovered graph structures with each other or with a reference graph. If the structure is not itself the object of interest, the goal is usually to account for it in a subsequent analysis. Except for simple graphs (chains or trees), this is problematic because mainstream statistical models and algorithms are based on the independence assumption and become intractable for even moderate graph sizes. The analysis of graphs as the objects of interest, with the design of tools to model and compare them, has been studied in the PhD of L. Carboni. We proposed new mathematical tools based on an equivalence relation between graph statistics in order to take into account the location in space of the nodes. To account for dependences in a tractable way, we often rely on Markov modelling and variational inference. When dependence in time is considered, Gaussian processes are an interesting tractable tool. With the PhD of A. Constantin, we have investigated them in the context of a collaboration with INRAE and CNES in Toulouse, for the classification and reconstruction of irregularly sampled satellite image time series. The proposed approach is able to deal with irregular temporal sampling and missing data directly in the classification process. It is based on Gaussian processes and jointly performs the classification of the pixel labels and the reconstruction of the pixel time series. The complexity of the method scales linearly with the number of pixels, making it suitable for large-scale scenarios. In a different context, we have developed hidden semi-Markov models for the analysis of eye movements, in particular with the PhD of B. Olivier in collaboration with A. Guérin-Dugué (GIPSA-lab) and B. Lemaire (Laboratoire de Psychologie et Neurocognition). New coupling methods for hidden semi-Markov models driven by several underlying state processes have been proposed.

Structured anomaly‌ detection.

The vast majority of deep learning architectures for medical image analysis are based on supervised models requiring the collection of large datasets of annotated examples. Building such annotated datasets, which requires skilled medical experts, is time consuming and hardly achievable, especially for some specific tasks, including the detection of small and subtle lesions that are sometimes impossible to detect visually and thus to outline manually. This critical aspect significantly impairs the performance of supervised models and hampers their deployment in clinical neuroimaging applications, especially for brain pathologies that require the detection of small lesions (e.g. multiple sclerosis, microbleeds) or subtle structural or morphological changes (e.g. Parkinson's disease). We have developed unsupervised anomaly detection methods based on generalized Student mixture models and deep statistical unsupervised learning models for the detection of early forms of Parkinson's disease. We have also compared parametric mixture approaches to nonparametric machine learning techniques for change detection in the context of time series analysis of glycemic curves for diabetes.

3.2 Dimension reduction‌‌ and latent variable modeling

Participants: Jean-Baptiste Durand,‌ Florence Forbes, Stephane‌ Girard, Julyan Arbel‌‌, Pedro Luiz Coelho Rodrigues.

Keywords: mixture‌ of distributions, EM algorithm,‌ missing data, conditional independence,‌‌ statistical pattern recognition, clustering,‌ unsupervised and partially supervised learning.

Extracting information from‌ raw data is a complex task, all the‌ more so as this information is measured in‌ a high dimensional space. Fortunately, this information usually‌ lives in a subspace of smaller size. Identifying‌ this subspace is crucial but difficult. One approach‌ is to perform appropriate changes of representation that‌ facilitate the identification and characterization of the desired‌ subspace. Latent random variables are a key concept‌ to encode in a structured way representations that‌ are easier to handle and capture the essential‌ features of the data.

Regression in high dimensions.‌

Methods adapted to high dimensions include inverse regression methods such as sliced inverse regression (SIR), partial least squares (PLS), and approaches based on mixtures of regressions with different variants, e.g. Gaussian locally linear mapping (GLLiM) and extensions, mixtures of experts, cluster weighted models, etc. SIR-like methods are flexible in that they reduce the dimension in a way that is optimal for the subsequent regression task, which can itself be carried out by any desired regression tool. In that sense these methods are said to be nonparametric or semi-parametric, and they have the potential to provide robust procedures. We have also proposed a new approach, called Extreme-PLS, for dimension reduction in conditional extreme value settings, where the goal is to best explain the extreme values of the response variable.

Simulation-based inference (SBI) for high dimensional‌ inverse problems.

To account for uncertainty in a principled manner, we also considered Bayesian inversion techniques. We investigated the use of learning approaches to handle Bayesian inverse problems in a computationally efficient way when the observations to be inverted have a moderately high number of dimensions and are numerous. We proposed tractable inverse regression approaches based on GLLiM and normalizing flows. They have the advantage of producing full probability distributions as approximations of the target posterior distributions. These distributions have several interesting features. They provide confidence indices on the predictions and can be combined with importance sampling or approximate Bayesian computation (ABC) schemes for a better exploration when multiple equivalent solutions exist. They generalise easily to variants that can handle non-Gaussian data and dependent or missing observations. The relevance of the proposed approach has been illustrated on synthetic examples and on two real data applications, in the context of planetary remote sensing and neuroimaging. In addition, we addressed the issue of model selection for some of the GLLiM models, namely mixture of experts (MoE) models, and contributed a number of theoretical results.

Online and incremental inference.

Most SBI methods scale poorly when the number of observations is too large, which makes them unsuitable for modern data, which are often acquired in real time, incrementally, and in large volumes. Computing inferential quantities incrementally may be imposed by the nature of data acquisition (e.g. streaming and sequential data), but it may also be seen as a way to handle larger data volumes in a more resource-friendly manner with respect to memory, energy, and time consumption. To produce feasible and practical online algorithms for streaming data and complex models, we have investigated the family of stochastic approximation (SA) algorithms combined with the class of majorization-minimization (MM) and expectation-maximization (EM) algorithms for a certain class of models, e.g. exponential family distributions and their mixtures.
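To make the combination of stochastic approximation and EM more concrete, here is a minimal textbook-style sketch of an online EM recursion. It is not the team's algorithm: it assumes a two-component Gaussian mixture with known unit variances, and the function name, the step-size schedule and its `offset` are illustrative assumptions. Each observation is seen once; running averages of the sufficient statistics are updated with a decreasing step size and then mapped back to the parameters.

```python
import math
import random


def online_em(stream, init_means=(-1.0, 1.0), step_exponent=0.7, offset=10):
    """Online EM for a two-component Gaussian mixture with unit variances.

    Returns the estimated mixture weights and component means.
    """
    weights = [0.5, 0.5]
    means = list(init_means)
    s0 = list(weights)                            # running average of E[z_k]
    s1 = [w * m for w, m in zip(weights, means)]  # running average of E[z_k * y]
    for n, y in enumerate(stream, start=1):
        # E-step on the single new observation: posterior responsibilities.
        dens = [w * math.exp(-0.5 * (y - m) ** 2) for w, m in zip(weights, means)]
        total = sum(dens)
        resp = [d / total for d in dens]
        # Stochastic approximation step with a decreasing step size.
        gamma = (n + offset) ** (-step_exponent)
        s0 = [(1 - gamma) * a + gamma * r for a, r in zip(s0, resp)]
        s1 = [(1 - gamma) * b + gamma * r * y for b, r in zip(s1, resp)]
        # M-step: map the sufficient statistics back to the parameters.
        weights = list(s0)
        means = [b / a for a, b in zip(s0, s1)]
    return weights, means


if __name__ == "__main__":
    rng = random.Random(0)
    data = (rng.gauss(-2, 1) if rng.random() < 0.3 else rng.gauss(2, 1)
            for _ in range(20000))
    print(online_em(data))
```

Because the stream is consumed one observation at a time, memory use does not grow with the number of observations, which is the resource argument made above.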

3.3 Bayesian‌ modelling

Participants: Julyan Arbel‌‌, Florence Forbes, Jean-Baptiste Durand, Pedro‌ Coelho Rodrigues.

Keywords:‌ Bayesian statistics, Bayesian nonparametrics,‌‌ Markov Chain Monte Carlo, Experimental design, Bayesian neural‌ networks, Approximate Bayesian Computation.‌

Bayesian methods have become‌‌ the center of attraction to model the underlying‌ uncertainty of statistical models.‌ Bayesian models and methods‌‌ are already used in all of our other‌ axes, whenever the Bayesian‌ choice provides interesting features,‌‌ e.g. for model selection, dependence modeling (copulas), inverse‌ problems, etc. This axis‌ emphasizes more specifically our‌‌ theoretical and methodological research in Bayesian learning. In‌ particular, we will focus‌ on techniques referred to‌‌ as Bayesian nonparametrics (BNP).

Markov priors for Bayesian‌ nonparametric models.

We have proposed Bayesian nonparametric priors for hidden Markov random fields, first for continuous, Gaussian observations, with an illustration in image segmentation, and second for discrete observed data typically arising from counts, e.g. Poisson distributed observations, with an illustration on a risk mapping model. Inference was carried out by Variational Bayesian Expectation Maximization (VBEM).

Asymptotic properties‌‌ of BNP models.

A common way to assess a Bayesian procedure is to study the asymptotic behavior of posterior distributions, that is, their ability to estimate a true distribution when the number of observations grows. Mixture models have attracted a lot of attention in the last decade due to some negative results regarding the number of clusters. More specifically, it was shown that Bayesian nonparametric mixture models are inconsistent for some choices of priors. We proposed ways to compute the prior distribution of the number of clusters. This is a notoriously difficult task, and we proposed approximations in order to enable such computations for real-world applications. We studied and justified BNP models based on their asymptotic properties. We showed that mixture models based on many different BNP processes are inconsistent in the number of clusters and discussed possible solutions. Notably, we showed that a post-processing algorithm introduced for the simplest process (the Dirichlet process) extends to more general models and provides a consistent method to estimate the number of components.
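As an illustration of what "the prior distribution of the number of clusters" means, the sketch below treats only the simplest case, the Dirichlet process, where a standard recursion from the Chinese restaurant process gives the distribution exactly. The function name and the parameter values are illustrative assumptions; the more general processes mentioned above are precisely those for which such a simple recursion is not available and approximations are needed.

```python
def dp_num_clusters_prior(n, alpha):
    """Prior law of the number of clusters K_n among n observations under a
    Dirichlet process with concentration alpha.

    Returns a list p of length n + 1 with p[k] = P(K_n = k).
    """
    probs = [0.0, 1.0]  # one observation always forms exactly one cluster
    for i in range(1, n):
        # Observation i + 1 opens a new cluster with probability alpha / (alpha + i).
        p_new = alpha / (alpha + i)
        nxt = [0.0] * (i + 2)
        for k in range(1, i + 1):
            nxt[k] += probs[k] * (1.0 - p_new)
            nxt[k + 1] += probs[k] * p_new
        probs = nxt
    return probs


if __name__ == "__main__":
    prior = dp_num_clusters_prior(100, alpha=1.0)
    expected = sum(k * p for k, p in enumerate(prior))
    print("prior mean number of clusters:", round(expected, 3))
```

The prior mean grows only logarithmically with the sample size in this case, which is one reason the choice of process and of its parameters matters for inference on the number of clusters.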

Amortized Approximate Bayesian‌ computation.

Approximate Bayesian computation‌‌ (ABC) has become an essential part of the‌ Bayesian toolbox for addressing‌ problems in which the‌‌ likelihood is prohibitively expensive or entirely unknown. A‌ key ingredient in ABC‌ is the choice of‌‌ a discrepancy that describes how different the simulated‌ and observed data are,‌ often based on a‌‌ set of summary statistics when the data cannot‌ be compared directly. The‌ choice of the appropriate‌‌ discrepancies is an active‌ research topic, which has mainly considered data discrepancies‌ requiring samples of observations or distances between summary‌ statistics. We have first investigated sample-based discrepancies and‌ established new asymptotic results using so-called energy-based distances.‌ We have then considered a summary-based approach and‌ proposed a new ABC procedure that can be‌ seen as an extension of the semi-automatic ABC‌ framework to a functional summary statistics setting and‌ can also be used as an alternative to‌ sample-based approaches. The resulting ABC approach also exhibits‌ amortization properties via the use of the GLLiM‌ inverse regression model.
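The role of the discrepancy and of summary statistics can be seen in the most basic ABC scheme, rejection sampling. The sketch below is a generic illustration under assumed toy choices (uniform prior, Gaussian simulator, sample mean as summary, absolute difference as discrepancy). It is not the energy-distance or GLLiM-based procedure described above, and all function names are hypothetical.

```python
import random


def sample_mean(values):
    """Summary statistic used to compare simulated and observed data."""
    return sum(values) / len(values)


def uniform_prior(rng):
    """Toy prior on the unknown mean (assumption for this sketch)."""
    return rng.uniform(-5.0, 5.0)


def gaussian_simulator(theta, rng, n=30):
    """Toy simulator: n draws from a unit-variance Gaussian with mean theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_sampler, simulator, summary,
                  tolerance, n_draws, seed=0):
    """Keep prior draws whose simulated summary is within tolerance of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulator(theta, rng))
        if abs(s_sim - s_obs) <= tolerance:  # discrepancy between summaries
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(42)
    observed = gaussian_simulator(1.5, rng)
    draws = abc_rejection(observed, uniform_prior, gaussian_simulator,
                          sample_mean, tolerance=0.1, n_draws=5000)
    print(len(draws), "accepted; approximate posterior mean:", sample_mean(draws))
```

The likelihood is never evaluated: only the simulator is called. The quality of the approximation depends on the summary, the discrepancy and the tolerance, which is why their choice is the research question discussed above.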

Bayesian neural networks.

The connection between Bayesian neural networks and Gaussian processes has gained a lot of attention in the last few years, with the flagship result that hidden units converge to a Gaussian process limit when the layer width tends to infinity. Underpinning this result is the fact that hidden units become independent in the infinite-width limit. Our aim is to shed some light on the dependence properties of hidden units in practical finite-width Bayesian neural networks. In addition to theoretical results, we assessed empirically the impact of depth and width on these dependence properties. Recent work has suggested that finite Bayesian neural networks may outperform their infinite counterparts because they adapt their internal representations flexibly. To establish solid ground for future research on finite-width neural networks, our goal is to study the prior induced on hidden units. Our main result is an accurate description of the tails of hidden units, which shows that unit priors become heavier-tailed with depth, thanks to the introduced notion of generalized Weibull-tail. This finding sheds light on the behavior of hidden units of finite Bayesian neural networks.
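The heavier tails at greater depth can be observed in a small Monte Carlo experiment. The sketch below is only an empirical illustration under assumed settings (ReLU activations, independent Gaussian weights with variance 1/fan-in, no biases, a fixed arbitrary input). It uses the sample kurtosis as a crude proxy for tail heaviness, which is not the generalized Weibull-tail characterization mentioned above.

```python
import math
import random


def sample_preactivations(depth, width, x, rng):
    """Draw one set of prior weights and return the first pre-activation of each layer."""
    h = list(x)
    firsts = []
    for _ in range(depth):
        scale = 1.0 / math.sqrt(len(h))  # weight standard deviation, variance 1 / fan_in
        pre = [sum(rng.gauss(0.0, scale) * v for v in h) for _ in range(width)]
        firsts.append(pre[0])
        h = [max(0.0, p) for p in pre]   # ReLU activation
    return firsts


def kurtosis(values):
    """Sample kurtosis (equal to 3 for a Gaussian distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2


def kurtosis_by_layer(depth=3, width=5, n_draws=20000, seed=0):
    """Estimate the prior kurtosis of one hidden unit per layer by Monte Carlo."""
    rng = random.Random(seed)
    x = [1.0, -0.5, 2.0]  # arbitrary fixed input (assumption)
    draws = [sample_preactivations(depth, width, x, rng) for _ in range(n_draws)]
    return [kurtosis([d[layer] for d in draws]) for layer in range(depth)]


if __name__ == "__main__":
    for layer, k in enumerate(kurtosis_by_layer(), start=1):
        print("layer", layer, "kurtosis", round(k, 2))
```

For a fixed input, first-layer units are exactly Gaussian, so their kurtosis is close to 3, while units in deeper layers are scale mixtures of Gaussians and show a larger kurtosis at small widths.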

3.4‌ Modelling and quantifying extreme risk

Participants: Julyan Arbel‌, Stephane Girard, Florence Forbes, Sophie‌ Achard, Jonathan El Methni.

Keywords: dimension‌ reduction, extreme value analysis, functional estimation.

Extreme events‌ have a major impact on a wide variety‌ of domains from environmental sciences (heat waves, flooding),‌ reliability, to finance and insurance (financial crashes, reinsurance).‌ While usual statistical approaches focus on the modeling‌ of the bulk of the distribution, extreme-value analysis‌ aims at building models adapted to distribution tails,‌ where by nature, observations are rare. Extreme value‌ analysis is a relatively recent domain in statistics‌ focusing on distribution tails.

Extreme quantile estimation.

One of the most popular risk measures is the Value-at-Risk (VaR), introduced in the 1990s. In statistical terms, the VaR at level $α \in (0, 1)$ corresponds to the upper $α$-quantile of the loss distribution. We have proposed estimators of extreme quantiles, that is when $α \to 0$, and studied their theoretical properties. We have also investigated the Weissman extrapolation device for estimating extreme quantiles from heavy-tailed distributions. It is based on two estimators: an order statistic to estimate an intermediate quantile and an estimator of the tail index. The common practice is to select the same intermediate sequence for both estimators. We showed how an adapted choice of two different intermediate sequences leads to a reduction of the asymptotic bias associated with the resulting refined Weissman estimator. This new bias reduction method is fully automatic and does not involve the selection of extra parameters.
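The classical Weissman device can be sketched in a few lines. The version below follows the common practice described above, with the same number $k$ of upper order statistics used for the intermediate quantile and for the Hill estimator of the tail index. It is not the refined estimator with two intermediate sequences, and the function names and parameter values are illustrative assumptions. The extrapolation reads $\hat q(α) = X_{n-k,n} \, (k / (nα))^{\hat γ}$.

```python
import math
import random


def hill_estimator(sorted_sample, k):
    """Hill estimator of the tail index from the k largest observations.

    `sorted_sample` must be sorted in increasing order and positive in the tail.
    """
    n = len(sorted_sample)
    threshold = sorted_sample[n - k - 1]  # order statistic X_{n-k,n}
    logs = (math.log(sorted_sample[n - i] / threshold) for i in range(1, k + 1))
    return sum(logs) / k


def weissman_quantile(sample, alpha, k):
    """Weissman estimator of the upper alpha-quantile of a heavy-tailed sample."""
    xs = sorted(sample)
    n = len(xs)
    gamma_hat = hill_estimator(xs, k)
    anchor = xs[n - k - 1]                # intermediate quantile estimate
    return anchor * (k / (n * alpha)) ** gamma_hat


if __name__ == "__main__":
    rng = random.Random(0)
    gamma = 0.5                           # true tail index of the toy Pareto model
    sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(5000)]
    alpha = 0.001
    print("estimate:", weissman_quantile(sample, alpha, k=200))
    print("true upper quantile:", alpha ** (-gamma))
```

In practice the estimate is sensitive to the choice of $k$, which is the bias issue that the choice of two different intermediate sequences is meant to reduce.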

New‌‌ measures of extreme risk.

A simple way to assess the (environmental, industrial or financial) risk is to compute a measure linked to the value of the phenomenon of interest (rainfall height, wind speed, river flow). Candidate measures include quantiles (which correspond to the traditional Value-at-Risk or return levels), expectiles, tail conditional moments, spectral risk measures, distortion risk measures, etc. We have mainly focused on the first two measures, quantiles and expectiles, and investigated estimation procedures for extensions of these measures. The main drawback of quantiles is that they do not provide a coherent risk measure. Two distributions may have the same extreme quantile but very different tail behaviors. Moreover, standard estimators do not use the most extreme values of the sample and consequently induce a loss of information. Our strategy was to adapt the definition of quantiles to take into account the whole distribution tail.

We have introduced new measures of extreme risk based on $L_p$-quantiles, encompassing both expectiles and quantiles. We believe this generalization of the concept of extreme quantile to extreme $L_p$-quantile opens promising new research directions. We have first explored to what extent univariate extreme-value estimators can be improved on the basis of these novel $L_p$-quantiles. We built tractable estimators of these quantities with guaranteed theoretical properties.
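To show how $L_p$-quantiles interpolate between quantiles and expectiles, the sketch below computes an empirical $L_p$-quantile as the minimizer of the asymmetric loss $\sum_i |τ - 1\{y_i \le q\}| \, |y_i - q|^p$, solving the first-order condition by bisection. The level $τ$ is taken here as a lower-tail level (an assumption of this sketch, so that $τ = 1 - α$ in the upper-quantile convention), and the function name is illustrative. This is a plain empirical version, not one of the extreme-value estimators discussed above.

```python
def lp_quantile(sample, tau, p, n_iter=100):
    """Empirical L_p-quantile of level tau: p = 1 gives a quantile, p = 2 an expectile."""
    if not 0.0 < tau < 1.0 or p < 1.0:
        raise ValueError("need 0 < tau < 1 and p >= 1")

    def score(q):
        # Derivative of the asymmetric loss, up to a negative factor; nonincreasing in q.
        upper = sum((y - q) ** (p - 1) for y in sample if y > q)
        lower = sum((q - y) ** (p - 1) for y in sample if y <= q)
        return tau * upper - (1.0 - tau) * lower

    lo, hi = min(sample), max(sample)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    for p in (1.0, 1.5, 2.0):
        print("p =", p, "level 0.9:", lp_quantile(data, 0.9, p))
```

With $p = 2$ and $τ = 0.5$ the result is the sample mean, and with $p = 1$ and $τ = 0.5$ it is the sample median, which matches the two special cases named above.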

Extremes with covariates.

A‌‌ second challenge was to extend this concept to‌ the regression framework where‌ the variable of interest‌‌ depends on a set of covariates. When the‌ number of covariates is‌ large, two research directions‌‌ have been explored to overcome the curse of‌ dimensionality: 1) we designed‌ a dimension reduction method‌‌ for the extreme-value context, 2) we also considered‌ semi-parametric models to reduce‌ the complexity of the‌‌ fitted model.

Another challenge with expectiles is that their sample versions do not benefit from a simple explicit form, making their analysis significantly harder than that of quantiles and order statistics. This difficulty is compounded when one wishes to integrate auxiliary information about the phenomenon of interest through a finite-dimensional covariate, in which case the problem becomes the estimation of conditional expectiles. We exploited the fact that the expectiles of a distribution are in fact the quantiles of another distribution explicitly linked to the former one, in order to construct nonparametric kernel estimators of extreme conditional expectiles. We analyzed the asymptotic properties of our estimators in the context of conditional heavy-tailed distributions. The extension to functional covariates was also investigated. Since quantiles and expectiles belong to the wider family of $L_p$-quantiles, we also proposed kernel estimators of extreme conditional $L_p$-quantiles. We studied their asymptotic properties in the context of conditional heavy-tailed distributions, and we showed through a simulation study that taking $p \in (1, 2)$ may make it possible to recover extreme conditional quantiles and expectiles accurately.

We built‌ a general theory for the estimation of extreme‌ conditional expectiles in heteroscedastic regression models with heavy-tailed‌ noise. Our approach is supported by general results‌ of independent interest on residual-based extreme value estimators‌ in heavy-tailed regression models, and is intended to‌ cope with covariates having a large but fixed‌ dimension. We demonstrated how our results could be‌ applied to a wide class of important examples,‌ among which linear models, single-index models as well‌ as ARMA and GARCH time series models.

Extremes‌ and machine learning.

This is the topic of a more recent collaboration with E. Gobet from CMAP. Feedforward neural networks based on rectified linear units (ReLU) cannot efficiently approximate unbounded quantile functions, especially in the case of heavy-tailed distributions. We have thus proposed a new parametrization for the generator of a generative adversarial network (GAN) adapted to this framework, based on extreme-value theory. We provided an analysis of the uniform error between the extreme quantile and its GAN approximation. It appears that the rate of convergence of the error is mainly driven by the second-order parameter of the data distribution. A similar investigation has been conducted to simulate fractional Brownian motion with ReLU neural networks.

4 Application‌ domains

4.1 Image Analysis

Participants: Florence Forbes,‌ Jean-Baptiste Durand, Stephane Girard, Pedro Coelho‌ Rodrigues, Sophie Achard, Michel Dojat.‌

As regards applications, several areas of image analysis‌ can be covered using the tools developed in‌ the team. More specifically, we have addressed various‌ issues in computer vision involving Bayesian modelling and‌ probabilistic clustering techniques. Other applications in medical imaging‌ are natural. We work more specifically on MRI‌ and functional MRI data, in collaboration with the‌ Grenoble Institute of Neuroscience (GIN). We also consider‌ other statistical 2D fields coming from other domains‌ such as remote sensing, in collaboration with the‌ Institut de Planétologie et d'Astrophysique de Grenoble (IPAG)‌ and the Centre National d'Etudes Spatiales (CNES).

4.2‌ Biology, Environment and Medicine

Participants: Florence Forbes,‌ Stephane Girard, Jean-Baptiste Durand, Julyan Arbel‌, Sophie Achard, Pedro Coelho Rodrigues,‌ Julien Chevallier, Michel Dojat, Jonathan El‌ Methni.

A second domain of applications concerns biology and medicine. We considered the use of mixture models to identify biomarkers. We also investigated statistical tools for the analysis of fluorescence signals in molecular biology. Applications in neurosciences are also considered. In the environmental domain, we considered the modelling of high-impact weather events and the use of hyperspectral data as a new tool for quantitative ecology.

5 Social and environmental responsibility

5.1 Footprint of research activities‌

The footprint of our‌ research activities has not‌‌ been assessed yet. Most of the team members‌ have validated the “charte‌ d'éco-responsabilité” written by a‌‌ working group from Laboratoire Jean Kuntzmann, which should‌ have practical implications in‌ the near future.

5.2‌‌ Impact of research results

A lot of our‌ developments are motivated by‌ and target applications in‌‌ medicine and environmental sciences. As such they have‌ a social impact with‌ a better handling and‌‌ treatment of patients, in particular with brain diseases‌ or disorders. On the‌ environmental side, our work‌‌ has an impact on geoscience-related decision making with‌ e.g. extreme events risk‌ analysis, planetary science studies‌‌ and tools to assess biodiversity markers. However, how‌ to truly measure and‌ report this impact in‌‌ practice is another question we have not really‌ addressed yet.

6 Latest‌ software developments, platforms, open‌‌ data

6.1 Latest software developments

6.1.1 Planet-GLLiM

Name:‌
Planet-GLLiM
Keyword:
Inverse problem‌
Functional Description:
The application‌‌ implements the GLLiM statistical learning technique in its‌ different variants for the‌ inversion of a physical‌‌ model of reflectance on spectro-(gonio)-photometric data. The latter‌ are of two types:‌ 1. laboratory measurements of‌‌ reflectance spectra acquired according to different illumination and‌ viewing geometries, 2. and‌ 4D spectro-photometric remote sensing‌‌ products from multi-angular CRISM or Pléiades acquisitions.
URL:‌
https://gitlab.inria.fr/kernelo-mistis/planet-gllim-front-end/-/wikis/Home
Publications:
insu-03705153,‌ hal-02908364
Contact:
Sylvain Douté‌‌
Participant:
5 anonymous participants
Partner:
Institut de Planétologie‌ et d’Astrophysique de Grenoble‌

6.1.2 xLLiM (Kernelo)

Name:‌‌
xLLiM
Keywords:
Inverse problem, Clustering, Regression, Gaussian mixture,‌ Python, C++
Scientific Description:‌
Building a regression model‌‌ for the purpose of prediction is widely used‌ in all disciplines. A‌ large number of applications‌‌ consists of learning the association between responses and‌ predictors and focusing on‌ predicting responses for the‌‌ newly observed samples. In this work, we go‌ beyond simple linear models‌ and focus on predicting‌‌ low-dimensional responses using high-dimensional covariates when the associations‌ between responses and covariates‌ are non-linear.
Functional Description:‌‌
xLLiM is a Gaussian Locally-Linear Mapping (GLLiM) solver. xLLiM provides a C++ library with Python bindings for nonlinear mapping (nonlinear regression) using a mixture-of-regressions model and an inverse regression strategy. The methods include the GLLiM model (Deleforge et al., 2015) based on Gaussian mixtures.
URL:
https://xllim.gitlabpages.inria.fr/xllim/
Publications:
hal-04437626, hal-00863468,‌ hal-02908364
Contact:
Florence Forbes‌
Participant:
6 anonymous participants‌‌
Partner:
Institut de Planétologie et d’Astrophysique de Grenoble‌

7 New results

7.1‌ Models for graphs and‌‌ networks

7.1.1 Leaf Area estimation and Semantic segmentation‌ of forest point clouds‌ using neural networks.

Participants:‌‌ Jean-Baptiste Durand, Florence Forbes.

Joint work‌ with: Grégoire Vincent‌ and Yuchen Bai, IRD,‌‌ AMAP, Montpellier, France.

Tropical forests, covering only 7% of the Earth's land surface, play a disproportionately vital role in the biosphere, storing 25% of the terrestrial carbon and contributing to over a third of the global terrestrial productivity. They also recycle about a third of the precipitation through evapotranspiration and thus help generate and maintain a humid climate regionally, with positive effects extending well beyond the tropics. However, the seasonal variability in fluxes between tropical rainforests and the atmosphere is still poorly understood. Better understanding the processes underlying flux seasonality in tropical forests is thus critical to improve our ability to predict global biogeochemical cycles. Leaf area index (LAI), a key parameter governing water and carbon fluxes, is inadequately characterised, necessitating advances in monitoring technologies such as aerial and terrestrial laser scanning (LiDAR). In this work, we address key challenges in quantifying leaf area in tropical forests using LiDAR technology.

In previous work, we developed an end-to-end deep learning approach for semantic segmentation of Unmanned Aerial Vehicle (UAV) Laser Scans (ULS) in the presence of two classes: wood and leaves. This approach is referred to as SOUL and was published at NeurIPS 2023.

A remaining challenge was the analysis of various sources of uncertainty and bias that affect LAI estimation from LiDAR surveys. These biases include limitations in sensor sensitivity (censoring), unknown clumping of targets, inadequate weighting of multiple LiDAR returns, unknown leaf angle distribution, leaf size, and the presence of woody components within the canopy. Since there is currently no efficient and comprehensive method to obtain the true LAI of a forest plot, the study uses simulated ULS data generated by the DART software based on two forest mock-ups: Wytham Woods and RAMI-V Järvselja Birch Stand. The simulated data mimic the characteristics of real ULS data while providing full access to details about the forest, particularly the LAI. Among the various biases, woody components pose a unique challenge because woody organ structure is naturally different from the other sources of bias. Therefore, our approach prioritises addressing this bias in order to isolate and understand the individual contributions of the other sources of bias in LAI estimation. To eliminate the impact of woody components, we propose a robust protocol that combines the SOUL method with AMAPVox, a ray tracing software. Once the woody component bias is removed, a quantitative analysis of the remaining biases is conducted, laying the foundation for future work in this area.

7.1.2 Graph modelling‌ for the study of language dynamics

Participants: Sophie‌ Achard.

Joint work with: Clément Guichet,‌ Monica Bacciu and Martial Mermillod from LPNC, Univ.‌ Grenoble Alpes.

In 21, we worked on lifespan oscillatory dynamics in lexical production. Lexical production performance has been associated with cognitive control demands that increase with age to support efficient semantic access, thus suggesting an interplay between a domain-general and a language-specific component. Current neurocognitive models suggest that Default Mode Network (DMN) and Fronto-Parietal Network (FPN) connectivity may drive this interplay, impacting the trajectory of production performance with a pivotal shift around midlife. However, the corresponding time-varying architecture still needs clarification. Here, we leveraged MEG resting-state data from healthy adults aged 18–88 years from a CamCAN population-based sample. We found that DMN-FPN dynamics shift from anterior-ventral to posterior-dorsal states until midlife to mitigate word-finding challenges, concurrent with heightened alpha-band oscillations. Specifically, sensorimotor integration along this posterior path could facilitate cross-talk with lower-level circuitry as dynamic information flow with more anterior, higher-order cognitive states gets compromised. This suggests a bottom-up, exploitation-based form of cognitive control in the aging brain, highlighting the interplay between abstraction, control, and perceptive-motor systems in preserving lexical production.

7.1.3‌ Link between Graphs and‌ artificial neural networks

Participants:‌‌ Sophie Achard, Lucrezia Carboni.

Joint work‌ with: Michel Dojat‌ from GIN, Univ. Grenoble‌‌ Alpes

Artificial neural networks are prone to being‌ fooled by carefully perturbed‌ inputs which cause an‌‌ egregious misclassification. These adversarial attacks have been the‌ focus of extensive research.‌ Likewise, there has been‌‌ an abundance of research in ways to detect‌ and defend against them.‌ In 17, we‌‌ introduce a novel approach of detection and interpretation‌ of adversarial attacks from‌ a graph perspective. For‌‌ an input image, we compute an associated sparse‌ graph using the layer-wise‌ relevance propagation algorithm (Bach‌‌ et al., 2015). Specifically, we only keep edges‌ of the neural network‌ with the highest relevance‌‌ values. Three quantities are then computed from the‌ graph which are then‌ compared against those computed‌‌ from the training set. The result of the‌ comparison is a classification‌ of the image as‌‌ benign or adversarial. To make the comparison, two‌ classification methods are introduced:‌ (1) an explicit formula‌‌ based on Wasserstein distance applied to the degree‌ of node and (2)‌ a logistic regression. Both‌‌ classification methods produce strong results which lead us‌ to believe that a‌ graph-based interpretation of adversarial‌‌ attacks is valuable.
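The pipeline above (sparsify by relevance, compute graph quantities, compare with a reference) can be sketched as follows. This is an illustration under stated assumptions, not the detection rule of the paper: the relevance scores are a hypothetical hand-written dictionary rather than the output of layer-wise relevance propagation, `top_k_edges`, `degree_sample` and `wasserstein_1d` are names introduced here, and only one graph quantity (node degree) is compared.

```python
from bisect import bisect_right


def top_k_edges(relevances, k):
    """Keep the k edges with the highest relevance (relevances: {(u, v): score})."""
    return sorted(relevances, key=relevances.get, reverse=True)[:k]


def degree_sample(edges):
    """Sorted list of node degrees in the graph induced by the given edges."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return sorted(counts.values())


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two empirical distributions on the line,
    computed as the integral of the absolute difference of their CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    grid = sorted(set(a_sorted) | set(b_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical relevance scores for the edges of a tiny network.
relevances = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.1, ("c", "d"): 0.7}
degrees = degree_sample(top_k_edges(relevances, k=3))
reference_degrees = [3, 3, 3, 3]  # hypothetical degrees seen on training images
distance = wasserstein_1d(degrees, reference_degrees)
```

A large distance to the reference degree distribution would then be taken as evidence of an adversarial input; the threshold used to make that decision is left unspecified here.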

7.1.4 Benchmark for graph inference‌

Participants: Sophie Achard,‌ Alice Chevaux, Ali‌‌ Fakhar.

Joint work with: Kevin Polisano,‌ CNRS and Irène Gannaz,‌ Grenoble-INP.

In a series of papers 30, 28, 29, we work on the generation of theoretical correlation matrices with specific sparsity patterns associated with graph structures. We present a novel approach based on convex optimization, offering greater flexibility than existing techniques, notably by controlling the mean of the entry distribution in the generated correlation matrices. This allows for the generation of correlation matrices that better represent realistic data and can be used to benchmark statistical methods for graph inference.

7.1.5 Graphs for coma patients

Participants: Sophie‌ Achard, Michel Dojat‌, Arturo Cabrera Vazquez‌‌.

Joint work with: Stein Silva, CHU‌ Toulouse.

During the first‌ year of Arturo's PhD,‌‌ we developed several approaches to characterize the brain‌ connectivity of coma patients.‌ The originality of the‌‌ work is to use multimodal data combining both‌ fMRI and PET TSPO‌ with new graph methods‌‌ to combine graphs from the two modalities. This‌ work was presented in‌ different conferences 64,‌‌ 63, 62

7.1.6 Biological neural network

Participants:‌ Julien Chevallier.

Joint‌ work with: Eva‌‌ Löcherbach from Paris 1,‌ Guilherme Ost from UFRJ.

The main objective is to estimate the connectivity parameter $p$ of a biological neural network based only on the observation of the action potentials of $N$ neurons over $T$ time units. In our main result, we show that $p$ can be estimated with rate $N^{-1/2} + N^{1/2}/T + (\log(T)/T)^{1/2}$ through an easy-to-compute estimator. Our analysis relies on a precise study of the spatio-temporal decay of correlations of the interacting chains. This is done through the study of coalescing random walks defining a backward regeneration representation of the system.

7.1.7‌ Community detection for binary graphical models in high‌ dimension

Participants: Julien Chevallier.

Joint work with‌: Guilherme Ost from UFRJ.

The main objective is to recover the two communities (one excitatory and one inhibitory) based on the observation of the action potentials of $N$ neurons over $T$ time units. More specifically, we propose a simple algorithm for which the probability of exact recovery converges to 1 as long as $(N/T^{1/2}) \log(NT) \to 0$ as $T$ and $N$ diverge. Interestingly, this simple algorithm does not require any prior knowledge of the other model parameters (e.g. the edge probability $p$).

7.1.8‌ Contrastive Normalizing Flows for anomaly detection in Engineering‌ Structures

Participants: Florence Forbes, Brice Marc.‌

Joint work with: Philippe Fouchier and Pierre‌ Charbonnier from CEREMA endsum, Strasbourg.

Among unsupervised anomaly‌ detection methods in the context of civil engineering‌ (CE) monitoring, those using Normalizing Flows (NF) have‌ reached state-of-the-art performance. Using only defect-free images, they‌ learn to detect anomalies as elements departing from‌ the healthy parts distribution. In this work, we‌ propose to increase the discriminative power of these‌ methods by leveraging the possibility to produce synthetic‌ anomalies. Starting with CFlow-AD, one of the best-performing‌ NF-based methods, we augment its loss with different‌ complementary learning objectives using anomalies generated by POISSON‌ interpolation. In this work 32, we demonstrate‌ the interest of these new augmented losses on‌ several CE-related datasets.

7.1.9 Coupled hidden Markov and‌ semi-Markov processes

Participants: Jean-Baptiste Durand.

Joint work‌ with: Hanna Bacave, Nathalie Peyrard, Sandra Plancade‌ and Régis Sabbadin from MIAT INRAE - Unité‌ de Mathématiques et Informatique Appliquées de Toulouse; Alain‌ Franc from Biogeco INRAE, Bordeaux.

The concept of multichain (H)SMM has not yet been rigorously formalized, even if a few models have been proposed in the HMM literature. We conducted a review of existing multichain HSMMs and proposed a sound formalization of two classes of models that extend standard and general semi-Markov models to the multichain setting. Then, we addressed the hidden framework and built various classes of multichain H(S)MMs, denoted M(H)SMMs, that generalize some MHMM structures. A generative definition based on hazard rates instead of probability distribution functions enabled us to account for flexible interactions between the dynamics of observed and hidden chains. Adapting these general classes into models for practical situations still raises challenges in terms of inference, but also in terms of parameterization. Indeed, the dimension of the functions (hazard rates and probability distribution functions) involved in the multichain distribution increases with the model richness. Details in 71, 68.


7.2‌ Latent variable modelling

7.2.1‌ Stochastic Majorization-Minimization with sample-average‌‌ approximation

Participants: Florence Forbes.

Joint work with‌: Hien Nguyen, School‌ of Computing, Engineering and‌‌ Mathematical Sciences, La Trobe Univ., Bundoora 3086, Victoria‌ Australia, and Institute of‌ Mathematics for Industry, Kyushu‌‌ Univ., Nishi Ward, Fukuoka 819-0395, Japan, Gersende Fort,‌ IMT and LAAS-CNRS, Université‌ de Toulouse, CNRS, Toulouse.‌‌

Many statistical inference and machine learning methods rely on the ability to optimize an expectation functional whose explicit form is intractable. The typical method for conducting such optimization is to approximate the expected value problem by a size-N sample average, often referred to as sample average approximation (SAA) or M-estimation. When the solution to the SAA problem cannot be obtained in closed form, the Majorization-Minimization (MM) algorithm framework constitutes a broad class of incremental optimization solutions, relying on the iterative construction of surrogates, known as majorizers, of the original problem. The ability to solve an SAA problem depends on the simultaneous availability of all N observations, which is difficult when N is large or data are observed as a stream. In this work 19, we propose a stochastic MM algorithm that solves the expected value problem via iterative SAA majorizer constructions using sequential subsets of data, which we call Sequential Sample Average Majorization-Minimization (SAM2). Compared to previous stochastic MM algorithm variants, our method permits an extended definition of majorizers, and does not rely on convexity assumptions, smoothness assumptions, or restrictions on functional classes for objectives and majorizers. We develop a theory of stochastic convergence for SAM2, made possible via the presentation of a novel double array uniform strong law of large numbers. Examples of SAM2 algorithms are given along with a numerical demonstration of SAM2 applied to quantile regression problems, in the regular and sparse parameter settings, including both convex and non-convex objective functions.
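To illustrate what a majorizer looks like in the quantile setting, here is a minimal sketch of a classical deterministic MM iteration for a single sample quantile. It is not the SAM2 algorithm: it uses all observations at every step, and the function name `mm_quantile` and the smoothing constant `eps` are assumptions of this sketch. The check loss $\rho_\tau(r) = |r|/2 + (\tau - 1/2)\,r$ is bounded above using $|r| \le r^2/(2c) + c/2$ with $c = \varepsilon + |r_k|$, which yields a quadratic surrogate with a closed-form minimizer.

```python
def mm_quantile(values, tau, n_iter=200, eps=1e-8):
    """Approximate the tau-quantile of a sample by majorization-minimization.

    At the current iterate theta_k, each |y_i - theta| is bounded above by
    a quadratic with curvature 1 / (eps + |y_i - theta_k|). Minimizing the
    resulting surrogate of the check loss gives a weighted-mean update.
    """
    n = len(values)
    theta = sum(values) / n
    for _ in range(n_iter):
        # Inverse absolute residuals act as weights; eps avoids division by zero.
        inv = [1.0 / (eps + abs(y - theta)) for y in values]
        numerator = sum(a * y for a, y in zip(inv, values)) + n * (2.0 * tau - 1.0)
        theta = numerator / sum(inv)
    return theta


median_estimate = mm_quantile([1.0, 2.0, 3.0, 4.0, 100.0], tau=0.5)
```

For `tau = 0.5` the update reduces to an iteratively reweighted mean, which moves from the sample mean towards the sample median. A stochastic variant would rebuild the surrogate from successive subsets of data instead of the full sample.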

7.2.2 Natural Variational Annealing‌ for Multimodal Optimization

Participants:‌ Tam Le Minh,‌‌ Florence Forbes, Julyan Arbel.

Joint work‌ with: Emtiyaz Khan‌ and Thomas Mollenhoff from‌‌ Riken, Tokyo, Japan

We introduce a new multimodal‌ optimization approach called Natural‌ Variational Annealing (NVA) that‌‌ combines the strengths of three foundational concepts to‌ simultaneously search for multiple‌ global and local modes‌‌ of black-box nonconvex objectives. First, it implements a‌ simultaneous search by using‌ variational posteriors, such as,‌‌ mixtures of Gaussians. Second, it applies annealing to‌ gradually trade off exploration‌ for exploitation. Finally, it‌‌ learns the variational search distribution using natural-gradient learning‌ where updates resemble well-known‌ and easy-to-implement algorithms. The‌‌ three concepts come together‌ in NVA giving rise to new algorithms and‌ also allowing us to incorporate "fitness shaping", a‌ core concept from evolutionary algorithms. We assess the‌ quality of search on simulations and compare them‌ to methods using gradient descent and evolution strategies.‌ We also provide an application to a real-world‌ inverse problem in planetary science. More details in‌ 59. An extension to the situations where‌ only samples are available can be found in‌ 58.

7.2.3 Scalable magnetic resonance fingerprinting: Incremental‌ inference of high dimensional elliptical mixtures from large‌ data volumes

Participants: Florence Forbes, Geoffroy Oudoumanessah‌.

Joint work with: Luc Meyer from‌ SED, Michel Dojat, Thomas Coudert, Thomas Christen from‌ Grenoble Institute of Neurosciences, Carole Lartizien from Creatis.‌

Magnetic Resonance Fingerprinting (MRF) is an emerging technology with the potential to revolutionize radiology and medical diagnostics. In comparison to traditional magnetic resonance imaging (MRI), MRF enables the rapid, simultaneous, non-invasive acquisition and reconstruction of multiple tissue parameters, paving the way for novel diagnostic techniques. In the original matching approach, reconstruction is based on the search for the best matches between in vivo acquired signals and a dictionary of high-dimensional simulated signals (fingerprints) with known tissue properties. A critical and limiting challenge is that the size of the simulated dictionary increases exponentially with the number of parameters, leading to an extremely costly subsequent matching. In this work, we propose to address this scalability issue by considering probabilistic mixtures of high-dimensional elliptical distributions, to learn more efficient dictionary representations. Mixture components are modelled as flexible elliptical shapes in low-dimensional subspaces. They are exploited to cluster similar signals and reduce their dimension locally, cluster-wise, to limit information loss. To estimate such a mixture model, we provide a new incremental algorithm capable of handling large numbers of signals, allowing us to go far beyond the hardware limitations encountered by standard implementations. We demonstrate, on simulated and real data, that our method effectively manages large volumes of MRF data while maintaining accuracy. It offers a more efficient solution for accurate tissue characterization and significantly reduces the computational burden, making the clinical application of MRF more practical and accessible. This work has been presented at the International Symposium on Biomedical Imaging (ISBI 2025) 33 and published in Statistics and Computing 60.
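The original matching step that motivates this work can be sketched in a few lines. The example assumes real-valued signals, a cosine-similarity criterion and a toy two-entry dictionary with hypothetical tissue parameters; the function name `best_match` is introduced here for illustration. The cost of the exhaustive loop grows linearly with the dictionary size, which itself grows exponentially with the number of parameters.

```python
import math


def _normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero signal")
    return [x / norm for x in vector]


def best_match(signal, dictionary):
    """Return (parameters, score) of the fingerprint most similar to the signal.

    dictionary is a list of (parameters, fingerprint) pairs; similarity is
    the inner product between normalized signals, so scaling is ignored.
    """
    unit_signal = _normalize(signal)
    best_params, best_score = None, -math.inf
    for params, fingerprint in dictionary:
        unit_fp = _normalize(fingerprint)
        score = sum(a * b for a, b in zip(unit_signal, unit_fp))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical dictionary: tissue parameters paired with simulated fingerprints.
toy_dictionary = [
    ({"T1": 800, "T2": 60}, [1.0, 0.5, 0.25]),
    ({"T1": 1200, "T2": 90}, [1.0, 0.8, 0.64]),
]
params, score = best_match([2.0, 1.0, 0.5], toy_dictionary)
```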

7.2.4 Assessing a dose-response relationship after‌ brain radiotherapy via Mixture of Regressions

Participants: Florence‌ Forbes.

Joint work with: Theo Sylvestre,‌ Sophie Ancelet from IRSN.

Brain radiotherapy (RT) is‌ one of the key tools in the treatment‌ of tumors of the central nervous system (CNS).‌ However, its potential toxicity to the CNS remains‌ one of the major research issues in radioprotection.‌ In particular, cognitive decline, which may significantly impair‌ the quality of life of long-term survivors, has‌ been reported in patients treated with RT for‌ a brain tumor. The intracerebral radiation-induced mechanisms that‌ could explain this cognitive decline are only partially understood. The EpiBrainRad project,‌ within which the doctoral‌ work of Theo Sylvestre‌‌ has been conducted, investigates the role that leukoencephalopathy‌ may play in these‌ mechanisms. It is based‌‌ on data from the EpiBrainRad cohort, which includes‌ patients treated with RT‌ for glioblastoma at Pitié-Salpêtrière‌‌ Hospital or at the Strasbourg Institute of Oncology.‌

The aim was to‌ demonstrate, if it exists,‌‌ and to estimate the association between the brain‌ dose and the spatio-temporal‌ progression of irreversible white‌‌ matter abnormalities characteristic of leukoencephalopathy, identified on MRI‌ as white matter hyperintensities‌ (WMH). It also seeks‌‌ to provide insights into the radiosensitivity of white‌ matter.

Embedded in the‌ ANR RADIO-AIDE project (itself‌‌ part of EpiBrainRad), this work relied primarily on‌ imaging data from a‌ sub-cohort of 50 patients‌‌ from the EpiBrainRad cohort. For each patient, a‌ dosimetric CT scan from‌ which a voxel-wise dose‌‌ map is extracted is available, along with a‌ longitudinal collection of MRIs‌ in which various brain‌‌ lesions are segmented.

Three main contributions were made:‌ 1) A preprocessing pipeline‌ for segmented MRIs is‌‌ proposed to make them suitable for estimating the‌ dose–response association of interest.‌ 2) Longitudinal intra-individual MRI‌‌ registration and inter-individual registration are performed to enable‌ a population-level voxel-wise analysis‌ on a common brain,‌‌ in the spirit of voxel-based studies. 3) An‌ algorithm is defined and‌ implemented to distinguish leukoencephalopathy‌‌ lesions (LL) from edema—both characterized on MRI by‌ WMH—and to correct for‌ brain deformations associated with‌‌ different lesions.

7.2.5 Massive analysis of multidimensional astrophysical‌ data by inverse regression‌ of physical models

Participants:‌‌ Florence Forbes.

Joint work with: Sylvain‌ Douté IPAG, Stan Borkowski‌ and Luc Meyer from‌‌ SED Grenoble

With the tremendous progress made in AI, data acquisition and processing are now possible at a much larger scale. In Earth and space (E&S) science, although wider and richer representations are desirable to effectively and quantitatively characterize information, we still struggle to turn them into real-world breakthroughs, partially due to data processing bottlenecks. Computationally efficient modeling and inference techniques have been developed in order to meet computing resource constraints, energy considerations and the inherent complexity of algorithms. However, most approaches are designed for batch data and thus have limitations in processing large amounts of data. It thus appears most timely to develop the theory and practice of a new form of learning that targets potentially heterogeneous remote sensing data that are large in both size and dimension, while providing quantitative and rigorous statements about the performance of the methods.

7.2.6 An‌ analysis of distributional reinforcement‌‌ learning with Gaussian mixtures

Participants: Florence Forbes,‌ Henrique Donancio, Mathis‌ Antonetti.

Distributional Reinforcement Learning (DRL) seeks to optimize risk-sensitive objectives by modeling the full return distribution rather than only its expectation. A key challenge is to choose a return distribution representation that allows (i) efficient estimation of risk measures, (ii) tractable optimization, and (iii) sufficient expressiveness. Gaussian mixtures (GM) provide a flexible and powerful representation for this purpose, yet they remain underexplored in DRL, with most existing methods relying on the $L_2$ norm as a tractable metric between GM. In this work 13, we conduct a theoretical and empirical study of alternative metrics for GM-based DRL. We show that the $L_2$ norm is not suitable and introduce two principled alternatives: a mixture-specific optimal transport distance (MW) and a maximum mean discrepancy (MMD) distance. For the MW metric, we establish convergence guarantees for a dynamic programming algorithm related to temporal-difference (TD) learning. Leveraging multivariate GM representations, we also highlight the potential of MW in multi-objective RL. Experimental results on selected Atari Learning Environment tasks illustrate the practical benefits of the proposed metrics, showing promising performance.
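The tractability of the $L_2$ norm between Gaussian mixtures comes from the closed form $\int \mathcal{N}(x; m_1, v_1)\,\mathcal{N}(x; m_2, v_2)\,dx = \mathcal{N}(m_1; m_2, v_1 + v_2)$. The sketch below implements the squared $L_2$ distance for univariate mixtures on that basis. The function names and the `(weight, mean, variance)` representation are assumptions of this sketch, and it covers neither the MW nor the MMD metric.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gm_inner(p, q):
    """L2 inner product of two mixtures given as lists of (weight, mean, variance)."""
    return sum(
        wp * wq * normal_pdf(mp, mq, vp + vq)
        for wp, mp, vp in p
        for wq, mq, vq in q
    )


def gm_l2_squared(p, q):
    """Squared L2 distance between two univariate Gaussian mixture densities."""
    return gm_inner(p, p) - 2.0 * gm_inner(p, q) + gm_inner(q, q)


base = [(1.0, 0.0, 1.0)]
near = [(1.0, 10.0, 1.0)]
far = [(1.0, 100.0, 1.0)]
d_near = gm_l2_squared(base, near)
d_far = gm_l2_squared(base, far)  # nearly identical to d_near
```

In this toy example the distance barely changes between a shift of 10 and a shift of 100, because the cross term vanishes once the components stop overlapping. This is a simple observation from the formula; it is not claimed to be the argument developed in the paper.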

7.2.7 Dynamic Learning Rate for Deep‌ Reinforcement Learning: A Bandit Approach

Participants: Florence Forbes‌, Henrique Donancio.

Joint work with:‌ Leah South, Queensland University of Technology, Brisbane Australia‌ and Antoine Barrier, Grenoble Institute of Neuroscience.

In‌ Deep Reinforcement Learning models trained using gradient-based techniques,‌ the choice of optimizer and its learning rate‌ are crucial to achieving good performance: higher learning‌ rates can prevent the model from learning effectively,‌ while lower ones might slow convergence. Additionally, due‌ to the non-stationarity of the objective function, the‌ best-performing learning rate can change over the training‌ steps. To adapt the learning rate, a standard‌ technique consists of using decay schedulers. However, these‌ schedulers assume that the model is progressively approaching‌ convergence, which may not always be true, leading‌ to delayed or premature adjustments. In this work,‌ we propose dynamic Learning Rate for deep Reinforcement‌ Learning (LRRL), a meta-learning approach that selects the‌ learning rate based on the agent's performance during‌ training. LRRL is based on a multi-armed bandit‌ algorithm, where each arm represents a different learning‌ rate, and the bandit feedback is provided by‌ the cumulative returns of the RL policy to‌ update the arms' probability distribution. Our empirical results‌ demonstrate that LRRL can substantially improve the performance‌ of deep RL algorithms.
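The bandit view of learning-rate selection can be illustrated with a small adversarial-bandit sketch. It assumes an Exp3-style update, which is one possible choice and is not confirmed by the text above as the algorithm used in LRRL. The toy `reward_fn` stands in for a normalized cumulative return in $[0, 1]$, and all names and numerical values are hypothetical.

```python
import math
import random


def arm_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration of rate gamma."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Exp3-style bandit: returns the final probability of selecting each arm."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    for _ in range(n_rounds):
        probs = arm_probabilities(weights, gamma)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)          # assumed to lie in [0, 1]
        estimate = reward / probs[arm]   # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        top = max(weights)               # rescale to avoid overflow
        weights = [w / top for w in weights]
    return arm_probabilities(weights, gamma)


learning_rates = [1e-1, 1e-2, 1e-3]  # one arm per candidate learning rate
toy_returns = [0.2, 0.9, 0.4]        # hypothetical normalized returns per arm
final_probs = run_exp3(lambda arm: toy_returns[arm], n_arms=3, n_rounds=2000)
```

In an actual training loop the reward of the selected arm would be observed only after running the agent with the corresponding learning rate for some steps, and it could drift over time, which is the non-stationarity that motivates an adaptive scheme.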

7.2.8 Bandits and sequential‌ learning

Participants: Julyan Arbel, Julien Zhou.‌

Joint work with: Pierre Gaillard (Inria Thoth),‌ Thibaud Rahier (Criteo AI Lab).

Bandit algorithms address‌ the exploration-exploitation trade-off by balancing learning about actions‌ and maximizing cumulative rewards, with applications in areas‌ like online advertising, recommendation systems, and A/B testing.‌ We improve existing regret bounds in two settings:‌ stochastic combinatorial semi-bandits, and online unconstrained submodular maximization‌ with stochastic bandit feedback 35.

7.2.9 Optimal‌ sub-Gaussian variance proxy

Participants: Julyan Arbel.

Joint‌ work with: Mathias Barreto (National Research University‌ Higher School of Economics, Moscow), Olivier Marchal (Institut‌ Camille Jordan, Lyon).

In 15, we establish‌ the optimal sub-Gaussian variance proxy for truncated Gaussian‌ and truncated exponential random variables. The proofs rely‌ on first characterizing the optimal variance proxy as‌ the unique solution to a set of two‌ equations and then observing that for these two truncated distributions, one may‌ find explicit solutions to‌ this set of equations.‌‌ Moreover, we establish the conditions under which the‌ optimal variance proxy coincides‌ with the variance, thereby‌‌ characterizing the strict sub-Gaussianity of the truncated random‌ variables. Specifically, we demonstrate‌ that truncated Gaussian variables‌‌ exhibit strict sub-Gaussian behavior if and only if‌ they are symmetric, meaning‌ their truncation is symmetric‌‌ with respect to the mean. Conversely, truncated exponential‌ variables are shown to‌ never exhibit strict sub-Gaussian‌‌ properties. These findings contribute to the understanding of‌ these prevalent probability distributions‌ in statistics and machine‌‌ learning, providing a valuable foundation for improved and‌ optimal modeling and decision-making‌ processes.
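For reference, the standard definitions behind these statements can be written as follows, using notation introduced here ($\mu$ for the mean, $\sigma^2_{\mathrm{opt}}$ for the optimal proxy). A random variable $X$ is sub-Gaussian with variance proxy $\sigma^2$ when its centered moment generating function is dominated by that of a Gaussian with variance $\sigma^2$.

```latex
% Sub-Gaussianity with variance proxy \sigma^2 (\mu = \mathbb{E}[X]):
\mathbb{E}\left[ e^{\lambda (X - \mu)} \right] \le e^{\lambda^2 \sigma^2 / 2}
\quad \text{for all } \lambda \in \mathbb{R}.

% Optimal variance proxy: the smallest admissible \sigma^2.
\sigma^2_{\mathrm{opt}}(X)
  = \sup_{\lambda \neq 0} \frac{2}{\lambda^2}
    \log \mathbb{E}\left[ e^{\lambda (X - \mu)} \right].

% The variance is always a lower bound; strict sub-Gaussianity is equality.
\operatorname{Var}(X) \le \sigma^2_{\mathrm{opt}}(X),
\qquad
X \text{ strictly sub-Gaussian}
  \iff \sigma^2_{\mathrm{opt}}(X) = \operatorname{Var}(X).
```

The lower bound follows from a second-order expansion of both sides of the first inequality around $\lambda = 0$.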

7.2.10 Mixed hidden‌‌ semi-Markov processes

Participants: Jean-Baptiste Durand.

Joint work‌ with: Nathalie Peyrard,‌ Sandra Plancade, Marie-Josée Cros,‌‌ Ronan Trépos and Mathieu Valdeyron from MIAT INRAE‌ - Unité de Mathématiques‌ et Informatique Appliquées de‌‌ Toulouse; Alain Franc from Biogeco INRAE, Bordeaux; Corentin‌ Lothodé, CNRS, Angers; Nicolas‌ Vergne and Caroline Bérard‌‌ from Université de Rouen Normandie; Irene Vosti from‌ Université de Lorraine, Metz.‌

Parameter estimation in hidden semi-Markov processes is frequently addressed by the EM algorithm or Newton iterative algorithms. These rely on the classical forward-backward recursion. When mixed effects are incorporated in model parameters (emission distributions, transition probabilities and sojourn time distributions), integration of the forward-backward formulas has to be performed, leading to intractable algorithms. As a consequence, further approximations are required, for example Monte-Carlo EM, Monte-Carlo Newton or variational EM. We produced a review of the state of the art of available methods used in hidden Markov models (HMMs) and hidden semi-Markov models (HSMMs), with a detailed account of the restrictions associated with each algorithm (for example: fixed effects only, random effects in emission distributions only, etc.). We also provided a catalogue of available Python and R software, also covering plain HSMMs and multichain HMMs (see also Section 7.1.9). Finally, a new MCEM algorithm was developed to address the case of HSMMs with mixed effects in all model parameters (emission distributions, transition probabilities and sojourn time distributions), which has never been addressed before. Alternatives are currently being studied in M. Valdeyron's doctoral work. Details in 69, 70, 72.

7.3 Bayesian modelling‌

7.3.1 Convergence of projected‌‌ stochastic natural gradient variational inference for various step‌ size and sample or‌ batch size schedules

Participants:‌‌ Florence Forbes, Thomas Guilmeau.

Joint work‌ with: Hadrien Hendrickx‌ from THOTH team.

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when gradients are either intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form $\mathcal{O}(1/T^{\rho})$, possibly with $\rho \geq 1$, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints. More details can be found in the paper accepted at AISTATS 2026.

7.3.2 Concentration results for approximate‌ Bayesian computation without identifiability

Participants: Florence Forbes,‌ Julyan Arbel.

Joint work with: Hien‌ Nguyen and Trung Tin Nguyen, University of Queensland,‌ Brisbane Australia.

We study the large-sample behavior of approximate Bayesian computation (ABC) posterior measures in situations where the data-generating process depends on unidentifiable parameters. In particular, we establish the concentration of posterior measures on sets of arbitrarily small measure that contain the equivalence set of the data-generating parameter, when the sample size tends to infinity. Our theory also makes weak assumptions regarding the measurement of discrepancy between the data set and simulations. In particular, it does not require the use of summary statistics and is applicable to a broad class of kernelized ABC algorithms. We provide useful illustrations and demonstrations of our theory in practice, and offer a comprehensive assessment of how our findings complement other results in the literature.

7.3.3 Diagnosing convergence of‌ Markov chain Monte Carlo

Participants: Julyan Arbel,‌ Stephane Girard.

Joint work with: A. Dutfoy‌ (EDF R&D) and T. Moins (Ecole Nationale des‌ Chartes, PSL).

Diagnosing convergence of Markov chain Monte Carlo (MCMC) is crucial in Bayesian analysis. Among the most popular methods, the potential scale reduction factor (commonly named $\hat{R}$) is an indicator that monitors the convergence of output chains to a stationary distribution, based on a comparison of the between- and within-variance of the chains. Several improvements have been suggested since its introduction in the 1990s. In the PhD work of Théo Moins, we analyse some properties of the theoretical value $R$ associated with $\hat{R}$ in the case of a localized version that focuses on quantiles of the distribution. This leads to a new indicator 23, which is shown both to localize MCMC convergence in different quantiles of the distribution and to handle some convergence issues not detected by other versions of $\hat{R}$.
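
The between/within comparison can be sketched as follows. The function `rhat` implements the classical potential scale reduction factor. The functions `local_rhat` and `rhat_infinity` are an assumed reading of the localization described above, in which $\hat{R}$ is applied to the indicators of the draws falling below a level $x$ and then maximized over a grid of levels; the names, the grid and this construction are illustrative and are not taken from 23.

```python
import numpy as np

def rhat(chains):
    """Classical potential scale reduction factor.

    chains : array of shape (m, n), i.e. m chains of n draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_plus / W)

def local_rhat(chains, x):
    """R-hat applied to the indicators 1{draw <= x} (assumed localization)."""
    return rhat(np.asarray(chains) <= x)

def rhat_infinity(chains, probs=np.linspace(0.05, 0.95, 19)):
    """Largest local R-hat over a grid of pooled quantiles (illustrative grid)."""
    grid = np.quantile(np.asarray(chains), probs)
    return max(local_rhat(chains, x) for x in grid)
```

As a usage example, take four chains with the same mean where one chain has a much larger variance: the chain means agree, so the classical value stays close to 1, whereas the local version departs from 1 because the chains disagree on tail proportions.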

7.3.4 Bayesian deep learning

Participants: Julyan Arbel,‌ Pierre Wolinski.

25 studies feature propagation at initialization in neural networks, which lies at the root of numerous initialization designs. An assumption very commonly made in the field states that the pre-activations are Gaussian. Although this convenient Gaussian hypothesis can be justified when the number of neurons per layer tends to infinity, it is challenged by both theoretical and experimental works for finite-width neural networks. The major contribution of this work is to construct a family of pairs of activation functions and initialization distributions that ensure that the pre-activations remain Gaussian throughout the network's depth, even in narrow neural networks. In the process, we discover a set of constraints that a neural network should fulfill to ensure Gaussian pre-activations. Additionally, we provide a critical review of the claims of the Edge of Chaos line of work and build an exact Edge of Chaos analysis. We also propose a unified view on pre-activation propagation, encompassing the framework of several well-known initialization procedures. Finally, our work provides a principled framework for answering the much-debated question: is it desirable to initialize the training of a neural network whose pre-activations are ensured to be Gaussian?

7.3.5 Bayesian Experimental Design via Contrastive Diffusions

Participants: Florence Forbes, Jacopo‌ Iollo.

Joint work‌ with: Pierre Alliez,‌‌ Inria Titane and Christophe Heinkele, Cerema Strasbourg.

Bayesian Optimal Experimental Design (BOED) is a powerful tool to reduce the cost of running a sequence of experiments. When based on the Expected Information Gain (EIG), design optimization corresponds to the maximization of some intractable expected contrast between prior and posterior distributions. Scaling this maximization to high-dimensional and complex settings has been an issue due to the inherent computational complexity of BOED. In this work, we introduce a pooled posterior distribution with cost-effective sampling properties and provide tractable access to the EIG contrast maximization via a new EIG gradient expression. Diffusion-based samplers are used to compute the dynamics of the pooled posterior and ideas from bi-level optimization are leveraged to derive an efficient joint sampling-optimization loop. The resulting efficiency gain makes it possible to extend BOED to the well-tested generative capabilities of diffusion models. By incorporating generative models into the BOED framework, we expand its scope and its use in scenarios that were previously impractical. Numerical experiments and comparison with state-of-the-art methods show the potential of the approach. This work has been accepted at ICLR 2025 31.
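
To illustrate why the EIG is costly to evaluate, the sketch below computes it by plain nested Monte Carlo on a toy model. This is a standard baseline and not the pooled-posterior or diffusion-based method of 31. The model is an assumption made for illustration: $\theta \sim \mathcal{N}(0, 1)$ and $y = \xi \theta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, for which the EIG is $\frac{1}{2}\log(1 + \xi^2/\sigma^2)$ in closed form. The names `eig_nested_mc` and `eig_exact` are hypothetical.

```python
import numpy as np

def eig_nested_mc(xi, sigma=1.0, n_outer=1000, n_inner=1000, seed=0):
    """Nested Monte Carlo estimate of the EIG for the toy linear-Gaussian model."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_outer)                    # theta_n ~ prior
    y = xi * theta + sigma * rng.standard_normal(n_outer)   # y_n ~ p(y | theta_n, xi)
    theta_in = rng.standard_normal(n_inner)                 # fresh prior draws for the evidence

    def log_lik(y, theta):
        # Gaussian log-density of y given theta and the design xi
        return -0.5 * ((y - xi * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

    log_cond = log_lik(y, theta)                            # log p(y_n | theta_n, xi)
    ll = log_lik(y[:, None], theta_in[None, :])             # shape (n_outer, n_inner)
    mx = ll.max(axis=1, keepdims=True)
    # log of the inner average of likelihoods, computed stably (log-mean-exp)
    log_evid = mx[:, 0] + np.log(np.exp(ll - mx).mean(axis=1))
    return float(np.mean(log_cond - log_evid))

def eig_exact(xi, sigma=1.0):
    """Closed-form EIG of the toy model, used only as a check."""
    return 0.5 * np.log1p((xi / sigma) ** 2)
```

Each outer sample requires its own inner average to approximate the evidence, so the cost grows as the product of the two sample sizes, which is the kind of burden that gradient-based and sampling-based reformulations aim to avoid.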

7.3.6 Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design

Participants: Florence Forbes, Jacopo Iollo‌, Geoffroy Oudoumanessah,‌ Michel Dojat.

Joint‌‌ work with: Carole Lartizien, Creatis Lyon.

A key challenge in maximizing the benefits of Magnetic Resonance Imaging (MRI) in clinical settings is to accelerate acquisition times without significantly degrading image quality. This objective requires a balance between under-sampling the raw k-space measurements for faster acquisitions and gathering sufficient raw information for high-fidelity image reconstruction and analysis tasks. To achieve this balance, we propose to use sequential Bayesian experimental design (BED) to provide an adaptive and task-dependent selection of the most informative measurements. Measurements are sequentially augmented with new samples selected to maximize information gain on a posterior distribution over target images. Selection is performed via a gradient-based optimization of a design parameter that defines a subsampling pattern. In this work, we introduce a new active BED procedure that leverages diffusion-based generative models to handle the high dimensionality of the images and employs stochastic optimization to select among a variety of patterns while meeting the acquisition process constraints and budget. In doing so, we show how our setting can optimize not only standard image reconstruction but also any associated image analysis task. The versatility and performance of our approach are demonstrated on several MRI acquisitions.

7.3.7 Simulation-based‌ inference using score-diffusion: algorithm and theoretical analysis

Participants:‌ Pedro Rodrigues, Julyan Arbel, Julia Linhart‌, Camille Touron.

Joint work with: Gabriel Cardoso from École des Mines de Paris, Sylvain Le Corff from Sorbonne Université and Alexandre Gramfort from Meta.

Simulation-based inference (SBI) estimates‌ parameters of complex non-linear models with intractable likelihoods‌ by training generative models on simulated data to‌ approximate the posterior linking inputs to observations.

In 42, we study the compositional score produced by the GAUSS algorithm of 73 and establish an upper bound on its mean squared error in terms of both the individual score errors and the number of observations. We illustrate our theoretical findings on a Gaussian example, where all analytical expressions can be derived in closed form.

7.3.8 Conformal prediction for simulation-based inference

Participants:‌ Pedro Rodrigues, Luben Miguel Cruz Cabezas.‌

Joint work with: Rafael Izbicki from UFSCar, Brazil.

Experimental scientists increasingly rely on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to under-cover true parameters. We develop CP4SBI, a model-agnostic conformal calibration framework that constructs credible sets with local Bayesian coverage. Our two proposed variants, namely local calibration via regression trees and CDF-based calibration, enable finite-sample local coverage guarantees for any scoring function, including HPD, symmetric, and quantile-based regions. Experiments on widely used SBI benchmarks demonstrate that our approach improves the quality of uncertainty quantification for neural posterior estimators using both normalizing flows and score-diffusion modeling 47.

7.3.9 Simulation-based inference‌ under model misspecification

Participants: Pedro Rodrigues, Florence‌ Forbes, Pierre-Louis Ruhlmann.

Joint work with‌: Michael Arbel from Inria (THOTH team).

Simulation-based‌ inference (SBI) is transforming experimental sciences by enabling‌ parameter estimation in complex non-linear models from simulated‌ data. A persistent challenge, however, is model misspecification:‌ simulators are only approximations of reality, and mismatches‌ between simulated and real data can yield biased‌ or overconfident posteriors. We address this issue by‌ introducing Flow Matching Corrected Posterior Estimation (FMCPE), a‌ framework that leverages the flow matching paradigm to‌ refine simulation-trained posterior estimators using a small set‌ of real calibration samples. Our approach proceeds in‌ two stages: first, a posterior approximator is trained‌ on abundant simulated data; second, flow matching transports its predictions toward the‌ true posterior supported by‌ real observations, without requiring‌‌ explicit knowledge of the misspecification. This design enables‌ FMCPE to combine the‌ scalability of SBI with‌‌ robustness to distributional shift. Across synthetic benchmarks and‌ real-world datasets, we show‌ that our proposal consistently‌‌ mitigates the effects of misspecification, delivering improved inference‌ accuracy and uncertainty calibration‌ compared to standard SBI‌‌ baselines, while remaining computationally efficient 61.

7.3.10‌ Simulation-based inference applied to‌ biology

Participants: Pedro Rodrigues‌‌, Julyan Arbel, Eloise Touron.

Joint‌ work with: Michael‌ Arbel from Inria (THOTH‌‌ team).

Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding could lead to malfunctions and, over time, diseases. For eukaryotes, centromeres are essential for proper chromosome segregation and folding. Despite extensive research using de novo sequencing of genomes and annotation analysis, centromere locations in yeasts remain difficult to infer and are still unknown in most species. Recently, genome-wide chromosome conformation capture coupled with next-generation sequencing (Hi-C) has become one of the leading methods to investigate chromosome structures. Some recent studies have used Hi-C data to give a point estimate of each centromere, but those approaches rely heavily on a good pre-localization. Here, we present a novel approach that infers in a stochastic manner the locations of all centromeres in budding yeast based on both the experimental Hi-C map and simulated contact maps 34.

7.3.11‌ Tutorial guide to simulation-based‌ inference

Participants: Pedro Rodrigues‌‌.

Joint work with: Thomas Moreau from‌ Inria (MIND team) and‌ several colleagues from Tuebingen‌‌ University.

In this tutorial, we provide a practical‌ guide for practitioners aiming‌ to apply SBI methods.‌‌ We outline a structured SBI workflow and offer‌ practical guidelines and diagnostic‌ tools for every stage‌‌ of the process – from setting up the‌ simulator and prior, choosing‌ and training inference networks,‌‌ to performing inference and validating the results. We‌ illustrate these steps through‌ examples from astrophysics, psychophysics,‌‌ and neuroscience. This tutorial empowers researchers to apply‌ state-of-the-art SBI methods, facilitating‌ efficient parameter inference for‌‌ scientific discovery 50 and 16.

7.4 Modelling‌ and quantifying extreme risk‌

7.4.1 Extreme events and‌‌ neural networks

Participants: Stephane Girard.

Joint work with: M. Allouche (Kaiko) and E. Gobet (CMAP, Ecole Polytechnique).

Dealing with extreme values is a major challenge in probabilistic modeling, of great importance in various application domains such as economics, engineering and life sciences. In the context of Generative Modeling, it is known that models based on transformations of light-tailed distributions, such as Generative Adversarial Networks (GANs), fail to capture the behaviour in the tails. In particular, these models are not able to capture the dependence in extreme regions. In 20, we study a modified version of the GAN algorithm, where the input is a heavy-tailed distribution (and we call it HTGAN). Recalling the stable tail dependence function (stdf), a tool from extreme-value theory that measures the dependence structure in extreme regions, we provide a bound on the approximation of the stdf of the target with the output of an HTGAN. This bound scales as $N^{-1/(d-1)}$, where $N$ is the dimension of the input noise of the network and $d$ is the dimension of the data of interest. This suggests increasing the dimension of the latent noise to gain precision in the estimation of dependence. We perform experiments, comparing HTGAN with a classical light-tailed GAN (LTGAN) on both synthetic and real datasets exhibiting heavy-tailed characteristics. These experiments confirm our theoretical findings: first, the HTGAN algorithm is better at reproducing dependence in extremes than LTGAN. Second, we show that the quality of approximation gets better as the dimension of the latent noise increases.

In 43, we investigate the use of generative methods based on neural networks to simulate extreme events. Although very popular, these methods are mainly invoked in empirical works. Therefore, providing theoretical guidelines for using such models in an extreme-value context is of prime importance. To this end, we propose an overview of the most recent generative methods dedicated to extremes, giving some theoretical and practical tips on their tail behaviour thanks to both extreme-value and copula tools. Additionally, 11 devises a novel neural-inspired approach for simulating multivariate extremes. Specifically, we propose a GAN-based generative model for sampling multivariate data exceeding large thresholds, giving rise to what we refer to as the ExceedGAN algorithm. Our approach is based on approximating marginal log-quantile functions using feedforward neural networks with eLU activation functions specifically introduced for bias correction. An error bound is provided on the margins, assuming a $J$th-order condition from extreme-value theory. The numerical experiments illustrate that ExceedGAN outperforms competitors, both on synthetic and real-world data sets. This work is submitted for publication.

In 12, we propose new parametrizations for neural networks in order to estimate Expected Shortfall and Conditional Tail Moments in heavy-tailed settings. The proposed neural network estimators feature a bias correction based on an extension of the usual second-order condition to an arbitrary order. The convergence rate of the uniform error between extreme log-quantiles and their neural network approximation is established. The finite sample performances of the non-conditional neural network estimator are compared to other bias-reduced extreme-value competitors on simulated data. It is shown that our method outperforms them in difficult heavy-tailed situations where other estimators almost all fail.

7.4.2 Estimation of extreme risk measures‌

Participants: Jonathan El Methni, Antoine Franchini,‌ Stephane Girard.

Joint work with: M. Allouche‌ (Kaiko) and A. Dutfoy (EDF).

Most extrapolation methods dedicated to the estimation of extreme risk measures rely on the approximation of the excesses distribution above a high threshold by a Generalized Pareto Distribution (GPD). In 51, we propose an alternative to the GPD, called the Refined Pareto Distribution (RPD), which allows for a second-order approximation of the excesses distribution. The parameters of the RPD are estimated using an Approximate Bayesian Computation (ABC) method, and reduced-bias estimators of extreme risk measures are then derived together with the associated credible intervals. The ABC estimator demonstrates good performance over a wide range of heavy-tailed distributions. Its usefulness is also illustrated on two data sets of insurance claims. The results are submitted for publication.

The celebrated Weissman estimator provides a simple way to compute extreme quantiles, lying outside the observation range, from heavy-tailed distributions. Asymptotic confidence intervals can also be built based on its asymptotic normality, but they may suffer from poor coverage properties in practice. In the context of the PhD thesis of Antoine Franchini, we propose several higher-order approximations of the asymptotic distribution of the Weissman estimator, together with a data-driven procedure to automatically select the most appropriate one. The usefulness of the associated adaptive confidence interval is illustrated on an intensive simulation study as well as on two climatic and financial data sets. The results are submitted for publication 54.
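
For readers unfamiliar with it, the Weissman estimator in its usual textbook form can be sketched as below, with the tail index estimated by the Hill estimator. This is only the classical point estimator; the higher-order approximations and the adaptive confidence intervals of 54 are not reproduced. The function names and the choice of `k` (the number of upper order statistics used) are illustrative assumptions.

```python
import numpy as np

def hill(x, k):
    """Hill estimator of the tail index from the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    # mean log-excess of the top k values over the order statistic X_{n-k,n}
    return float(np.mean(np.log(xs[n - k:]) - np.log(xs[n - k - 1])))

def weissman_quantile(x, k, p):
    """Weissman estimator of the quantile exceeded with probability p.

    Extrapolates from the intermediate order statistic X_{n-k,n}
    using the estimated tail index; p may be smaller than 1/n.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    gamma = hill(x, k)
    return float(xs[n - k - 1] * (k / (n * p)) ** gamma)
```

On a Pareto sample with tail index 0.5 and 5000 observations, for instance, the quantile exceeded with probability $10^{-4}$ equals 100 and lies beyond the range that the empirical quantile can reach, which is the situation the extrapolation is designed for.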

In 18, we address‌ the estimation of extreme‌ quantiles of Weibull tail-distributions.‌‌ Since such quantiles are asymptotically larger than the‌ sample maximum, their estimation‌ requires extrapolation methods. In‌‌ the case of Weibull tail-distributions, classical extreme-value estimators‌ are numerically outperformed by‌ estimators dedicated to this‌‌ set of light-tailed distributions. The latter estimators of‌ extreme quantiles are based‌ on two key quantities:‌‌ an order statistic to estimate an intermediate quantile‌ and an estimator of‌ the Weibull tail-coefficient used‌‌ to extrapolate. The common practice is to select‌ the same intermediate sequence‌ for both estimators. We‌‌ show how an adapted choice of two different‌ intermediate sequences leads to‌ a reduction of the‌‌ asymptotic bias associated with the resulting refined estimator.‌ This analysis is supported‌ by an asymptotic normality‌‌ result associated with the refined estimator. A data-driven‌ method is introduced for‌ the practical selection of‌‌ the intermediate sequences and our approach is compared‌ to three estimators of‌ extreme quantiles dedicated to‌‌ Weibull tail-distributions on simulated data. An illustration on‌ a real data set‌ of daily wind measures‌‌ is also provided.

7.4.3 Estimation of extreme inequality‌ measures

Participants: Jonathan El‌ Methni, Stephane Girard‌‌, Pearl Laveur.

Inequality indices provide a quantitative framework for measuring disparity within a distribution, particularly in wealth or income. First, we introduce a unified family of inequality indices that encompasses several classical ones, including Gini, Atkinson, extended Gini, Bonferroni and Mehran indices. Second, we prove, under appropriate conditions, that indices within this family satisfy six axioms widely accepted in the literature. Third, two general estimators are proposed for this class and their asymptotic normality is established under mild assumptions. Moreover, it has been observed that the Gini index is robust to changes in the highest incomes. Leveraging extreme-value theory, we prove a feature shared by the entire family: non-discrimination of tail behaviours in terms of maximum domains of attraction. Notably, this property also holds for several alternatives to the Gini index, including those previously cited. These results are illustrated both on simulated data and on a real income data set. The results are submitted for publication 52.
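
As a concrete member of this family, the classical sample Gini index can be computed from the ordered observations as sketched below. This is the usual plug-in formula and not one of the two general estimators proposed in 52; the function name `gini` is an illustrative assumption.

```python
import numpy as np

def gini(x):
    """Sample Gini index of non-negative values with a positive total."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    ranks = np.arange(1, n + 1)
    # rank-weighted sum of the ordered values, normalized by the total
    return float(2.0 * np.sum(ranks * xs) / (n * xs.sum()) - (n + 1) / n)
```

For example, `gini([1, 1, 1, 1])` returns 0.0 (perfect equality) and `gini([0, 0, 0, 1])` returns 0.75, the largest value attainable with four observations. Multiplying all incomes by a constant leaves the index unchanged.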

7.4.4 Changepoint identification in‌ heavy-tailed distributions

Participants: Stephane Girard.

Joint work with: T. Opitz (INRAE Avignon), A. Usseglio-Carleve (Univ. Avignon) and C. Yan (Univ. Michigan).

The problem‌ of detecting the existence of a changepoint in‌ a data sequence and of identifying its position‌ is challenging when the focus is on extreme‌ events and the distribution of data is heavy-tailed.‌ In this setting, we propose a robust semi-parametric‌ approach to changepoint identification that does not require‌ the likelihood function. The changepoint is estimated as‌ the position of the maximum of a statistic‌ presented in 53 and inspired by classical ANOVA‌ to contrast the tail behavior of data to‌ the left and right of all changepoint candidates.‌ It is shown that the estimator is asymptotically‌ consistent under mild assumptions. In numerical experiments, the‌ novel method shows reliable finite-sample behavior for various‌ simulation settings and is very competitive in comparison‌ to alternative changepoint identification approaches from the literature,‌ especially for small sample sizes. Finally, the utility‌ of the method is highlighted by identifying interpretable‌ changepoints in three real-data applications: very large motor‌ insurance claim amounts for a French administrative region‌ with age as covariate; daily Bitcoin cryptocurrency price‌ data (January 2018 – February 2025) and daily‌ log-returns of stocks of the Boeing company (March‌ 2015 – March 2025) both with time as‌ covariate. This work is submitted for publication 55‌.

7.4.5 Dimension reduction for extremes

Participants: Stephane‌ Girard.

Joint work with: C. Pakzad (Univ.‌ Paris-Nanterre).

In the context of the PhD thesis of Meryem Bousebata, we proposed a new approach, called Extreme-PLS (EPLS), for dimension reduction in regression and adapted to distribution tails. The objective is to find linear combinations of predictors that best explain the extreme values of the response variable in a non-linear inverse regression model. In 56, we extend the approach to more realistic data settings where both serial correlation and missingness occur. Specifically, we consider a single-index inverse regression model under heavy-tailed conditions and introduce a Missing-at-Random (MAR) mechanism acting on the covariates, whose probability depends on the extremeness of the response. The asymptotic behavior of the proposed estimator is established within an $α$-mixing framework, leading to consistency results under regularly varying tails. Extensive Monte-Carlo experiments covering eleven dependence schemes (including ARMA, GARCH, and nonlinear ESTAR processes) demonstrate that the method performs robustly across a wide range of heavy-tailed and dependent scenarios, even when substantial portions of data are missing. A real-world application to environmental data further confirms the method's capacity to recover meaningful tail directions. The results are submitted for publication.

Finally, the EPLS method‌‌ is extended to the functional framework in 57‌, to tackle the‌ case of functional covariates.‌‌ The results are submitted for publication.

8 Bilateral‌ contracts and grants with‌ industry

8.1 Bilateral contracts‌‌ with industry

Participants: Stephane Girard.

Contract with‌ EDF (2024-2027).
Stephane Girard is the advisor of the PhD thesis of Antoine Franchini funded by EDF. The goal is to investigate sensitivity analysis and extrapolation limits in extreme-value theory. The financial support for Statify is 50K euros.

9‌‌ Partnerships and cooperations

9.1 International initiatives

9.1.1 Inria‌ associate team not involved‌ in an IIL or‌‌ an international program

WOMBAT

Title:
Variance-reduced Optimization Methods‌ and Bayesian Approximation Techniques‌ for scalable inference
Duration:‌‌
2023 ->
Coordinator:
Hien Duy Nguyen (h.nguyen5@latrobe.edu.au)
Partners:‌
- La Trobe University, Melbourne (Australia)
Inria contact:
Florence‌‌ Forbes
Summary:
Many inferential tools, such as machine‌ learning algorithms and statistical‌ models, require the estimation‌‌ of model parameters, structures, quantities, and properties, from‌ data. In practice, it‌ is common that model‌‌ characterizations are available through high-fidelity simulations of the‌ data generating processes, but‌ only through “black-boxes” that‌‌ are poorly suited for optimization under uncertainty or‌ conventional statistical inference procedures.‌ The main statistical challenge‌‌ is that model likelihoods are typically intractable or‌ unavailable in closed form.‌ Approaches suited for these‌‌ scenarios are typically referred to as likelihood-free or‌ simulation-based inference (SBI) methods,‌ and have received a‌‌ great deal of attention in recent years, with‌ momentum coming from mixing‌ of ideas from the‌‌ interface between statistics and machine learning. However, most‌ SBI methods scale poorly‌ when the number of‌‌ observations is too large, which makes them unsuitable‌ for modern data, which‌ are often acquired in‌‌ real time, in an incremental nature, and are‌ often available in large‌ volume. Computation of inferential‌‌ quantities in an incremental manner may be forcibly‌ imposed by the nature‌ of data acquisition (e.g.‌‌ streaming and sequential data) but may also be‌ seen as a solution‌ to handle larger data‌‌ volumes in a more resource friendly way, with‌ respect to memory, energy,‌ and time consumption. To‌‌ produce feasible and practical online algorithms for streaming‌ data and complex models,‌ we propose to study‌‌ the family of stochastic approximation (SA) algorithms. The‌ overall goal of the‌ project is to combine‌‌ recent ideas from the SBI and SA literature,‌ to propose efficient methods‌ for handling complex inferential‌‌ problems. We shall demonstrate our approaches via applications‌ to problems in challenging‌ domains, such as Magnetic‌‌ Resonance Imaging (MRI) or road network management as‌ initial targets. 
In doing so, we hope to achieve both breakthroughs in applied methodology and the development of new SBI and SA techniques with widespread applicability.

9.2 International research visitors

9.2.1 Visits of‌ international scientists

Other international‌ visits to the team‌‌

Darren Wraith

Status:
Associate Professor
Institution of origin:‌
QUT
Country:
Australia
Dates:‌
mid-August 2025 - mid-February 2026
Context of the visit:‌
Beyond Gaussian mixtures for inverse problems and simulation-based‌ inference.
Mobility program/type of mobility:
sabbatical research stay‌

Adam Bretherton

Status:
PhD student
Institution of origin:‌
QUT, Brisbane
Country:
Australia
Dates:
mid-May - mid-June‌ 2025
Context of the visit:
Simulation-based inference and‌ Bayesian models
Mobility program/type of mobility:
research stay‌ in the context of the Associate Team Wombat‌

9.2.2 Visits to international teams

Research stays abroad‌

Razan Mhanna

Visited institution:
Brigham Young University (BYU)‌
Country:
USA
Dates:
May-June 2025
Context of the‌ visit:
Graph kernels for brain network analysis
Mobility‌ program/type of mobility:
research stay funded by "bourse‌ IDEX aide à la mobilité internationale sortantes des‌ doctorant.e.s de l’Université Grenoble Alpes"

9.3 National initiatives‌

Participants: Jonathan El Methni, Jean-Baptiste Durand,‌ Florence Forbes, Julyan Arbel, Sophie Achard‌, Stephane Girard, Pedro Luiz Coelho Rodrigues‌.

Jonathan El Methni and Stephane Girard were‌ awarded 5K euros and a PhD funding via‌ the IRGA call from Université Grenoble-Alpes, 2024–2027.

ANR‌

An ANR project RADIO-AIDE (2022-26) for Radiation induced neurotoxicity assessed by Spatio-temporal modeling and AI after brain radiotherapy, coordinated by S. Ancelet from IRSN, has been granted for 4 years starting from April 2022. It involves Statify, Grenoble Institute of Neurosciences, Pixyl, ICANS, APHP, ICM and ENS Paris-Saclay. The available funding for Statify is 94K euros.
ANR project PEG2 (2022-26) on Predictive Ecological Genomics: Statify is involved in this 4-year project accepted in July 2022. The PI is Prof. Olivier Francois, who spent 2 years (2021-22) in the team on a Delegation position.
Julyan Arbel is co-PI of the Bayes-Duality project launched with a funding of $2.76 million by Japan JST - French ANR for a total of 5 years starting in October 2021. The goal is to develop a new learning paradigm for Artificial Intelligence that learns like humans in an adaptive, robust, and continuous fashion. On the Japan side the project is led by Mohammad Emtiyaz Khan as the research director, and Kenichi Bannai and Rio Yokota as co-PIs.
Statify is involved in‌ the 4-year ANR project EXSTA “EXtremes, STatistical learning‌ and Applications” (2024-2028) hosted by Paris-Sorbonne University. Extreme‌ Value Theory is the branch of probability and‌ statistics dedicated to rare events associated with tails‌ of distributions, with numerous applications in various scientific‌ fields where extreme events are of particular importance,‌ and in risk management. Recent years have seen‌ the development of a theoretical framework inspired by‌ statistical learning theory and algorithms adapted from machine‌ learning for the analysis of extremes, in line‌ with the statistical community's growing interest in high-dimensional‌ problems and the increasing availability of large-scale data‌ sets. The aim of the project is to‌ reinforce these emerging directions and encourage interaction between‌ theory and practice. The consortium brings together statisticians‌ whose research topics cover a wide spectrum, from‌ mathematical statistics and learning theory to operational applications‌ in climate and environmental sciences and industry.
Pedro Luiz Coelho Rodrigues is co-PI of the SBI4C project of the MIAI AI Cluster, under the reference ANR-23-IACL-0006. The project started in September 2025 and has a four-year duration with 400k euros of funding. Other laboratories involved are the Laboratoire d'Informatique de Grenoble (LIG) and the Institut des Géosciences de l'Environnement (IGE).

PEPR Digital‌‌ Health

Florence Forbes and Sophie Achard are involved in the REWIND project (2023-2028), pRecision mEdicine WIth loNgitudinal Data. The goal is to develop models for longitudinal data in order to understand the progression of chronic diseases.

France Life Imaging (FLI)

Funding from the national steering committee of the Réseau d'Expertise « Traitement et Analyse en Imagerie Multimodale » (RE4) of the France Life Imaging (FLI) infrastructure for a project entitled « Détection d'Anomalies en Imagerie Médicale par apprentissage faiblement Supervisé ». Joint project with Carole Lartizien and Michel Dojat.

9.3.1 Networks

MSTGA and AIGM INRAE (French National Institute for Agricultural Research) networks: F. Forbes and J.-B. Durand have been members of the INRAE network AIGM (formerly MSTGA), on Algorithmic Issues for Inference in Graphical Models, since 2006. It is funded by INRAE MIA and RNSC/ISC Paris. This network gathers researchers from different disciplines. Statify co-organized and hosted 2 of the network meetings, in 2008 and 2015, in Grenoble.

10 Dissemination

10.1 Promoting‌‌ scientific activities

10.1.1 Scientific events: organisation

Member of‌ the organizing committees

Jean-Baptiste Durand: MaSeMo: Markov, Semi-Markov Models and Associated Fields (from Theory to Application and back), 1-4 July 2025, Paris, France.
Florence Forbes co-organized with Xun Huan (University‌ of Michigan) and Youssef‌ Marzouk (Massachusetts Institute of‌‌ Technology), a special session on Bayesian experimental design‌ at the MCM25 conference‌ in Chicago.

10.1.2 Scientific‌‌ events: selection

Member of the conference program committees‌

Julyan Arbel: Area Chair‌ for AISTATS.

Reviewer

Julyan‌‌ Arbel: Area Chair for ICML.

10.1.3 Journal

Member‌ of the editorial boards‌

Stephane Girard: Associate Editor for Revstat and Dependence Modelling.
Julyan Arbel:‌ Associate Editor for Statistics‌ and Computing, Bayesian Analysis,‌‌ Australian and New Zealand Journal of Statistics, Statistics‌ & Probability Letters, Statistical‌ Methods & Applications.

Reviewer‌‌ - reviewing activities

Stephane Girard: Extremes, Electronic Journal‌ of Statistics.
Jonathan El‌ Methni: Journal of Statistical‌‌ Software.
Jean-Baptiste Durand: Statistics and Computing.
Julyan Arbel:‌ Annals of Applied Probability,‌ Applied Probability Journal, Extremes,‌‌ Journal of Machine Learning Research (x2), Journal of‌ the Royal Statistical Society‌ series B, Statistical Science‌‌ (x2).

10.1.4 Invited talks

Stephane Girard: Invited talk at the annual general assembly of the 'PEPR Climat – TRACCS'.
Jonathan El Methni: Invited talk at the GAIA seminar of Université Grenoble Alpes, "Sur les traces des premiers graphiques statistiques", December 2025.
Julyan Arbel: Keynote talk at Journées de Statistique de la SFdS, Marseille. Invited talks at 14th International Conference on Bayesian Nonparametrics, UCLA (USA); Royal Statistical Society (RSS) Conference, Edinburgh (UK); All About That Seminar, Institut Henri Poincaré, Paris; Recent Advances in Machine Learning, Aussois.
Florence Forbes: Invited talks at IABM in March 2025 and at the Model Based Clustering workshop in July 2025, both in Nice, and at the GeNU workshop in September 2025 in Copenhagen.
Jacopo Iollo: Invited talks at the Isaac Newton Institute in Cambridge, UK, and at the ASA/IMS Spring Research Conference, New York, both in June 2025, and at the MCM25 conference in Chicago, USA, in August 2025.

10.1.5 Leadership within the scientific‌ community

Stephane Girard: Member of the ELLIS Society‌ (European Laboratory for Learning and Intelligent Systems) since‌ 2025.
Julyan Arbel: Member of the ELLIS Society‌ (European Laboratory for Learning and Intelligent Systems) since‌ 2020. Member of Data Science axis Committee of‌ Persyval Labex, Grenoble.

10.1.6 Research administration

Jean-Baptiste Durand: Member of the INRAE evaluation committee, section MISTI (Applied mathematics and computer science).
Julyan Arbel: Member of the ANR evaluation committee CE23 "Intelligence artificielle et science des données".

10.2 Teaching - Supervision‌ - Juries - Educational and pedagogical outreach

10.2.1‌ Teaching

Master: Stephane Girard, Statistique Inférentielle Avancée, 18‌ ETD, M1 level, Ensimag. Grenoble-INP, France.
Master: Stephane‌ Girard, Modélisation, estimation, simulation des risques climatiques, 12‌ ETD, M2 level, Ecole Polytechnique, Palaiseau, France.
Master: Julyan Arbel, Bayesian Machine Learning, 36 ETD, with R. Bardenet and G. V. Cardoso, Master MVA, École normale supérieure Paris-Saclay.

10.2.2 Supervision

Stephane Girard is the PhD advisor of Antoine Franchini (Université Grenoble-Alpes, since December 2024).
Stephane Girard and Jonathan El Methni are the PhD co-advisors of Pearl Laveur (Université Grenoble-Alpes, since October 2024).
Stephane Girard is the co-advisor (with G. Stupfler, Université d'Angers and A. Usseglio-Carleve, Université d'Avignon) of the PhD thesis of Solune Denis (Université d'Angers, since October 2024).
Julyan Arbel is the PhD advisor‌ of Mohamed-Bahi Yahiaoui (CEA Cadarache-Inria, with Loïc Giraldi,‌ Geoffrey Daniel).
Julyan Arbel is the PhD advisor‌ of Julien Zhou (Inria-Criteo, with Pierre Gaillard and‌ Thibaud Rahier).
Julyan Arbel is the PhD advisor‌ of Alexandre Wendling (Inria-UGA, with Clovis Galiez).
Julyan‌ Arbel and Pedro Rodrigues are the PhD advisors‌ of Eloise Touron (Inria, with Nelle Varoquaux and‌ Mickael Arbel).
Julyan Arbel and Pedro Rodrigues are‌ the PhD advisors of Camille Touron (Inria).
Julyan‌ Arbel and Sophie Achard are the PhD advisors‌ of Alice Chevaux (Inria, with Guillaume Kon Kam‌ King).
Julyan Arbel is the PhD advisor of Soufiane Atouani (Inria).

10.2.3‌ Juries

Stephane Girard: Reviewer of the PhD thesis of Nicolas Atienza, "Towards reliable ML: Leveraging multi-modal representations, information bottleneck and extreme value theory", Univ. Paris-Saclay, April 2025.
Stephane Girard: President of the PhD committee of Alex Podgorny, "Réduction de dimension pour l'inférence statistique de queues de distribution", Univ. Strasbourg, September 2025.
Jonathan El Methni: Member of two hiring committees for PRAG in Maths and a junior lecturer position at Faculté d'économie de l'Université Grenoble Alpes. Member of the hiring committee for ATER positions.
Julyan Arbel: Examiner of the PhD thesis of Meriam Ezziati, Laboratoire d'astrophysique de Marseille, “Searching for high-z quasars in the Euclid Wide Survey”.
Julyan Arbel: Examiner of the PhD thesis of Antoine Van Biesbroeck, Ecole Polytechnique & CEA Saclay, “Extended reference prior theory for objective and practical inference, application to robust and auditable seismic fragility curves estimation”.
Julyan Arbel: Reviewer of the PhD thesis of Qian Jin, UNSW Sydney, “Latent Structure Models in Statistical Learning and Neural Network Extensions”.
Julyan Arbel: Reviewer of the PhD thesis of Jan Greve, Vienna University of Economics and Business, “Probability Distributions on Partitions of Data: Theory and Applications”.
Julyan Arbel: Reviewer of the HDR thesis of Gianni Franchi, ENSTA Paris, Institut Polytechnique de Paris, “Towards Trustworthy Artificial Intelligence”.
Florence Forbes: Chair of the PhD defences of Tom Swagier and Younes Moussaoui. Member of the PhD committee of Louis Grenioux.

10.2.4 Educational and‌ pedagogical outreach

Stephane Girard: Training (9h, remote teaching) “Climate Risk quantification methods and tools for finance” for BNP Paribas employees.

10.3 Popularization

10.3.1‌‌ Participation in Live events

Julyan Arbel is a‌ Social Media Officer for‌ the International Society for‌‌ Bayesian Analysis (ISBA). He organises Discussion Paper Webinars‌ for the Bayesian Analysis‌ journal.

11 Scientific production‌‌

11.1 Major publications

[1] Sophie Achard, Jean-François Coeurjolly, Pierre Lafaye de Micheaux, Hanâ Lbath and Jonas Richiardi. Inter-regional correlation estimators for functional magnetic resonance imaging. NeuroImage 282, November 2023, 120388.
[2] Michaël Allouche, Stéphane Girard and Emmanuel Gobet. EV-GAN: Simulation of extreme events with ReLU neural networks. Journal of Machine Learning Research 23(150), 2022, 1-39.
[3] Fabien Boux, Florence Forbes, Julyan Arbel, Benjamin Lemasson and Emmanuel L. Barbier. Bayesian inverse regression for vascular magnetic resonance fingerprinting. IEEE Transactions on Medical Imaging 40(7), July 2021, 1827-1837.
[4] Abdelaati Daouia, Stéphane Girard and G. Stupfler. Estimation of Tail Risk based on Extreme Expectiles. Journal of the Royal Statistical Society series B 80, 2018, 263-292.
[5] Antoine Deleforge, Florence Forbes and Radu Horaud. High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables. Statistics and Computing, February 2014.
[6] Florence Forbes, Hien Duy Nguyen, Trung Tin Nguyen and Julyan Arbel. Summary statistics and discrepancy measures for ABC via surrogate posteriors. Statistics and Computing 32(85), 2022.
[7] Stéphane Girard, Gilles Claude Stupfler and Antoine Usseglio-Carleve. Extreme Conditional Expectile Estimation in Heavy-Tailed Heteroscedastic Regression Models. Annals of Statistics 49(6), December 2021, 3358-3382.
[8] Benjamin Lambert, Florence Forbes, Senan Doyle, Harmonie Dehaene and Michel Dojat. Trustworthy clinical AI solutions: A unified review of uncertainty quantification in Deep Learning models for medical image analysis. Artificial Intelligence in Medicine 150, April 2024, 102830.
[9] Hongliang Lu, Julyan Arbel and Florence Forbes. Bayesian nonparametric priors for hidden Markov random fields. Statistics and Computing 30, 2020, 1015-1035.
[10] Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo and Julyan Arbel. Understanding Priors in Bayesian Neural Networks at the Unit Level. ICML 2019 - 36th International Conference on Machine Learning, Proceedings of Machine Learning Research 97, Long Beach, United States, June 2019, 6458-6467.

11.2 Publications of the year‌

International journals

[11] Michaël Allouche, Stéphane Girard and Emmanuel Gobet. ExceedGAN: Simulation above extreme thresholds using Generative Adversarial Networks. Extremes, 2025. In press.
[12] Michaël Allouche, Stéphane Girard and Emmanuel Gobet. Learning extreme Expected Shortfall and Conditional Tail Moments with neural networks. Application to cryptocurrency data. Neural Networks 182, February 2025, 106903.
[13] Mathis Antonetti, Henrique Donãncio and Florence Forbes. An analysis of distributional reinforcement learning with Gaussian mixtures. Transactions on Machine Learning Research, December 2025.
[14] Nagham Badreddine, Florence Appaix, Guillaume Jean-Paul Claude Becq, Sophie Achard, Frédéric Saudou and Elodie Fino. Early Alterations of Motor Learning and Corticostriatal Network Activity in a Huntington's Disease Mouse Model. European Journal of Neuroscience 61(5), March 2025.
[15] Mathias Barreto, Olivier Marchal and Julyan Arbel. Optimal sub-Gaussian variance proxy for truncated Gaussian and exponential random variables. Statistics and Probability Letters 228, February 2026, 110555.
[16] Jan Boelts, Michael Deistler, Manuel Gloeckler, Álvaro Tejero-Cantero, Jan-Matthis Lueckmann, Guy Moss, Peter Steinbach, Thomas Moreau, Fabio Muratore, Julia Linhart, Conor Durkan, Julius Vetter, Benjamin Kurt Miller, Maternus Herold, Abolfazl Ziaeemehr, Matthijs Pals, Theo Gruner, Sebastian Bischoff, Nastya Krouglova, Richard Gao, Janne K Lappalainen, Bálint Mucsányi, Felix Pei, Auguste Schulz, Zinovia Stefanidi, Pedro Luiz Coelho Rodrigues, Cornelius Schröder, Faried Abu Zaid, Jonas Beck, Jaivardhan Kapoor, David S Greenberg, Pedro J Gonçalves and Jakob H Macke. sbi reloaded: a toolkit for simulation-based inference workflows. Journal of Open Source Software 10(108), April 2025, 7754.
[17] Lucrezia Carboni, Dwight Nwaigwe, Marion Mainsant, Raphaël Bayle, Marina Reyboz, Martial Mermillod, Michel Dojat and Sophie Achard. Exploring continual learning strategies in artificial neural networks through graph-based analysis of connectivity: insights from a brain-inspired perspective. Neural Networks 185, May 2025, 107125.
[18] Jonathan El Methni and Stéphane Girard. A refined extreme quantiles estimator for Weibull tail-distributions. REVSTAT - Statistical Journal 23(4), 2025.
[19] Gersende Fort, Florence Forbes and Hien Duy Nguyen. Sequential Sample Average Majorization–Minimization. Statistics and Computing 36, December 2025, 31.
[20] Stéphane Girard, Emmanuel Gobet and Jean Pachebat. HTGAN: Heavy-Tail GAN for Multivariate Dependent Extremes via Latent-Dimensional Control. International Journal of Computer Mathematics, November 2025, 1-41.
[21] Clément Guichet, Sylvain Harquel, Sophie Achard, Martial Mermillod and Monica Baciu. Lifespan oscillatory dynamics in lexical production: A population-based MEG resting-state analysis. Imaging Neuroscience 3, April 2025, imag_a_00551.
[22] Tâm Le Minh, Sophie Donnet, François Massol and Stéphane Robin. Hoeffding-type decomposition for U-statistics on bipartite networks. Electronic Journal of Statistics 19(1), 2025, 2829-2875.
[23] Théo Moins, Julyan Arbel, Anne Dutfoy and Stéphane Girard. On the use of a local $\hat{R}$ to improve MCMC convergence diagnostic. Bayesian Analysis 20(1), March 2025, 1433-1458.
[24] Hien Nguyen, Trungtin Nguyen, Julyan Arbel and Florence Forbes. Revisiting Concentration Results for Approximate Bayesian Computation. Bayesian Analysis, March 2025, 1-23.
[25] Pierre Wolinski and Julyan Arbel. Gaussian Pre-Activations in Neural Networks: Myth or Reality? Transactions on Machine Learning Research Journal, April 2025, 1-50.

Invited conferences

[26] Michaël Allouche, Emmanuel Gobet and Stéphane Girard. Estimation of extreme risk measures with neural networks. SAMA 2025 - Journée SAMA, Paris, France, 2025.
[27] Stéphane Girard, Michaël Allouche and Emmanuel Gobet. Estimation of extreme quantile from heavy tailed distributions with neural networks. ICSDS 2025 - International Conference on Statistics and Data Science, Seville, Spain, 2025.

International peer-reviewed conferences

[28] Alice Chevaux, Ali Fahkar, Kévin Polisano, Irène Gannaz and Sophie Achard. Benchmarking Brain Connectivity Graph Inference: A Novel Validation Approach. 33rd European Signal Processing Conference (EUSIPCO 2025), Palermo, Italy, September 2025.
[29] Ali Fahkar, Kévin Polisano, Irène Gannaz and Sophie Achard. Génération de Matrices de Corrélation avec des Structures de Graphe par Optimisation Convexe. GRETSI 2025 - XXXe Colloque Francophone de Traitement du Signal et des Images, Strasbourg, France, August 2025.
[30] Ali Fakhar, Kévin Polisano, Irène Gannaz and Sophie Achard. Generating Correlation Matrices with Graph Structures Using Convex Optimization. IEEE Statistical Signal Processing Workshop (SSP), Edinburgh, United Kingdom, June 2025.
[31] Jacopo Iollo, Christophe Heinkelé, Pierre Alliez and Florence Forbes. Bayesian Experimental Design via Contrastive Diffusions. ICLR 2025 - International Conference on Learning Representations, Singapore, Singapore, 2025, 1-24.
[32] Brice Marc, Philippe Foucher, Florence Forbes and Pierre Charbonnier. Normalizing Flows contrastifs pour la détection d'anomalies sur ouvrages d'art. GRETSI 2025 - XXXe Colloque Francophone de Traitement du Signal et des Images, Strasbourg, France, 2025, 1-3.
[33] Geoffroy Oudoumanessah, Thomas Coudert, Luc Meyer, Aurelien Delphin, Thomas Christen, Michel Dojat, Carole Lartizien and Florence Forbes. Cluster globally, Reduce locally: Scalable efficient dictionary compression for magnetic resonance fingerprinting. ISBI 2025 - International Symposium in Biomedical Imaging, Houston, United States, 2025, 1-5.
[34] Eloïse Touron, Pedro Luiz Coelho Rodrigues, Julyan Arbel, Nelle Varoquaux and Michael Arbel. Simulation-based inference of yeast centromeres. NeurIPS 2025 - 39th Conference on Neural Information Processing Systems Workshop: The 3rd Workshop on Imageomics: Discovering Biological Knowledge from Images Using AI, Copenhagen, Denmark, November 2025, 1-13.
[35] Julien Zhou, Pierre Gaillard, Thibaud Rahier and Julyan Arbel. Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit. ALT 2025 - 36th International Conference on Algorithmic Learning Theory, Proceedings of Machine Learning Research 272, Milan, Italy, PMLR, 2025, 1-25.

National peer-reviewed Conferences‌

[36] Stéphane Girard, Michaël Allouche and Emmanuel Gobet. Estimation of extreme risk measures with neural networks. JDS2025 - 56èmes Journées de Statistique de la SfdS, Marseille, France, ACM, 2025, 1-6.

Conferences without proceedings

[37] Julyan Arbel. Bayesian deep learning: Overview and challenges. BNP 14 - 14th International Conference on Bayesian Nonparametrics, Los Angeles (CA), United States, 2025.
[38] Julyan Arbel. Overview and challenges in Bayesian deep learning. JdS 2025 - 56es Journées de Statistique de la SFdS, Marseille, France, 2025.
[39] Julyan Arbel. Some Bayesian nonparametric ideas in (Bayesian) deep learning. RSS 2025 - International Conference of Royal Statistical Society, Edinburgh, United Kingdom, 2025.
[40] Antoine Barrier, Lila Cunge, Thomas Coudert, Aurélien Delphin, Loïc Legris, Geoffroy Oudoumanessah, Laurent Lamalle, Florence Forbes, Mariya Doneva, Benjamin Lemasson, Emmanuel L. Barbier and Thomas Christen. MARVEL MRF for Contrast-free Blood Volume, Microvascular Properties, and Relaxometry Mapping: Initial Tests in Volunteers and Stroke Patients. ISMRM & ISMRT 2025 - Annual Meeting & Exhibition, Honolulu (Hawaii), United States, 2025, 1-4.
[41] Michel Dojat. Digital Health in the 21th. CPR 2025 - 4th Annecy Round Table on Cardio Pulmonary Resuscitation, Annecy, France, 2025.
[42] Camille Touron, Gabriel Victorino Cardoso, Julyan Arbel and Pedro Luiz Coelho Rodrigues. Error analysis of a compositional score-based algorithm for simulation-based inference. Workshop on Principles of Generative Modeling at EurIPS 2025, Copenhagen, Denmark, October 2025.

Scientific book chapters

[43] Michaël Allouche, Stéphane Girard and Emmanuel Gobet. On the simulation of extreme events with neural networks. Handbook on Statistics of Extremes, Chapman & Hall/CRC, 2026. In press.
[44] Benjamin Lambert, Florence Forbes and Michel Dojat. From out-of-distribution detection to quality control. Trustworthy AI in Medical Imaging, Elsevier, 2025, 101-126.

Doctoral dissertations and habilitation‌‌ theses

[45] Julien Chevallier. A journey in the fields of PDE, probabilities and statistics with point processes. Université Grenoble Alpes, December 2025.

Reports‌ & preprints

[46] Soufiane Atouani, Olivier Marchal and Julyan Arbel. Optimal sub-Gaussian variance proxy for 3-mass distributions. October 2025.
[47] Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Pedro Luiz Coelho Rodrigues and Rafael Izbicki. CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference. August 2025.
[48] Julien Chevallier, Jean-François Coeurjolly and Rasmus Waagepetersen. Critical point processes obtained from a Gaussian random field with a view toward statistics. July 2025.
[49] Isabella Costa Maia, Marco Congedo, Pedro Luiz Coelho Rodrigues and Salem Said. Curvature-based rejection sampling. 2025.
[50] Michael Deistler, Jan Boelts, Peter Steinbach, Guy Moss, Thomas Moreau, Manuel Gloeckler, Pedro Luiz Coelho Rodrigues, Julia Linhart, Janne K. Lappalainen, Benjamin Kurt Miller, Pedro J. Gonçalves, Jan-Matthis Lueckmann, Cornelius Schröder and Jakob H. Macke. Simulation-Based Inference: A Practical Guide. August 2025.
[51] Jonathan El Methni and Stéphane Girard. Approximate Bayesian Computation of reduced-bias extreme risk measures from heavy-tailed distributions. 2025.
[52] Jonathan El Methni, Stéphane Girard and Pearl Laveur. A new family of inequality indices: axioms, inference and tail properties. 2025.
[53] Jonathan El Methni, Stéphane Girard, Juliette Legrand, Gilles Stupfler and Antoine Usseglio-Carleve. Four contemporary problems in extreme value analysis. 2025.
[54] Antoine Franchini, Stéphane Girard and Anne Dutfoy. Adaptive confidence intervals for extreme quantiles from heavy-tailed distributions. 2025.
[55] Stéphane Girard, Thomas Opitz, Antoine Usseglio-Carleve and Chen Yan. Changepoint identification in heavy-tailed distributions. 2025.
[56] Stéphane Girard and Cambyse Pakzad. Extreme-PLS with missing data under weak dependence. 2025.
[57] Stéphane Girard and Cambyse Pakzad. Functional Extreme-PLS. 2025.
[58] Tâm Le Minh, Julyan Arbel and Florence Forbes. A variational approach to empirical mode estimation. 2025.
[59] Tâm Le Minh, Julyan Arbel, Thomas Möllenhoff, Mohammad Emtiyaz Khan and Florence Forbes. Natural Variational Annealing for Multimodal Optimization. 2025.
[60] Geoffroy Oudoumanessah, Thomas Coudert, Carole Lartizien, Michel Dojat, Thomas Christen and Florence Forbes. Scalable magnetic resonance fingerprinting: Incremental inference of high dimensional elliptical mixtures from large data volumes. January 2026.
[61] Pierre-Louis Ruhlmann, Pedro Luiz Coelho Rodrigues, Michael Arbel and Florence Forbes. Flow Matching for Robust Simulation-Based Inference under Model Misspecification. December 2025.

Other scientific publications

[62] Arturo Cabrera Vazquez, Sophie Achard, Michel Dojat and Stein Silva. Graph theory in Disorders of Consciousness: Toward multimodal integration. 5th Annual Meeting of the Neuromod Institute, Antibes, France, 2025, 1-1.
[63] Arturo Cabrera Vazquez, Sophie Achard, Michel Dojat and Stein Silva. Interpretable AI and Graph Theory in Disorders of Consciousness. IABM 2025 - 3ème Colloque Français d'Intelligence Artificielle en Imagerie Biomédicale, Nice, France, 2025, 1-1.
[64] Arturo Cabrera Vazquez, Sophie Achard, Michel Dojat and Stein Silva. Neuroinflammation and disrupted functional connectivity in coma: a graph-theoretical study. EBRAINS Summit 2025, Brussels, Belgium, 2025.
[65] Razan Mhanna, Sophie Achard, Alexander Petersen and Jonas Richiardi. From networks to consciousness: Evaluating Grakel graph kernels in Coma patient analysis. IABM 2025 - Colloque Français d'Intelligence Artificielle en Imagerie Biomédicale, Nice, France, March 2025.
[66] Razan Mhanna, Alexander Petersen, Sophie Achard and Jonas Richiardi. Hybrid CNN–XGBoost Framework for Distribution-Based rs-fMRI Brain Graph Classification. AI Summit 2025, Salt Lake City (UT), United States, June 2025.
[67] Marc Saghiah, Jean-Côme Douteau, Alice Bressand, Sophie Ancelet, Michel Dojat and Benjamin Lemasson. Semi-automated approach for quality control and harmonization of large multiparametric and multicenter MRI databases: methodology and application. SFRMBM 2025 - 7ème congrès de la Société Française de Résonance Magnétique en Biologie et Médecine, Saint-Malo, France, 2025, 1-1.

11.3 Cited publications

[68] H. Bacave, J.-B. Durand, A. Franc, N. Peyrard, S. Plancade and R. Sabbadin. Multichain HMM. A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE, ISTE; John Wiley, 2025, 79-116.
[69] C. Bérard, M.-J. Cros, J.-B. Durand, C. Lothodé, S. Plancade, R. Trépos and N. Vergne. Review of HSMM R and Python Softwares. A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE, ISTE; John Wiley, 2025, 47-77.
[70] J.-B. Durand, A. Franc, N. Peyrard, N. Vergne and I. Votsi. Monochain HSMM. A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE, ISTE; John Wiley, 2025, 1-46.
[71] J.-B. Durand, N. Peyrard, S. Plancade and R. Sabbadin. Multichain HSMM. A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE, ISTE; John Wiley, 2025, 129-156.
[72] J.-B. Durand, M. Valdeyron, N. Peyrard and S. Plancade. Review of estimation algorithms for HMM/HSMM with mixed effects. MaSeMo: Markov, Semi-Markov Models and Associated Fields (from Theory to Application and back), Paris, France, July 2025, 9.
[73] Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff and Pedro Luiz Coelho Rodrigues. Diffusion posterior sampling for simulation-based inference in tall data settings. June 2024, working paper or preprint.

STATIFY - 2025

STATIFY - 2025

2025Activity reportProject-Team﻿​﻿﻿STATIFY

Keywords

Computer Science and﻿​﻿﻿ Digital Science

Other Research Topics and​​﻿﻿ Application Domains

1﻿﻿﻿‌ Team members, visitors, external﻿‌​‌ collaborators

Research Scientists

Faculty Members

Post-Doctoral Fellows

PhD Students

Technical Staff

Interns﻿​​﻿ and Apprentices

Administrative﻿​﻿﻿ Assistants

Visiting Scientists

2 Overall﻿​﻿﻿ objectives

3 Research program

3.1﻿​​﻿ Models for graphs and​​​‌ networks

Structure​​﻿﻿ learning.

Structure modelling.

Structured anomaly​​​‌ detection.

3.2 Dimension reduction﻿‌​‌ and latent variable modeling﻿​​﻿

Regression in high dimensions.​​​‌

Simulation-based inference​​﻿﻿ (SBI) for high dimensional​​​‌ inverse problems.

Online and incremental​​﻿﻿ inference.

3.3 Bayesian﻿﻿﻿‌ modelling

Markov priors for Bayesian​​​‌ nonparametric models.

Asymptotic properties﻿‌​‌ of BNP models.

Amortized Approximate Bayesian﻿﻿﻿‌ computation.

Bayesian​​﻿﻿ neural networks.

3.4​​​‌ Modelling and quantifying extreme﻿​﻿﻿ risk

Extreme quantile estimation.

New﻿‌​‌ measures of extreme risk.﻿​​﻿

Extremes with covariates.

Extremes​‌﻿﻿ and machine learning.

4 Application​​​‌ domains

4.1 Image Analysis﻿​﻿﻿

4.2​‌﻿﻿ Biology, Environment and Medicine​​﻿﻿

5 Social​​﻿﻿ and environmental responsibility

5.1﻿​​﻿ Footprint of research activities​​​‌

5.2﻿‌​‌ Impact of research results﻿​​﻿

6 Latest﻿﻿﻿‌ software developments, platforms, open﻿‌​‌ data

6.1 Latest software﻿​​﻿ developments

6.1.1 Planet-GLLiM

6.1.2 xLLiM (Kernelo)

7 New results

7.1﻿﻿﻿‌ Models for graphs and﻿‌​‌ networks

7.1.1 Leaf Area﻿​​﻿ estimation and Semantic segmentation​​​‌ of forest point clouds﻿﻿﻿‌ using neural networks.

7.1.2 Graph modelling​​​‌ for the study of﻿​﻿﻿ language dynamics

7.1.3​​​‌ Link between Graphs and﻿﻿﻿‌ artificial neural networks

7.1.4﻿​​﻿ Benchmark for graph inference​​​‌

7.1.5 Graphs for﻿​​﻿ coma patients

7.1.6﻿​​﻿ Biological neural network

7.1.7​​​‌ Community detection for binary﻿​﻿﻿ graphical models in high​‌﻿﻿ dimension

7.1.8​​​‌ Contrastive Normalizing Flows for﻿​﻿﻿ anomaly detection in Engineering​‌﻿﻿ Structures

7.1.9﻿​﻿﻿ Coupled hidden Markov and​‌﻿﻿ semi-Markov processes

7.2​​​‌ Latent variable modelling

7.2.1﻿﻿﻿‌ Stochastic Majorization-Minimization with sample-average﻿‌​‌ approximation

7.2.2 Natural Variational Annealing​​​‌ for Multimodal Optimization

7.2.3 Scalable​​﻿﻿ magnetic resonance fingerprinting: Incremental​​​‌ inference of high dimensional﻿​﻿﻿ elliptical mixtures from large​‌﻿﻿ data volumes

7.2.4 Assessing​​﻿﻿ a dose-response relationship after​​​‌ brain radiotherapy via Mixture﻿​﻿﻿ of Regressions

7.2.5 Massive﻿​​﻿ analysis of multidimensional astrophysical​​​‌ data by inverse regression﻿﻿﻿‌ of physical models

7.2.6 An﻿﻿﻿‌ analysis of distributional reinforcement﻿‌​‌ learning with Gaussian mixtures﻿​​﻿

7.2.7 Dynamic​​﻿﻿ Learning Rate for Deep​​​‌ Reinforcement Learning: A Bandit﻿​﻿﻿ Approach

7.2.8 Bandits and sequential​​​‌ learning

7.2.9 Optimal​‌﻿﻿ sub-Gaussian variance proxy

7.2.10 Mixed hidden﻿‌​‌ semi-Markov processes

7.3 Bayesian modelling﻿﻿﻿‌

7.3.1 Convergence of projected﻿‌​‌ stochastic natural gradient variational﻿​​﻿ inference for various step​​​‌ size and sample or﻿﻿﻿‌ batch size schedules

7.3.2​​﻿﻿ Concentration results for approximate​​​‌ Bayesian computation without identifiability﻿​﻿﻿

7.3.3 Diagnosing convergence of​‌﻿﻿ Markov chain Monte Carlo​​﻿﻿

7.3.4 Bayesian deep learning﻿​﻿﻿

7.3.5 Bayesian Experimental Design﻿‌​‌ via Contrastive Diffusions.

7.3.6 Active MRI Acquisition﻿﻿﻿‌ with Diffusion Guided Bayesian﻿‌​‌ Experimental Design.

7.3.7 Simulation-based​​​‌ inference using score-diffusion: algorithm﻿​﻿﻿ and theoretical analysis

7.3.8 Conformal prediction﻿​﻿﻿ for simulation-based inference

7.3.9 Simulation-based inference​‌﻿﻿ under model misspecification

7.3.10​​​‌ Simulation-based inference applied to﻿﻿﻿‌ biology

7.3.11​​​‌ Tutorial guide to simulation-based﻿﻿﻿‌ inference

7.4 Modelling​​​‌ and quantifying extreme risk﻿﻿﻿‌

2025Activity reportProject-TeamSTATIFY

Computer Science and Digital Science

Other Research Topics and Application Domains

1‌ Team members, visitors, external‌‌ collaborators

Interns and Apprentices

Administrative Assistants

2 Overall objectives

3.1 Models for graphs and‌ networks

Structure learning.

Structured anomaly‌ detection.

3.2 Dimension reduction‌‌ and latent variable modeling

Regression in high dimensions.‌

Simulation-based inference (SBI) for high dimensional‌ inverse problems.

Online and incremental inference.

3.3 Bayesian‌ modelling

Markov priors for Bayesian‌ nonparametric models.

Asymptotic properties‌‌ of BNP models.

Amortized Approximate Bayesian‌ computation.

Bayesian neural networks.

3.4‌ Modelling and quantifying extreme risk

New‌‌ measures of extreme risk.

Extremes‌ and machine learning.

4 Application domains

4.1 Image Analysis

4.2 Biology, Environment and Medicine

5 Social and environmental responsibility

5.1 Footprint of research activities

5.2 Impact of research results

6 Latest software developments, platforms, open data

6.1 Latest software developments

7.1 Models for graphs and networks

7.1.1 Leaf Area estimation and Semantic segmentation of forest point clouds using neural networks.

7.1.2 Graph modelling for the study of language dynamics

7.1.3 Link between Graphs and artificial neural networks

7.1.4 Benchmark for graph inference

7.1.5 Graphs for coma patients

7.1.6 Biological neural network

7.1.7 Community detection for binary graphical models in high dimension

7.1.8 Contrastive Normalizing Flows for anomaly detection in Engineering Structures

7.1.9 Coupled hidden Markov and semi-Markov processes

7.2 Latent variable modelling

7.2.1 Stochastic Majorization-Minimization with sample-average approximation

7.2.2 Natural Variational Annealing for Multimodal Optimization

7.2.3 Scalable magnetic resonance fingerprinting: Incremental inference of high dimensional elliptical mixtures from large data volumes

7.2.4 Assessing a dose-response relationship after brain radiotherapy via Mixture of Regressions

7.2.5 Massive analysis of multidimensional astrophysical data by inverse regression of physical models

7.2.6 An analysis of distributional reinforcement learning with Gaussian mixtures

7.2.7 Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach

7.2.8 Bandits and sequential learning

7.2.9 Optimal sub-Gaussian variance proxy

7.2.10 Mixed hidden semi-Markov processes

7.3 Bayesian modelling

7.3.1 Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules

7.3.2 Concentration results for approximate Bayesian computation without identifiability

7.3.3 Diagnosing convergence of Markov chain Monte Carlo

7.3.4 Bayesian deep learning

7.3.5 Bayesian Experimental Design via Contrastive Diffusions.

7.3.6 Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design.

7.3.7 Simulation-based inference using score-diffusion: algorithm and theoretical analysis

7.3.8 Conformal prediction for simulation-based inference

7.3.9 Simulation-based inference under model misspecification

7.3.10 Simulation-based inference applied to biology

7.3.11 Tutorial guide to simulation-based inference

7.4 Modelling and quantifying extreme risk

7.4.1 Extreme events and neural networks

7.4.2 Estimation of extreme risk measures

7.4.3 Estimation of extreme inequality measures

7.4.4 Changepoint identification in heavy-tailed distributions

7.4.5 Dimension reduction for extremes

8 Bilateral contracts and grants with industry

8.1 Bilateral contracts with industry

9 Partnerships and cooperations

9.1 International initiatives

9.1.1 Inria associate team not involved in an IIL or an international program

WOMBAT

9.2 International research visitors

9.2.1 Visits of international scientists

Other international visits to the team

9.2.2 Visits to international teams

Research stays abroad