STATIFY - 2025

2025 Activity report: Project-Team STATIFY

RNSR: 202023582A
  • Research​‌ center Inria Centre at​​ Université Grenoble Alpes
  • In partnership with: CNRS, Université de Grenoble Alpes
  • Team​‌ name: Bayesian and extreme​​ value statistical models for​​​‌ structured and high dimensional​ data
  • In collaboration with: Laboratoire Jean Kuntzmann (LJK)

Creation of the Project-Team:​​​‌ 2020 April 01

Keywords

Computer Science and​ Digital Science

  • A3. Data​‌ and knowledge
  • A3.1. Data​​
  • A3.1.1. Modeling, representation
  • A3.1.4.​​​‌ Uncertain data
  • A3.3. Data​ and knowledge analysis
  • A3.3.2.​‌ Data mining
  • A3.3.3. Big​​ data analysis
  • A5. Interaction,​​​‌ multimedia and robotics
  • A5.3.​ Image processing and analysis​‌
  • A5.3.3. Pattern recognition
  • A5.9.​​ Signal processing
  • A5.9.2. Estimation,​​​‌ modeling
  • A6. Modeling, simulation​ and control
  • A6.2. Scientific​‌ computing, Numerical Analysis &​​ Optimization
  • A6.2.3. Probabilistic methods​​​‌
  • A6.2.4. Statistical methods
  • A6.3.​ Computation-data interaction
  • A6.3.1. Inverse​‌ problems
  • A6.3.3. Data processing​​
  • A6.3.5. Uncertainty Quantification
  • A9.​​​‌ Artificial intelligence
  • A9.2. Machine​ learning
  • A9.2.1. Supervised learning​‌
  • A9.2.2. Unsupervised learning
  • A9.2.4.​​ Optimization and learning
  • A9.2.5.​​​‌ Bayesian methods
  • A9.2.7. Kernel​ methods
  • A9.3. Signal processing​‌

Other Research Topics and​​ Application Domains

  • B1. Life​​ sciences
  • B1.2. Neuroscience and​​​‌ cognitive science
  • B1.2.1. Understanding‌ and simulation of the‌​‌ brain and the nervous​​ system
  • B2. Digital health​​​‌
  • B2.6. Biological and medical‌ imaging
  • B2.6.1. Brain imaging‌​‌
  • B3. Environment and planet​​
  • B3.3. Geosciences
  • B3.4. Risks​​​‌
  • B3.4.1. Natural risks
  • B3.4.2.‌ Industrial risks and waste‌​‌
  • B3.5. Agronomy
  • B5. Industry​​ of the future
  • B5.1.​​​‌ Factory of the future‌
  • B9. Society and Knowledge‌​‌
  • B9.5. Sciences
  • B9.5.6. Data​​ science
  • B9.11. Risk management​​​‌
  • B9.11.1. Environmental risks

1‌ Team members, visitors, external‌​‌ collaborators

Research Scientists

  • Florence​​ Forbes [Team leader​​​‌, INRIA, Senior‌ Researcher, HDR]‌​‌
  • Sophie Achard [CNRS​​, Senior Researcher,​​​‌ HDR]
  • Julyan Arbel‌ [INRIA, HDR‌​‌]
  • Pedro Luiz Coelho​​ Rodrigues [INRIA,​​​‌ ISFP]
  • Michel Dojat‌ [INSERM]
  • Henrique‌​‌ Donancio [INRIA,​​ Starting Research Position,​​​‌ from Sep 2025 until‌ Nov 2025]
  • Stephane‌​‌ Girard [INRIA,​​ Senior Researcher, HDR​​​‌]

Faculty Members

  • Julien‌ Chevallier [UGA,‌​‌ Associate Professor, until​​ Aug 2025]
  • Jean-Baptiste​​​‌ Durand [CIRAD,‌ Associate Professor]
  • Jonathan‌​‌ El Methni [UGA​​, Associate Professor]​​​‌

Post-Doctoral Fellows

  • Loic Chalmandrier‌ [UGA, Post-Doctoral‌​‌ Fellow]
  • Henrique Donancio​​ [INRIA, Post-Doctoral​​​‌ Fellow, until Aug‌ 2025]
  • Anton Francois‌​‌ [CNRS, Post-Doctoral​​ Fellow, until Sep​​​‌ 2025]
  • Thomas Guilmeau‌ [UGA, Post-Doctoral‌​‌ Fellow, from Jul​​ 2025]
  • Yiye Jiang​​​‌ [UGA, Post-Doctoral‌ Fellow]
  • Mamadou Kanoute‌​‌ [INRIA, Post-Doctoral​​ Fellow, from Sep​​​‌ 2025]
  • Tam Le‌ Minh [INRIA,‌​‌ Post-Doctoral Fellow]
  • Rafael​​ Mouallem Rosa [INRIA​​​‌, Post-Doctoral Fellow]‌
  • Paul-Gauthier Noe [INRIA‌​‌, Post-Doctoral Fellow,​​ until Jan 2025]​​​‌

PhD Students

  • Arturo Cabrera‌ Vazquez [INRIA]‌​‌
  • Alice Chevaux [UGA​​]
  • Isabella Costa Maia​​​‌ [UGA]
  • Luben‌ Miguel Cruz Cabezas [‌​‌FAPESP, from Oct​​ 2025]
  • Antoine Franchini​​​‌ [INRIA]
  • Jacopo‌ Iollo [INRIA,‌​‌ until Sep 2025]​​
  • Pearl Laveur [UGA​​​‌]
  • Brice Marc [‌CEREMA]
  • Razan Mhanna‌​‌ [UGA]
  • Geoffroy​​ Oudoumanessah [POLE EMPLOI​​​‌, from Dec 2025‌]
  • Geoffroy Oudoumanessah [‌​‌INRIA, from Oct​​ 2025 until Nov 2025​​​‌]
  • Geoffroy Oudoumanessah [‌INSERM, until Sep‌​‌ 2025]
  • Pierre-Louis Ruhlmann​​ [INRIA]
  • Ababacar​​​‌ Sembene [INRIA,‌ from Nov 2025]‌​‌
  • Camille Touron [UGA​​]

Technical Staff

  • Adam​​​‌ Fragkiadakis [INRIA,‌ Engineer, from Sep‌​‌ 2025]
  • Jacopo Iollo​​ [INRIA, Engineer​​​‌, from Oct 2025‌]
  • Marc Saghiah [‌​‌INRIA, Engineer,​​ from Jul 2025]​​​‌
  • Laurent Vallet [INRIA‌, Engineer, from‌​‌ Dec 2025]

Interns​​ and Apprentices

  • Mateo Amazo​​​‌ [UGA, Intern‌, from Feb 2025‌​‌ until Feb 2025]​​
  • Mateo Amazo [INRIA​​​‌, Intern, until‌ Jan 2025]
  • Hani‌​‌ Anouar Bourrous [INRIA​​, Intern, until​​​‌ May 2025]
  • Ba‌ Khuong Dang [INRIA‌​‌, Intern, from​​​‌ May 2025 until Oct​ 2025]
  • Quentin Faye​‌ [INPG SA,​​ Intern, from Apr​​​‌ 2025 until Jul 2025​]
  • Rithy Sochet [​‌INRIA, Intern,​​ from Apr 2025 until​​​‌ Sep 2025]

Administrative​ Assistants

  • Luce Coelho [​‌INRIA]
  • Diane Courtiol​​ [INRIA]
  • Marie-Anne​​​‌ Dauphin-Rizzi [INRIA]​
  • Julia Di Toro [​‌INRIA]
  • Myriam Etienne​​ [INRIA]
  • Nathalie​​​‌ Gillot [INRIA]​
  • Laura Leone [Randstad​‌, from Aug 2025​​]
  • Helen Pouchot-Rouge-Blanc [​​​‌INRIA]
  • Maria Immaculada​ Presseguer [INRIA]​‌
  • Annie Simon [INRIA​​]

Visiting Scientists

  • Luben​​​‌ Miguel Cruz Cabezas [​UFScar, Brazil, from​‌ Aug 2025]
  • Patrycja​​ Scislewska [UNIV VARSOVIE​​​‌, from Mar 2025​ until Jul 2025]​‌
  • Darren Wraith [UNIV​​ QUEENSLAND, from Aug​​​‌ 2025]

2 Overall​ objectives

The statify team focuses on statistics. Statistics can be defined as a science of variation, where the main question is how to acquire knowledge in the face of variation. In the past, statistics was seen as an opportunity to play in various backyards. Today, statisticians see their own backyard invaded by data scientists, machine learners and other computer scientists of all kinds. Everyone wants to do data analysis and some (but not all) do it very well. Generally, data analysis algorithms and associated network architectures are empirically validated using domain-specific datasets and data challenges. While winning such challenges is certainly rewarding, statistical validation rests on more fundamental grounds and yields interesting theoretical, algorithmic and practical insights. Statistical questions can be converted to probability questions by the use of probability models. Once certain assumptions about the mechanisms generating the data are made, statistical questions can be answered using probability theory. However, the proper formulation and checking of these probability models is just as important as, or even more important than, the subsequent analysis of the problem using these models. The first question is then how to formulate and evaluate probabilistic models for the problem at hand. The second question is how to obtain answers after a certain model has been assumed. This latter task is largely a matter of applied probability theory and, in practice, involves optimization and numerical analysis.

The statify team aims to bring its strengths to bear at a time when the number of solicitations received by statisticians is increasing considerably because of the successive waves of big data, data science and deep learning. The difficulty is to back up our approaches with reliable mathematics when what we have is often only empirical observations that we are not able to explain. Guiding data analysis with statistical justification is a challenge in itself. statify has the ambition to play a role in this task and to provide answers to questions about the appropriate usage of statistics.

Often statistical assumptions do​​​‌ not hold. Under what​ conditions then can we​‌ use statistical methods to​​ obtain reliable knowledge? These​​ conditions are rarely the​​​‌ natural state of complex‌ systems. The central motivation‌​‌ of statify is to​​ establish the conditions under​​​‌ which statistical assumptions and‌ associated inference procedures approximately‌​‌ hold and become reliable.​​

However, as George Box​​​‌ said "Statisticians and artists‌ both suffer from being‌​‌ too easily in love​​ with their models". To​​​‌ moderate this risk, we‌ choose to develop, in‌​‌ the team, expertise from​​ different statistical domains to​​​‌ offer different solutions to‌ attack a variety of‌​‌ problems. This is possible​​ because these domains share​​​‌ the same mathematical food‌ chain, from probability and‌​‌ measure theory to statistical​​ modeling, inference and data​​​‌ analysis.

Our goal is‌ to exploit methodological resources‌​‌ from statistics and machine​​ learning to develop models​​​‌ that handle variability and‌ that scale to high‌​‌ dimensional data while maintaining​​ our ability to assess​​​‌ their correctness, typically the‌ uncertainty associated with the‌​‌ provided solutions. To reach​​ this goal, the team​​​‌ offers a unique range‌ of expertise in statistics,‌​‌ combining probabilistic graphical models​​ and mixture models to​​​‌ analyze structured data, Bayesian‌ analysis to model knowledge‌​‌ and regularize ill-posed problems,​​ non-parametric statistics, risk modeling​​​‌ and extreme value theory‌ to face the lack,‌​‌ or impossibility, of precise​​ modeling information and data.​​​‌ In the team, this‌ expertise is organized to‌​‌ target five key challenges:​​

  • 1.
    Models for high​​​‌ dimensional, multimodal, heterogeneous data;‌
  • 2.
    Spatial (structured) data‌​‌ science;
  • 3.
    Scalable Bayesian​​ models and procedures;
  • 4.​​​‌
    Understanding mathematical properties of‌ statistical and machine learning‌​‌ methods;
  • 5.
    The big​​ problem of small data.​​​‌

The first two challenges address sources of complexity coming from data, namely the fact that observations can be: 1) high dimensional and collected from multiple sensors in varying conditions, i.e. multimodal and heterogeneous; and 2) inter-dependent, with a known structure between variables or with unknown interactions to be discovered. The other three challenges focus on providing reliable and interpretable models: 3) making the Bayesian approach scalable to handle large and complex data; 4) quantifying the information processing properties of machine learning methods; and 5) making it possible to draw reliable conclusions from datasets that are too small to be used for training machine/deep learning methods.

These challenges​​ rely on our four​​​‌ research axes:

  • 1.
    Models‌ for graphs and networks;‌​‌
  • 2.
    Dimension reduction and​​ latent variable modeling;
  • 3.​​​‌
    Bayesian modeling;
  • 4.
    Modeling‌ and quantifying extreme risk.‌​‌

In terms of applied​​ work, we will target​​​‌ high-impact applications in neuroimaging,‌ environmental and earth sciences.‌​‌

3 Research program

3.1​​ Models for graphs and​​​‌ networks

Participants: Jean-Baptiste Durand‌, Florence Forbes,‌​‌ Julyan Arbel, Sophie​​ Achard, Michel Dojat​​​‌, Julien Chevallier.‌

Keywords: graphical models, Markov‌​‌ properties, hidden Markov models,​​ clustering, missing data, mixture​​​‌ of distributions, EM algorithm,‌ image analysis, Bayesian inference.‌​‌

Graphs arise naturally as​​ versatile structures for capturing​​​‌ the intrinsic organization of‌ complex datasets. The literature‌​‌ on graphical modeling is​​ growing rapidly and covers​​​‌ a wide range of‌ applications, from bioinformatics to‌​‌ document modeling, image analysis,​​​‌ social network analysis, etc.​ When faced with multivariate,​‌ possibly high dimensional, data​​ acquired at different sites​​​‌ (or nodes) and structured​ according to an underlying​‌ network (or graph), the​​ objective is generally to​​​‌ understand the dependencies or​ associations present in the​‌ data so as to​​ provide a more accurate​​​‌ statistical analysis and a​ better understanding of the​‌ phenomenon under consideration.

Structure​​ learning.

This refers to the inference of the dependences between variables from observed samples. The limits of obtaining graph edges from sample correlations between nodes are well known. We have investigated alternative approaches, both Bayesian and frequentist: the former were mostly used to account for constraints on the structure, while for the latter we focused on robust modeling and estimation in the presence of outliers. We proposed a fast Bayesian structure learning method based on pre-screening of categorical variables, in the PhD thesis of T. Rahier with Schneider Electric. In the continuous variable case, we studied the design of tractable estimators and algorithms that can provide robust estimation of covariance structures. Many covariance estimation methods rely on the Gaussian graphical model, but a viable model for data contaminated by outliers requires more robust and complex procedures and is therefore more challenging to build. The problem of robust structure learning is especially acute in the high-dimensional setting, in which the number of variables p is of the same order as, or much larger than, the number of available observations n. We have investigated different ways to handle both of these issues, in order to provide models for applications such as modeling brain connectivity from functional magnetic resonance imaging (fMRI) data. Each brain region is associated with a time series, and the goal is to study the connectivity among these regions.
Interactions between the regions can be described by covariance or precision matrices that quantify the links between time series and can then be represented as graphs. We first proposed an approach, initiated with the PhD of K. Ashurbekova, to generalize the Gaussian approach to multivariate heavy-tailed distributions when the dimension is large relative to the number of observations. This encompasses methods related to shrinkage and M-estimators, for which we aimed at designing algorithms with proven convergence results and optimal values for shrinkage coefficients. Second, still motivated by the brain connectivity application, we investigated in the PhD of H. Lbath (QFunC project) the possibility of computing more subtle correlations between brain regions using a new notion of correlation of local averages. Finally, to go beyond the Gaussian assumption, we also investigated copula approaches and characterized graphical dependencies for multivariate counts, with potential applications to branching processes.
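As a rough illustration of how a precision matrix can be turned into a graph when p exceeds n, the following sketch uses plain linear shrinkage of the sample covariance toward a scaled identity, then thresholds the partial correlations. It is not the heavy-tailed M-estimation approach described above: the function names, the fixed shrinkage coefficient `rho` and the edge `threshold` are all assumptions chosen for the example.

```python
import numpy as np

def shrinkage_covariance(X, rho):
    """Linear shrinkage of the sample covariance toward a scaled identity.

    rho in (0, 1] is a hypothetical, hand-picked shrinkage coefficient.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance (singular if p > n)
    target = (np.trace(S) / p) * np.eye(p)  # scaled identity target
    return (1.0 - rho) * S + rho * target

def partial_correlation_graph(X, rho=0.2, threshold=0.1):
    """Partial correlations and a thresholded adjacency matrix."""
    sigma = shrinkage_covariance(X, rho)
    theta = np.linalg.inv(sigma)            # precision matrix
    d = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(d, d)         # partial correlation between nodes
    np.fill_diagonal(pcorr, 1.0)
    adjacency = np.abs(pcorr) > threshold
    np.fill_diagonal(adjacency, False)      # no self-loops
    return pcorr, adjacency

# Toy data with more variables (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
pcorr, adjacency = partial_correlation_graph(X)
```

Shrinking toward the identity keeps the estimate invertible even though the sample covariance is rank deficient here, which is what makes the precision matrix, and hence the graph, computable at all.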

Structure modelling.

Once the structure is identified, the next questions concern comparing the discovered graph structures with one another or with a reference graph. If the structure is not itself the object of interest, the goal is usually to account for it in a subsequent analysis. Except for simple graphs (chains or trees), this is problematic because mainstream statistical models and algorithms are based on the independence assumption and become intractable for even moderate graph sizes. The analysis of graphs as the objects of interest, with the design of tools to model and compare them, has been studied in the PhD of L. Carboni. We proposed new mathematical tools based on an equivalence relation between graph statistics in order to take into account the spatial location of the nodes. To account for dependences in a tractable way, we often rely on Markov modelling and variational inference. When dependence in time is considered, Gaussian processes are an interesting tractable tool. With the PhD of A. Constantin, we have investigated them in the context of a collaboration with INRAE and CNES in Toulouse, for the classification and reconstruction of irregularly sampled satellite image time series. The proposed approach is able to deal with irregular temporal sampling and missing data directly in the classification process. It is based on Gaussian processes and jointly performs the classification of the pixel labels and the reconstruction of the pixel time series. The method complexity scales linearly with the number of pixels, making it suitable for large-scale scenarios. In a different context, we have developed hidden semi-Markov models for the analysis of eye movements, in particular with the PhD of B. Olivier in collaboration with A. Guérin-Dugué (GIPSA-lab) and B. Lemaire (Laboratoire de Psychologie et Neurocognition). New coupling methods for hidden semi-Markov models driven by several underlying state processes have been proposed.

Structured anomaly​​​‌ detection.

The vast majority of deep learning architectures for medical image analysis are based on supervised models requiring the collection of large datasets of annotated examples. Building such annotated datasets, which requires skilled medical experts, is time consuming and hardly achievable, especially for some specific tasks, including the detection of small and subtle lesions that are sometimes impossible to detect visually and thus to outline manually. This critical aspect significantly impairs the performance of supervised models and hampers their deployment in clinical neuroimaging applications, especially for brain pathologies that require the detection of small lesions (e.g. multiple sclerosis, microbleeds) or subtle structural or morphological changes (e.g. Parkinson's disease). We have developed unsupervised anomaly detection methods based on generalized Student mixture models and deep statistical unsupervised learning models for the detection of early forms of Parkinson's disease. We have also compared parametric mixture approaches to non-parametric machine learning techniques for change detection in the context of time series analysis of glycemic curves for diabetes.

3.2 Dimension reduction‌​‌ and latent variable modeling​​

Participants: Jean-Baptiste Durand,​​​‌ Florence Forbes, Stephane‌ Girard, Julyan Arbel‌​‌, Pedro Luiz Coelho​​ Rodrigues.

Keywords: mixture​​​‌ of distributions, EM algorithm,‌ missing data, conditional independence,‌​‌ statistical pattern recognition, clustering,​​​‌ unsupervised and partially supervised​ learning.

Extracting information from​‌ raw data is a​​ complex task, all the​​​‌ more so as this​ information is measured in​‌ a high dimensional space.​​ Fortunately, this information usually​​​‌ lives in a subspace​ of smaller size. Identifying​‌ this subspace is crucial​​ but difficult. One approach​​​‌ is to perform appropriate​ changes of representation that​‌ facilitate the identification and​​ characterization of the desired​​​‌ subspace. Latent random variables​ are a key concept​‌ to encode in a​​ structured way representations that​​​‌ are easier to handle​ and capture the essential​‌ features of the data.​​

Regression in high dimensions.​​​‌

Methods adapted to high dimensions include inverse regression methods such as sliced inverse regression (SIR), partial least squares (PLS), and approaches based on mixtures of regressions with different variants, e.g. Gaussian locally linear mapping (GLLiM) and extensions, mixtures of experts, cluster-weighted models, etc. SIR-like methods are flexible in that they reduce the dimension in a way that is optimal for the subsequent regression task, which can itself be carried out by any desired regression tool. In that sense these methods are said to be non-parametric or semi-parametric, and they have the potential to provide robust procedures. We have also proposed a new approach, called Extreme-PLS, for dimension reduction in conditional extreme value settings, where the goal is to best explain the extreme values of the response variable.
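To make the SIR idea concrete, here is a minimal sketch of the basic slicing algorithm: standardize the covariates, average them within slices of the sorted response, and take the leading eigenvectors of the covariance of those slice means. The function name, the number of slices and the toy single-index model are assumptions for illustration, and the sketch does not cover the regularized or extreme-value variants mentioned above.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Basic SIR: estimate directions b such that y depends on X through X @ b."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    # Whitening matrix cov^{-1/2} from the eigen-decomposition of cov.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the observations according to the sorted response.
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)             # slice mean of whitened covariates
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original scale.
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_directions]
    directions = inv_sqrt @ top
    return directions / np.linalg.norm(directions, axis=0)

# Toy single-index model: y depends on X only through its first coordinate.
rng = np.random.default_rng(1)
n, p = 2000, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 1.0
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(n)
b_hat = sliced_inverse_regression(X, y)[:, 0]
```

The estimated direction is only identified up to sign, and any regression tool can then be applied to the one-dimensional projection `X @ b_hat`.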

Simulation-based inference​​ (SBI) for high dimensional​​​‌ inverse problems.

To account for uncertainty in a principled manner, we also considered Bayesian inversion techniques. We investigated the use of learning approaches to handle Bayesian inverse problems in a computationally efficient way when the observations to be inverted are moderately high dimensional and numerous. We proposed tractable inverse regression approaches based on GLLiM and normalizing flows. They have the advantage of producing full probability distributions as approximations of the target posterior distributions. These distributions have several interesting features. They provide confidence indices on the predictions and can be combined with importance sampling or approximate Bayesian computation (ABC) schemes for a better exploration when multiple equivalent solutions exist. They generalise easily to variants that can handle non-Gaussian data and dependent or missing observations. The relevance of the proposed approach has been illustrated on synthetic examples and on two real data applications, in the context of planetary remote sensing and neuroimaging. In addition, we addressed the issue of model selection for some of the GLLiM models, i.e. mixture of experts (MoE) models, and contributed a number of theoretical results.

Online and incremental​​ inference.

Most SBI methods scale poorly when the number of observations is too large, which makes them unsuitable for modern data, which are often acquired in real time, incrementally, and in large volumes. Computing inferential quantities incrementally may be imposed by the nature of data acquisition (e.g. streaming and sequential data) but may also be seen as a way to handle larger data volumes in a more resource-friendly way with respect to memory, energy, and time consumption. To produce feasible and practical online algorithms for streaming data and complex models, we have investigated the family of stochastic approximation (SA) algorithms combined with the class of majorization-minimization (MM) and expectation-maximization (EM) algorithms for a certain class of models, e.g. exponential family distributions and their mixtures.
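The combination of stochastic approximation with EM can be sketched on a deliberately simple case: a univariate Gaussian mixture with unit variances, where each incoming observation updates running averages of the expected sufficient statistics. This is a generic online EM sketch under stated assumptions, not the team's algorithm; the function name, the step-size schedule `(t + 1) ** (-step_exponent)` and the burn-in length are hypothetical choices.

```python
import numpy as np

def online_em_gmm(stream, mu_init, step_exponent=0.6, burn_in=20):
    """Online EM for a 1-D Gaussian mixture with unit variances (assumption).

    Each observation is seen once; only running sufficient statistics are stored.
    """
    mu = np.asarray(mu_init, dtype=float).copy()
    K = mu.size
    weights = np.full(K, 1.0 / K)
    s0 = weights.copy()          # running average of responsibilities
    s1 = weights * mu            # running average of responsibility * x
    for t, x in enumerate(stream, start=1):
        # E-step on a single observation (log-domain for stability).
        log_r = np.log(weights) - 0.5 * (x - mu) ** 2
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        # Stochastic approximation update of the sufficient statistics.
        gamma = (t + 1) ** (-step_exponent)
        s0 += gamma * (r - s0)
        s1 += gamma * (r * x - s1)
        # M-step: closed-form parameter update, skipped during burn-in.
        if t > burn_in:
            weights = s0 / s0.sum()
            mu = s1 / s0
    return weights, mu

# Stream drawn from 0.3 * N(-2, 1) + 0.7 * N(2, 1).
rng = np.random.default_rng(2)
n = 20000
z = rng.random(n) < 0.7
data = np.where(z, 2.0, -2.0) + rng.standard_normal(n)
weights, mu = online_em_gmm(data, mu_init=[-1.0, 1.0])
```

Memory use is constant in the number of observations, which is the property that makes this family of updates attractive for streaming data.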

3.3 Bayesian‌ modelling

Participants: Julyan Arbel‌​‌, Florence Forbes,​​ Jean-Baptiste Durand, Pedro​​​‌ Coelho Rodrigues.

Keywords:‌ Bayesian statistics, Bayesian nonparametrics,‌​‌ Markov Chain Monte Carlo,​​ Experimental design, Bayesian neural​​​‌ networks, Approximate Bayesian Computation.‌

Bayesian methods have become‌​‌ the center of attraction​​ to model the underlying​​​‌ uncertainty of statistical models.‌ Bayesian models and methods‌​‌ are already used in​​ all of our other​​​‌ axes, whenever the Bayesian‌ choice provides interesting features,‌​‌ e.g. for model selection,​​ dependence modeling (copulas), inverse​​​‌ problems, etc. This axis‌ emphasizes more specifically our‌​‌ theoretical and methodological research​​ in Bayesian learning. In​​​‌ particular, we will focus‌ on techniques referred to‌​‌ as Bayesian nonparametrics (BNP).​​

Markov priors for Bayesian​​​‌ nonparametric models.

We have proposed Bayesian nonparametric priors for hidden Markov random fields, first for continuous, Gaussian observations, with an illustration in image segmentation, and second for discrete observed data typically arising from counts, e.g. Poisson-distributed observations, with an illustration on a risk mapping model. Inference was carried out by Variational Bayesian Expectation Maximization (VBEM).

Asymptotic properties‌​‌ of BNP models.

A common way to assess a Bayesian procedure is to study the asymptotic behavior of posterior distributions, that is, their ability to estimate a true distribution when the number of observations grows. Mixture models have attracted a lot of attention in the last decade due to some negative results regarding the number of clusters. More specifically, it was shown that Bayesian nonparametric mixture models are inconsistent for some choices of priors. We proposed ways to compute the prior distribution of the number of clusters. This is a notoriously difficult task, and we proposed approximations in order to enable such computations for real-world applications. We studied and justified BNP models based on their asymptotic properties. We showed that mixture models based on many different BNP processes are inconsistent in the number of clusters and discussed possible solutions. Notably, we showed that a post-processing algorithm introduced for the simplest process (Dirichlet process) extends to more general models and provides a consistent method to estimate the number of components.
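For the simplest case, the Dirichlet process, the prior distribution of the number of clusters can be computed exactly with the sequential (Chinese restaurant) construction: with m observations already clustered, the next one opens a new cluster with probability alpha / (alpha + m). The sketch below only covers this case; the function name and the concentration parameter `alpha` are assumptions for illustration, and more general BNP priors require the approximations mentioned above.

```python
import numpy as np

def dp_prior_num_clusters(n, alpha):
    """Prior P(K_n = k) for k = 0..n under a Dirichlet process, for n >= 1."""
    probs = np.zeros(n + 1)
    probs[1] = 1.0                       # the first observation opens one cluster
    for m in range(1, n):
        p_new = alpha / (alpha + m)      # probability of opening a new cluster
        updated = np.zeros(n + 1)
        updated[1:] = probs[1:] * (1.0 - p_new) + probs[:-1] * p_new
        probs = updated
    return probs

probs = dp_prior_num_clusters(n=100, alpha=1.0)
expected_k = np.sum(np.arange(101) * probs)
```

The prior mean equals the sum of alpha / (alpha + i) for i from 0 to n - 1, so it grows only logarithmically with the sample size.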

Amortized Approximate Bayesian‌ computation.

Approximate Bayesian computation (ABC) has become an essential part of the Bayesian toolbox for addressing problems in which the likelihood is prohibitively expensive or entirely unknown. A key ingredient in ABC is the choice of a discrepancy that describes how different the simulated and observed data are, often based on a set of summary statistics when the data cannot be compared directly. The choice of appropriate discrepancies is an active research topic, which has mainly considered data discrepancies requiring samples of observations or distances between summary statistics. We first investigated sample-based discrepancies and established new asymptotic results using so-called energy-based distances. We then considered a summary-based approach and proposed a new ABC procedure that can be seen as an extension of the semi-automatic ABC framework to a functional summary statistics setting and can also be used as an alternative to sample-based approaches. The resulting ABC approach also exhibits amortization properties via the use of the GLLiM inverse regression model.
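The role of the discrepancy is easiest to see in the plain rejection version of ABC, sketched below with a scalar summary statistic: parameters are drawn from the prior, data are simulated, and only the draws whose simulated summary is closest to the observed one are kept. This is the textbook scheme, not the GLLiM-based or energy-distance procedures described above; the function name, the `keep_fraction` tolerance rule and the toy Gaussian model are assumptions.

```python
import numpy as np

def rejection_abc(observed, prior_sampler, simulator, summary,
                  n_draws, keep_fraction, rng):
    """Rejection ABC with a scalar summary statistic."""
    s_obs = summary(observed)
    thetas = prior_sampler(n_draws, rng)
    # Discrepancy: absolute difference between simulated and observed summaries.
    distances = np.array(
        [abs(summary(simulator(theta, rng)) - s_obs) for theta in thetas]
    )
    # Tolerance chosen as an empirical quantile of the distances.
    eps = np.quantile(distances, keep_fraction)
    return thetas[distances <= eps]

# Toy problem: infer the mean of a unit-variance Gaussian from 50 observations.
rng = np.random.default_rng(3)
observed = rng.normal(1.5, 1.0, size=50)
accepted = rejection_abc(
    observed,
    prior_sampler=lambda m, r: r.uniform(-5.0, 5.0, size=m),
    simulator=lambda theta, r: r.normal(theta, 1.0, size=50),
    summary=np.mean,
    n_draws=5000,
    keep_fraction=0.02,
    rng=rng,
)
```

The accepted draws form an approximate posterior sample. Every new observed dataset requires a fresh round of simulations here, which is the cost that amortized approaches aim to avoid.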

Bayesian​​ neural networks.

The connection between Bayesian neural networks and Gaussian processes has gained a lot of attention in the last few years, with the flagship result that hidden units converge to a Gaussian process limit when the layer width tends to infinity. Underpinning this result is the fact that hidden units become independent in the infinite-width limit. Our aim is to shed some light on the dependence properties of hidden units in practical finite-width Bayesian neural networks. In addition to theoretical results, we assessed empirically the impact of depth and width on these dependence properties. Recent work has suggested that finite Bayesian neural networks may outperform their infinite counterparts because they adapt their internal representations flexibly. To establish solid ground for future research on finite-width neural networks, our goal is to study the prior induced on hidden units. Our main result is an accurate description of the tails of hidden units, which shows that unit priors become heavier-tailed with depth, thanks to the introduced notion of generalized Weibull-tail. This finding sheds light on the behavior of hidden units of finite Bayesian neural networks.
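The heavier tails at greater depth can be observed numerically with a small Monte Carlo experiment, sketched here under simplifying assumptions: independent standard Gaussian weights scaled by the square root of the fan-in, no biases, ReLU activations, and a fixed input. The excess kurtosis of one pre-activation per layer is used as a crude proxy for tail heaviness. The function names and all settings are hypothetical, and the experiment illustrates the phenomenon rather than the generalized Weibull-tail characterization itself.

```python
import numpy as np

def sample_preactivations(x, depth, width, n_networks, rng):
    """Sample one pre-activation per layer from the prior, for many networks."""
    h = np.tile(x, (n_networks, 1))                 # (n_networks, input_dim)
    units = []
    for _ in range(depth):
        fan_in = h.shape[1]
        # Independent Gaussian weights for every network, scaled by fan-in.
        W = rng.standard_normal((n_networks, fan_in, width)) / np.sqrt(fan_in)
        g = np.einsum("ni,nio->no", h, W)           # pre-activations
        units.append(g[:, 0])                       # keep the first unit
        h = np.maximum(g, 0.0)                      # ReLU
    return units

def excess_kurtosis(v):
    """Sample excess kurtosis (zero for a Gaussian distribution)."""
    c = v - v.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0

rng = np.random.default_rng(6)
units = sample_preactivations(np.ones(3), depth=3, width=5,
                              n_networks=50000, rng=rng)
kurt = [excess_kurtosis(u) for u in units]
```

With a fixed input, first-layer units are exactly Gaussian, so their excess kurtosis is close to zero, while deeper units are scale mixtures of Gaussians with positive excess kurtosis. Increasing `width` shrinks the effect, in line with the Gaussian process limit.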

3.4​​​‌ Modelling and quantifying extreme​ risk

Participants: Julyan Arbel​‌, Stephane Girard,​​ Florence Forbes, Sophie​​​‌ Achard, Jonathan El​ Methni.

Keywords: dimension​‌ reduction, extreme value analysis,​​ functional estimation.

Extreme events have a major impact on a wide variety of domains, from environmental sciences (heat waves, flooding) and reliability to finance and insurance (financial crashes, reinsurance). While usual statistical approaches focus on modeling the bulk of the distribution, extreme-value analysis, a relatively recent domain in statistics, aims at building models adapted to distribution tails, where, by nature, observations are rare.

Extreme quantile estimation.

One of the most popular risk measures is the Value-at-Risk (VaR) introduced in the 1990s. In statistical terms, the VaR at level $\alpha \in (0,1)$ corresponds to the upper $\alpha$-quantile of the loss distribution. We have proposed estimators and studied their theoretical properties for extreme quantiles, that is when $\alpha \to 0$. We have also investigated the Weissman extrapolation device for estimating extreme quantiles from heavy-tailed distributions. This is based on two estimators: an order statistic to estimate an intermediate quantile and an estimator of the tail-index. The common practice is to select the same intermediate sequence for both estimators. We showed how an adapted choice of two different intermediate sequences leads to a reduction of the asymptotic bias associated with the resulting refined Weissman estimator. This new bias reduction method is fully automatic and does not involve the selection of extra parameters.
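The two ingredients of the Weissman device can be sketched as follows: the Hill estimator provides the tail-index, and an intermediate order statistic is extrapolated to the extreme level. The two intermediate sequences are exposed as separate arguments `k_quantile` and `k_tail`, but the sketch does not implement the automatic, bias-reducing choice described above. Function names and the Pareto test case are assumptions for illustration.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the tail-index from the k largest observations."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1])

def weissman_quantile(x, alpha, k_quantile, k_tail):
    """Weissman estimate of the upper alpha-quantile (alpha small).

    k_quantile selects the intermediate order statistic, k_tail the number of
    upper order statistics used for the tail-index; both are user-chosen here.
    """
    n = x.size
    xs = np.sort(x)
    anchor = xs[n - k_quantile - 1]          # estimates the upper (k/n)-quantile
    gamma = hill_estimator(x, k_tail)
    return anchor * (k_quantile / (n * alpha)) ** gamma

# Pareto sample with tail-index 0.5, whose upper alpha-quantile is alpha ** -0.5.
rng = np.random.default_rng(4)
n, gamma_true = 5000, 0.5
x = (1.0 - rng.random(n)) ** (-gamma_true)
alpha = 1.0 / n
q_hat = weissman_quantile(x, alpha, k_quantile=500, k_tail=500)
q_true = alpha ** (-gamma_true)
```

Passing the same value for both arguments recovers the common practice mentioned above, while distinct values give a place to plug in a refined choice of intermediate sequences.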

New‌​‌ measures of extreme risk.​​

A simple way to assess the (environmental, industrial or financial) risk is to compute a measure linked to the value of the phenomenon of interest (rainfall height, wind speed, river flow). Candidate measures include quantiles (which correspond to traditional Value at Risk or return levels), expectiles, tail conditional moments, spectral risk measures, distortion risk measures, etc. We have mainly focused on the first two measures, quantiles and expectiles, and investigated estimation procedures for extensions of these measures. The main drawback of quantiles is that they do not provide a coherent risk measure. Two distributions may have the same extreme quantile but very different tail behaviors. Moreover, standard estimators do not use the most extreme values of the sample and consequently induce a loss of information. Our strategy was to adapt the definition of quantiles to take into account the whole distribution tail.

We have introduced new measures of extreme risk based on $L^p$-quantiles, which encompass both expectiles and quantiles. We believe this generalization of the concept of extreme quantile to extreme $L^p$-quantile opens promising new research directions. We first explored to what extent univariate extreme-value estimators can be improved on the basis of these novel $L^p$-quantiles. We built tractable estimators of these quantities with guaranteed theoretical properties.
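To make the link between quantiles, expectiles and L^p-quantiles concrete, here is a minimal sketch of a sample L^p-quantile, assuming the usual definition as the minimiser of an asymmetric power loss. The function names and the ternary-search solver are illustrative choices, not the estimators studied by the team.

```python
def lp_loss(q, sample, tau, p):
    """Empirical loss: mean of |tau - 1{y <= q}| * |y - q|**p."""
    total = 0.0
    for y in sample:
        weight = (1.0 - tau) if y <= q else tau
        total += weight * abs(y - q) ** p
    return total / len(sample)


def sample_lp_quantile(sample, tau, p, iters=200):
    """Minimise the loss (convex for p >= 1) by ternary search on [min, max]."""
    lo, hi = min(sample), max(sample)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if lp_loss(m1, sample, tau, p) <= lp_loss(m2, sample, tau, p):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


data = [1.0, 2.0, 3.0, 4.0, 10.0]
median = sample_lp_quantile(data, tau=0.5, p=1)   # p = 1 gives a quantile
mean = sample_lp_quantile(data, tau=0.5, p=2)     # p = 2 gives an expectile
upper = sample_lp_quantile(data, tau=0.9, p=1.5)  # intermediate value of p
```

With `tau=0.5`, the case `p=1` recovers the sample median and `p=2` the sample mean, which shows how the family interpolates between the two risk measures.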

Extremes with covariates.

A‌​‌ second challenge was to​​ extend this concept to​​​‌ the regression framework where‌ the variable of interest‌​‌ depends on a set​​ of covariates. When the​​​‌ number of covariates is‌ large, two research directions‌​‌ have been explored to​​ overcome the curse of​​​‌ dimensionality: 1) we designed‌ a dimension reduction method‌​‌ for the extreme-value context,​​ 2) we also considered​​​‌ semi-parametric models to reduce‌ the complexity of the‌​‌ fitted model.

Another challenge with expectiles is that their sample versions do not have a simple explicit form, making their analysis significantly harder than that of quantiles and order statistics. This difficulty is compounded when one wishes to integrate auxiliary information about the phenomenon of interest through a finite-dimensional covariate, in which case the problem becomes the estimation of conditional expectiles. We exploited the fact that the expectiles of a distribution are the quantiles of another distribution explicitly linked to the former, in order to construct nonparametric kernel estimators of extreme conditional expectiles. We analyzed the asymptotic properties of our estimators in the context of conditional heavy-tailed distributions. The extension to functional covariates was also investigated. Since quantiles and expectiles belong to the wider family of $L^p$-quantiles, we also proposed kernel estimators of extreme conditional $L^p$-quantiles. We studied their asymptotic properties in the context of conditional heavy-tailed distributions and showed through a simulation study that taking $p \in (1,2)$ may allow one to recover extreme conditional quantiles and expectiles accurately.
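The statement that expectiles are quantiles of another distribution can be made explicit in the unconditional case. The short derivation below is a sketch under assumed notation ($Y$ with finite mean, $\xi_\tau$ its expectile of level $\tau$, and $G$ a name introduced here); it starts from the first-order condition of the asymmetric least-squares problem that defines the expectile.

```latex
% Assumed notation: Y has a finite mean, \xi_\tau is its expectile of level \tau.
\[
(1-\tau)\,\mathbb{E}\big[(\xi_\tau - Y)_+\big] = \tau\,\mathbb{E}\big[(Y - \xi_\tau)_+\big]
\quad\Longleftrightarrow\quad
\tau = \frac{\mathbb{E}\big[(\xi_\tau - Y)_+\big]}{\mathbb{E}\,|Y - \xi_\tau|} =: G(\xi_\tau).
\]
% The numerator of G is nondecreasing in its argument and the complementary term
% \mathbb{E}[(Y - y)_+] is nonincreasing, so G increases from 0 to 1:
% G is a distribution function and \xi_\tau is its quantile of level \tau.
```

In the conditional setting, the same identity holds with expectations taken given the covariate, which is what makes quantile-type kernel estimators applicable.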

We built​‌ a general theory for​​ the estimation of extreme​​​‌ conditional expectiles in heteroscedastic​ regression models with heavy-tailed​‌ noise. Our approach is​​ supported by general results​​​‌ of independent interest on​ residual-based extreme value estimators​‌ in heavy-tailed regression models,​​ and is intended to​​​‌ cope with covariates having​ a large but fixed​‌ dimension. We demonstrated how​​ our results could be​​​‌ applied to a wide​ class of important examples,​‌ among which linear models,​​ single-index models as well​​​‌ as ARMA and GARCH​ time series models.

Extremes​‌ and machine learning.

This is the topic of a more recent collaboration with E. Gobet from CMAP. Feedforward neural networks based on Rectified Linear Units (ReLU) cannot efficiently approximate quantile functions that are unbounded, especially in the case of heavy-tailed distributions. We have thus proposed a new parametrization for the generator of a Generative Adversarial Network (GAN) adapted to this framework, based on extreme-value theory. We provided an analysis of the uniform error between the extreme quantile and its GAN approximation. It appears that the rate of convergence of the error is mainly driven by the second-order parameter of the data distribution. A similar investigation has been conducted to simulate fractional Brownian motion with ReLU neural networks.

4 Application​​​‌ domains

4.1 Image Analysis​

Participants: Florence Forbes,​‌ Jean-Baptiste Durand, Stephane​​ Girard, Pedro Coelho​​​‌ Rodrigues, Sophie Achard​, Michel Dojat.​‌

As regards applications, several​​ areas of image analysis​​​‌ can be covered using​ the tools developed in​‌ the team. More specifically,​​ we have addressed various​​​‌ issues in computer vision​ involving Bayesian modelling and​‌ probabilistic clustering techniques. Other​​ applications in medical imaging​​​‌ are natural. We work​ more specifically on MRI​‌ and functional MRI data,​​ in collaboration with the​​​‌ Grenoble Institute of Neuroscience​ (GIN). We also consider​‌ other statistical 2D fields​​ coming from other domains​​​‌ such as remote sensing,​ in collaboration with the​‌ Institut de Planétologie et​​ d'Astrophysique de Grenoble (IPAG)​​​‌ and the Centre National​ d'Etudes Spatiales (CNES).

4.2​‌ Biology, Environment and Medicine​​

Participants: Florence Forbes,​​​‌ Stephane Girard, Jean-Baptiste​ Durand, Julyan Arbel​‌, Sophie Achard,​​ Pedro Coelho Rodrigues,​​​‌ Julien Chevallier, Michel​ Dojat, Jonathan El​‌ Methni.

A second domain of applications concerns biology and medicine. We considered the use of mixture models to identify biomarkers. We also investigated statistical tools for the analysis of fluorescence signals in molecular biology. Applications in neurosciences are also considered. In the environmental domain, we considered the modelling of high-impact weather events and the use of hyperspectral data as a new tool for quantitative ecology.

5 Social​​ and environmental responsibility

5.1​​ Footprint of research activities​​​‌

The footprint of our‌ research activities has not‌​‌ been assessed yet. Most​​ of the team members​​​‌ have validated the “charte‌ d'éco-responsabilité” written by a‌​‌ working group from Laboratoire​​ Jean Kuntzmann, which should​​​‌ have practical implications in‌ the near future.

5.2‌​‌ Impact of research results​​

Many of our developments are motivated by and target applications in medicine and environmental sciences. As such, they have a social impact through better handling and treatment of patients, in particular those with brain diseases or disorders. On the environmental side, our work has an impact on geoscience-related decision making, for example through extreme event risk analysis, planetary science studies and tools to assess biodiversity markers. However, how to truly measure and report this impact in practice is a question we have not yet addressed.

6 Latest‌ software developments, platforms, open‌​‌ data

6.1 Latest software​​ developments

6.1.1 Planet-GLLiM

  • Name:​​​‌
    Planet-GLLiM
  • Keyword:
    Inverse problem‌
  • Functional Description:
    The application implements the GLLiM statistical learning technique in its different variants for the inversion of a physical model of reflectance on spectro-(gonio)-photometric data. The latter are of two types: 1. laboratory measurements of reflectance spectra acquired according to different illumination and viewing geometries, and 2. 4D spectro-photometric remote sensing products from multi-angular CRISM or Pléiades acquisitions.
  • URL:​​​‌
  • Publications:
  • Contact:
    Sylvain Douté‌​‌
  • Participant:
    5 anonymous participants​​
  • Partner:
    Institut de Planétologie​​​‌ et d’Astrophysique de Grenoble‌

6.1.2 xLLiM (Kernelo)

  • Name:‌​‌
    xLLiM
  • Keywords:
    Inverse problem,​​ Clustering, Regression, Gaussian mixture,​​​‌ Python, C++
  • Scientific Description:‌
    Building a regression model for the purpose of prediction is widely used in all disciplines. A large number of applications consist of learning the association between responses and predictors and focusing on predicting responses for newly observed samples. In this work, we go beyond simple linear models and focus on predicting low-dimensional responses using high-dimensional covariates when the associations between responses and covariates are non-linear.
  • Functional Description:‌​‌
    xLLiM is a Gaussian Locally-Linear Mapping (GLLiM) solver. xLLiM provides a C++ library with Python bindings for non-linear mapping (non-linear regression) using a mixture-of-regressions model and an inverse regression strategy. The methods include the GLLiM model (Deleforge et al., 2015) based on Gaussian mixtures.
  • URL:
  • Publications:​​
  • Contact:
    Florence Forbes‌
  • Participant:
    6 anonymous participants‌​‌
  • Partner:
    Institut de Planétologie​​ et d’Astrophysique de Grenoble​​​‌

7 New results

7.1‌ Models for graphs and‌​‌ networks

7.1.1 Leaf Area​​ estimation and Semantic segmentation​​​‌ of forest point clouds‌ using neural networks.

Participants:‌​‌ Jean-Baptiste Durand, Florence​​ Forbes.

Joint work​​​‌ with: Grégoire Vincent‌ and Yuchen Bai, IRD,‌​‌ AMAP, Montpellier, France.

Tropical forests, covering only 7% of the Earth's land surface, play a disproportionately vital role in the biosphere, storing 25% of the terrestrial carbon and contributing to over a third of the global terrestrial productivity. They also recycle about a third of the precipitation through evapotranspiration and thus help generate and maintain a humid climate regionally, with positive effects extending well beyond the tropics. However, the seasonal variability in fluxes between tropical rainforests and the atmosphere is still poorly understood. A better understanding of the processes underlying flux seasonality in tropical forests is thus critical to improving our ability to predict global biogeochemical cycles. Leaf area index (LAI), a key parameter governing water and carbon fluxes, is inadequately characterised, necessitating advances in monitoring technologies such as aerial and terrestrial laser scanning (LiDAR). In this work, we address key challenges in quantifying leaf area in tropical forests using LiDAR technology.

In previous work, we developed an end-to-end Deep Learning approach for semantic segmentation of Unmanned Aerial Vehicle (UAV) Laser Scans (ULS) in the presence of two classes: wood and leaves. This approach is referred to as SOUL and was published at NeurIPS 2023.

A remaining challenge was the analysis of the various sources of uncertainty and bias that affect LAI estimation from LiDAR surveys. These biases include limitations in sensor sensitivity (censoring), unknown clumping of targets, inadequate weighting of multiple LiDAR returns, unknown leaf angle distribution, leaf size, and the presence of woody components within the canopy. Since there is currently no efficient and comprehensive method to obtain the true LAI of a forest plot, the study uses simulated ULS data generated by the DART software based on two forest mock-ups: Wytham Woods and RAMI-V Järvselja Birch Stand. The simulated data mimic the characteristics of real ULS data while providing full access to details about the forest, particularly the LAI. Among the various biases, woody components pose a unique challenge because the structure of woody organs is inherently different from the other sources of bias. Therefore, our approach prioritises addressing this bias in order to isolate and understand the individual contributions of the other sources of bias in LAI estimation. To eliminate the impact of woody components, we propose a robust protocol that combines the SOUL method with AMAPVox, a ray tracing software. Once the woody component bias is removed, a quantitative analysis of the remaining biases is conducted, laying the foundation for future work in this area.

7.1.2 Graph modelling​​​‌ for the study of​ language dynamics

Participants: Sophie​‌ Achard.

Joint work​​ with: Clément Guichet,​​​‌ Monica Bacciu and Martial​ Mermillod from LPNC, Univ.​‌ Grenoble Alpes.

In 21, we worked on lifespan oscillatory dynamics in lexical production. Lexical production performance has been associated with cognitive control demands, which increase with age to support efficient semantic access, thus suggesting an interplay between a domain-general and a language-specific component. Current neurocognitive models suggest that Default Mode Network (DMN) and Fronto-Parietal Network (FPN) connectivity may drive this interplay, impacting the trajectory of production performance with a pivotal shift around midlife. However, the corresponding time-varying architecture still needs clarification. Here, we leveraged MEG resting-state data from healthy adults aged 18–88 years from a CamCAN population-based sample. We found that DMN-FPN dynamics shift from anterior-ventral to posterior-dorsal states until midlife to mitigate word-finding challenges, concurrent with heightened alpha-band oscillations. Specifically, sensorimotor integration along this posterior path could facilitate cross-talk with lower-level circuitry as dynamic information flow with more anterior, higher-order cognitive states gets compromised. This suggests a bottom-up, exploitation-based form of cognitive control in the aging brain, highlighting the interplay between abstraction, control, and perceptive-motor systems in preserving lexical production.

7.1.3​​​‌ Link between Graphs and‌ artificial neural networks

Participants:‌​‌ Sophie Achard, Lucrezia​​ Carboni.

Joint work​​​‌ with: Michel Dojat‌ from GIN, Univ. Grenoble‌​‌ Alpes

Artificial neural networks are prone to being fooled by carefully perturbed inputs that cause an egregious misclassification. These adversarial attacks have been the focus of extensive research, as have ways to detect and defend against them. In 17, we introduce a novel approach to the detection and interpretation of adversarial attacks from a graph perspective. For an input image, we compute an associated sparse graph using the layer-wise relevance propagation algorithm (Bach et al., 2015). Specifically, we only keep the edges of the neural network with the highest relevance values. Three quantities are then computed from the graph and compared with those computed from the training set. The result of the comparison is a classification of the image as benign or adversarial. To make the comparison, two classification methods are introduced: (1) an explicit formula based on the Wasserstein distance applied to the node degrees and (2) a logistic regression. Both classification methods produce strong results, which leads us to believe that a graph-based interpretation of adversarial attacks is valuable.
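The degree-based comparison can be illustrated with a toy sketch. Everything below is an assumption made for illustration: relevance scores are hard-coded rather than produced by layer-wise relevance propagation, the function names are hypothetical, and only the Wasserstein distance between degree samples is shown, not the explicit decision formula of the paper.

```python
def top_relevance_edges(relevance, keep):
    """Keep the `keep` edges with the highest relevance values.

    `relevance` maps an edge (u, v) to a relevance score.
    """
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:keep]]


def node_degrees(edges, nodes):
    """Degrees of all nodes in the sparse graph, sorted in ascending order."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


nodes = ["a", "b", "c", "d"]
reference = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.7, ("a", "d"): 0.1}
benign = {("a", "b"): 0.8, ("b", "c"): 0.9, ("c", "d"): 0.6, ("a", "c"): 0.2}
suspect = {("a", "b"): 0.9, ("a", "c"): 0.8, ("a", "d"): 0.7, ("b", "c"): 0.1}

ref_degrees = node_degrees(top_relevance_edges(reference, 3), nodes)
benign_distance = wasserstein_1d(
    node_degrees(top_relevance_edges(benign, 3), nodes), ref_degrees
)
suspect_distance = wasserstein_1d(
    node_degrees(top_relevance_edges(suspect, 3), nodes), ref_degrees
)
```

In this toy case the benign input keeps the same chain-like graph as the reference, while the suspect input concentrates relevance on a single hub, which shows up as a larger degree distance.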

7.1.4​​ Benchmark for graph inference​​​‌

Participants: Sophie Achard,‌ Alice Chevaux, Ali‌​‌ Fakhar.

Joint work​​ with: Kevin Polisano,​​​‌ CNRS and Irène Gannaz,‌ Grenoble-INP.

In a series of papers 30, 28, 29, we study the generation of theoretical correlation matrices with specific sparsity patterns associated with graph structures. We present a novel approach based on convex optimization, offering greater flexibility than existing techniques, notably by controlling the mean of the entry distribution in the generated correlation matrices. This allows for the generation of correlation matrices that better represent realistic data and can be used to benchmark statistical methods for graph inference.

7.1.5 Graphs for​​ coma patients

Participants: Sophie​​​‌ Achard, Michel Dojat‌, Arturo Cabrera Vazquez‌​‌.

Joint work with​​: Stein Silva, CHU​​​‌ Toulouse.

During the first year of Arturo Cabrera Vazquez's PhD, we developed several approaches to characterize the brain connectivity of coma patients. The originality of the work is to use multimodal data combining both fMRI and PET TSPO, with new graph methods to combine graphs from the two modalities. This work was presented at several conferences 64, 63, 62.

7.1.6​​ Biological neural network

Participants:​​​‌ Julien Chevallier.

Joint‌ work with: Eva‌​‌ Löcherbach from Paris 1,​​​‌ Guilherme Ost from UFRJ.​

The main objective is to estimate the connectivity parameter $p$ of a biological neural network based only on the observation of the action potentials of $N$ neurons over $T$ time units. In our main result, we show that $p$ can be estimated with rate $N^{-1/2} + N^{1/2}/T + (\log(T)/T)^{1/2}$ through an easy-to-compute estimator. Our analysis relies on a precise study of the spatio-temporal decay of correlations of the interacting chains. This is done through the study of coalescing random walks defining a backward regeneration representation of the system.
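To get a feel for how the number of neurons and the observation horizon interact in this rate, the three terms can simply be evaluated numerically. The helper below is an illustrative sketch (the function name and the chosen values are assumptions); it only evaluates the stated rate and does not implement the estimator.

```python
import math


def estimation_rate(n_neurons, horizon):
    """Evaluate N^{-1/2} + N^{1/2}/T + (log(T)/T)^{1/2}."""
    return (
        n_neurons ** -0.5
        + math.sqrt(n_neurons) / horizon
        + math.sqrt(math.log(horizon) / horizon)
    )


# Longer observation helps for a fixed network size, and the middle term
# shows that T must grow faster than the square root of N.
for n_neurons, horizon in [(100, 100), (100, 10_000), (10_000, 10_000)]:
    print(n_neurons, horizon, estimation_rate(n_neurons, horizon))
```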

7.1.7​​​‌ Community detection for binary​ graphical models in high​‌ dimension

Participants: Julien Chevallier​​.

Joint work with​​​‌: Guilherme Ost from​ UFRJ.

The main objective is to find the two communities (one excitatory and one inhibitory) based on the observation of the action potentials of $N$ neurons over $T$ time units. More specifically, we propose a simple algorithm for which the probability of exact recovery converges to 1 as long as $(N/T^{1/2}) \log(NT) \to 0$ as $T$ and $N$ diverge. Interestingly, this simple algorithm does not require any prior knowledge of the other model parameters (e.g. the edge probability $p$).

7.1.8​​​‌ Contrastive Normalizing Flows for​ anomaly detection in Engineering​‌ Structures

Participants: Florence Forbes​​, Brice Marc.​​​‌

Joint work with:​ Philippe Fouchier and Pierre​‌ Charbonnier from CEREMA endsum,​​ Strasbourg.

Among unsupervised anomaly​​​‌ detection methods in the​ context of civil engineering​‌ (CE) monitoring, those using​​ Normalizing Flows (NF) have​​​‌ reached state-of-the-art performance. Using​ only defect-free images, they​‌ learn to detect anomalies​​ as elements departing from​​​‌ the healthy parts distribution.​ In this work, we​‌ propose to increase the​​ discriminative power of these​​​‌ methods by leveraging the​ possibility to produce synthetic​‌ anomalies. Starting with CFlow-AD,​​ one of the best-performing​​​‌ NF-based methods, we augment​ its loss with different​‌ complementary learning objectives using​​ anomalies generated by POISSON​​​‌ interpolation. In this work​ 32, we demonstrate​‌ the interest of these​​ new augmented losses on​​​‌ several CE-related datasets.

7.1.9​ Coupled hidden Markov and​‌ semi-Markov processes

Participants: Jean-Baptiste​​ Durand.

Joint work​​​‌ with: Hanna Bacave,​ Nathalie Peyrard, Sandra Plancade​‌ and Régis Sabbadin from​​ MIAT INRAE - Unité​​​‌ de Mathématiques et Informatique​ Appliquées de Toulouse; Alain​‌ Franc from Biogeco INRAE,​​ Bordeaux.

The concept of multichain (H)SMM has not yet been rigorously formalized, even if a few models have been proposed in the HMM literature. We reviewed existing multichain HSMMs and proposed a sound formalization of two classes of models that extend standard and general semi-Markov models to the multichain setting. Then, we addressed the hidden framework and built various classes of multichain H(S)MMs, denoted M(H)SMMs, that generalize some MHMM structures. A generative definition based on hazard rates instead of probability distribution functions enabled us to account for flexible interactions between the dynamics of observed and hidden chains. Adapting these general classes into models for practical situations still raises challenges in terms of inference, but also in terms of parameterization. Indeed, the dimension of the functions (hazard rates and probability distribution functions) involved in the multichain distribution increases with the model richness. Details can be found in 71, 68.


7.2​​​‌ Latent variable modelling

7.2.1‌ Stochastic Majorization-Minimization with sample-average‌​‌ approximation

Participants: Florence Forbes​​.

Joint work with​​​‌: Hien Nguyen, School‌ of Computing, Engineering and‌​‌ Mathematical Sciences, La Trobe​​ Univ., Bundoora 3086, Victoria​​​‌ Australia, and Institute of‌ Mathematics for Industry, Kyushu‌​‌ Univ., Nishi Ward, Fukuoka​​ 819-0395, Japan, Gersende Fort,​​​‌ IMT and LAAS-CNRS, Université‌ de Toulouse, CNRS, Toulouse.‌​‌

Many statistical inference and machine learning methods rely on the ability to optimize an expectation functional whose explicit form is intractable. The typical method for conducting such optimization is to approximate the expected value problem by a size-$N$ sample average, often referred to as sample average approximation (SAA) or M-estimation. When the solution to the SAA problem cannot be obtained in closed form, the Majorization-Minimization (MM) algorithm framework constitutes a broad class of incremental optimization solutions, relying on the iterative construction of surrogates, known as majorizers, of the original problem. The ability to solve an SAA problem depends on the simultaneous availability of all $N$ observations, which is difficult when $N$ is large or data are observed as a stream. In this work 19, we propose a stochastic MM algorithm that solves the expected value problem via iterative SAA majorizer constructions using sequential subsets of data, which we call Sequential Sample Average Majorization-Minimization (SAM2). Compared to previous stochastic MM algorithm variants, our method permits an extended definition of majorizers, and does not rely on convexity assumptions, smoothness assumptions, or restrictions on functional classes for objectives and majorizers. We develop a theory of stochastic convergence for SAM2, made possible via the presentation of a novel double array uniform strong law of large numbers. Examples of SAM2 algorithms are given along with a numerical demonstration of SAM2 on quantile regression problems, in the regular and sparse parameter settings, including both convex and non-convex objective functions.
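For readers less familiar with the two ingredients, the generic SAA and MM constructions can be summarised as follows. This is the textbook deterministic scheme written in assumed notation ($f$, $F$, $F_N$, $g$ and $\theta_k$ are introduced here), not the SAM2 algorithm itself, which replaces the full sample by sequential subsets of data.

```latex
% Assumed notation: f(\theta; x) is the loss, X, X_1, ..., X_N are i.i.d. observations.
\[
F(\theta) = \mathbb{E}\big[f(\theta; X)\big]
\;\approx\;
F_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} f(\theta; X_i)
\qquad \text{(sample average approximation).}
\]
% A majorizer g of F_N at \theta' satisfies
\[
g(\theta; \theta') \ge F_N(\theta) \ \text{ for all } \theta,
\qquad
g(\theta'; \theta') = F_N(\theta').
\]
% MM iteration and the resulting descent property:
\[
\theta_{k+1} \in \arg\min_{\theta} g(\theta; \theta_k)
\quad\Longrightarrow\quad
F_N(\theta_{k+1}) \le g(\theta_{k+1}; \theta_k) \le g(\theta_k; \theta_k) = F_N(\theta_k).
\]
```

The descent property only needs the two majorizer conditions, which is why MM schemes can be stated without convexity or smoothness of the objective.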

7.2.2 Natural Variational Annealing​​​‌ for Multimodal Optimization

Participants:‌ Tam Le Minh,‌​‌ Florence Forbes, Julyan​​ Arbel.

Joint work​​​‌ with: Emtiyaz Khan‌ and Thomas Mollenhoff from‌​‌ Riken, Tokyo, Japan

We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex objectives. First, it implements a simultaneous search by using variational posteriors, such as mixtures of Gaussians. Second, it applies annealing to gradually trade off exploration for exploitation. Finally, it learns the variational search distribution using natural-gradient learning, where updates resemble well-known and easy-to-implement algorithms. The three concepts come together in NVA, giving rise to new algorithms and also allowing us to incorporate "fitness shaping", a core concept from evolutionary algorithms. We assess the quality of search on simulations and compare it to that of methods using gradient descent and evolution strategies. We also provide an application to a real-world inverse problem in planetary science. More details can be found in 59. An extension to situations where only samples are available can be found in 58.

7.2.3 Scalable​​ magnetic resonance fingerprinting: Incremental​​​‌ inference of high dimensional​ elliptical mixtures from large​‌ data volumes

Participants: Florence​​ Forbes, Geoffroy Oudoumanessah​​​‌.

Joint work with​: Luc Meyer from​‌ SED, Michel Dojat, Thomas​​ Coudert, Thomas Christen from​​​‌ Grenoble Institute of Neurosciences,​ Carole Lartizien from Creatis.​‌

Magnetic Resonance Fingerprinting (MRF) is an emerging technology with the potential to revolutionize radiology and medical diagnostics. In comparison to traditional magnetic resonance imaging (MRI), MRF enables the rapid, simultaneous, non-invasive acquisition and reconstruction of multiple tissue parameters, paving the way for novel diagnostic techniques. In the original matching approach, reconstruction is based on the search for the best matches between in vivo acquired signals and a dictionary of high-dimensional simulated signals (fingerprints) with known tissue properties. A critical and limiting challenge is that the size of the simulated dictionary increases exponentially with the number of parameters, leading to an extremely costly subsequent matching. In this work, we propose to address this scalability issue by considering probabilistic mixtures of high-dimensional elliptical distributions, to learn more efficient dictionary representations. Mixture components are modelled as flexible elliptical shapes in low-dimensional subspaces. They are exploited to cluster similar signals and to reduce their dimension locally, cluster by cluster, to limit information loss. To estimate such a mixture model, we provide a new incremental algorithm capable of handling large numbers of signals, allowing us to go far beyond the hardware limitations encountered by standard implementations. We demonstrate, on simulated and real data, that our method effectively manages large volumes of MRF data while maintaining accuracy. It offers a more efficient solution for accurate tissue characterization and significantly reduces the computational burden, making the clinical application of MRF more practical and accessible.
This​ work has been presented​‌ at the International Symposium​​ on Biomedical Imaging (ISBI​​​‌ 2025) 33 and published​ in Statistics and Computing​‌ 60.
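The baseline dictionary matching that motivates this work can be sketched in a few lines. This is a minimal sketch under stated assumptions: the signal model `simulate_signal` is a toy stand-in for a physics simulator, the parameter names `t1` and `t2` are hypothetical, and matching by largest normalised inner product is assumed as the similarity criterion. It is not the mixture-based method proposed above.

```python
import itertools
import math


def simulate_signal(t1, t2, n_points=8):
    """Toy signal model standing in for a physics simulator (hypothetical)."""
    return [
        (1.0 - math.exp(-(i + 1) / t1)) * math.exp(-(i + 1) / t2)
        for i in range(n_points)
    ]


def normalise(signal):
    """Scale a signal to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in signal))
    return [v / norm for v in signal]


def build_dictionary(grids):
    """One fingerprint per point of the Cartesian parameter grid."""
    return [
        (params, normalise(simulate_signal(*params)))
        for params in itertools.product(*grids)
    ]


def match(signal, dictionary):
    """Return the parameters whose fingerprint has the largest inner product."""
    target = normalise(signal)
    best_params, _ = max(
        dictionary,
        key=lambda entry: sum(a * b for a, b in zip(target, entry[1])),
    )
    return best_params


t1_grid = [0.5, 1.0, 2.0, 4.0]
t2_grid = [0.5, 1.0, 2.0, 4.0]
dictionary = build_dictionary([t1_grid, t2_grid])
observed = [3.0 * v for v in simulate_signal(2.0, 1.0)]  # rescaled fingerprint
estimate = match(observed, dictionary)
```

With `g` grid values per parameter and `d` parameters, the dictionary holds `g ** d` fingerprints, and every acquired signal is compared with all of them. This is the exponential cost that more compact dictionary representations aim to avoid.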

7.2.4 Assessing​​ a dose-response relationship after​​​‌ brain radiotherapy via Mixture​ of Regressions

Participants: Florence​‌ Forbes.

Joint work​​ with: Theo Sylvestre,​​​‌ Sophie Ancelet from IRSN.​

Brain radiotherapy (RT) is​‌ one of the key​​ tools in the treatment​​​‌ of tumors of the​ central nervous system (CNS).​‌ However, its potential toxicity​​ to the CNS remains​​​‌ one of the major​ research issues in radioprotection.​‌ In particular, cognitive decline,​​ which may significantly impair​​​‌ the quality of life​ of long-term survivors, has​‌ been reported in patients​​ treated with RT for​​​‌ a brain tumor. The​ intracerebral radiation-induced mechanisms that​‌ could explain this cognitive​​ decline are only partially​​ understood. The EpiBrainRad project,​​​‌ within which the doctoral‌ work of Theo Sylvestre‌​‌ has been conducted, investigates​​ the role that leukoencephalopathy​​​‌ may play in these‌ mechanisms. It is based‌​‌ on data from the​​ EpiBrainRad cohort, which includes​​​‌ patients treated with RT‌ for glioblastoma at Pitié-Salpêtrière‌​‌ Hospital or at the​​ Strasbourg Institute of Oncology.​​​‌

The aim was to demonstrate the association, if it exists, between the brain dose and the spatio-temporal progression of irreversible white matter abnormalities characteristic of leukoencephalopathy, identified on MRI as white matter hyperintensities (WMH), and to estimate it. The work also sought to provide insights into the radiosensitivity of white matter.

Embedded in the‌ ANR RADIO-AIDE project (itself‌​‌ part of EpiBrainRad), this​​ work relied primarily on​​​‌ imaging data from a‌ sub-cohort of 50 patients‌​‌ from the EpiBrainRad cohort.​​ For each patient, a​​​‌ dosimetric CT scan from‌ which a voxel-wise dose‌​‌ map is extracted is​​ available, along with a​​​‌ longitudinal collection of MRIs‌ in which various brain‌​‌ lesions are segmented.

Three main contributions were made: 1) A preprocessing pipeline for segmented MRIs was proposed to make them suitable for estimating the dose-response association of interest. 2) Longitudinal intra-individual MRI registration and inter-individual registration were performed to enable a population-level voxel-wise analysis on a common brain, in the spirit of voxel-based studies. 3) An algorithm was defined and implemented to distinguish leukoencephalopathy lesions (LL) from edema (both are characterized on MRI by WMH) and to correct for brain deformations associated with different lesions.

7.2.5 Massive​​ analysis of multidimensional astrophysical​​​‌ data by inverse regression‌ of physical models

Participants:‌​‌ Florence Forbes.

Joint work with: Sylvain Douté from IPAG, Stan Borkowski and Luc Meyer from SED Grenoble

With the tremendous progress made in AI, data acquisition and processing are now possible at a much larger scale. In earth and space (E&S) science, although wider and richer representations are desirable to effectively and quantitatively characterize information, we still struggle to turn them into real-world breakthroughs, partially due to data processing bottlenecks. Computationally efficient modeling and inference techniques have been developed in order to meet computing resource constraints, energy considerations and the inherent complexity of algorithms. However, most approaches are designed for batch data and thus have limitations in processing large amounts of data. It thus appears most timely to develop the theory and practice of a new form of learning that targets potentially heterogeneous remote sensing data that are large in both size and dimension, while providing quantitative and rigorous statements about the methods' performance.

7.2.6 An‌ analysis of distributional reinforcement‌​‌ learning with Gaussian mixtures​​

Participants: Florence Forbes,​​​‌ Henrique Donancio, Mathis‌ Antonetti.

Distributional Reinforcement Learning (DRL) seeks to optimize risk-sensitive objectives by modeling the full return distribution rather than only its expectation. A key challenge is to choose a return distribution representation that allows (i) efficient estimation of risk measures, (ii) tractable optimization, and (iii) sufficient expressiveness. Gaussian mixtures (GM) provide a flexible and powerful representation for this purpose, yet they remain underexplored in DRL, with most existing methods relying on the $L^2$ norm as a tractable metric between GM. In this work 13, we conduct a theoretical and empirical study of alternative metrics for GM-based DRL. We show that the $L^2$ norm is not suitable and introduce two principled alternatives: a mixture-specific optimal transport distance (MW) and a maximum mean discrepancy (MMD) distance. For the MW metric, we establish convergence guarantees for a dynamic programming algorithm related to temporal-difference (TD) learning. Leveraging multivariate GM representations, we also highlight the potential of MW in multi-objective RL. Experimental results on selected Atari Learning Environment tasks illustrate the practical benefits of the proposed metrics, showing promising performance.
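The tractability that makes the L2 norm a common choice between Gaussian mixtures comes from a closed form: the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means. The sketch below illustrates this in one dimension. The function names and the `(weight, mean, variance)` encoding are assumptions, and the MW and MMD alternatives studied in the paper are not implemented.

```python
import math


def gaussian_overlap(m1, v1, m2, v2):
    """Integral over the real line of the product of N(m1, v1) and N(m2, v2)."""
    var = v1 + v2
    return math.exp(-0.5 * (m1 - m2) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gm_inner(gm_a, gm_b):
    """L2 inner product of two mixtures given as lists of (weight, mean, var)."""
    return sum(
        wa * wb * gaussian_overlap(ma, va, mb, vb)
        for wa, ma, va in gm_a
        for wb, mb, vb in gm_b
    )


def gm_l2_squared(gm_a, gm_b):
    """Squared L2 distance between two one-dimensional Gaussian mixtures."""
    return gm_inner(gm_a, gm_a) - 2.0 * gm_inner(gm_a, gm_b) + gm_inner(gm_b, gm_b)


def gm_mean(gm):
    """Mean of a mixture given as a list of (weight, mean, var)."""
    return sum(w * m for w, m, _ in gm)


# Two return distributions with the same mean (2) and variance (5):
# a bimodal mixture and a single Gaussian.
returns_a = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
returns_b = [(1.0, 2.0, 5.0)]
distance = gm_l2_squared(returns_a, returns_b)
```

The two toy distributions share their first two moments yet differ in shape, which is the kind of information a distributional method retains and an expectation-based method discards.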

7.2.7 Dynamic​​ Learning Rate for Deep​​​‌ Reinforcement Learning: A Bandit​ Approach

Participants: Florence Forbes​‌, Henrique Donancio.​​

Joint work with:​​​‌ Leah South, Queensland University​ of Technology, Brisbane Australia​‌ and Antoine Barrier, Grenoble​​ Institute of Neuroscience.

In​​​‌ Deep Reinforcement Learning models​ trained using gradient-based techniques,​‌ the choice of optimizer​​ and its learning rate​​​‌ are crucial to achieving​ good performance: higher learning​‌ rates can prevent the​​ model from learning effectively,​​​‌ while lower ones might​ slow convergence. Additionally, due​‌ to the non-stationarity of​​ the objective function, the​​​‌ best-performing learning rate can​ change over the training​‌ steps. To adapt the​​ learning rate, a standard​​​‌ technique consists of using​ decay schedulers. However, these​‌ schedulers assume that the​​ model is progressively approaching​​​‌ convergence, which may not​ always be true, leading​‌ to delayed or premature​​ adjustments. In this work,​​​‌ we propose dynamic Learning​ Rate for deep Reinforcement​‌ Learning (LRRL), a meta-learning​​ approach that selects the​​​‌ learning rate based on​ the agent's performance during​‌ training. LRRL is based​​ on a multi-armed bandit​​​‌ algorithm, where each arm​ represents a different learning​‌ rate, and the bandit​​ feedback is provided by​​​‌ the cumulative returns of​ the RL policy to​‌ update the arms' probability​​ distribution. Our empirical results​​​‌ demonstrate that LRRL can​ substantially improve the performance​‌ of deep RL algorithms.​​
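The bandit view of learning-rate selection can be sketched as follows. This is an illustration under explicit assumptions: the text does not name the bandit algorithm, so an Exp3-style update is used here as a plausible choice; the class name, the `exploration` parameter and `toy_return` (a stand-in for the normalised return of an RL agent, assumed to lie in [0, 1]) are all hypothetical.

```python
import math
import random


class LearningRateBandit:
    """Exp3-style bandit over a finite set of candidate learning rates."""

    def __init__(self, learning_rates, exploration=0.1, seed=0):
        self.learning_rates = list(learning_rates)
        self.exploration = exploration
        self.weights = [1.0] * len(self.learning_rates)
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalised weights with uniform exploration."""
        total = sum(self.weights)
        k = len(self.weights)
        return [
            (1.0 - self.exploration) * w / total + self.exploration / k
            for w in self.weights
        ]

    def select(self):
        """Sample an arm index according to the current distribution."""
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, arm, reward):
        """Importance-weighted update; reward is assumed to lie in [0, 1]."""
        probs = self.probabilities()
        estimate = reward / probs[arm]
        k = len(self.weights)
        self.weights[arm] *= math.exp(self.exploration * estimate / k)


def toy_return(learning_rate):
    """Hypothetical stand-in for the normalised return of an RL agent."""
    return 1.0 if learning_rate == 1e-3 else 0.0


bandit = LearningRateBandit([1e-2, 1e-3, 1e-4])
for _ in range(300):
    arm = bandit.select()
    bandit.update(arm, toy_return(bandit.learning_rates[arm]))

probs = bandit.probabilities()
best = bandit.learning_rates[probs.index(max(probs))]
```

In an actual training loop, `toy_return` would be replaced by the observed returns of the policy after a block of gradient steps taken with the selected learning rate, so the distribution over arms can keep adapting as the objective drifts.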

7.2.8 Bandits and sequential​​​‌ learning

Participants: Julyan Arbel​, Julien Zhou.​‌

Joint work with:​​ Pierre Gaillard (Inria Thoth),​​​‌ Thibaud Rahier (Criteo AI​ Lab).

Bandit algorithms address​‌ the exploration-exploitation trade-off by​​ balancing learning about actions​​​‌ and maximizing cumulative rewards,​ with applications in areas​‌ like online advertising, recommendation​​ systems, and A/B testing.​​​‌ We improve existing regret​ bounds in two settings:​‌ stochastic combinatorial semi-bandits, and​​ online unconstrained submodular maximization​​​‌ with stochastic bandit feedback​ 35.

7.2.9 Optimal​‌ sub-Gaussian variance proxy

Participants:​​ Julyan Arbel.

Joint​​​‌ work with: Mathias​ Barreto (National Research University​‌ Higher School of Economics,​​ Moscow), Olivier Marchal (Institut​​​‌ Camille Jordan, Lyon).

In​ 15, we establish​‌ the optimal sub-Gaussian variance​​ proxy for truncated Gaussian​​​‌ and truncated exponential random​ variables. The proofs rely​‌ on first characterizing the​​ optimal variance proxy as​​​‌ the unique solution to​ a set of two​‌ equations and then observing​​ that for these two​​ truncated distributions, one may​​​‌ find explicit solutions to‌ this set of equations.‌​‌ Moreover, we establish the​​ conditions under which the​​​‌ optimal variance proxy coincides‌ with the variance, thereby‌​‌ characterizing the strict sub-Gaussianity​​ of the truncated random​​​‌ variables. Specifically, we demonstrate‌ that truncated Gaussian variables‌​‌ exhibit strict sub-Gaussian behavior​​ if and only if​​​‌ they are symmetric, meaning‌ their truncation is symmetric‌​‌ with respect to the​​ mean. Conversely, truncated exponential​​​‌ variables are shown to‌ never exhibit strict sub-Gaussian‌​‌ properties. These findings contribute​​ to the understanding of​​​‌ these prevalent probability distributions‌ in statistics and machine‌​‌ learning, providing a valuable​​ foundation for improved and​​​‌ optimal modeling and decision-making‌ processes.
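For reference, the standard definitions behind these statements can be written as follows. The notation ($\mu$ for the mean, $\sigma^2_{\mathrm{opt}}$ for the optimal proxy) is chosen here for illustration and may differ from that of the paper.

```latex
% X is sub-Gaussian with variance proxy \sigma^2 if, with \mu = \mathbb{E}[X],
\mathbb{E}\left[\exp\big(\lambda (X - \mu)\big)\right]
  \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right)
  \quad \text{for all } \lambda \in \mathbb{R}.

% The optimal variance proxy is the smallest admissible \sigma^2:
\sigma^2_{\mathrm{opt}}(X)
  = \sup_{\lambda \neq 0} \frac{2}{\lambda^2}
    \log \mathbb{E}\left[\exp\big(\lambda (X - \mu)\big)\right].

% A second-order expansion of the first inequality at \lambda = 0 gives
\operatorname{Var}(X) \le \sigma^2_{\mathrm{opt}}(X),
% and X is called strictly sub-Gaussian when equality holds.
```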

7.2.10 Mixed hidden‌​‌ semi-Markov processes

Participants: Jean-Baptiste​​ Durand.

Joint work​​​‌ with: Nathalie Peyrard,‌ Sandra Plancade, Marie-Josée Cros,‌​‌ Ronan Trépos and Mathieu​​ Valdeyron from MIAT INRAE​​​‌ - Unité de Mathématiques‌ et Informatique Appliquées de‌​‌ Toulouse; Alain Franc from​​ Biogeco INRAE, Bordeaux; Corentin​​​‌ Lothodé, CNRS, Angers; Nicolas‌ Vergne and Caroline Bérard‌​‌ from Université de Rouen​​ Normandie; Irene Vosti from​​​‌ Université de Lorraine, Metz.‌

Parameter estimation in hidden semi-Markov processes is frequently addressed by the EM algorithm or Newton iterative algorithms. These rely on the classical forward-backward recursion. When mixed effects are incorporated in model parameters (emission distributions, transition probabilities and sojourn time distributions), the forward-backward formulas have to be integrated over the random effects, leading to intractable algorithms. As a consequence, further approximations are required, for example Monte Carlo EM (MCEM), Monte Carlo Newton or variational EM. We produced a state of the art of the methods available for hidden Markov models (HMMs) and hidden semi-Markov models (HSMMs), with a detailed account of the restrictions associated with each algorithm (for example: fixed effects only, random effects in emission distributions only, etc.). We also provided a catalogue of available Python and R software, also covering plain HSMMs and multichain HMMs (see also Section 7.1.9). Finally, a new MCEM algorithm was developed to address the case of HSMMs with mixed effects in all model parameters (emission distributions, transition probabilities and sojourn time distributions), which had not been addressed before. Alternatives are currently being studied in M. Valdeyron's doctoral work. Details can be found in 69, 70, 72.

7.3 Bayesian modelling‌

7.3.1 Convergence of projected‌​‌ stochastic natural gradient variational​​ inference for various step​​​‌ size and sample or‌ batch size schedules

Participants:‌​‌ Florence Forbes, Thomas​​ Guilmeau.

Joint work​​​‌ with: Hadrien Hendrickx‌ from THOTH team.

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when gradients are either intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form $\mathcal{O}(1/T^{\rho})$, possibly with $\rho \geq 1$, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints. More details can be found in the paper accepted at AISTATS 2026.
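For orientation, the generic form of such an update can be written as follows. This is a sketch in the usual exponential-family notation ($\lambda$ the natural parameter, $\mu$ the mean parameter, $\Pi_{\mathcal{C}}$ a projection onto a constraint set $\mathcal{C}$); the symbols, the projection set, and the gradient estimator are assumptions made here for illustration and may differ from those of the paper.

```latex
% Variational family in exponential form, with natural parameter \lambda:
q_\lambda(z) = h(z) \exp\big( \langle \lambda, T(z) \rangle - A(\lambda) \big),
\qquad \mu = \nabla A(\lambda) = \mathbb{E}_{q_\lambda}[T(z)].

% The natural gradient of an objective \mathcal{L} with respect to \lambda
% equals its ordinary gradient with respect to the mean parameter \mu:
F(\lambda)^{-1} \nabla_\lambda \mathcal{L} = \nabla_\mu \mathcal{L},
\qquad F(\lambda) = \nabla^2 A(\lambda).

% Projected stochastic step with step size \gamma_t and an estimate \hat{g}_t
% of \nabla_\mu \mathcal{L} built from N_t samples (or a batch of size N_t):
\lambda_{t+1} = \Pi_{\mathcal{C}}\big( \lambda_t - \gamma_t \, \hat{g}_t \big).
```

The schedules discussed above correspond to the choices of $(\gamma_t)$ and $(N_t)$: constant or decreasing step sizes, and constant or increasing sample/batch sizes.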

7.3.2​​ Concentration results for approximate​​​‌ Bayesian computation without identifiability​

Participants: Florence Forbes,​‌ Julyan Arbel.

Joint work with: Hien Nguyen and Trung Tin Nguyen, University of Queensland, Brisbane, Australia.

We study the large-sample behavior of approximate Bayesian computation (ABC) posterior measures in situations where the data-generating process depends on unidentifiable parameters. In particular, we establish the concentration of posterior measures on sets of arbitrarily small measure that contain the equivalence set of the data-generating parameter, when the sample size tends to infinity. Our theory also makes weak assumptions regarding the measurement of discrepancy between the data set and simulations. In particular, it does not require the use of summary statistics and is applicable to a broad class of kernelized ABC algorithms. We provide useful illustrations and demonstrations of our theory in practice, and offer a comprehensive assessment of how our findings complement other results in the literature.
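The setting can be illustrated with a toy kernelized ABC run on a non-identifiable model. The sketch below is an assumption-laden example rather than the construction studied in the paper: the model (data distributed as N(|theta|, 1), so that theta and -theta are equivalent), the uniform prior, the Gaussian kernel, the bandwidth `epsilon`, and the Wasserstein discrepancy on raw samples are all chosen here for illustration.

```python
import numpy as np


def simulate(theta, n, rng):
    # Toy non-identifiable model: theta and -theta give the same distribution.
    return rng.normal(loc=abs(theta), scale=1.0, size=n)


def wasserstein_1d(x, y):
    # 1-Wasserstein distance between two equal-size samples, computed on
    # the full data sets (no summary statistics).
    return np.mean(np.abs(np.sort(x) - np.sort(y)))


def kernel_abc(y_obs, n_draws, epsilon, rng):
    # Draw parameters from a uniform prior on [-4, 4] and weight each draw
    # by a Gaussian kernel applied to the data discrepancy.
    thetas = rng.uniform(-4.0, 4.0, size=n_draws)
    dists = np.array(
        [wasserstein_1d(y_obs, simulate(t, len(y_obs), rng)) for t in thetas]
    )
    weights = np.exp(-0.5 * (dists / epsilon) ** 2)
    return thetas, weights / weights.sum()


rng = np.random.default_rng(0)
theta_true = 2.0
y_obs = simulate(theta_true, 500, rng)
thetas, weights = kernel_abc(y_obs, n_draws=4000, epsilon=0.1, rng=rng)

# Weighted posterior mean of |theta| and posterior mass on negative values.
abs_mean = float(np.sum(weights * np.abs(thetas)))
mass_negative = float(np.sum(weights[thetas < 0]))
```

In this toy example the ABC posterior cannot separate theta from -theta, so the weighted mass splits between the two signs while |theta| concentrates near its true value, which is the kind of concentration on an equivalence set described above.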

7.3.3 Diagnosing convergence of​‌ Markov chain Monte Carlo​​

Participants: Julyan Arbel,​​​‌ Stephane Girard.

Joint​ work with: A. Dutfoy​‌ (EDF R&D) and T.​​ Moins (Ecole Nationale des​​​‌ Chartes, PSL).

Diagnosing convergence of Markov chain Monte Carlo (MCMC) is crucial in Bayesian analysis. Among the most popular methods, the potential scale reduction factor (commonly denoted $\hat{R}$) is an indicator that monitors the convergence of output chains to a stationary distribution, based on a comparison of the between- and within-chain variances. Several improvements have been suggested since its introduction in the 1990s. In the PhD work of Théo Moins, we analyse some properties of the theoretical value $R$ associated with $\hat{R}$ in the case of a localized version that focuses on quantiles of the distribution. This leads to proposing a new indicator 23, which is shown to allow both for localizing the MCMC convergence in different quantiles of the distribution, and at the same time for handling some convergence issues not detected by other $\hat{R}$ versions.
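As an illustration, the sketch below computes the classical between/within diagnostic and a localized variant. It assumes that the localized version is obtained by applying the classical formula to the indicators 1{theta <= x} and taking the maximum over a grid of x values; the function names and the grid are hypothetical, and the indicator proposed in 23 may be defined differently in its details.

```python
import numpy as np


def rhat(chains):
    """Classical potential scale reduction factor for an (m, n) array of
    m chains with n draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)   # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)


def local_rhat(chains, x):
    # Classical formula applied to the indicators 1{theta <= x}: it checks
    # whether the chains agree on the distribution function at x.
    return rhat((np.asarray(chains) <= x).astype(float))


def rhat_max(chains, n_grid=50):
    # Maximum of the local diagnostic over a grid of pooled quantiles.
    grid = np.quantile(chains, np.linspace(0.05, 0.95, n_grid))
    return max(local_rhat(chains, x) for x in grid)


rng = np.random.default_rng(1)
# Two chains with the same mean but different scales.
chains = np.stack([
    rng.normal(0.0, 1.0, 2000),
    rng.normal(0.0, 3.0, 2000),
])
classic = rhat(chains)
localized = rhat_max(chains)
```

Here the two chains share the same mean, so the classical value stays close to 1, whereas the localized value is noticeably larger because the chains disagree in the tails.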

7.3.4 Bayesian deep learning​

Participants: Julyan Arbel,​‌ Pierre Wolinski.

In 25, we study feature propagation at initialization in neural networks, which lies at the root of numerous initialization designs. An assumption very commonly made in the field states that the pre-activations are Gaussian. Although this convenient Gaussian hypothesis can be justified when the number of neurons per layer tends to infinity, it is challenged by both theoretical and experimental works for finite-width neural networks. The major contribution of this work is to construct a family of pairs of activation functions and initialization distributions that ensure that the pre-activations remain Gaussian throughout the network's depth, even in narrow neural networks. In the process, we discover a set of constraints that a neural network should fulfill to ensure Gaussian pre-activations. Additionally, we provide a critical review of the claims of the Edge of Chaos line of work and build an exact Edge of Chaos analysis. We also propose a unified view on pre-activation propagation, encompassing the framework of several well-known initialization procedures. Finally, our work provides a principled framework for answering the much-debated question: is it desirable to initialize the training of a neural network whose pre-activations are ensured to be Gaussian?

7.3.5 Bayesian Experimental Design‌​‌ via Contrastive Diffusions.

Participants:​​ Florence Forbes, Jacopo​​​‌ Iollo.

Joint work‌ with: Pierre Alliez,‌​‌ Inria Titane and Christophe​​ Heinkele, Cerema Strasbourg.

Bayesian Optimal Experimental Design (BOED) is a powerful tool to reduce the cost of running a sequence of experiments. When based on the Expected Information Gain (EIG), design optimization corresponds to the maximization of some intractable expected contrast between prior and posterior distributions. Scaling this maximization to high-dimensional and complex settings has been an issue due to the inherent computational complexity of BOED. In this work, we introduce a pooled posterior distribution with cost-effective sampling properties and provide tractable access to the EIG contrast maximization via a new EIG gradient expression. Diffusion-based samplers are used to compute the dynamics of the pooled posterior and ideas from bi-level optimization are leveraged to derive an efficient joint sampling-optimization loop. The resulting efficiency gain allows BOED to be extended to the well-tested generative capabilities of diffusion models. By incorporating generative models into the BOED framework, we expand its scope and its use in scenarios that were previously impractical. Numerical experiments and comparisons with state-of-the-art methods show the potential of the approach. This work has been accepted at ICLR 2025 31.
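For reference, the EIG criterion mentioned above has the following standard form. The notation ($\xi$ for the design, $\theta$ for the parameters, $y$ for the outcome) is chosen here for illustration, and the prior is assumed not to depend on the design; the pooled posterior and the new gradient expression of the paper are not reproduced.

```latex
% Expected information gain of a design \xi:
I(\xi)
  = \mathbb{E}_{p(\theta)\, p(y \mid \theta, \xi)}
    \left[ \log \frac{p(\theta \mid y, \xi)}{p(\theta)} \right]
  = \mathbb{E}_{p(\theta)\, p(y \mid \theta, \xi)}
    \left[ \log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)} \right],
\qquad
\xi^\star \in \arg\max_{\xi} I(\xi).
```

Both the posterior $p(\theta \mid y, \xi)$ and the evidence $p(y \mid \xi)$ are generally intractable, which is the source of the computational difficulty described above.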

7.3.6 Active MRI Acquisition‌ with Diffusion Guided Bayesian‌​‌ Experimental Design.

Participants: Florence​​ Forbes, Jacopo Iollo​​​‌, Geoffroy Oudoumanessah,‌ Michel Dojat.

Joint‌​‌ work with: Carole​​ Lartizien, Creatis Lyon.

A key challenge in maximizing the benefits of Magnetic Resonance Imaging (MRI) in clinical settings is to accelerate acquisition times without significantly degrading image quality. This objective requires a balance between under-sampling the raw k-space measurements for faster acquisitions and gathering sufficient raw information for high-fidelity image reconstruction and analysis tasks. To achieve this balance, we propose to use sequential Bayesian experimental design (BED) to provide an adaptive and task-dependent selection of the most informative measurements. Measurements are sequentially augmented with new samples selected to maximize information gain on a posterior distribution over target images. Selection is performed via a gradient-based optimization of a design parameter that defines a subsampling pattern. In this work, we introduce a new active BED procedure that leverages diffusion-based generative models to handle the high dimensionality of the images and employs stochastic optimization to select among a variety of patterns while meeting the acquisition process constraints and budget. In doing so, we show how our setting can optimize not only standard image reconstruction but also any associated image analysis task. The versatility and performance of our approach are demonstrated on several MRI acquisitions.

7.3.7 Simulation-based​​​‌ inference using score-diffusion: algorithm​ and theoretical analysis

Participants:​‌ Pedro Rodrigues, Julyan​​ Arbel, Julia Linhart​​​‌, Camille Touron.​

Joint work with: Gabriel Cardoso from École des Mines de Paris, Sylvain Le Corff from Sorbonne Université and Alexandre Gramfort from Meta.

Simulation-based inference (SBI) estimates​​​‌ parameters of complex non-linear​ models with intractable likelihoods​‌ by training generative models​​ on simulated data to​​​‌ approximate the posterior linking​ inputs to observations.

In​‌ 42, we study​​ the compositional score produced​​​‌ by the GAUSS algorithm​ of 73 and establish​‌ an upper bound on​​ its mean squared error​​​‌ in terms of both​ the individual score errors​‌ and the number of​​ observations. We illustrate our​​​‌ theoretical findings on a​ Gaussian example, where all​‌ analytical expressions can be​​ derived in a closed​​​‌ form.
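The role of the individual scores can be seen from the following standard identity, stated here for $n$ observations $x_1, \dots, x_n$ that are conditionally independent given $\theta$. It concerns the undiffused posterior only; how the GAUSS algorithm handles the diffused scores, and the exact form of the error bound, are not reproduced here.

```latex
% Factorisation of the posterior given n conditionally independent observations:
p(\theta \mid x_{1:n}) \;\propto\; p(\theta)^{1-n} \prod_{i=1}^{n} p(\theta \mid x_i),

% so that the posterior score combines the prior score and the n individual scores:
\nabla_\theta \log p(\theta \mid x_{1:n})
  = (1 - n)\, \nabla_\theta \log p(\theta)
  + \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i).
```

Since the individual scores are summed, their approximation errors can accumulate with $n$, which is consistent with a bound depending on both the individual score errors and the number of observations.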

7.3.8 Conformal prediction​ for simulation-based inference

Participants:​‌ Pedro Rodrigues, Luben​​ Miguel Cruz Cabezas.​​​‌

Joint work with: Rafael Izbicki from UFSCar, Brazil.

Experimental scientists increasingly rely on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to undercover true parameters. We develop CP4SBI, a model-agnostic conformal calibration framework that constructs credible sets with local Bayesian coverage. Our two proposed variants, namely local calibration via regression trees and CDF-based calibration, enable finite-sample local coverage guarantees for any scoring function, including HPD, symmetric, and quantile-based regions. Experiments on widely used SBI benchmarks demonstrate that our approach improves the quality of uncertainty quantification for neural posterior estimators using both normalizing flows and score-diffusion modeling 47.
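The basic calibration idea can be sketched with plain split-conformal prediction on a toy problem. This is not the CP4SBI procedure: it yields marginal rather than local coverage, and the simulator, the deliberately overconfident Gaussian approximation, and the function names are all assumptions made here for illustration.

```python
import numpy as np


def simulate_pairs(n, rng):
    # Toy simulator: theta ~ N(0, 1), x | theta ~ N(theta, 1).
    theta = rng.standard_normal(n)
    x = theta + rng.standard_normal(n)
    return theta, x


def score(theta, x, sd=0.3):
    # Negative log-density (up to a constant) of a deliberately
    # overconfident approximate posterior N(x / 2, sd^2); the exact
    # posterior is N(x / 2, 1 / 2).
    return 0.5 * ((theta - 0.5 * x) / sd) ** 2


def conformal_threshold(cal_scores, alpha):
    # Split-conformal quantile with the finite-sample (n + 1) correction.
    # If the rank exceeded n the exact threshold would be infinite; it is
    # capped at the largest score here for simplicity.
    n = len(cal_scores)
    rank = int(np.ceil((n + 1) * (1.0 - alpha)))
    return np.sort(cal_scores)[min(rank, n) - 1]


rng = np.random.default_rng(2)
alpha = 0.1
theta_cal, x_cal = simulate_pairs(2000, rng)
threshold = conformal_threshold(score(theta_cal, x_cal), alpha)

theta_test, x_test = simulate_pairs(20000, rng)
test_scores = score(theta_test, x_test)
conformal_coverage = float(np.mean(test_scores <= threshold))
# Nominal 90% region of the uncalibrated approximation: |z| <= 1.645.
naive_coverage = float(np.mean(test_scores <= 0.5 * 1.645 ** 2))
```

The region {theta : score(theta, x) <= threshold} recovers roughly the target coverage on fresh simulations, while the nominal 90% region of the uncalibrated approximation undercovers.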

7.3.9 Simulation-based inference​‌ under model misspecification

Participants:​​ Pedro Rodrigues, Florence​​​‌ Forbes, Pierre-Louis Ruhlmann​.

Joint work with​‌: Michael Arbel from​​ Inria (THOTH team).

Simulation-based​​​‌ inference (SBI) is transforming​ experimental sciences by enabling​‌ parameter estimation in complex​​ non-linear models from simulated​​​‌ data. A persistent challenge,​ however, is model misspecification:​‌ simulators are only approximations​​ of reality, and mismatches​​​‌ between simulated and real​ data can yield biased​‌ or overconfident posteriors. We​​ address this issue by​​​‌ introducing Flow Matching Corrected​ Posterior Estimation (FMCPE), a​‌ framework that leverages the​​ flow matching paradigm to​​​‌ refine simulation-trained posterior estimators​ using a small set​‌ of real calibration samples.​​ Our approach proceeds in​​​‌ two stages: first, a​ posterior approximator is trained​‌ on abundant simulated data;​​ second, flow matching transports​​ its predictions toward the​​​‌ true posterior supported by‌ real observations, without requiring‌​‌ explicit knowledge of the​​ misspecification. This design enables​​​‌ FMCPE to combine the‌ scalability of SBI with‌​‌ robustness to distributional shift.​​ Across synthetic benchmarks and​​​‌ real-world datasets, we show‌ that our proposal consistently‌​‌ mitigates the effects of​​ misspecification, delivering improved inference​​​‌ accuracy and uncertainty calibration‌ compared to standard SBI‌​‌ baselines, while remaining computationally​​ efficient 61.

7.3.10​​​‌ Simulation-based inference applied to‌ biology

Participants: Pedro Rodrigues‌​‌, Julyan Arbel,​​ Eloise Touron.

Joint​​​‌ work with: Michael‌ Arbel from Inria (THOTH‌​‌ team).

Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding can lead to malfunctions and, over time, diseases. For eukaryotes, centromeres are essential for proper chromosome segregation and folding. Despite extensive research using de novo sequencing of genomes and annotation analysis, centromere locations in yeasts remain difficult to infer and are still unknown in most species. Recently, genome-wide chromosome conformation capture coupled with next-generation sequencing (Hi-C) has become one of the leading methods to investigate chromosome structures. Some recent studies have used Hi-C data to give a point estimate of each centromere, but those approaches rely heavily on a good pre-localization. Here, we present a novel approach that infers in a stochastic manner the locations of all centromeres in budding yeast based on both the experimental Hi-C map and simulated contact maps 34.

7.3.11​​​‌ Tutorial guide to simulation-based‌ inference

Participants: Pedro Rodrigues‌​‌.

Joint work with​​: Thomas Moreau from​​​‌ Inria (MIND team) and‌ several colleagues from Tuebingen‌​‌ University.

In this tutorial,​​ we provide a practical​​​‌ guide for practitioners aiming‌ to apply SBI methods.‌​‌ We outline a structured​​ SBI workflow and offer​​​‌ practical guidelines and diagnostic‌ tools for every stage‌​‌ of the process –​​ from setting up the​​​‌ simulator and prior, choosing‌ and training inference networks,‌​‌ to performing inference and​​ validating the results. We​​​‌ illustrate these steps through‌ examples from astrophysics, psychophysics,‌​‌ and neuroscience. This tutorial​​ empowers researchers to apply​​​‌ state-of-the-art SBI methods, facilitating‌ efficient parameter inference for‌​‌ scientific discovery 50 and​​ 16.

7.4 Modelling​​​‌ and quantifying extreme risk‌

7.4.1 Extreme events and‌​‌ neural networks

Participants: Stephane​​ Girard.

Joint work with: M. Allouche (Kaiko) and E. Gobet (CMAP, Ecole Polytechnique).

Dealing with extreme values is a major challenge in probabilistic modeling, of great importance in various application domains such as economics, engineering and life sciences. In the context of generative modeling, it is known that models based on transformations of light-tailed distributions, such as Generative Adversarial Networks (GANs), fail to capture the behaviour in the tails. In particular, these models are not able to capture the dependence in extreme regions. In 20, we study a modified version of the GAN algorithm, called HTGAN, where the input is a heavy-tailed distribution. Recalling the stable tail dependence function (stdf), a tool from extreme-value theory that measures the dependence structure in extreme regions, we provide a bound on the approximation of the stdf of the target with the output of an HTGAN. This bound scales as $N^{-1/(d-1)}$, where $N$ is the dimension of the input noise of the network and $d$ is the dimension of the data of interest. This suggests increasing the dimension of the latent noise to gain precision in the estimation of dependence. We perform experiments comparing HTGAN with a classical light-tailed GAN (LTGAN) on both synthetic and real datasets exhibiting heavy-tailed characteristics. These experiments confirm our theoretical findings: first, the HTGAN algorithm is better at reproducing dependence in extremes than LTGAN; second, the quality of approximation improves as the dimension of the latent noise increases.

In 43, we investigate the use of generative methods based on neural networks to simulate extreme events. Although very popular, these methods are mainly invoked in empirical works. Therefore, providing theoretical guidelines for using such models in an extreme-value context is of prime importance. To this end, we propose an overview of the most recent generative methods dedicated to extremes, giving some theoretical and practical tips on their tail behaviour thanks to both extreme-value and copula tools. Additionally, 11 devises a novel neural-inspired approach for simulating multivariate extremes. Specifically, we propose a GAN-based generative model for sampling multivariate data exceeding large thresholds, giving rise to what we refer to as the ExceedGAN algorithm. Our approach is based on approximating marginal log-quantile functions using feedforward neural networks with eLU activation functions specifically introduced for bias correction. An error bound is provided on the margins, assuming a $J$th-order condition from extreme-value theory. The numerical experiments illustrate that ExceedGAN outperforms competitors, both on synthetic and real-world data sets. This work is submitted for publication.

In 12, we propose new parametrizations for neural networks in order to estimate Expected Shortfall and Conditional Tail Moments in heavy-tailed settings. The proposed neural network estimators feature a bias correction based on an extension of the usual second-order condition to an arbitrary order. The convergence rate of the uniform error between extreme log-quantiles and their neural network approximation is established. The finite-sample performance of the non-conditional neural network estimator is compared to other bias-reduced extreme-value competitors on simulated data. It is shown that our method outperforms them in difficult heavy-tailed situations where almost all other estimators fail.

7.4.2 Estimation​ of extreme risk measures​‌

Participants: Jonathan El Methni​​, Antoine Franchini,​​​‌ Stephane Girard.

Joint​ work with: M. Allouche​‌ (Kaiko) and A. Dutfoy​​ (EDF).

Most extrapolation methods dedicated to the estimation of extreme risk measures rely on the approximation of the distribution of excesses above a high threshold by a Generalized Pareto Distribution (GPD). In 51, we propose an alternative to the GPD, called the Refined Pareto Distribution (RPD), which allows for a second-order approximation of the distribution of excesses. The parameters of the RPD are estimated using an Approximate Bayesian Computation (ABC) method, and reduced-bias estimators of extreme risk measures are then derived together with the associated credible intervals. The ABC estimator demonstrates good performance over a wide range of heavy-tailed distributions. Its usefulness is also illustrated on two data sets of insurance claims. The results are submitted for publication.

The celebrated Weissman estimator provides a simple way to compute extreme quantiles, lying outside the observation range, from heavy-tailed distributions. Asymptotic confidence intervals can also be built based on its asymptotic normality, but they may suffer from poor coverage properties in practice. In the context of the PhD thesis of Antoine Franchini, we propose several higher-order approximations of the asymptotic distribution of the Weissman estimator, together with a data-driven procedure to automatically select the most appropriate one. The usefulness of the associated adaptive confidence interval is illustrated on an intensive simulation study as well as on two climatic and financial data sets. The results are submitted for publication 54.
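For readers unfamiliar with it, the classical Weissman estimator combined with the Hill estimator of the tail index can be sketched as follows. This only illustrates the baseline point estimator; the higher-order approximations and the adaptive confidence intervals proposed in the thesis are not reproduced, and the sample size, `k`, and the Pareto test case are arbitrary choices made here.

```python
import numpy as np


def hill_estimator(sample, k):
    # Hill estimator of the tail index from the k largest observations.
    x = np.sort(np.asarray(sample, dtype=float))
    return float(np.mean(np.log(x[-k:])) - np.log(x[-k - 1]))


def weissman_quantile(sample, k, p):
    # Extrapolate from the intermediate quantile X_{n-k,n} (level 1 - k/n)
    # to the extreme level 1 - p using the estimated tail index.
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    gamma = hill_estimator(x, k)
    return float(x[-k - 1] * (k / (n * p)) ** gamma)


rng = np.random.default_rng(3)
n, gamma_true = 5000, 0.5
# Pareto sample with survival function x^(-1/gamma) on [1, inf).
sample = rng.uniform(size=n) ** (-gamma_true)
p = 1.0 / (2 * n)                      # exceedance probability below 1/n
estimate = weissman_quantile(sample, k=1000, p=p)
true_quantile = p ** (-gamma_true)     # exact quantile of level 1 - p
```

The choice of `k` drives a bias-variance trade-off, and the exceedance probability `p` is below 1/n, so the target quantile is typically beyond the largest observation.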

In​​ 18, we address​​​‌ the estimation of extreme‌ quantiles of Weibull tail-distributions.‌​‌ Since such quantiles are​​ asymptotically larger than the​​​‌ sample maximum, their estimation‌ requires extrapolation methods. In‌​‌ the case of Weibull​​ tail-distributions, classical extreme-value estimators​​​‌ are numerically outperformed by‌ estimators dedicated to this‌​‌ set of light-tailed distributions.​​ The latter estimators of​​​‌ extreme quantiles are based‌ on two key quantities:‌​‌ an order statistic to​​ estimate an intermediate quantile​​​‌ and an estimator of‌ the Weibull tail-coefficient used‌​‌ to extrapolate. The common​​ practice is to select​​​‌ the same intermediate sequence‌ for both estimators. We‌​‌ show how an adapted​​ choice of two different​​​‌ intermediate sequences leads to‌ a reduction of the‌​‌ asymptotic bias associated with​​ the resulting refined estimator.​​​‌ This analysis is supported‌ by an asymptotic normality‌​‌ result associated with the​​ refined estimator. A data-driven​​​‌ method is introduced for‌ the practical selection of‌​‌ the intermediate sequences and​​ our approach is compared​​​‌ to three estimators of‌ extreme quantiles dedicated to‌​‌ Weibull tail-distributions on simulated​​ data. An illustration on​​​‌ a real data set‌ of daily wind measures‌​‌ is also provided.

7.4.3​​ Estimation of extreme inequality​​​‌ measures

Participants: Jonathan El‌ Methni, Stephane Girard‌​‌, Pearl Laveur.​​

Inequality indices provide a​​​‌ quantitative framework for measuring‌ disparity within a distribution,‌​‌ particularly in wealth or​​ income. First, we introduce​​​‌ a unified family of‌ inequality indices that encompasses‌​‌ several classical ones, including​​ Gini, Atkinson, extended Gini,​​​‌ Bonferroni and Mehran indices.‌ Second, we prove, under‌​‌ appropriate conditions, that indices​​ within this family satisfy​​​‌ six axioms widely accepted‌ in the literature. Third,‌​‌ two general estimators are​​ proposed for this class​​​‌ and their asymptotic normality‌ is established under mild‌​‌ assumptions. Besides, it has​​ been observed that the​​​‌ Gini index is robust‌ to changes in the‌​‌ highest incomes. Leveraging extreme-value​​​‌ theory, we prove a​ feature shared by the​‌ entire family: Non-discrimination of​​ tail behaviours in terms​​​‌ of maximum domains of​ attraction. Notably, this property​‌ also holds for several​​ alternatives to the Gini​​​‌ index, including those previously​ cited. These results are​‌ illustrated both on simulated​​ data and on a​​​‌ real income data set.​ The results are submitted​‌ for publication 52.​​
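As a point of reference, the classical Gini index, one member of the family discussed above, can be estimated from order statistics as in the sketch below. The unified family and the two general estimators of the paper are not reproduced; the function name and the two small examples are chosen here for illustration.

```python
import numpy as np


def gini_index(incomes):
    # Plug-in estimator based on the order statistics x_(1) <= ... <= x_(n):
    # G = 2 * sum_i i * x_(i) / (n * sum_i x_(i)) - (n + 1) / n.
    x = np.sort(np.asarray(incomes, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)


equal = gini_index([3.0, 3.0, 3.0, 3.0])         # perfect equality: 0
concentrated = gini_index([0.0, 0.0, 0.0, 1.0])  # one holder of all income: (n - 1) / n
```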

7.4.4 Changepoint identification in​​​‌ heavy-tailed distributions

Participants: Stephane​ Girard.

Joint work with: T. Opitz (INRAE Avignon), A. Usseglio-Carleve (Univ. Avignon) and C. Yan (Univ. Michigan).

The problem​‌ of detecting the existence​​ of a changepoint in​​​‌ a data sequence and​ of identifying its position​‌ is challenging when the​​ focus is on extreme​​​‌ events and the distribution​ of data is heavy-tailed.​‌ In this setting, we​​ propose a robust semi-parametric​​​‌ approach to changepoint identification​ that does not require​‌ the likelihood function. The​​ changepoint is estimated as​​​‌ the position of the​ maximum of a statistic​‌ presented in 53 and​​ inspired by classical ANOVA​​​‌ to contrast the tail​ behavior of data to​‌ the left and right​​ of all changepoint candidates.​​​‌ It is shown that​ the estimator is asymptotically​‌ consistent under mild assumptions.​​ In numerical experiments, the​​​‌ novel method shows reliable​ finite-sample behavior for various​‌ simulation settings and is​​ very competitive in comparison​​​‌ to alternative changepoint identification​ approaches from the literature,​‌ especially for small sample​​ sizes. Finally, the utility​​​‌ of the method is​ highlighted by identifying interpretable​‌ changepoints in three real-data​​ applications: very large motor​​​‌ insurance claim amounts for​ a French administrative region​‌ with age as covariate;​​ daily Bitcoin cryptocurrency price​​​‌ data (January 2018 –​ February 2025) and daily​‌ log-returns of stocks of​​ the Boeing company (March​​​‌ 2015 – March 2025)​ both with time as​‌ covariate. This work is​​ submitted for publication 55​​​‌.

7.4.5 Dimension reduction​ for extremes

Participants: Stephane​‌ Girard.

Joint work​​ with: C. Pakzad (Univ.​​​‌ Paris-Nanterre).

In the context of the PhD thesis of Meryem Bousebata, we proposed a new approach, called Extreme-PLS (EPLS), for dimension reduction in regression, adapted to distribution tails. The objective is to find linear combinations of predictors that best explain the extreme values of the response variable in a non-linear inverse regression model. In 56, we extend the approach to more realistic data settings where both serial correlation and missingness occur. Specifically, we consider a single-index inverse regression model under heavy-tailed conditions and introduce a Missing-at-Random (MAR) mechanism acting on the covariates, whose probability depends on the extremeness of the response. The asymptotic behavior of the proposed estimator is established within an α-mixing framework, leading to consistency results under regularly varying tails. Extensive Monte Carlo experiments covering eleven dependence schemes (including ARMA, GARCH, and nonlinear ESTAR processes) demonstrate that the method performs robustly across a wide range of heavy-tailed and dependent scenarios, even when substantial portions of data are missing. A real-world application to environmental data further confirms the method's capacity to recover meaningful tail directions. The results are submitted for publication.

Finally, the EPLS method‌​‌ is extended to the​​ functional framework in 57​​​‌, to tackle the‌ case of functional covariates.‌​‌ The results are submitted​​ for publication.

8 Bilateral​​​‌ contracts and grants with‌ industry

8.1 Bilateral contracts‌​‌ with industry

Participants: Stephane​​ Girard.

  • Contract with​​​‌ EDF (2024-2027).
    Stephane Girard is the advisor of the PhD thesis of Antoine Franchini, funded by EDF. The goal is to investigate sensitivity analysis and extrapolation limits in extreme-value theory. The financial support for Statify amounts to 50K euros.

9‌​‌ Partnerships and cooperations

9.1​​ International initiatives

9.1.1 Inria​​​‌ associate team not involved‌ in an IIL or‌​‌ an international program

WOMBAT​​
  • Title:
    Variance-reduced Optimization Methods​​​‌ and Bayesian Approximation Techniques‌ for scalable inference
  • Duration:‌​‌
    2023 ->
  • Coordinator:
    Hien​​ Duy Nguyen (h.nguyen5@latrobe.edu.au)
  • Partners:​​​‌
    • La Trobe University, Melbourne (Australia)
  • Inria contact:
    Florence‌​‌ Forbes
  • Summary:
    Many inferential​​ tools, such as machine​​​‌ learning algorithms and statistical‌ models, require the estimation‌​‌ of model parameters, structures,​​ quantities, and properties, from​​​‌ data. In practice, it‌ is common that model‌​‌ characterizations are available through​​ high-fidelity simulations of the​​​‌ data generating processes, but‌ only through “black-boxes” that‌​‌ are poorly suited for​​ optimization under uncertainty or​​​‌ conventional statistical inference procedures.‌ The main statistical challenge‌​‌ is that model likelihoods​​ are typically intractable or​​​‌ unavailable in closed form.‌ Approaches suited for these‌​‌ scenarios are typically referred​​ to as likelihood-free or​​​‌ simulation-based inference (SBI) methods,‌ and have received a‌​‌ great deal of attention​​ in recent years, with​​​‌ momentum coming from mixing‌ of ideas from the‌​‌ interface between statistics and​​ machine learning. However, most​​​‌ SBI methods scale poorly‌ when the number of‌​‌ observations is too large,​​ which makes them unsuitable​​​‌ for modern data, which‌ are often acquired in‌​‌ real time, in an​​ incremental nature, and are​​​‌ often available in large‌ volume. Computation of inferential‌​‌ quantities in an incremental​​ manner may be forcibly​​​‌ imposed by the nature‌ of data acquisition (e.g.‌​‌ streaming and sequential data)​​ but may also be​​​‌ seen as a solution‌ to handle larger data‌​‌ volumes in a more​​ resource friendly way, with​​​‌ respect to memory, energy,‌ and time consumption. To‌​‌ produce feasible and practical​​ online algorithms for streaming​​​‌ data and complex models,‌ we propose to study‌​‌ the family of stochastic​​ approximation (SA) algorithms. The​​​‌ overall goal of the‌ project is to combine‌​‌ recent ideas from the​​ SBI and SA literature,​​​‌ to propose efficient methods‌ for handling complex inferential‌​‌ problems. 
We shall demonstrate our approaches via applications to problems in challenging domains, with Magnetic Resonance Imaging (MRI) and road network management as initial targets. In doing so, we hope to achieve both breakthroughs in applied methodology and the development of new SBI and SA techniques with widespread applicability.

9.2 International research​​ visitors

9.2.1 Visits of​​​‌ international scientists

Other international‌ visits to the team‌​‌
Darren Wraith
  • Status:
    Associate​​ Professor
  • Institution of origin:​​​‌
    QUT
  • Country:
    Australia
  • Dates:‌
    mid-August 2025 - mid-February 2026
  • Context of the visit:​​​‌
    Beyond Gaussian mixtures for​ inverse problems and simulation-based​‌ inference.
  • Mobility program/type of​​ mobility:
    sabbatical research stay​​​‌
Adam Bretherton
  • Status:
    PhD​ student
  • Institution of origin:​‌
    QUT, Brisbane
  • Country:
    Australia​​
  • Dates:
    mid-May - mid-June​​​‌ 2025
  • Context of the​ visit:
    Simulation-based inference and​‌ Bayesian models
  • Mobility program/type​​ of mobility:
    research stay​​​‌ in the context of​ the Associate Team Wombat​‌

9.2.2 Visits to international​​ teams

Research stays abroad​​​‌
Razan Mhanna
  • Visited institution:​
    Brigham Young University (BYU)​‌
  • Country:
    USA
  • Dates:
    May-June​​ 2025
  • Context of the​​​‌ visit:
    Graph kernels for​ brain network analysis
  • Mobility​‌ program/type of mobility:
    research​​ stay funded by "bourse​​​‌ IDEX aide à la​ mobilité internationale sortantes des​‌ doctorant.e.s de l’Université Grenoble​​ Alpes"

9.3 National initiatives​​​‌

Participants: Jonathan El Methni, Jean-Baptiste Durand, Florence Forbes, Julyan Arbel, Sophie Achard, Stephane Girard, Pedro Luiz Coelho Rodrigues.

  • Jonathan El Methni and Stephane Girard were awarded 5K euros and PhD funding through the IRGA call from Université Grenoble Alpes, 2024–2027.
ANR​​​‌
  • The ANR project RADIO-AIDE (2022-26), on radiation-induced neurotoxicity assessed by spatio-temporal modeling and AI after brain radiotherapy, is coordinated by S. Ancelet from IRSN and was granted for 4 years starting in April 2022. It involves Statify, Grenoble Institute of Neurosciences, Pixyl, ICANS, APHP, ICM and ENS Paris-Saclay. The available funding for Statify is 94K euros.
  • ANR project PEG2 (2022-26) on Predictive Ecological Genomics: Statify is involved in this 4-year project, accepted in July 2022. The PI is Prof. Olivier Francois, who spent 2 years (2021-22) in the team on a delegation position.
  • Julyan Arbel is co-PI of the Bayes-Duality project, launched with $2.76 million of funding from Japan JST and French ANR for a total of 5 years starting in October 2021. The goal is to develop a new learning paradigm for Artificial Intelligence that learns like humans in an adaptive, robust, and continuous fashion. On the Japanese side, the project is led by Mohammad Emtiyaz Khan as the research director, with Kenichi Bannai and Rio Yokota as co-PIs.
  • Statify is involved in​​​‌ the 4-year ANR project​ EXSTA “EXtremes, STatistical learning​‌ and Applications” (2024-2028) hosted​​ by Paris-Sorbonne University. Extreme​​​‌ Value Theory is the​ branch of probability and​‌ statistics dedicated to rare​​ events associated with tails​​​‌ of distributions, with numerous​ applications in various scientific​‌ fields where extreme events​​ are of particular importance,​​​‌ and in risk management.​ Recent years have seen​‌ the development of a​​ theoretical framework inspired by​​​‌ statistical learning theory and​ algorithms adapted from machine​‌ learning for the analysis​​ of extremes, in line​​​‌ with the statistical community's​ growing interest in high-dimensional​‌ problems and the increasing​​ availability of large-scale data​​​‌ sets. The aim of​ the project is to​‌ reinforce these emerging directions​​ and encourage interaction between​​​‌ theory and practice. The​ consortium brings together statisticians​‌ whose research topics cover​​ a wide spectrum, from​​​‌ mathematical statistics and learning​ theory to operational applications​‌ in climate and environmental​​ sciences and industry.
  • Pedro Luiz Coelho Rodrigues is co-PI of the SBI4C project of the MIAI AI Cluster, under the reference ANR-23-IACL-0006. The project started in September 2025 and runs for four years with 400k euros of funding. Other laboratories involved are the Laboratoire d'Informatique de Grenoble (LIG) and the Institut des Géosciences de l'Environnement (IGE).
PEPR Digital‌​‌ Health
  • Florence Forbes and Sophie Achard are involved in the REWIND project (2023-2028), pRecision mEdicine WIth loNgitudinal Data. The goal is to develop models for longitudinal data in order to understand the progression of chronic diseases.
France Life Imaging​​ (FLI)
  • Funding from the national steering committee of the expertise network “Traitement et Analyse en Imagerie Multimodale” (RE4) of the France Life Imaging (FLI) infrastructure, for a project entitled “Détection d’Anomalies en Imagerie Médicale par apprentissage faiblement Supervisé” (anomaly detection in medical imaging with weakly supervised learning). Joint project with Carole Lartizien and Michel Dojat.

9.3.1 Networks

MSTGA and AIGM INRAE (French National Institute for Agricultural Research) networks: F. Forbes and J.-B. Durand have been members since 2006 of the INRAE network AIGM (formerly MSTGA), on Algorithmic issues for Inference in Graphical Models. It is funded by INRAE MIA and RNSC/ISC Paris. This network gathers researchers from different disciplines. Statify co-organized and hosted 2 of the network meetings, in 2008 and 2015, in Grenoble.

10 Dissemination

10.1 Promoting‌​‌ scientific activities

10.1.1 Scientific​​ events: organisation

Member of​​​‌ the organizing committees
  • Jean-Baptiste Durand: MaSeMo: Markov, Semi-Markov Models and Associated Fields (from Theory to Application and back), 1-4 July 2025, Paris, France.
  • Florence Forbes co-organized, with Xun Huan (University of Michigan) and Youssef Marzouk (Massachusetts Institute of Technology), a special session on Bayesian experimental design at the MCM25 conference in Chicago.

10.1.2 Scientific‌​‌ events: selection

Member of​​ the conference program committees​​​‌
  • Julyan Arbel: Area Chair‌ for AISTATS.
Reviewer
  • Julyan‌​‌ Arbel: Area Chair for​​ ICML.

10.1.3 Journal

Member​​​‌ of the editorial boards‌
  • Stephane Girard: Associate Editor for Revstat and Dependence Modelling.
  • Julyan Arbel:​​​‌ Associate Editor for Statistics‌ and Computing, Bayesian Analysis,‌​‌ Australian and New Zealand​​ Journal of Statistics, Statistics​​​‌ & Probability Letters, Statistical‌ Methods & Applications.
Reviewer‌​‌ - reviewing activities
  • Stephane​​ Girard: Extremes, Electronic Journal​​​‌ of Statistics.
  • Jonathan El‌ Methni: Journal of Statistical‌​‌ Software.
  • Jean-Baptiste Durand: Statistics​​ and Computing.
  • Julyan Arbel:​​​‌ Annals of Applied Probability,‌ Applied Probability Journal, Extremes,‌​‌ Journal of Machine Learning​​ Research (x2), Journal of​​​‌ the Royal Statistical Society‌ series B, Statistical Science‌​‌ (x2).

10.1.4 Invited talks​​

  • Stephane Girard: Invited talk at the annual general assembly of the 'PEPR Climat – TRACCS'.
  • Jonathan El Methni: Invited talk at the GAIA seminar of Université Grenoble Alpes, “Sur les traces des premiers graphiques statistiques” (On the trail of the first statistical graphics), December 2025.
  • Julyan Arbel:‌ Keynote talk at Journées‌​‌ de Statistique de la​​ SFdS, Marseille. Invited talks​​​‌ at 14th International Conference‌ on Bayesian Nonparametrics, UCLA‌​‌ (USA); Royal Statistical Society​​ (RSS) Conference, Edinburgh (UK);​​​‌ All About That Seminar,‌ Institut Henri Poincaré, Paris;‌​‌ Recent Advances in Machine​​​‌ Learning, Aussois.
  • Florence Forbes: invited talks at IABM in March 2025 and at the Model Based Clustering workshop in July 2025, both in Nice, and at the GeNU workshop in September 2025 in Copenhagen.
  • Jacopo Iollo: invited talks at the Isaac Newton Institute in Cambridge, UK, and at the ASA/IMS Spring Research Conference, New York, both in June 2025, and at the MCM25 conference in Chicago, USA, in August 2025.

10.1.5​ Leadership within the scientific​‌ community

  • Stephane Girard: Member​​ of the ELLIS Society​​​‌ (European Laboratory for Learning​ and Intelligent Systems) since​‌ 2025.
  • Julyan Arbel: Member of the ELLIS Society (European Laboratory for Learning and Intelligent Systems) since 2020. Member of the Data Science axis committee of the Persyval Labex, Grenoble.

10.1.6​ Research administration

  • Jean-Baptiste Durand: Member of the INRAE evaluation committee, section MISTI (Applied mathematics and computer science).
  • Julyan Arbel: Member of the ANR evaluation committee CE23 “Intelligence artificielle et science des données” (artificial intelligence and data science).

10.2 Teaching - Supervision​‌ - Juries - Educational​​ and pedagogical outreach

10.2.1​​​‌ Teaching

  • Master: Stephane Girard, Statistique Inférentielle Avancée, 18 ETD, M1 level, Ensimag, Grenoble-INP, France.
  • Master: Stephane​​​‌ Girard, Modélisation, estimation, simulation​ des risques climatiques, 12​‌ ETD, M2 level, Ecole​​ Polytechnique, Palaiseau, France.
  • Master: Julyan Arbel, Bayesian Machine Learning, 36 ETD, with R. Bardenet and G. V. Cardoso, Master MVA, École normale supérieure Paris-Saclay.

10.2.2 Supervision

  • Stephane Girard is the PhD advisor of Antoine Franchini (Université Grenoble-Alpes, since December 2024).
  • Stephane Girard and Jonathan El Methni are the PhD co-advisors of Pearl Laveur (Université Grenoble-Alpes, since October 2024).
  • Stephane Girard is the PhD co-advisor (with G. Stupfler, Université d'Angers and A. Usseglio-Carleve, Université d'Avignon) of Solune Denis (Université d'Angers, since October 2024).
  • Julyan Arbel​​ is the PhD advisor​​​‌ of Mohamed-Bahi Yahiaoui (CEA​ Cadarache-Inria, with Loïc Giraldi,​‌ Geoffrey Daniel).
  • Julyan Arbel​​ is the PhD advisor​​​‌ of Julien Zhou (Inria-Criteo,​ with Pierre Gaillard and​‌ Thibaud Rahier).
  • Julyan Arbel​​ is the PhD advisor​​​‌ of Alexandre Wendling (Inria-UGA,​ with Clovis Galiez).
  • Julyan​‌ Arbel and Pedro Rodrigues​​ are the PhD advisors​​​‌ of Eloise Touron (Inria,​ with Nelle Varoquaux and​‌ Mickael Arbel).
  • Julyan Arbel​​ and Pedro Rodrigues are​​​‌ the PhD advisors of​ Camille Touron (Inria).
  • Julyan​‌ Arbel and Sophie Achard​​ are the PhD advisors​​​‌ of Alice Chevaux (Inria,​ with Guillaume Kon Kam​‌ King).
  • Julyan Arbel is the PhD advisor of Soufiane Atouani (Inria).

10.2.3​‌ Juries

  • Stephane Girard: Reviewer of the PhD thesis of Nicolas Atienza, "Towards reliable ML: Leveraging multi-modal representations, information bottleneck and extreme value theory", Univ. Paris-Saclay, April 2025.
  • Stephane Girard: President of the PhD committee of Alex Podgorny, "Réduction de dimension pour l’inférence statistique de queues de distribution", Univ. Strasbourg, September 2025.
  • Jonathan El Methni: Member of two hiring committees, for a PRAG in mathematics and a junior lecturer position at the Faculté d'économie de l'Université Grenoble Alpes. Member of the hiring committee for ATER positions.
  • Julyan Arbel: Examiner of the PhD thesis of Meriam Ezziati, Laboratoire d’astrophysique de Marseille, “Searching for high-z quasars in the Euclid Wide Survey”.
  • Julyan Arbel: Examiner of the PhD thesis of Antoine Van Biesbroeck, Ecole Polytechnique & CEA Saclay, “Extended reference prior theory for objective and practical inference, application to robust and auditable seismic fragility curves estimation”.
  • Julyan Arbel: Reviewer of the PhD thesis of Qian Jin, UNSW Sydney, “Latent Structure Models in Statistical Learning and Neural Network Extensions”.
  • Julyan Arbel: Reviewer of the PhD thesis of Jan Greve, Vienna University of Economics and Business, “Probability Distributions on Partitions of Data: Theory and Applications”.
  • Julyan Arbel: Reviewer of the HDR thesis of Gianni Franchi, ENSTA Paris, Institut Polytechnique de Paris, “Towards Trustworthy Artificial Intelligence”.
  • Florence Forbes: Chair of the PhD defences of Tom Swagier and Younes Moussaoui. Member of the PhD committee of Louis Grenioux.

10.2.4 Educational and​​​‌ pedagogical outreach

  • Stephane Girard: Training (9h, remote teaching) “Climate Risk quantification methods and tools for finance” for BNP Paribas employees.

10.3 Popularization

10.3.1‌​‌ Participation in Live events​​

  • Julyan Arbel is a​​​‌ Social Media Officer for‌ the International Society for‌​‌ Bayesian Analysis (ISBA). He​​ organises Discussion Paper Webinars​​​‌ for the Bayesian Analysis‌ journal.

11 Scientific production‌​‌

11.1 Major publications

11.2​​ Publications of the year​​​‌

International journals

Invited conferences

International​ peer-reviewed conferences

National peer-reviewed Conferences​‌

Conferences​​ without proceedings

  • [37] Julyan Arbel. Bayesian deep learning: Overview and challenges. BNP 14 - 14th International Conference on Bayesian Nonparametrics, Los Angeles (CA), United States, 2025.
  • [38] Julyan Arbel. Overview and challenges in Bayesian deep learning. JdS 2025 - 56es Journées de Statistique de la SFdS, Marseille, France, 2025.
  • [39] Julyan Arbel. Some Bayesian nonparametric ideas in (Bayesian) deep learning. RSS 2025 - International Conference of Royal Statistical Society, Edinburgh, United Kingdom, 2025.
  • [40] Antoine Barrier, Lila Cunge, Thomas Coudert, Aurélien Delphin, Loïc Legris, Geoffroy Oudoumanessah, Laurent Lamalle, Florence Forbes, Mariya Doneva, Benjamin Lemasson, Emmanuel L. Barbier and Thomas Christen. MARVEL MRF for Contrast-free Blood Volume, Microvascular Properties, and Relaxometry Mapping: Initial Tests in Volunteers and Stroke Patients. ISMRM & ISMRT 2025 - Annual Meeting & Exhibition, Honolulu (Hawaii), United States, 2025, 1-4.
  • [41] Michel Dojat. Digital Health in the 21th. CPR 2025 - 4th Annecy Round Table on Cardio Pulmonary Resuscitation, Annecy, France, 2025.
  • [42] Camille Touron, Gabriel Victorino Cardoso, Julyan Arbel and Pedro Luiz Coelho Rodrigues. Error analysis of a compositional score-based algorithm for simulation-based inference. Workshop on Principles of Generative Modeling at EurIPS 2025, Copenhagen, Denmark, October 2025.

Scientific book chapters​​

Doctoral dissertations and habilitation‌​‌ theses

  • [45] Julien Chevallier. A journey in the fields of PDE, probabilities and statistics with point processes. Université Grenoble Alpes, December 2025.

Reports‌ & preprints

Other scientific​​ publications

11.3 Cited​​ publications

  • [68] H. Bacave, J.-B. Durand, A. Franc, N. Peyrard, S. Plancade and R. Sabbadin. Multichain HMM. In: A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE. ISTE; John Wiley, 2025, 79-116.
  • [69] C. Bérard, M.-J. Cros, J.-B. Durand, C. Lothodé, S. Plancade, R. Trépos and N. Vergne. Review of HSMM R and Python Softwares. In: A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE. ISTE; John Wiley, 2025, 47-77.
  • [70] J.-B. Durand, A. Franc, N. Peyrard, N. Vergne and I. Votsi. Monochain HSMM. In: A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE. ISTE; John Wiley, 2025, 1-46.
  • [71] J.-B. Durand, N. Peyrard, S. Plancade and R. Sabbadin. Multichain HSMM. In: A comprehensive guide to HSMM: Theory, software, and advanced extensions, N. Peyrard and B. De Saporta, eds. Mathematics and Statistics Series / ISTE. ISTE; John Wiley, 2025, 129-156.
  • [72] J.-B. Durand, M. Valdeyron, N. Peyrard and S. Plancade. Review of estimation algorithms for HMM/HSMM with mixed effects. MaSeMo: Markov, Semi-Markov Models and Associated Fields (from Theory to Application and back), Paris, France, Jul 2025, 9.
  • [73] Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff and Pedro Luiz Coelho Rodrigues. Diffusion posterior sampling for simulation-based inference in tall data settings. June 2024, working paper or preprint.