Keywords
Computer Science and Digital Science
 A6. Modeling, simulation and control
 A6.1. Methods in mathematical modeling
 A6.1.1. Continuous Modeling (PDE, ODE)
 A6.1.2. Stochastic Modeling
 A6.1.4. Multiscale modeling
 A6.2. Scientific computing, Numerical Analysis & Optimization
 A6.2.1. Numerical analysis of PDE and ODE
 A6.2.2. Numerical probability
 A6.2.3. Probabilistic methods
 A6.2.4. Statistical methods
 A6.2.5. Numerical Linear Algebra
 A6.2.6. Optimization
 A6.3. Computation-data interaction
 A6.3.1. Inverse problems
 A6.3.2. Data assimilation
 A6.3.4. Model reduction
 A6.3.5. Uncertainty Quantification
 A6.5. Mathematical modeling for physical sciences
 A6.5.2. Fluid mechanics
 A6.5.3. Transport
 A6.5.5. Chemistry
Other Research Topics and Application Domains
 B1. Life sciences
 B2. Health
 B3. Environment and planet
 B3.2. Climate and meteorology
 B4. Energy
 B4.2. Nuclear Energy Production
 B4.2.1. Fission
 B5.3. Nanotechnology
 B5.5. Materials
1 Team members, visitors, external collaborators
Research Scientists
 Mathias Rousset [Team leader, INRIA, Researcher, HDR]
 Frédéric Cérou [INRIA, Researcher]
 Cédric Herzet [INRIA, Researcher, HDR]
 Patrick Héas [INRIA, Researcher]
 François Le Gland [INRIA, Senior Researcher, until Nov. 2022 (retirement)]
Faculty Member
 Valérie Monbet [UNIV RENNES I, Professor, HDR]
PhD Students
 François Ernoult [UNIV RENNES I]
 Théo Guyard [INSA RENNES]
 Thu Le Tran [UNIV RENNES I]
Administrative Assistant
 Gunther Tessier [INRIA]
2 Overall objectives
As the constant surge of computational power is nurturing scientists into simulating the most detailed features of reality, from complex molecular systems to climate or weather forecast, the computer simulation of physical systems is becoming reliant on highly complex stochastic dynamical models and very abundant observational data. The complexity of such models and of the associated observational data stems from intrinsic physical features, which do include high dimensionality as well as intricate temporal and spatial multiscales. It also results in much less control over simulation uncertainty.
Within this highly challenging context, SIMSMART positions itself as a mathematical and computational probability and statistics research team, dedicated to Monte Carlo simulation methods. Such methods include in particular particle Monte Carlo methods for rare event simulation, data assimilation and model reduction, with application to stochastic dynamical physical models. The main objective of SIMSMART is to disrupt this now classical field by creating deeper mathematical frameworks adapted to the management of contemporary highly sophisticated physical models.
3 Research program
Introduction. Computer simulation of physical systems is becoming increasingly reliant on highly complex models, as the constant surge of computational power is nurturing scientists into simulating the most detailed features of reality – from complex molecular systems to climate/weather forecast.
Yet, when modeling physical reality, bottom-up approaches are stumbling over intrinsic difficulties. First, the timescale separation between the fastest simulated microscopic features and the macroscopic effective slow behavior becomes huge, implying that the fully detailed and direct long-time simulation of many interesting systems (e.g. large molecular systems) is out of reasonable computational reach. Second, the chaotic dynamical behaviors of the systems at stake, coupled with such multiscale structures, exacerbate the intricate uncertainty of outcomes, which become highly dependent on intrinsic chaos, uncontrolled modeling, as well as numerical discretization. Finally, the massive increase of observational data poses new challenges to classical data assimilation, such as dealing with high-dimensional observations and/or extremely long time series of observations.
SIMSMART Identity. Within this highly challenging applicative context, SIMSMART positions itself as a computational probability and statistics research team, with a mathematical perspective. Our approach is based on the use of stochastic modeling of complex physical systems, and on the use of Monte Carlo simulation methods, with a strong emphasis on dynamical models. The two main numerical tasks of interest to SIMSMART are the following: (i) simulating with pseudo-random number generators (a.k.a. sampling) dynamical models of random physical systems, (ii) sampling such random physical dynamical models given some real observations (a.k.a. Bayesian data assimilation). SIMSMART aims at providing an appropriate mathematical level of abstraction and generalization to a wide variety of Monte Carlo simulation algorithms in order to propose non-superficial answers to both methodological and mathematical challenges. The issues to be resolved include computational complexity reduction, statistical variance reduction, and uncertainty quantification.
SIMSMART Objectives. The main objective of SIMSMART is to disrupt this now classical field of particle Monte Carlo simulation by creating deeper mathematical frameworks adapted to the challenging world of complex (e.g. high dimensional and/or multiscale), and massively observed systems, as described in the beginning of this introduction.
To be more specific, we will classify SIMSMART objectives using the following four intertwined topics:
 Objective 1: Rare events and random simulation.
 Objective 2: High dimensional and advanced particle filtering.
 Objective 3: Nonparametric approaches.
 Objective 4: Model reduction and sparsity.
Rare events (Objective 1) are ubiquitous in random simulation, either to accelerate the occurrence of physically relevant random slow phenomena, or to estimate the effect of uncertain variables. Objective 1 will be mainly concerned with particle methods where splitting is used to enforce the occurrence of rare events.
The problem of high-dimensional observations, the main topic in Objective 2, is a known bottleneck in filtering, especially in nonlinear particle filtering, where linear data assimilation methods remain the state-of-the-art approaches.
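This bottleneck can be illustrated with a toy experiment. The sketch below is not one of the team's methods: it draws particles from a standard Gaussian prior in dimension `dim`, weights them by an assumed Gaussian likelihood of a single observation, and reports the effective sample size (ESS) of the weights. The function name `effective_sample_size` and all parameter values are illustrative assumptions.

```python
import numpy as np

def effective_sample_size(dim, n_particles=1000, obs_std=1.0, seed=0):
    """ESS of importance weights in a Gaussian prior / Gaussian likelihood toy model."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal((n_particles, dim))   # samples from the prior N(0, I)
    observation = np.zeros(dim)                           # toy observation y = 0
    # Log-likelihood of y given each particle, up to an additive constant.
    log_w = -0.5 * np.sum((observation - particles) ** 2, axis=1) / obs_std**2
    log_w -= log_w.max()                                  # stabilise the exponentials
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w**2)

for d in (1, 10, 100):
    print(d, round(effective_sample_size(d), 1))
```

In this toy setting the ESS is expected to collapse as the dimension grows, which is one common way of describing the weight degeneracy of particle filters with high-dimensional observations.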
The increasing size of recorded observational data and the increasing complexity of models also suggest devoting more effort to nonparametric data assimilation methods, the main issue of Objective 3.
In some contexts, for instance when one wants to compare solutions of a complex (e.g. high-dimensional) dynamical system depending on uncertain parameters, the construction of relevant reduced-order models becomes a key topic. Model reduction aims at proposing efficient algorithmic procedures for the resolution (to some reasonable accuracy) of high-dimensional systems of parametric equations. This overall objective entails many different subtasks: 1) the identification of low-dimensional surrogates of the target “solution” manifold, 2) the design of efficient methodologies of resolution exploiting low-dimensional surrogates, 3) the theoretical validation of the accuracy achievable by the proposed procedures. This is the content of Objective 4.
With respect to volume of research activity, Objective 1, Objective 4 and the sum (Objective 2+Objective 3) are comparable.
Some new challenges in the simulation and data assimilation of random physical dynamical systems have become prominent in the last decade. A first issue (i) consists in the intertwined problems of simulating on large, macroscopic random times, and simulating rare events (see Objective 1). The link between both aspects stems from the fact that many effective, large-time dynamics can be approximated by sequences of rare events. A second, obvious, issue (ii) consists in managing very abundant observational data (see Objectives 2 and 3). A third issue (iii) consists in quantifying uncertainty/sensitivity/variance of outcomes with respect to models or noise. A fourth issue (iv) consists in managing high dimensionality, either when dealing with complex prior physical models, or with very large data sets. The related increase of complexity also requires, as a fifth issue (v), the construction of reduced models to speed up comparative simulations (see Objective 4). In a context of very abundant data, this may be replaced by a sixth issue (vi) where complexity constraints on modeling are replaced by the use of nonparametric statistical inference (see Objective 3).
Hindsight suggests that all the latter challenges are related. Indeed, the contemporary digital condition, made of a massive increase in computational power and in available data, is resulting in a demand for more complex and uncertain models, for more extreme regimes, and for using inductive approaches relying on abundant data. In particular, uncertainty quantification (item (iii)) and high dimensionality (item (iv)) are in fact present in all four Objectives considered in SIMSMART.
4 Application domains
4.1 Domain 1 – Computational Physics
The development of large-scale computing facilities has enabled simulations of systems at the atomistic scale on a daily basis. The aim of these simulations is to bridge the time and space scales between the macroscopic properties of matter and the stochastic atomistic description. Typically, such simulations are based on the ordinary differential equations of classical mechanics supplemented with a random perturbation modeling temperature, or collisions between particles.
Let us give a few examples. In biochemistry, such simulations are key to predict the influence of a ligand on the behavior of a protein, with applications to drug design. The computer can thus be used as a numerical microscope in order to access data that would be very difficult and costly to obtain experimentally. In that case, a rare event (Objective 1) is given by a macroscopic system change such as a conformation change of the protein. In nuclear safety, such simulations are key to predict the transport of neutrons in nuclear plants, with application to assessing aging of concrete. In that case, a rare event is given by a high energy neutron impacting concrete containment structures.
A typical model used in molecular dynamics simulation of open systems at given temperature is a stochastic differential equation of Langevin type. The large-time behavior of such systems is typically characterized by a hopping dynamics between 'metastable' configurations, usually defined by local minima of a potential energy. In order to bridge the time and space scales between the atomistic level and the macroscopic level, specific algorithms enforcing the realization of rare events have been developed. For instance, splitting particle methods (Objective 1) have become popular within the computational physics community only within the last few years, partially as a consequence of interactions between physicists and Inria mathematicians in ASPI (parent of SIMSMART) and MATHERIALS project-teams.
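The hopping behavior described above can be reproduced on a one-dimensional toy model. The sketch below makes several simplifying assumptions that are not taken from the report: it uses the overdamped Langevin equation rather than a full kinetic model, the double-well potential V(x) = (x^2 - 1)^2, an Euler-Maruyama discretization, and illustrative parameter values. It counts the transitions between the two metastable wells.

```python
import numpy as np

def simulate_double_well(beta=3.0, dt=0.01, n_steps=200_000, x0=-1.0, seed=0):
    """Euler-Maruyama for dX = -V'(X) dt + sqrt(2/beta) dW with V(x) = (x^2 - 1)^2."""
    rng = np.random.default_rng(seed)
    noise = np.sqrt(2.0 * dt / beta) * rng.standard_normal(n_steps)
    x = x0
    well = -1            # last well visited (-1: left well, +1: right well)
    hops = 0
    for k in range(n_steps):
        x += -4.0 * x * (x * x - 1.0) * dt + noise[k]
        # Count a hop only once the trajectory is well inside the other well.
        if well == -1 and x > 0.5:
            well, hops = 1, hops + 1
        elif well == 1 and x < -0.5:
            well, hops = -1, hops + 1
    return hops

print("hops between wells:", simulate_double_well())
```

Raising `beta` (lowering the temperature) makes hops exponentially rarer, which is the regime where direct simulation becomes impractical and rare event methods are needed.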
SIMSMART also focuses on various models described by partial differential equations (reaction-diffusion, conservation laws), with unknown parameters modeled by random variables.
4.2 Domain 2 – Meteorology
The traditional trend in data assimilation in geophysical sciences (climate, meteorology) is to use as prior information some very complex deterministic models formulated in terms of fluid dynamics and reflecting as much as possible the underlying physical phenomenon (see e.g.). Weather/climate forecasting can then be recast in terms of a Bayesian filtering problem (see Objective 2) using weather observations collected in situ.
The main issue is therefore to perform such Bayesian estimations with very expensive infinite-dimensional prior models, and observations in large dimension. The use of some linear assumption in prior models (Kalman filtering) to filter nonlinear hydrodynamical phenomena is the state-of-the-art approach, and a current field of research, but is plagued with intractable instabilities.
This context motivates two research trends: (i) the introduction of nonparametric, model-free prior dynamics constructed from a large amount of past, recorded real weather data; and (ii) the development of appropriate nonlinear filtering approaches (Objective 2 and Objective 3).
SIMSMART will also test its new methods on multi-source data collected in the North Atlantic, paying particular attention to coastal areas (e.g. within the inter-Labex SEACS).
4.3 Other Applicative Domains
SIMSMART focuses on various applications including:
 Tracking and hidden Markov models.
 Robustness and certification in Machine Learning.
5 New software and platforms
5.1 New software
5.1.1 Screening4L0Problem

Keywords:
Global optimization, Sparsity

Functional Description:
This software contains "Branch and bound" optimization routines exploiting "screening" acceleration rules for solving sparse representation problems involving the L0 pseudonorm.
 URL:
 Publication:

Contact:
Cédric Herzet

Participants:
Clément Elvira, Théo Guyard, Cédric Herzet
5.1.2 Screen&Relax

Keywords:
Optimization, Sparsity

Functional Description:
This software provides optimization routines to efficiently solve the "ElasticNet" problem.
 URL:
 Publication:

Contact:
Cédric Herzet

Participants:
Clément Elvira, Théo Guyard, Cédric Herzet
5.1.3 npSEM

Name:
Stochastic expectation-maximization algorithm for nonparametric state-space models

Keyword:
Statistic analysis

Functional Description:
npSEM is the combination of a nonparametric estimate of the dynamic using local linear regression (LLR), a conditional particle smoother and a stochastic Expectation-Maximization (SEM) algorithm. Further details of its construction and implementation are given in the article "An algorithm for nonparametric estimation in state-space models" by T.T.T. Chau, P. Ailliot and V. Monbet, https://doi.org/10.1016/j.csda.2020.107062.
 URL:

Contact:
Thi Tuyet Trang Chau

Participants:
Valérie Monbet, Thi Tuyet Trang Chau
5.1.4 NHMSAR

Name:
Non-Homogeneous Markov Switching Autoregressive Models

Keyword:
Statistical learning

Functional Description:
Calibration, simulation, validation of (non-)homogeneous Markov switching autoregressive models with Gaussian or von Mises innovations. Penalization methods are implemented for Markov Switching Vector Autoregressive Models of order 1 only. Most functions of the package handle missing values.
 URL:

Contact:
Valérie Monbet

Participant:
Valérie Monbet
5.1.5 3D Winds Fields Profiles

Keywords:
3D modeling, Optic-flow, Atmosphere

Functional Description:
The algorithm computes 3D Atmospheric Motion Vectors (AMVs) vertical profiles, using incomplete maps of humidity, temperature and ozone concentration observed in a range of isobaric levels. The code is implemented for operational use with the Infrared Atmospheric Sounding Interferometer (IASI) carried on the MetOp satellite.
 URL:

Contact:
Patrick Héas

Participant:
Patrick Héas
5.1.6 Screening4SLOPE

Keyword:
Optimization

Functional Description:
This software provides optimization routines to solve the SLOPE problem by exploiting "safe screening" reduction techniques.
 URL:
 Publication:

Contact:
Cédric Herzet

Participants:
Clément Elvira, Cédric Herzet
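For reference, the SLOPE (Sorted L-One Penalized Estimation) problem addressed by Screening4SLOPE is usually stated as below. The symbols $A$ (dictionary), $y$ (observation) and $\lambda_i$ (penalty weights) are notation assumed here for illustration; they are not taken from the software documentation.

```latex
\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\,\| y - A x \|_2^2 \;+\; \sum_{i=1}^{n} \lambda_i \, |x|_{[i]},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0,
```

where $|x|_{[i]}$ denotes the $i$-th largest entry of $x$ in absolute value. Taking all $\lambda_i$ equal recovers the LASSO penalty.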
6 New results
6.1 Objective 1 – Rare events and Monte Carlo simulation
Participants: Frédéric Cérou, Patrick Héas, Mathias Rousset, François Ernoult.
In [cerou:hal03889692], we obtained the first large deviation analysis of the (large sample size) statistical fluctuations of the AMS algorithm. The limiting quantity obtained can provide insights into the algorithmic efficiency in practice; in particular, a novel geometric criterion ensuring minimal fluctuations (asymptotic efficiency) is studied.
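For readers unfamiliar with AMS (adaptive multilevel splitting), a minimal sketch is given here on a toy problem; it is not the setting analysed in the cited work. The assumptions are all illustrative: a discretized one-dimensional Brownian motion started at `x0`, the rare event "reach 1 before 0", the position itself as score function, and removal at each iteration of all paths tied at the lowest score.

```python
import numpy as np

def simulate_path(x, dt, rng):
    """Discretised Brownian path started at x, stopped on leaving the interval (0, 1)."""
    path = [x]
    while 0.0 < x < 1.0:
        x += np.sqrt(dt) * rng.standard_normal()
        path.append(x)
    return path

def ams_probability(x0=0.1, n_paths=200, dt=1e-3, seed=0):
    """Estimate P(path reaches 1 before 0) by adaptive multilevel splitting."""
    rng = np.random.default_rng(seed)
    paths = [simulate_path(x0, dt, rng) for _ in range(n_paths)]
    scores = np.array([max(p) for p in paths])      # score = maximum reached along the path
    estimate = 1.0
    while scores.min() < 1.0:
        level = scores.min()
        killed = np.flatnonzero(scores <= level)    # all paths tied at the lowest score
        survivors = np.flatnonzero(scores > level)
        if survivors.size == 0:                     # extinction: no path above the level
            return 0.0
        estimate *= survivors.size / n_paths
        for i in killed:
            parent = paths[rng.choice(survivors)]
            # Copy the parent up to its first passage above the level, then resimulate.
            k = next(j for j, v in enumerate(parent) if v > level)
            paths[i] = parent[:k + 1] + simulate_path(parent[k], dt, rng)[1:]
            scores[i] = max(paths[i])
    return estimate

print(ams_probability())
```

For continuous-time Brownian motion the target probability equals `x0`; the time discretization introduces a small bias, so the printed value should only be close to 0.1.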
In [heas:hal03777922], we study a real-world high-dimensional Bayesian sampling problem (weather variables observed by space imagery) using kinetic Langevin diffusions (Hamiltonian Monte Carlo), and show empirically the advantage for convergence of an artificial “cold” tempering taming the nonlinearities of the likelihood.
In [cerou:hal03889404], we obtained a theoretical result on Importance Sampling that proves the following fact. Consider a convex set of possible target probability distributions and a reference measure. The Gibbs-like distribution that minimizes entropy (with respect to the reference) on the considered convex class is also, in some rigorously defined worst-case sense, the optimal importance proposal.
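A classical textbook instance of an entropy-minimizing proposal is exponential tilting. The sketch below only illustrates the kind of object involved and does not reproduce the cited theorem. Under the assumptions that the reference is N(0, 1) and that the convex class is the set of distributions with mean `a`, the distribution closest to the reference in relative entropy is N(a, 1); the code uses it as an importance proposal for the Gaussian tail probability P(X > a). The function name and parameter values are hypothetical.

```python
import math
import numpy as np

def tail_probability_is(a=4.0, n_samples=100_000, seed=0):
    """Estimate P(X > a), X ~ N(0, 1), by sampling from the tilted proposal N(a, 1)."""
    rng = np.random.default_rng(seed)
    y = a + rng.standard_normal(n_samples)          # samples from the proposal N(a, 1)
    # Likelihood ratio dN(0,1)/dN(a,1) evaluated at y.
    weights = np.exp(-a * y + 0.5 * a * a)
    return float(np.mean(weights * (y > a)))

exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))       # closed-form Gaussian tail at a = 4
print(tail_probability_is(), exact)
```

A naive Monte Carlo estimate with the same budget would typically see only a handful of samples beyond the threshold, whereas about half of the tilted samples fall in the region of interest.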
In [monmarche:hal02916073], we introduce and study a new family of simulable velocity jump Markov processes (PDMPs) with prescribed (up to a normalization constant) stationary distribution, involving neither time-step error nor Metropolis correction, that can converge towards kinetic Langevin diffusions.
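To make the notion of a velocity jump process concrete, the sketch below simulates the classical one-dimensional zig-zag process, a well-known PDMP that is not the new family introduced in the cited work. It assumes a standard Gaussian target, for which the jump times can be sampled exactly by inverting the integrated rate, so there is no time-step error and no Metropolis step. The function name and the time horizon are illustrative.

```python
import numpy as np

def zigzag_second_moment(total_time=20_000.0, seed=0):
    """Time-average of x^2 along a 1D zig-zag process targeting N(0, 1)."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 1.0
    elapsed, integral = 0.0, 0.0
    while elapsed < total_time:
        a = v * x
        e = rng.exponential()
        # Exact inversion of the integrated jump rate, whose density is max(0, a + t).
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau
        integral += (x_new**3 - x**3) / (3.0 * v)   # integral of x(t)^2 over the segment
        elapsed += tau
        x, v = x_new, -v                            # flip the velocity at the jump
    return integral / elapsed

print(zigzag_second_moment())
```

The time average should approach the variance of the target, here 1, as the horizon grows.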
6.2 Objective 2 – New topics in particle filtering and semiparametric statistics
Participants: Audrey Cuillery, François Le Gland, Valérie Monbet.
Model selection of climate and weather prediction models is a critical issue which can be tackled by constructing so-called analog forecasts, which are cheap stochastic generators of the output of different models constructed using historical simulation data. The latter can be combined with state-of-the-art Monte Carlo filtering procedures (e.g. Gaussian-based Ensemble Kalman filters) to efficiently compare the likelihood of the prediction output of the different models considered, evaluated on some real in situ observations. These applications have been studied in [ruiz:hal03685531], [ruiz:insu03868833].
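The idea of an analog forecast can be sketched in a few lines. The code below is a toy version under stated assumptions and not the implementation used in the cited studies: the catalog is a scalar time series, analogs are the nearest catalog states in absolute distance, and the forecast is the successor of an analog chosen uniformly at random. The function name `analog_forecast` and the AR(1) catalog are hypothetical.

```python
import numpy as np

def analog_forecast(catalog, state, n_analogs=50, rng=None):
    """Sample a one-step forecast from the successors of the nearest catalog states."""
    rng = np.random.default_rng() if rng is None else rng
    past, successors = catalog[:-1], catalog[1:]
    nearest = np.argsort(np.abs(past - state))[:n_analogs]   # indices of the analogs
    return successors[rng.choice(nearest)]

# Hypothetical catalog: a scalar AR(1) series standing in for historical simulation data.
rng = np.random.default_rng(0)
catalog = np.zeros(5000)
for t in range(1, catalog.size):
    catalog[t] = 0.9 * catalog[t - 1] + 0.5 * rng.standard_normal()

draws = np.array([analog_forecast(catalog, 1.0, n_analogs=200, rng=rng)
                  for _ in range(1000)])
print(draws.mean())
```

Since the catalog was generated with coefficient 0.9, forecasts issued from the state 1.0 should average close to 0.9, without the forecaster ever being told the model.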
The above results have motivated the development of an original semiparametric inference methodology able to construct stochastic weather models/generators, the nonparametric part relying on a “catalog of analogs” consisting of past data (e.g. a time series). In [chau:hal03616079], a hidden latent Markovian model with parametric noise and nonparametric drift is inferred from a historical catalog of data (a time series) using a stochastic Expectation-Maximisation/Estimation iterative scheme (with iteration index $k$). In the latter, the smoothed (conditional on data) distribution of the step $k$ hidden model is simulated using advanced pathwise sequential Monte Carlo particle filters (Conditional Particle Filters with Backward simulation). The step $k+1$ model is then estimated using both parametric maximum likelihood and nonparametric / machine learning tools.
6.3 Objective 3 – Semiparametric and applied statistics
Participants: Valérie Monbet, Cédric Herzet, Thu Le Tran, Saïd Obakrim.
Motivated by applications to weather/climate data, various original estimation methods have been proposed, for instance a new Expectation-Maximisation method for generalized ridge regression in [monbet:hal03825411]. Other works have compared various statistical methods optimally chosen and adjusted to time series in meteorological contexts: regression ([obakrim:hal03825413]) for the wave height/wind relation; Gaussian mixtures for calibration of ensemble forecasts ([jouan:hal03619364]); regression and Deep/Machine Learning ([michel:hal03825410], [monbet:hal03825412]) for the downscaling of sea states. In [koutroulis:hal03378232], stochastic weather generators developed and studied by the team are used for the design and reliability evaluation of desalination systems.
In [bleza:hal03614274], prediction of allergic pollen risk from meteorological data and assimilation is studied.
6.4 Objective 4 – Model Reduction and Sparsity
Participants: Patrick Héas, Cédric Herzet, Théo Guyard.
In the context of model reduction, an issue is to find fast algorithms to project onto low-dimensional, sparse models. [heas:hal03468966] studies the linear approximation of high-dimensional dynamical systems using low-rank dynamic mode decomposition. Searching for this approximation in a data-driven approach is formalized as attempting to solve a low-rank constrained optimization problem. This problem is non-convex, and state-of-the-art algorithms are all suboptimal. This paper shows that there exists a closed-form solution, which is computed in polynomial time, and characterizes the $\ell^2$-norm of the optimal approximation error.
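A closed form of this type can be sketched as follows. The construction below is the classical reduced-rank regression solution of the rank-constrained least-squares problem min ||Y - A X||_F over matrices A of rank at most k. It is given as an illustration under that assumption, and the cited paper should be consulted for the exact statement and error characterization. The snapshot matrices `X` and `Y` and the function name `low_rank_dmd` are hypothetical.

```python
import numpy as np

def low_rank_dmd(X, Y, k):
    """Rank-k minimiser of ||Y - A X||_F via the reduced-rank regression closed form."""
    X_pinv = np.linalg.pinv(X)
    Z = Y @ X_pinv @ X                       # Y projected onto the row space of X
    P, _, _ = np.linalg.svd(Z, full_matrices=False)
    P_k = P[:, :k]                           # leading left singular vectors of Z
    return P_k @ (P_k.T @ Y @ X_pinv)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 40))             # hypothetical snapshots x_1, ..., x_m
Y = rng.standard_normal((6, 40))             # hypothetical successors of the snapshots
A_k = low_rank_dmd(X, Y, k=2)
print(np.linalg.matrix_rank(A_k), np.linalg.norm(Y - A_k @ X))
```

The whole computation costs one pseudo-inverse and one SVD, hence polynomial time; by contrast, truncating the unconstrained least-squares solution to rank k is in general suboptimal for this criterion.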
Another issue occurs when using a prior model to solve reconstruction tasks where one wants to recover some quantity of interest from partial/noisy observations. In many situations, given the inputs of the problem at hand, some parts of the model may be irrelevant to solve the target reconstruction task. Hence, a recent trend (e.g. for “$k$-sparse” models) consists in the “online” simplification of the prior model, “online” meaning here “during the reconstruction process”. This approach (named “screening” in the case of $k$-sparse models) can thus be seen to some extent as a technique for “online model reduction” and aims to achieve better accuracy/complexity trade-offs. We showed that the principle of screening goes well beyond the standard $k$-sparse model. We were the first team to derive a valid screening method applicable to “non-separable” sparse models, see [elvira:hal03400322]; see also [herzet:hal03806099] and applications to $\ell_1$ or LASSO models in [guyard:hal03688032], [guyard:hal03778139], [tran:hal03806044], [tran:hal03805966]. We have also investigated how the mechanics of screening can be extended to more general problems involving non-convex functions: in [guyard:hal03688011], [guyard:hal03784682] we dealt with the identification of zeros in the solution of optimization problems involving the “$\ell_0$” counting function.
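The principle of screening can be sketched on the LASSO with a standard "gap safe" sphere test. This is a textbook rule given for illustration and not one of the team's specific methods. Under the assumed formulation min 0.5 ||y - A x||^2 + lam ||x||_1, any primal iterate yields a dual feasible point and a duality gap, and every column whose correlation with that dual point stays below lam over a ball of radius sqrt(2 * gap) is certified to have a zero coefficient at the optimum. All names and problem sizes are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, n_iter):
    """Proximal gradient iterations for 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x

def gap_safe_screening(A, y, lam, x):
    """Boolean mask of columns certified to have a zero coefficient at the optimum."""
    residual = y - A @ x
    # Rescale the residual to obtain a dual feasible point.
    u = residual * min(1.0, lam / np.max(np.abs(A.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.sum(np.abs(x))
    dual = 0.5 * y @ y - 0.5 * (y - u) @ (y - u)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0))
    return np.abs(A.T @ u) + radius * np.linalg.norm(A, axis=0) < lam

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
lam = 0.5 * np.max(np.abs(A.T @ y))
mask = gap_safe_screening(A, y, lam, ista(A, y, lam, n_iter=500))
print(mask.sum(), "columns screened out of", mask.size)
```

The test is safe for any iterate `x`; a more accurate iterate only shrinks the radius and lets more columns be discarded, which is why screening is typically interleaved with the solver iterations.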
Finally, in [courcouxcaro:hal03780626], we study the design of sensor arrays for the localization of a few acoustic sources, using sparse approximation to find the source locations.
7 Bilateral contracts and grants with industry
7.1 Bilateral contracts with industry
7.1.1 CIFRE grants

Participants: Valérie Monbet.
CIFRE PhD project of Esso-Ridah Bleza, supervised by Valérie Monbet, on AI for multi-source pollen detection; see [bleza:hal03614274].
7.1.2 Meteorological Satellite Data Processing
Participants: Patrick Héas.
Industrial Partner: EUMETSAT, Darmstadt.
Partner Contact: Regis.Borde@eumetsat.int
The transferred technology concerns an algorithm for the operational and real-time production of vertically resolved 3D atmospheric motion vector fields (AMVs) from measurements of new hyperspectral instruments: the infrared radiosounders on the third-generation Meteosat satellites (MTG), developed by the European Space Agency (ESA), and the Infrared Atmospheric Sounding Interferometer (IASI) on MetOp-A and MetOp-B, developed by the French Space Agency (CNES).
8 Partnerships and cooperations
8.1 International initiatives
8.1.1 Participation in other International Programs
Participants: Valérie Monbet.

Title:
ECOS ARGENTINE
 Funding program through the ECOS Sud - MINCyT initiative. The program involves a collaboration with the French-Argentinian Climate Institute.
8.2 National initiatives
8.2.1 ANR

ANR MELODY (2020-2024)
Participants: Cédric Herzet.
The MELODY project aims to bridge the physical model‐driven paradigm underlying ocean / atmosphere science and AI paradigms with a view to developing geophysically‐sound learning‐based and data‐driven representations of geophysical flows accounting for their key features (e.g., chaos, extremes, high‐dimensionality).
The partners involved in the project were: IMT Atlantique (PI: Ronan Fablet), Inria Rennes, Inria Grenoble, Laboratoire d'Océanographie Physique et Spatiale, Institut des géosciences et de l'environnement, Institut Pierre-Simon Laplace.

ANR SINEQ (2021-2025)
Participants: Mathias Rousset, Frédéric Cérou.
Simulating nonequilibrium stochastic dynamics. The goal of the SINEQ project is, within a mathematical perspective, to extend various variance reduction techniques used in the Monte Carlo computation of equilibrium properties of statistical physics models.
The partners involved in the project are: CERMICS (PI: G. Stoltz), CEREMADE and Inria Rennes.
9 Dissemination
Participants: Mathias Rousset, Frédéric Cérou, Cédric Herzet, Patrick Héas, François Le Gland, Valérie Monbet.
9.1 PhD and HDR defenses
 Cédric Herzet defended his HDR (Habilitation à Diriger des Recherches) on November 24, 2022. Manuscript: [herzet:tel03888741].
 Audrey Cuillery defended her PhD in 2022. Manuscript: [cuillery:tel03850485] (supervision: F. Le Gland).
 Saïd Obakrim defended his PhD in 2022 (supervision: V. Monbet).
9.2 Promoting scientific activities
9.2.1 Scientific events: organisation
 F. Cérou is organizing the weekly