# Keywords

- A6. Modeling, simulation and control
- A6.1. Methods in mathematical modeling
- A6.1.1. Continuous Modeling (PDE, ODE)
- A6.1.2. Stochastic Modeling
- A6.1.4. Multiscale modeling
- A6.2. Scientific computing, Numerical Analysis & Optimization
- A6.2.1. Numerical analysis of PDE and ODE
- A6.2.2. Numerical probability
- A6.2.3. Probabilistic methods
- A6.2.4. Statistical methods
- A6.2.5. Numerical Linear Algebra
- A6.3. Computation-data interaction
- A6.3.1. Inverse problems
- A6.3.2. Data assimilation
- A6.3.4. Model reduction
- A6.3.5. Uncertainty Quantification
- A6.5. Mathematical modeling for physical sciences
- A6.5.2. Fluid mechanics
- A6.5.5. Chemistry

- B1. Life sciences
- B2. Health
- B3. Environment and planet
- B3.2. Climate and meteorology
- B4. Energy
- B4.2. Nuclear Energy Production
- B4.2.1. Fission

# 1 Team members, visitors, external collaborators

## Research Scientists

- Mathias Rousset [Team leader, Inria, Researcher, HDR]
- Frédéric Cérou [Inria, Researcher]
- Cédric Herzet [Inria, Researcher]
- Patrick Héas [Inria, Researcher]
- François Le Gland [Inria, Senior Researcher]

## Faculty Member

- Valérie Monbet [Univ de Rennes I, Professor, HDR]

## Post-Doctoral Fellow

- Sofiane Martel [Inria, from Mar 2020]

## PhD Students

- Francois Ernoult [Univ de Rennes I, from Oct 2020]
- Said Obakrim [Univ de Rennes I]
- Thu Le Tran [Univ de Rennes I, from Oct 2020]

## Interns and Apprentices

- Francois Ernoult [Inria, from Apr 2020 until Jun 2020]
- Theo Guyard [Inria, from May 2020 until Jul 2020]
- Anne Kieffer [Inria, from Jun 2020 until Sep 2020]
- Thu Le Tran [Inria, from Jun 2020 until Aug 2020]

## Administrative Assistant

- Fabienne Cuyollaa [Inria]

## External Collaborator

- Arnaud Guyader [Sorbonne Université, HDR]

# 2 Overall objectives

As the steady growth of computational power encourages scientists to simulate the most detailed features of reality, from complex molecular systems to climate or weather forecasting, the computer simulation of physical systems is becoming reliant on highly complex stochastic dynamical models and very abundant observational data. The complexity of such models and of the associated observational data stems from intrinsic physical features, which include high dimensionality as well as intricate temporal and spatial multi-scale structures. It also results in much less control over simulation uncertainty.

Within this highly challenging context, SIMSMART positions itself as a mathematical and computational probability and statistics research team, dedicated to Monte Carlo simulation methods. Such methods include in particular particle Monte Carlo methods for rare event simulation, data assimilation and model reduction, with application to stochastic dynamical physical models. The main objective of SIMSMART is to disrupt this now classical field by creating deeper mathematical frameworks adapted to the management of contemporary highly sophisticated physical models.

# 3 Research program

Introduction. Computer simulation of physical systems is becoming increasingly reliant on highly complex models, as the steady growth of computational power encourages scientists to simulate the most detailed features of reality, from complex molecular systems to climate and weather forecasting.

Yet, when modeling physical reality, bottom-up approaches are stumbling over intrinsic difficulties. First, the timescale separation between the fastest simulated microscopic features and the macroscopic effective slow behavior becomes huge, implying that the fully detailed and direct long-time simulation of many interesting systems (e.g. large molecular systems) is out of reasonable computational reach. Second, the chaotic dynamical behaviors of the systems at stake, coupled with such multi-scale structures, exacerbate the intricate uncertainty of outcomes, which become highly dependent on intrinsic chaos, uncontrolled modeling, as well as numerical discretization. Finally, the massive increase of observational data poses new challenges to classical data assimilation, such as dealing with high dimensional observations and/or extremely long time series of observations.

SIMSMART Identity. Within this highly challenging applicative context, SIMSMART positions itself as a computational probability and statistics research team, with a mathematical perspective. Our approach is based on the use of stochastic modeling of complex physical systems, and on the use of Monte Carlo simulation methods, with a strong emphasis on dynamical models. The two main numerical tasks of interest to SIMSMART are the following: (i) simulating with pseudo-random number generators - a.k.a. sampling - dynamical models of random physical systems, (ii) sampling such random physical dynamical models given some real observations - a.k.a. Bayesian data assimilation. SIMSMART aims at providing an appropriate mathematical level of abstraction and generalization to a wide variety of Monte Carlo simulation algorithms in order to propose non-superficial answers to both methodological and mathematical challenges. The issues to be resolved include computational complexity reduction, statistical variance reduction, and uncertainty quantification.

SIMSMART Objectives. The main objective of SIMSMART is to disrupt this now classical field of particle Monte Carlo simulation by creating deeper mathematical frameworks adapted to the challenging world of complex (e.g. high dimensional and/or multi-scale), and massively observed systems, as described in the beginning of this introduction.

To be more specific, we will classify SIMSMART objectives using the following four intertwined topics:

- Objective 1: Rare events and random simulation.
- Objective 2: High dimensional and advanced particle filtering.
- Objective 3: Non-parametric approaches.
- Objective 4: Model reduction and sparsity.

Rare events (Objective 1) are ubiquitous in random simulation, either to accelerate the occurrence of physically relevant random slow phenomena, or to estimate the effect of uncertain variables. Objective 1 will be mainly concerned with particle methods where splitting is used to enforce the occurrence of rare events.
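To make the splitting idea concrete, here is a minimal sketch of an adaptive splitting ("last particle") estimator on a toy static problem. The choice of an exponential random variable, the function name `last_particle_estimate` and all parameter values are assumptions made for illustration only; they are not taken from the team's work, where splitting is applied to dynamical models and the resampling step uses a Markov kernel rather than an exact conditional draw.

```python
import math
import random


def last_particle_estimate(q, n_particles, rng):
    """Estimate P(X > q) for X ~ Exp(1) with a 'last particle' splitting scheme."""
    particles = [rng.expovariate(1.0) for _ in range(n_particles)]
    n_killed = 0
    while True:
        # The lowest particle defines the current adaptive level.
        i_min = min(range(n_particles), key=particles.__getitem__)
        level = particles[i_min]
        if level > q:
            break
        # Kill the lowest particle and resample it conditionally on X > level.
        # For Exp(1) this conditional law is exactly level + Exp(1) (memorylessness);
        # realistic models would branch from a survivor and apply a Markov kernel.
        particles[i_min] = level + rng.expovariate(1.0)
        n_killed += 1
    # Each kill multiplies the estimated surviving mass by (1 - 1/N).
    return (1.0 - 1.0 / n_particles) ** n_killed


rng = random.Random(0)
q = 10.0
estimate = last_particle_estimate(q, n_particles=200, rng=rng)
print(f"splitting estimate = {estimate:.3e}, exact value = {math.exp(-q):.3e}")
```

A crude Monte Carlo estimate of the same probability would need on the order of 1/p samples to see a single success, whereas here the cost grows only like N times log(1/p).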

The problem of high dimensional observations, the main topic in Objective 2, is a known bottleneck in filtering, especially in non-linear particle filtering, where linear data assimilation methods remain the state-of-the-art approaches.

The increasing size of recorded observational data and the increasing complexity of models also suggest devoting more effort to non-parametric data assimilation methods, the main issue of Objective 3.

In some contexts, for instance when one wants to compare solutions of a complex (e.g. high dimensional) dynamical system depending on uncertain parameters, the construction of relevant reduced-order models becomes a key topic. Model reduction aims at proposing efficient algorithmic procedures for the resolution (to some reasonable accuracy) of high-dimensional systems of parametric equations. This overall objective entails many different subtasks: 1) the identification of low-dimensional surrogates of the target "solution" manifold, 2) the design of efficient resolution methodologies exploiting low-dimensional surrogates, 3) the theoretical validation of the accuracy achievable by the proposed procedures. This is the content of Objective 4.

With respect to volume of research activity, Objective 1, Objective 4 and the sum (Objective 2+Objective 3) are comparable.

Some new challenges in the simulation and data assimilation of random physical dynamical systems have become prominent in the last decade. A first issue (i) consists in the intertwined problems of simulating over large, macroscopic random times, and simulating rare events (see Objective 1). The link between both aspects stems from the fact that many effective, large-time dynamics can be approximated by sequences of rare events. A second, obvious, issue (ii) consists in managing very abundant observational data (see Objectives 2 and 3). A third issue (iii) consists in quantifying uncertainty/sensitivity/variance of outcomes with respect to models or noise. A fourth issue (iv) consists in managing high dimensionality, either when dealing with complex prior physical models, or with very large data sets. The related increase of complexity also requires, as a fifth issue (v), the construction of reduced models to speed up comparative simulations (see Objective 4). In a context of very abundant data, this may be replaced by a sixth issue (vi) where complexity constraints on modeling are replaced by the use of non-parametric statistical inference (see Objective 3).

Hindsight suggests that all the latter challenges are related. Indeed, the contemporary digital condition, made of a massive increase in computational power and in available data, is resulting in a demand for more complex and uncertain models, for more extreme regimes, and for using inductive approaches relying on abundant data. In particular, uncertainty quantification (item (iii)) and high dimensionality (item (iv)) are in fact present in all four Objectives considered in SIMSMART.

# 4 Application domains

## 4.1 Domain 1 – Computational Physics

The development of large-scale computing facilities has enabled simulations of systems at the atomistic scale on a daily basis. The aim of these simulations is to bridge the time and space scales between the macroscopic properties of matter and the stochastic atomistic description. Typically, such simulations are based on the ordinary differential equations of classical mechanics supplemented with a random perturbation modeling temperature, or collisions between particles.

Let us give a few examples. In bio-chemistry, such simulations are key to predict the influence of a ligand on the behavior of a protein, with applications to drug design. The computer can thus be used as a numerical microscope in order to access data that would be very difficult and costly to obtain experimentally. In that case, a rare event (Objective 1) is given by a macroscopic system change such as a conformation change of the protein. In nuclear safety, such simulations are key to predict the transport of neutrons in nuclear plants, with application to assessing aging of concrete. In that case, a rare event is given by a high energy neutron impacting concrete containment structures.

A typical model used in molecular dynamics simulation of open systems at given temperature is a stochastic differential equation of Langevin type. The large time behavior of such systems is typically characterized by a hopping dynamics between 'metastable' configurations, usually defined by local minima of a potential energy. In order to bridge the time and space scales between the atomistic level and the macroscopic level, specific algorithms enforcing the realization of rare events have been developed. For instance, splitting particle methods (Objective 1) have become popular within the computational physics community only within the last few years, partially as a consequence of interactions between physicists and Inria mathematicians in ASPI (parent of SIMSMART) and MATHERIALS project-teams.
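The hopping behavior described above can be observed on a one-dimensional toy model. The sketch below is only an illustration under stated assumptions: it uses the overdamped form of Langevin dynamics, a double-well potential V(x) = (x² − 1)², an Euler-Maruyama discretization, and arbitrary values for the inverse temperature `beta` and time step `dt`. None of these choices come from the report.

```python
import math
import random


def grad_potential(x):
    """Gradient of the double-well potential V(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)


def count_hops(beta, dt, n_steps, rng):
    """Euler-Maruyama scheme for dX = -V'(X) dt + sqrt(2/beta) dW.

    Returns the number of transitions between the wells around x = -1 and x = +1.
    """
    noise_scale = math.sqrt(2.0 * dt / beta)
    x = -1.0
    well = -1  # metastable well visited last
    hops = 0
    for _ in range(n_steps):
        x += -grad_potential(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        # Hysteresis thresholds avoid counting recrossings near the barrier top.
        if well == -1 and x > 0.8:
            well, hops = 1, hops + 1
        elif well == 1 and x < -0.8:
            well, hops = -1, hops + 1
    return hops


rng = random.Random(0)
print("hops at beta = 3:", count_hops(beta=3.0, dt=0.01, n_steps=200_000, rng=rng))
print("hops at beta = 50:", count_hops(beta=50.0, dt=0.01, n_steps=20_000, rng=rng))
```

Lowering the temperature (larger `beta`) makes transitions exponentially rarer, which is the regime where direct simulation becomes impractical and rare event algorithms are needed.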

SIMSMART also focuses on various models described by partial differential equations (reaction-diffusion, conservation laws), with unknown parameters modeled by random variables.

## 4.2 Domain 2 – Meteorology

The traditional trend in data assimilation in geophysical sciences (climate, meteorology) is to use as prior information some very complex deterministic models formulated in terms of fluid dynamics and reflecting as much as possible the underlying physical phenomenon (see e.g. https://

The main issue is therefore to perform such Bayesian estimations with very expensive infinite dimensional prior models, and observations in large dimension. The use of some linear assumption in prior models (Kalman filtering) to filter non-linear hydrodynamical phenomena is the state-of-the-art approach, and a current field of research, but is plagued with intractable instabilities.

This context motivates two research trends: (i) the introduction of non-parametric, model-free prior dynamics constructed from a large amount of past, recorded real weather data; and (ii) the development of appropriate non-linear filtering approaches (Objective 2 and Objective 3).

SIMSMART will also test its new methods on multi-source data collected in the North Atlantic, paying particular attention to coastal areas (e.g. within the inter-Labex SEACS).

## 4.3 Other Applicative Domains

SIMSMART focuses on various applications including:

- Tracking and hidden Markov models.
- Robustness and certification in Machine Learning.

# 5 New results

## 5.1 Objective 1 – Rare events and Monte Carlo simulation

#### Monte-Carlo simulation

Participants: Frédéric Cérou, Patrick Héas, Mathias Rousset.

In 7, we consider Langevin processes, which are widely used in molecular simulation to compute reaction kinetics using rare event algorithms. We prove convergence in distribution in the overdamped asymptotics. The proof relies on the classical perturbed test function (or corrector) method, which is used both to show tightness in path space, and to identify the extracted limit with a martingale problem. The result holds assuming the continuity of the gradient of the potential energy, and a mild control of the initial kinetic energy.
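For readers unfamiliar with the overdamped regime, the statement can be illustrated with a commonly used normalization. The unit mass, the friction parameter $\gamma$, the inverse temperature $\beta$ and the time rescaling below are assumptions chosen for illustration; the precise scaling and hypotheses are those of 7 and may differ. The Langevin process $(Q_t, P_t)$ with potential energy $V$ reads

$$
\begin{aligned}
dQ_t &= P_t \, dt, \\
dP_t &= -\nabla V(Q_t) \, dt - \gamma P_t \, dt + \sqrt{2 \gamma \beta^{-1}} \, dW_t ,
\end{aligned}
$$

and, as $\gamma \to \infty$, the time-rescaled position $X^{\gamma}_t = Q_{\gamma t}$ is expected to converge in distribution to the solution of the overdamped equation

$$
dX_t = -\nabla V(X_t) \, dt + \sqrt{2 \beta^{-1}} \, dW_t .
$$

Heuristically, for large $\gamma$ the momentum equation gives $\gamma P_t \, dt \approx -\nabla V(Q_t) \, dt + \sqrt{2 \gamma \beta^{-1}} \, dW_t$, and substituting into $dQ_t = P_t \, dt$ followed by the change of time $t \mapsto \gamma t$ yields the limiting equation.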

In 5, we study the estimation of rare event probabilities using importance sampling (IS), where an optimal proposal distribution is computed with the cross-entropy (CE) method. Although IS optimised with the CE method leads to an efficient reduction of the estimator variance, this approach remains unaffordable for problems where the repeated evaluation of the score function is too computationally intensive. This is often the case for score functions related to the solution of a partial differential equation (PDE) with random inputs. This work proposes to alleviate computation by adapting a score function approximation along the CE optimisation process.
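As a point of reference, the plain CE method (without any reduced model) can be sketched on a toy problem. Everything below is an illustrative assumption: the target is P(X ≥ γ) for a standard Gaussian X, the score function is the identity, the proposal family is N(μ, 1), and the function name and parameter values are hypothetical. The contribution of 5 concerns the case where the score is expensive and is replaced by adaptive approximations, which this sketch does not attempt.

```python
import math
import random


def ce_importance_sampling(gamma, n_samples, rho, rng, max_iter=50):
    """Estimate P(X >= gamma), X ~ N(0, 1), with proposals N(mu, 1) tuned by CE."""
    mu = 0.0
    for _ in range(max_iter):
        xs = [rng.gauss(mu, 1.0) for _ in range(n_samples)]
        # Intermediate level: empirical (1 - rho)-quantile, capped at the target level.
        level = min(gamma, sorted(xs)[int((1.0 - rho) * n_samples)])
        # Likelihood ratio N(0, 1) / N(mu, 1), evaluated on the elite samples only.
        elite = [(math.exp(-mu * x + 0.5 * mu * mu), x) for x in xs if x >= level]
        # CE update of the Gaussian mean: weighted average of the elite samples.
        mu = sum(w * x for w, x in elite) / sum(w for w, _ in elite)
        if level >= gamma:
            break
    # Final importance sampling estimate with the tuned proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n_samples)]
    weights = [math.exp(-mu * x + 0.5 * mu * mu) for x in xs if x >= gamma]
    return sum(weights) / n_samples, mu


rng = random.Random(0)
p_hat, mu = ce_importance_sampling(gamma=4.0, n_samples=2000, rho=0.1, rng=rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
print(f"CE-IS estimate = {p_hat:.3e}, exact = {exact:.3e}, tuned mean = {mu:.2f}")
```

Each CE iteration requires `n_samples` evaluations of the score function, which is what becomes prohibitive when the score involves a PDE solve.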

In 14, we introduce and study a new family of velocity jump Markov processes directly amenable to exact simulation (leading to an MCMC-like algorithm) with the following two properties: i) trajectories converge in law, when a time-step parameter vanishes, towards a given Langevin or Hamiltonian dynamics; ii) the stationary distribution of the process is always exactly given by the product of a Gaussian (for velocities) and any target density whose log-gradient is pointwise computable, together with some additional explicit appropriate upper bound.

## 5.2 Objective 2 – New topics in particle filtering

#### Multitarget tracking in track–before–detect context

Participants: Audrey Cuillery, François Le Gland.

Several algorithms have been proposed in the literature, such as the adaptive auxiliary particle filter of Úbeda–Medina, Garcia–Fernández and Grajal (2017) or the interacting population–based MCMC particle filter of Bocquel, Driessen and Bagchi (2012), that all try to break the artificial association of targets that is inherent in using a multitarget state–vector. Our current work is to further investigate this idea, that we see as implementing a crossover step in a particle filter.

#### Stability of Feynman-Kac semi-groups

Participants: Mathias Rousset.

Feynman-Kac semigroups are the key mathematical structure underpinning non-linear particle filtering. Their long time behavior provides important information to assess the stability of the filter. In 4, we propose a simple and natural extension of the stability of Markov chains for these non-linear evolutions. Like other classical ergodicity results, it relies on two assumptions: a Lyapunov condition that induces some compactness, and a minorization condition ensuring some mixing. We show that these conditions are satisfied in a variety of situations.
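The link with particle filtering can be illustrated by a bootstrap particle filter, whose mutation and weighting steps are a particle approximation of a Feynman-Kac flow. The sketch below is a generic textbook construction under assumed settings (a scalar linear Gaussian state-space model with hypothetical parameters `a`, `sigma_x`, `sigma_y`, and multinomial resampling); it is not the setting analysed in 4.

```python
import math
import random


def simulate(n_steps, rng, a=0.9, sigma_x=1.0, sigma_y=0.5):
    """Simulate x_t = a x_{t-1} + sigma_x eps_t and y_t = x_t + sigma_y eta_t."""
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + sigma_x * rng.gauss(0.0, 1.0)
        states.append(x)
        observations.append(x + sigma_y * rng.gauss(0.0, 1.0))
    return states, observations


def bootstrap_filter(observations, n_particles, rng, a=0.9, sigma_x=1.0, sigma_y=0.5):
    """Return the particle estimates of E[x_t | y_1, ..., y_t]."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Mutation: propagate each particle with the prior Markov kernel.
        particles = [a * x + sigma_x * rng.gauss(0.0, 1.0) for x in particles]
        # Weighting: likelihood of the observation, i.e. the Feynman-Kac potential.
        weights = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2) for x in particles]
        total = sum(weights)
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Selection: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


rng = random.Random(0)
states, observations = simulate(100, rng)
means = bootstrap_filter(observations, n_particles=500, rng=rng)
rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(means, states)) / len(states))
print(f"filter RMSE = {rmse:.3f}")
```

Stability results of the kind discussed above explain why the error of such a filter does not accumulate over time, even though the initial particle distribution is not the true initial law.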

## 5.3 Objective 3 – Semi-parametric statistics

Participants: Valérie Monbet.

The operation of Renewable Energy Sources (RES) systems is highly affected by the continuously changing meteorological conditions, and the design of a RES system has to be robust to the unknown weather conditions that it will encounter during its lifetime. In 1, the use of data-driven Stochastic Weather Generators (SWGENs) is introduced for the optimal design and reliability evaluation of hybrid Photovoltaic / Wind-Generator (PV-W/G) systems providing energy to desalination plants. A SWGEN is proposed, which is based on parametric Markov-Switching Auto-Regressive (MSAR) models and is capable of simulating realistic hourly multivariate time series of solar irradiance, temperature and wind speed at the target installation site.
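The structure of an MSAR model can be sketched in a few lines. The example below is a deliberately simplified, univariate, two-regime version with made-up parameter values; the generator of 1 is multivariate and fitted to site data, and none of the numbers or names below come from that work.

```python
import random


def simulate_msar(n_steps, rng):
    """Simulate a two-regime Markov-switching AR(1) series (illustrative parameters)."""
    stay_prob = [0.95, 0.90]   # probability of remaining in regime 0 / regime 1
    intercept = [0.0, 2.0]
    ar_coeff = [0.8, 0.5]
    noise_std = [0.3, 1.0]
    regime, x = 0, 0.0
    regimes, series = [], []
    for _ in range(n_steps):
        # Hidden regime: first-order Markov chain on {0, 1}.
        if rng.random() > stay_prob[regime]:
            regime = 1 - regime
        # Observed variable: AR(1) dynamics whose parameters depend on the regime.
        x = intercept[regime] + ar_coeff[regime] * x + noise_std[regime] * rng.gauss(0.0, 1.0)
        regimes.append(regime)
        series.append(x)
    return regimes, series


rng = random.Random(0)
regimes, series = simulate_msar(2000, rng)
print("fraction of time in regime 1:", sum(regimes) / len(regimes))
```

In a weather generator, the hidden regime can be thought of as a weather type, with each type carrying its own autoregressive dynamics.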

## 5.4 Objective 4 – Model Reduction

Participants: Patrick Héas, Cédric Herzet.

In the context of model reduction, an issue is to find fast algorithms to project onto low-dimensional, sparse models. Conversely, spreading the information over all coefficients of a representation is a desirable property in many applications such as digital communication or machine learning. This so-called antisparse representation can be obtained by solving a convex program involving an $\ell_\infty$-norm penalty combined with a quadratic discrepancy. In 3, 11, we propose a new methodology ("safe squeezing") to accelerate the computation of antisparse representations.
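In a standard formulation, the convex program mentioned above can be written as follows. The symbols are assumptions introduced here for illustration: $y \in \mathbb{R}^m$ is the observed vector, $A \in \mathbb{R}^{m \times n}$ the dictionary and $\lambda > 0$ a regularization parameter; the exact formulation used in 3, 11 may differ in its normalization.

$$
\min_{x \in \mathbb{R}^n} \; \frac{1}{2} \, \| y - A x \|_2^2 + \lambda \, \| x \|_\infty ,
\qquad \text{where} \quad \| x \|_\infty = \max_{1 \le i \le n} | x_i | .
$$

Because the penalty only charges the largest magnitude, minimizers tend to have many entries saturated at $\pm \| x \|_\infty$, which is the "spreading" behavior opposite to the sparsity promoted by an $\ell_1$ penalty.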

Another avenue of research has been the study of the sparse surrogate in the context of "continuous" dictionaries, where the elementary signals forming the decomposition catalog are functions of some parameters taking their values in some continuous domain. In this context, we contributed to the theoretical characterization of the performance of a well-known algorithmic procedure, namely "orthogonal matching pursuit" (OMP). More specifically, we proposed the first theoretical analysis of the behavior of OMP in the continuous setup, see 2.
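For reference, OMP in its usual finite-dictionary form can be sketched as below. This is an assumption-laden illustration: the dictionary is a small hand-made list of unit-norm atoms, the helper names are hypothetical, and the least-squares step uses plain normal equations. In the continuous setting studied in 2, the atom selection step becomes a maximization over a continuous parameter, which this sketch does not cover.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def omp(atoms, y, n_nonzero):
    """Orthogonal matching pursuit; atoms is a list of unit-norm vectors."""
    residual, support, coeffs = y[:], [], []
    for _ in range(n_nonzero):
        # Selection: atom most correlated with the residual, excluding chosen ones.
        scores = [abs(dot(atom, residual)) for atom in atoms]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(len(atoms)), key=scores.__getitem__))
        # Projection: least squares on the selected atoms (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coeffs = solve_linear(gram, [dot(atoms[i], y) for i in support])
        residual = [
            y_k - sum(c * atoms[j][k] for c, j in zip(coeffs, support))
            for k, y_k in enumerate(y)
        ]
    return support, coeffs


# Toy dictionary: canonical basis of R^4 plus one normalized constant atom.
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]]
y = [3.0, 0.0, -2.0, 0.0]  # equals 3 * atoms[0] - 2 * atoms[2]
print(omp(atoms, y, n_nonzero=2))
```

Exact recovery here means that OMP selects precisely the atoms used to build `y`; the analysis in 2 addresses when this still holds for continuous dictionaries.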

Several contributions to model reduction methodologies have shown that refined prior models on target solutions (based on a set of embedded approximation subspaces) may lead to enhanced approximation performance. In 6, we focus on a particular decoder exploiting such "multi-space" information and evaluating the approximate reduced model as the solution of a constrained optimization problem. To date, no theoretical results have been derived to support the good empirical performance of this decoder. The goal of the present paper is to fill this gap.

Reduced modeling in high-dimensional reproducing kernel Hilbert spaces offers the opportunity to approximate non-linear dynamics efficiently. In 12, we devise an algorithm based on low-rank constrained optimization and kernel-based computation that generalizes a recent approach called "kernel-based dynamic mode decomposition". This new algorithm is characterized by a gain in approximation accuracy, as evidenced by numerical simulations, and in computational complexity.

In 8, 9, we address the problem of approximating the atoms of a parametric dictionary, commonly encountered in the context of sparse representations in "continuous" dictionaries. We focus on the case of translation-invariant dictionaries. We derive necessary and sufficient conditions characterizing the existence of an "interpolating" and "translation-invariant" low-rank approximation.

10 deals with the sensor placement problem for an array designed for source localization. When it involves the identification of a few sources, the compressed sensing framework is known to find directions effectively thanks to sparse approximation. The present contribution intends to provide an answer to the following question: given a set of observations, how should we make the next measurement to minimize (some form of) uncertainty on the localization of the sources?

In 13, we bridge two seemingly unrelated sparse approximation topics: continuous sparse coding and low-rank approximations. We show that for a specific choice of continuous dictionary, linear systems with nuclear-norm regularization have the same solutions as a BLasso problem.

# 6 Bilateral contracts and grants with industry

## 6.1 Bilateral contracts with industry

- Scalian, through the CIFRE PhD project of Gabriel Jouan, dedicated to weather forecast corrections.
- LIFY air, through the CIFRE PhD project of Esso-Ridah Bleza on AI for multi-source pollen detection.
- Cooper Standard, Machine Learning for joints design.

## 6.2 Preliminary collaboration

The European Organization for the Exploitation of Meteorological Satellites (EUMETSAT), based in Darmstadt. The transfer focuses on the estimation of atmospheric 3D winds from future hyperspectral instruments.

# 7 Partnerships and cooperations

## 7.1 National initiatives

### 7.1.1 ANR

ANR BECOSE (2016-2020): Beyond Compressive Sensing: Sparse approximation algorithms for ill-conditioned inverse problems.

Cédric Herzet is part of the BECOSE project. The BECOSE project aims to extend the scope of sparsity techniques much beyond the academic setting of random and well-conditioned dictionaries. In particular, one goal of the project is to step back from the popular L1-convexification of the sparse representation problem and consider more involved nonconvex formulations, both from a methodological and theoretical point of view. The algorithms will be assessed in the context of tomographic Particle Image Velocimetry (PIV), a rapidly growing imaging technique in fluid mechanics that will have strong impact in several industrial sectors including environment, automotive and aeronautical industries.

ANR Melody (2020-2024): Bridging geophysics and MachinE Learning for the modeling, simulation and reconstruction of Ocean DYnamics.

Cédric Herzet is part of the MELODY project. The MELODY project aims to bridge the physical model‐driven paradigm underlying ocean/atmosphere science and AI paradigms with a view to developing geophysically‐sound learning‐based and data‐driven representations of geophysical flows accounting for their key features (e.g., chaos, extremes, high‐dimensionality).

# 8 Dissemination

## 8.1 Promoting scientific activities

### 8.1.1 Scientific events: organisation

Cédric Herzet is part of the organizing committee of the iTwist’20 Workshop.

### 8.1.2 Research administration

Valérie Monbet is:

- Substitute member of the CNU (section 26)
- Appointed member of the board of the research committee of UR1

Frédéric Cérou is organizing the weekly Seminar 'Stochastic Processes' at IRMAR, Univ Rennes.

## 8.2 Teaching - Supervision - Juries

### 8.2.1 Teaching

Cédric Herzet has given:

- INSA Rennes, 5th year of the Mathematical Engineering option, course on sparsity in signal and image processing, 10h of lectures + module coordinator
- Ensai Rennes, international Master "Smart Data", course "Foundations of Smart Sensing", 15h of lectures
- Ensai Rennes, international Master "Smart Data", course "Advanced topics in Smart Sensing", 3h of lectures
- Ensai Rennes, Master 2, course "Penalized regression and model selection", 6h of lectures + 3h of practical sessions + module coordinator
- Ensai Rennes, Master 2, project supervision, 15h

François Le Gland has given

- a 2nd year course on introduction to stochastic differential equations, at INSA (institut national des sciences appliquées) Rennes, within the cursus in mathematical engineering,
- a 3rd year course on Bayesian filtering and particle approximation, at ENSTA (école nationale supérieure de techniques avancées), Palaiseau, within the statistics and control module,
- a 3rd year course on linear and nonlinear filtering, at ENSAI (école nationale de la statistique et de l'analyse de l'information), Ker Lann, within the statistical engineering track,
- and a course on Kalman filtering and hidden Markov models, at université de Rennes 1, within the SISEA (signal, image, systèmes embarqués, automatique) track of the master in electronic engineering and telecommunications.

Mathias Rousset has given a 24h specialized course 'Large Deviations Theory' in Master 2 Fundamental Mathematics Univ Rennes.

### 8.2.2 Supervision

Cédric Herzet has supervised:

- Soufiane Ait Tilat, PhD, co-supervision with Frédéric Champagnat (Onera, Palaiseau)
- Milan Courcoux-Caro, PhD, co-supervision with Charles Vanwynsberghe (ENSTA Bretagne) and Alexandre Baussard (IUT de Troyes),
- Le Tran Thu, PhD, co-supervision with V. Monbet (Université de Rennes 1), Hong Phuong Dang (ENSAI), Madison Giacofci (Université de Rennes 2)
- Théo Guyard, Master 2 thesis, INSA, co-supervision with Clément Elvira (CentraleSupélec)

Mathias Rousset and Frédéric Cérou have supervised:

- François Ernoult, master 2 and then PhD (with Frédéric Cérou).

François Le Gland is supervising one PhD student

- Audrey Cuillery, provisional title: Bayesian tracking from raw data, université de Rennes 1, started in November 2017, expected defense in early 2021.

V. Monbet has supervised

- Le Thu Tran, PhD, Univ Rennes, Diagnosis Learning from Scarce Data with Sparse Representations in Continuous Dictionaries. Funding: 1/2 UR1 + 1/2 ANR AI4SDA
- Gabriel Jouan, PhD, Univ Rennes, Scalian, granted by CIFRE - Scalian. Co-supervision with A. Cuzol (UBS) and G. Monnier (Scalian).
- Esso-Ridah Bleza, PhD, Univ Bretagne Sud, Janasense, granted by CIFRE - LIFY air. Co-supervision: PF Marteau (UBS)
- Said Obakrim, PhD, Univ Rennes, Ifremer (co-supervision: N. Raillard (Ifremer) and P. Ailliot (UBO)). Funding: 1/2 UR1 + 1/2 Ifremer.

### 8.2.3 PhD defended

- Soufiane Ait Tilat, PhD, co-supervision with Frédéric Champagnat (Onera, Palaiseau). Soufiane obtained his PhD on the 20th of December 2020. PhD title: « Détection et localisation de particules dans des images PIV via des approches parcimonieuses à grille ».

### 8.2.4 Juries

V. Monbet has been a member of the following PhD juries:

- (Reviewer) G. Toulemonde (Montpellier), extreme value statistics and meteorology.
- (Reviewer) S. Parey (Saclay), extreme value statistics and meteorology.

# 9 Scientific production

## 9.1 Publications of the year

### International journals

- 1 article. Stochastic weather generator for the design and reliability evaluation of desalination systems with Renewable Energy Sources. Renewable Energy, 158, 2020, 541-553
- 2 article. When does OMP achieve exact recovery with continuous dictionaries? Applied and Computational Harmonic Analysis, 51, March 2021, 39
- 3 article. Safe Squeezing for Antisparse Coding. IEEE Transactions on Signal Processing, 14 May 2020, 3252-3265
- 4 article. More on the long time stability of Feynman-Kac semigroups. Stochastics and Partial Differential Equations: Analysis and Computations, 2020
- 5 article. Adapting Reduced Models in the Cross-Entropy Method. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 2020, 511-538
- 6 article. Performance guarantees for a variational "multi-space" decoder. Advances in Computational Mathematics, 46(10), February 2020, 1-23
- 7 article. A Weak Overdamped Limit Theorem for Langevin Processes. ALEA: Latin American Journal of Probability and Mathematical Statistics, 17(1), 2020, 1-21

### International peer-reviewed conferences

- 8 inproceedings. Interpolating and translation-invariant approximations of parametric dictionaries. 28th European Signal Processing Conference (EUSIPCO 2020), Amsterdam, Netherlands, December 2021
- 9 inproceedings. Translation-invariant interpolation of parametric dictionaries. iTwist 2020 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Nantes, France, December 2020
- 10 inproceedings. Sequential Sensor Placement using Bayesian Compressed Sensing for Source Localization. 28th European Signal Processing Conference (EUSIPCO 2020), Amsterdam, Netherlands, January 2021, 241-245
- 11 inproceedings. Short and squeezed: accelerating the computation of antisparse representations with safe squeezing. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020, 5615-5619
- 12 inproceedings. Generalized Kernel-Based Dynamic Mode Decomposition. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020, 3877-3881

### Conferences without proceedings

- 13 inproceedings. Continuous dictionaries meet low-rank tensor approximations. iTwist 2020 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Nantes, France, June 2020, 1-3

### Reports & preprints

- 14 misc. Exact targeting of Gibbs distributions using velocity-jump processes. August 2020