The general scope of the AIRSEA project-team is to develop mathematical and computational methods for the modeling of oceanic and atmospheric flows.
The mathematical tools used involve both deterministic and statistical approaches. The main research topics cover (a) modeling and coupling, (b) model reduction for sensitivity analysis, coupling and multiscale optimizations, (c) sensitivity analysis, parameter estimation and risk assessment, and (d) algorithms for high performance computing. The range of applications extends from climate modeling to the prediction of extreme events.

Recent events have raised questions regarding the social and economic implications of anthropic alterations of the Earth system, i.e. climate change and the associated risks of increasing extreme events. Ocean and atmosphere, coupled with other components (continent and ice), are the building blocks of the Earth system. A better understanding of the ocean-atmosphere system is a key ingredient for improving prediction of such events. Numerical models are essential tools to understand processes, and to simulate and forecast events at various space and time scales. Geophysical flows generally have a number of characteristics that make them difficult to model. This justifies the development of specifically adapted mathematical methods.

Our scientific objectives are divided into four major points. The first objective focuses on developing advanced mathematical methods for both the ocean and atmosphere, and the coupling of these two components. The second objective is to investigate the derivation and use of model reduction to face problems associated with the numerical cost of our applications. The third objective is directed toward the management of uncertainty in numerical simulations. The last objective deals with efficient numerical algorithms for new computing platforms. As mentioned above, the targeted applications cover oceanic and atmospheric modeling and related extreme events using a hierarchy of models of increasing complexity.

Current numerical oceanic and atmospheric models suffer from a number of well-identified problems. These problems are mainly related to lack of horizontal and vertical resolution, thus requiring the parameterization of unresolved (subgrid scale) processes and control of discretization errors in order to fulfill criteria related to the particular underlying physics of rotating and strongly stratified flows. Oceanic and atmospheric coupled models are increasingly used in a wide range of applications from global to regional scales. Assessment of the reliability of those coupled models is an emerging topic as the spread among the solutions of existing models (e.g., for climate change predictions) has not been reduced with the new generation models when compared to the older ones.

Advanced methods for modeling 3D rotating and stratified flows
The continuous increase of computational power and the resulting finer grid resolutions have triggered renewed interest in numerical methods and their relation to physical processes. Going beyond present knowledge requires a better understanding of numerical dispersion/dissipation ranges and their connection to model fine scales. Removing the leading order truncation error of numerical schemes is thus an active topic of research and each mathematical tool has to adapt to the characteristics of three dimensional stratified and rotating flows. Studying the link between discretization errors and subgrid scale parameterizations is also arguably one of the main challenges.

Complexity of the geometry, boundary layers, strong stratification and lack of resolution are the main sources of discretization errors in the numerical simulation of geophysical flows. This emphasizes the importance of the definition of the computational grids (and coordinate systems) in both horizontal and vertical directions, and the necessity of truly multiresolution approaches. At the same time, the role of the small scale dynamics on large scale circulation has to be taken into account through parameterizations. Such parameterizations may be of deterministic as well as stochastic nature and both approaches are taken by the AIRSEA team. The design of numerical schemes consistent with the parameterizations is also arguably one of the main challenges for the coming years. This work is complementary and linked to that on parameter estimation described in 3.3.

Ocean Atmosphere interactions and formulation of coupled models
State-of-the-art climate models (CMs) are complex systems under continuous development. A fundamental aspect of climate modeling is the representation of air-sea interactions. This covers a large range of issues: parameterizations of atmospheric and oceanic boundary layers, estimation of air-sea fluxes, time-space numerical schemes, non-conforming grids, coupling algorithms, etc. Many developments related to these different aspects were performed over the last 10-15 years, but were in general conducted independently of each other.

The aim of our work is to revisit and enrich several aspects of the representation of air-sea interactions in CMs, paying special attention to their overall consistency with appropriate mathematical tools. We intend to work consistently on the physics and numerics. Using the theoretical framework of global-in-time Schwarz methods, our aim is to analyze the mathematical formulation of the parameterizations in a coupling perspective. From this study, we expect improved predictability in coupled models (this aspect will be studied using techniques described in 3.3). Complementary work on space-time nonconformities and acceleration of convergence of Schwarz-like iterative methods (see 8.1.2) is also conducted.
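As a minimal illustration of iterative coupling of the Schwarz type, the sketch below runs a relaxed Dirichlet-Neumann iteration on a steady 1D diffusion problem made of two media with different diffusivities. It is a toy stand-in and not the global-in-time Schwarz algorithm studied by the team; the function name, the diffusivities `nu1` and `nu2`, the boundary values and the relaxation parameter `theta` are all hypothetical choices made for the example.

```python
def dirichlet_neumann_coupling(nu1, nu2, a, b, theta, n_iter):
    """Couple -nu1*u1'' = 0 on [-1, 0] with -nu2*u2'' = 0 on [0, 1].

    Boundary values: u1(-1) = a and u2(1) = b. At the interface x = 0 we
    want continuity of the solution and of the diffusive flux nu*u'.
    Returns the list of successive interface values.
    """
    lam = 0.0  # initial guess for the interface value (arbitrary)
    history = [lam]
    for _ in range(n_iter):
        # Subdomain 1 with Dirichlet data lam at x = 0: the solution is
        # linear, so its flux at the interface is nu1 * (lam - a).
        flux = nu1 * (lam - a)
        # Subdomain 2 with Neumann data nu2*u2'(0) = flux and u2(1) = b:
        # u2 is linear with slope flux/nu2, hence u2(0) = b - flux/nu2.
        u2_interface = b - flux / nu2
        # Relaxed update of the interface value.
        lam = theta * u2_interface + (1.0 - theta) * lam
        history.append(lam)
    return history


nu1, nu2, a, b = 1.0, 10.0, 1.0, 0.0
# Interface value of the coupled problem, obtained by solving it directly.
exact = (b + (nu1 / nu2) * a) / (1.0 + nu1 / nu2)
history = dirichlet_neumann_coupling(nu1, nu2, a, b, theta=0.5, n_iter=30)
errors = [abs(lam - exact) for lam in history]
print(errors[0], errors[-1])
```

In this toy problem the error is multiplied by |1 - theta*(1 + nu1/nu2)| at each iteration, which gives a simple picture of how physical parameters (here the diffusivity ratio) control the convergence of the coupling iterations.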

The high computational cost of the applications is a common and major concern to have in mind when deriving new methodological approaches. This cost increases dramatically with the use of sensitivity analysis or parameter estimation methods, and more generally with methods that require a potentially large number of model integrations.

A dimension reduction, using either stochastic or deterministic methods, is a way to reduce significantly the number of degrees of freedom, and therefore the calculation time, of a numerical model.

Model reduction
Reduction methods can be deterministic (proper orthogonal decomposition, other reduced bases) or stochastic (polynomial chaos, Gaussian processes, kriging), and both fields of research are very active. Choosing one method over another strongly depends on the targeted application, which can be as varied as real-time computation, sensitivity analysis (see e.g., section 8.4) or optimisation for parameter estimation (see below).

Our goals are multiple, but they share a common need for certified error bounds on the output. Our team has a 4-year history of working on certified reduction methods and has a unique positioning at the interface between deterministic and stochastic approaches. Thus, it seems interesting to conduct a thorough comparison of the two alternatives in the context of sensitivity analysis. Efforts will also be directed toward the development of efficient greedy algorithms for the reduction, and the derivation of goal-oriented sharp error bounds for non linear models and/or non linear outputs of interest. This will be complementary to our work on the deterministic reduction of parametrized viscous Burgers and Shallow Water equations where the objective is to obtain sharp error bounds to provide confidence intervals for the estimation of sensitivity indices.

Reduced models for coupling applications
Global and regional high-resolution oceanic models are either coupled to an atmospheric model or forced at the air-sea interface by fluxes computed empirically, preventing proper physical feedback between the two media. Thanks to high-resolution observational studies, the existence of air-sea interactions at oceanic mesoscales (i.e., at

Multiphysics coupling often requires iterative methods to obtain a mathematically correct numerical solution. To mitigate the cost of the iterations, we will investigate the possibility of using reduced-order models for the iterative process. We will consider different ways of deriving a reduced model: coarsening of the resolution, degradation of the physics and/or numerical schemes, or simplification of the governing equations. At a mathematical level, we will strive to study the well-posedness and the convergence properties when reduced models are used. Indeed, running an atmospheric model at the same resolution as the ocean model is generally too expensive to be manageable, even for moderate resolution applications. To account for important fine-scale interactions in the computation of the air-sea boundary condition, the objective is to derive a simplified boundary layer model that is able to represent important 3D turbulent features in the marine atmospheric boundary layer.

Reduced models for multiscale optimization
The field of multigrid methods for optimisation has seen tremendous development over the past few decades. However, it has not been applied to oceanic and atmospheric problems apart from some crude (non-converging) approximations or applications to simplified and low dimensional models. This is mainly due to the high complexity of such models and to the difficulty in handling several grids at the same time. Moreover, due to complex boundaries and physical phenomena, the grid interactions and transfer operators are not trivial to define.

Multigrid solvers (or multigrid preconditioners) are efficient methods for the solution of variational data assimilation problems. We would like to take advantage of these methods to tackle the optimization problem in high dimensional space. A high dimensional control space is obtained when dealing with parameter field estimation, or with control of the full 4D (space-time) trajectory. Controlling the full trajectory is important since it enables us to take into account model errors. In that case, multigrid methods can be used to solve the large scales of the problem at a lower cost, this being potentially coupled with a scale decomposition of the variables themselves.

There are many sources of uncertainties in numerical models. They are due to imperfect external forcing, poorly known parameters, missing physics and discretization errors. Studying these uncertainties and their impact on the simulations is a challenge, mostly because of the high dimensionality and non-linear nature of the systems. To deal with these uncertainties we work on three axes of research, which are linked: sensitivity analysis, parameter estimation and risk assessment. They are based on either stochastic or deterministic methods.

Sensitivity analysis
Sensitivity analysis (SA), which links uncertainty in the model inputs to uncertainty in the model outputs, is a powerful tool for model design and validation. First, it can be a pre-stage for parameter estimation (see 3.3), allowing for the selection of the more significant parameters. Second, SA permits understanding and quantifying (possibly non-linear) interactions induced by the different processes defining e.g., realistic ocean atmosphere models. Finally SA allows for validation of models, checking that the estimated sensitivities are consistent with what is expected by the theory.
On ocean, atmosphere and coupled systems, only first order deterministic SA is performed, neglecting the initialization process (data assimilation). AIRSEA members and collaborators proposed to use second order information to provide consistent sensitivity measures, but so far it has only been applied to simple academic systems. Metamodels are now commonly used, due to the cost induced by each evaluation of complex numerical models: mostly Gaussian processes, whose probabilistic framework allows for the development of specific adaptive designs, and polynomial chaos, not only in the context of intrusive Galerkin approaches but also in a black-box approach. Until recently, global SA was based primarily on a set of engineering practices. New mathematical and methodological developments have led to the numerical computation of Sobol' indices, with confidence intervals accounting for both metamodel and estimation errors. Approaches have also been extended to the case of dependent inputs, functional inputs and/or outputs, and stochastic numerical codes. Other types of indices and generalizations of Sobol' indices have also been introduced.
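To make the notion of a first-order Sobol' index concrete, here is a minimal sketch of a pick-freeze Monte Carlo estimator, assuming independent inputs that are uniform on [0, 1]. The estimator form, the function names and the additive `toy_model` are illustrative assumptions, not the team's implementation, and no confidence interval or metamodel is involved.

```python
import random


def first_order_sobol(model, dim, index, n_samples, seed=0):
    """Pick-freeze Monte Carlo estimate of the first-order Sobol' index.

    Inputs are assumed independent and uniform on [0, 1]. Two input samples
    share coordinate `index` (frozen) while all other coordinates are redrawn.
    """
    rng = random.Random(seed)
    y, y_frozen = [], []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        x_new = [rng.random() for _ in range(dim)]
        x_new[index] = x[index]  # freeze the input of interest
        y.append(model(x))
        y_frozen.append(model(x_new))
    mean_y = sum(y) / n_samples
    mean_f = sum(y_frozen) / n_samples
    # Covariance between the two outputs estimates Var(E[Y | X_index]).
    cov = sum(u * v for u, v in zip(y, y_frozen)) / n_samples - mean_y * mean_f
    var = sum(u * u for u in y) / n_samples - mean_y ** 2
    return cov / var


def toy_model(x):
    # Hypothetical additive model: the exact indices are 0.8 and 0.2.
    return 2.0 * x[0] + 1.0 * x[1]


s1 = first_order_sobol(toy_model, dim=2, index=0, n_samples=20000)
s2 = first_order_sobol(toy_model, dim=2, index=1, n_samples=20000)
print(round(s1, 2), round(s2, 2))
```

Each index costs two model evaluations per sample here, which illustrates why metamodels become attractive when a single model run is expensive.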

Concerning the stochastic approach to SA we plan to work with parameters that show spatio-temporal dependencies and to continue toward more realistic applications where the input space is of huge dimension with highly correlated components. Sensitivity analysis for dependent inputs also introduces new challenges. In our applicative context, it would seem prudent to carefully learn the spatio-temporal dependences before running a global SA. In the deterministic framework we focus on second order approaches where the sought sensitivities are related to the optimality system rather than to the model; i.e., we consider the whole forecasting system (model plus initialization through data assimilation).

All these methods allow for computing sensitivities and more importantly a posteriori error statistics.

Parameter estimation
Advanced parameter estimation methods are barely used in ocean, atmosphere and coupled systems, mostly due to a difficulty of deriving adequate response functions, a lack of knowledge of these methods in the ocean-atmosphere community, and also to the huge associated computing costs. In the presence of strong uncertainties on the model but also on parameter values, simulation and inference are closely associated. Filtering for data assimilation and Approximate Bayesian Computation (ABC) are two examples of such association.

The stochastic approach can be compared with the deterministic approach, which makes it possible to determine the sensitivity of the flow to parameters and to optimize their values relying on data assimilation. This approach has already been shown to be capable of selecting a reduced space of the most influential parameters in the local parameter space and of adapting their values in order to correct errors committed by the numerical approximation. This approach assumes the use of automatic differentiation of the source code with respect to the model parameters, and optimization of the obtained raw code.

AIRSEA assembles all the required expertise to tackle these difficulties. As mentioned previously, the choice of parameterization schemes and their tuning has a significant impact on the result of model simulations. Our research will focus on parameter estimation for parameterized Partial Differential Equations (PDEs) and also for parameterized Stochastic Differential Equations (SDEs). Deterministic approaches are based on optimal control methods and are local in the parameter space (i.e., the result depends on the starting point of the estimation) but thanks to adjoint methods they can cope with a large number of unknowns that can also vary in space and time. Multiscale optimization techniques as described in 8.3 will be one of the tools used. This in turn can be used either to propose a better (and smaller) parameter set or as a criterion for discriminating parameterization schemes. Statistical methods are global in the parameter space but may suffer from the curse of dimensionality. However, the notion of parameter can also be extended to functional parameters. We may consider as parameter a functional entity such as a boundary condition on time, or a probability density function in a stationary regime. For these purposes, non-parametric estimation will also be considered as an alternative.

Risk assessment
Risk assessment in the multivariate setting suffers from a lack of consensus on the choice of indicators. Moreover, once the indicators are designed, it still remains to develop estimation procedures that are efficient even for high risk levels. Recent developments for the assessment of financial risk have to be considered with caution, as methods may differ depending on whether they pertain to general financial decisions or to environmental risk assessment. Modeling and quantifying uncertainties related to extreme events is of central interest in environmental sciences. In relation to our scientific targets, risk assessment is very important in several areas: hydrological extreme events, cyclone intensity, storm surges, etc. Environmental risks most of the time involve several aspects which are often correlated. Moreover, even in the ideal case where the focus is on a single risk source, we have to face the temporal and spatial nature of environmental extreme events.
The study of extremes within a spatio-temporal framework remains an emerging field where the development of adapted statistical methods could lead to major progress in terms of geophysical understanding and risk assessment thus coupling data and model information for risk assessment.

Based on the above considerations we aim to answer the following scientific questions: how to measure risk in a multivariate/spatial framework? How to estimate risk in a non stationary context? How to reduce dimension (see 3.2) for a better estimation of spatial risk?

Extreme events are rare, which means there is little data available to make inferences of risk measures. Risk assessment based on observation therefore relies on multivariate extreme value theory. Interacting particle systems for the analysis of rare events are commonly used in the community of computer experiments. An open question is the pertinence of such tools for the evaluation of environmental risk.

Most numerical models are unable to accurately reproduce extreme events. There is therefore a real need to develop efficient assimilation methods for the coupling of numerical models and extreme data.

Methods for sensitivity analysis, parameter estimation and risk assessment are extremely costly due to the necessary number of model evaluations. This number of simulations, which requires considerable computational resources, depends on the complexity of the application, the number of input variables and the desired quality of approximations. For this reason, the AIRSEA team is an intensive user of HPC platforms, particularly grid computing platforms. The associated grid deployment has to take into account the scheduling of a huge number of computational requests and the data-management links between these requests, all as automatically as possible. In addition, there is an increasing need to propose efficient numerical algorithms specifically designed for new (or future) computing architectures, and this is part of our scientific objectives. Given the computational cost of our applications, the evolution of high performance computing platforms has to be taken into account for several reasons. While our applications are able to exploit space parallelism to its full extent (oceanic and atmospheric models are traditionally based on a spatial domain decomposition method), the spatial discretization step size limits the efficiency of traditional parallel methods. Thus the inherent parallelism is modest, particularly for the case of relatively coarse resolution but with very long integration time (e.g., climate modeling). Paths toward new programming paradigms are thus needed. As a step in that direction, we plan to focus our research on parallel in time methods.

New numerical algorithms for high performance computing
Parallel in time methods can be classified into three main groups. In the first group, we find methods using parallelism across the method, such as parallel integrators for ordinary differential equations. The second group considers parallelism across the problem. Falling into this category are methods such as waveform relaxation, where the space-time system is decomposed into a set of subsystems which can then be solved independently using some form of relaxation techniques, or multigrid reduction in time. The third group of methods focuses on parallelism across the steps. One of the best known algorithms in this family is parareal. Other methods combining the strengths of those listed above (e.g., PFASST) are currently under investigation in the community.

Parallel in time methods are iterative methods that may require a large number of iterations before convergence. Our first focus will be on the convergence analysis of parallel in time (Parareal / Schwarz) methods for the equation systems of oceanic and atmospheric models. Our second objective will be the construction of fast (approximate) integrators for these systems. This part is naturally linked to the model reduction methods of section 8.3.1. Fast approximate integrators are required both in the Schwarz algorithm (where a first guess of the boundary conditions is required) and in the Parareal algorithm (where the fast integrator is used to connect the different time windows). Our main application of these methods will be climate (i.e., very long time) simulations. Our second application of parallel in time methods will be in the context of optimization methods. In fact, one of the major drawbacks of the optimal control techniques used in 3.3 is a lack of intrinsic parallelism in comparison with ensemble methods. Here, parallel in time methods also offer ways to improve efficiency. The key mathematical point is how to efficiently couple two iterative methods (i.e., parallel in time and optimization methods).
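The role of the fast integrator in Parareal can be sketched on a scalar test equation y' = lam*y. The example below is a minimal serial emulation, assuming backward Euler for both the coarse integrator (one step per time window) and the fine one (many substeps); all names and parameter values are illustrative, and the fine solves that would run in parallel are simply computed in a list comprehension.

```python
def coarse(y, t0, t1, lam):
    """Fast approximate integrator G: one backward Euler step for y' = lam*y."""
    return y / (1.0 - lam * (t1 - t0))


def fine(y, t0, t1, lam, substeps=100):
    """Accurate integrator F: many small backward Euler steps."""
    dt = (t1 - t0) / substeps
    for _ in range(substeps):
        y = y / (1.0 - lam * dt)
    return y


def parareal(y0, t_end, n_windows, n_iter, lam):
    """Parareal update U_new[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])."""
    times = [t_end * n / n_windows for n in range(n_windows + 1)]
    # Initial guess: sequential sweep with the coarse integrator only.
    u = [y0]
    for n in range(n_windows):
        u.append(coarse(u[n], times[n], times[n + 1], lam))
    for _ in range(n_iter):
        # The fine solves are independent across windows, hence parallelizable.
        f_old = [fine(u[n], times[n], times[n + 1], lam) for n in range(n_windows)]
        g_old = [coarse(u[n], times[n], times[n + 1], lam) for n in range(n_windows)]
        u_new = [y0]
        for n in range(n_windows):  # cheap sequential correction sweep
            g_new = coarse(u_new[n], times[n], times[n + 1], lam)
            u_new.append(g_new + f_old[n] - g_old[n])
        u = u_new
    return u


def sequential_fine(y0, t_end, n_windows, lam):
    """Reference: the fine integrator applied window after window."""
    times = [t_end * n / n_windows for n in range(n_windows + 1)]
    ref = [y0]
    for n in range(n_windows):
        ref.append(fine(ref[n], times[n], times[n + 1], lam))
    return ref


lam, t_end, n_windows = -1.0, 2.0, 10
reference = sequential_fine(1.0, t_end, n_windows, lam)
errors = {}
for k in (0, 2, 10):
    u = parareal(1.0, t_end, n_windows, k, lam)
    errors[k] = max(abs(p - q) for p, q in zip(u, reference))
print(errors)
```

After as many iterations as there are time windows, Parareal reproduces the sequential fine solution up to rounding, so the method only pays off when far fewer iterations suffice; this is why the quality of the coarse integrator matters.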

The evolution of natural systems, in the short, mid, or long term, has extremely important consequences for both the global Earth system and humanity. Forecasting this evolution is thus a major challenge from the scientific, economic, and human viewpoints.

Humanity has to face the problem of global warming, brought on by the emission of greenhouse gases from human activities. This warming will probably cause huge changes at global and regional scales, in terms of climate, vegetation and biodiversity, with major consequences for local populations. Research has therefore been conducted over the past 15 to 20 years in an effort to model the Earth's climate and forecast its evolution in the 21st century in response to anthropic action.

With regard to short-term forecasts, the best and oldest example is of course weather forecasting. Meteorological services have been providing daily short-term forecasts for several decades, which are of crucial importance for numerous human activities.

Numerous other problems can also be mentioned, like seasonal weather forecasting (to enable powerful phenomena like an El Niño event to be anticipated), operational oceanography (short-term forecasts of the evolution of the ocean system to provide services for the fishing industry, ship routing, defense, or the fight against marine pollution) or the prediction of floods.

As mentioned previously, mathematical and numerical tools are omnipresent and play a fundamental role in these areas of research. In this context, the vocation of AIRSEA is not to carry out numerical prediction, but to address mathematical issues raised by the development of prediction systems for these application fields, in close collaboration with geophysicists.

Most of the research activities of the AIRSEA team are directed towards the improvement of numerical systems of the ocean and the atmosphere. This includes the development of appropriate numerical methods, model/parameter calibration using observational data and uncertainty quantification for decision making. The AIRSEA team members work in close collaboration with researchers in the field of geophysical fluid dynamics and are partners of several interdisciplinary projects. They also strongly contribute to the development of state-of-the-art numerical systems, like NEMO and CROCO in the ocean community.

With the increase of resolution, the hydrostatic assumption becomes less valid and the AIRSEA group also works on the development of non-hydrostatic ocean models. The treatment of non-hydrostatic incompressible flows leads to a 3D elliptic system for pressure that can be ill-conditioned, in particular with non-geopotential vertical coordinates. That is why we favour the use of the non-hydrostatic compressible equations, which remove the need to solve a 3D system at the price of reincluding acoustic waves [17].

A comparison between 2D and 3D simulations of non-hydrostatic surface waves has been performed in [44].

In addition, Emilie Duval started her PhD in September 2018 on the coupling between the hydrostatic incompressible and non-hydrostatic compressible equations. A detailed analysis of acoustic-gravity waves in a free-surface compressible and stratified ocean is presented in [35].

Accurate and stable implementation of bathymetry boundary conditions in ocean models remains a challenging problem. The dynamics of ocean flow often depend sensitively on satisfying bathymetry boundary conditions and correctly representing their complex geometry. Generalized (e.g. σ) terrain-following coordinates are often used in ocean models, but they require smoothing the bathymetry to reduce pressure gradient errors. Geopotential z-coordinates are a common alternative that avoids pressure gradient and numerical diapycnal diffusion errors, but they generate spurious flow due to their “staircase” geometry. In [9], we introduce a new Brinkman volume penalization to approximate the no-slip boundary condition and complex geometry of bathymetry in ocean models. This approach corrects the staircase effect of z-coordinates, does not introduce any new stability constraints on the geometry of the bathymetry and is easy to implement in an existing ocean model. The porosity parameter allows modelling subgrid scale details of the geometry. We illustrate the penalization and confirm its accuracy by applying it to three standard test flows: upwelling over a sloping bottom, resting state over a seamount and internal tides over highly peaked bathymetry features.
Figure (1) shows the strong improvements obtained when the penalization method is used, in comparison with traditional terrain-following coordinates.
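The principle of volume penalization can be illustrated on a 1D toy problem: a forced diffusion equation in which a damping term, active only inside a "solid" region, drives the solution to zero there and so mimics a no-slip wall without changing the grid. This is a minimal sketch under simplifying assumptions (one dimension, no porosity parameter, hypothetical values for `eps`, `nu` and the wall position), not the scheme introduced in the cited work.

```python
def penalized_channel(n_cells=50, nu=1.0, eps=1e-4, wall_start=0.7, n_steps=2000):
    """Solve u_t = nu*u_xx + 1 - (chi/eps)*u on [0, 1] with u = 0 at both ends.

    chi is 1 inside the "solid" region x >= wall_start and 0 in the fluid,
    so the penalization term drives u toward zero inside the solid.
    """
    dx = 1.0 / n_cells
    x = [(i + 0.5) * dx for i in range(n_cells)]
    chi = [1.0 if xi >= wall_start else 0.0 for xi in x]
    dt = 0.4 * dx * dx / nu  # explicit diffusion stability limit is 0.5*dx^2/nu
    u = [0.0] * n_cells
    for _ in range(n_steps):
        new_u = []
        for i in range(n_cells):
            # Ghost values -u[i] enforce u = 0 at the domain ends (cell-centred grid).
            left = u[i - 1] if i > 0 else -u[i]
            right = u[i + 1] if i < n_cells - 1 else -u[i]
            rhs = u[i] + dt * (nu * (left - 2.0 * u[i] + right) / (dx * dx) + 1.0)
            # The stiff penalization term is treated implicitly.
            new_u.append(rhs / (1.0 + dt * chi[i] / eps))
        u = new_u
    return x, chi, u


x, chi, u = penalized_channel()
max_fluid = max(ui for ui, c in zip(u, chi) if c == 0.0)
max_solid = max(abs(ui) for ui, c in zip(u, chi) if c == 1.0)
print(max_fluid, max_solid)
```

Treating the penalization term implicitly keeps the time step set by diffusion alone, even though 1/eps is very large.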

In the context of the H2020 IMMERSE project, the AIRSEA team is in charge of the development of a new time stepping strategy for the NEMO ocean model. Nicolas Ducousso works particularly on the design of Runge-Kutta schemes for ocean models, taking into account the splitting between barotropic and baroclinic modes. The team also studies the use of exponential time integrators [40] for ocean models.
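To give an idea of what an exponential time integrator does, the sketch below compares one of the simplest members of the family, the exponential Euler scheme, with explicit Euler on a stiff scalar equation u' = -lam*u + forcing. The equation and all parameter values are assumptions chosen for illustration and are unrelated to the NEMO developments.

```python
import math


def explicit_euler(u, dt, lam, forcing):
    """One explicit Euler step for u' = -lam*u + forcing."""
    return u + dt * (-lam * u + forcing)


def exponential_euler(u, dt, lam, forcing):
    """One exponential Euler step: the linear term is integrated exactly."""
    decay = math.exp(-lam * dt)
    return decay * u + (1.0 - decay) / lam * forcing


# lam*dt = 5 is far beyond the explicit Euler stability limit lam*dt <= 2.
lam, forcing, dt, n_steps = 100.0, 1.0, 0.05, 40
u_exp = u_eul = 1.0
for _ in range(n_steps):
    u_exp = exponential_euler(u_exp, dt, lam, forcing)
    u_eul = explicit_euler(u_eul, dt, lam, forcing)
t = n_steps * dt
u_exact = forcing / lam + (1.0 - forcing / lam) * math.exp(-lam * t)
err_exp = abs(u_exp - u_exact)
err_eul = abs(u_eul - u_exact)
print(err_exp, err_eul)
```

With a constant forcing the exponential scheme is exact for this linear problem, whereas explicit Euler diverges at this step size; for nonlinear models the forcing term is only approximated, so the scheme is no longer exact.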

The AIRSEA team is involved in the modeling and algorithmic aspects of ocean-atmosphere (OA) coupling. For the last few years we have been actively working on the analysis of such coupling both in terms of its continuous and numerical formulation (see [61] for an overview). Our activities can be divided into four general topics:

Continuous and discrete analysis of Schwarz algorithms for OA coupling:
we have been developing coupling approaches for several years, based on so-called Schwarz algorithms. Schwarz-like domain decomposition methods are very popular in mathematics, computational sciences and engineering, notably for the implementation of coupling strategies. However, for complex applications (as in OA coupling) it is challenging to have a priori knowledge of the convergence properties of such methods. Indeed, coupled problems arising in Earth system modeling often exhibit sharp turbulent boundary layers whose parameterizations lead to peculiar transmission conditions and diffusion coefficients. In the framework of S. Thery's PhD (defended in February 2021), the well-posedness of the non-linear coupling problem including parameterizations has been addressed, and a detailed continuous analysis of the convergence properties of the Schwarz methods has been pursued to disentangle the impact of the different parameters at play in such a coupling problem [45]. During the first year of C. Simon's PhD, a general framework has been proposed to study the convergence properties at a (semi-)discrete level to allow a systematic comparison with the results obtained from the continuous problem.

Within the COCOA project, a Schwarz-like iterative method has been applied in a state-of-the-art Earth-System model to evaluate the consequences of inaccuracies in the usual ad-hoc ocean-atmosphere coupling algorithms used in realistic models [20]. Numerical results obtained with an iterative process show large differences at sunrise and sunset compared to usual ad-hoc algorithms, thus showing that synchrony errors inherent to ad-hoc coupling methods can be significant.

Representation of the air-sea interface in coupled models:
During the PhD thesis of Charles Pelletier, the focus was on including the formulation of physical parameterizations in the theoretical analysis of the coupling, in particular the parameterization schemes to compute air-sea fluxes. Following this work, a novel and rigorous framework for a consistent two-sided modeling of the surface boundary layer has been proposed [22]. This framework allows for a more general representation of the vertical physics at the air-sea interface while improving the mathematical regularity of the numerical solutions. Moreover, it is flexible enough to include additional physical parameters, for example to account for the effect of surface waves in the turbulent flux computation. This work is the first step toward more adequate discretization methods for the parameterization of surface and planetary boundary layers in climate models [31, 62] (PhD thesis of C. Simon). The problem of interest takes the form of a nonstationary nonlinear parabolic equation. The objective is to derive a discretization for which we could prove nonlinear stability criteria and show robustness to large variations in parabolic Courant number while being consistent with our knowledge of the underlying physical principles (e.g. the Monin-Obukhov theory in the surface layer). This work will be carried out in the framework of a project funded by SHOM (contract 19CP07).

Lately, in the framework of M. Perrot's internship, we have been working in collaboration with Etienne Mémin (Fluminance project team) to investigate the possibility of deriving a stochastic representation of the atmospheric surface layer dynamics and thermodynamics based on a modeling under location uncertainty. Efforts in this direction will be pursued.

These topics are addressed through strong collaborations between the applied mathematicians and the climate and operational community (Meteo-France, Ifremer, SHOM, Mercator-Ocean, LMD, and LOCEAN). Our work on ocean-atmosphere coupling has steadily matured over the last few years and has reached a point where it triggered interest from the physicists. Through the funding of the projects ANR COCOA (started in January 2017, PI: E. Blayo) and SHOM 19CP07 (started in January 2020, PI: F. Lemarié), AIRSEA team members play a major role in structuring a multi-disciplinary scientific community working on ocean-atmosphere coupling, spanning a broad range from mathematical theory to practical implementations in climate and operational models. An expected outcome of those projects is the design of a benchmark suite of idealized coupled test cases representative of known issues in coupled models. Such idealized test cases should motivate further collaborations at an international level. In this context, a single-column version of the CNRM climate models has been designed and several coupling algorithms have been implemented (work done by S. Valcke, CERFACS). This model will be used to illustrate the relevance of our theoretical work in a semi-realistic context.

In the context of the French initiative CROCO (Coastal and Regional Ocean COmmunity model, https://

Artificial intelligence and machine learning may be considered as a potential way to address unresolved model scales and to approximate poorly known processes such as dissipation, which occurs essentially at small scales. In order to understand the possibility of combining a numerical model with a neural network trained with the aid of external data, we develop a network generation and learning algorithm and use it to approximate nonlinear model operators. Beginning with simple nonlinear equations such as the transport-diffusion and Burgers equations, we use artificially generated external data to train the network with the Adam algorithm [59]. Results show that it is possible to approximate a nonlinear, and even discontinuous, dissipation operator with quite good accuracy; however, several million iterations are necessary for training. Another potential way to reconstruct subgrid scales consists in applying image super-resolution methods, which in computer vision and image processing refer to the process of recovering high-resolution images from low-resolution ones. Recent years have shown remarkable progress in image super-resolution using machine learning techniques [73]. We try to use this methodology in order to identify the fine structure of the chaotic turbulent solution of a simple barotropic ocean model. After learning the flow patterns obtained by the high-resolution model, the neural network can identify fine structure in the low-resolution model solution with better precision than bicubic interpolation.

At the present time the observation of Earth from space is done by more than thirty satellites. These platforms provide two kinds of observational information:

Our current developments are targeted at the use of learning methods to describe the evolution of the images. This approach is being applied to the tracking of oceanic oil spills in the framework of Long Li's PhD, co-supervised with Jianwei Ma.

Accounting for realistic observation errors is a known bottleneck in data assimilation, because dealing with error correlations is complex. Following a previous study on this subject, we propose to use multiscale modelling, more precisely the wavelet transform, to address this question. In 7 we investigate the problem further by addressing two issues arising in real-life data assimilation: how to deal with partially missing data (e.g., concealed by an obstacle between the sensor and the observed system), and how to solve convergence issues associated with complex observation error covariance matrices. Two adjustments relying on wavelet modelling are proposed to deal with those, and offer significant improvements. The first one consists in adjusting the variance coefficients in the frequency domain to account for masked information. The second one consists in a gradual assimilation of frequencies. Both of these fully rely on the multiscale properties associated with wavelet covariance modelling.

A collaborative project was started with C. Lauvernet (IRSTEA) in order to apply this kind of assimilation strategy to the control of pesticide transfer, and it led to the co-supervision of E. Rouzies' PhD, started in Dec 2019.

Numerical models describing the evolution of the system (ocean + atmosphere) contain a large number of parameters which are generally poorly known. The reliability of the numerical simulations strongly depends on the identification and calibration of these parameters from observed data. In this context, it seems important to understand the kinds of low-dimensional structure that may be present in geophysical models and to exploit this low-dimensional structure with appropriate algorithms. Within the team, we focus on parameter space dimension reduction techniques, low-rank structures and transport map techniques for probability measure approximation.

In 29, we propose a framework for the greedy approximation of high-dimensional Bayesian inference problems, through the composition of multiple low-dimensional transport maps or flows. Our framework operates recursively on a sequence of “residual” distributions, given by pulling back the posterior through the previously computed transport maps. The action of each map is confined to a low-dimensional subspace that we identify by minimizing an error bound. At each step, our approach thus identifies (i) a relevant subspace of the residual distribution, and (ii) a low-dimensional transformation between a restriction of the residual onto this sub-space and a standard Gaussian. We prove weak convergence of the approach to the posterior distribution, and we demonstrate the algorithm on a range of challenging inference problems in differential equations and spatial statistics.

Identifying a low-dimensional informed parameter subspace offers a viable path to alleviating the dimensionality challenge in the sample-based solution of large-scale Bayesian inverse problems. The article 41 introduces a novel gradient-based dimension reduction method in which the informed subspace does not depend on the data. This permits an online-offline computational strategy where the low-dimensional structure of the problem, which is expensive to compute, is detected in an offline phase, meaning before observing the data. This strategy is particularly relevant for multiple inversion problems, as the same informed subspace can be reused. The proposed approach makes it possible to control the approximation error (in expectation over the data) of the posterior distribution. We also present sampling strategies which exploit the informed subspace to efficiently draw samples from the exact posterior distribution. The method is successfully illustrated on two numerical examples: a PDE-based inverse problem and a tomography problem with Poisson data.
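
As an illustration only, the sketch below assumes a Gaussian likelihood with forward model G and noise covariance Γ, and takes the informed subspace to be spanned by the leading eigenvectors of H = E_prior[∇G^T Γ^{-1} ∇G], a matrix that does not involve the data. This choice of H, the toy forward model and all function names are assumptions made for the example; it is not a transcription of the method of 41.

```python
import numpy as np

def informed_subspace(jacobian, prior_samples, noise_cov, rank):
    """Leading eigenpairs of H = E_prior[J(x)^T noise_cov^{-1} J(x)] (Monte Carlo average)."""
    noise_prec = np.linalg.inv(noise_cov)
    dim = prior_samples.shape[1]
    H = np.zeros((dim, dim))
    for x in prior_samples:
        J = jacobian(x)                      # shape (n_obs, dim)
        H += J.T @ noise_prec @ J
    H /= len(prior_samples)
    eigvals, eigvecs = np.linalg.eigh(H)     # ascending order
    order = np.argsort(eigvals)[::-1]        # sort in descending order
    return eigvals[order], eigvecs[:, order[:rank]]

# Hypothetical toy forward model G(x) = [sin(a.x), (a.x)^2]:
# it depends on x only through the direction a.
rng = np.random.default_rng(0)
dim = 10
a = np.ones(dim) / np.sqrt(dim)

def toy_jacobian(x):
    s = a @ x
    return np.outer([np.cos(s), 2.0 * s], a)

samples = rng.standard_normal((200, dim))    # draws from a standard Gaussian prior
eigvals, U = informed_subspace(toy_jacobian, samples, 0.1 * np.eye(2), rank=1)
```

In this toy setting the computation is done once, before any data are observed, and the single retained direction coincides with `a` up to sign.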

In 36 we propose to robustly characterize joint and conditional probability distributions via transport maps. Transport maps or "flows" deterministically couple two distributions via an expressive monotone transformation. Yet, learning the parameters of such transformations in high dimensions is challenging given few samples from the unknown target distribution, and structural choices for these transformations can have a significant impact on performance. Here we formulate a systematic framework for representing and learning monotone maps, via invertible transformations of smooth functions, and demonstrate that the associated minimization problem has a unique global optimum. Given a hierarchical basis for the appropriate function space, we propose a sample-efficient adaptive algorithm that estimates a sparse approximation for the map. We demonstrate how this framework can learn densities with stable generalization performance across a wide range of sample sizes on real-world datasets.

In 38 we introduce a method for the nonlinear dimension reduction of a high-dimensional function

The paper 43 is concerned with minimizing a sum of rational functions over a compact set of high dimension. This work is motivated by the paper 38, in which one needs to optimize a sum of Rayleigh quotients. The proposed approach relies on the second Lasserre hierarchy (also known as the upper bounds hierarchy) formulated on the pushforward measure in order to work in a space of smaller dimension. We show that in the general case the minimum can be approximated as closely as desired from above with a hierarchy of semidefinite programs or, in the particular case of a single fraction, with a hierarchy of generalized eigenvalue problems. We numerically illustrate the potential of using the pushforward measure rather than the standard upper bounds hierarchy. In our opinion, this potential should be a strong incentive to investigate a related challenging problem that is interesting in its own right, namely integrating an arbitrary power of a given polynomial on a simple set (e.g., unit box or unit sphere) with respect to the Lebesgue or Haar measure.
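
To make the generalized eigenvalue formulation concrete, here is a minimal sketch of the standard upper bounds hierarchy (not the pushforward variant of 43) for a univariate polynomial on [-1, 1] with the Lebesgue measure. At level d, the bound is the smallest generalized eigenvalue of the pair (M_d(f), M_d), where M_d is the moment matrix and M_d(f) the moment matrix localized by f. The function names and the test polynomial f(x) = x^2 are assumptions chosen for the example.

```python
import numpy as np

def lebesgue_moment(k):
    """Moment of order k of the Lebesgue measure on [-1, 1]."""
    return 0.0 if k % 2 else 2.0 / (k + 1)

def upper_bound(poly_coeffs, degree):
    """Level-`degree` upper bound on min f over [-1, 1]; poly_coeffs[j] multiplies x^j."""
    n = degree + 1
    M = np.array([[lebesgue_moment(i + j) for j in range(n)] for i in range(n)])
    Mf = np.array([[sum(c * lebesgue_moment(i + j + k) for k, c in enumerate(poly_coeffs))
                    for j in range(n)] for i in range(n)])
    # Reduce the generalized problem Mf v = lambda M v to a standard symmetric one
    # through the Cholesky factor of M.
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    return np.linalg.eigvalsh(L_inv @ Mf @ L_inv.T)[0]

# f(x) = x^2 has minimum 0 on [-1, 1]; the bounds approach it from above.
bounds = [upper_bound([0.0, 0.0, 1.0], d) for d in range(5)]
```

The monomial basis is used here for readability; it becomes ill-conditioned at higher levels, where an orthogonal polynomial basis would be a safer choice.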

In the framework of Arthur Macherey’s PhD, we have proposed algorithms for solving high-dimensional Partial Differential Equations (PDEs) that combine a probabilistic interpretation of PDEs, through the Feynman-Kac representation, with sparse interpolation 49. Monte-Carlo methods and time-integration schemes are used to estimate pointwise evaluations of the solution of a PDE. We use a sequential control variates algorithm, where control variates are constructed based on successive approximations of the solution of the PDE. We are now interested in solving parametrized PDEs with stochastic algorithms in the framework of potentially high-dimensional parameter spaces. A preliminary step was the development of a PAC algorithm in relative precision for bandit problems with costly sampling 39.
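
The pointwise Monte-Carlo evaluation can be illustrated on the simplest possible case. The sketch below assumes the one-dimensional heat equation u_t = 0.5 u_xx with initial condition g, for which the Feynman-Kac representation reads u(t, x) = E[g(x + W_t)] with W a standard Brownian motion. No time-integration scheme or control variate is needed here because W_t can be sampled exactly; the function name and the test case are assumptions made for the example.

```python
import numpy as np

def feynman_kac_heat(initial_condition, x, t, n_paths, rng):
    """Monte-Carlo estimate of u(t, x) = E[g(x + W_t)] and its standard error."""
    endpoints = x + np.sqrt(t) * rng.standard_normal(n_paths)
    values = initial_condition(endpoints)
    return values.mean(), values.std(ddof=1) / np.sqrt(n_paths)

rng = np.random.default_rng(42)
estimate, std_error = feynman_kac_heat(lambda y: y**2, x=1.0, t=0.5,
                                       n_paths=200_000, rng=rng)
exact = 1.0**2 + 0.5   # for g(y) = y^2 the exact solution is u(t, x) = x^2 + t
```

The standard error decreases only as the inverse square root of the number of paths, which is the usual motivation for variance reduction techniques such as control variates.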

Reduced models are also developed in the framework of robust inversion. In 55, we have combined a new greedy algorithm for functional quantization with a Stepwise Uncertainty Reduction strategy to solve a robust inversion problem under functional uncertainties. In a more recent work, we further reduced the number of simulations required to solve the same robust inversion problem, based on Gaussian process meta-modeling on the joint input space of deterministic control parameters and functional uncertain variables 34. These results are applied to automotive depollution. This research axis was conducted in the framework of the Chair OQUAIDO.

Forecasting geophysical systems requires complex models, which sometimes need to be coupled, and which make use of data assimilation. The objective of this project is, for a given output of such a system, to identify the most influential parameters, and to evaluate the effect of uncertainty in input parameters on model output. Existing stochastic tools are not well suited for high-dimensional problems (in particular time-dependent problems), while deterministic tools are fully applicable but only provide limited information. So the challenge is to gather expertise on the one hand on numerical approximation and control of Partial Differential Equations, and on the other hand on stochastic methods for sensitivity analysis, in order to develop and design innovative stochastic solutions to study high-dimensional models and to propose new hybrid approaches combining the stochastic and deterministic methods. We took part in the writing of a position paper on the future of sensitivity analysis 23.

An important challenge for stochastic sensitivity analysis is to develop methodologies which work for dependent inputs. Recently, the Shapley value, from econometrics, was proposed as an alternative to quantify the importance of random input variables to a function. Owen 67 derived Shapley value importance for independent inputs and showed that it is bracketed between two different Sobol' indices. Song et al. 71 recently advocated the use of the Shapley value for the case of dependent inputs. In a recent work 66, in collaboration with Art Owen (Stanford University), we show that the Shapley value removes the conceptual problems of functional ANOVA for dependent inputs. We do this with some simple examples where the Shapley value leads to intuitively reasonable, nearly closed-form values. We also investigated further the properties of Shapley effects in 58.
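
A small worked example may help fix ideas. The sketch below assumes a linear model f(X) = β·X with centered Gaussian inputs of covariance Σ, for which the value function Var(E[f(X) | X_u]) has the closed form c_u^T Σ_uu^{-1} c_u with c_u = Cov(X_u, f(X)). Shapley effects are then obtained by brute-force enumeration of subsets, which is only feasible for a handful of inputs. The model, the numbers and the function names are assumptions chosen for the example.

```python
import numpy as np
from itertools import combinations
from math import factorial

def explained_variance(beta, cov, subset):
    """Var(E[beta.X | X_u]) for X ~ N(0, cov) and u = subset."""
    if not subset:
        return 0.0
    u = list(subset)
    c = cov[:, u].T @ beta                       # Cov(X_u, beta.X)
    return float(c @ np.linalg.solve(cov[np.ix_(u, u)], c))

def shapley_effects(beta, cov):
    """Shapley effects by exhaustive enumeration of the subsets of inputs."""
    d = len(beta)
    effects = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                gain = (explained_variance(beta, cov, u + (i,))
                        - explained_variance(beta, cov, u))
                effects[i] += weight * gain
    return effects

# X2 does not enter the model (beta_2 = 0) but is strongly correlated with X1.
beta = np.array([1.0, 0.0, 1.0])
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
effects = shapley_effects(beta, cov)             # approximately [0.595, 0.405, 1.0]
total_variance = float(beta @ cov @ beta)        # the effects sum to this value
```

In this example the input X2 receives a nonzero share of the variance purely through its correlation with X1, and the effects always sum to the total variance.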

In the field of sensitivity analysis, Sobol’ indices are widely used to assess the importance of the inputs of a model to its output. Among the methods that estimate these indices, the replication procedure is noteworthy for its efficient cost. A practical problem is how many model evaluations must be performed to guarantee a sufficient precision on the Sobol’ estimates. We proposed to tackle this issue by rendering the replication procedure iterative 14. The idea is to enable the addition of new model evaluations to progressively increase the accuracy of the estimates. These evaluations are done at points located in under-explored regions of the experimental designs, but preserving their characteristics. The key feature of this approach is the construction of nested space-filling designs. For the estimation of first-order indices, a nested Latin hypercube design is used. For the estimation of closed second-order indices, two constructions of a nested orthogonal array design are proposed. Regularity and uniformity properties of the nested designs are studied.
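
For reference, the indices mentioned above can be written as follows, assuming a scalar output Y = f(X_1, ..., X_d) with independent inputs and finite, nonzero variance (the notation is ours):

```latex
S_i = \frac{\operatorname{Var}\big(\mathbb{E}[Y \mid X_i]\big)}{\operatorname{Var}(Y)},
\qquad
S_{ij}^{\mathrm{clo}} = \frac{\operatorname{Var}\big(\mathbb{E}[Y \mid X_i, X_j]\big)}{\operatorname{Var}(Y)},
\qquad 1 \le i < j \le d .
```

The first-order index S_i measures the share of the output variance explained by X_i alone, while the closed second-order index measures the share explained jointly by the pair (X_i, X_j), main effects included.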

Another research direction for global SA algorithms starts from the observation that most of the algorithms to compute sensitivity measures require special sampling schemes or additional model evaluations, so that available data from previous model runs (e.g., from an uncertainty analysis based on Latin Hypercube Sampling) cannot be reused. One challenging task for estimating global sensitivity measures consists in recycling an available finite set of input/output data. Such "green" sensitivity analysis avoids waste by recycling. Given-data approaches have been discussed, e.g., in 68, 69. Most of the given-data procedures depend on parameters (number of bins, truncation argument…) that are not easy to calibrate from a bias-variance compromise perspective. Adaptive selection of these parameters remains a challenging issue for most of these given-data algorithms. In the context of María Belén Heredia’s PhD thesis, we have proposed 2 a non-parametric given-data estimator for aggregated Sobol’ indices, introduced in 60 and further developed in 56 for multivariate or functional outputs. We also introduced aggregated Shapley effects and we have extended a nearest-neighbor estimation procedure to estimate these indices 37.
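
The following minimal sketch shows what a given-data estimator can look like in its simplest form. It is a basic binning estimator of a first-order Sobol' index computed from an existing input/output sample, not the non-parametric estimator proposed in 2. The number of bins is exactly the kind of tuning parameter discussed above; the test model and the function name are assumptions chosen for the example.

```python
import numpy as np

def given_data_first_order(x, y, n_bins=20):
    """Binning estimate of Var(E[Y | X]) / Var(Y) from an existing sample (x, y)."""
    order = np.argsort(x)
    bins = np.array_split(y[order], n_bins)      # bins with (nearly) equal counts
    bin_means = np.array([b.mean() for b in bins])
    bin_sizes = np.array([len(b) for b in bins])
    between = np.sum(bin_sizes * (bin_means - y.mean()) ** 2) / len(y)
    return between / y.var()

# Existing sample, e.g. left over from an earlier uncertainty analysis.
rng = np.random.default_rng(1)
inputs = rng.uniform(-1.0, 1.0, size=(20_000, 2))
outputs = inputs[:, 0] + 2.0 * inputs[:, 1]      # exact indices: 0.2 and 0.8
indices = [given_data_first_order(inputs[:, i], outputs) for i in range(2)]
```

Too few bins smooth out the conditional mean (bias), while too many bins leave few points per bin (variance), which illustrates the calibration difficulty.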

Many models are stochastic in nature, and some of them may be driven by parametrized stochastic differential equations. It is important for applications to propose a strategy to perform global sensitivity analysis (GSA) for such models, in presence of uncertainties on the parameters. In collaboration with Pierre Etoré (DATA department in Grenoble), Clémentine Prieur proposed an approach based on Feynman-Kac formulas 12. The research on GSA for stochastic simulators is still ongoing, first in the context of the MATH-AmSud project FANTASTIC (Statistical inFerence and sensitivity ANalysis for models described by sTochASTIC differential equations) with Chile and Uruguay, secondly through the PhD thesis of Henri Mermoz Kouye, co-supervised by Clémentine Prieur, in collaboration with INRA Jouy.

Pesticide transfer models are valuable tools to predict and prevent pollution of water bodies. However, using such models in operational contexts requires a strong knowledge of their structure, including influential parameters. This project aims at performing global sensitivity analysis (GSA) of the PESHMELBA model (pesticide and hydrology: modelling at the catchment scale). This work is made hard by the modular, complex structure of the model, which couples different physical processes. It results in a large input space dimension and a high computational cost that limits the number of available runs. Using classical GSA tools such as Sobol' indices is thus not feasible. In order to circumvent those limitations, alternative techniques such as the HSIC dependence measure or Random Forest metamodels are explored in 33 and 48. Additionally, the use of such methods in the specific context of spatially distributed output is explored in a paper to be submitted soon.
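
As a reminder of how an HSIC-based screening works, the sketch below computes the standard biased (V-statistic) HSIC estimator with Gaussian kernels between one scalar input and one scalar output. The bandwidths, the toy data and the function names are assumptions chosen for the example and are unrelated to PESHMELBA.

```python
import numpy as np

def gaussian_gram(values, bandwidth):
    """Gram matrix of the Gaussian kernel on a one-dimensional sample."""
    diff = values[:, None] - values[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def hsic(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased HSIC estimate: trace(K H L H) / n^2 with H the centering matrix."""
    n = len(x)
    centering = np.eye(n) - np.ones((n, n)) / n
    K = gaussian_gram(x, bandwidth_x)
    L = gaussian_gram(y, bandwidth_y)
    return np.trace(K @ centering @ L @ centering) / n**2

rng = np.random.default_rng(3)
influential = rng.standard_normal(400)
inert = rng.standard_normal(400)
output = influential**2 + 0.1 * rng.standard_normal(400)   # no linear correlation

hsic_influential = hsic(influential, output)
hsic_inert = hsic(inert, output)
```

Only one batch of runs is needed whatever the number of inputs, and the measure detects the purely nonlinear dependence of this example, which a correlation coefficient would miss.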

Physically-based avalanche propagation models must still be locally calibrated to provide robust predictions, e.g. in long-term forecasting and subsequent risk assessment. Friction parameters cannot be measured directly and need to be estimated from observations. Rich and diverse data are now increasingly available from test sites, but for measurements made along flow propagation, potential autocorrelation should be explicitly accounted for. In the context of María Belén Heredia’s PhD, in collaboration with IRSTEA Grenoble, we have proposed in 57 a comprehensive Bayesian calibration and statistical model selection framework, with application to an avalanche sliding block model with the standard Voellmy friction law and high-rate photogrammetric images. An avalanche released at the Lautaret test site and a synthetic data set based on the avalanche were used to test the approach. Results have demonstrated i) the efficiency of the proposed calibration scheme, and ii) that including autocorrelation in the statistical modelling definitely improves the accuracy of both parameter estimation and velocity predictions.

In the context of the energy transition, wind power generation is developing rapidly in France and worldwide. Research and innovation on wind resource characterisation, turbine control, coupled mechanical modelling of wind systems or technological development of floaters for offshore wind turbines are current research topics. In particular, the monitoring and the maintenance of wind turbines is becoming a major issue. Current solutions do not take full advantage of the large amount of data provided by sensors placed on modern wind turbines in production. These data could be advantageously used in order to refine the predictions of production, the life of the structure, the control strategies and the planning of maintenance.
In this context, it is interesting to optimally combine production data and numerical models in order to obtain highly reliable models of wind turbines. This process is of interest to many industrial and academic groups and is known in many fields of industry, including the wind industry, as a "digital twin". The objective of Adrien Hirvoas's PhD work is to develop a data assimilation methodology to build the "digital twin" of an onshore wind turbine. Based on measurements, data assimilation should make it possible to reduce the uncertainties on the physical parameters of the numerical model developed during the design phase, in order to obtain a highly reliable model. Various ensemble data assimilation approaches are currently under consideration to address the problem. In the context of this work, it is necessary to develop identification algorithms that quantify and rank all the uncertainty sources. This work is done in collaboration with IFPEN. A first paper has been accepted for publication 42.
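
To give an idea of how an ensemble method can reduce parameter uncertainty from measurements, here is a minimal sketch of one stochastic ensemble Kalman update applied to a parameter ensemble. It is a generic textbook scheme, not the methodology developed in the PhD work; the linear "model" (a response proportional to hypothetical loads), the numbers and the function name are assumptions chosen for the example.

```python
import numpy as np

def ensemble_parameter_update(params, predictions, observation, obs_var, rng):
    """One stochastic ensemble Kalman update.

    params: (n_ens, n_par) parameter ensemble
    predictions: (n_ens, n_obs) model outputs for each member
    """
    n_ens, n_obs = predictions.shape
    dp = params - params.mean(axis=0)
    dy = predictions - predictions.mean(axis=0)
    cov_py = dp.T @ dy / (n_ens - 1)                       # parameter/output covariance
    cov_yy = dy.T @ dy / (n_ens - 1) + obs_var * np.eye(n_obs)
    gain = cov_py @ np.linalg.inv(cov_yy)
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = observation + np.sqrt(obs_var) * rng.standard_normal((n_ens, n_obs))
    return params + (perturbed - predictions) @ gain.T

rng = np.random.default_rng(7)
loads = np.array([1.0, 2.0, 3.0])             # hypothetical operating conditions
true_parameter, obs_var = 2.0, 0.01
observation = true_parameter * loads + np.sqrt(obs_var) * rng.standard_normal(3)

prior = 1.0 + 0.5 * rng.standard_normal((200, 1))          # uncertain design value
posterior = ensemble_parameter_update(prior, prior * loads, observation, obs_var, rng)
```

For a nonlinear model the update would typically be iterated or applied sequentially in time, and the spread of the updated ensemble provides a first quantification of the remaining parameter uncertainty.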

Due to the health context, Clémentine Prieur decided to join a working group, SEEPIA (Simulation & Estimation of EPIdemics with Algorithms), led by Didier Georges (Gipsa-lab). A first work has been published 15. An extension of the classical pandemic SIRD model was considered for the regional spread of COVID-19 in France under lockdown strategies. This compartment model divides the infected and the recovered individuals into undetected and detected compartments respectively. By fitting the extended model to the real detected data during the lockdown, an optimization algorithm was used to derive the optimal parameters, the initial condition and the epidemic start date for the regions of France. Considering all the age classes together, a network model of the pandemic transport between regions in France was presented on the basis of the regional extended model and was simulated to reveal the transport effect of the COVID-19 pandemic after lockdown. Using the measured values of the movement of people between cities, the pandemic network of all cities in France was simulated with the same model and method as the pandemic network of regions. Finally, a discussion on an integro-differential equation was given and a new network pandemic model accounting for each age class was provided.
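
The structure of such a compartment model can be sketched as follows. The code integrates, with an explicit Euler scheme, a SIRD-type model in which infected and recovered individuals are split into undetected and detected compartments. The precise flows (only undetected individuals transmit, only detected cases are counted as deaths), the rate values and the function name are assumptions made for this example and are not taken from 15.

```python
import numpy as np

def simulate_extended_sird(initial_state, rates, n_days, dt=0.1):
    """Euler integration of (S, I_u, I_d, R_u, R_d, D); rates are per day."""
    beta, detect, recover, death = rates
    state = np.array(initial_state, dtype=float)
    population = state.sum()
    trajectory = [state.copy()]
    for _ in range(int(round(n_days / dt))):
        s, i_u, i_d, r_u, r_d, d = state
        new_infections = beta * s * i_u / population
        flows = np.array([
            -new_infections,                              # susceptible
            new_infections - (detect + recover) * i_u,    # undetected infected
            detect * i_u - (recover + death) * i_d,       # detected infected
            recover * i_u,                                # undetected recovered
            recover * i_d,                                # detected recovered
            death * i_d,                                  # deceased
        ])
        state = state + dt * flows
        trajectory.append(state.copy())
    return np.array(trajectory)

# Hypothetical region of one million inhabitants with ten undetected cases.
trajectory = simulate_extended_sird([1e6 - 10, 10, 0, 0, 0, 0],
                                    rates=(0.4, 0.1, 0.1, 0.01), n_days=120)
```

Fitting such a model then amounts to adjusting the rates, the initial state and the start date so that the detected compartments match the reported data, for instance by least squares.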

This research is the subject of a collaboration with Chile and Uruguay. More precisely, we started working with Venezuela. Due to the crisis in Venezuela, our main collaborator on that topic moved to Uruguay.

We are focusing our attention on models derived from the linear Fokker-Planck equation. From a probabilistic viewpoint, these models have received particular attention in recent years, since they are a basic example of hypocoercivity. In fact, even though completely degenerate, these models are hypoelliptic and still satisfy some coercivity properties, in a broad sense of the word. Such models often appear in the fields of mechanics, finance and even biology. For such models we believe it appropriate to build statistical non-parametric estimation tools. Initial results have been obtained for the estimation of the invariant density, under conditions guaranteeing its existence and uniqueness 51 and when only partial observational data are available. A paper on the non-parametric estimation of the drift has been accepted recently 52 (see Samson et al., 2012, for results for parametric models). As far as the estimation of the diffusion term is concerned, a paper has been accepted 52, in collaboration with J.R. Leon (Montevideo, Uruguay) and P. Cattiaux (Toulouse). Recursive estimators have also been proposed by the same authors in 53, also recently accepted. In a recent collaboration with Adeline Samson from the statistics department in the Lab, we considered adaptive estimation, that is, we proposed a data-driven procedure for the choice of the bandwidth parameters.
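
A toy version of the invariant density estimation problem can be sketched as follows. We assume a kinetic Langevin model dX = V dt, dV = -(γV + X) dt + σ dW, where the noise acts on the velocity only, with γ = 1 and σ = √2 so that the invariant law of the position is standard Gaussian. The model is simulated with an Euler scheme and the position marginal is estimated with a Gaussian kernel and a fixed bandwidth. For simplicity the estimator uses the final states of many independent paths rather than one long trajectory, and the bandwidth is fixed by hand rather than data-driven; all names and values are assumptions made for the example.

```python
import numpy as np

def simulate_langevin(n_paths, n_steps, dt, rng, damping=1.0, noise=np.sqrt(2.0)):
    """Euler scheme for dX = V dt, dV = -(damping V + X) dt + noise dW."""
    position = np.zeros(n_paths)
    velocity = np.zeros(n_paths)
    for _ in range(n_steps):
        kick = noise * np.sqrt(dt) * rng.standard_normal(n_paths)
        position, velocity = (position + velocity * dt,
                              velocity - (damping * velocity + position) * dt + kick)
    return position, velocity

def kernel_density(samples, points, bandwidth):
    """Gaussian kernel density estimate evaluated at the given points."""
    z = (points[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(5)
position, velocity = simulate_langevin(n_paths=5000, n_steps=1500, dt=0.01, rng=rng)
density = kernel_density(position, np.array([0.0, 1.0]), bandwidth=0.2)
```

The estimate at 0 can be compared with the exact value 1/√(2π) ≈ 0.399; the choice of bandwidth drives the usual bias-variance trade-off, which is what adaptive procedures aim to automate.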

In 50, we focused on damping Hamiltonian systems under the so-called fluctuation-dissipation condition. Ideas from that paper were re-used with applications to neuroscience in 64.

Note that Professor Jose R. Leon (Caracas, Venezuela, Montevideo, Uruguay) was funded by an international Inria Chair, allowing us to collaborate further on parameter estimation.

We recently proposed a paper on the use of the Euler scheme for inference purposes, considering reflected diffusions. This paper could be extended to the hypoelliptic framework.

We also have a collaboration with Karine Bertin (Valparaiso, Chile), Nicolas Klutchnikoff (Université Rennes) and Jose R. León (Montevideo, Uruguay) funded by a MATHAMSUD project (2016-2017) and by the LIA/CNRS (2018). We are interested in new adaptive estimators for invariant densities on bounded domains 3, and would like to extend these results to hypoelliptic diffusions.

Many physical phenomena are modelled numerically in order to better understand and/or to predict their behaviour. However, some complex and small-scale phenomena cannot be fully represented in the models. The introduction of ad-hoc correcting terms can represent these unresolved processes, but they need to be properly estimated.

A good example of this type of problem is the estimation of bottom friction parameters of the ocean floor. This is important because bottom friction affects the general circulation. This is particularly the case in coastal areas, especially through its influence on wave breaking. Because of its strong spatial variability, it is impossible to estimate the bottom friction by direct observation, so it has to be estimated indirectly by observing its effects on surface movement. This task is further complicated by the presence of uncertainty in certain other characteristics linking the bottom and the surface (e.g. boundary conditions). The techniques currently used to adjust these settings are very basic and do not take these uncertainties into account, thereby increasing the error in this estimate.

Classical methods of parameter estimation usually imply the minimisation of an objective function, that measures the error between some observations and the results obtained by a numerical model. In the presence of uncertainties, the minimisation is not straightforward, as the output of the model depends on those uncontrolled inputs and on the control parameter as well. That is why we will aim at minimising the objective function, to get an estimation of the control parameter that is robust to the uncertainties.

The definition of robustness differs depending on the context in which it is used. In this work, two different notions of robustness are considered: robustness by minimising the mean and variance, and robustness based on the distribution of the minimisers of the function. Using this information on the location of the minimisers is not a novel idea, as it has been applied as a criterion in sequential Bayesian optimisation. However, the constraint of optimality is here relaxed to define a new estimate. To evaluate this estimation, a toy model of a coastal area has been implemented. The control parameter is the bottom friction, upon which classical methods of estimation are applied in a simulation-estimation experiment. The model is then modified to include uncertainties on the boundary conditions in order to apply robust control methods. This has been published in 26.
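
The two notions of robustness can be illustrated on a deliberately simple case. The sketch below uses a hypothetical scalar misfit J(k, u) = (k(1 + u) - 1)^2, where k plays the role of the friction parameter and u that of an uncertain boundary condition, uniformly distributed on [-0.5, 0.5] and represented here by a regular grid. It computes the minimiser of the mean, the minimiser of mean plus variance, and the set of conditional minimisers k*(u). The objective, the grids and the variable names are assumptions made for the example and do not reproduce the coastal toy model of 26.

```python
import numpy as np

def misfit(friction, boundary):
    """Hypothetical objective: squared gap between model output and observation 1."""
    return (friction * (1.0 + boundary) - 1.0) ** 2

friction_grid = np.linspace(0.0, 2.0, 401)
boundary_values = np.linspace(-0.5, 0.5, 201)      # quadrature grid for uniform u
table = misfit(friction_grid[:, None], boundary_values[None, :])

# Robustness through the moments of the objective.
mean_minimiser = friction_grid[np.argmin(table.mean(axis=1))]
mean_variance_minimiser = friction_grid[np.argmin(table.mean(axis=1) + table.var(axis=1))]

# Robustness through the distribution of the minimisers: one minimiser per value of u.
conditional_minimisers = friction_grid[np.argmin(table, axis=0)]
```

In this example the nominal estimate (u = 0) is k = 1, whereas the minimiser of the mean is close to 12/13 and the conditional minimisers spread over roughly [2/3, 2], which shows how much the uncertain input can move the estimate.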

In 72, we are considering the modeling of precipitation amount with semi-parametric models, modeling both the bulk of the distribution and the tails, but avoiding the arbitrary choice of a threshold. We work in collaboration with Anne-Catherine Favre (LGGE-Lab in Grenoble) and Philippe Naveau (LSCE, Paris).

In the context of Philomène Le Gall’s PhD thesis, we are applying the aforementioned modeling of extreme precipitation with the aim of regionalizing extreme precipitation.

SAMO board is in charge of the organization of the SAMO (sensitivity analysis of model outputs) conferences, every three years. It is strongly supported by the Joint Research Center of the European Commission.

In 2019, Clémentine Prieur, who is part of this board, was also co-chair of a satellite event on the future of sensitivity analysis. A position paper 23 has been published, as a synthesis of the discussions held in Barcelona (autumn 2019).