• The Inria's Research Teams produce an annual Activity Report presenting their activities and their results of the year. These reports include the team members, the scientific program, the software developed by the team and the new results of the year. The report also describes the grants, contracts and the activities of dissemination and teaching. Finally, the report gives the list of publications of the year.

• Legal notice
• Personal data

#### ALPINES

##### ALPINES - 2021

2021
Activity report
Project-Team
ALPINES
RNSR: 201321071B
Research center
In partnership with:
CNRS, Sorbonne Université (UPMC)
Team name:
Algorithms and parallel tools for integrated numerical simulations
In collaboration with:
Laboratoire Jacques-Louis Lions (LJLL)
Domain
Networks, Systems and Services, Distributed Computing
Theme
Distributed and High Performance Computing
Creation of the Project-Team: 2014 July 01

# Keywords

• A6.1.1. Continuous Modeling (PDE, ODE)
• A6.1.4. Multiscale modeling
• A6.1.5. Multiphysics modeling
• A6.2.1. Numerical analysis of PDE and ODE
• A6.2.5. Numerical Linear Algebra
• A6.2.7. High performance computing
• A6.3. Computation-data interaction
• A6.3.1. Inverse problems
• A7.1. Algorithms
• B3.3.1. Earth and subsoil
• B9.5.2. Mathematics
• B9.5.3. Physics

# 1 Team members, visitors, external collaborators

## Research Scientists

• Laura Grigori [Team leader, Inria, Senior Researcher, HDR]
• Sever Hirstoaga [Inria, Researcher, HDR]
• Frédéric Nataf [CNRS, Senior Researcher, HDR]
• Pierre-Henri Tournier [CNRS, Researcher]

## Faculty Members

• Xavier Claeys [Sorbonne Université, Associate Professor]
• Frédéric Hecht [Sorbonne Université, Professor, HDR]

## Post-Doctoral Fellows

• Oleg Balabanov [Inria]
• Suraj Kumar [Inria]

## PhD Students

• Siwar Badreddine [Inria]
• Matthias Beaupere [Inria]
• Igor Chollet [Inria, until Mar 2021]
• Thibault Cimic [Inria, until Sep 2021]
• David Frenkiel [Inria, until Apr 2021]
• Van Thanh Nguyen [Inria]
• Edouard Timsit [Inria, from Sep 2021]

## Technical Staff

• Victor Lederer [Inria, Engineer]

## Interns and Apprentices

• Paule Grangette [Inria, Jul 2021]
• Edouard Timsit [Inria, from Apr 2021 until Aug 2021]

• Laurence Bourcier [Inria]

# 2 Overall objectives

## 2.1 Introduction

The focus of our research is on the development of novel parallel numerical algorithms and tools appropriate for state-of-the-art mathematical models used in complex scientific applications, and in particular numerical simulations. The proposed research program is by nature multi-disciplinary, interweaving aspects of applied mathematics, computer science, as well as those of several specific applications, as porous media flows, elasticity, wave propagation in multi-scale media, molecular simulations, and inverse problems.

Our first objective is to develop numerical methods and tools for complex scientific and industrial applications that will enhance their scalable execution on the emergent heterogeneous hierarchical models of massively parallel machines. Our second objective is to integrate the novel numerical algorithms into a middle-layer that will hide as much as possible the complexity of massively parallel machines from the users of these machines.

# 3 Research program

## 3.1 Overview

The research described here is directly relevant to several steps of the numerical simulation chain. Given a numerical simulation that was expressed as a set of differential equations, our research focuses on mesh generation methods for parallel computation, novel numerical algorithms for linear algebra, as well as algorithms and tools for their efficient and scalable implementation on high performance computers. The validation and the exploitation of the results is performed with collaborators from applications and is based on the usage of existing tools. In summary, the topics studied in our group are the following:

• Numerical methods and algorithms
• Mesh generation for parallel computation
• Solvers for numerical linear algebra:

domain decomposition methods, preconditioning for iterative methods

• Computational kernels for numerical linear algebra
• Tensor computations for high dimensional problems
• Validation on numerical simulations and other numerical applications

## 3.2 Domain specific language - parallel FreeFem++

In the engineering, researchers, and teachers communities, there is a strong demand for simulation frameworks that are simple to install and use, efficient, sustainable, and that solve efficiently and accurately complex problems for which there are no dedicated tools or codes available. In our group we develop FreeFem++ (see https://­www.­freefem.­org/), a user-dedicated language for solving PDEs. The goal of FreeFem++ is not to be a substitute for complex numerical codes, but rather to provide an efficient and relatively generic tool for:

• getting a quick answer to a specific problem,
• prototyping the resolution of a new complex problem.

The current users of FreeFem++ are mathematicians, engineers, university professors, and students. In general for these users the installation of public libraries as MPI, MUMPS, Ipopt, Blas, lapack, OpenGL, fftw, scotch, PETSc, SLEPc is a very difficult problem. For this reason, the authors of FreeFem++ have created a user friendly language, and over years have enriched its capabilities and provided tools for compiling FreeFem++ such that the users do not need to have special knowledge of computer science. This leads to an important work on porting the software on different emerging architectures.

Today, the main components of parallel FreeFem++ are:

• definition of a coarse grid,
• splitting of the coarse grid,
• mesh generation of all subdomains of the coarse grid, and construction of parallel data structures for vectors and sparse matrices from the mesh of the subdomain,
• discretization by a chosen numerical method,
• call to a linear solver,
• analysis of the result.

All these components are parallel, except for the last point which is not in the focus of our research. However for the moment, the parallel mesh generation algorithm is very simple and not sufficient, for example it addresses only polygonal geometries. Having a better parallel mesh generation algorithm is one of the goals of our project. In addition, in the current version of FreeFem++, the parallelism is not hidden from the user, it is done through direct calls to MPI. Our goal is also to hide all the MPI calls in the specific language part of FreeFem++. In addition to these in-house domain decomposition methods, FreeFem++ is also linked to PETSc solvers which enables an easy use of third parties parallel multigrid methods.

## 3.3 Solvers for numerical linear algebra

Iterative methods are widely used in industrial applications, and preconditioning is the most important research subject here. Our research considers domain decomposition methods and iterative methods and its goal is to develop solvers that are suitable for parallelism and that exploit the fact that the matrices are arising from the discretization of a system of PDEs on unstructured grids.

One of the main challenges that we address is the lack of robustness and scalability of existing methods as incomplete LU factorizations or Schwarz-based approaches, for which the number of iterations increases significantly with the problem size or with the number of processors. This is often due to the presence of several low frequency modes that hinder the convergence of the iterative method. To address this problem, we study different approaches for dealing with the low frequency modes as coarse space correction in domain decomposition or deflation techniques.

We also focus on developing boundary integral equation methods that would be adapted to the simulation of wave propagation in complex physical situations, and that would lend themselves to the use of parallel architectures. The final objective is to bring the state of the art on boundary integral equations closer to contemporary industrial needs. From this perspective, we investigate domain decomposition strategies in conjunction with boundary element method as well as acceleration techniques (H-matrices, FMM and the like) that would appear relevant in multi-material and/or multi-domain configurations. Our work on this topic also includes numerical implementation on large scale problems, which appears as a challenge due to the peculiarities of boundary integral equations.

## 3.4 Computational kernels for numerical linear and multilinear algebra

The design of new numerical methods that are robust and that have well proven convergence properties is one of the challenges addressed in Alpines. Another important challenge is the design of parallel algorithms for the novel numerical methods and the underlying building blocks from numerical linear algebra. The goal is to enable their efficient execution on a diverse set of node architectures and their scaling to emerging high-performance clusters with an increasing number of nodes.

Increased communication cost is one of the main challenges in high performance computing that we address in our research by investigating algorithms that minimize communication, as communication avoiding algorithms. The communication avoiding algorithmic design is an approach originally developed in our group since more than ten years (initially in collaboration with researchers from UC Berkeley and CU Denver). While our first results concerned direct methods of factorization as LU or QR factorizations, our recent work focuses on designing robust algorithms for computing the low rank approximation of a matrix or a tensor. We focus on both deterministic and randomized approaches.

Our research also focuses on solving problems of large size that feature high dimensions as in molecular simulations. The data in this case is represented by objects called tensors, or multilinear arrays. The goal is to design novel tensor techniques to allow their effective compression, i.e. their representation by simpler objects in small dimensions, while controlling the loss of information. The algorithms are aiming to being highly parallel to allow to deal with the large number of dimensions and large data sets, while preserving the required information for obtaining the solution of the problem.

# 4 Application domains

## 4.1 Compositional multiphase Darcy flow in heterogeneous porous media

We study the simulation of compositional multiphase flow in porous media with different types of applications, and we focus in particular on reservoir/bassin modeling, and geological CO2 underground storage. All these simulations are linearized using Newton approach, and at each time step and each Newton step, a linear system needs to be solved, which is the most expensive part of the simulation. This application leads to some of the difficult problems to be solved by iterative methods. This is because the linear systems arising in multiphase porous media flow simulations cumulate many difficulties. These systems are non-symmetric, involve several unknowns of different nature per grid cell, display strong or very strong heterogeneities and anisotropies, and change during the simulation. Many researchers focus on these simulations, and many innovative techniques for solving linear systems have been introduced while studying these simulations, as for example the nested factorization [Appleyard and Cheshire, 1983, SPE Symposium on Reservoir Simulation].

## 4.2 Inverse problems

We focus on methods related to the blend of time reversal techniques and absorbing boundary conditions (ABC) used in a non standard way. Since the seminal paper by [M. Fink et al., Imaging through inhomogeneous media using time reversal mirrors. Ultrasonic Imaging, 13(2):199, 1991.], time reversal is a subject of very active research. The principle is to back-propagate signals to the sources that emitted them. The initial experiment was to refocus, very precisely, a recorded signal after passing through a barrier consisting of randomly distributed metal rods. In [de Rosny and Fink. Overcoming the diffraction limit in wave physics using a time-reversal mirror and a novel acoustic sink. Phys. Rev. Lett., 89 (12), 2002], the source that created the signal is time reversed in order to have a perfect time reversal experiment. In 31, we improve this result from a numerical point of view by showing that it can be done numerically without knowing the source. This is done at the expense of not being able to recover the signal in the vicinity of the source. In 32, time dependent wave splitting is performed using ABC and time reversal techniques. We now work on extending these methods to non uniform media.

All our numerical simulations are performed in FreeFem++ which is very flexible. As a byproduct, it enables us to have an end user point of view with respect to FreeFem++ which is very useful for improving it.

## 4.3 Numerical methods for wave propagation in multi-scale media

We are interested in the development of fast numerical methods for the simulation of electromagnetic waves in multi-scale situations where the geometry of the medium of propagation may be described through caracteristic lengths that are, in some places, much smaller than the average wavelength. In this context, we propose to develop numerical algorithms that rely on simplified models obtained by means of asymptotic analysis applied to the problem under consideration.

Here we focus on situations involving boundary layers and localized singular perturbation problems where wave propagation takes place in media whose geometry or material caracteristics are submitted to a small scale perturbation localized around a point, or a surface, or a line, but not distributed over a volumic sub-region of the propagation medium. Although a huge literature is already available for the study of localized singular perturbations and boundary layer pheneomena, very few works have proposed efficient numerical methods that rely on asymptotic modeling. This is due to their functional framework that naturally involves singular functions, which are difficult to handle numerically. The aim of this part of our reasearch is to develop and analyze numerical methods for singular perturbation methods that are prone to high order numerical approximation, and robust with respect to the small parameter characterizing the singular perturbation.

## 4.4 Data analysis in astrophysics

We focus on computationally intensive numerical algorithms arising in the data analysis of current and forthcoming Cosmic Microwave Background (CMB) experiments in astrophysics. This application is studied in collaboration with researchers from University Paris Diderot, and the objective is to make available the algorithms to the astrophysics community, so that they can be used in large experiments.

In CMB data analysis, astrophysicists produce and analyze multi-frequency 2D images of the universe when it was 5% of its current age. The new generation of the CMB experiments observes the sky with thousands of detectors over many years, producing overwhelmingly large and complex data sets, which nearly double every year therefore following Moore's Law. Planck (http://­planck.­esa.­int/) is a keystone satellite mission which has been developed under auspices of the European Space Agency (ESA). Planck has been surveying the sky since 2010, produces terabytes of data and requires 100 Petaflops per image analysis of the universe. It is predicted that future experiments will collect on the order of petabyte of data around 2025. This shows that data analysis in this area, as many other applications, will keep pushing the limit of available supercomputing power for the years to come.

## 4.5 Molecular simulations

Molecular simulation is one of the most dynamic areas of scientific computing. Its field of application is very broad, ranging from theoretical chemistry and drug design to materials science and nanotechnology. It provides many challenging problems to mathematicians, and computer scientists.

In the context of the ERC Synergy Grant EMC2 we address several important limitations of state of the art molecular simulation. In particular, the simulation of very large molecular systems, or smaller systems in which electrons interact strongly with each other, remains out of reach today. In an interdisciplinary collaboration between chemists, mathematicians and computer scientists, we focus on developing a new generation of reliable molecular simulation algorithms and software.

# 5 Highlights of the year

• EUMaster4HPC
project funded by EuroHPC on an european Master programme in HPC. Sorbonne University is one of the 8 awarding universities and in charge of several tasks aiming at defining the curriculum of this Master. L. Grigori is the coordinator for Sorbonne.
• An analysis of the submitted proposals to the Prace Fast Track Call for Proposals in support to mitigate impact of Covid 19 (opened in March 2020) has appeared in 20. L. Grigori was one of the five members of the Scientific Committee.

# 6 New software and platforms

• defmm
Website defmm Self-assessment:
• Software Family: research: Software as a Vector for Knowledge
• Audience: partners: to be used by people inside and outside the team but without a clear and strong dissemination and support action plan;
• Duration of the Development: 2 years.
• Free description: defmm is a C++ library implementing a variant of black-box FMM based on a directional partionning and relying on polynomial interpolation on equispaced grids and Fast Fourier Transform for M2L and M2M/L2L operators. This library was specially designed for the treatment of matrices stemming from the discretization of boundary integral formulations of Helmholtz equation.

## 6.1 New software

### 6.1.1 FreeFem++

• Name:
FeeFrem++
• Scientific Description:

FreeFem++ is a partial differential equation solver. It has its own language. freefem scripts can solve multiphysics non linear systems in 2D and 3D.

Problems involving PDE (2d, 3d) from several branches of physics such as fluid-structure interactions require interpolations of data on several meshes and their manipulation within one program. FreeFem++ includes a fast 2d̂-tree-based interpolation algorithm and a language for the manipulation of data on multiple meshes (as a follow up of bamg (now a part of FreeFem++ ).

FreeFem++ is written in C++ and the FreeFem++ language is a C++ idiom. It runs on Macs, Windows, Unix machines. FreeFem++ replaces the older freefem and freefem+.

• Functional Description:
FreeFem++ is a PDE (partial differential equation) solver based on a flexible language that allows a large number of problems to be expressed (elasticity, fluids, etc) with different finite element approximations on different meshes.
• URL:
• Contact:
Frédéric Hecht
• Partner:
UPMC

### 6.1.2 HPDDM

• Scientific Description:
HPDDM is an efficient implementation of various domain decomposition methods (DDM) such as one- and two-level Restricted Additive Schwarz methods, the Finite Element Tearing and Interconnecting (FETI) method, and the Balancing Domain Decomposition (BDD) method. This code has been proven to be efficient for solving various elliptic problems such as scalar diffusion equations, the system of linear elasticity, but also frequency domain problems like the Helmholtz equation. A comparison with modern multigrid methods can be found in the thesis of Pierre Jolivet.
• Functional Description:
HPDDM is an efficient implementation of various domain decomposition methods (DDM) such as one- and two-level Restricted Additive Schwarz methods, the Finite Element Tearing and Interconnecting (FETI) method, and the Balancing Domain Decomposition (BDD) method.
• URL:
• Contact:
Pierre Jolivet
• Participants:
Frédéric Nataf, Pierre Jolivet

### 6.1.3 preAlps

• Name:
Preconditioned enlarged Krylov subspace methods
• Keywords:
Linear Systems Solver, Preconditioner
• Functional Description:
Contains enlarged Conjugate Gradient Krylov subspace method and Lorasc preconditioner.
• URL:
• Contact:
Laura Grigori

### 6.1.4 HTool

• Keyword:
Hierarchical matrices
• Functional Description:
HTool is a C++ header-only library implementing compression techniques (e.g. Adaptive Cross Approximation) using hierarchical matrices (H-matrices). The library uses MPI and OpenMP for parallelism, and is interfaced with HPDDM for the solution of linear systems.
• URL:
• Contact:
Pierre Marchand

### 6.1.5 BemTool

• Keyword:
Boundary element method
• Functional Description:
BemTool is a C++ header-only library implementing the boundary element method (BEM) for the discretisation of the Laplace, Helmholtz and Maxwell equations, in 2D and 3D. Its main purpose is the assembly of classical boundary element matrices, which can be compressed and inverted through its interface with the HTool library.
• URL:
• Contact:
Xavier Claeys

# 7 New results

## 7.1 Randomized block Gram-Schmidt process for solution of linear systems and eigenvalue problems

In 23 a randomized Gram-Schmidt algorithm is developed for orthonormalization of high-dimensional vectors or QR factorization. The proposed process can be less computationally expensive than the classical Gram-Schmidt process while being at least as numerically stable as the modified Gram-Schmidt process. Our approach is based on random sketching, which is a dimension reduction technique consisting in estimation of inner products of high-dimensional vectors by inner products of their small efficiently-computable random projections, so-called sketches. This allows to perform the projection step in Gram-Schmidt process on sketches rather than high-dimensional vectors with a minor computational cost. This also provides an ability to efficiently certify the output. The proposed Gram-Schmidt algorithm can provide computational cost reduction in any architecture. The benefit of random sketching can be amplified by exploiting multi-precision arithmetic. We provide stability analysis for multi-precision model with coarse unit roundoff for standard high-dimensional operations. Numerical stability is proven for the unit roundoff independent of the (high) dimension of the problem. The proposed Gram-Schmidt process can be applied to Arnoldi iteration and result in new Krylov subspace methods for solving high-dimensional systems of equations or eigenvalue problems. Among them we chose randomized GMRES method as a practical application of the methodology.

In 24 we propose a block version of the randomized Gram-Schmidt process for computing a QR factorization of a matrix. Our algorithm inherits the major properties of its single-vector analogue from 23 such as higher efficiency than the classical Gram-Schmidt algorithm and stability of the modified Gram-Schmidt algorithm, which can be refined even further by using multi-precision arithmetic. As in [Balabanov and Grigori, 2020, our algorithm has an advantage of performing standard high-dimensional operations, that define the overall computational cost, with a unit roundoff independent of the dominant dimension of the matrix. This unique feature makes the methodology especially useful for large-scale problems computed on low-precision arithmetic architectures. Block algorithms are advantageous in terms of performance as they are mainly based on cache-friendly matrix-wise operations, and can reduce communication cost in high-performance computing. The block Gram-Schmidt orthogonal- ization is the key element in the block Arnoldi procedure for the construction of Krylov basis, which in its turn is used in GMRES and Rayleigh-Ritz methods for the solution of linear systems and clustered eigenvalue problems. In this article, we develop randomized versions of these methods, based on the proposed randomized Gram-Schmidt algorithm, and validate them on nontrivial numerical examples.

## 7.2 Matrix and tensor compression

To address problems arising in computational chemistry, we focus on computing low rank tensor approximation. In 25 we introduce a parallel algorithm for computing the low rank approximation of a large matrix which minimizes the number of messages exchanged between processors (modulo polylogarithmic factors) and has guarantees for the approximations of the singular values of A provided by the approximation. This operation is essential in many applications in scientific computing and data analysis when dealing with large data sets. Our algorithm is based on QR factorization that consists in selecting a subset of columns from the matrix A that allow to approximate the range of A, and then projecting the columns of A on a basis of the subspace spanned by those columns. The selection of columns is performed by using tournament pivoting, a strategy introduced previously for matrices partitioned into blocks of columns. This strategy is extended here to matrices partitioned along both dimensions that are distributed on a two-dimensional grid of processors, and also to tournaments with more general reduction trees. Performance results show that the algorithm scales well on up to 1024 cores of 16 nodes.

In 28 we consider the problem of developing parallel decomposition and approximation algorithms for high dimensional tensors. We focus on a tensor representation named Tensor Train (TT). It stores a d-dimensional tensor in O(nrd), much less than the entries in the original tensor, where $r$ is usually a very small number and depends on the application. Sequential algorithms to compute TT decomposition and TT approximation of a tensor have been proposed in the literature. Here we propose a parallel algorithm to compute TT decomposition of a tensor. We prove that the ranks of TT-representation produced by our algorithm are bounded by the ranks of unfolding matrices of the tensor. Additionally, we propose a parallel algorithm to compute approximation of a tensor in TT-representation. Our algorithm relies on a hierarchical partitioning of the dimensions of the tensor in a balanced binary tree shape and transmission of leading singular values of associated unfolding matrix from the parent to its children. We consider several approaches on the basis of how leading singular values are transmitted in the tree. We present an in-depth experimental analysis of our approaches for different low rank tensors and also assess them for tensors obtained from quantum chemistry simulations. Our results show that the approach which transmits leading singular values to both of its children performs better in practice. Compression ratios and accuracies of the approximations obtained by our approaches are comparable with the sequential algorithm and, in some cases, even better than that.

In 18 a numerical method is proposed to compress a tensor by constructing a piece-wise tensor approximation. This is defined by partitioning a tensor into sub-tensors and by computing a low-rank tensor approximation (in a given format) in each sub-tensor. Neither the partition nor the ranks are fixed a priori, but, instead, are obtained in order to fulfill a prescribed accuracy and optimize, to some extent, the storage. The different steps of the method are detailed and some numerical experiments are proposed to assess its performances.

## 7.3 A posteriori-based adaptive preconditioners

In 15 we explore adaptive preconditioners that integrate a posteriori error estimators. In particular we introduce an adaptive preconditioner for iterative solution of sparse linear systems arising from partial differential equations with self-adjoint operators. This preconditioner allows to control the growth rate of a dominant part of the algebraic error within a fixed point iteration scheme. Several numerical results that illustrate the efficiency of this adaptive preconditioner with a PCG solver are presented and the preconditioner is also compared with a previous variant in the literature.

## 7.4 Robust multilevel Schwarz preconditioner

In 14 we present a multilevel preconditioner based on overlapping Schwarz methods for symmetric positive definite (SPD) matrices. Robust two-level Schwarz preconditioners exist in the literature to guarantee fast convergence of Krylov methods. As long as the dimension of the coarse space is reasonable, that is, exact solvers can be used efficiently, two-level methods scale well on parallel architectures. However, the factorization of the coarse space matrix may become costly at scale. An alternative is then to use an iterative method on the second level, combined with an algebraic preconditioner, such as a one-level additive Schwarz preconditioner. Nevertheless, the condition number of the resulting preconditioned coarse space matrix may still be large. One of the difficulties of using more advanced methods, like algebraic multigrid or even two-level overlapping Schwarz methods, to solve the coarse problem is that the matrix does not arise from a partial differential equation (PDE) anymore. We introduce in this paper a robust multilevel additive Schwarz preconditioner where at each level the condition number is bounded, ensuring a fast convergence for each nested solver. Furthermore, our construction does not require any additional information than for building a two-level method and may thus be seen as an algebraic extension.

## 7.5 Recycling Krylov subspaces and reducing deflation subspaces for solving sequence of linear systems

In 13 we present deflation strategies related to recycling Krylov subspace methods for solving one or a sequence of linear systems of equations. Besides well-known strategies of deflation, Ritz-, and harmonic Ritz-based deflation, we introduce an Singular Value Decomposition based deflation technique. We consider the recycling in two contexts: recycling the Krylov subspace between the restart cycles and recycling a deflation subspace when the matrix changes in a sequence of linear systems. Numerical experiments on real-life reservoir simulation demonstrate the impact of our proposed strategy.

## 7.6 Adaptive Domain Decomposition method for Saddle Point problem

In 11, we introduce an adaptive domain decomposition (DD) method for solving saddle point problems defined as a block two by two matrix. The algorithm does not require any knowledge of the constrained space. We assume that all sub matrices are sparse and that the diagonal blocks are the sum of positive semi definite matrices. The latter assumption enables the design of adaptive coarse space for DD methods. Numerical results on three dimensional elasticity problems on steel-rubber structures discretized with 1 billion degrees of freedom are shown.

## 7.7 Analysis of the SORAS domain decomposition preconditioner for non-self-adjoint or indefinite problems

In 16, we analyze the convergence of the one-level overlapping domain decomposition preconditioner SORAS (Symmetrized Optimized Restricted Additive Schwarz) applied to a generic linear system whose matrix is not necessarily symmetric/self-adjoint nor positive definite. By generalizing the theory for the Helmholtz equation developed in [I.G. Graham, E.A. Spence, and J. Zou, SIAM J. Numer. Anal., 2020], we identify a list of assumptions and estimates that are sufficient to obtain an upper bound on the norm of the preconditioned matrix, and a lower bound on the distance of its field of values from the origin. We stress that our theory is general in the sense that it is not specific to one particular boundary value problem. As an illustration of this framework, we prove new estimates for overlapping domain decomposition methods with Robin-type transmission conditions for the heterogeneous reaction-convection-diffusion equation.

## 7.8 An overlapping splitting double sweep method for the Helmholtz equation

In 26, we consider the domain decomposition method approach to solve the Helmholtz equation. Double sweep based approaches for overlapping decompositions are presented. In particular, we introduce an overlapping splitting double sweep (OSDS) method valid for any type of interface boundary conditions. Despite the fact that first order interface boundary conditions are used, the OSDS method demonstrates good stability properties with respect to the number of subdomains and the frequency even for heterogeneous media. In this context, convergence is improved when compared to the double sweep methods in [Nataf,1997] and [Vion et al., 2014 and 2016] for all of our test cases: waveguide, open cavity, and wedge problems.

## 7.9 Domain decomposition preconditioning for high frequency wave propagation problems

In the context of seismic imaging, frequency-domain full-waveform inversion (FWI) is suitable for long-offset stationary-recording acquisition, since reliable subsurface models can be reconstructed with a few frequencies and attenuation is easily implemented without computational overhead. In the frequency domain, wave modeling is a Helmholtz-type boundary-value problem which requires to solve a large and sparse system of linear equations per frequency with multiple right-hand sides (sources). This system can be solved with direct or iterative methods. While the former are suitable for FWI application on 3D dense OBC acquisitions covering spatial domains of moderate size, the later should be the approach of choice for sparse node acquisitions covering large domains (more than 50 millions of unknowns). Fast convergence of iterative solvers for Helmholtz problems remains however challenging in high frequency regime, due to the non definiteness of the Helmholtz operator and also to the discretization constraints in order to minimize the dispersion error for a given frequency, hence requiring efficient preconditioners.

For such wave propagation problems, we continued development of ORAS Schwarz domain decomposition preconditioners for finite difference and finite element discretizations for different 3D seismic benchmarks. We also studied the influence of the choice of the local solver on the convergence and computing time, in particular for problems with multiple right-hand sides using pseudo-block methods. For example, Table 1 gathers results comparing PARDISO, MUMPS and the Block Low-Rank (BLR) version of MUMPS as the local solver, for an acoustic simulation on a 20 km $×$ 102 km $×$ 28.4 km subset of the 3D GO_3D_OBS crustal geomodel, which has been designed to assess seismic imaging techniques for deep crustal exploration. The table reports computing times on 1344 cores for double and single precision arithmetics for one and 130 right-hand sides. The problem is discretized using finite differences with a wavelength-adaptive 27-point stencil, resulting in 67.5 million unknowns. We can see that using the Block Low-Rank (BLR) version of MUMPS as an approximate local solver gives the best computing time, even though 55 iterations are needed to reach convergence instead of 50 when using exact local solvers. Table 2 gathers weak and strong scaling results on the 3D GO_3D_OBS crustal geomodel with finite differences up to 10 Hz frequency, where the linear system of 1160.6 million unknowns is solved in 85.9 seconds on 9856 cores. The paper, which assesses the accuracy of finite-element and finite-difference discretizations for several seismic acoustic benchmarks as well as the efficiency of ORAS Schwarz domain decomposition preconditioners for the solution of the linear systems, is under review.

 1 RHS 130 RHS Solver arith. ortho. ${T}_{f}\left(s\right)$ #it ${T}_{s}\left(s\right)$ ${T}_{tot}\left(s\right)$ #it ${T}_{s}\left(s\right)$ ${T}_{tot}\left(s\right)$ PARDISO double MGS 12.1 48 16.5 28.6 50 724.4 736.5 PARDISO double CGS 12.1 48 16.7 28.8 50 575.5 587.6 MUMPS${}_{FR}$ double MGS 11.3 48 17.2 28.5 50 654.9 666.2 MUMPS${}_{FR}$ double CGS 11.3 48 16.9 28.2 50 482.2 493.5 PARDISO single MGS 6.5 48 9.9 16.4 50 403.6 410.1 PARDISO single MGS 6.5 48 9.6 16.1 50 334.8 341.3 MUMPS${}_{FR}$ single MGS 6.2 48 8.9 15.1 50 345.8 352.0 MUMPS${}_{FR}$ single CGS 6.2 48 8.9 15.1 50 275.3 281.5 MUMPS${}_{BLR}$ single CGS 4.9 53 7.2 12.1 55 256.8 261.7
 f(Hz) #dof(${10}^{6}$) #cores #it ovl ${T}_{f}\left(s\right)$ ${T}_{s}\left(s\right)$ ${T}_{tot}\left(s\right)$ ${E}_{w}$ 2.5 21.4 60 26 3 52.5 24.8 77.3 1 2.5 21.4 168 30 3 13.7 8.9 22.6 1.222 3.75 67.5 360 40 3 21.9 18.6 40.5 1.003 3.75 67.5 660 45 3 10.8 11.5 22.3 0.994 3.75 67.5 1344 53 3 4.4 7.1 11.5 0.947 3.75 67.5 2380 59 3 2.5 4.8 7.3 0.842 5 153.5 875 61 3 20.9 26.0 46.9 0.811 5 153.5 1344 63 3 12.4 18.6 31.0 0.798 7.5 500.1 2450 98 4 26.7 60.8 87.5 0.506 7.5 500.1 4752 108 4 12.2 39.7 51.9 0.439 10 1160.6 9856 142 5 15.1 70.8 85.9 0.297

## 7.10 Numerical resolution of the inverse source problem in electroencephalography

In this work, we consider a new numerical method for the electroencephalography(EEG) inverse source problem. The novelty is to take into account the heterogeneity of the skull, as the electric conductivity of the skull can hardly be assumed to be constant: (i) hard and spongy bones, especially for neonates (fontanels), (ii) conductivity models vary between individuals, and through time for a single person. The first step of the source reconstruction method is the "cortical mapping" procedure. It deals with the numerical resolution of both data completion and transmission problems for elliptic equations: (i) data completion on the scalp knowing partial measurements of the electric potential (from electrode measurements), (ii)data transmission from the scalp to the brain surface. To this end, we propose to apply the quasi-reversibility method which provides a regularized solution of Cauchy problems in a bounded domain. The problem is solved using the finite element method in the scalp and skull layers. The second step consists in using an efficient analytic method to retrieve the source (position and moment) from the Cauchy data on the brain surface, which is assumed to be spherical with constant conductivity. The first results are encouraging: For the classical three-layer spherical model (brain, skull, scalp), using synthetic data from 257 electrodes, we recover the source with 3.6% position error and moment correlation of 1. With 5% and 20% white Gaussian noise, the position error is 5.5% and 8.4% respectively, with moment correlation of 1.

## 7.11 Full-waveform redatuming via a TRAC approach

In 22, we consider the problem of redatuming. In inverse problems, redatuming data consists in virtually moving the sensors from the original acquisition location to an arbitrary position. This is an essential tool for target oriented inversion. An exact redatuming method which has the peculiarity to be robust with respect to noise is proposed. Our iterative method is based on the Time Reversal Absorbing Conditions (TRAC) approach and avoids the need for a regularization strategy. Numerical results and comparisons with other redatuming approaches illustrate the robustness of our method.

## 7.12 Parareal simulations of highly oscillatory differential equations

In 19 we proposed a new strategy for solving by the parareal algorithm highly oscillatory ordinary differential equations which are characteristics of a six-dimensional Vlasov equation. This consists in using for the coarse solving reduced models, obtained from two-scale asymptotic expansions. Such reduced models have a low computational cost since they are free of high oscillations. The parareal method allows to improve their accuracy in a few iterations through corrections by fine solvers of the full model. We demonstrate the accuracy and the efficiency of the strategy in numerical experiments of short time and long time simulations of charged particles submitted to a large magnetic field.

Then, we considered the case of Vlasov-Poisson systems involving several time scales. In this work, we propose a new reduced model derived from the two-scale convergence theory that we use in the parareal framework for the coarse solving. The models are numerically solved by particle in cell methods. We perform numerical experiments of long time simulations of a charged beam in a focusing channel and under the influence of a rapidly oscillating external electric field. On the basis of computing times, we provide an analysis of the efficiency of the parareal algorithm in terms of speedup.

## 7.13 Non-self adjoint impedance in Generalized Optimized Schwarz Methods

In this work we derived a full convergence theory for Optimized Schwarz Methods that rely on a non-local exchange operator and cover the case of coercive possibly non-self-adjoint impedance operators. This analysis also naturally deals with the presence of cross-points in subdomain partitions of arbitrary shape. In the particular case of self-adjoint impedance, we recover the theory that we previously proposed in [Claeys & Parolin, 2021].

## 7.14 Matrix form of nonlocal OSM for electromagnetics

We revisited the derivation of the domain decomposition introduced in [Claeys&Parolin,2020] in the acoustic setting, adopting matrix based notations and remaining as explicit as possible. In addition, we extended the analysis to cover Maxwell's equations and considered the case where inductance operates over thick regions of the computational domain.

# 8 Bilateral contracts and grants with industry

## 8.1 Bilateral contracts with industry

Participants: F. Nataf, A. Lechevallier ..

• Contract with IFPen, January 2021 - December 2023, that funds the Phd thesis of Antoine Lechevallier on "Accélération des simulations avec évènements ponctuels récurrents par apprentissage : application au stockage geologique du CO2" . Supervisor F. Nataf, S. Desroziers and T. Faney.

# 9 Partnerships and cooperations

## 9.1 European initiatives

### 9.1.1 FP7 & H2020 projects

#### EMC2, ERC Synergy project

• Title:
Extreme-scale Mathematically-based Computational Chemistry
• Duration:
Nov 2019 - April 2026
• Coordinators:
E. Cances, L. Grigori, Y. Maday, and J. P. Piquemal
• Partners:
• LJLL and LCT, Sorbonne Université (France)
• Cermics, ECOLE NATIONALE DES PONTS ET CHAUSSEES (France)
• Inria contact:
Laura Grigori
• Summary:
EMC2 is an ERC Synergy project that aims to overcome some of the current limitations in the field of molecular simulation and aims to provide academic communities and industrial companies with new generation, dramatically faster and quantitatively reliable molecular simulation software. This will enable those communities to address major technological and societal challenges of the 21st century in health, energy and the environment for instance.

## 9.2 National initiatives

### 9.2.1 ANR

#### B3DCMB

ANR Decembre 2017 - March 2022 This project is in the area of data analysis of cosmological data sets as collected by contemporary and forthcoming observatories. This is one of the most dynamic areas of modern cosmology. Our special target are data sets of Cosmic Microwave Background (CMB) anisotropies, measurements of which have been one of the most fruitful of cosmological probes. CMB photons are remnants of the very early evolution of the Universe and carry information about its physical state at the time when the Universe was much younger, hotter, and denser, and simpler to model mathematically. The CMB has been, and continues to be, a unique source of information for modern cosmology and fundamental physics. The main objective of this project is to empower the CMB data analysis with novel high performance tools and algorithms superior to those available today and which are capable of overcoming the existing performance gap.

Partners: AstroParticules et Cosmologie Paris 7 (PI R. Stompor), ENSAE Paris Saclay.

#### ANR Cine-Para

October 2015 - March 2021, Laura Grigori is Principal Coordinator for Inria Paris. Funding for Inria Paris is 145 Keuros. The funding for Inria is to combine Krylov subspace methods with parallel in time methods.

Partners: University Pierre and Marie Curie, J. L. Lions Laboratory (PI Y. Maday), CEA, Paris Dauphine University, Paris 13 University.

#### Muffin

: ANR appel à projet générique 2019.

S. Hirstoaga and P.-H. Tournier are members of the project MUFFIN, whose objective is to explore and optimize original computational scenarios for multi-scale and high dimensional transport codes, with priority applications in plasma physics. Several approximation methods are planed to be developed. It is at the frontier of computing and numerical analysis and intends to reduce the computational burden in the context of intensive calculation. Principal Investigator: B. Després (Sorbonne University).

# 10 Dissemination

## 10.1 Promoting scientific activities

### 10.1.2 Scientific events: selection

#### Member of the conference program committees

• L. Grigori member of PC of IPDPS 2021, ICPP 2021

### 10.1.3 Journal

#### Member of the editorial boards

• Laura Grigori, March 2014 – current. Member of the editorial board for the SIAM book series Software, Environments and Tools. See http://­bookstore.­siam.­org/­software-environments-and-tools/.
• Laura Grigori, January 2016 – current. Associate Editor, SIAM Journal on Scientific Computing.
• Laura Grigori, January 2017 – current. Associate Editor, SIAM Journal on Matrix Analysis and Applications.
• Laura Grigori, January 2016 – current. Editorial board, Numerical linear algebra with applications Journal, Wiley.

#### Reviewer - reviewing activities

• Members of Alpines are reviewers for many journals.

### 10.1.4 Invited talks

• S. Hirstoaga was invited at the minisymposium 399 (Numerical aspects of transport, Boltzmann and kinetic equations), 14th WCCM & Eccomas Virtual Congress, Paris, January 2021.
• S. Hirstoaga was invited at the workshop Numerical Methods for Kinetic Equations 'NumKin21', CIRM, Marseille, 14-18 June 2021.
• S. Hirstoaga gave a talk at the seminar EMC2 at LJLL, 17 December 2021.
• Xavier Claeys was invited at the workshop Numerical Waves, Univ. Nice labo. Jean Dieudonné, 15-17 June 2020.
• Xavier Claeys was invited at the online seminar of applied mathematics of the University of Padova on the 31th of march 2021.
• Xavier Claeys was invited at the online seminar of applied mathematics of TU Delft on the 6th of January 2021.
• Xavier Claeys was invited lecturer at the Zurich summer school organized by the university Zurich, 23rd - 27th of March 2021.
• Laura Grigori, Plenary lecture, 91st Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), March 15-19, 2021, virtal conference.
• Laura Grigori, Member of panel of Supercomputing 2021 conference, entitled “Heterogeneity in Hardware: Opportunities and Challenges for Software and Applications”.

### 10.1.5 Leadership within the scientific community

• Laura Grigori, member elected of SIAM Council, 1st term January 2018 - December 2020, 2nd term January 2021 - December 2024 (re-elected), the committee supervising the scientific activities of SIAM. Nominated by a Committee and elected by the members of SIAM.
• Laura Grigori, chair of the PRACE (Partnership for Advanced Computing in Europe) Scientific Steeri ng Committee, March 2020 - March 2021.
• Laura Grigori, Member of Scientific Committee of the Prace Fast Track Call for Proposals in support to mitigate impact of Covid 19, opened in March 2020.
• Laura Grigori is the coordinator of the High Performance in Scientific Computing Major of second year of Mathematics and Applications Master, Sorbonne University.

### 10.1.6 Scientific expertise

• Xavier Claeys was part of the Comité d'Expertise Scientifique no.46 (Modèles numériques, simulation, aplication) for Agence Nationale de la Recherche.
• Xavier Claeys was president of the hiring comity for a position of maître de conférence in Jacques-Louis Lions laboratory.
• Laura Grigori: November 2015 - current, expert to the Scientific Commission of IFPEN (French Petroleum Institute). Evaluation of research programs, PhD theses, work representing a total of 5 days per year.
• Laura Grigori, Member of HCERES committee for the evaluation of GeM and ICI departments of Ecole Centrale, Nantes, 2021.
• Laura Grigori, President of selection jury for ISFP/CRCN positions Inria Lille, 2021.
• Laura Grigori, Member of selection jury for Professor position UGA, Grenoble, 2021.
• Laura Grigori: November 2015 - current, expert to the Scientific Commission of IFPEN (French Petroleum Institute). Evaluation of research programs, PhD theses, work representing a total of 5 days per year.

## 10.2 Teaching - Supervision - Juries

### 10.2.1 Teaching

• Licence 3: S. Hirstoaga, Méthodes numériques, 53 HETD (cours + TP), EISE 3, Polytech Sorbonne.
• Master 1: Xavier Claeys, Initiation to C++, 36 hrs of programming tutorials in C++, SU.
• Master 1: Xavier Claeys, Computational Linear Algebra, 51 hrs of lectures, SU.
• Master 1: Xavier Claeys, Approximation of PDEs, 51 hrs of lectures on variationnal theory of PDEs, SU.
• Master 2: Xavier Claeys, PDEs from analysis to implementation, 24hrs of lectures on the C++ programming of a PDE solver, SU.
• Master 2: Laura Grigori, Winter 2020, Participation in the course on High Performance Computing given at Sorbonne University, Computer Science, intervention for 8 hours per year.
• Master 2: Laura Grigori, Course on High performance computing for numerical methods and data analysis, Master 2nd year, Mathematics & Applications, Sorbonne University, 24 hours per year.
• Master 2: Frédéric Nataf, Course on Domain Decomposition Methods, Sorbonne University, 22h.

### 10.2.2 Juries

• Xavier Claeys was reviewer on the PhD thesis of Damien Chicaud entitled "Analysis of time-harmonic electromagnetic problems in elliptic anisotropic media" supervised by Patrick Ciarlet and defended at ENSTA Paristech in Saclay on Dec. 7th 2021.

# 11 Scientific production

## 11.1 Major publications

• 1 articleX.X. Claeys. Essential spectrum of local multi-trace boundary integral operators.IMA J. Appl. Math.8162016, 961--983
• 2 articleX.X. Claeys and R.R. Hiptmair. Integral equations for electromagnetic scattering at multi-screens.Integral Equations Operator Theory8412016, 33--68
• 3 articleX.X. Claeys, R.R. Hiptmair and E.E. Spindler. Second kind boundary integral equation for multi-subdomain diffusion problems.Adv. Comput. Math.4352017, 1075--1101
• 4 articleJ. W.J. W. Demmel, L.L. Grigori, M.M. Hoemmen and J.J. Langou. Communication-optimal parallel and sequential QR and LU factorizations.SIAM Journal on Scientific Computing1short version of technical report UCB/EECS-2008-89 from 20082012, 206-239
• 5 bookV.Victorita Dolean, P.Pierre Jolivet and F.Frédéric Nataf. An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation.SIAM2015
• 6 articleL.L. Grigori, J. W.J. W. Demmel and H.H. Xiang. CALU: a communication optimal LU factorization algorithm.SIAM Journal on Matrix Analysis and Applications322011, 1317-1350
• 7 articleL.L. Grigori, S.S. Moufawad and F.F. Nataf. Enlarged Krylov Subspace Conjugate Gradient methods for Reducing Communication.SIAM Journal on Matrix Analysis and Applications3722016, 744-773
• 8 articleR.R. Haferssas, P.P. Jolivet and F.F. Nataf. An Additive Schwarz Method Type Theory for Lions's Algorithm and a Symmetrized Optimized Restricted Additive Schwarz Method.SIAM J. Sci. Comput.3942017, A1345--A1365
• 9 articleF.F. Hecht. New development in FreeFem++.J. Numer. Math.203-42012, 251--265
• 10 miscP.Pierre Jolivet and F.Frédéric Nataf. HPDDM: High-Performance Unified framework for Domain Decomposition methods, MPI-C++ library.2014
• 11 unpublishedF.Frédéric Nataf and P.-H.Pierre-Henri Tournier. Adaptive Domain Decomposition method for Saddle Point problem in Matrix Form.November 2019, working paper or preprint
• 12 articleN.Nicole Spillane, V.Victorita Dolean, P.Patrice Hauret, F.Frédéric Nataf, C.Clemens Pechstein and R.Robert Scheichl. Abstract robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps.Numer. Math.12642014, 741--770

## 11.2 Publications of the year

### International journals

• 13 articleH.Hussam Al Daas, L.Laura Grigori, P.Pascal Hénon and P.Philippe Ricoux. Recycling Krylov Subspaces and Truncating Deflation Subspaces for Solving Sequence of Linear Systems.ACM Transactions on Mathematical Software472April 2021, 1-30
• 14 articleH.Hussam Al Daas, L.Laura Grigori, P.Pierre Jolivet and P.-H.Pierre-Henri Tournier. A Multilevel Schwarz Preconditioner Based on a Hierarchy of Robust Coarse Spaces.SIAM Journal on Scientific Computing433May 2021, A1907–A1928
• 15 articleA.Ani Anciaux-Sedrakian, L.Laura Grigori, Z.Zakariae Jorti and S.Soleiman Yousef. An a posteriori-based adaptive preconditioner for controlling a local algebraic error norm.BIT Numerical Mathematics611March 2021, 209-235
• 16 articleAnalysis of the SORAS domain decomposition preconditioner for non-self-adjoint or indefinite problems.Journal of Scientific Computing892021
• 17 articleX.Xavier Claeys, L.Lorenzo Giacomel, R.Ralf Hiptmair and C.Carolina Urzúa-Torres. Quotient-space boundary element methods for scattering at complex screens.BIT Numerical MathematicsMarch 2021
• 18 articleV.Virginie Ehrlacher, L.Laura Grigori, D.Damiano Lombardi and H.Hao Song. Adaptive hierarchical subtensor partitioning for tensor compression.SIAM Journal on Scientific Computing2021
• 19 articleL.Laura Grigori, S. A.Sever Adrian Hirstoaga, V.-T.Van-Thanh Nguyen and J.Julien Salomon. Reduced model-based parareal simulations of oscillatory singularly perturbed ordinary differential equations.Journal of Computational Physics4362021, 110282
• 20 articleN.Núria López, L.Luigi Del Debbio, M.Marc Baaden, M.Matej Praprotnik, L.Laura Grigori, C.Catarina Simões, S.Serge Bogaerts, F.Florian Berberich, T.Thomas Lippert, J.Janne Ignatius, P.Philippe Lavocat, O.Oriol Pineda, M. G.Maria Grazia Giuffreda, S.Sergi Girona, D.Dieter Kranzlmüller, M. M.Michael M Resch, G.Gabriella Scipione and T.Thomas Schulthess. Lessons learned from urgent computing in Europe: Tackling the COVID-19 pandemic.Proceedings of the National Academy of Sciences of the United States of America 11846November 2021, e2024891118

### Doctoral dissertations and habilitation theses

• 21 thesisI.Igor Chollet. Symmetries and Fast Multipole Methods for Oscillatory Kernels.Sorbonne UniversitéMarch 2021

### Reports & preprints

• 22 miscF.Franck Assous and F.Frédéric Nataf. Full-waveform redatuming via a TRAC approach: a first step towards target oriented inverse problem.January 2021
• 23 miscO.Oleg Balabanov and L.Laura Grigori. Randomized Gram-Schmidt process with application to GMRES.January 2021
• 24 miscO.Oleg Balabanov and L.Laura Grigori. Randomized block Gram-Schmidt process for solution of linear systems and eigenvalue problems.December 2021
• 25 miscCommunication avoiding low rank approximation based on QR with tournament pivoting.January 2021
• 26 miscN.Nacime Bouziani, H.Henri Calandra and F.Frédéric Nataf. An overlapping splitting double sweep method for the Helmholtz equation.January 2021
• 27 miscX.Xavier Claeys and E.Emile Parolin. Robust treatment of cross points in Optimized Schwarz Methods.January 2021
• 28 miscL.Laura Grigori and S.Suraj Kumar. Parallel Tensor Train through Hierarchical Decomposition.February 2021
• 29 miscF.Frédéric Hecht and S.-M.Sidi-Mahmoud Kaber. Approximation parallèle de l'exponentielle de matrices hermitiennes.April 2021
• 30 miscF.Frédéric Nataf and P.-H. H.Pierre-Henri H Tournier. A GenEO Domain Decomposition method for Saddle Point problems.April 2021

## 11.3 Cited publications

• 31 articleF.F Assous, M.M Kray, F.F Nataf and E.E Turkel. Time-reversed absorbing condition: application to inverse problems.Inverse Problems2762011, 065003
• 32 articleM. J.Marcus J. Grote, M.Marie Kray, F.Frédéric Nataf and F.Franck Assous. Time-dependent wave splitting and source separation.Journal of Computational Physics3302017, 981 - 996