Communication-optimal parallel and sequential QR and LU factorizations

ALPINES Algorithms and parallel tools for integrated numerical simulations

Distributed and High Performance Computing

Networks, Systems and Services, Distributed Computing

http://team.inria.fr/alpines Laboratoire Jacques-Louis Lions (LJLL) CNRS, Sorbonne Université Project-Team A6.1.1. - Continuous Modeling (PDE, ODE) A6.1.4. - Multiscale modeling A6.1.5. - Multiphysics modeling A6.2.1. - Numerical analysis of PDE and ODE A6.2.5. - Numerical Linear Algebra A6.2.7. - High performance computing A6.3. - Computation-data interaction A6.3.1. - Inverse problems A7.1. - Algorithms B3.3.1. - Earth and subsoil B9.5.2. - Mathematics B9.5.3. - Physics Inria Paris Center at Sorbonne University Laura Grigori Chercheur Team leader, INRIA, Senior Researcher oui Sever Hirstoaga Chercheur INRIA, Researcher oui Frédéric Nataf Chercheur CNRS, Senior Researcher oui Pierre-Henri Tournier Chercheur SORBONNE UNIVERSITE, Researcher Xavier Claeys Enseignant Sorbonne University, until Apr 2022 oui Frédéric Hecht Enseignant SORBONNE UNIVERSITE, Professor oui Oleg Balabanov PostDoc INRIA, until Jun 2022 Suraj Kumar PostDoc Inria, until Oct 2022 Siwar Badreddine PhD INRIA Matthias Beaupere PhD INRIA Jean-Guillaume De Damas PhD Inria, from Sep 2022 Antoine Lechevallier PhD IFPEN Van-Thanh Nguyen PhD Inria, until May 2022 Edouard Timsit PhD INRIA Niels Guilbert Technique Inria, Engineer Victor Lederer Technique Inria, Engineer Daniel Torres Gonzales Technique Sorbonne University, Engineer Laurence Bourcier Assistant INRIA Overall objectives Introduction

The focus of our research is on the development of novel parallel numerical algorithms and tools appropriate for state-of-the-art mathematical models used in complex scientific applications, and in particular numerical simulations. The proposed research program is by nature multi-disciplinary, interweaving aspects of applied mathematics, computer science, as well as those of several specific applications, as porous media flows, elasticity, wave propagation in multi-scale media, molecular simulations, and inverse problems.

Our first objective is to develop numerical methods and tools for complex scientific and industrial applications that will enhance their scalable execution on the emergent heterogeneous hierarchical models of massively parallel machines. Our second objective is to integrate the novel numerical algorithms into a middle-layer that will hide as much as possible the complexity of massively parallel machines from the users of these machines.

Research program Overview

The research described here is directly relevant to several steps of the numerical simulation chain. Given a numerical simulation that was expressed as a set of differential equations, our research focuses on mesh generation methods for parallel computation, novel numerical algorithms for linear algebra, as well as algorithms and tools for their efficient and scalable implementation on high performance computers. The validation and the exploitation of the results is performed with collaborators from applications and is based on the usage of existing tools. In summary, the topics studied in our group are the following:

Numerical methods and algorithms

Mesh generation for parallel computation

Solvers for numerical linear algebra:

domain decomposition methods, preconditioning for iterative methods

Computational kernels for numerical linear algebra

Tensor computations for high dimensional problems

Validation on numerical simulations and other numerical applications

Domain specific language - parallel FreeFem++

In the engineering, researchers, and teachers communities, there is a strong demand for simulation frameworks that are simple to install and use, efficient, sustainable, and that solve efficiently and accurately complex problems for which there are no dedicated tools or codes available. In our group we develop FreeFem++, a user-dedicated language for solving PDEs. The goal of FreeFem++ is not to be a substitute for complex numerical codes, but rather to provide an efficient and relatively generic tool for:

getting a quick answer to a specific problem,

prototyping the resolution of a new complex problem.

The current users of FreeFem++ are mathematicians, engineers, university professors, and students. In general for these users the installation of public libraries as MPI, MUMPS, Ipopt, Blas, lapack, OpenGL, fftw, scotch, PETSc, SLEPc is a very difficult problem. For this reason, the authors of FreeFem++ have created a user friendly language, and over years have enriched its capabilities and provided tools for compiling FreeFem++ such that the users do not need to have special knowledge of computer science. This leads to an important work on porting the software on different emerging architectures.

Today, the main components of parallel FreeFem++ are:

definition of a coarse grid,

splitting of the coarse grid,

mesh generation of all subdomains of the coarse grid, and construction of parallel data structures for vectors and sparse matrices from the mesh of the subdomain,

discretization by a chosen numerical method,

call to a linear solver,

analysis of the result.

All these components are parallel, except for the last point which is not in the focus of our research. However for the moment, the parallel mesh generation algorithm is very simple and not sufficient, for example it addresses only polygonal geometries. Having a better parallel mesh generation algorithm is one of the goals of our project. In addition, in the current version of FreeFem++, the parallelism is not hidden from the user, it is done through direct calls to MPI. Our goal is also to hide all the MPI calls in the specific language part of FreeFem++. In addition to these in-house domain decomposition methods, FreeFem++ is also linked to PETSc solvers which enables an easy use of third parties parallel multigrid methods.

Solvers for numerical linear algebra

Iterative methods are widely used in industrial applications, and preconditioning is the most important research subject here. Our research considers domain decomposition methods and iterative methods and its goal is to develop solvers that are suitable for parallelism and that exploit the fact that the matrices are arising from the discretization of a system of PDEs on unstructured grids.

One of the main challenges that we address is the lack of robustness and scalability of existing methods as incomplete LU factorizations or Schwarz-based approaches, for which the number of iterations increases significantly with the problem size or with the number of processors. This is often due to the presence of several low frequency modes that hinder the convergence of the iterative method. To address this problem, we study different approaches for dealing with the low frequency modes as coarse space correction in domain decomposition or deflation techniques.

We also focus on developing boundary integral equation methods that would be adapted to the simulation of wave propagation in complex physical situations, and that would lend themselves to the use of parallel architectures. The final objective is to bring the state of the art on boundary integral equations closer to contemporary industrial needs. From this perspective, we investigate domain decomposition strategies in conjunction with boundary element method as well as acceleration techniques (H-matrices, FMM and the like) that would appear relevant in multi-material and/or multi-domain configurations. Our work on this topic also includes numerical implementation on large scale problems, which appears as a challenge due to the peculiarities of boundary integral equations.

Computational kernels for numerical linear and multilinear algebra

The design of new numerical methods that are robust and that have well proven convergence properties is one of the challenges addressed in Alpines. Another important challenge is the design of parallel algorithms for the novel numerical methods and the underlying building blocks from numerical linear algebra. The goal is to enable their efficient execution on a diverse set of node architectures and their scaling to emerging high-performance clusters with an increasing number of nodes.

Increased communication cost is one of the main challenges in high performance computing that we address in our research by investigating algorithms that minimize communication, as communication avoiding algorithms. The communication avoiding algorithmic design is an approach originally developed in our group since more than ten years (initially in collaboration with researchers from UC Berkeley and CU Denver). While our first results concerned direct methods of factorization as LU or QR factorizations, our recent work focuses on designing robust algorithms for computing the low rank approximation of a matrix or a tensor. We focus on both deterministic and randomized approaches.

Our research also focuses on solving problems of large size that feature high dimensions as in molecular simulations. The data in this case is represented by objects called tensors, or multilinear arrays. The goal is to design novel tensor techniques to allow their effective compression, i.e. their representation by simpler objects in small dimensions, while controlling the loss of information. The algorithms are aiming to being highly parallel to allow to deal with the large number of dimensions and large data sets, while preserving the required information for obtaining the solution of the problem.

Application domains Compositional multiphase Darcy flow in heterogeneous porous media

We study the simulation of compositional multiphase flow in porous media with different types of applications, and we focus in particular on reservoir/bassin modeling, and geological CO2 underground storage. All these simulations are linearized using Newton approach, and at each time step and each Newton step, a linear system needs to be solved, which is the most expensive part of the simulation. This application leads to some of the difficult problems to be solved by iterative methods. This is because the linear systems arising in multiphase porous media flow simulations cumulate many difficulties. These systems are non-symmetric, involve several unknowns of different nature per grid cell, display strong or very strong heterogeneities and anisotropies, and change during the simulation. Many researchers focus on these simulations, and many innovative techniques for solving linear systems have been introduced while studying these simulations, as for example the nested factorization [Appleyard and Cheshire, 1983, SPE Symposium on Reservoir Simulation].

Inverse problems

We focus on methods related to the blend of time reversal techniques and absorbing boundary conditions (ABC) used in a non standard way. Since the seminal paper by [M. Fink et al., Imaging through inhomogeneous media using time reversal mirrors. Ultrasonic Imaging, 13(2):199, 1991.], time reversal is a subject of very active research. The principle is to back-propagate signals to the sources that emitted them. The initial experiment was to refocus, very precisely, a recorded signal after passing through a barrier consisting of randomly distributed metal rods. In [de Rosny and Fink. Overcoming the diffraction limit in wave physics using a time-reversal mirror and a novel acoustic sink. Phys. Rev. Lett., 89 (12), 2002], the source that created the signal is time reversed in order to have a perfect time reversal experiment. In 27, we improve this result from a numerical point of view by showing that it can be done numerically without knowing the source. This is done at the expense of not being able to recover the signal in the vicinity of the source. In 28, time dependent wave splitting is performed using ABC and time reversal techniques. We now work on extending these methods to non uniform media.

All our numerical simulations are performed in FreeFem++ which is very flexible. As a byproduct, it enables us to have an end user point of view with respect to FreeFem++ which is very useful for improving it.

Numerical methods for wave propagation in multi-scale media

We are interested in the development of fast numerical methods for the simulation of electromagnetic waves in multi-scale situations where the geometry of the medium of propagation may be described through caracteristic lengths that are, in some places, much smaller than the average wavelength. In this context, we propose to develop numerical algorithms that rely on simplified models obtained by means of asymptotic analysis applied to the problem under consideration.

Here we focus on situations involving boundary layers and localized singular perturbation problems where wave propagation takes place in media whose geometry or material caracteristics are submitted to a small scale perturbation localized around a point, or a surface, or a line, but not distributed over a volumic sub-region of the propagation medium. Although a huge literature is already available for the study of localized singular perturbations and boundary layer pheneomena, very few works have proposed efficient numerical methods that rely on asymptotic modeling. This is due to their functional framework that naturally involves singular functions, which are difficult to handle numerically. The aim of this part of our reasearch is to develop and analyze numerical methods for singular perturbation methods that are prone to high order numerical approximation, and robust with respect to the small parameter characterizing the singular perturbation.

Data analysis in astrophysics

We focus on computationally intensive numerical algorithms arising in the data analysis of current and forthcoming Cosmic Microwave Background (CMB) experiments in astrophysics. This application is studied in collaboration with researchers from University Paris Diderot, and the objective is to make available the algorithms to the astrophysics community, so that they can be used in large experiments.

In CMB data analysis, astrophysicists produce and analyze multi-frequency 2D images of the universe when it was 5% of its current age. The new generation of the CMB experiments observes the sky with thousands of detectors over many years, producing overwhelmingly large and complex data sets, which nearly double every year therefore following Moore's Law. Planck () is a keystone satellite mission which has been developed under auspices of the European Space Agency (ESA).

Molecular simulations

Molecular simulation is one of the most dynamic areas of scientific computing. Its field of application is very broad, ranging from theoretical chemistry and drug design to materials science and nanotechnology. It provides many challenging problems to mathematicians, and computer scientists.

In the context of the ERC Synergy Grant EMC2 we address several important limitations of state of the art molecular simulation. In particular, the simulation of very large molecular systems, or smaller systems in which electrons interact strongly with each other, remains out of reach today. In an interdisciplinary collaboration between chemists, mathematicians and computer scientists, we focus on developing a new generation of reliable molecular simulation algorithms and software.

New software and platforms New software FreeFem++ Name:

FeeFrem++

Scientific Description:

FreeFem++ is a partial differential equation solver. It has its own language. freefem scripts can solve multiphysics non linear systems in 2D and 3D.

Problems involving PDE (2d, 3d) from several branches of physics such as fluid-structure interactions require interpolations of data on several meshes and their manipulation within one program. FreeFem++ includes a fast 2d̂-tree-based interpolation algorithm and a language for the manipulation of data on multiple meshes (as a follow up of bamg (now a part of FreeFem++ ).

FreeFem++ is written in C++ and the FreeFem++ language is a C++ idiom. It runs on Macs, Windows, Unix machines. FreeFem++ replaces the older freefem and freefem+.

Functional Description:

FreeFem++ is a PDE (partial differential equation) solver based on a flexible language that allows a large number of problems to be expressed (elasticity, fluids, etc) with different finite element approximations on different meshes.

URL:

http://www.freefem.org/ff++/

Contact:

Frédéric Hecht

Partner:

UPMC

HPDDM Scientific Description:

HPDDM is an efficient implementation of various domain decomposition methods (DDM) such as one- and two-level Restricted Additive Schwarz methods, the Finite Element Tearing and Interconnecting (FETI) method, and the Balancing Domain Decomposition (BDD) method. This code has been proven to be efficient for solving various elliptic problems such as scalar diffusion equations, the system of linear elasticity, but also frequency domain problems like the Helmholtz equation. A comparison with modern multigrid methods can be found in the thesis of Pierre Jolivet.

Functional Description:

URL:

https://github.com/hpddm

Contact:

Pierre Jolivet

Participants:

Frédéric Nataf, Pierre Jolivet

preAlps Name:

Preconditioned enlarged Krylov subspace methods

Keywords:

Linear Systems Solver, Preconditioner

Functional Description:

Contains enlarged Conjugate Gradient Krylov subspace method and Lorasc preconditioner.

URL:

https://github.com/NLAFET/preAlps

Contact:

Laura Grigori

HTool Keyword:

Hierarchical matrices

Functional Description:

HTool is a C++ header-only library implementing compression techniques (e.g. Adaptive Cross Approximation) using hierarchical matrices (H-matrices). The library uses MPI and OpenMP for parallelism, and is interfaced with HPDDM for the solution of linear systems.

URL:

https://github.com/htool-ddm/htool

Contact:

Pierre Marchand

BemTool Keyword:

Boundary element method

Functional Description:

BemTool is a C++ header-only library implementing the boundary element method (BEM) for the discretisation of the Laplace, Helmholtz and Maxwell equations, in 2D and 3D. Its main purpose is the assembly of classical boundary element matrices, which can be compressed and inverted through its interface with the HTool library.

URL:

https://github.com/xclaeys/BemTool

Contact:

Xavier Claeys

New platforms Alpinesmembers New results Alpinesmembers Randomized block Gram-Schmidt process for solution of linear systems and eigenvalue problems

In 9 a randomized Gram-Schmidt algorithm is developed for orthonormalization of high-dimensional vectors or QR factorization. The proposed process can be less computationally expensive than the classical Gram-Schmidt process while being at least as numerically stable as the modified Gram-Schmidt process. Our approach is based on random sketching, which is a dimension reduction technique consisting in estimation of inner products of high-dimensional vectors by inner products of their small efficiently-computable random projections, so-called sketches. This allows to perform the projection step in Gram-Schmidt process on sketches rather than high-dimensional vectors with a minor computational cost. This also provides an ability to efficiently certify the output. The proposed Gram-Schmidt algorithm can provide computational cost reduction in any architecture. The benefit of random sketching can be amplified by exploiting multi-precision arithmetic. We provide stability analysis for multi-precision model with coarse unit roundoff for standard high-dimensional operations. Numerical stability is proven for the unit roundoff independent of the (high) dimension of the problem. The proposed Gram-Schmidt process can be applied to Arnoldi iteration and result in new Krylov subspace methods for solving high-dimensional systems of equations or eigenvalue problems. Among them we chose randomized GMRES method as a practical application of the methodology.

In 15 we introduce a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including SRHT and Gaussian matrices, on distributed architectures with not too many cores compared to the dimension. We prove that a block SRHT with enough rows is an oblivious subspace embedding, i.e., an approximate isometry for an arbitrary low-dimensional subspace with high probability. Our estimate of the required number of rows is similar to that of the standard SRHT. This suggests that the two transforms should provide the same accuracy of approximation in the algorithms. The block SRHT can be readily incorporated into randomized methods, for instance to compute a low-rank approximation of a large-scale matrix. For completeness, we revisit some common randomized approaches for this problem such as Randomized Singular Value Decomposition, , with a discussion of their accuracy and implementation on distributed architectures.

Factorized structure of the long-range two-electron integrals tensor and its application in quantum chemistry

In 14 we introduce two new approximation methods for the numerical evaluation of the long-range Coulomb potential and the approximation of the resulting high dimensional Two-Electron Integrals tensor (TEI) with long-range interactions arising in molecular simulations. The first method exploits the tensorized structure of the compressed two-electron integrals obtained through two-dimensional Chebyshev interpolation combined with Gaussian quadrature. The second method is based on the Fast Multipole Method (FMM). Numerical experiments for different medium size molecules on high quality basis sets outline the efficiency of the two methods. Detailed algorithmic is provided in this paper as well as numerical comparison of the introduced approaches.

A Directional Equispaced interpolation-based Fast Multipole Method for oscillatory kernels

Fast Multipole Methods (FMMs) based on the oscillatory Helmholtz kernel can reduce the cost of solving N-body problems arising from Boundary Integral Equations (BIEs) in acoustic or electromagnetics. However, their cost strongly increases in the high-frequency regime. In 20 introduces a new directional FMM for oscillatory kernels (defmm - directional equispaced interpolation-based fmm), whose precomputation and application are FFT-accelerated due to polynomial interpolations on equispaced grids. We demonstrate the consistency of our FFT approach, and show how symmetries can be exploited in the Fourier domain. We also describe the algorithmic design of defmm, well-suited for the BIE non-uniform particle distributions, and present performance optimizations on one CPU core. Finally, we exhibit important performance gains on all test cases for defmm over a state-of-the-art FMM library for oscillatory kernels.

Adaptive Domain Decomposition method for Saddle Point problem

In 6, we introduce an adaptive domain decomposition (DD) method for solving saddle point problems defined as a block two by two matrix. The algorithm does not require any knowledge of the constrained space. We assume that all sub matrices are sparse and that the diagonal blocks are the sum of positive semi definite matrices. The latter assumption enables the design of adaptive coarse space for DD methods. Numerical results on three dimensional elasticity problems on steel-rubber structures discretized with 1 billion degrees of freedom are shown. comparisons with direct and multigrid solvers are also shown.

Two-level DDM preconditioners for positive Maxwell equations

In 18, we develop and analyse domain decomposition methods for linear systems of equations arising from conforming finite element discretisations of positive Maxwell-type equations. Convergence of domain decomposition methods rely heavily on the efficiency of the coarse space used in the second level. We design adaptive coarse spaces that complement the near-kernel space made of the gradient of scalar functions. This extends the results in 29 to the variable coefficient case and non-convex domains at the expense of a larger coarse space.

A Unified Framework for Double Sweep Methods for the Helmholtz Equation

In 19, we consider sweeping domain decomposition preconditioners to solve the Helmholtz equation in the case of stripwise domain decomposition with or without overlaps. We unify their derivation and convergence studies as Jacobi, Gauss-Seidel or Symmetric Gauss-Seidel for different numbering of the unknowns. This enables the theoretical comparisons of the double sweep methods in Nataf et al. (1997) with that of Vion et al. (2014). It also makes possible the introduction of two new sweeping algorithms. We provide numerical test cases that assess the validity of the theoretical studies.

Domain decomposition preconditioning for high frequency wave propagation problems

In the context of seismic imaging, frequency-domain full-waveform inversion (FWI) is suitable for long-offset stationary-recording acquisition, since reliable subsurface models can be reconstructed with a few frequencies and attenuation is easily implemented without computational overhead. In the frequency domain, wave modeling is a Helmholtz-type boundary-value problem which requires to solve a large and sparse system of linear equations per frequency with multiple right-hand sides (sources). This system can be solved with direct or iterative methods. While the former are suitable for FWI application on 3D dense OBC acquisitions covering spatial domains of moderate size, the later should be the approach of choice for sparse node acquisitions covering large domains (more than 50 millions of unknowns). Fast convergence of iterative solvers for Helmholtz problems remains however challenging in high frequency regime, due to the non definiteness of the Helmholtz operator and also to the discretization constraints in order to minimize the dispersion error for a given frequency, hence requiring efficient preconditioners.

For such wave propagation problems, we continued development of ORAS Schwarz domain decomposition preconditioners for finite difference and finite element discretizations for different 3D seismic benchmarks. We also studied the influence of the choice of the local solver on the convergence and computing time, in particular for problems with multiple right-hand sides using pseudo-block methods. For example, Table 1 gathers results comparing PARDISO, MUMPS and the Block Low-Rank (BLR) version of MUMPS as the local solver, for an acoustic simulation on a 20 km $\times$ 102 km $\times$ 28.4 km subset of the 3D GO_3D_OBS crustal geomodel, which has been designed to assess seismic imaging techniques for deep crustal exploration. The table reports computing times on 1344 cores for double and single precision arithmetics for one and 130 right-hand sides. The problem is discretized using finite differences with a wavelength-adaptive 27-point stencil, resulting in 67.5 million unknowns. We can see that using the Block Low-Rank (BLR) version of MUMPS as an approximate local solver gives the best computing time, even though 55 iterations are needed to reach convergence instead of 50 when using exact local solvers. Table 2 gathers weak and strong scaling results on the 3D GO_3D_OBS crustal geomodel with finite differences up to 10 Hz frequency, where the linear system of 1160.6 million unknowns is solved in 85.9 seconds on 9856 cores. The paper, which assesses the accuracy of finite-element and finite-difference discretizations for several seismic acoustic benchmarks as well as the efficiency of ORAS Schwarz domain decomposition preconditioners for the solution of the linear systems, is published in Geophysics 13.

Numerical resolution of the inverse source problem in electroencephalography

In 21, we consider a new numerical method for the electroencephalography(EEG) inverse source problem. The novelty is to take into account the heterogeneity of the skull, as the electric conductivity of the skull can hardly be assumed to be constant: (i) hard and spongy bones, especially for neonates (fontanels), (ii) conductivity models vary between individuals, and through time for a single person. The first step of the source reconstruction method is the "cortical mapping" procedure. It deals with the numerical resolution of both data completion and transmission problems for elliptic equations: (i) data completion on the scalp knowing partial measurements of the electric potential (from electrode measurements), (ii)data transmission from the scalp to the brain surface. To this end, we propose to apply the quasi-reversibility method which provides a regularized solution of Cauchy problems in a bounded domain. The problem is solved using the finite element method in the scalp and skull layers. The second step consists in using an efficient analytic method to retrieve the source (position and moment) from the Cauchy data on the brain surface, which is assumed to be spherical with constant conductivity. For the classical three-layer spherical model (brain, skull, scalp), using synthetic data from 257 electrodes, we recover the source with 3.6% position error and moment correlation of 1. With 5% and 20% white Gaussian noise, the position error is 5.5% and 8.4% respectively, with moment correlation of 1. Numerical results are also presented for sets of 33 and 61 electrodes, for two distant and two close sources, and for a variable skull conductivity with the presence of the anterior fontanel, as well as a cerebrospinal fluid (CSF) layer.

Radiative transfer for variable 3D atmospheres

The radiative transfer equations have three spatial dimensions, two radiational dimensions, one frequency variable, and possibly time. Here, without loss of generality, they are reduced to a nonlinear integro-differential system and solved by an iterative scheme which is shown to be monotone and convergent. At each step, two large integrals of convolution type need to be computed. The kernel matrices are assembled in compressed form using Hierarchical matrices, with the help of the H-Matrix C++ parallel library Htool 5.1.4. The complexity of the resulting method is of complexity n log(n) where n is the number of vertices of the physical domain. The method is applied to the temperature and radiation in the valley of Chamonix, the challenge being the large variation of density, altitude and light reflection due to snow and possibly clouds. More details can be found in REF.

New developments in FreeFEM

New features were developed in the FreeFEM software thanks to the work of Jacques Morice (INRIA ADT engineer March 2022-2023):

The Boundary Element Method (BEM) can now be used to solve Maxwell equations through the Electric Field Integral Equations (EFIE).

The definition of the variational problems has been generalized through the introduction of a 'composite' type of finite element space which may have components defined on different meshes and even different mesh types. This allows to easily define several classes of general coupled problems such as FEM-BEM coupling problems with only one variational formulation.

Work has been started to ease the parallelization of a user's sequential code. In this setting, the parallelization of the assembly and solution steps is entirely hidden to the user, and a sequential code can be parallelized with only very few changes in the script. For example, PETSc can easily be used to solve the linear system stemming from a coupled problem with a fieldsplit preconditioner.

Modelisation and simulation of highly oscillatory Vlasov-Poisson equations

In this part we are concerned with solving Vlasov-Poisson systems involving several time scales.

First, we are interested with the parareal algorithm which is a method performing parallel computing in time for an efficient numerical solving of a very large class of time dependent equations. In 25, we propose a new reduced model derived from the two-scale convergence theory that we use in the parareal framework for the coarse solving. The parareal method allows to improve their accuracy in a few iterations through corrections by fine solvers of the full model. The models are numerically solved by particle in cell methods. We perform numerical experiments of long time simulations of a charged beam in a focusing channel and under the influence of a rapidly oscillating external electric field. On the basis of computing times, we provide an analysis of the efficiency of the parareal algorithm in terms of speedup. An ongoing work is the parallelization of the previous code in order to study the efficiency of the strategy. The issue is to analyze the distributed vs. shared memory paradigms and to address the problem of optimizing the memory footprint of the code.

Second, in 22 we obtained an original application of the method of moments to the Vlasov-Poisson system with non constant strong magnetic field in three dimensions of space. Using basis functions which are aligned with the magnetic field, one obtains a Friedrichs system where the kernel of the singular part is made explicit. A projection of the original model on this kernel yields what we call the reduced model. Basic numerical tests of the field illustrate the accuracy of our implementation.

Bilateral contracts and grants with industry Bilateral contracts with industry F.NatafA.Lechevallier

Contract with IFPen, January 2021 - December 2023, that funds the Phd thesis of Antoine Lechevallier on "Accélération des simulations avec évènements ponctuels récurrents par apprentissage : application au stockage geologique du CO2" . Supervisor F. Nataf, S. Desroziers and T. Faney.

Partnerships and cooperations Alpinesmembers International research visitors Visits to international teams Research stays abroad Laura Grigori Visited institution:

UC Berkeley

Country:

USA

Dates:

Sept 2022 - May 2023

Context of the visit:

Visiting Professor

Mobility program/type of mobility:

none

H2020 projects EMC2

EMC2 project on cordis.europa.eu

Title:

Extreme-scale Mathematically-based Computational Chemistry

Duration:

From September 1, 2019 to February 28, 2026

Partners:

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France

ECOLE NATIONALE DES PONTS ET CHAUSSEES (ENPC), France

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS (CNRS), France

SORBONNE UNIVERSITE, France

Inria contact:

Laura GRIGORI (Alpines)

Coordinator:

Summary:

Molecular simulation has become an instrumental tool in chemistry, condensed matter physics, molecular biology, materials science, and nanosciences. It will allow to propose de novo design of e.g. new drugs or materials provided that the efficiency of underlying software is accelerated by several orders of magnitude.

The ambition of the EMC2 project is to achieve scientific breakthroughs in this field by gathering the expertise of a multidisciplinary community at the interfaces of four disciplines: mathematics, chemistry, physics, and computer science. It is motivated by the twofold observation that, i) building upon our collaborative work, we have recently been able to gain efficiency factors of up to 3 orders of magnitude for polarizable molecular dynamics in solution of multi-million atom systems, but this is not enough since ii) even larger or more complex systems of major practical interest (such as solvated biosystems or molecules with strongly-correlated electrons) are currently mostly intractable in reasonable clock time. The only way to further improve the efficiency of the solvers, while preserving accuracy, is to develop physically and chemically sound models, mathematically certified and numerically efficient algorithms, and implement them in a robust and scalable way on various architectures (from standard academic or industrial clusters to emerging heterogeneous and exascale architectures).

EMC2 has no equivalent in the world: there is nowhere such a critical number of interdisciplinary researchers already collaborating with the required track records to address this challenge. Under the leadership of the 4 PIs, supported by highly recognized teams from three major institutions in the Paris area, EMC2 will develop disruptive methodological approaches and publicly available simulation tools, and apply them to challenging molecular systems. The project will strongly strengthen the local teams and their synergy enabling decisive progress in the field.

National initiatives ANR B3DCMB

ANR Decembre 2017 - March 2022 This project is in the area of data analysis of cosmological data sets as collected by contemporary and forthcoming observatories. This is one of the most dynamic areas of modern cosmology. Our special target are data sets of Cosmic Microwave Background (CMB) anisotropies, measurements of which have been one of the most fruitful of cosmological probes. CMB photons are remnants of the very early evolution of the Universe and carry information about its physical state at the time when the Universe was much younger, hotter, and denser, and simpler to model mathematically. The CMB has been, and continues to be, a unique source of information for modern cosmology and fundamental physics. The main objective of this project is to empower the CMB data analysis with novel high performance tools and algorithms superior to those available today and which are capable of overcoming the existing performance gap.

Partners: AstroParticules et Cosmologie Paris 7 (PI R. Stompor), ENSAE Paris Saclay.

Muffin

: ANR appel à projet générique 2019.

S. Hirstoaga and P.-H. Tournier are members of the project MUFFIN, whose objective is to explore and optimize original computational scenarios for multi-scale and high dimensional transport codes, with priority applications in plasma physics. Several approximation methods are planed to be developed. It is at the frontier of computing and numerical analysis and intends to reduce the computational burden in the context of intensive calculation. Principal Investigator: B. Després (Sorbonne University).

Dissemination Alpinesmembers Promoting scientific activities Scientific events: organisation

Organization of FreeFEM days, 14th edition, December 8 and 9, 2022, Sorbonne University.

Scientific events: selection Chair of conference program committees

L. Grigori, Area chair, Algorithms Track, Cluster 2022.

L. Grigori, Algorithm Area Co-Chair, International Conference on Parallel Processing ICPP 2022 conference.

L. Grigori, Track Leader for Applications and Algorithms, International Supercomputing Conference (ISC) 2022.

Journal Member of the editorial boards

L. Grigori, Member of the editorial board for the SIAM book series Software, Environments and Tools.

L. Grigori, Associate Editor, SIAM Journal on Scientific Computing.

L. Grigori, Associate Editor, SIAM Journal on Matrix Analysis and Applications.

L. Grigori, Editorial board, Numerical linear algebra with applications Journal, Wiley.

Invited talks

L. Grigori, Plenary Invited Speaker, Model Reduction and Surrogate Modelling Conference ß(MORE), Sept 19-23, 2022.

S. Hirstoaga gave a talk at PinT 2022: 11th Workshop on Parallel-in-Time Integration, CIRM Marseille, 11-16 July 2022.

S. Hirstoaga was invited at XVème Colloque Franco-Roumain de mathematiques appliquées, 1st September 2022, Toulouse.

S. Hirstoaga was invited at the seminar Analyse numérique et EDP, Laboratoire Paul Painlevé, Lille, 13 October, 2022.

S. Hirstoaga was invited at the conference Numerical Analysis, Numerical Modeling, Approximation Theory (Romanian Academy), 28 October 2022, Cluj-Napoca.

S. Hirstoaga gave a talk at the international workshop 'NumKin2022' (Numerical Methods for the Kinetic Equations of Plasma Physics), Max Planck Institute, Garching, 7-10 November, 2022.

Leadership within the scientific community

L. Grigori, member of the SIAM Fellows Selection Committee, July 15, 2021 to July 15, 2023.

L. Grigori, member elected of SIAM Council, 1st term January 2018 - December 2020, 2nd term January 2021 - December 2024 (re-elected), the committee supervising the scientific activities of SIAM. Nominated by a Committee and elected by the members of SIAM.

Scientific expertise

L. Grigori provides expertise to Onera.

Teaching - Supervision - Juries Teaching

Licence 3: S. Hirstoaga, Méthodes numériques, 53 HETD (cours + TP), EISE 3, Polytech Sorbonne.

Master 2: Laura Grigori, Winter 2022, Participation in the course on High Performance Computing given at Sorbonne University, Computer Science, intervention for 8 hours per year.

Master 2: Laura Grigori, Course on High performance computing for numerical methods and data analysis, Master 2nd year, Mathematics & Applications, Sorbonne University, 24 hours per year.

Master 2: Frédéric Nataf, Course on Domain Decomposition Methods, Sorbonne University, 22h.

Supervision

L. Grigori, supervision of Phd students M. Beaupere, S. Badreddine, J. G. de Damas, E. Timsit, V. T. Nguyen

S. Hirstoaga co-supervised two M1 internships.

F. Nataf, supervision of Phd student A. Lechevallier.

Juries

L. Grigori, Phd thesis of Mathieu Vérité, Bordeaux University, rapportrice.

L. Grigori, Phd thesis of Antoine Bienvenu, chemistry, Sorbonne University, examinatrice, July 2022.

L. Grigori, Phd thesis of Auguste Olivry, ENS Lyon, June 2022, rapportrice.

Communication-optimal parallel and sequential QR and LU factorizations J. W. J. W. Demmel L. L. Grigori M. M. Hoemmen J. J. Langou SIAM Journal on Scientific Computing 2012 1 206-239 An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation V. Victorita Dolean P. Pierre Jolivet F. Frédéric Nataf 2015 SIAM An Additive Schwarz Method Type Theory for Lions's Algorithm and a Symmetrized Optimized Restricted Additive Schwarz Method R. R. Haferssas P. P. Jolivet F. F. Nataf SIAM J. Sci. Comput. 2017 39 4 A1345--A1365 New development in FreeFem++ F. F. Hecht J. Numer. Math. 2012 20 3-4 251--265 HPDDM: High-Performance Unified framework for Domain Decomposition methods, MPI-C++ library P. Pierre Jolivet F. Frédéric Nataf 2014 Adaptive Domain Decomposition method for Saddle Point problem in Matrix Form F. Frédéric Nataf P.-H. Pierre-Henri Tournier November 2019 Abstract robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps N. Nicole Spillane V. Victorita Dolean P. Patrice Hauret F. Frédéric Nataf C. Clemens Pechstein R. Robert Scheichl Numer. Math. 2014 126 4 741--770 Local Multiple Traces Formulation for Electromagnetics: Stability and Preconditioning for Smooth Geometries A. Alan Ayala X. Xavier Claeys P. Paul Escapil-Inchauspé C. F. Carlos F. Jerez-Hanckes Journal of Computational and Applied Mathematics October 2022 413 Randomized Gram-Schmidt process with application to GMRES O. Oleg Balabanov L. Laura Grigori SIAM Journal on Scientific Computing June 2022 44 3 A1450-A1474 Robust treatment of cross points in Optimized Schwarz Methods X. Xavier Claeys E. Emile Parolin Numerische Mathematik May 2022 151 2 405-442 Recent Advances in Adaptive Coarse Spaces and Availability in Open Source Libraries V. Victorita Dolean F. Frédéric Hecht P. Pierre Jolivet F. Frédéric Nataf P.-H. Pierre-Henri Tournier MathematicS In Action 2022 11 1 61-71 Recent advances in domain decomposition methods for large-scale saddle point problems F. Frédéric Nataf P.-H. Pierre-Henri Tournier Comptes Rendus. Mécanique 2022 350 S1 1-15 Three-dimensional finite-difference finite-element frequency-domain wave simulation with multi-level optimized additive Schwarz domain-decomposition preconditioner: A tool for FWI of sparse node datasets P.-H. Pierre-Henri Tournier P. Pierre Jolivet V. Victorita Dolean H. Hossein Aghamiry S. Stéphane Operto S. Sebastian Riffo Geophysics September 2022 87 5 1-84 Factorized structure of the long-range two-electron integrals tensor and its application in quantum chemistry S. Siwar Badreddine I. Igor Chollet L. Laura Grigori October 2022 Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures O. Oleg Balabanov M. Matthias Beaupère L. Laura Grigori V. Victor Lederer October 2022 How does the partition of unity influence SORAS preconditioner? M. Marcella Bonazzoli X. Xavier Claeys F. Frédéric Nataf P.-H. Pierre-Henri Tournier December 2022 Several ways to achieve robustness when solving wave propagation problems N. Niall Bootland V. Victorita Dolean P. Pierre Jolivet F. Frédéric Nataf S. S. Operto P.-H. Pierre-Henri Tournier November 2022 Two-level DDM preconditioners for positive Maxwell equations N. Niall Bootland V. Victorita Dolean F. Frédéric Nataf P.-H. Pierre-Henri Tournier November 2022 A Unified Framework for Double Sweep Methods for the Helmholtz Equation N. Nacime Bouziani F. Frédéric Nataf P.-H. Pierre-Henri Tournier October 2022 A Directional Equispaced interpolation-based Fast Multipole Method for oscillatory kernels I. Igor Chollet X. Xavier Claeys P. Pierre Fortin L. Laura Grigori February 2022 Numerical resolution of the inverse source problem for EEG using the quasi-reversibility method M. Marion Darbas J. Juliette Leblond J.-P. Jean-Paul Marmorat P.-H. Pierre-Henri Tournier 2022 Discrete moments models for Vlasov equations with non constant strong magnetic limit B. Bruno Després F. Frédérique Charles R. Ruiyang Dai S. A. Sever Adrian Hirstoaga January 2023 Analysis of staggered schemes for the linear wave equation with Coriolis source term preserving discrete geostrophic equilibriums on triangular meshes M.-H. Minh-Hieu Do V.-T. Van-Thanh Nguyen P. Pascal Omnes July 2022 Higher-order QR with tournament pivoting for tensor compression D. David Frenkiel L. Laura Grigori M. Matthias Beaupère February 2022 A Parareal algorithm for a highly oscillating Vlasov-Poisson system with reduced models for the coarse solving L. Laura Grigori S. A. Sever A Hirstoaga J. Julien Salomon December 2022 Parallel approximation of the exponential of Hermitian matrices J. Julien Salomon L. Lucas Perrin S.-M. Sidi-Mahmoud Kaber F. Frédéric Hecht A. Alain Plagne January 2023 Time-reversed absorbing condition: application to inverse problems F. F Assous M. M Kray F. F Nataf E. E Turkel Inverse Problems 2011 27 6 065003 Time-dependent wave splitting and source separation M. J. Marcus J. Grote M. Marie Kray F. Frédéric Nataf F. Franck Assous Journal of Computational Physics 2017 330 981 - 996 Nodal auxiliary space preconditioning in H (curl) and H (div) spaces R. Ralf Hiptmair J. Jinchao Xu SIAM Journal on Numerical Analysis 2007 45 6 2483--2509

				1 RHS	130 RHS
Solver	arith.	ortho.	$T_{f} (s)$	#it	$T_{s} (s)$	$T_{t o t} (s)$	#it	$T_{s} (s)$	$T_{t o t} (s)$
PARDISO	double	MGS	12.1	48	16.5	28.6	50	724.4	736.5
PARDISO	double	CGS	12.1	48	16.7	28.8	50	575.5	587.6
MUMPS $_{F R}$	double	MGS	11.3	48	17.2	28.5	50	654.9	666.2
MUMPS $_{F R}$	double	CGS	11.3	48	16.9	28.2	50	482.2	493.5
PARDISO	single	MGS	6.5	48	9.9	16.4	50	403.6	410.1
PARDISO	single	MGS	6.5	48	9.6	16.1	50	334.8	341.3
MUMPS $_{F R}$	single	MGS	6.2	48	8.9	15.1	50	345.8	352.0
MUMPS $_{F R}$	single	CGS	6.2	48	8.9	15.1	50	275.3	281.5
MUMPS $_{B L R}$	single	CGS	4.9	53	7.2	12.1	55	256.8	261.7

f(Hz)	#dof( $10^{6}$ )	#cores	#it	ovl	$T_{f} (s)$	$T_{s} (s)$	$T_{t o t} (s)$	$E_{w}$
2.5	21.4	60	26	3	52.5	24.8	77.3	1
2.5	21.4	168	30	3	13.7	8.9	22.6	1.222
3.75	67.5	360	40	3	21.9	18.6	40.5	1.003
3.75	67.5	660	45	3	10.8	11.5	22.3	0.994
3.75	67.5	1344	53	3	4.4	7.1	11.5	0.947
3.75	67.5	2380	59	3	2.5	4.8	7.3	0.842
5	153.5	875	61	3	20.9	26.0	46.9	0.811
5	153.5	1344	63	3	12.4	18.6	31.0	0.798
7.5	500.1	2450	98	4	26.7	60.8	87.5	0.506
7.5	500.1	4752	108	4	12.2	39.7	51.9	0.439
10	1160.6	9856	142	5	15.1	70.8	85.9	0.297