Over the past few decades, innumerable breakthroughs in science, engineering and society have been enabled by the development of high-performance computing (HPC) applications, algorithms and architectures. These powerful tools have enabled researchers to find computationally efficient solutions to some of the most challenging scientific questions and problems in medicine and biology, climate science, nanotechnology, energy and the environment, to name a few. These calculations belong to the field of model-driven computing.
Meanwhile, the advent of networking capabilities, IoT, next-generation sequencing, ... generates huge amounts of data that must be processed to extract knowledge and possible forecasts. These calculations are often referred to as data-driven calculations.
These two classes of challenges share common ground in terms of numerical techniques, which lies in the field of linear and multilinear algebra. They also share common bottlenecks related to the size of the mathematical objects that must be represented and operated on; these challenges attract growing attention from the computational science community.

In this context, the purpose of the concace project is to contribute to the design of novel numerical tools for model-driven and data-driven calculations arising from challenging academic and industrial applications. The solution of these challenging problems requires a multidisciplinary approach involving applied mathematics, computational and computer sciences. In applied mathematics, it essentially involves advanced numerical schemes, both in terms of numerical techniques and of data representation of the mathematical objects (e.g., compressed data, low-rank tensors 57, 64, 53, low-rank hierarchical matrices 55, 41). In computational science, it involves large-scale parallel heterogeneous computing and the design of highly composable algorithms. Through this approach, concace intends to contribute to all the steps that go from the design of new robust and accurate numerical schemes to the flexible implementations of the associated algorithms on large computers.
To address these research challenges, researchers from Inria, Airbus Central R&T and Cerfacs have decided to combine their skills and research efforts to create the Inria concace project team, which will allow them to cover the entire spectrum, from fundamental methodological concerns to full validations on challenging industrial test cases. Such a joint project will enable a real synergy between basic and applied research with complementary benefits to all the partners.
The main benefits for each partner are given below:

In addition to the members of these entities, two other external collaborators will be strongly associated: Jean-René Poirier, from the Laplace Laboratory at the University of Toulouse, and Oguz Kaya, from LISN (Laboratoire Interdisciplinaire des Sciences du Numérique) at the University of Saclay.

The scientific objectives described in Section 4 contain two main topics which cover numerical and computational methodologies. Each topic is composed of a methodological component and its validation counterpart, to fully assess the relevance, robustness and effectiveness of the proposed solutions. First, we address numerical linear and multilinear algebra methodologies for model- and data-driven scientific computing. Second, because there is no universal single solution but rather a large panel of alternatives combining many of the various building blocks, we also consider research activities in the field of composition of parallel algorithms and data distributions, to ease the investigation of this combinatorial problem toward the best algorithm for the targeted problem.

As a single but representative example of the model-driven problems
that the joint team will address, we can mention one encountered at Airbus that
is related to large aero-acoustic calculations. The reduction of noise produced
by aircraft during take-off and landing has a direct societal and environmental
impact on the populations (including citizen health) located around airports.
To comply with new noise regulation rules, novel developments must be undertaken
to preserve the competitiveness of the European aerospace industry. In order to
design and optimize new absorbing materials for acoustics and reduce the
perceived sound, one must be able to simulate the propagation of an acoustic
wave in an aerodynamic flow: The physical phenomenon at stake is
aero-acoustics. The complex and chaotic nature of fluid mechanics requires
simplifications in the models used. Today, the flow is considered non-uniform
only in a small part of the space (mainly in the jet flow of the engines),
which is meshed with volume finite elements; everywhere else the flow is
considered uniform, and the acoustic propagation is treated with surface
finite elements. This leads to the solution of a linear system with
dense and sparse parts, an atypical form for which there is no "classical"
solver available. We therefore have to work on the coupling of methods (direct
or iterative, dense or sparse, compressed or not, etc.), and to compose
different algorithms in order to be able to handle very large industrial cases.
While there are effective techniques to solve each part independently from one
another, there is no canonical, efficient solution for the coupled problem,
which has been much less studied by the community. Among the possible
improvements to tackle such a problem, hybridizing simulation and learning
represents an alternative which allows one to reduce the complexity by avoiding
as much as possible local refinements and therefore reduce the size of the problem.

Regarding data-driven calculation, climate data analysis is one of the
application domains that generate huge amounts of data, either in the form of
measurements or computation results. The ongoing effort of the climate
modeling and weather forecasting communities to pool digital environments,
including codes and models, leads the climate community to use finer models and
discretizations, generating an ever-growing amount of data. The analysis of these
data, mainly based on classical numerical tools with a strong involvement of
linear algebra ingredients, is facing new scalability challenges due to this
growing amount of data. Computed and measured data have intrinsic structures
that could be naturally exploited by low rank tensor representations to best
reveal the hidden structure of the data while addressing the scalability
problem. The close link with the CECI team at Cerfacs will provide us with the
opportunity to study novel numerical methodologies based on tensor calculation.
Contributing to a better understanding of the mechanisms governing climate
change would obviously have significant societal and economic impacts on
the population. This is just one illustration of a possible use of our work;
we could also have mentioned an ongoing collaboration in which our tools
will be used by a steel company to reduce the volume of data generated by
IoT devices before it is transferred to the cloud for analysis. The
methodological part described in Section 4 covers mostly two complementary
topics: the first in the field of numerical scientific computing and the second
at the core of computational sciences.

To sum up, for each of the methodological contributions, we aim to find at least one driving application, preferably related to a societal challenge, which will allow us to validate these methods and their implementations at full scale. The search for these applications will initially be carried out among those available at Airbus or Cerfacs, but the option of seeking them through collaborations outside the project will remain open. The ambition remains to develop generic tools whose implementations will be made accessible by releasing them in the public domain.

The methodological component of our proposal concerns the expertise for the design, as well as the efficient and scalable implementation, of highly parallel numerical algorithms. We intend to go from methodological studies designing novel numerical schemes up to full assessment at scale on real academic and industrial applications, thanks to advanced HPC implementations.

Our view of the research activity to be developed in Concace is to systematically assess the methodological and theoretical developments in real-scale calculations, mostly through applications under investigation by the industrial partners (namely Airbus Central R&T and Cerfacs).

We first consider in Section 4.1 topics concerning parallel linear and multilinear algebra techniques that currently appear as promising approaches to tackle huge problems, both in size and in dimension, on large numbers of cores. We highlight linear problems (linear systems or eigenproblems) because, in many large-scale applications, they are the main bottleneck and the most computationally intensive numerical kernels. The second research axis, presented in Section 4.2, is related to the challenge faced when advanced parallel numerical toolboxes need to be composed to easily find the best-suited solution, from both a numerical and a parallel-performance point of view.

In short, the research activity will rely on two scientific pillars: the first dedicated to the development of new mathematical methods for linear and multilinear algebra (for both model-driven and data-driven calculations), the second to parallel computational methods enabling the packages implementing the methods developed as outcome of the first pillar to be easily composed in a parallel framework. The mathematical methods from the first pillar can be composed mathematically; the challenge will be to do so on large parallel computers thanks to the outcome of the second pillar. We will validate them on real applications and at scale (problem and platform) in close collaboration with application experts.

At the core of many simulations, one has to solve a linear algebra problem defined in a vector space that involves linear operators, vectors and scalars, the unknowns usually being vectors or scalars, e.g., for the solution of a linear system or an eigenvalue problem. For many years, in particular in model-driven simulations, these problems have been reformulated in the classical matrix formalism, possibly unfolding the spaces where the vectors naturally live (typically 3D PDEs) to end up with classical vectors; the other dimensions (e.g., time in a time-dependent 3D PDE) are dealt with in a problem-specific fashion, as unfolding them would lead to matrices and vectors that are too large. The concace research program on numerical methodology intends to study novel numerical algorithms, continuing to address the mainstream approaches relying on the classical matrix formalism, but also investigating alternatives where the structure of the underlying problem is preserved and all dimensions are treated equally. This latter research activity mostly concerns linear algebra in tensor spaces. In terms of algorithmic principles, we will lay an emphasis on hierarchy as a unifying principle for the numerical algorithms, the data representation and processing (including the current hierarchy of arithmetics) and the parallel implementation towards scalability.
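To make the unfolding discussion concrete, the following NumPy sketch (with hypothetical sizes) contrasts vectorizing a small 3D field with acting on it dimension by dimension through a Kronecker-structured operator:

```python
import numpy as np

# Hypothetical 3D field on an n x n x n grid (e.g., one snapshot of a 3D PDE).
n = 8
field = np.random.default_rng(0).standard_normal((n, n, n))

# Classical matrix formalism: unfold ("vectorize") the field into a single
# vector of length n^3, losing the explicit 3D structure.
vec = field.reshape(-1)
assert vec.shape == (n**3,)

# Structure-preserving alternative: keep the tensor as-is and apply a 1D
# operator A along each dimension. This is the action of the Kronecker
# product (A x A x A) on vec without ever forming the n^3 x n^3 matrix.
A = np.random.default_rng(1).standard_normal((n, n))
out = np.einsum('ia,jb,kc,abc->ijk', A, A, A, field)

# Consistency check against the unfolded (Kronecker) formulation.
K = np.kron(np.kron(A, A), A)
assert np.allclose(K @ vec, out.reshape(-1))
```

The structure-preserving form never assembles the n^3 x n^3 matrix; the dense Kronecker product is built here only to verify consistency on a toy size.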

As an extension of our past and ongoing research activities, we will continue our work on numerical linear algebra for model-driven applications that rely on classical vector spaces defined on

The main numerical algorithms we are interested in are:

In that context, we will consider the benefit of using hybridization between simulation and learning in order to reduce the complexity of classical approaches by diminishing the problem size or improving preconditioning techniques. In a longer-term perspective, we will also conduct an active technological watch with respect to quantum computing, to better understand how such an advanced computing technology can be synergized with classical scientific computing.

This work will mostly address linear algebra problems defined in high-dimensional spaces, as they might appear either in model-driven simulations or in data-driven calculations. In particular, we will be interested in tensor spaces, where the intrinsic mathematical structure of the objects has to be exploited to design efficient and effective numerical techniques.

The main numerical algorithms we are interested in are:

Novel techniques for problems of large size and large dimension tend to reduce the memory footprint and CPU consumption through data compression, such as low-rank approximations (hierarchical matrices for dense and sparse calculations, tensor decompositions 49, 66, 61), or to speed up the algorithms (fast multipole method, randomized algorithms 56, 62, 67, 39) in order to reduce the time and energy to solution. Because of the compression, the genuine data are represented with lower accuracy, possibly in a hierarchical manner. Understanding the impact of this lower-precision data representation through the entire algorithm is an important issue for developing robust, “accurate” and efficient numerical schemes for current and emerging computing platforms, from commodity laptops to supercomputers. Mastering the trade-off between performance and accuracy will be part of our research agenda 43, 46.
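As a minimal illustration of this accuracy/compression trade-off (not the project's hierarchical formats), the following sketch compresses a matrix with rapidly decaying singular values, as typical of BEM off-diagonal blocks, by a truncated SVD at a prescribed tolerance:

```python
import numpy as np

# Build a test matrix with an exponentially decaying spectrum (illustrative).
rng = np.random.default_rng(0)
m = 200
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = 10.0 ** -np.arange(m, dtype=float)       # singular values 1, 1e-1, 1e-2, ...
A = (U * s) @ V.T

tol = 1e-8
u, sv, vt = np.linalg.svd(A)
rank = int(np.sum(sv > tol * sv[0]))          # numerical rank at the tolerance
A_lr = (u[:, :rank] * sv[:rank]) @ vt[:rank]  # compressed representation

# The low-rank form stores 2*m*rank + rank entries instead of m*m, while the
# relative error stays at the level of the prescribed tolerance.
rel_err = np.linalg.norm(A - A_lr) / np.linalg.norm(A)
assert rank < m and rel_err <= 10 * tol
```

The same accuracy parameter then propagates through every algorithm that consumes the compressed blocks, which is precisely the perturbation whose effect must be understood.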

Because the low-precision data representation can have diverse origins, this research activity will naturally cover multi-precision arithmetic calculations, in which the data perturbation comes entirely from the data encoding, representation and calculation in IEEE (or more exotic Nvidia GPU or Google TPU) floating-point formats. This will result in variable-accuracy calculations. This general framework will also enable us to address soft error detection 34 and to study possible mitigation schemes to design resilient algorithms.
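A small sketch of the effect at stake, assuming nothing beyond standard IEEE types: the same linear solve carried out with data stored in float32 and float64, with the normwise backward error evaluated in double precision:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test system
b = rng.standard_normal(n)

for dtype in (np.float32, np.float64):
    Ad, bd = A.astype(dtype), b.astype(dtype)     # data encoded at lower precision
    x = np.linalg.solve(Ad, bd)
    # Normwise backward error, always evaluated in double precision.
    bwd = np.linalg.norm(b - A @ x.astype(np.float64)) / (
        np.linalg.norm(A) * np.linalg.norm(x) + np.linalg.norm(b))
    # The backward error sits near the unit round-off of the working precision.
    assert bwd < 1e3 * np.finfo(dtype).eps
```

The backward error tracks the unit round-off of each working precision, which is the quantity that variable-accuracy analyses aim to control through the entire algorithm.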

A major breakthrough for exploiting multicore machines 42 is based on a data format and computational technique originally used in an out-of-core context 54. This is itself a refinement of a broader class of numerical algorithms, namely "updating techniques", that were not originally developed with specific hardware considerations in mind. This historical anecdote perfectly illustrates the need to separate data representation, algorithmic and architectural concerns when developing numerical methodologies. In the recent past, we have contributed to the study of the sequential task flow (STF) programming paradigm, which enabled us to abstract the complexity of the underlying computer architecture 32, 33, 31.
In the concace project, we intend to go further by abstracting the numerical algorithms and their dedicated data structures. We strongly believe that combining these two abstractions will allow us to easily compose toolbox algorithms and data representations in order to explore combinatorial alternatives towards numerical and parallel computational efficiency. We have demonstrated this potential on domain decomposition methods for solving sparse linear systems arising from the discretisation of PDEs, which we have implemented in the maphys++ parallel package.

Regarding the abstraction of the target architecture in the design of numerical algorithms, the STF paradigm has been shown to significantly reduce the difficulty of programming these complex machines while ensuring high computational efficiency. However, some challenges remain. The first major difficulty is related to the scalability of the model at large scale where handling the full task graph associated with the STF model becomes a severe bottleneck. Another major difficulty is the inability (at a reasonable runtime cost) to efficiently handle fine-grained dynamic parallelism, such as numerical pivoting in the Gaussian elimination where the decision to be made depends on the outcome of the current calculation and cannot be known in advance or described in a task graph. These two challenges are the ones we intend to study first.

With respect to the second ingredient, namely the abstraction of the algorithms and data representation, we will also explore whether we can provide additional separation of concerns beyond that offered by a task-based design. As a seemingly simple example, we will investigate the possibility of abstracting the matrix-vector product, a basic kernel at the core of many numerical linear algebra methods, to cover the case of the fast multipole method (FMM, at the core of the ScalFMM library). The FMM is mathematically a block matrix-vector product where some of the operations involving the extra-diagonal blocks with hierarchical structure are compressed analytically. Such a methodological step forward will consequently allow the factorization of a significant part of codes (so far completely independent, because no bridge had been made upstream), including in particular the ones dealing with
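This separation of concerns can be sketched as follows; all class names and the block low-rank structure are illustrative stand-ins for a hierarchical or FMM operator, not the project's API:

```python
import numpy as np

# An algorithm written only against an abstract apply() method runs unchanged
# on a plain dense matrix or on an operator whose off-diagonal blocks are
# stored in compressed (factored) form.
class DenseOp:
    def __init__(self, A):
        self.A = A
    def apply(self, x):
        return self.A @ x

class BlockLowRankOp:
    """2x2 symmetric block operator whose off-diagonal block U @ V.T is kept
    factored, standing in for a hierarchical/FMM representation."""
    def __init__(self, D1, D2, U, V):
        self.D1, self.D2, self.U, self.V = D1, D2, U, V
    def apply(self, x):
        n = self.D1.shape[0]
        x1, x2 = x[:n], x[n:]
        y1 = self.D1 @ x1 + self.U @ (self.V.T @ x2)   # compressed off-diagonal
        y2 = self.V @ (self.U.T @ x1) + self.D2 @ x2
        return np.concatenate([y1, y2])

def power_iteration(op, x0, iters=500):
    """Dominant eigenvalue estimate using only op.apply()."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        y = op.apply(x)
        x = y / np.linalg.norm(y)
    return x @ op.apply(x)           # Rayleigh quotient

rng = np.random.default_rng(0)
n, r = 30, 3
D1 = np.diag(rng.uniform(1.0, 2.0, n))
D2 = np.diag(rng.uniform(1.0, 2.0, n))
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
A = np.block([[D1, U @ V.T], [V @ U.T, D2]])
x0 = rng.standard_normal(2 * n)

# Both representations implement the same operator action...
assert np.allclose(BlockLowRankOp(D1, D2, U, V).apply(x0), A @ x0)
# ...so the same algorithm runs unchanged on either of them.
lam = power_iteration(BlockLowRankOp(D1, D2, U, V), x0)
lam_ref = power_iteration(DenseOp(A), x0)
assert abs(lam - lam_ref) < 1e-6
```

The algorithm never learns which representation it is driving; swapping the dense operand for the compressed one is a one-line change at the call site.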

We intend to strengthen our engagement in reproducible and open science. Consequently, we will continue our joint effort to ensure consistent deployment of our parallel software; this will contribute to improving its impact on academic and industrial users. The software engineering challenge is related to the increasing number of software dependencies induced by the desired capability of combining the functionality of different numerical building blocks; e.g., for a domain decomposition solver (such as maphys++) that requires advanced iterative schemes (such as those provided by fabulous) as well as state-of-the-art direct methods (such as pastix, mumps, or qr_mumps), deploying the resulting software stack can become tedious 36.


We have a major application domain in acoustic simulations, provided by Airbus CR&T, and a few more through collaborations in the context of ongoing projects, including plasma simulation (ESA contract and ANR Maturation), electric device design (ANR TensorVim) and a nanoscale simulation platform (ANR Diwina).

This domain is addressed in the context of a long-term collaboration with Airbus research centers.
Wave propagation phenomena intervene in many different aspects of systems design at Airbus. They drive the level of acoustic vibrations that mechanical components have to sustain, a level that one may want to diminish for comfort reasons (in the case of aircraft passengers, for instance) or for safety reasons (to avoid damage in the case of a payload in a rocket fairing at take-off). Numerical simulation of these phenomena plays a central part in the upstream design phase of any such project 44. Airbus Central R&T has developed over the last decades an in-depth knowledge of the Boundary Element Method (BEM) for the simulation of wave propagation in homogeneous media in the frequency domain. To tackle heterogeneous media (such as jet engine flows, in the case of acoustic simulation), these BEM approaches are coupled with volume finite elements (FEM). We end up with the need to solve large (several million unknowns) linear systems of equations composed of a dense part (coming from the BEM domain) and a sparse part (coming from the FEM domain). Various parallel solution techniques are available today, mixing tools created by the academic world (such as the Mumps and Pastix sparse solvers) as well as parallel software tools developed in-house at Airbus (dense solver SPIDO, multipole solver,

Most of the software packages we develop are deployed using Guix-HPC 22.

ScalFMM is a software library to simulate N-body interactions using the Fast Multipole Method. The library offers two methods to compute interactions between bodies when the potential decays like 1/r. The first method is the classical FMM based on spherical harmonic expansions; the second is the Black-Box method, a kernel-independent formulation (introduced by E. Darve at Stanford). With this method, we can now easily add new non-oscillatory kernels to our library. For the classical method, two approaches are used to decrease the complexity of the operators: we consider either a matrix formulation that allows us to use BLAS routines, or rotation matrices to speed up the M2L operator.

ScalFMM intends to offer all the functionalities needed to perform large parallel simulations while enabling easy customization of the simulation components: kernels, particles and cells. It works in parallel in a shared/distributed memory model using OpenMP and MPI. The software architecture has been designed with two major objectives: being easy to maintain and easy to understand. There are two main parts: the management of the octree and the parallelization of the method's kernels. This architecture allows us to easily add new FMM algorithms or kernels and new parallelization paradigms.

Version 3.0 of the library is a partial rewriting of version 2.0 in modern C++ (C++17) to increase the genericity of the approach. This version is also the basic framework for studying numerical and parallel composability within Concace.

We are interested in the direct solution of very large linear systems composed of both sparse and dense parts. Coupled sparse/dense systems appear in various physical problems, such as the simulation of acoustic wave propagation around aircraft. To produce a physically realistic result, the number of unknowns in the system can be extremely high, making its handling a real challenge. Thanks to the building blocks provided by state-of-the-art sparse and dense solvers, we can compose a coupled sparse/dense solver. To reduce the computation time and memory consumption of direct methods, some solvers implement advanced features such as numerical compression, out-of-core computation and distributed-memory parallelism. These functionalities can easily be used within the individual building blocks, but this is not trivial at the articulation between the sparse solver blocks and the dense solver blocks, whose programming interfaces (APIs) were not designed for this purpose. We have previously proposed solver coupling schemes that still allow the use of these well-optimized solvers with their advanced functionalities. The idea is to apply the existing APIs to carefully selected submatrices of the coupled systems so as to take full advantage of numerical compression and out-of-core computation in both shared and distributed memory. Although capable of handling considerably larger coupled systems compared to the state of the art, these schemes remain sub-optimal due to intrinsic design limitations. We therefore explore an alternative coupling scheme based on task-based direct solvers that share the same execution engine. The aim is to improve composability and facilitate data passing between the sparse and dense solvers for more efficient computation. Before considering the integration of this approach into the complex code base of a fully featured community solver, we implemented a proof of concept without some of the advanced features.
A preliminary experimental study enabled us to validate our prototype and demonstrate its competitiveness against other approaches.
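As a rough illustration of such a coupling (not the proof-of-concept itself), the following sketch composes a sparse direct factorization and a dense solve through a Schur complement on the dense unknowns; block names and sizes are hypothetical:

```python
import numpy as np
from scipy.sparse import random as sprandom, identity
from scipy.sparse.linalg import splu

rng = np.random.default_rng(0)
ns, nd = 300, 50                        # sparse (FEM-like) / dense (BEM-like) sizes
Ass = (sprandom(ns, ns, density=0.02, random_state=0)
       + 10 * identity(ns)).tocsc()     # diagonally dominant sparse block
Asd = rng.standard_normal((ns, nd)) * 0.01
Ads = rng.standard_normal((nd, ns)) * 0.01
Add = rng.standard_normal((nd, nd)) + 10 * np.eye(nd)
b = rng.standard_normal(ns + nd)

# 1) Factor the sparse block with a sparse direct solver.
lu = splu(Ass)
# 2) Form the dense Schur complement S = Add - Ads Ass^{-1} Asd.
S = Add - Ads @ lu.solve(Asd)
# 3) Solve the dense reduced system, then back-substitute in the sparse part.
xd = np.linalg.solve(S, b[ns:] - Ads @ lu.solve(b[:ns]))
xs = lu.solve(b[:ns] - Asd @ xd)

x = np.concatenate([xs, xd])
A = np.block([[Ass.toarray(), Asd], [Ads, Add]])
assert np.linalg.norm(A @ x - b) <= 1e-8 * np.linalg.norm(b)
```

Each step reuses an existing solver as a black box through its standard API, which is exactly the articulation point where compression and out-of-core features become hard to exploit across the boundary.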

For more details on this work we refer to 17.

Dense matrix multiplication involving a symmetric input matrix (SYMM) is implemented in reference distributed-memory codes with the same data distribution as its general analogue (GEMM). We show that, when the symmetric matrix is dominant, such a 2D block-cyclic (2D BC) scheme leads to an arithmetic intensity (AI) of SYMM lower than that of GEMM by a factor of 2. We propose alternative data distributions that preserve the memory benefit of SYMM of storing only half of the matrix while achieving up to the same AI as GEMM. We also show that, when we can afford the same memory footprint as GEMM, SYMM can achieve a higher AI. We propose a task-based design of SYMM independent of the data distribution. This design allows for a scalable A-stationary SYMM with which all the discussed data distributions, even very irregular ones, can be easily assessed. We have integrated the resulting code in a dimension reduction algorithm involving a randomized singular value decomposition dominated by SYMM. An experimental study shows a compelling impact on performance.
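The storage argument can be illustrated in a few lines: assuming only the lower triangle of the symmetric operand is kept, the product is reassembled from the stored half (a NumPy sketch, not the distributed-memory kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 128, 16
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # symmetric operand
B = rng.standard_normal((n, k))

L = np.tril(A)                          # stored half: lower triangle + diagonal
# A = L + L^T - diag(L), so A @ B is formed from the stored half only.
C = L @ B + L.T @ B - np.diag(np.diag(L)) @ B

assert np.allclose(C, A @ B)
```

Storing roughly n(n+1)/2 entries instead of n^2 is the memory benefit that the alternative data distributions aim to keep while restoring the arithmetic intensity of GEMM.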

For more details on this work we refer to 15.

Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense general matrix multiplication, the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.

For more details on this work we refer to 13.

With the rise of multi-core processors with large core counts, the need for shared-memory reductions that perform efficiently on many cores is increasingly pressing. In this paper, we propose a combined reduction/barrier method that uses SIMD instructions to merge barrier signaling with the reads and writes of the reduction values, minimizing memory/cache traffic between cores and thus reducing barrier latency. We compare different barrier and reduction methods on three multi-core processors and show that the proposed combined barrier/reduction method is respectively 4 and 3.5 times faster than the GCC 11.1 and Intel 21.2 OpenMP 4.5 reductions.

For more details on this work we refer to 14.

We consider the solution of linear systems with tensor product structure using a GMRES algorithm.
To cope with the computational complexity in large dimension both in terms of floating point operations and memory requirement, our algorithm is based on low-rank tensor representation, namely the Tensor Train format.
In a backward error analysis framework, we show how the tensor approximation affects the accuracy of the computed solution.
With the backward perspective, we investigate the situations where the

In the framework of tensor spaces, we consider orthogonalization kernels to generate an orthogonal basis of a tensor subspace from a set of linearly independent tensors. In particular, we investigate numerically the loss of orthogonality of six orthogonalization methods, namely Classical and Modified Gram-Schmidt with (CGS2, MGS2) and without (CGS, MGS) re-orthogonalization, the Gram approach, and the Householder transformation. To tackle the curse of dimensionality, we represent tensors with low-rank approximations using the Tensor Train (TT) formalism, and we introduce recompression steps in the standard algorithm outline through the TT-rounding method at a prescribed accuracy. After describing the algorithm structure and properties, we illustrate numerically that the theoretical bounds for the loss of orthogonality from classical matrix computation round-off analysis are maintained, with the unit round-off replaced by the TT-rounding accuracy. The computational analysis of each orthogonalization kernel, in terms of memory requirement and computational complexity measured as a function of the number of TT-rounding operations, which happen to be the most expensive ones, completes the study.
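The loss-of-orthogonality behaviour has a well-known matrix-setting analogue, sketched below with plain float64 round-off standing in for the TT-rounding accuracy (sizes and conditioning are illustrative):

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: project against all previous vectors at once."""
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: immediately update all remaining vectors."""
    Q = A.copy()
    for j in range(A.shape[1]):
        Q[:, j] /= np.linalg.norm(Q[:, j])
        Q[:, j + 1:] -= np.outer(Q[:, j], Q[:, j] @ Q[:, j + 1:])
    return Q

rng = np.random.default_rng(0)
n, m = 400, 30
U, _ = np.linalg.qr(rng.standard_normal((n, m)))
W, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = (U * 10.0 ** -np.linspace(0, 8, m)) @ W   # condition number about 1e8

# Loss of orthogonality ||I - Q^T Q|| for each kernel.
loss = {f.__name__: np.linalg.norm(np.eye(m) - f(A).T @ f(A)) for f in (cgs, mgs)}
# CGS loses orthogonality like kappa^2 * u, MGS only like kappa * u.
assert loss['mgs'] < 1e-4 < loss['cgs']
```

In the TT setting studied above, the same bounds are observed with the unit round-off u replaced by the prescribed TT-rounding accuracy.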

This work was presented at two international conferences 28, 29 from different scientific communities.

For more details on this work we refer to the revised version of the scientific report 48 to be published.

In recent years, scientific machine learning, utilizing deep learning methodologies, has found widespread application in the fields of scientific computing and computational engineering. Nevertheless, while these data-driven deep learning solvers can be highly effective once appropriately trained, they often yield solutions of limited accuracy, and the computational expenses incurred during the training phase can be prohibitively high. In this talk, we first present the details of training various learning solvers, incorporating different neural network architectures, for solving the heterogeneous Helmholtz equation. Some mathematical ingredients from classical iterative solvers are incorporated into the training phase to enhance robustness and speed. Moreover, once the neural network solvers are adequately trained, their inference can be applied as a nonlinear preconditioner in classical subspace methods, like the flexible GMRES or flexible FOM methods. This presentation demonstrates the efficiency of employing neural networks as preconditioners and showcases the evident advantages of these neural-network-preconditioned approaches: they outperform both the newly emerging deep neural network methods and the classical subspace methods in both computational efficiency and solution accuracy.

For more details on this work we refer to the revised versions of the scientific reports 27, 30.

The discretization of spatial operators using boundary element techniques leads to dense linear systems. The representation of full matrices in

Multilevel estimators aim at reducing the variance of Monte Carlo statistical estimators by combining samples generated with simulators of different costs and accuracies. In particular, the recent work of Schaden and Ullmann (2020) on the multilevel best linear unbiased estimator (MLBLUE) introduces a framework unifying several multilevel and multifidelity techniques. The MLBLUE is reintroduced here using a variance minimization approach rather than the regression approach of Schaden and Ullmann. We then discuss possible extensions of the scalar MLBLUE to a multidimensional setting, i.e. from the expectation of scalar random variables to the expectation of random vectors. Several estimators of increasing complexity are proposed: a) multilevel estimators with scalar weights, b) with element-wise weights, c) with spectral weights and d) with general matrix weights. The computational cost of each method is discussed. We finally extend the MLBLUE to the estimation of second-order moments in the multidimensional case, i.e. to the estimation of covariance matrices. The multilevel estimators proposed are e) a multilevel estimator with scalar weights and f) one with element-wise weights. In large-dimension applications such as data assimilation for geosciences, the latter estimator is computationally unaffordable. As a remedy, we also propose g) a multilevel covariance matrix estimator with optimal multilevel localization, inspired by the optimal localization theory of Ménétrier and Auligné (2015).
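In the spirit of these multilevel estimators (though far simpler than the MLBLUE itself), a toy two-level unbiased estimator with a cheap surrogate model can be sketched as follows; the models and sample sizes are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fine(x):
    return np.sin(x) + 0.01 * x**2      # "expensive" high-fidelity simulator

def coarse(x):
    return np.sin(x)                    # cheap, strongly correlated surrogate

n_coarse, n_fine = 200_000, 2_000
x0 = rng.standard_normal(n_coarse)      # many cheap evaluations
x1 = rng.standard_normal(n_fine)        # few coupled fine/coarse evaluations

# Unbiased two-level estimator: E[coarse] + E[fine - coarse].
est = coarse(x0).mean() + (fine(x1) - coarse(x1)).mean()

# For X ~ N(0,1): E[sin(X)] = 0 and E[0.01 X^2] = 0.01, so E[fine(X)] = 0.01.
assert abs(est - 0.01) < 1e-2
```

Most of the sampling effort goes to the cheap model, while the low-variance correction term restores unbiasedness; the MLBLUE generalizes this idea by optimally weighting an arbitrary collection of such model pairs.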

For more details on this work we refer to the revised version of the scientific report 24
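In the scalar case, the variance-minimization view of the MLBLUE is easy to state: given several unbiased estimators of the same mean with covariance matrix Σ, the weights minimizing the variance subject to summing to one are w = Σ⁻¹1 / (1ᵀΣ⁻¹1). A toy NumPy sketch (illustrative, not the paper's implementation):

```python
import numpy as np

def blue_weights(Sigma):
    """Weights of the best linear unbiased combination of unbiased
    estimators with covariance Sigma: minimize w^T Sigma w subject
    to sum(w) = 1, so that the combination stays unbiased."""
    ones = np.ones(Sigma.shape[0])
    s = np.linalg.solve(Sigma, ones)
    return s / (ones @ s)

# Two unbiased estimators of the same mean: a cheap/noisy one and an
# accurate one, positively correlated (as in a multilevel hierarchy).
Sigma = np.array([[4.0, 0.6],
                  [0.6, 1.0]])
w = blue_weights(Sigma)
var_blue = w @ Sigma @ w    # variance of the combined (BLUE) estimator
```

By construction, the BLUE variance is never larger than that of the best individual estimator (here 1.0).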

We investigate the use of multilevel Monte Carlo (MLMC) methods for estimating the expectation of discretized random fields. Specifically, we consider a setting in which the input and output vectors of the numerical simulators have inconsistent dimensions across the multilevel hierarchy. This requires the introduction of grid transfer operators borrowed from multigrid methods. Starting from a simple 1D illustration, we demonstrate numerically that the resulting MLMC estimator degrades the estimation of high-frequency components of the discretized expectation field compared to a Monte Carlo (MC) estimator. By adapting mathematical tools initially developed for multigrid methods, we perform a theoretical spectral analysis of the MLMC estimator of the expectation of discretized random fields, in the specific case of linear, symmetric and circulant simulators. This analysis provides a spectral decomposition of the variance into contributions associated with each scale component of the discretized field. We then propose improved MLMC estimators using a filtering mechanism similar to the smoothing process of multigrid methods. The filtering operators improve the estimation of both the small- and large-scale components of the variance, resulting in a reduction of the total variance of the estimator. These improvements are quantified for the specific class of simulators considered in our spectral analysis. The resulting filtered MLMC (F-MLMC) estimator is applied to the problem of estimating the discretized variance field of a diffusion-based covariance operator, which amounts to estimating the expectation of a discretized random field. The numerical experiments support the conclusions of the theoretical analysis even with non-linear simulators, and demonstrate the improvements brought by the proposed F-MLMC estimator compared to both a crude MC and an unfiltered MLMC estimator.

For more details on this work we refer to the revised version of the scientific report 23
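The inconsistent-dimension setting can be sketched with a two-level toy example in 1D, where a prolongation operator (linear interpolation) maps coarse-grid fields to the fine grid before they enter the telescopic sum. The toy simulator and all names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def field(xi, n):
    """Toy 'simulator': a random field sampled on an n-point grid."""
    x = np.linspace(0.0, 1.0, n)
    return np.sin(2.0 * np.pi * (x + xi))

def prolong(vc, nf):
    """Grid transfer (prolongation) by linear interpolation."""
    xc = np.linspace(0.0, 1.0, len(vc))
    return np.interp(np.linspace(0.0, 1.0, nf), xc, vc)

def mlmc_mean_field(n_coarse, n_fine, N0, N1, rng):
    """Two-level MLMC estimator of E[field] on the fine grid:
    a cheap coarse-level term plus a paired correction term."""
    # level 0: many cheap coarse samples, prolongated to the fine grid
    level0 = np.mean([prolong(field(rng.uniform(), n_coarse), n_fine)
                      for _ in range(N0)], axis=0)
    # level 1: few paired samples of (fine - prolongated coarse)
    corr = []
    for _ in range(N1):
        xi = rng.uniform()              # same random input on both grids
        corr.append(field(xi, n_fine) - prolong(field(xi, n_coarse), n_fine))
    return level0 + np.mean(corr, axis=0)

rng = np.random.default_rng(0)
est = mlmc_mean_field(n_coarse=17, n_fine=33, N0=4000, N1=200, rng=rng)
```

Since the shift xi is uniform over one period, the true mean field is identically zero, so `est` directly exposes the estimator's fluctuations on each grid point.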

Monte Carlo (MC) sampling is a popular method for estimating the statistics (e.g. expectation and variance) of a random variable. Its slow convergence has led to the emergence of advanced techniques to reduce the variance of the MC estimator for the outputs of computationally expensive solvers. The control variates (CV) method corrects the MC estimator with a term derived from auxiliary random variables that are highly correlated with the original random variable. These auxiliary variables may come from surrogate models. Such a surrogate-based CV strategy is extended here to the multilevel Monte Carlo (MLMC) framework, which relies on a sequence of levels corresponding to numerical simulators with increasing accuracy and computational cost. MLMC combines output samples obtained across levels into a telescopic sum of differences between MC estimators for successive fidelities. In this paper, we introduce three multilevel variance reduction strategies that rely on surrogate-based CV and MLMC. MLCV is presented as an extension of CV where the correction terms devised from surrogate models for simulators of different levels add up. MLMC-CV improves the MLMC estimator by using a CV based on a surrogate of the correction term at each level. Further variance reduction is achieved by using the surrogate-based CVs of all the levels in the MLMC-MLCV strategy. Alternative solutions that reduce the subset of surrogates used for the multilevel estimation are also introduced. The proposed methods are tested on a test case from the literature consisting of a spectral discretization of an uncertain 1D heat equation, where the statistic of interest is the expected value of the integrated temperature along the domain at a given time. The results are assessed in terms of the accuracy and computational cost of the multilevel estimators, depending on whether the construction of the surrogates, and the associated computational cost, precede the evaluation of the estimator.
It was shown that when the lower fidelity outputs are strongly correlated with the high-fidelity outputs, a significant variance reduction is obtained when using surrogate models for the coarser levels only. It was also shown that taking advantage of pre-existing surrogate models proves to be an even more efficient strategy.

For more details on this work we refer to the revised version of the scientific report 25
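The basic surrogate-based CV mechanism used as a building block above can be sketched in a few lines: the MC estimator of E[f(X)] is corrected with a cheap surrogate g whose expectation is known, using the optimal weight α = cov(f, g)/var(g). A toy NumPy illustration (ours, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.exp(x)            # "expensive" quantity of interest
g = lambda x: 1.0 + x + x**2 / 2   # cheap surrogate (truncated Taylor series)
Eg = 1.5                           # known surrogate mean for x ~ N(0, 1)

x = rng.standard_normal(2000)
fx, gx = f(x), g(x)

alpha = np.cov(fx, gx)[0, 1] / np.var(gx, ddof=1)   # optimal CV weight
cv_est = fx.mean() - alpha * (gx.mean() - Eg)       # corrected estimator
```

The variance of the CV estimator is (1 - ρ²) times that of the crude MC estimator, where ρ is the correlation between f and g; the more faithful the surrogate, the larger the reduction.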

Probabilistic inference in high-dimensional continuous (or hybrid) domains is a challenging problem typically addressed through discretization, sampling, or reliance on often naive parametric assumptions. The drawbacks of these methods are well-known: slow computational speeds and/or highly inaccurate results.

This paper introduces a novel deterministic and general inference algorithm designed for hybrid Bayesian networks featuring both discrete and continuous variables. The algorithm avoids the discretization of continuous densities into histograms by employing quadrature rules to compute continuous integrals, thus transforming the process of marginalizing continuous random variables into summations. These summations are subsequently computed using classical sum-product algorithms within an auxiliary discrete Bayesian network, appropriately constructed for this purpose.

Numerous experiments are conducted using either the conditional linear Gaussian model for reference, or non-Gaussian models for the sake of generality. The algorithm shows remarkable performance in both speed and accuracy when compared with discretization, kernel smoothing or Gaussian assumptions. This establishes the algorithm’s efficacy across a spectrum of scenarios and proves its potential as a robust tool for hybrid Bayesian network inference.
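The core trick of replacing a continuous marginalization by a finite summation can be illustrated with a one-node example: marginalizing a standard Gaussian parent X out of P(Y | X) with Gauss-Hermite quadrature. This toy sketch is ours, not the paper's algorithm:

```python
import numpy as np

def marginalize_gaussian(cond_prob, npts=20):
    """Approximate E_{X~N(0,1)}[cond_prob(X)] by Gauss-Hermite quadrature,
    turning the continuous marginalization into a weighted summation."""
    t, w = np.polynomial.hermite.hermgauss(npts)  # nodes/weights for e^{-t^2}
    x = np.sqrt(2.0) * t                          # change of variables to N(0,1)
    return np.sum(w * cond_prob(x)) / np.sqrt(np.pi)

# P(Y=1 | X=x) with a logistic link: the marginal P(Y=1) becomes a sum,
# which a discrete sum-product algorithm can then handle.
p_y1 = marginalize_gaussian(lambda x: 1.0 / (1.0 + np.exp(-x)))
```

By the symmetry of N(0, 1) and of the logistic function, the exact marginal here is 0.5, which the quadrature recovers to machine precision.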

Some of the ongoing PhD theses are developed within bilateral contracts with industry for PhD advisory, such as

In addition, two post-docs, namely Maksym Shpakovych and Marvin Lasserre, are funded by the "plan de relance".

There is a continuous deployment of sensor devices in industrial manufacturing plants to monitor the production processes. Beyond the real-time monitoring of the infrastructures, the huge amount of data collected can be further exploited to forecast, for instance, the aging or failure of some production tools using machine learning techniques. Classically, this data analysis is performed off-line using a cloud-based service, and transferring the data from the production site to the processing site on the cloud is a major bottleneck.

This project aims to design a highly efficient and robust parallel non-linear dimensionality reduction approach for a generic cloud-based Industrial Internet of Things (IIoT) data processing system, to reduce the volume of data transferred without compromising the accuracy of the target machine learning tasks. An interdisciplinary collaboration that addresses all the relevant issues from numerical linear algebra, parallel processing, machine learning, and IIoT systems is required to achieve this goal.

The project's main objective is to develop a novel eigendecomposition approach based on contour integrals and recycling Krylov subspaces to improve the robustness and efficiency of non-linear dimensionality reduction techniques. The resulting non-linear dimensionality reduction technique will then be used to reduce the size of time-dependent IIoT data, decreasing data transfer costs without compromising the overall machine learning accuracy on the cloud side. To achieve the main objective of the project, partners will accomplish the following specific objectives: (i) design of an efficient and scalable eigensolver, (ii) integration of the efficient eigensolver into the non-linear dimensionality reduction methods, and (iii) application of the proposed method to realistic IIoT system data.
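The contour-integral idea at the heart of objective (i) can be sketched as follows: approximate the spectral projector (2πi)⁻¹∮(zI − A)⁻¹dz on a circle by a quadrature rule, apply it to random probe vectors, and use Rayleigh-Ritz on the result. The dense toy version below is only a sketch; practical FEAST-like solvers replace the dense solves by iterative (possibly recycled) Krylov solves:

```python
import numpy as np

def contour_eigs(A, center, radius, k=4, m=16, rng=None):
    """Ritz values of symmetric A associated with eigenvalues inside the
    circle |z - center| = radius, via a quadrature approximation of the
    spectral projector applied to k random probe vectors."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    Y = rng.standard_normal((n, k))
    Q = np.zeros((n, k), dtype=complex)
    for j in range(m):                       # midpoint rule on the circle
        z = center + radius * np.exp(2j * np.pi * (j + 0.5) / m)
        # dz/(2*pi*i) contributes (z - center)/m at each quadrature node
        Q += ((z - center) / m) * np.linalg.solve(z * np.eye(n) - A, Y)
    B, _ = np.linalg.qr(Q.real)              # orthonormal basis of the subspace
    return np.linalg.eigvalsh(B.T @ A @ B)   # Rayleigh-Ritz values

A = np.diag([1.0, 2.0, 3.0, 10.0, 20.0])
ritz = contour_eigs(A, center=2.0, radius=2.5)   # circle encloses {1, 2, 3}
```

With k larger than the number of enclosed eigenvalues, three Ritz values approximate {1, 2, 3} to high accuracy; the remaining one is spurious and is discarded in practice by a residual test.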

The work between ANL and Inria was initiated in the context of the JLESC initiative. Compression is ubiquitous in scientific computing in general and numerical linear algebra in particular. One of the most well-known methods for compression in this latter field is the truncated singular value decomposition (TSVD). TSVD allows for compressing matrices in some optimum sense. However, there are fewer techniques for compressing vectors. An old but currently intensively studied approach is to design numerical algorithms able to use mixed-precision arithmetic. Still, data are stored in a way that sticks to the hardware processing capacity, typically in the form of 64-, 32-, and 16-bit words. The idea of variable-accuracy storage is instead to rely on a compressor such as SZ, developed at ANL, to compress vectors independently from hardware constraints, and to apply it to the solution of large sparse linear systems.
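The variable-accuracy idea can be mimicked, in the absence of the SZ compressor, by storing a vector at reduced precision and monitoring the committed error. This NumPy toy uses IEEE half precision as a crude stand-in for an error-bounded lossy compressor:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)      # a "solution" vector in float64

# Crude stand-in for lossy compression: store at lower precision.
x_c = x.astype(np.float16)           # 4x fewer bytes than float64

# Relative error committed by the reduced-accuracy storage.
err = np.linalg.norm(x - x_c.astype(np.float64)) / np.linalg.norm(x)

ratio = x.nbytes / x_c.nbytes        # compression ratio: 4.0
```

An error-bounded compressor like SZ typically reaches much higher ratios on smooth fields, but the algorithmic question is the same: how much storage inaccuracy can the iterative solver absorb while still converging.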

NumPEx is a French program dedicated to Exascale computing: high-performance computing (HPC), high-performance data analytics (HPDA), and Artificial Intelligence (AI) pose significant challenges across scientific, societal, economic, and ethical realms. These technologies, including modeling and data analysis, are crucial decision-support tools addressing societal issues and the competitiveness of French research and development. Digital resources, essential across science and industry, demand high-performance hardware. HPC enables advanced modeling, while HPDA handles heterogeneous and massive data. The answer to this exploding demand lies in the upcoming “exascale” computers, a new generation of machines with extraordinary capabilities.

In this context, the French Exascale program NumPEx aims at designing and developing the software components that will equip future exascale machines. NumPEx will deliver Exascale-grade numerical methods, software, and training, allowing France to remain one of the leaders in the field. It will contribute to bridging the gap between cutting-edge software development and application domains, preparing the major scientific and industrial application codes to fully exploit the capabilities of these machines. Application domains of the NumPEx program include, but are not limited to, weather forecasting and climate, aeronautics, automotive, astrophysics, high energy physics, material science, energy production and management, biology and health.

NumPEx is organized into seven scientific pillar projects; we are directly involved in two of them, namely:

The simulation under real conditions of partially magnetized low-temperature plasmas by Lagrangian approaches, even when using powerful Particle-In-Cell (PIC) techniques supplemented with efficient high-performance computing methods, requires considerable computing resources for large plasma densities. This is explained by two main limitations. First, stability conditions constrain the numerical parameters needed to resolve the small space and time scales; these numerical parameters are the mesh size of the grid used to compute the electric field and the time step between two consecutive computations. Second, PIC methods rely on a sampling of the distribution function by numerical particles whose motion is time-integrated in the self-consistent electric field. The PIC algorithm remains close to the physics and offers incomparable efficiency compared with Eulerian methods, which discretize the distribution function on a mesh. It has been widely and successfully used for the discretization of kinetic plasma models for more than 40 years. Nonetheless, to spare computational resources, the number of numerical particles is limited compared to that of the physical particles. Inherent to this “coarse” sampling, PIC algorithms produce numerical approximations prone to statistical fluctuations that vanish slowly with the mean number of particles per cell. The mesh accessible on typical high performance computing machines may

This is the issue addressed by the MATURATION project, which aims at introducing a new class of PIC algorithms with unprecedented computational efficiency by analyzing, improving, parallelizing, optimizing and benchmarking, in the demanding context of partially magnetized low-temperature plasmas and through 2D large-scale and 3D computations, a method recently proposed in the literature based on a combination of sparse-grid techniques and the PIC algorithm.
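The statistical fluctuations inherent to a finite number of particles per cell can be seen in the most basic PIC ingredient, charge deposition. Below, a toy 1D cloud-in-cell deposition shows the noise in the deposited density shrinking as the particle count grows (an illustrative sketch, unrelated to the project's sparse-grid code):

```python
import numpy as np

def deposit_cic(pos, ncells):
    """Cloud-in-cell deposition of unit-weight particles on a periodic
    1D grid over [0, 1): each particle splits its weight linearly
    between its two neighbouring grid points."""
    h = 1.0 / ncells
    cell = np.floor(pos / h).astype(int)
    frac = pos / h - cell
    rho = np.zeros(ncells)
    np.add.at(rho, cell % ncells, 1.0 - frac)
    np.add.at(rho, (cell + 1) % ncells, frac)
    return rho / (len(pos) * h)          # normalized: mean density is 1

rng = np.random.default_rng(0)
rho_small = deposit_cic(rng.uniform(size=1_000), 32)    # ~31 particles/cell
rho_big = deposit_cic(rng.uniform(size=100_000), 32)    # ~3125 particles/cell
```

The standard deviation of the deposited density decays only like the inverse square root of the number of particles per cell, which is precisely what makes brute-force PIC at high plasma density so expensive and motivates the sparse-grid variants studied here.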

L. Giraud is member of the editorial board of the SIAM Journal on Scientific Computing (SISC).

Applied Mathematical Modelling, SIAM J. Scientific Computing, Mathematical Modelling and Numerical Analysis, ...

PhD defense