Section: Partnerships and Cooperations
National Initiatives
Inria Project Lab
C2S@Exa  Computer and Computational Sciences at Exascale
Since January 2013, the team is participating to the C2S@Exa Inria Project Lab (IPL). This national initiative aims at the development of numerical modeling methodologies that fully exploit the processing capabilities of modern massively parallel architectures in the context of a number of selected applications related to important scientific and technological challenges for the quality and the security of life in our society. At the current state of the art in technologies and methodologies, a multidisciplinary approach is required to overcome the challenges raised by the development of highly scalable numerical simulation software that can exploit computing platforms offering several hundreds of thousands of cores. Hence, the main objective of C2S@Exa is the establishment of a continuum of expertise in the computer science and numerical mathematics domains, by gathering researchers from Inria projectteams whose research and development activities are tightly linked to high performance computing issues in these domains. More precisely, this collaborative effort involves computer scientists that are experts of programming models, environments and tools for harnessing massively parallel systems, algorithmists that propose algorithms and contribute to generic libraries and core solvers in order to take benefit from all the parallelism levels with the main goal of optimal scaling on very large numbers of computing entities and, numerical mathematicians that are studying numerical schemes and scalable solvers for systems of partial differential equations in view of the simulation of very largescale problems.
ANR
SOLHAR: SOLvers for Heterogeneous Architectures over Runtime systems
Participants : Emmanuel Agullo, Mathieu Faverge, Andra Hugo, Abdou Guermouche, Xavier Lacoste, Pierre Ramet, Jean Roman, Guillaume Sylvand.
Grant: ANRMONU
Dates: 2013 – 2017
Partners: Inria (REALOPT , RUNTIME Bordeaux SudOuest et ROMA RhoneAlpes), IRIT/INPT, CEACESTA et EADSIW.
Overview:
During the last five years, the interest of the scientific computing community towards accelerating devices has been rapidly growing. The reason for this interest lies in the massive computational power delivered by these devices. Several software libraries for dense linear algebra have been produced; the related algorithms are extremely rich in computation and exhibit a very regular pattern of access to data which makes them extremely good candidates for GPU execution. On the contrary, methods for the direct solution of sparse linear systems have irregular, indirect memory access patterns that adversely interact with typical GPU throughput optimizations.
This project aims at studying and designing algorithms and parallel programming models for implementing direct methods for the solution of sparse linear systems on emerging computer equipped with accelerators. The ultimate aim of this project is to achieve the implementation of a software package providing a solver based on direct methods for sparse linear systems of equations. To date, the approaches proposed to achieve this objective are mostly based on a simple offloading of some computational tasks to the accelerators and rely on fine handtuning of the code and accurate performance modeling to achieve efficiency. This project proposes an innovative approach which relies on the efficiency and portability of runtime systems. The development of a productionquality, sparse direct solver requires a considerable research effort along three distinct axis:

linear algebra: algorithms have to be adapted or redesigned in order to exhibit properties that make their implementation and execution on heterogeneous computing platforms efficient and reliable. This may require the development of novel methods for defining data access patterns that are more suitable for the dynamic scheduling of computational tasks on processing units with considerably different capabilities as well as techniques for guaranteeing a reliable and robust behavior and accurate solutions. In addition, it will be necessary to develop novel and efficient accelerator implementations of the specific dense linear algebra kernels that are used within sparse, direct solvers;

runtime systems: tools such as the StarPU runtime system proved to be extremely efficient and robust for the implementation of dense linear algebra algorithms. Sparse linear algebra algorithms, however, are commonly characterized by complicated data access patterns, computational tasks with extremely variable granularity and complex dependencies. Therefore, a substantial research effort is necessary to design and implement features as well as interfaces to comply with the needs formalized by the research activity on direct methods;

scheduling: executing a heterogeneous workload with complex dependencies on a heterogeneous architecture is a very challenging problem that demands the development of effective scheduling algorithms. These will be confronted with possibly limited views of dependencies among tasks and multiple, and potentially conflicting objectives, such as minimizing the makespan, maximizing the locality of data or, where it applies, minimizing the memory consumption.
Given the wide availability of computing platforms equipped with accelerators and the numerical robustness of direct solution methods for sparse linear systems, it is reasonable to expect that the outcome of this project will have a considerable impact on both academic and industrial scientific computing. This project will moreover provide a substantial contribution to the computational science and highperformance computing communities, as it will deliver an unprecedented example of a complex numerical code whose parallelization completely relies on runtime scheduling systems and which is, therefore, extremely portable, maintainable and evolvable towards future computing architectures.
SONGS: Simulation Of Next Generation Systems
Participant : Abdou Guermouche.
Grant: ANR 11 INFRA 13
Dates: 2011 – 2015
Partners: Inria (Bordeaux SudOuest, Nancy  Grand Est, RhoneAlpes, Sophia Antipolis  Méditerranée), I3S, LSIIT
Overview:
The last decade has brought tremendous changes to the characteristics of large scale distributed computing platforms. Large grids processing terabytes of information a day and the peertopeer technology have become common even though understanding how to efficiently such platforms still raises many challenges. As demonstrated by the USS SimGrid project funded by the ANR in 2008, simulation has proved to be a very effective approach for studying such platforms. Although even more challenging, we think the issues raised by petaflop/exaflop computers and emerging cloud infrastructures can be addressed using similar simulation methodology.
The goal of the SONGS project is to extend the applicability of the SimGrid simulation framework from Grids and PeertoPeer systems to Clouds and High Performance Computation systems. Each type of largescale computing system will be addressed through a set of use cases and lead by researchers recognized as experts in this area.
Any sound study of such systems through simulations relies on the following pillars of simulation methodology: Efficient simulation kernel; Sound and validated models; Simulation analysis tools; Campaign simulation management.
ANEMOS: Advanced Numeric for ELMs : Modeling and Optimized Schemes
Participants : Xavier Lacoste, Guillaume Latu, Pierre Ramet.
Grant: ANRMN
Dates: 2012 – 2016
Partners: Univ. Nice, CEA/IRFM, CNRS/MDS.
Overview: The main goal of the project is to make a significant progress in understanding of largely unknown at present physics of active control methods of plasma edge MHD instabilities Edge Localized Modes (ELMs) wich represent particular danger with respect to heat and particle loads for Plasma Facing Components (PFC) in ITER. Project is focused in particular on the numerical modelling study of such ELM control methods as Resonant Magnetic Perturbations (RMPs) and pellet ELM pacing both foreseen in ITER. The goals of the project are to improve understanding of the related physics and propose possible new strategies to improve effectiveness of ELM control techniques. The tool for the nonlinear MHD modeling is the JOREK code which was essentially developed within previous ANR ASTER . JOREK will be largerly developed within the present project to include corresponding new physical models in conjunction with new developments in mathematics and computer science strategy. The present project will put the nonlinear MHD modeling of ELMs and ELM control on the solid ground theoretically, computationally, and applicationswise in order to progress in urgently needed solutions for ITER.
Regarding our contributions, the JOREK code is mainly composed of numerical computations on 3D data. The toroidal dimension of the tokamak is treated in Fourier space, while the poloidal plane is decomposed in Bezier patches. The numerical scheme used involves a direct solver on a large sparse matrix as a main computation of one time step. Two main costs are clearly identified: the assembly of the sparse matrix, and the direct factorization and solve of the system that includes communications between all processors. The efficient parallelization of JOREK is one of our main goals, to do so we will reconsider: data distribution, computation distribution or GMRES implementation. The quality of the sparse solver is also crucial, both in term of performance and accuracy. In the current release of JOREK , the memory scaling is not satisfactory to solve problems listed above , since at present as one increases the number of processes for a given problem size, the memory footprint on each process does not reduce as much as one can expect. In order to access finer meshes on available supercomputers, memory savings have to be done in the whole code. Another key point for improving parallelization is to carefully profile the application to understand the regions of the code that do not scale well. Depending on the timings obtained, strategies to diminish communication overheads will be evaluated and schemes that improve load balancing will be initiated. JOREK uses PaStiX sparse matrix library for matrix inversion. However, large number of toroidal harmonics and particular thin structures to resolve for realistic plasma parameters and ITER machine size still require more aggressive optimisation in numeric dealing with numerical stability, adaptive meshes etc. However many possible applications of JOREK code we proposed here which represent urgent ITER relevant issues related to ELM control by RMPs and pellets remain to be solved.
OPTIDIS: OPTImisation d'un code de dynamique des DISlocations
Participants : Olivier Coulaud, Aurélien Esnard, Arnaud Etcheverry, Luc Giraud.
Grant: ANRCOSINUS
Dates: 2010 – 2014
Partners: CEA/DEN/DMN/SRMA (leader), SIMaP Grenoble INP and ICMPE / ParisEst.
Overview: Plastic deformation is mainly accommodated by dislocations glide in the case of crystalline materials. The behavior of a single dislocation segment is perfectly understood since 1960 and analytical formulations are available in the literature. However, to understand the behavior of a large population of dislocations (inducing complex dislocations interactions) and its effect on plastic deformation, massive numerical computation is necessary. Since 1990, simulation codes have been developed by French researchers. Among these codes, the code TRIDIS developed by the SIMAP laboratory in Grenoble is the pioneer dynamic dislocation code. In 2007, the project called NUMODIS had been set up as team collaboration between the SIMAP and the SRMA CEA Saclay in order to develop a new dynamics dislocation code using modern computer architecture and advanced numerical methods. The objective was to overcome the numerical and physical limits of the previous code TRIDIS. The version NUMODIS 1.0 came out in December 2009, which confirms the feasibility of the project. The project OPTIDIS is initiated when the code NUMODIS is mature enough to consider parallel computation. The objective of the project in to develop and validate the algorithms in order to optimize the numerical and performance efficiency of the NUMODIS code. We are aiming at developing a code able to tackle realistic material problems such as the interaction between dislocations and irradiation defects in a grain plastic deformation after irradiation. These kinds of studies where “local mechanisms" are correlated with macroscopic behavior is a key issue for nuclear industry in order to understand material aging under irradiation, and hence predict power plant secured service life. To carry out such studies, massive numerical optimizations of NUMODIS are required. They involve complex algorithms lying on advanced computational science methods. The project OPTIDIS will develop through joint collaborative studies involving researchers specialized in dynamics dislocations and in numerical methods. This project is divided in 8 tasks over 4 years. Two PhD thesis will be directly funded by the project. One will be dedicated to numerical development, validation of complex algorithms and comparison with the performance of existing dynamics dislocation codes. The objective of the second is to carry out large scale simulations to validate the performance of the numerical developments made in OPTIDIS . In both cases, these simulations will be compared with experimental data obtained by experimentalists.
RESCUE: RÉsilience des applications SCientifiqUEs
Participants : Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Jean Roman, Mawussi Zounon.
Grant: ANRBlanc (computer science theme)
Dates: 2010 – 2014
Partners: Inria EPI ROMA (leader) and GRAND LARGE.
Overview: The advent of exascale machines will help solve new scientific challenges only if the resilience of large scientific applications deployed on these machines can be guaranteed. With 10,000,000 core processors, or more, the time interval between two consecutive failures is anticipated to be smaller than the typical duration of a checkpoint, i.e., the time needed to save all necessary application and system data. No actual progress can then be expected for a largescale parallel application. Current faulttolerant techniques and tools can no longer be used. The main objective of the RESCUE project is to develop new algorithmic techniques and software tools to solve the exascale resilience problem. Solving this problem implies a departure from current approaches, and calls for yettobediscovered algorithms, protocols and software tools.
This proposed research follows three main research thrusts. The first thrust deals with novel checkpoint protocols. This thrust will include the classification of relevant fault categories and the development of a software package for fault injection into application execution at runtime. The main research activity will be the design and development of scalable and lightweight checkpoint and migration protocols, with onthefly storing of key data, distributed but coordinated decisions, etc. These protocols will be validated via a prototype implementation integrated with the publicdomain MPICH project. The second thrust entails the development of novel execution models, i.e., accurate stochastic models to predict (and, in turn, optimize) the expected performance (execution time or throughput) of largescale parallel scientific applications. In the third thrust, we will develop novel parallel algorithms for scientific numerical kernels. We will profile a representative set of key largescale applications to assess their resilience characteristics (e.g., identify specific patterns to reduce checkpoint overhead). We will also analyze execution tradeoffs based on the replication of crucial kernels and on decentralized ABFT (AlgorithmBased Fault Tolerant) techniques. Finally, we will develop new numerical methods and robust algorithms that still converge in the presence of multiple failures. These algorithms will be implemented as part of a software prototype, which will be evaluated when confronted with realistic faults generated via our fault injection techniques.
We firmly believe that only the combination of these three thrusts (new checkpoint protocols, new execution models, and new parallel algorithms) can solve the exascale resilience problem. We hope to contribute to the solution of this critical problem by providing the community with new protocols, models and algorithms, as well as with a set of freely available publicdomain software prototypes.
BOOST: Building the future Of numerical methOdS for iTer
Participants : Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Jean Roman, Xavier Vasseur.
Grant: ANRBlanc (applied math theme)
Dates: 2010 – 2014
Partners: Institut de Mathématiques de Toulouse (leader); Laboratoire d'Analyse, Topologie, Probabilités in Marseilles; Institut de Recherche sur la Fusion Magnétique, CEAr/IRFM and HiePACS .
Overview: This project regards the study and the development of a new class of numerical methods to simulate natural or laboratory plasmas and in particular magnetic fusion processes. In this context, we aim in giving a contribution, from the mathematical, physical and algorithmic point of view, to the ITER project.
The core of this project consists in the development, the analysis, the implementation and the testing on real physical problems of the socalled AsymptoticPreserving methods which allow simulations over a large range of scales with the same model and numerical method. These methods represent a breakthrough with respect to the stateofthe art. They will be developed specifically to handle the various challenges related to the simulation of the ITER plasma. In parallel with this class of methodologies, we intend to design appropriate coupling techniques between macroscopic and microscopic models for all the cases in which a net distinction between different regimes can be done. This will permit to describe different regimes in different regions of the machine with a strong gain in term of computational efficiency, without losing accuracy in the description of the problem. We will develop full 3D solver for the asymptotic preserving fluid as well as kinetic model. The AsymptoticPreserving (AP) numerical strategy allows us to perform numerical simulations with very large time and mesh steps and leads to impressive computational saving. These advantages will be combined with the utilization of the last generation preconditioned fast linear solvers to produce a software with very high performance for plasma simulation. For HiePACS this project provides in particular a testbed for our expertise in parallel solution of large linear systems.