Section: Partnerships and Cooperations
Inria Project Lab
C2S@Exa - Computer and Computational Sciences at Exascale
Since January 2013, the team has been participating in the C2S@Exa Inria Project Lab (IPL). This national initiative aims at developing numerical modeling methodologies that fully exploit the processing capabilities of modern massively parallel architectures, in the context of selected applications related to important scientific and technological challenges for the quality and the security of life in our society. Given the current state of the art in technologies and methodologies, a multidisciplinary approach is required to overcome the challenges raised by the development of highly scalable numerical simulation software that can exploit computing platforms offering several hundreds of thousands of cores. Hence, the main objective of C2S@Exa is to establish a continuum of expertise in the computer science and numerical mathematics domains, by gathering researchers from Inria project-teams whose research and development activities are tightly linked to high performance computing issues in these domains. More precisely, this collaborative effort involves computer scientists who are experts in programming models, environments and tools for harnessing massively parallel systems; algorithmists who propose algorithms and contribute to generic libraries and core solvers in order to benefit from all levels of parallelism, with the main goal of optimal scaling on very large numbers of computing entities; and numerical mathematicians who study numerical schemes and scalable solvers for systems of partial differential equations in view of the simulation of very large-scale problems.
SOLHAR: SOLvers for Heterogeneous Architectures over Runtime systems
Participants: Emmanuel Agullo, Mathieu Faverge, Abdou Guermouche, Pierre Ramet, Jean Roman, Guillaume Sylvand.
Dates: 2013 – 2017
During the last five years, the interest of the scientific computing community in accelerating devices has been growing rapidly. The reason for this interest lies in the massive computational power delivered by these devices. Several software libraries for dense linear algebra have been produced; the related algorithms are extremely rich in computation and exhibit a very regular pattern of access to data, which makes them extremely good candidates for GPU execution. By contrast, methods for the direct solution of sparse linear systems have irregular, indirect memory access patterns that interact adversely with typical GPU throughput optimizations.
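The contrast between regular dense access and indirect sparse access can be made concrete with a small sketch. It assumes the common Compressed Sparse Row (CSR) storage format and shows a matrix-vector product rather than a direct factorization; the function names are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple


def dense_matvec(a: List[List[float]], x: List[float]) -> List[float]:
    # Regular access: each row of `a` and the vector `x` are read contiguously.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dense_to_csr(a: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    # Keep only nonzeros; row i occupies values[row_ptr[i]:row_ptr[i + 1]].
    row_ptr, col_idx, values = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def csr_matvec(row_ptr: List[int], col_idx: List[int],
               values: List[float], x: List[float]) -> List[float]:
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: which entry of x is read depends on col_idx[k],
            # known only at run time, so memory accesses are irregular.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Both functions return the same result; the difference lies in the `x[col_idx[k]]` gather, a data-dependent access pattern of the kind that sparse factorizations exhibit at a larger scale and that GPUs handle less efficiently than contiguous loads.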
This project aims at studying and designing algorithms and parallel programming models for implementing direct methods for the solution of sparse linear systems on emerging computers equipped with accelerators. The ultimate aim of this project is to implement a software package providing a solver based on direct methods for sparse linear systems of equations. To date, the approaches proposed to achieve this objective are mostly based on simply offloading some computational tasks to the accelerators, and rely on fine hand-tuning of the code and accurate performance modeling to achieve efficiency. This project proposes an innovative approach that relies on the efficiency and portability of runtime systems. The development of a production-quality sparse direct solver requires a considerable research effort along three distinct axes:
linear algebra: algorithms have to be adapted or redesigned in order to exhibit properties that make their implementation and execution on heterogeneous computing platforms efficient and reliable. This may require the development of novel methods for defining data access patterns that are more suitable for the dynamic scheduling of computational tasks on processing units with considerably different capabilities, as well as techniques for guaranteeing reliable and robust behavior and accurate solutions. In addition, it will be necessary to develop novel and efficient accelerator implementations of the specific dense linear algebra kernels that are used within sparse direct solvers;
runtime systems: tools such as the StarPU runtime system have proved to be extremely efficient and robust for the implementation of dense linear algebra algorithms. Sparse linear algebra algorithms, however, are commonly characterized by complicated data access patterns, computational tasks with extremely variable granularity, and complex dependencies. Therefore, a substantial research effort is necessary to design and implement the features and interfaces needed to meet the requirements identified by the research activity on direct methods;
scheduling: executing a heterogeneous workload with complex dependencies on a heterogeneous architecture is a very challenging problem that demands the development of effective scheduling algorithms. These will have to cope with possibly limited views of the dependencies among tasks and with multiple, potentially conflicting objectives, such as minimizing the makespan, maximizing data locality or, where it applies, minimizing memory consumption.
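To illustrate the scheduling axis, here is a minimal sketch assuming a greedy earliest-finish-time (EFT) list-scheduling heuristic: tasks are visited in a topological order of the dependency graph, and each one is placed on the worker (CPU or GPU) where it would complete first. The task names mimic dense Cholesky kernels, and the costs, the `eft_schedule` function, and the worker names are all hypothetical. The sketch ignores data transfers and memory consumption, and it is not the scheduler developed in SOLHAR or shipped with StarPU.

```python
from collections import deque
from typing import Dict, List, Tuple


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm; deps maps each task to the tasks it depends on."""
    indeg = {t: len(p) for t, p in deps.items()}
    succ: Dict[str, List[str]] = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            succ[p].append(t)
    ready = deque(sorted(t for t, d in indeg.items() if d == 0))
    order: List[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(deps):
        raise ValueError("dependency graph has a cycle")
    return order


def eft_schedule(deps: Dict[str, List[str]],
                 cost: Dict[str, Dict[str, float]],
                 workers: Dict[str, str]) -> Tuple[Dict[str, str], float]:
    """Greedy EFT list scheduling.

    cost[task][kind] is the (hypothetical) run time of `task` on a worker of
    kind `kind`; workers maps a worker name to its kind ("cpu", "gpu", ...).
    Returns the task -> worker placement and the resulting makespan.
    """
    free_at = {w: 0.0 for w in workers}   # time at which each worker is idle
    finish: Dict[str, float] = {}
    placement: Dict[str, str] = {}
    for t in topological_order(deps):
        # A task may start only once all its predecessors have finished.
        ready_time = max((finish[p] for p in deps[t]), default=0.0)
        best_end, best_worker = None, None
        for w, kind in workers.items():
            end = max(free_at[w], ready_time) + cost[t][kind]
            if best_end is None or end < best_end:
                best_end, best_worker = end, w
        free_at[best_worker] = best_end
        finish[t] = best_end
        placement[t] = best_worker
    return placement, max(finish.values())
```

With costs that make the small factorization kernel cheaper on the CPU and the update kernels much cheaper on the GPU, the heuristic naturally sends the former to the CPU and the latter to the GPU. Real runtime schedulers must additionally weigh transfer costs, data locality and memory, which is where the conflicting objectives listed above arise.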
Given the wide availability of computing platforms equipped with accelerators and the numerical robustness of direct solution methods for sparse linear systems, it is reasonable to expect that the outcome of this project will have a considerable impact on both academic and industrial scientific computing. This project will moreover provide a substantial contribution to the computational science and high-performance computing communities, as it will deliver an unprecedented example of a complex numerical code whose parallelization completely relies on runtime scheduling systems and which is, therefore, extremely portable, maintainable and evolvable towards future computing architectures.
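In a runtime-system approach, the application typically submits tasks in sequential program order and declares how each task accesses its data (read, write, or read-write), and the runtime infers the dependencies. The following sketch imitates that inference in plain Python, under the assumption of one declared access per data handle per task. The `TaskGraph` class and its methods are hypothetical and are not the StarPU API; they only mirror the general read/write access-mode idea.

```python
from typing import Dict, List, Set, Tuple

READ, WRITE, RW = "R", "W", "RW"


class TaskGraph:
    """Sequential-task-flow sketch: dependencies come from access modes."""

    def __init__(self) -> None:
        self.deps: Dict[str, Set[str]] = {}
        self._last_writer: Dict[str, str] = {}
        self._readers_since_write: Dict[str, List[str]] = {}

    def submit(self, name: str, accesses: List[Tuple[str, str]]) -> None:
        """Register task `name` with its (data handle, access mode) pairs."""
        deps: Set[str] = set()
        for handle, mode in accesses:
            writer = self._last_writer.get(handle)
            if writer is not None:
                # Read-after-write or write-after-write on this handle.
                deps.add(writer)
            if mode in (WRITE, RW):
                # Write-after-read: wait for readers of the current version.
                deps.update(self._readers_since_write.get(handle, []))
        for handle, mode in accesses:
            if mode in (WRITE, RW):
                self._last_writer[handle] = name
                self._readers_since_write[handle] = []
            else:
                self._readers_since_write.setdefault(handle, []).append(name)
        deps.discard(name)
        self.deps[name] = deps


# Usage: a few tasks named after tiled-Cholesky kernels (illustrative only).
g = TaskGraph()
g.submit("potrf_00", [("A00", RW)])
g.submit("trsm_10", [("A00", READ), ("A10", RW)])
g.submit("syrk_11", [("A10", READ), ("A11", RW)])
g.submit("potrf_11", [("A11", RW)])
```

The code that submits the tasks stays sequential and readable, while the inferred graph (`g.deps`) exposes all the parallelism the data accesses allow. This separation is what makes a code whose parallelization relies on a runtime system portable across architectures.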
ANEMOS: Advanced Numeric for ELMs: Modeling and Optimized Schemes
Participants: Guillaume Latu, Pierre Ramet.
Dates: 2012 – 2016
Partners: Univ. Nice, CEA/IRFM, CNRS/MDS.
Overview: The main goal of the project is to make significant progress in understanding active control methods for plasma edge MHD instabilities, namely Edge Localized Modes (ELMs), which represent a particular danger with respect to heat and particle loads on Plasma Facing Components (PFC) in ITER. The project focuses in particular on the numerical modelling of ELM control methods such as Resonant Magnetic Perturbations (RMPs) and pellet ELM pacing, both foreseen in ITER. The goals of the project are to improve understanding of the related physics and to propose possible new strategies to improve the effectiveness of ELM control techniques. The tool for the non-linear MHD modeling is the JOREK code, which was essentially developed within the previous ANR project ASTER. JOREK will be largely developed within the present project to include corresponding new physical models, in conjunction with new developments in mathematics and computer science. The present project will put the non-linear MHD modeling of ELMs and ELM control on solid ground theoretically, computationally, and application-wise, in order to progress towards urgently needed solutions for ITER.
Regarding our contributions, the JOREK code is mainly composed of numerical computations on 3D data. The toroidal dimension of the tokamak is treated in Fourier space, while the poloidal plane is decomposed into Bezier patches. The numerical scheme involves a direct solver on a large sparse matrix as the main computation of each time step. Two main costs are clearly identified: the assembly of the sparse matrix, and the direct factorization and solve of the system, which includes communications between all processors. The efficient parallelization of JOREK is one of our main goals; to achieve it, we will reconsider data distribution, computation distribution, and the GMRES implementation. The quality of the sparse solver is also crucial, both in terms of performance and accuracy. In the current release of JOREK, the memory scaling is not satisfactory for solving the problems listed above: at present, as the number of processes increases for a given problem size, the memory footprint on each process does not decrease as much as one would expect. In order to access finer meshes on available supercomputers, memory savings have to be made throughout the code. Another key point for improving parallelization is to carefully profile the application to understand which regions of the code do not scale well. Depending on the timings obtained, strategies to reduce communication overheads will be evaluated and schemes that improve load balancing will be initiated. JOREK uses the PaStiX sparse direct solver library for these factorizations and solves. However, the large number of toroidal harmonics and the particularly thin structures to be resolved for realistic plasma parameters and ITER machine size still require more aggressive numerical optimisation, dealing with numerical stability, adaptive meshes, etc. Many of the possible applications of the JOREK code proposed here, which represent urgent ITER-relevant issues related to ELM control by RMPs and pellets, remain to be solved.
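As a rough illustration of a Fourier-times-Bezier representation, the sketch below evaluates a scalar field u(s, t, phi) as a truncated toroidal Fourier series whose coefficients are bicubic Bezier patches in the poloidal plane. It is a generic tensor-product construction under stated assumptions: the coefficient layout (`cos_ctrl`, `sin_ctrl`) and the function names are hypothetical, and JOREK's actual element formulation (continuity constraints, mapping to physical coordinates) is not reproduced here.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # 4x4 control values of one bicubic patch


def bernstein3(u: float) -> List[float]:
    """Cubic Bernstein polynomials B_{i,3}(u), i = 0..3, on [0, 1]."""
    v = 1.0 - u
    return [v ** 3, 3 * u * v ** 2, 3 * u ** 2 * v, u ** 3]


def bezier_patch(ctrl: Grid, s: float, t: float) -> float:
    """Tensor-product bicubic Bezier patch evaluated at (s, t) in [0, 1]^2."""
    bs, bt = bernstein3(s), bernstein3(t)
    return sum(bs[i] * bt[j] * ctrl[i][j] for i in range(4) for j in range(4))


def field_value(cos_ctrl: Sequence[Grid], sin_ctrl: Sequence[Grid],
                s: float, t: float, phi: float) -> float:
    """u(s, t, phi) = sum_n c_n(s, t) cos(n phi) + d_n(s, t) sin(n phi).

    cos_ctrl[n] and sin_ctrl[n] hold the (hypothetical) control values of
    the poloidal patches for toroidal harmonic n; sin_ctrl[0] is unused.
    """
    value = 0.0
    for n, (cc, sc) in enumerate(zip(cos_ctrl, sin_ctrl)):
        value += bezier_patch(cc, s, t) * math.cos(n * phi)
        if n > 0:
            value += bezier_patch(sc, s, t) * math.sin(n * phi)
    return value
```

Each additional toroidal harmonic adds another full set of poloidal unknowns, which gives one intuition for why a large number of harmonics inflates the size of the sparse system and the memory needed to factorize it.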
DEDALES: Algebraic and geometric domain decomposition for subsurface/groundwater flows
Participants: Emmanuel Agullo, Mathieu Faverge, Luc Giraud, Louis Poirel.
Dates: 2014 – 2018
Partners: Inria EPI Pomdapi (leader); Université Paris 13 - Laboratoire Analyse, Géométrie et Applications; Maison de la Simulation; Andra.
Overview: Project DEDALES aims at developing high performance software for the simulation of two-phase flow in porous media. The project will specifically target parallel computers where each node is itself composed of a large number of processing cores, such as those found in new-generation many-core architectures. The project will be driven by an application to the deep geological disposal of radioactive waste. Its main feature is phenomenological complexity: water-gas flow in a highly heterogeneous medium, with widely varying space and time scales. The assessment of large-scale models is of major importance for this application, and realistic geological models have several million grid cells. Few, if any, software codes provide the necessary physical features together with massively parallel simulation capabilities. The aim of the DEDALES project is to study, and experiment with, new approaches to develop effective simulation tools capable of taking advantage of modern computer architectures and their hierarchical structure. To achieve this goal, we will explore two complementary software approaches that both match the hierarchical hardware architecture: on the one hand, we will integrate a hybrid parallel linear solver into an existing flow and transport code; on the other hand, we will explore a two-level approach, with the outer level using (space-time) domain decomposition parallelized with a distributed-memory approach, and the inner level as a subdomain solver that exploits thread-level parallelism. Linear solvers have always been, and will continue to be, at the center of simulation codes. However, parallelizing implicit methods on unstructured meshes, such as those required to accurately represent the fine geological details of the heterogeneous media considered, is notoriously difficult.
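One common way to build a hybrid direct/iterative solver is the Schur complement method: the unknowns interior to each subdomain are eliminated with a direct solver (independently, hence in parallel), leaving a smaller system on the interface unknowns. The sketch below shows only this algebra on dense lists, assuming a partition with no coupling between different interiors. The function names are hypothetical, the dense Gaussian elimination stands in for a sparse direct solver, and the interface system is solved directly here, whereas a hybrid solver would normally use a preconditioned Krylov method. It is not the solver integrated in DEDALES.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (direct-solver stand-in)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def sub(a: Matrix, rows: List[int], cols: List[int]) -> Matrix:
    return [[a[i][j] for j in cols] for i in rows]


def schur_solve(a: Matrix, b: List[float],
                interiors: List[List[int]], gamma: List[int]) -> List[float]:
    """Solve A x = b: eliminate each subdomain interior, then solve the interface."""
    ng = len(gamma)
    s = sub(a, gamma, gamma)            # S starts as A_GG
    g = [b[i] for i in gamma]           # reduced right-hand side starts as b_G
    for idx in interiors:               # independent work, one per subdomain
        a_ii, a_ig, a_gi = sub(a, idx, idx), sub(a, idx, gamma), sub(a, gamma, idx)
        # w[c] = A_ii^{-1} (column c of A_iG), z = A_ii^{-1} b_i
        w = [solve(a_ii, [row[c] for row in a_ig]) for c in range(ng)]
        z = solve(a_ii, [b[i] for i in idx])
        for r in range(ng):
            g[r] -= sum(a_gi[r][k] * z[k] for k in range(len(idx)))
            for c in range(ng):
                s[r][c] -= sum(a_gi[r][k] * w[c][k] for k in range(len(idx)))
    x = [0.0] * len(b)
    for i, v in zip(gamma, solve(s, g)):   # interface solve (iterative in practice)
        x[i] = v
    for idx in interiors:                   # back-substitution, again per subdomain
        rhs = [b[i] - sum(a[i][j] * x[j] for j in gamma) for i in idx]
        for i, v in zip(idx, solve(sub(a, idx, idx), rhs)):
            x[i] = v
    return x
```

For a 1D Laplacian split into two subdomains around a single interface node, the result matches a direct solve of the full system; the per-subdomain loops are where a real implementation would distribute work across nodes and threads.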
It has also been suggested that time-level parallelism could be a useful avenue to provide an extra degree of parallelism, so as to exploit the very large number of computing elements that will be part of these next-generation computers. Project DEDALES will show that space-time domain decomposition (DD) methods can provide this extra level, and can usefully be combined with parallel linear solvers at the subdomain level. For all tasks, realistic test cases will be used to show the validity and the parallel scalability of the chosen approach. The most demanding models will be at the frontier of what is currently feasible in terms of model size.
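To give a concrete feel for parallelism in time, the sketch below implements Parareal, a well-known time-parallel algorithm chosen here purely as an illustration; it is a different technique from the space-time DD methods studied in DEDALES. It assumes the scalar model problem y' = lam * y, integrated with explicit Euler as both a cheap coarse propagator and an expensive fine one. All function names are hypothetical.

```python
import math
from typing import Callable, List

Propagator = Callable[[float, float, float], float]  # (y0, t0, t1) -> y(t1)


def euler(lam: float, steps: int) -> Propagator:
    """Explicit Euler for y' = lam * y with `steps` substeps per time slice."""
    def prop(y: float, t0: float, t1: float) -> float:
        h = (t1 - t0) / steps
        for _ in range(steps):
            y += h * lam * y
        return y
    return prop


def parareal(y0: float, t: List[float], coarse: Propagator,
             fine: Propagator, iters: int) -> List[float]:
    """Return approximations of y at the slice boundaries t[0..n]."""
    n = len(t) - 1
    # Initial guess: one sequential sweep of the cheap coarse propagator.
    u = [y0]
    for j in range(n):
        u.append(coarse(u[j], t[j], t[j + 1]))
    for _ in range(iters):
        # Fine solves are independent across time slices: the parallel part.
        f = [fine(u[j], t[j], t[j + 1]) for j in range(n)]
        g_old = [coarse(u[j], t[j], t[j + 1]) for j in range(n)]
        new = [y0]
        for j in range(n):  # cheap sequential correction sweep
            new.append(coarse(new[j], t[j], t[j + 1]) + f[j] - g_old[j])
        u = new
    return u


# Usage: 10 time slices on [0, 1], coarse = 1 Euler step, fine = 100 steps.
t = [i / 10 for i in range(11)]
coarse, fine = euler(-1.0, 1), euler(-1.0, 100)
u = parareal(1.0, t, coarse, fine, iters=10)
```

After as many iterations as there are time slices, Parareal reproduces the sequential fine solution up to rounding; a speedup is only obtained when it converges in far fewer iterations, which is why the extra level of parallelism in time must be combined carefully with the spatial one.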
TECSER: Novel high performance numerical solution techniques for RCS computations
Participants: Emmanuel Agullo, Luc Giraud, Matthieu Kuhn.
Dates: 2014 – 2017
Partners: Inria EPI Nachos (leader), Corida, HiePACS; Airbus Group Innovations, Nucletudes.
Overview: the objective of the TECSER project is to develop an innovative high performance numerical methodology for frequency-domain electromagnetics, with applications to the RCS (Radar Cross Section) calculation of complicated structures. This numerical methodology combines a high order hybridized DG (Discontinuous Galerkin) method for the discretization of the frequency-domain Maxwell equations in heterogeneous media with a BEM (Boundary Element Method) discretization of an integral representation of Maxwell's equations, in order to obtain the most accurate treatment of boundary truncation in the case of a theoretically unbounded propagation domain. In addition, scalable hybrid iterative/direct domain decomposition based algorithms are used for the solution of the resulting algebraic system of equations.
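For reference, the quantity targeted by such computations is the standard three-dimensional radar cross section, defined from the far-field ratio of scattered to incident electric field intensity, where E^i is the incident field at the target and E^s the scattered field at distance r. It is commonly reported in decibels relative to one square metre. The notation below is the textbook definition, not a formula taken from the TECSER project.

```latex
\sigma \;=\; \lim_{r \to \infty} 4\pi r^{2}\,
  \frac{\lvert \mathbf{E}^{s}(r) \rvert^{2}}{\lvert \mathbf{E}^{i} \rvert^{2}},
\qquad
\sigma_{\mathrm{dBsm}} \;=\; 10 \log_{10}\!\left( \frac{\sigma}{1\,\mathrm{m}^{2}} \right)
```

Because the limit is taken at infinity, the far-field values must be obtained without truncation error, which motivates coupling the volume discretization with an integral (BEM) representation on the boundary.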