Section: Partnerships and Cooperations

National initiatives

NOSSI: New platform for parallel, hybrid quantum/classical simulations

Participants: Olivier Coulaud, Aurélien Esnard.

Grant: ANR 2007 – CIS

Dates: 2008 – 2011

Partners: CPMOH (Bordeaux, UMR 5098), DRIMM, IMPREM (leader of the project, Pau, UMR 5254), Institut Néel (Grenoble, UPR 2940)

Overview: In this project, physicists, chemists and computer scientists join forces to advance high-performance numerical simulation of materials by developing and deploying a new platform for parallel, hybrid quantum/classical simulations. The platform combines the established functionality and performance of two major European codes, SIESTA and DL_POLY, with new techniques for calculating the excited states of materials and a graphical user interface for steering, visualization and analysis of complex parallel simulations while they run.

The platform couples a novel, fast TDDFT (time-dependent density functional theory) route for calculating electronic spectra with electronic structure and molecular dynamics methods that are particularly well suited to simulating the solid state and interfaces.

The software will be capable of calculating the electronic spectra of localized excited states in solids and at interfaces. Applications of the platform include hybrid organic-inorganic materials for sustainable development, such as photovoltaic materials, bio- and environmental sensors, photocatalytic decontamination of indoor air and stable, non-toxic pigments.

Web: http://nossi.gforge.inria.fr/index.html

OPTIDIS: OPTImisation d'un code de dynamique des DISlocations

Participants: Olivier Coulaud, Aurélien Esnard, Luc Giraud, Jean Roman.


Dates: 2010 – 2014

Partners: CEA/DEN/DMN/SRMA (leader), SIMaP Grenoble INP and ICMPE / Paris-Est.

Overview: In crystalline materials, plastic deformation is mainly accommodated by dislocation glide. The behaviour of a single dislocation segment has been well understood since 1960, and analytical formulations are available in the literature. However, understanding the behaviour of a large population of dislocations (which induces complex dislocation interactions) and its effect on plastic deformation requires massive numerical computation. Since 1990, simulation codes have been developed by French researchers. Among these, TRIDIS, developed by the SIMaP laboratory in Grenoble, is the pioneering dislocation dynamics code. In 2007, the NUMODIS project was set up as a collaboration between SIMaP and the SRMA at CEA Saclay to develop a new dislocation dynamics code using modern computer architectures and advanced numerical methods. The objective was to overcome the numerical and physical limits of the previous code, TRIDIS. Version 1.0 of NUMODIS came out in December 2009, confirming the feasibility of the project.

The OPTIDIS project was initiated once NUMODIS was mature enough to consider parallel computation. Its objective is to develop and validate algorithms that optimise the numerical efficiency and performance of the NUMODIS code. We aim to develop a code able to tackle realistic material problems, such as the interaction between dislocations and irradiation defects during the plastic deformation of a grain after irradiation. Such studies, in which "local mechanisms" are correlated with macroscopic behaviour, are a key issue for the nuclear industry in order to understand material ageing under irradiation and hence predict the safe service life of power plants. Carrying out such studies requires massive numerical optimisation of NUMODIS, involving complex algorithms that rely on advanced computational science methods.

The OPTIDIS project will proceed through joint collaborative studies involving researchers specialized in dislocation dynamics and in numerical methods. The project is divided into eight tasks over four years. Two PhD theses will be directly funded by the project. The first will be dedicated to numerical development, validation of complex algorithms and comparison with the performance of existing dislocation dynamics codes. The objective of the second is to carry out large-scale simulations to validate the performance of the numerical developments made in OPTIDIS. In both cases, the simulations will be compared with experimental data.

RESCUE: RÉsilience des applications SCientifiqUEs

Participants: Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Jean Roman, Mawussi Zounon.

Grant: ANR-Blanc (computer science theme)

Dates: 2010 – 2014

Partners: Inria EPI GRAAL (leader) and GRAND LARGE.

Overview: The advent of exascale machines will help solve new scientific challenges only if the resilience of large scientific applications deployed on these machines can be guaranteed. With 10,000,000 processor cores or more, the time interval between two consecutive failures is anticipated to be smaller than the typical duration of a checkpoint, i.e., the time needed to save all necessary application and system data. No actual progress can then be expected for a large-scale parallel application, and current fault-tolerant techniques and tools can no longer be used. The main objective of the RESCUE project is to develop new algorithmic techniques and software tools to solve the exascale resilience problem. Solving this problem implies a departure from current approaches, and calls for yet-to-be-discovered algorithms, protocols and software tools.
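The argument above can be illustrated with a back-of-the-envelope calculation. The sketch below is only an illustration, not a model used by the project: it assumes independent failures (so the system mean time between failures shrinks in proportion to the number of components) and uses the classical first-order approximation of the optimal checkpoint period, T = sqrt(2 C M), where C is the checkpoint duration and M the system mean time between failures. The function names and the numerical values (a 10-year per-core MTBF, a 600-second checkpoint) are hypothetical.

```python
import math

def system_mtbf(component_mtbf_s, n_components):
    """System MTBF in seconds, assuming independent, identical components."""
    return component_mtbf_s / n_components

def young_period(checkpoint_s, mtbf_s):
    """First-order optimal checkpoint period: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)

def wasted_fraction(checkpoint_s, mtbf_s):
    """First-order estimate of the fraction of time lost to checkpointing
    and to re-executing lost work, evaluated at the optimal period and
    clipped to 1.0 (1.0 means no useful progress)."""
    period = young_period(checkpoint_s, mtbf_s)
    waste = checkpoint_s / period + period / (2.0 * mtbf_s)
    return min(1.0, waste)

if __name__ == "__main__":
    ten_years_s = 10 * 365 * 24 * 3600   # hypothetical per-core MTBF
    checkpoint_s = 600.0                 # hypothetical checkpoint duration
    for cores in (10_000, 1_000_000, 10_000_000):
        mtbf = system_mtbf(ten_years_s, cores)
        waste = wasted_fraction(checkpoint_s, mtbf)
        print(f"{cores:>10} cores: MTBF = {mtbf:10.1f} s, waste = {waste:.2f}")
```

Under these assumed values, the estimated waste reaches 1.0 well before 10,000,000 cores, which is the regime described above where the checkpoint takes longer than the expected time between failures.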

The proposed research follows three main thrusts. The first thrust deals with novel checkpoint protocols. It will include the classification of relevant fault categories and the development of a software package for injecting faults into application execution at runtime. The main research activity will be the design and development of scalable and light-weight checkpoint and migration protocols, with on-the-fly storing of key data, distributed but coordinated decisions, etc. These protocols will be validated via a prototype implementation integrated with the public-domain MPICH project. The second thrust entails the development of novel execution models, i.e., accurate stochastic models to predict (and, in turn, optimize) the expected performance (execution time or throughput) of large-scale parallel scientific applications. In the third thrust, we will develop novel parallel algorithms for scientific numerical kernels. We will profile a representative set of key large-scale applications to assess their resilience characteristics (e.g., identify specific patterns to reduce checkpoint overhead). We will also analyze execution trade-offs based on the replication of crucial kernels and on decentralized ABFT (Algorithm-Based Fault Tolerance) techniques. Finally, we will develop new numerical methods and robust algorithms that still converge in the presence of multiple failures. These algorithms will be implemented as part of a software prototype, which will be evaluated against realistic faults generated via our fault injection techniques.
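To make the ABFT idea mentioned in the third thrust concrete, here is a minimal sketch of the classical checksum technique applied to a matrix-vector product. It is a textbook illustration under simplifying assumptions (a single corrupted entry in the result, sequential execution), not the decentralized scheme the project intends to develop. All function names are hypothetical.

```python
def matvec(a, x):
    """Plain dense matrix-vector product y = A x (A as a list of rows)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def column_sums(a):
    """Checksum row of A: the sum of each column."""
    return [sum(col) for col in zip(*a)]

def checksum_ok(a, x, y, tol=1e-9):
    """Check the invariant sum(y) == (checksum row of A) . x.
    A mismatch signals that y was corrupted during or after the product."""
    expected = sum(c * xj for c, xj in zip(column_sums(a), x))
    return abs(expected - sum(y)) <= tol * max(1.0, abs(expected))

if __name__ == "__main__":
    a = [[2.0, 0.0, 1.0],
         [1.0, 3.0, 0.0],
         [0.0, 1.0, 4.0]]
    x = [1.0, 2.0, 3.0]
    y = matvec(a, x)
    print("clean result passes:", checksum_ok(a, x, y))
    y[1] += 1.0  # simulate a soft error in one entry of the result
    print("corrupted result passes:", checksum_ok(a, x, y))
```

The check costs one extra dot product per matrix-vector product, which is why checksum-based schemes are attractive compared with full replication of the kernel.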

We firmly believe that only the combination of these three thrusts (new checkpoint protocols, new execution models, and new parallel algorithms) can solve the exascale resilience problem. We hope to contribute to the solution of this critical problem by providing the community with new protocols, models and algorithms, as well as with a set of freely available public-domain software prototypes.

BOOST: Building the future Of numerical methOdS for iTer

Participants: Emmanuel Agullo, Mikko Byckling, Luc Giraud, Abdou Guermouche, Jean Roman.

Grant: ANR-Blanc (applied math theme)

Dates: 2010 – 2014

Partners: Institut de Mathématiques de Toulouse (coordinator); Laboratoire d'Analyse, Topologie, Probabilités in Marseilles; Institut de Recherche sur la Fusion Magnétique, CEA/IRFM; and Inria HiePACS.

Overview: This project concerns the study and development of a new class of numerical methods to simulate natural or laboratory plasmas, and in particular magnetic fusion processes. In this context, we aim to contribute to the ITER project from the mathematical, physical and algorithmic points of view.

The core of this project consists of the development, analysis, implementation and testing on real physical problems of the so-called Asymptotic-Preserving (AP) methods, which allow simulations over a large range of scales with the same model and numerical method. These methods represent a breakthrough with respect to the state of the art. They will be developed specifically to handle the various challenges related to the simulation of the ITER plasma. In parallel with this class of methods, we intend to design appropriate coupling techniques between macroscopic and microscopic models for all cases in which a clear distinction between different regimes can be made. This will make it possible to describe different regimes in different regions of the machine with a strong gain in computational efficiency, without losing accuracy in the description of the problem. We will develop full 3-D solvers for the asymptotic-preserving fluid model as well as for the kinetic model. The AP numerical strategy allows us to perform numerical simulations with very large time steps and mesh sizes, and leads to substantial computational savings. These advantages will be combined with the latest generation of preconditioned fast linear solvers to produce very high-performance software for plasma simulation. For HiePACS, this project provides in particular a testbed for our expertise in the parallel solution of large linear systems.
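The asymptotic-preserving property can be illustrated on a toy problem that is far simpler than the plasma models targeted by the project. The sketch below assumes a scalar stiff relaxation equation, du/dt = -(u - u_eq)/eps, whose limit as eps tends to 0 is simply u = u_eq. An implicit Euler step remains stable and recovers this limit for a time step much larger than eps, whereas an explicit Euler step with the same time step diverges. The equation, function names and parameter values are illustrative assumptions and are not taken from the project.

```python
def implicit_euler_step(u, u_eq, dt, eps):
    """One implicit Euler step for du/dt = -(u - u_eq) / eps.
    As eps -> 0 with dt fixed, the result tends to u_eq (the limit model)."""
    ratio = dt / eps
    return (u + ratio * u_eq) / (1.0 + ratio)

def explicit_euler_step(u, u_eq, dt, eps):
    """One explicit Euler step; stable only when dt <= 2 * eps."""
    return u - (dt / eps) * (u - u_eq)

def integrate(step, u0, u_eq, dt, eps, n_steps):
    """Apply the given one-step scheme n_steps times starting from u0."""
    u = u0
    for _ in range(n_steps):
        u = step(u, u_eq, dt, eps)
    return u

if __name__ == "__main__":
    u0, u_eq, dt, n_steps = 1.0, 0.5, 0.1, 10
    for eps in (1.0, 1e-2, 1e-8):
        u_imp = integrate(implicit_euler_step, u0, u_eq, dt, eps, n_steps)
        u_exp = integrate(explicit_euler_step, u0, u_eq, dt, eps, n_steps)
        print(f"eps = {eps:g}: implicit = {u_imp:.6f}, explicit = {u_exp:.3e}")
```

The same time step is used for every value of eps, which mirrors the point made above: the scheme does not need to resolve the small scale to remain stable and consistent with the limit model.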