## Section: New Results

### Application Domains

#### Dislocation dynamics simulations in material physics

This year we have focused on the hybrid parallelization of the OptiDis code.
As dislocations move in their grain, they expand, shrink, collide and annihilate, which means that we are facing an extremely dynamic n-body problem. We have also introduced an adaptive, cache-conscious data structure to manage the dislocation mesh. Moreover, two main kernels, plugged into our `ScalFMM` library, were built to handle the pairwise force interactions and the collisions between dislocations. Finally, the code uses hybrid parallelism, based on OpenMP tasks within a node and MPI to exchange data between nodes, and can run on both shared- and distributed-memory machines.
Future work will mainly focus on tuning the code and on managing this tuning dynamically, so as to adapt to different kinds of simulations and architectures.
On the physical side, we have introduced more *split node* cases to simulate irradiated materials. We are now able to run simulations with tens of thousands of defects in materials. Typically, our simulation box can hold a large number of tiny dislocation loops, such as those induced by irradiation, so we can observe how Frank-Read sources behave while they cross the field of loop defects.
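The structure of the pairwise interaction kernel (the near-field, particle-to-particle part of a fast-multipole scheme such as the one `ScalFMM` provides) can be sketched as follows. This is an illustrative Python sketch only: the function name and the simplified 1/r² repulsion law are hypothetical stand-ins for the actual dislocation force expressions, and the real OptiDis kernels are C++.

```python
import math

def p2p_forces(points, strength=1.0):
    """Direct pairwise (P2P) interactions: the near-field kernel of an
    FMM-style scheme.  Each point interacts with every other through a
    toy 1/r^2 law; the real OptiDis kernels evaluate dislocation-specific
    force expressions instead."""
    n = len(points)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = [points[j][k] - points[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) or 1e-30  # guard against r = 0
            f = strength / r2
            inv_r = 1.0 / math.sqrt(r2)
            for k in range(3):
                forces[i][k] -= f * dx[k] * inv_r  # action ...
                forces[j][k] += f * dx[k] * inv_r  # ... and reaction
    return forces
```

In the FMM setting this kernel is only applied to nearby pairs; far-field contributions are approximated through the multipole expansions, which is what makes the overall n-body cost manageable as the dislocation population grows.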

This work is developed in the framework of Arnaud Etcheverry's PhD, funded by the ANR OPTIDIS project.

#### Co-design for scalable numerical algorithms in scientific applications

The study of the **thermo-acoustic stability of large combustion chambers** requires the solution of a nonlinear eigenvalue problem.
The nonlinear problem is linearized using a fixed point iteration procedure. This leads to a sequence of linear eigenproblems which must
be solved iteratively in order to obtain one nonlinear eigenpair.
Therefore, efficient and robust parallel eigensolvers for the solution of linear problems have been investigated, and strategies to accelerate
the solution of the sequence of linear eigenproblems have also been proposed.
Among the numerical techniques that have been considered (Krylov-Schur, Implicitly Restarted Arnoldi, subspace iteration with Chebyshev acceleration),
the Jacobi-Davidson method was the best suited to be combined with techniques to recycle spectral information between the nonlinear iterations.
The robustness of the parallel numerical techniques was illustrated on large problems with a few million unknowns, solved on a few tens of cores.
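The fixed-point linearization loop described above can be sketched as follows. This is a toy illustration with hypothetical names: the 2x2 matrix function and the power-iteration inner solver stand in for the actual thermo-acoustic operator and the Jacobi-Davidson eigensolver with spectral recycling.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dominant_eig(A, iters=500):
    """Toy inner linear eigensolver (power iteration).  The actual work
    uses Jacobi-Davidson and recycles spectral information between the
    nonlinear iterations."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient of the converged (unit-norm) vector
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def nonlinear_eig(A_of_lam, lam0=0.0, tol=1e-10, max_outer=100):
    """Fixed-point (Picard) linearization: freeze lambda inside the
    operator A(lambda), solve the resulting *linear* eigenproblem,
    and repeat until the eigenvalue is self-consistent."""
    lam = lam0
    for _ in range(max_outer):
        new_lam = dominant_eig(A_of_lam(lam))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

For instance, with `A_of_lam = lambda lam: [[2.0 + 0.1 * lam, 0.3], [0.3, 1.0]]` the outer loop converges to a lambda that equals the dominant eigenvalue of its own frozen operator, which is exactly the self-consistency that defines one nonlinear eigenpair.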

These results are part of the outcome of Pablo Salas's PhD thesis, which was defended on November 15th.

The **Time-domain Boundary Element Method** (TD-BEM) has not been widely studied but represents an interesting alternative to its frequency-domain counterpart. Since implementations are usually based on inefficient sparse matrix-vector products (SpMV), we have investigated other approaches in order to increase the sequential flop rate. We have implemented extremely efficient operators using SIMD intrinsics or even 64-bit assembly instructions.
We are using these novel approaches to parallelize the code in both shared and distributed memory, targeting execution on hundreds of nodes.
All the implementations must be of high quality in the software-engineering sense, since the resulting library is going to be used by industrial applications.
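For reference, the classical CSR sparse matrix-vector product that serves as the baseline looks as follows. This Python sketch only shows the access pattern; the high-flop-rate operators developed in this work are C++ kernels written with SIMD intrinsics and assembly, which Python cannot reproduce.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Baseline CSR sparse matrix-vector product y = A x.
    The indirect accesses to x (through col_idx) defeat vectorization
    and limit the flop rate, which is what motivates the dense,
    SIMD-friendly operators developed for the TD-BEM solver."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```
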

This work is developed in the framework of Bérenger Bramas's PhD and contributes to the EADS-ASTRIUM, Inria, Conseil Régional initiative.

In a preliminary work, a **3D Cartesian SN solver**, `DOMINO`, has been designed and implemented using two nested levels of parallelism (multicore+SIMD) on shared-memory computation nodes. `DOMINO` is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-threaded parallelism with vector operations in an efficient and yet portable way. As a result, `DOMINO` can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, on a single computing node. The very high Flops/Watt ratio of `DOMINO` makes it a very interesting building block for a future many-node nuclear simulation tool.
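The two nested levels can be sketched schematically as follows, with hypothetical names: an outer thread-parallel loop over spatial cells (the role played by Intel TBB in `DOMINO`, played here by a Python thread pool) and an inner vectorizable quadrature reduction over angular directions (handled by Eigen's vector types in the real C++ code).

```python
from concurrent.futures import ThreadPoolExecutor

def scalar_flux(weights, angular_flux):
    """Inner, vectorizable kernel: the angular quadrature sum
    sum_m w[m] * psi[m] for one spatial cell.  In DOMINO this level
    maps onto SIMD units via Eigen."""
    return sum(w * psi for w, psi in zip(weights, angular_flux))

def reduce_cells(cells, weights, n_threads=4):
    """Outer level: distribute spatial cells across threads (TBB's role
    in DOMINO); each task runs the vector kernel on its own cells."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda psi: scalar_flux(weights, psi), cells))
```

This sketch only conveys the nesting; the actual SN sweep also carries upwind dependencies between cells, which constrain the order in which the outer tasks may run.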

This work is developed in the framework of Salli Moustafa's PhD in collaboration with EDF. These contributions have been presented at the international conference on Supercomputing on Nuclear Applications [21] in Paris.

Concerning the numerical simulation of **the turbulence of plasma
particles inside a tokamak**, two software tools, providing
post-mortem analysis, have been designed to
manage the memory optimization of `GYSELA` [20].
The first one is a visualization tool: it plots the memory consumption of the
code over the course of an execution. This tool helps the developer locate
where the memory peak occurs and consider how the code can be modified to
decrease it. On the same graph, the names of the allocated structures are
labelled, which gives a significant hint on the modifications to apply.
The second tool concerns the prediction of the memory peak. Given an
input set of parameters, we can replay the allocations of the code in
an offline mode. With this tool, we can accurately deduce the value of
the memory peak and where it occurs. Thanks to this prediction, we know
which mesh sizes are feasible on a given architecture.
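At its core, the offline replay reduces to scanning the recorded allocation events while tracking the running total. A minimal sketch, in which the event format and all names are hypothetical (the actual `GYSELA` tooling works on real allocation traces):

```python
def replay_allocations(events):
    """Replay an allocation trace offline.  Each event is a tuple
    (name, size, kind) with kind in {'alloc', 'free'}.  Returns the
    peak memory, the index of the event at which the peak occurs, and
    the structures live at that point (the labels shown on the graph)."""
    live = {}                    # structure name -> currently allocated size
    current = peak = 0
    peak_index, peak_live = -1, {}
    for i, (name, size, kind) in enumerate(events):
        if kind == 'alloc':
            live[name] = live.get(name, 0) + size
            current += size
        else:                    # 'free': release whatever was recorded
            current -= live.pop(name, 0)
        if current > peak:
            peak, peak_index, peak_live = current, i, dict(live)
    return peak, peak_index, peak_live
```

Because the replay only needs the sizes, which are derivable from the input parameters, the peak can be predicted without running the full simulation, which is what makes it possible to decide in advance whether a given mesh fits on a given machine.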

This work is carried on in the framework of Fabien Rozar's PhD in collaboration with CEA Cadarache.

In the first part of our research work concerning the parallel
**aerodynamic code** `FLUSEPA`, an intermediate version based on the
previous one has been developed.
By using hybrid OpenMP/MPI parallelism based on a domain decomposition,
we achieved a faster version of the code, and the temporal adaptive
method used without bodies in relative motion has been tested
successfully for real complex 3D cases using up to 400 cores.
Moreover, an asynchronous strategy for computing bodies in relative
motion and mesh intersections has been developed, and testing of this
feature is currently in progress. The next step will be to design a
new, fully asynchronous code based on a task-graph description to be
executed on a modern runtime system such as `StarPU`.
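The task-graph model targeted for the fully asynchronous version can be illustrated with a toy dependency-driven executor. The task names are hypothetical, and this sequential sketch only captures the ordering constraints; a runtime system such as `StarPU` additionally schedules the ready tasks asynchronously across cores and manages data transfers.

```python
from collections import deque

def execute_task_graph(tasks, deps):
    """Toy task-graph execution: a task becomes ready once all of its
    dependencies have completed (Kahn's topological ordering).  Returns
    one valid execution order respecting the dependency graph."""
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            children[d].append(t)
    ready = deque(t for t in tasks if remaining[t] == 0)
    order = []
    while ready:
        t = ready.popleft()          # in a real runtime: launch asynchronously
        order.append(t)
        for c in children[t]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    return order
```

Expressing the solver as such a graph is what removes the global synchronization points of the current OpenMP/MPI version: any task whose inputs are ready may run, regardless of where the rest of the computation stands.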

This work is carried on in the framework of Jean-Marie Couteyen's PhD in collaboration with Astrium Les Mureaux.