## Section: New Results

### Application Domains

#### Material physics

##### Molecular vibrational spectroscopy

Eigenvalue problems in quantum chemistry remain a major challenge in current research. Here we are interested in solving eigenvalue problems arising from molecular vibrational analysis. These problems are challenging because the size of the vibrational Hamiltonian matrix to be diagonalized grows exponentially with the size of the molecule under study. As a consequence, for molecules with more than 10 atoms, existing algorithms suffer from the curse of dimensionality or from prohibitive computational time.

A new variational algorithm called adaptive vibrational configuration interaction (A-VCI), intended for the resolution of the vibrational Schrödinger equation, was developed. The main advantage of this approach is that it efficiently reduces the dimension of the active space generated in the configuration interaction (CI) process. Here, we assume that the Hamiltonian can be written as a sum of products of operators. This adaptive algorithm relies on three correlated ingredients: a suitable starting space, a convergence criterion, and a procedure to expand the approximate space. The algorithm was accelerated by using an a posteriori error estimator (the residual) to select the most relevant directions in which to enlarge the space. Two examples were selected as benchmarks. For H${}_{2}$CO, we mainly study the performance of the A-VCI algorithm: comparison with the variation-perturbation method, choice of the initial space, and residual contributions. For CH${}_{3}$CN, we compare the A-VCI results with a computed reference spectrum using the same potential energy surface and an active space reduced by about 90 %. This work was published in [9].
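The three ingredients listed above (starting space, convergence criterion, expansion procedure) can be illustrated with a minimal sketch. It makes strong simplifying assumptions: a small dense symmetric matrix stands in for the vibrational Hamiltonian, only the lowest eigenpair is targeted, and the residual is evaluated in the full space. The function name `adaptive_lowest_eigenpair` and the parameters `n_add` and `tol` are hypothetical and are not taken from the A-VCI implementation.

```python
import numpy as np

def adaptive_lowest_eigenpair(H, start, n_add=5, tol=1e-8, max_iter=200):
    """Toy residual-driven subspace expansion for the lowest eigenpair of a
    symmetric matrix H (illustrative helper, not the A-VCI code)."""
    n = H.shape[0]
    active = sorted(set(start))            # ingredient 1: starting space
    lam, x, res_norm = 0.0, np.zeros(n), np.inf
    for _ in range(max_iter):
        # Rayleigh-Ritz step: diagonalize H restricted to the active space.
        vals, vecs = np.linalg.eigh(H[np.ix_(active, active)])
        lam = vals[0]
        x = np.zeros(n)
        x[active] = vecs[:, 0]
        # A posteriori estimator: residual of the Ritz pair in the full space.
        r = H @ x - lam * x
        res_norm = np.linalg.norm(r)
        if res_norm < tol or len(active) == n:   # ingredient 2: convergence
            break
        # Ingredient 3: enlarge the space along the largest residual entries.
        r_abs = np.abs(r)
        r_abs[active] = 0.0                # already part of the active space
        order = np.argsort(r_abs)[::-1][:n_add]
        new = [int(i) for i in order if r_abs[i] > 0.0]
        if not new:
            break
        active = sorted(set(active) | set(new))
    return lam, x, active, res_norm

# Usage on a synthetic matrix with couplings that decay away from the diagonal.
n = 60
idx = np.arange(n)
H = np.diag(np.arange(1.0, n + 1)) + 0.1 * np.exp(-np.abs(idx[:, None] - idx[None, :]))
lam, x, active, res_norm = adaptive_lowest_eigenpair(H, start=[0])
print(lam, len(active), res_norm)
```

In this toy setting the residual plays both roles mentioned in the text: its norm is the stopping test, and its largest entries indicate which basis functions to add next.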

##### Dislocations

We focused on improving collision detection in the OptiDis code. Junction formation mechanisms are essential to characterize material behavior such as strain hardening and irradiation effects. Dislocation junctions appear when dislocation segments collide with each other; reliable collision detection algorithms are therefore required to detect and handle junction formation. Collision detection is also a very costly operation in dislocation dynamics simulations, and its performance must be carefully optimized to allow massive simulations.

During the first year of this PhD thesis, new collision algorithms were implemented in the dislocation dynamics code OptiDis. The aim was to allow fast and accurate collision detection between dislocation segments using hierarchical methods. The complexity of the N-body collision problem can be reduced to O(N) using spatial partitioning, and the computation can be further accelerated using fast-reject techniques and OpenMP parallelism. Finally, new collision handling algorithms for dislocations were implemented to increase the reliability of the simulation.
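As an illustration of spatial partitioning combined with a fast-reject test, here is a minimal sketch in Python. It is not the OptiDis implementation: it assumes a uniform grid (rather than a hierarchical structure), straight segments given by their end points, and a hypothetical capture distance `radius`. The function names `segment_distance` and `find_collisions` are illustrative.

```python
import numpy as np
from collections import defaultdict
from itertools import product

def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimum distance between segments [p1, q1] and [p2, q2],
    using clamped closest points."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:              # both segments are points
        s = t = 0.0
    elif a <= eps:                         # first segment is a point
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= eps:                       # second segment is a point
            t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
        else:
            b = d1 @ d2
            denom = a * e - b * b          # zero for parallel segments
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))

def find_collisions(P, Q, radius, cell):
    """Sorted pairs (i, j), i < j, of segments closer than `radius`.
    P, Q: (N, 3) arrays of end points; `cell` is the grid spacing."""
    lo = np.minimum(P, Q) - 0.5 * radius   # bounding boxes inflated by radius/2
    hi = np.maximum(P, Q) + 0.5 * radius
    cmin = np.floor(lo / cell).astype(int)
    cmax = np.floor(hi / cell).astype(int)
    grid = defaultdict(list)
    for i in range(len(P)):                # register each box in the cells it covers
        spans = [range(cmin[i, k], cmax[i, k] + 1) for k in range(3)]
        for key in product(*spans):
            grid[key].append(i)
    hits = set()
    for ids in grid.values():              # only segments sharing a cell are compared
        for u in range(len(ids)):
            for v in range(u + 1, len(ids)):
                i, j = ids[u], ids[v]
                if (i, j) in hits:
                    continue
                # Fast reject: disjoint inflated boxes cannot be within `radius`.
                if np.any(lo[i] > hi[j]) or np.any(lo[j] > hi[i]):
                    continue
                if segment_distance(P[i], Q[i], P[j], Q[j]) < radius:
                    hits.add((i, j))
    return sorted(hits)
```

When segments are short compared with the cell size and roughly uniformly distributed, each cell holds a bounded number of segments, so the total number of exact distance tests grows linearly with N instead of quadratically. The loop over cells is also a natural candidate for thread-level parallelism.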

#### Co-design for scalable numerical algorithms in scientific applications

##### Interior penalty discontinuous Galerkin method for coupled elasto-acoustic media

We introduce a high-order interior penalty discontinuous Galerkin scheme for the numerical solution of wave propagation in coupled elasto-acoustic media. A displacement formulation is used, which allows the acoustic and elastic wave equations to be solved within the same framework. The correct transmission condition is imposed weakly through the derivation of adapted numerical fluxes. This generalization does not weaken the discontinuous Galerkin method, so $hp$-non-conforming meshes are supported. Interior penalty discontinuous Galerkin methods were originally developed for scalar equations. We therefore propose an optimized formulation for vector equations that is better suited than a straightforward transposition of the scalar scheme. We prove consistency and stability of the proposed schemes. To study numerical accuracy and convergence, we perform a classical plane wave analysis. Finally, we show the relevance of our method on numerical experiments.

More details on this work can be found in [47].

##### High performance simulation for ITER tokamak

Concerning the `GYSELA` global non-linear electrostatic code, the
efforts during the period concentrated on the design of a more
efficient parallel gyro-average operator for the deployment of very
large (future) `GYSELA` runs.
The main unknown of the computation is a distribution function that
represents either the density of the guiding centers or the density of the
particles in a tokamak. The switch between these two representations is
performed by the gyro-average operator.
In the previous version of `GYSELA`, this operator was computed
with a Padé approximation.
In order to improve the precision of the gyro-averaging, a new
parallel version based on a Hermite interpolation has been developed (in
collaboration with the Inria TONUS project-team and IPP Garching).
This new implementation of the gyro-average operator
has been integrated in `GYSELA` and the parallel benchmarks have been
successful.
This work was carried out in the framework of Fabien Rozar's PhD
in collaboration with CEA-IRFM (defended in November 2015) and is
continued in the PhD of Nicolas Bouzat funded by IPL C2S@Exa.
The scientific objectives of this new work are first
to consolidate the parallel version of the gyro-average operator, in
particular by designing a scalable MPI+OpenMP parallel version and
using a new communication scheme, and second to design
new numerical methods for the gyro-average, source and collision
operators to deal with new physics in `GYSELA`. The objective is to
tackle kinetic electron configurations for more realistic, complex, large
simulations.
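
To make the precision issue concrete, the following sketch compares a direct average over the gyro-circle with the Padé factor $1/(1 + (k\rho)^2/4)$ for a single plane wave of wavenumber $k$ and Larmor radius $\rho$. For such a wave the exact gyro-average is the Bessel function $J_0(k\rho)$, which the Padé factor only matches for small $k\rho$. This is an illustration under simplifying assumptions: the function is sampled analytically on the circle, whereas the new `GYSELA` operator interpolates grid values with Hermite interpolation. The names `gyro_average` and `pade_factor` are hypothetical.

```python
import numpy as np

def gyro_average(f, x, y, rho, n_points=32):
    """Average of f over the circle of radius rho centred at (x, y),
    using the trapezoidal rule in the gyro-angle."""
    theta = 2.0 * np.pi * np.arange(n_points) / n_points
    return float(np.mean(f(x + rho * np.cos(theta), y + rho * np.sin(theta))))

def pade_factor(k, rho):
    """Pade approximation of the gyro-average factor J0(k * rho)."""
    return 1.0 / (1.0 + 0.25 * (k * rho) ** 2)

k = 2.0

def wave(x, y):
    # Plane wave along x; its gyro-average at the origin equals J0(k * rho).
    return np.cos(k * x) + 0.0 * y

for rho in (0.05, 0.5, 1.0):
    print(rho, gyro_average(wave, 0.0, 0.0, rho), pade_factor(k, rho))
```

The two columns agree closely for small $k\rho$ and drift apart as $k\rho$ grows, which is the regime where a more accurate operator matters.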

##### 3D aerodynamics for unsteady problems with bodies in relative motion

The first part of our research work on the parallel
aerodynamic code `FLUSEPA` was to design an operational MPI+OpenMP
version based on a domain decomposition.
This parallel version is efficient up to 400 cores, and the
temporal adaptive method, used without bodies in relative motion, has
been tested successfully on complex 3D take-off blast wave
computations.
Moreover, an asynchronous strategy for computing bodies in relative
motion and mesh intersections has been developed and used for
3D stage separation cases. This first version is the current
industrial production version of `FLUSEPA` for Airbus Safran Launchers.

However, this intermediate version shows synchronization problems in the
aerodynamic solver due to the time integration used.
To tackle this issue, a task-based version on top of the runtime system
`StarPU` has been developed and evaluated.
Task generation functions have been designed to maximize
asynchronism during execution while respecting the data access
pattern of the code. This led to a refactoring of the `FLUSEPA` computation kernels.
This is a successful proof of concept: a task-based version is
now available for the aerodynamic solver, for both shared and
distributed memory. It uses three levels of parallelism: MPI processes
between sub-domains, and `StarPU` workers in shared memory (for
each sub-domain), themselves running OpenMP parallel tasks.
This version has been validated on large 3D take-off blast wave
computations (80 million cells) and is much more efficient than the
previous MPI+OpenMP version: we achieve a gain in computation time
of 70 % on 320 cores and of 50 % on 560 cores.
The next step will consist in extending the task-based version to the
motion and intersection operations.
This work was carried out in the framework of Jean-Marie Couteyen's PhD
(defended in September 2016) in collaboration with Airbus Safran
Launchers ([2], [17]).
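
The benefit of removing global synchronizations can be illustrated with a toy scheduling model. It is a sketch under stated assumptions and does not reproduce the `FLUSEPA` task graph or the `StarPU` API: sub-domains form a 1D chain, each has its own worker, `cost[d][s]` is a hypothetical cost of step `s` on sub-domain `d`, and a task only depends on the previous step of the same sub-domain and of its two neighbours.

```python
def barrier_makespan(cost):
    """Total time when every step ends with a global synchronization:
    each step lasts as long as its slowest sub-domain."""
    steps = len(cost[0])
    return sum(max(c[s] for c in cost) for s in range(steps))

def task_makespan(cost):
    """Total time when a task starts as soon as the previous step of its
    own sub-domain and of its neighbours has finished."""
    n, steps = len(cost), len(cost[0])
    finish = [0] * n
    for s in range(steps):
        prev = finish
        finish = []
        for d in range(n):
            ready = max(prev[max(d - 1, 0):d + 2])   # wait for local dependencies only
            finish.append(ready + cost[d][s])
    return max(finish)

# Three sub-domains whose expensive step falls at different times.
cost = [[1, 3, 1],
        [3, 1, 1],
        [1, 1, 3]]
print(barrier_makespan(cost), task_makespan(cost))
```

In this model the dependency-driven schedule is never slower than the synchronized one, and it is strictly faster whenever the load imbalance moves between sub-domains from one step to the next, as happens with temporal adaptive time integration.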