## Section: Research Program

### Efficient algorithms for load balancing and code coupling in complex simulations

Participants : Astrid Casadei, Olivier Coulaud, Aurélien Esnard, Maria Predari, Pierre Ramet, Jean Roman, Clément Vuchener.

Many important physical phenomena in material physics and climatology lead to inherently complex simulations. They often rely on multi-physics or multi-scale approaches that couple different models and codes. The key idea is to reuse available legacy codes through a coupling framework instead of merging them into a standalone application. There is typically one model per scale or physics, and each model is implemented by a parallel code. For instance, to model crack propagation, one couples a molecular dynamics code representing the atomistic scale with an elasticity code based on a finite element method representing the continuum scale. Indeed, fully microscopic simulations of most domains of interest are not computationally feasible. Combining such different scales or physics is still a challenge for reaching high performance and scalability. While the modeling aspects are often well studied, several algorithmic problems remain open, which we plan to investigate in the HiePACS project-team.

#### Efficient schemes for multiscale simulations

As mentioned previously, many important physical phenomena, such as material deformation and failure (see Section 4.1), are inherently multiscale processes that cannot always be modeled with a continuum model alone. Fully microscopic simulations of most domains of interest are not computationally feasible. Therefore, researchers must turn to multiscale methods that couple micro and macro models. Combining different scales, such as quantum-atomistic or atomistic, mesoscale and continuum, is still a challenge: the resulting schemes must be accurate and must exchange information between the different scales both efficiently and effectively. We are currently involved in two national research projects that focus on multiscale schemes. More precisely, the models that we have started to study are quantum-to-atomistic coupling (QM/MM coupling) in the ANR NOSSI and atomistic-to-dislocation coupling in the ANR OPTIDIS.

#### Dynamic load balancing for massively parallel coupled codes

In this context of code coupling, one crucial issue is undoubtedly the
load balancing of the whole coupled simulation, which remains an open
question. The goal here is to find the best data distribution for the
whole coupled simulation, and not only for each standalone code, as is
most usually done. Indeed, balancing each code naively on its own can
lead to a severe imbalance and to a communication bottleneck during
the coupling phase, which can drastically degrade the overall
performance. We therefore argue that the coupling itself must be
modeled in order to ensure good scalability, especially when running
on massively parallel architectures (tens of thousands of
processors/cores). In other words, one must develop new algorithms and
software implementations to perform a *coupling-aware* partitioning of the whole application.
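As a minimal sketch of this idea (with hypothetical data structures, not an actual coupling framework), one simple coupling-aware strategy is to co-locate the interface cells of one code with the cells of the other code they exchange data with, so that the coupling phase involves no extra communication for those cells:

```python
# Toy sketch: given the partition of code A's cells across processes,
# assign each coupled cell of code B to the process that owns its
# A-side counterpart, localizing the coupling communication.
# `coupling_aware_assign` and the cell numbering are illustrative only.

def coupling_aware_assign(part_a, couplings):
    """part_a:    process owning each cell of code A (list of ints)
    couplings: list of (cell_a, cell_b) interaction pairs
    Returns a process assignment for the coupled cells of code B."""
    part_b = {}
    for cell_a, cell_b in couplings:
        # co-locate each B cell with its A counterpart
        part_b.setdefault(cell_b, part_a[cell_a])
    return part_b

part_a = [0, 0, 1, 1]                # 4 A-cells spread over 2 processes
pairs = [(0, 10), (1, 11), (3, 12)]  # interface interactions (A, B)
print(coupling_aware_assign(part_a, pairs))  # → {10: 0, 11: 0, 12: 1}
```

A real coupling-aware partitioner must of course also keep each code internally balanced; this sketch only illustrates the communication-locality side of the trade-off.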

Another related problem is resource allocation. This is particularly important for the global coupling efficiency and scalability, because each code involved in the coupling can be more or less computationally intensive, and a good trade-off must be found between the resources assigned to each code so that none of them waits for the other(s). And what happens if the load of one code changes dynamically relative to the others? In such a case, it could be convenient to adapt the number of resources used at runtime.

For instance, the conjugate heat transfer simulation in complex geometries (as developed by the CFD team of CERFACS) requires coupling a fluid/convection solver (AVBP) with a solid/conduction solver (AVTP). The AVBP code is much more CPU-intensive than the AVTP code, so there is a significant computational imbalance between the two solvers. New algorithms that correctly load balance coupled simulations with enhanced graph partitioning techniques appear as a promising way to reach better performance of coupled applications on massively parallel computers.
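The first-order version of this trade-off can be sketched as follows (the workload ratio below is a made-up number, not a measured AVBP/AVTP figure, and perfect linear scaling of both codes is assumed):

```python
# Toy sketch: split N cores between two coupled solvers so that their
# per-iteration times match, assuming each code scales linearly with
# its core count. `split_cores` is a hypothetical helper.

def split_cores(total_cores, work_a, work_b):
    """Return (cores_a, cores_b) proportional to each code's workload."""
    cores_a = max(1, round(total_cores * work_a / (work_a + work_b)))
    return cores_a, total_cores - cores_a

# e.g. a fluid solver assumed 9x more expensive than the conduction solver
a, b = split_cores(1000, work_a=9.0, work_b=1.0)
print(a, b)  # → 900 100
```

In practice neither code scales perfectly and the loads vary over time, which is precisely why dynamic, measurement-driven reallocation is of interest.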

#### Graph partitioning for hybrid solvers

Graph handling and partitioning play a central role in the activity described here, as well as in other numerical techniques detailed in Section 3.3.

Nested dissection is now a well-known heuristic for sparse matrix
ordering that both reduces the fill-in during numerical factorization
and maximizes the number of independent computation tasks. By using
the block data structure induced by the partition of separators of the
original graph, very efficient parallel block solvers have been
designed and implemented following supernodal or multifrontal
approaches. For hybrid methods mixing direct and iterative solvers,
such as `HIPS` or `MaPHyS`, obtaining a domain decomposition that
balances both the size of the domain interiors and the size of the
interfaces is a key point for load balancing and efficiency in a
parallel context.
We intend to revisit some well-known graph partitioning techniques in
the light of hybrid solvers and to design new algorithms to be tested
in the `Scotch` package.
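To make the ordering principle concrete, here is an illustrative toy version of nested dissection (not the `Scotch` implementation) on a regular 2-D grid, using geometric separators: the separator vertices are ordered last, so they are eliminated after the two independent halves, which limits fill-in and exposes parallelism.

```python
# Toy nested dissection on an m x n grid graph with geometric
# (middle row/column) separators. `nd` is a hypothetical helper.

def nd(rows, cols):
    """rows, cols: lists of grid indices. Returns an elimination order
    over the vertices (r, c), with each separator ordered last."""
    if len(rows) * len(cols) <= 2:              # small base case
        return [(r, c) for r in rows for c in cols]
    if len(cols) >= len(rows):                  # split the longer side
        mid = len(cols) // 2
        sep = [(r, cols[mid]) for r in rows]    # vertical separator
        left, right = nd(rows, cols[:mid]), nd(rows, cols[mid + 1:])
    else:
        mid = len(rows) // 2
        sep = [(rows[mid], c) for c in cols]    # horizontal separator
        left, right = nd(rows[:mid], cols), nd(rows[mid + 1:], cols)
    return left + right + sep                   # separators last

order = nd(list(range(3)), list(range(3)))
print(order[-3:])  # last eliminated: the middle column → [(0, 1), (1, 1), (2, 1)]
```

For hybrid solvers the analogous concern is that the recursion must keep not only the two halves balanced, but also the separator (interface) sizes, since the iterative part of the solver works on the interfaces.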