## Section: New Results

### Efficient algorithms for load balancing and code coupling in complex simulations

#### Dynamic load balancing for massively parallel coupled codes

As a preliminary step toward the dynamic load balancing of coupled
codes, we focus on the problem of dynamically load balancing a single
parallel code with a variable number of processors. Indeed, if the
workload varies drastically during the simulation, the load must be
redistributed regularly among the processors. Dynamic load balancing
is a well-studied subject, but most studies are limited to an initially
fixed number of processors. Adjusting the number of processors at
runtime makes it possible to preserve the parallel efficiency of the
code, or to keep the simulation running when the current memory
resources are exceeded. We call this problem *MxN graph repartitioning*.
We propose methods based on graph repartitioning to rebalance the
load while changing the number of processors. These methods are split
into two main steps. First, we study the migration phase and build a
"good" migration matrix minimizing several metrics, such as the
migration volume or the number of exchanged messages. Second, we use
graph partitioning heuristics to compute a new distribution that
optimizes the migration according to the results of the previous
step. In addition, we propose a direct $k$-way partitioning algorithm
that allows us to improve our biased partitioning. Finally, an
experimental study validates our algorithms against state-of-the-art
partitioning tools.
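The migration-matrix step can be sketched in a few lines of Python. This is a toy illustration only, not the actual implementation: the function names and the naive mapping of new parts onto the old processors are assumptions made here for the example.

```python
def migration_matrix(old_part, new_part, M, N):
    """C[i][j] = number of vertices that move from old part i (of M)
    to new part j (of N)."""
    C = [[0] * N for _ in range(M)]
    for v in range(len(old_part)):
        C[old_part[v]][new_part[v]] += 1
    return C

def migration_metrics(C, owner=None):
    """Migration volume and message count for a migration matrix C,
    assuming new part j stays on processor owner[j] (a naive
    wrap-around mapping by default, purely for illustration)."""
    M, N = len(C), len(C[0])
    if owner is None:
        owner = [j % M for j in range(N)]
    volume = messages = 0
    for i in range(M):
        for j in range(N):
            if C[i][j] and i != owner[j]:
                volume += C[i][j]   # vertices actually shipped
                messages += 1       # one message per (sender, receiver part)
    return volume, messages
```

A "good" migration matrix in the sense above is one that keeps both returned metrics small while the new parts remain balanced.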
Our algorithms are implemented in the `LBC2` library
and have been integrated into the partitioning
tool `Scotch` as a prototype.

This work is developed in the framework of Clément Vuchener's PhD, which will be defended in February 2014. These contributions have been presented at the international conference ParCo [22] in Munich.

Regarding the problem of dynamically balancing parallel coupled codes,
we start by reusing our results on *MxN graph repartitioning*. Given two
coupled codes $A$ and $B$, the key idea is to develop a
*two-graph co-partitioning* algorithm that partitions two *coupled*
graphs ${G}_{A}$ and ${G}_{B}$ into ${N}_{A}$ and ${N}_{B}$ parts,
respectively, with the classic objectives (*i.e.*, balancing the
computational load and minimizing the communication cost for each code)
while also minimizing the number of messages exchanged between the
codes during the coupling phase.
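The coupling objective can be illustrated with a minimal sketch (the function name and data layout are hypothetical, chosen here for the example): the number of messages in the coupling phase is the number of distinct processor pairs linked by at least one coupling edge.

```python
def coupling_messages(part_A, part_B, coupling_edges):
    """Count the messages exchanged between codes A and B during the
    coupling phase: one message per distinct (part of A, part of B)
    pair connected by at least one coupling edge.

    part_A[v] / part_B[v] give the part of vertex v in G_A / G_B;
    coupling_edges is a list of (vertex of A, vertex of B) pairs."""
    pairs = {(part_A[a], part_B[b]) for a, b in coupling_edges}
    return len(pairs)
```

Co-partitioning aims to choose `part_A` and `part_B` jointly so that this count stays small, in addition to the per-code balance and edge-cut objectives.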

This work is developed in the framework of Maria Predari's PhD, which started in October 2013.

#### Graph partitioning for hybrid solvers

Nested Dissection was introduced by A. George and is a very popular heuristic for sparse matrix ordering before numerical factorization. It maximizes the number of parallel tasks while reducing both the fill-in and the operation count. The basic idea is to build a "small separator" $S$ of the graph associated with the matrix, so as to split the remaining vertices into two parts ${P}_{0}$ and ${P}_{1}$ of "almost equal size". The vertices of the separator $S$ are ordered with the largest indices, and the same method is then applied recursively to the two sub-graphs induced by ${P}_{0}$ and ${P}_{1}$. In the end, after $k$ levels of recursion, we obtain ${2}^{k}$ sets of independent vertices separated from each other by ${2}^{k}-1$ separators.
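The recursion can be sketched as follows. This is a toy Python version: the separator choice is a naive placeholder, whereas real tools such as `Scotch` use far better heuristics.

```python
def nested_dissection(adj):
    """Toy nested-dissection ordering: the 'separator' receives the
    largest index of its range, then we recurse on the remaining
    connected components. adj maps each vertex to its neighbours."""
    order = {}
    next_idx = [len(adj)]  # hand out indices from the top, downwards

    def components(vs):
        # connected components of the sub-graph induced by vs
        vs, seen, comps = set(vs), set(), []
        for s in sorted(vs):
            if s in seen:
                continue
            stack, comp = [s], []
            seen.add(s)
            while stack:
                u = stack.pop()
                comp.append(u)
                for w in adj[u]:
                    if w in vs and w not in seen:
                        seen.add(w)
                        stack.append(w)
            comps.append(comp)
        return comps

    def dissect(vs):
        if len(vs) <= 2:                 # small enough: order directly
            for v in vs:
                next_idx[0] -= 1
                order[v] = next_idx[0]
            return
        sep = sorted(vs)[len(vs) // 2]   # naive separator placeholder
        next_idx[0] -= 1
        order[sep] = next_idx[0]         # separator gets the largest index
        for comp in components(v for v in vs if v != sep):
            dissect(comp)

    dissect(list(adj))
    return order
```

On a path graph 0-1-2-3-4, the middle vertex 2 is ordered last, and the two halves receive disjoint, contiguous index ranges, as in the recursion described above.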

However, if we examine precisely the complexity analysis used to estimate asymptotic bounds on the fill-in or the operation count under a Nested Dissection ordering, we notice that the size of the halo of the separated sub-graphs (the set of external vertices that belong to an old separator and have been previously ordered) plays a crucial role in the asymptotic behavior achieved. Ideally, the halo vertices should be balanced among the parts.

Considering now hybrid methods that mix direct and iterative solvers, such as `HIPS` and `MaPHyS`,
obtaining a domain decomposition that balances both the size of
the domain interiors and the size of the interfaces is a key point for load balancing and efficiency in a parallel context.
This leads to the same issue: balancing the halo vertices in order to get balanced interfaces.
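In graph terms, the halo of a part can be computed directly; the helpers below are hypothetical, named here only to make the balance criterion concrete.

```python
def halo(part_vertices, adj):
    """Halo of a part: vertices outside the part that are adjacent to
    it. In a nested-dissection context these belong to previously
    ordered separators, so balanced halos mean balanced interfaces."""
    part = set(part_vertices)
    return {w for v in part for w in adj[v] if w not in part}

def halo_imbalance(parts, adj):
    """Ratio of the largest to the smallest halo size over the parts;
    1.0 means perfectly balanced halos."""
    sizes = [len(halo(p, adj)) for p in parts]
    return max(sizes) / min(sizes)
```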

For this purpose, we revisit the algorithm introduced by Lipton, Rose and Tarjan, which performs the nested dissection recursion differently: at each level, the method is applied recursively to the sub-graphs but, for each sub-graph, we keep track of the halo vertices. We have implemented this in the `Scotch` framework and have studied its main algorithm for building a separator, called greedy graph growing.
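Greedy graph growing can be sketched as follows. This is a simplified illustration, not the `Scotch` implementation: real versions use gain-driven vertex selection followed by refinement.

```python
def greedy_graph_growing(adj, seed):
    """Grow a part from `seed`, absorbing one frontier vertex at a time
    until it holds half the vertices; the remaining frontier is a
    separator candidate isolating the part from the rest of the graph."""
    n = len(adj)
    part = {seed}
    frontier = set(adj[seed])
    while len(part) < n // 2 and frontier:
        v = min(frontier)            # arbitrary deterministic pick here;
        frontier.discard(v)          # gain-based selection in practice
        part.add(v)
        frontier |= {w for w in adj[v] if w not in part}
    return part, frontier
```

Removing the returned frontier disconnects the grown part from the rest of the graph, which is exactly the separator property nested dissection needs at each level.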

This work is developed in the framework of Astrid Casadei's PhD. These contributions have been presented at the international workshop on Nested Dissection [32] in Waterloo.