Section: New Results

Efficient algorithmics for code coupling in complex simulations

The performance of coupled codes depends on how well the data are distributed across the processors. Generally, the data distribution of each code is built independently of the others to obtain the best load balance. But once the codes are coupled, the naive use of these decompositions can lead to a significant imbalance in the coupling area. Therefore, modeling the whole coupling is crucial to improve performance and to ensure good scalability. The goal is to find the best data distribution for the coupled codes as a whole, and not only for each standalone code. The key idea is to use a graph/hypergraph model that incorporates information about the coupling itself. We then propose new algorithms that perform a coupling-aware partitioning in order to improve the load balance of the whole coupled simulation.

Let us consider two coupled codes, modeled by two graphs (or hypergraphs) A and B, connected by inter-edges I(A,B) that represent the coupling communications between the codes. Formally, the problem consists in partitioning A into M parts and B into N parts while accounting for I(A,B). The algorithm should optimize both the edge cut of each graph and the coupling communications, while keeping each graph balanced. Our general strategy is divided into three main steps:

  1. first, we freely partition A into M parts to obtain the partition A/M;

  2. then, we project this partition onto B according to I(A,B), which provides the partition B/M;

  3. finally, we compute the partition B/N by repartitioning B from the M existing parts into N parts.
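The three steps above can be sketched on a toy example. This is only an illustrative sketch under strong assumptions, not the actual algorithms: vertices are plain integer indices, I(A,B) is a list of (a, b) pairs, and the hypothetical helpers `block_partition`, `project_partition` and `repartition` are simplistic stand-ins for a real graph/hypergraph partitioner, a projection along the inter-edges, and an M-to-N repartitioner.

```python
from collections import Counter


def block_partition(n_vertices, n_parts):
    """Step 1 stand-in: cut the vertices 0..n-1 into balanced contiguous blocks."""
    return [v * n_parts // n_vertices for v in range(n_vertices)]


def project_partition(part_a, inter_edges, n_b):
    """Step 2: each B vertex takes the most frequent part of its coupled A vertices.

    B vertices without any inter-edge are left unassigned (None).
    """
    votes = [Counter() for _ in range(n_b)]
    for a, b in inter_edges:
        votes[b][part_a[a]] += 1
    return [c.most_common(1)[0][0] if c else None for c in votes]


def repartition(part_b_m, n_parts):
    """Step 3 stand-in: order B vertices by former part, then cut into N balanced chunks.

    Vertices sharing a former part tend to stay together, which limits migration.
    Unassigned vertices (None) are placed last.
    """
    def key(v):
        p = part_b_m[v]
        return (p is None, p if p is not None else 0, v)

    order = sorted(range(len(part_b_m)), key=key)
    new_part = [0] * len(part_b_m)
    for rank, v in enumerate(order):
        new_part[v] = rank * n_parts // len(order)
    return new_part


def coupled_partition(n_a, n_b, inter_edges, m, n):
    """Run the three steps and return (A/M, B/M, B/N)."""
    part_a = block_partition(n_a, m)                        # A/M
    part_b_m = project_partition(part_a, inter_edges, n_b)  # B/M
    part_b_n = repartition(part_b_m, n)                     # B/N
    return part_a, part_b_m, part_b_n


# Toy coupling: 8 vertices in A, 6 in B, M = 2, N = 3.
edges = [(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 3), (6, 4), (7, 5)]
a_m, b_m, b_n = coupled_partition(8, 6, edges, m=2, n=3)
print(a_m, b_m, b_n)
```

In a realistic setting, steps 1 and 3 would be delegated to a graph or hypergraph partitioning tool, and the edge cut would be part of the objective; the sketch only shows how the partition of A flows through I(A,B) into the partition of B.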

The final repartitioning step is particularly tedious, because it must handle a variable number of processes. However, as far as we know, the state-of-the-art graph/hypergraph repartitioning tools are limited to a fixed number of processes (i.e. M=N). To overcome this issue, we have proposed a new repartitioning algorithm, assuming the load is constant, based on hypergraph partitioning techniques with fixed vertices. Our algorithm uses an optimal communication pattern, which we have proved to minimize the total number of messages between the former and the new parts. Experimental results validate our work by comparing it with other approaches [20]. We are currently investigating how to extend our algorithm to the dynamic load balancing of parallel adaptive codes (A=B), whose load evolution is variable and difficult to predict. In this case, it would be convenient to dynamically adapt the number of processes used at runtime (M≠N), while minimizing the migration cost during the repartitioning step. This work is currently conducted in the framework of Clément Vuchener's PhD thesis.
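To make the notion of a communication pattern between M former parts and N new parts more concrete, here is a minimal sketch of one simple balanced pattern. It assumes a constant total load, normalized to 1, laid out along a one-dimensional chain: former part i owns the interval [i/M, (i+1)/M] and new part j owns [j/N, (j+1)/N]. This chain construction and the function names are assumptions made for illustration; the text above does not specify the pattern actually used. For this particular construction, the number of nonzero exchanges is M + N - gcd(M, N).

```python
from fractions import Fraction
from math import gcd


def chain_communication_matrix(m, n):
    """Return C where C[i][j] is the load fraction sent from former part i to new part j.

    Assumption: unit total load spread uniformly on [0, 1], with former and
    new parts owning consecutive intervals of length 1/m and 1/n.
    """
    matrix = [[Fraction(0)] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            lo = max(Fraction(i, m), Fraction(j, n))
            hi = min(Fraction(i + 1, m), Fraction(j + 1, n))
            if hi > lo:
                matrix[i][j] = hi - lo  # length of the interval overlap
    return matrix


def count_exchanges(matrix):
    """Count the nonzero entries, i.e. the (former part, new part) pairs that exchange data."""
    return sum(1 for row in matrix for x in row if x > 0)


pattern = chain_communication_matrix(3, 4)
for row in pattern:
    print([str(x) for x in row])
print(count_exchanges(pattern), 3 + 4 - gcd(3, 4))
```

Each row sums to 1/M and each column to 1/N, so both the former and the new partitions are balanced. Exact rational arithmetic (`fractions.Fraction`) avoids spurious nonzero entries caused by floating-point rounding at interval boundaries.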