Section: New Results

HPC Component Models and Runtimes

Participants: Thierry Gautier, Christian Perez, Laurent Turpin, Marie Durand, Philippe Virouleau.

Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks

In [15], we show how OpenMP 4.5 tasks can be used to overlap computation with MPI communication efficiently, based on a case study conducted on multi-core and many-core architectures. The paper focuses on task granularity, dependencies, and priorities, and also identifies some limitations of OpenMP. Results on 64 Skylake nodes show that although 64% of the wall-clock time is spent in MPI communications, 60% of the cores remain busy with computation, which is a good result for such a communication-bound run. The chosen dataset is small enough to be a challenging case for overlap, and is therefore useful for assessing worst-case scenarios in future simulations. Two features proved key: using task priorities improved performance by 5.7% (mainly through better overlap), and using recursive tasks shortened the execution time by 9.7%. We also illustrate the need for task tracing and task visualization tools, which gave us the detailed understanding behind the performance gains in this task-based OpenMP+MPI code.
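The overlap pattern described above can be sketched in a few lines of C. This is only an illustrative sketch, not the code from [15]: the function names (`exchange_halo`, `sum_block`, `overlapped_step`), the block sizes, and the priority values are all assumptions, and the MPI call is replaced by a plain copy so that the example runs without an MPI installation. It shows a high-priority communication task, independent interior tasks that can run while the exchange is in flight, and a boundary task that waits for the exchange through a `depend` clause. Recursive tasks are not shown.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4   /* blocks in the local domain (illustrative value) */
#define BLOCK   8   /* cells per block (illustrative value) */

/* Hypothetical stand-in for an MPI halo exchange (e.g. an MPI_Sendrecv).
   It only copies one value, so the sketch runs without MPI. */
static void exchange_halo(const double *send, double *recv)
{
    *recv = *send;
}

/* Placeholder computation: sum the cells of one block. */
static double sum_block(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* One step: interior blocks are processed while the halo exchange runs;
   the boundary block (block 0) waits for the halo. */
double overlapped_step(const double *field, double boundary_value)
{
    assert(field != NULL);

    double partial[NBLOCKS] = {0.0};
    double halo = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Communication task: created first, with the highest priority hint. */
        #pragma omp task depend(out: halo) priority(10) shared(halo) firstprivate(boundary_value)
        exchange_halo(&boundary_value, &halo);

        /* Interior tasks: no dependency on the halo, so they can overlap
           with the communication task. */
        for (int b = 1; b < NBLOCKS; ++b) {
            #pragma omp task priority(1) firstprivate(b) shared(partial)
            partial[b] = sum_block(field + (size_t)b * BLOCK, BLOCK);
        }

        /* Boundary task: runs only after the halo has been received. */
        #pragma omp task depend(in: halo) priority(5) shared(partial, halo)
        partial[0] = sum_block(field, BLOCK) + halo;

        /* Wait for all child tasks before leaving the single region. */
        #pragma omp taskwait
    }

    double total = 0.0;
    for (int b = 0; b < NBLOCKS; ++b)
        total += partial[b];
    return total;
}
```

When built with OpenMP support (for example with `-fopenmp`), the `priority` clause is only a hint, and it has no effect unless the `OMP_MAX_TASK_PRIORITY` environment variable is set to a sufficiently large value. Without OpenMP support the pragmas are ignored and the function computes the same result sequentially.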

Patches to the LLVM Compiler

We proposed two source code patches to LLVM, https://reviews.llvm.org/D63196 and https://reviews.llvm.org/D67447, to improve the performance of applications that use numerous fine-grained tasks, such as [15]. The patches were accepted in 2019.