Section: Research Program
High performance Fast Multipole Method for N-body problems
Participants : Emmanuel Agullo, Olivier Coulaud, Pierre Esterie, Guillaume Sylvand.
In most scientific computing applications considered nowadays as
computational challenges (like biological and material systems,
astrophysics or electromagnetism), the introduction of hierarchical
methods based on an octree structure has dramatically reduced the
amount of computation needed to simulate those systems for a given
accuracy. For instance, in the N-body problem arising from
these application fields, we must compute all pairwise
interactions among N objects (particles, lines, ...) at every
timestep. Among these methods, the Fast Multipole
Method (FMM) developed for gravitational potentials in astrophysics
and for electrostatic (coulombic) potentials in molecular simulations
solves this N-body problem for any given precision with
The potential field is decomposed in a near field part, directly computed, and a far field part approximated thanks to multipole and local expansions. We introduced a matrix formulation of the FMM that exploits the cache hierarchy on a processor through the Basic Linear Algebra Subprograms (BLAS). Moreover, we developed a parallel adaptive version of the FMM algorithm for heterogeneous particle distributions, which is very efficient on parallel clusters of SMP nodes. Finally on such computers, we developed the first hybrid MPI-thread algorithm, which enables to reach better parallel efficiency and better memory scalability. We plan to work on the following points in HiePACS .
Improvement of calculation efficiency
Nowadays, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. GPU (Graphics Processing Units) and the Cell processor have thus already been used in astrophysics and in molecular dynamics. The Fast Mutipole Method has also been implemented on GPU . We intend to examine the potential of using these forthcoming processors as a building block for high-end parallel computing in N-body calculations. More precisely, we want to take advantage of our specific underlying BLAS routines to obtain an efficient and easily portable FMM for these new architectures. Algorithmic issues such as dynamic load balancing among heterogeneous cores will also have to be solved in order to gather all the available computation power. This research action will be conduced on close connection with the activity described in Section 3.2.
Non uniform distributions
In many applications arising from material physics or astrophysics, the distribution of the data is highly non uniform and the data can grow between two time steps. As mentioned previously, we have proposed a hybrid MPI-thread algorithm to exploit the data locality within each node. We plan to further improve the load balancing for highly non uniform particle distributions with small computation grain thanks to dynamic load balancing at the thread level and thanks to a load balancing correction over several simulation time steps at the process level.
Fast multipole method for dislocation operators
The engine that we develop will be extended to new potentials arising
from material physics such as those used in dislocation
simulations. The interaction between dislocations is long ranged
(
Fast multipole method for boundary element methods
The boundary element method (BEM) is a well known
solution of boundary value problems appearing in various fields of
physics. With this approach, we only have to solve an integral
equation on the boundary. This implies an interaction that decreases in space, but results
in the solution of a dense linear system with