ScAlApplix is a joint project of INRIA Futurs, LaBRI (Laboratoire Bordelais de Recherche en Informatique – CNRS UMR 5800, University of Bordeaux 1 and ENSEIRB) and IMB (Institut Mathématique
de Bordeaux – CNRS UMR 5251, Universities of Bordeaux 1 and Bordeaux 2). ScAlApplix was created on November 1, 2002 (
http://

The purpose of the `ScAlApplix` project is to analyze and solve scientific computing problems that arise from complex research and industrial applications and that involve scaling. These applications require enormous computing power, on the order of tens or hundreds of teraflops, and handle huge amounts of data. Solving such problems requires a multidisciplinary approach that draws on both applied mathematics and computer science. In applied mathematics, the work essentially concerns numerical schemes. In computer science, it concerns parallel computing and the design of high-performance codes for today's major computing platforms (parallel computers organized as large networks of SMP nodes, production GRIDs).

Through this approach, `ScAlApplix` intends to contribute to every step of the chain that goes from the design of new high-performance numerical schemes, more robust and more precise, to the optimized implementation of algorithms and codes for the simulation of *physical* (fluid mechanics, inert and reactive flows, multimaterial and multiphase flows), *chemical* (molecular dynamics simulations) and *environmental* (host-parasite systems) phenomena that are by nature multiscale and multiphysics.

Another domain we are currently investigating is the development of distributed environments for coupling numerical codes and for interactively steering numerical simulations. Computational steering is an effort to make the typical simulation workflow (modeling, computing, analyzing) more efficient by providing on-line visualization and interactive steering of ongoing computations. On-line visualization is very useful for monitoring long-running applications and detecting possible errors in them, and interactive steering allows the researcher to alter simulation parameters on the fly and immediately receive feedback on their effects. Scientists thus gain better insight into the cause-effect relationships in the simulation and can better grasp the complexity of the underlying models.

`Scotch` `5.0` has been released, including `PT-Scotch`, the current state-of-the-art parallel sparse matrix ordering software.

The direct solver `PaStiX` has been successfully used by CEA/CESTA to solve a symmetric complex sparse linear system arising from a 3D electromagnetism code with more than 45 million unknowns on the TERA-10 CEA supercomputer. Solving this system required about 1.4 Petaflops (in double precision) and the task was completed in about one hour on 1024 processors. To our knowledge, a system of this size and kind has never before been solved by a direct solver.

`FluidBox` has been registered with the APP agency and the first very large scale simulations have been run; first calculations with very large meshes for the ADIGMA project (3 million vertices) have been performed.

The LibMultiScale software package and library have been released as open source software under the CECILL-C license. Multiscale simulations with millions of atoms have been performed for 3D wave and crack propagation.

A large number of industrial problems can be translated into fluid mechanics problems, possibly coupled with one or more other physical models. One example is aeroelastic problems, which have been studied in detail by other INRIA teams. Another is flow in pipelines, where the physical properties of the fluid (a mixture of air–water–gas) are not well understood. One may also consider problems in aeroacoustics, which are becoming more and more important in everyday life. In some cases, specific numerical tools are needed because the fluids have exotic equations of state, or because the amount of computation becomes huge, as for unsteady flows. Specific tools are also needed when one is interested in very specific quantities, such as the lift and drag of an airfoil, for which commercial tools can only provide a very crude answer.

Many commercial codes exist. They allow users to treat some of these examples, but the quality of the solutions is far from optimal, and their numerical tools are often not the most recent ones. An example is the noise generated by vortices crossing a shock wave. To our knowledge, this problem is out of reach even for the most recent technologies, because the computing resources that such simulations would require are tremendous. In the same spirit, the simulation of a 3D compressible mixing layer in a complex geometry is also out of reach, because very different temporal and physical scales need to be captured; specific algorithms therefore have to be invented for that purpose.

To simulate complex physical problems efficiently, we are working on some fundamental aspects of the numerical analysis of nonlinear hyperbolic problems. Our goal is to develop schemes that can adapt to modern computer architectures. More precisely, we are working on a class of numerical schemes specifically tuned for unstructured and hybrid meshes. They have the most compact stencil possible that is compatible with the expected order of accuracy, which typically ranges from two to four. Since the stencil is compact, the implementation on parallel machines becomes simple. The price to pay is that the scheme is necessarily implicit. However, the compactness of the scheme makes it possible to use the high-performance parallel linear algebra tools developed by the team for the lowest-order version of these schemes. The high-order versions, which are still under development, will lead to new scientific problems at the border between numerical analysis and computer science. In parallel with these fundamental aspects, we also work on adapting more classical numerical tools to complex physical problems such as those encountered in interface flows and in turbulent or multiphase flows.

Within a few years, thanks to the know-how coming from our research on compact distribution schemes and to daily discussions with specialists in computer science and scientific computing, we expect to be able to tackle physical problems that are currently difficult to compute. These problems range from aeroacoustics to multiphysics problems such as the ones mentioned above. We are also interested in solving compressible MHD problems in relation with the ITER project. The presence of a magnetic field and the type of solutions we are seeking lead to additional scientific challenges. Our research on numerical algorithms has led to the software FluidBox that is described in section . This work is supported by the EU-Strep ADIGMA, various research contracts and in part by the ANR-CIS ASTER project (see section also).

The large-scale applications we study are built to capture complex phenomena such as magnetohydrodynamic instabilities, cracks and so on. Such phenomena may be very localized in the simulation domain, and a full modeling or discretization at the finest level requires several tens or hundreds of teraflops of computation using several terabytes of data. Many of these phenomena are multiscale both in time and in space. To capture them efficiently, we need to introduce multiresolution schemes such as adaptive mesh refinement and wavelet methods, or even multiscale models when the current model is no longer valid at the finest level of discretization.

Adaptive techniques based on a hierarchical decomposition can lead to a dramatic reduction of the computation and memory costs of numerical simulations. However, adaptive numerical methods are often difficult to parallelize, because they introduce dependencies between data at different grid levels, which makes data locality difficult to manage. Work remains to be done to build efficient adaptive parallel methods. For example, in an interpolet framework, a detail is defined as the difference between the actual value of the function of interest and an interpolated value. This gives a natural compression criterion, since the detail is small where the function is regular: a wavelet decomposition is compressed by removing the grid points whose detail is smaller than a given threshold. One can then write numerical schemes in which only the significant details are computed and the others are skipped, in order to reduce both memory usage and computation cost. Parallel simulations that use interpolets have been performed in our team. They allow us to run simulations on very large grids that are not accessible to non-adaptive simulators. The main challenge is to design parallel numerical schemes that do not lead to large overheads in terms of load balancing, sparsity management and communication between processors.
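The detail-and-threshold idea can be illustrated on a 1D grid. The following is only a minimal sketch, assuming a single refinement level and linear interpolation between coarse points; the function names, the test function and the threshold value are illustrative choices, not taken from the team's simulators.

```python
import math

def interpolet_details(values):
    """One level of a linear interpolet transform on a 1D grid.

    Even-indexed samples form the coarse grid. For each odd-indexed sample,
    the detail is the true value minus the linear interpolation of its two
    even neighbours. An odd number of samples is expected, so that every
    odd point has two coarse neighbours.
    """
    if len(values) % 2 == 0:
        raise ValueError("expected an odd number of samples")
    coarse = values[0::2]
    details = [values[i] - 0.5 * (values[i - 1] + values[i + 1])
               for i in range(1, len(values), 2)]
    return coarse, details

def compress(details, threshold):
    """Keep only the details whose magnitude reaches the threshold."""
    return {k: d for k, d in enumerate(details) if abs(d) >= threshold}

# A function that is smooth everywhere except for a steep front at x = 0.5.
n = 65
xs = [i / (n - 1) for i in range(n)]
samples = [math.tanh(200.0 * (x - 0.5)) for x in xs]

coarse, details = interpolet_details(samples)
kept = compress(details, threshold=1e-3)
print(len(details), "details computed,", len(kept), "kept after thresholding")
```

Only the details next to the front survive the threshold, which is what makes the adaptive representation sparse. It is also why load balancing becomes an issue once the grid is distributed.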

In the context of the ASTER project, the study of MHD instabilities is increasingly complex: it requires simulations at a very large scale (several tens or hundreds of teraflops of computation using several terabytes of data), and it relies on multiphysics or multiscale models that give rise to fine couplings. To carry out these increasingly accurate full-scale simulations without increasing the number of unknowns uniformly, the AMR technique uses a finer grid where the solution varies abruptly and a coarser grid elsewhere, according to quite specific criteria. These new finer grids are regarded as patches on the initial grid. Repeating this procedure yields a hierarchy of grids.
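The flag-and-refine loop behind such a hierarchy can be sketched in one dimension. This is a toy illustration under stated assumptions: the refinement criterion (jump of the solution across a cell larger than a tolerance), the function `build_hierarchy` and the test profile are hypothetical, and real AMR codes use problem-specific error estimators.

```python
import math

def build_hierarchy(f, a, b, n_cells, tol, max_levels):
    """Return a list of grid levels; each level is a list of (left, right) cells.

    Level 0 covers [a, b] uniformly. Each further level only contains the
    children of the cells flagged on the previous level (the "patches").
    """
    h = (b - a) / n_cells
    level = [(a + i * h, a + (i + 1) * h) for i in range(n_cells)]
    levels = [level]
    for _ in range(max_levels - 1):
        # Assumed criterion: refine where f jumps by more than tol across a cell.
        flagged = [(l, r) for (l, r) in level if abs(f(r) - f(l)) > tol]
        if not flagged:
            break
        # Each flagged cell is covered by two children of half its size.
        level = []
        for l, r in flagged:
            m = 0.5 * (l + r)
            level.extend([(l, m), (m, r)])
        levels.append(level)
    return levels

def front(x):
    """Test profile with an abrupt variation around x = 0.5."""
    return math.tanh(50.0 * (x - 0.5))

levels = build_hierarchy(front, 0.0, 1.0, n_cells=8, tol=0.5, max_levels=4)
for depth, cells in enumerate(levels):
    print("level", depth, ":", len(cells), "cells")
```

The finer levels stay small and concentrate around the front, whereas a uniform grid at the finest resolution would need 64 cells.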

This approach is broadly used when explicit numerical schemes are considered. In the case of implicit numerical schemes (as implemented in `JOREK` and `FluidBox`), the difficulty is to take into account the additional unknowns due to the hierarchy of grids in the modified linear system that must be solved at each time step of the simulation.

A specific study of the error estimators adapted to the problem of MHD instabilities needs to be carried out, in particular in connection with the discretization with cubic Hermite (Bezier) finite elements used in `JOREK`.

In the context of high performance computing, we have to study the principal difficulties encountered in these methods within a parallel framework. In particular, efficiency will require dynamic load balancing over the set of processors to compensate for the variability of the grid during the simulation.

Due to the increase in available computing power, new applications such as material failure simulations, for instance crack propagation, are now commonly performed by physicists. These computations simulate systems of up to a billion atoms, over time scales of up to several nanoseconds. The larger the simulation, the lower the computational cost of the potential driving the phenomena must be, which leads to low-precision results. Moreover, full simulations at the finest level are not computationally feasible on the whole material. Most of the time, the finest level is only necessary where the phenomenon of interest occurs. For example, in a crack propagation simulation, the material behaves macroscopically far from the tip, so a coarser model can be used there. The idea is to limit the more expensive level of simulation to a subset of the domain and to combine it with a macroscopic level. However, combining quantum and atomistic simulations, or atomistic and continuum simulations, while obtaining a robust model, a good scheme and an efficient implementation is still a challenge.

We are currently focusing on several aspects of these problems:

The difficulty is to build an efficient scheme that couples the two scales without any loss of precision and without introducing numerical artifacts such as wave reflections. We are working on improving our reduction operator to make it more robust with respect to the size of the mesh and the frequency of atomistic waves.

The algorithm we have developed is valid for crystals at zero kelvin, but heat flux is very important for realistic simulations. The main questions for us are what kind of model must be considered at the coarser level and how to transfer atomic kinetic energy into the continuum model.

Many such couplings are realized by putting together different codes. A main problem is to define a coupler that can efficiently exploit the parallelism of the underlying codes. In this context, we focus on the parallel algorithms of the coupler, such as data redistribution, and on how to preserve the efficiency of each code in the distributed application while obtaining good efficiency for the global application.

Solving large sparse systems of linear equations Ax = b is a crucial and time-consuming step that arises in many scientific and engineering applications. Consequently, many parallel techniques for sparse matrix factorization have been studied and implemented.

Sparse direct solvers are mandatory when the linear system is very ill-conditioned; such a situation is often encountered in structural mechanics codes, for example. Therefore, to obtain a robust and versatile industrial software tool, high-performance sparse direct solvers are mandatory, and parallelism is then necessary for reasons of memory capacity and acceptable solving time. Moreover, in order to efficiently solve 3D problems with more than 50 million unknowns, which is now a reachable challenge with new SMP supercomputers (see section ), we must achieve good time scalability and control memory overhead. Solving a sparse linear system by a direct method is generally a highly irregular problem that raises challenging algorithmic problems and requires a sophisticated implementation scheme in order to fully exploit the capabilities of modern supercomputers.

In the `ScAlApplix` project, we focused first on the block partitioning and scheduling problem for high performance sparse LDL^{T} or LL^{T} parallel factorization without dynamic pivoting for large sparse symmetric positive definite systems. Our strategy is also suitable for non-symmetric sparse matrices with symmetric pattern, and for general distributed heterogeneous architectures whose computation and communication performances are predictable in advance.
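As a reminder of what an LDL^{T} factorization without dynamic pivoting computes, here is a small dense, sequential sketch. It is not the block-partitioned, scheduled, sparse algorithm discussed here; the function names and the 3×3 test matrix are illustrative assumptions.

```python
def ldlt(a):
    """Factor a symmetric matrix A = L D L^T without pivoting.

    `a` is a list of rows. Returns (L, d), where L is unit lower triangular
    and d holds the diagonal of D. Pivots are taken in the given order, which
    is safe for symmetric positive definite matrices.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ZeroDivisionError("zero pivot encountered")
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d

def ldlt_solve(L, d, b):
    """Solve L D L^T x = b by forward, diagonal and backward substitution."""
    n = len(b)
    y = list(b)
    for i in range(n):                      # forward sweep: L y = b
        for k in range(i):
            y[i] -= L[i][k] * y[k]
    x = [y[i] / d[i] for i in range(n)]     # diagonal scaling: D z = y
    for i in reversed(range(n)):            # backward sweep: L^T x = z
        for k in range(i + 1, n):
            x[i] -= L[k][i] * x[k]
    return x

A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [8.0, 15.0, 11.0]       # equals A times [1, 2, 3]
L, d = ldlt(A)
x = ldlt_solve(L, d, b)
print(d, x)
```

Because the pivot order is fixed in advance, the whole elimination structure is known before any numerical work starts, which is what makes static block partitioning and scheduling possible.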

Research on high performance sparse direct solvers is carried out in collaboration with P. Amestoy (ENSEEIHT – IRIT) and J.-Y. L'Excellent (INRIA Rhône-Alpes), and has led to software developments (see section , , ) and to industrial contracts with CEA (Commissariat à l'Energie Atomique). This work is supported by the ANR-CIS project “SOLSTICE”.

In addition to the project activities on direct solvers, we also study robust preconditioning algorithms for iterative methods. The goal of these studies is to overcome the huge memory consumption inherent to direct solvers in order to solve 3D problems of huge size (several million unknowns). Our studies focus on building generic parallel preconditioners based on ILU factorizations. The classical ILU preconditioners use scalar algorithms that do not exploit CPU power well and are difficult to parallelize. Our work aims at finding orderings and partitionings of the unknowns that lead to a dense block structure of the incomplete factors. Then, based on the block pattern, efficient parallel blockwise algorithms can be devised to build robust preconditioners that are also able to exploit the full capabilities of modern high-performance computers.
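For reference, the classical scalar ILU(k) rule decides which fill-in entries to keep from a level of fill: original nonzeros have level 0, and a fill-in created at (i, j) through pivot p gets level lev(i, p) + lev(p, j) + 1. The sketch below computes only this symbolic pattern on a small dense index set. It is a textbook illustration with a hypothetical function name, not the block algorithm developed in the project.

```python
def iluk_pattern(pattern, n, k):
    """Symbolic ILU(k): return the set of positions (i, j) kept at level <= k.

    `pattern` is the set of nonzero positions of an n x n matrix, diagonal
    included. Rows are processed in order, pivots in increasing column order.
    """
    INF = float("inf")
    lev = [[0 if (i, j) in pattern else INF for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for p in range(i):
            if lev[i][p] > k:
                continue        # dropped entry: it cannot generate fill
            for j in range(p + 1, n):
                candidate = lev[i][p] + lev[p][j] + 1
                if candidate < lev[i][j]:
                    lev[i][j] = candidate
    return {(i, j) for i in range(n) for j in range(n) if lev[i][j] <= k}

# Arrow matrix: dense first row and first column, plus the diagonal.
n = 4
arrow = ({(i, i) for i in range(n)}
         | {(0, j) for j in range(n)}
         | {(i, 0) for i in range(n)})
ilu0 = iluk_pattern(arrow, n, 0)    # no fill accepted
ilu1 = iluk_pattern(arrow, n, 1)    # level-1 fill created by pivot 0 accepted
print(len(arrow), len(ilu0), len(ilu1))
```

On this example ILU(0) keeps the original 10 positions while ILU(1) fills the whole 4×4 matrix, which shows how strongly the ordering of the unknowns influences the incomplete factors.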

We study two approaches:

The first approach consists in building block ILU(k) preconditioners. The main idea is to adapt the classical ILU(k) factorization in order to reuse the algorithmic ingredients that have been developed for direct methods. In this case, the ordering we use is the same as in the direct factorization, and a dense block pattern (i.e. a partition of the unknowns) is obtained using an algorithm that lumps together columns having few differences in their nonzero patterns. We have adapted the parallel direct solver chain in order to deal with the incomplete block factors defined by this process. Thus the preconditioner computation benefits from the breakthroughs made by the direct solver techniques studied in `PaStiX` (sections and ).

The second approach, which we recently developed, is based on the Schur complement. In this case, we partition the adjacency graph of the system matrix into a set of small subdomains with overlap. The interiors of these subdomains are treated by a direct method. Solving the whole system is then equivalent to solving the Schur complement system on the interface between the subdomains (this system has a much smaller dimension). We use the hierarchical interface decomposition (HID) that has been developed in `PHIDAL` to reorder and partition this system. Indeed, the HID gives a natural dense block structure of the Schur complement. Based on this partition, we define efficient block preconditioners that allow the use of BLAS routines and a high degree of parallelism thanks to the HID properties. All these algorithms are implemented in a new library named `HIPS`. `HIPS` contains the `PHIDAL` library (HID ordering and partitioning) and proposes some extensions and new algorithms (multilevel functionalities, hybrid direct-iterative approach) to the former `PHIDAL` library. Details can be found in sections and .
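The reduction to the interface used by the second approach can be written out on a tiny example. With the unknowns split into interior (B) and interface (C) blocks, the Schur complement is S = C - E B^{-1} F. The sketch below is a dense, sequential illustration in which the helper names and the 5-point 1D Laplacian test case are assumptions; it does not reflect the `HIPS` or `PHIDAL` interfaces.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def schur_solve(a, rhs, interior, interface):
    """Solve a x = rhs by eliminating `interior` and solving on `interface`."""
    def sub(rows, cols):
        return [[a[i][j] for j in cols] for i in rows]
    ni, ns = len(interior), len(interface)
    B, F = sub(interior, interior), sub(interior, interface)
    E, C = sub(interface, interior), sub(interface, interface)
    f = [rhs[i] for i in interior]
    g = [rhs[i] for i in interface]
    # Interior direct solves: B^{-1} f and the columns of B^{-1} F.
    binv_f = solve_dense(B, f)
    binv_F = [solve_dense(B, [F[i][j] for i in range(ni)]) for j in range(ns)]
    # Schur complement S = C - E B^{-1} F and reduced right-hand side.
    S = [[C[r][j] - sum(E[r][k] * binv_F[j][k] for k in range(ni))
          for j in range(ns)] for r in range(ns)]
    g_red = [g[r] - sum(E[r][k] * binv_f[k] for k in range(ni)) for r in range(ns)]
    y = solve_dense(S, g_red)                   # interface unknowns
    # Back-substitution: x_B = B^{-1} (f - F y).
    xb = solve_dense(B, [f[i] - sum(F[i][j] * y[j] for j in range(ns))
                         for i in range(ni)])
    x = [0.0] * len(rhs)
    for i, v in zip(interior, xb):
        x[i] = v
    for i, v in zip(interface, y):
        x[i] = v
    return x

# 1D Laplacian on 5 unknowns; unknown 2 separates subdomains {0, 1} and {3, 4}.
n = 5
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
rhs = [0.0, 0.0, 0.0, 0.0, 6.0]     # equals A times [1, 2, 3, 4, 5]
x = schur_solve(A, rhs, interior=[0, 1, 3, 4], interface=[2])
print(x)
```

Here the interior block B is block diagonal (one block per subdomain), so the interior solves are independent. The sketch solves them as one dense block only for brevity.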

This work is also supported by the ANR-CIS project “SOLSTICE”.

In most scientific computing applications that are nowadays considered computational challenges, such as biological systems, astrophysics or electromagnetism, the introduction of hierarchical methods based on an octree structure has dramatically reduced the amount of computation needed to simulate those systems for a given error tolerance.

Among these methods, the Fast Multipole Method (FMM) allows the computation of the interactions in, for example, a molecular dynamics system of N particles in O(N) time, against O(N^{2}) with a direct approach. The extension of these methods and their efficient implementation on current parallel architectures is still a critical issue. Moreover, the use of periodic boundary conditions, or of duplications of the system in 2 out of 3 space dimensions, as well as the use of higher-order approximations for integral equations, are also still relevant.
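The saving of hierarchical methods comes from replacing the contribution of a distant group of particles by a compact expansion. The sketch below is not an FMM: it only compares a direct sum with the lowest-order (monopole) term for one far-away cluster, assuming a 1/r kernel. The function names and the test configuration are illustrative.

```python
import math

def potential_direct(target, points, charges):
    """Direct evaluation: one kernel evaluation per source particle."""
    return sum(q / math.dist(target, p) for p, q in zip(points, charges))

def potential_monopole(target, points, charges):
    """Replace the whole cluster by its total charge at its centre of charge."""
    total = sum(charges)
    centre = tuple(sum(q * p[d] for p, q in zip(points, charges)) / total
                   for d in range(3))
    return total / math.dist(target, centre)

# Eight unit charges at the corners of a unit cube, observed from far away.
cluster = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]
charges = [1.0] * len(cluster)
target = (100.0, 0.0, 0.0)

exact = potential_direct(target, cluster, charges)
approx = potential_monopole(target, cluster, charges)
rel_err = abs(exact - approx) / abs(exact)
print(exact, approx, rel_err)
```

A single evaluation stands in for eight, with a relative error that shrinks as the target moves away from the cluster. The FMM organizes such approximations over an octree and adds higher-order terms to control the error.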

They can be used in the three (quantum, molecular and continuum) models: for atom-atom interactions in quantum or molecular mechanics, for atom-surface interactions in the coupling between the continuum and the other models, and for fast matrix-vector products in the iterative solution of the linear system given by the integral formulation of the continuum method. Moreover, the significant experience gained in the `Scotch` and `PaStiX` projects (see section and ) will be useful in order to develop efficient implementations of the FMM on parallel clusters of SMP nodes.

Thanks to the constant growth of computational capacity, numerical simulations are becoming more and more complex; it is common to couple different models in different distributed codes running on supercomputers or heterogeneous grids. Nowadays, such simulations typically run in batch mode, and the results are then analyzed on a local workstation as a post-processing step, which requires first collecting all the simulation output files. In most research fields, the post-processing step relies on 3D scientific visualization techniques. In the batch approach, there is a lack of control over the in-progress computations, which may drastically reduce the effective use of the computational resources (repeated tests with different input files separated by excessively long waiting periods).

For years, the scientific computing community has expressed the need for new computational steering tools. Computational steering is an alternative to the typical simulation workflow of performing computation and visualization sequentially. It mainly consists in coupling a remote simulation with a graphics system through the network in order to provide scientists with on-line visualization and interactive steering. On-line visualization is very useful for monitoring the evolution of the simulation by rendering the current results. It also allows us to validate the simulation codes and to detect conceptual or programming errors before the completion of a long-running application. Interactive steering allows the researcher to change the parameters of the simulation without stopping it. As the on-line visualization provides immediate visual feedback on the effect of a parameter change, the scientist gains additional insight into the cause-effect relationships in the simulation. Such a tool might help the scientist to better grasp the complexity of the underlying models and to drive the simulation more rapidly *in the right direction*. Basically, a computational steering environment can be defined as a communication infrastructure that couples a simulation with a remote user interface, called the steering system. This interface usually provides on-line visualization and user interaction.

Over the last decade, many steering environments have been developed. They differ in how they integrate the simulation. A first solution is the problem solving environment (PSE) approach, as in SCIRun. This approach allows the scientist to construct a steering application according to a visual programming model. By contrast, the majority of steering environments, such as the well-known CUMULVS, are based on annotating the application source code. This latter approach allows fine-grained steering functionalities and can achieve better run-time performance.

Even though most existing computational steering environments, such as CUMULVS, DAQV or gViz, support parallel simulations, they are limited to sequential visualization systems. This leads to an important bottleneck and increased rendering time. In the gViz project, the IRIS Explorer visualization system has been extended to run the different modules (simulation, visualization, rendering) in a distributed fashion on the Grid, but the visualization and the rendering modules are still sequential. Recent work in the Uintah PSE (Problem Solving Environment) has addressed the problem of massively parallel computation connected to a remote parallel visualization module, but this latter module only runs on a shared-memory machine. Therefore, it would be particularly *valuable* for the scientist if a steering environment were able to perform parallel visualization using a PC-based graphics cluster.

In the `EPSN` project, we intend to explore this latter goal, called *M×N computational steering*. More precisely, the `EPSN` environment (Environment for the Steering of Parallel Numerical Simulations, see section ) makes it possible to interconnect parallel simulations with visualization systems that can be parallel as well. In other words, we want to provide a framework that can benefit from immersive virtual reality technology (e.g. tiled display wall) and that might help scientists to better grasp the complexity of real-life simulations. Such a coupling between parallel numerical simulations and parallel visualization systems raises crucial issues that we investigate in the `EPSN` project:

**Parallel coordination of steering operations.** The on-line visualization and the computational steering of parallel simulations come up against a serious coherence problem. Indeed, data distributed over parallel processes must be accessed carefully to ensure they are presented to the visualization system in a meaningful way. To overcome this problem, we introduce a Hierarchical Task Model (HTM) that represents the control flow of the simulation, which other approaches too often treat as a “single-loop” program. Thanks to this representation, we schedule the user interactions on the simulation processes in parallel while satisfying temporal coherence.

**Parallel data redistribution.** The efficient redistribution of data is crucial to achieve an efficient coupling between simulation and visualization codes. Most previous work has been limited to structured data with regular distributions (e.g. dense arrays with simple block-cyclic distributions). However, computational steering applications (or multiphysics simulations) frequently handle more irregular data, like particle sets, unstructured meshes or hierarchical grids. In such a context, the data transfer involves switching from one distribution to another, and thus “redistributing” the data. For this, we introduce a description model of distributed data, based on the notion of “complex object”. Thanks to this model, we propose redistribution algorithms that generate symbolic messages, independently of any particular communication layer. We distinguish two main approaches: the spatial approach and the placement approach.
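To make the notion of symbolic messages concrete, the sketch below treats only the simple regular case: a 1D array block-distributed over M simulation processes is sent to N visualization processes. Each message is a tuple that names a sender, a receiver and an index range, with no reference to a communication layer. The function names and the tuple layout are assumptions for illustration; they are not the `EPSN` data model, which targets irregular "complex objects".

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def redistribution_messages(n, m_senders, n_receivers):
    """Symbolic messages (sender, receiver, start, stop) from M blocks to N blocks."""
    messages = []
    for s, (s0, s1) in enumerate(block_ranges(n, m_senders)):
        for r, (r0, r1) in enumerate(block_ranges(n, n_receivers)):
            lo, hi = max(s0, r0), min(s1, r1)
            if lo < hi:         # the two blocks overlap on [lo, hi)
                messages.append((s, r, lo, hi))
    return messages

# 10 elements held by 4 simulation processes, sent to 2 visualization processes.
for message in redistribution_messages(10, 4, 2):
    print(message)
```

Since the messages are plain descriptions, the same list could in principle be executed by any transport (MPI, CORBA, sockets), which is the point of keeping the redistribution algorithm independent of the communication layer.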

The main objective of the `ScAlApplix` project is to analyze and solve scientific computing problems coming from complex research and industrial applications and involving scaling. This allows us to validate the numerical schemes, the algorithms and the associated software that we develop. We currently have four reference application domains: fluid mechanics, material mechanics, host-parasite systems in population dynamics, and MHD simulation dedicated to the ITER project. In these four domains, we study and simulate phenomena that are by nature multiscale and multiphysics, and that require enormous computing power. A major part of this work leads to industrial collaborations, in particular with CNES, ONERA, and the French CEA/CESTA, CEA/Ile-de-France and CEA/Cadarache centers.

The numerical simulation of steady and unsteady flows is still a challenge, since both efficient schemes and efficient implementations are needed, and the accuracy of schemes remains an issue. The challenge is even greater for large problems and for irregular meshes.

Among the problems to be considered, one may list the computation of mixing layers, shock–vortex interactions, and the noise generated by a flow. This last item clearly needs very high order schemes, and today's best schemes use regular structured meshes. Hence, one of our objectives is to construct very high order schemes for unstructured meshes.

Another example where large computer resources are needed is the simulation of multiphase flows. In that case, several difficulties must be faced: unsteady flows, complex geometries and a very complex physical model. A last example concerns unsteady MHD flow problems related to the ITER project. In addition to the “standard” difficulties arising in flow simulations, we have to take into account the magnetic field, which satisfies a geometrical constraint. Mesh adaptation is also essential for the instability problems we consider. This is done in collaboration with physicists at CEA Cadarache via the ANR CIS 2006 ASTER project.

Due to the increase in available computing power, new applications such as the study of the properties of new materials (photovoltaic materials, bio- and environmental sensors, ...), material failure simulations and nano-indentation are now commonly performed by chemists.

These computations simulate systems of up to a billion atoms, over time scales of up to several nanoseconds. The larger the simulation, the lower the computational cost of the potential driving the phenomena must be, which leads to low-precision results. Moreover, full simulations at the finest level are not computationally feasible on the whole material. Most of the time, the finest level is only necessary where the phenomenon of interest occurs. For example, in a crack propagation simulation, the material behaves macroscopically far from the tip, so a coarser model can be used there. The idea is to limit the more expensive level of simulation to a subset of the domain and to combine it with a macroscopic level. This implies that atomistic simulations must be sped up by several orders of magnitude.

Understanding crack propagation is important in many areas, such as lenses for lasers, and so is understanding how a crack first appears. The main difficulty here is to take temperature into account in the model.

In population dynamics, systems can exhibit very complex behaviors and can be difficult to analyze from a purely mathematical point of view. The aim of this interdisciplinary project was to develop numerical tools for the population dynamics models that arise in modeling complex heterogeneous host-parasite systems. Our main goals are: understanding the impact of the host population structure on the parasite population dynamics, developing accurate numerical simulations using parallelization, and designing prophylactic methods. For many host-parasite systems, the different time scales of the host population (e.g. a one-year period) and of the virus (e.g. an infected host dies within a few weeks) require a small time step. Numerical schemes for the resulting nonlinear epidemiological model in a spatially heterogeneous environment are complex to run, and reliable numerical results become difficult to obtain as the size of the spatial domain increases. In addition, many input parameters (biological and environmental factors) are taken into account to compare simulation results with observations from field studies. Therefore, a realistic simulator has a significant computational cost and parallelization is required. This work is a collaborative, interdisciplinary effort between population dynamics, mathematics and computer science.

Numerical simulation has become a major tool for the study of many physical phenomena involving charged particles, in particular beam physics and space and laboratory plasmas, including fusion plasmas. Moreover, it is of interest for understanding and optimizing physics experiments in present fusion devices and for designing future reactors such as ITER. Parallelism is required to carry out numerical simulations on realistic test cases.

On this topic, we have a strong collaboration with the physicists of the CEA/DRFC group. Moreover, we worked with the INRIA CALVI team to bring our high performance computing skills to some of their numerical simulators. The study of plasma turbulence requires, for example, solving the Maxwell equations coupled to the calculation of the plasma response to the perturbed electromagnetic field. This response can be computed by using either a fluid or a kinetic description of the plasma. We have contributed to kinetic (solving the Vlasov equation) and fluid (through the study of MHD models) simulators.

We have established another collaboration with the physicists of the CEA/DRFC group in the context of the ANR CIS 2006 project called ASTER (Adaptive mhd Simulation of Tokamak Elms for iteR). The magnetohydrodynamic instability called ELM, for Edge Localized Mode, is commonly observed in the standard tokamak operating scenario. The energy losses that ELMs will induce in ITER plasmas are a real concern. However, the current understanding of what sets the size of these ELM-induced energy losses is extremely limited. No numerical simulations of the complete ELM instability, from its onset through its non-linear phase and its decay, are reported in the literature. Recently, encouraging results on the simulation of an ELM cycle have been obtained with the `JOREK` code developed at CEA, but at reduced toroidal resolution. The `JOREK` code uses a fully implicit time evolution scheme in conjunction with the `PaStiX` sparse matrix library.

We develop two kinds of software. The first kind consists of generic libraries that are used in applications: we work on a (parallel) partitioner for large irregular graphs or meshes (`Scotch`) and on high-performance direct or hybrid solvers for very large sparse systems of equations (`MUMPS`, `PaStiX`, `HIPS`). The second kind consists of dedicated software for fluid mechanics (`FluidBox`) and a platform for computational steering (`EPSN`). For these parallel software developments, we use the message passing (MPI) paradigm, the OpenMP programming interface, threads, and the Java and/or CORBA technologies.

The `EPSN` project has been partially supported by the ACI-GRID program of the French Ministry of Research (grant number PPL02-03) and by the ARC `RedGRID`, and is now supported by the ANR program called MASSIM (grant number ANR-05-MMSA-0008-03).

`EPSN` is a distributed computational steering environment that allows remote parallel simulations to be steered with sequential or parallel visualization tools or graphical user interfaces. It is based on a simple client/server relationship between user interfaces (clients) and simulations (servers). The user interfaces can be dynamically connected to or disconnected from the simulation during its execution. Once a client is connected, it interacts with the simulation component through an asynchronous and concurrent request system. We distinguish three kinds of steering requests. Firstly, the "control" requests (play, step, stop) steer the execution flow of the simulation. Secondly, the "data access" requests (get, put) read and write parameters and data in the memory of the remote simulation. Finally, the "action" requests invoke user-defined routines in the simulation. In order to make a legacy simulation steerable, end-users annotate their simulation source code with the `EPSN` API. These annotations provide the `EPSN` environment with two kinds of information: the description of the program structure according to a Hierarchical Task Model (HTM), and the description of the distributed data that will be accessible by the remote clients.
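The three kinds of steering requests can be illustrated with a toy, single-process sketch in Python. It is not the `EPSN` API: the class `ToySteerableSimulation` and all of its method names are hypothetical, and the asynchronous, concurrent and distributed aspects of the real platform are deliberately left out.

```python
class ToySteerableSimulation:
    """Hypothetical stand-in for a steerable simulation (not the EPSN API)."""

    def __init__(self, dt=0.1):
        self.data = {"dt": dt, "time": 0.0}  # values exposed to clients
        self.running = False
        self.actions = {}                    # user-defined routines

    # "control" requests: steer the execution flow
    def play(self):
        self.running = True

    def stop(self):
        self.running = False

    def step(self):
        """Advance the simulation by exactly one iteration."""
        self.data["time"] += self.data["dt"]

    # "data access" requests: read and write simulation memory
    def get(self, name):
        return self.data[name]

    def put(self, name, value):
        self.data[name] = value

    # "action" requests: invoke a user-defined routine
    def register_action(self, name, routine):
        self.actions[name] = routine

    def action(self, name):
        return self.actions[name](self)


sim = ToySteerableSimulation(dt=0.1)
sim.step()                  # control: one iteration with dt = 0.1
sim.put("dt", 0.5)          # data access: change a parameter on the fly
sim.step()                  # the next iteration uses the new value
sim.register_action("reset", lambda s: s.put("time", 0.0))
```

In this sketch the client and the simulation share one Python object; in a real steering environment the same three request families would travel over the network between separate components.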

Concerning the development of client applications, we also provide a front-end API that makes it possible to integrate `EPSN` in a high-level visualization system such as *AVS/Express*, *VTK* or *ParaView*. We also provide a lightweight user interface, called *SiMonE* (Simulation Monitoring for EPSN), that makes it easy to connect to any simulation and interact with it by controlling the computational flow, viewing the current parameters or data on a simple data-sheet, and optionally modifying them. *SiMonE* also includes simple visualization plug-ins to display intermediate results online. Moreover, the `EPSN` framework can exploit parallel visualization and rendering techniques thanks to the Visualization ToolKit (VTK). This approach reduces the steering overhead of the `EPSN` platform and makes it possible to process large datasets efficiently. To visualize high-resolution images and to improve the rendering time, `EPSN` can also exploit tiled-display walls based on the ICE-T library developed at Sandia National Laboratories.

As both the simulation and the visualization can be parallel applications, `EPSN` is based on the $M \times N$ redistribution library called `RedGRID`. This library is in charge of computing all the messages that will be exchanged between the two parallel components, and of performing the data transfer in parallel. Thus, `RedGRID` is able to aggregate the bandwidth and to achieve high performance. Moreover, it is designed to handle a wide variety of distributed data structures usually found in numerical simulations, such as structured grids, points or unstructured meshes.
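To make the idea of computing the messages of an $M \times N$ redistribution concrete, here is a minimal Python sketch for the simplest case: a 1D array that is block-distributed over $M$ sender ranks and must be block-distributed over $N$ receiver ranks. The function names and the block distribution are assumptions made for illustration; they are not the `RedGRID` interface, which handles far more general data structures.

```python
def block_ranges(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous blocks whose sizes
    differ by at most one. Returns a list of (start, stop) pairs."""
    base, extra = divmod(n_items, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def redistribution_schedule(n_items, m_senders, n_receivers):
    """List the messages (sender, receiver, start, stop) needed to move a
    1D block-distributed array from m_senders ranks to n_receivers ranks."""
    messages = []
    src = block_ranges(n_items, m_senders)
    dst = block_ranges(n_items, n_receivers)
    for s, (s_lo, s_hi) in enumerate(src):
        for r, (r_lo, r_hi) in enumerate(dst):
            lo, hi = max(s_lo, r_lo), min(s_hi, r_hi)
            if lo < hi:  # non-empty overlap: one message is needed
                messages.append((s, r, lo, hi))
    return messages


schedule = redistribution_schedule(12, 3, 2)
```

Each message only involves one sender and one receiver, so all of them can be transferred in parallel, which is what allows the bandwidth of the individual links to be aggregated.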

Both `EPSN` and `RedGRID` use a communication infrastructure based on CORBA, which provides our platform with portability, interoperability and network transparency.

The current versions of the `EPSN` and `RedGRID` libraries are now available on the INRIA GForge:

`RedGRID` :
http://

`FluidBox` is a software package dedicated to the simulation of inert or reactive flows. It is also able to simulate multiphase, multimaterial and MHD flows. There exist 2D and 3D versions. The 2D version is used to test new ideas that are later implemented in the 3D one. Two classes of schemes have been implemented: classical finite volume schemes and the more recent residual distribution schemes. Several low Mach preconditioning techniques are also implemented. The code has been parallelized with and without overlap of the domains. Recently, the `PaStiX` solver has been integrated in `FluidBox`. `FluidBox` has also been coupled with the `EPSN` platform. The package includes a partitioning tool that uses `Scotch`. `FluidBox` has also benefited from many software and functionality improvements from Rémi Butel (IMB). Up to now, it has been a private project only, but we expect to open some parts of the code to the public before the end of the year. In order to facilitate the project development, `FluidBox` has been uploaded to the INRIA GForge page.

In the context of `PARASOL` (Esprit IV Long Term Project, 1996-1999), the CERFACS and ENSEEIHT-IRIT teams initiated a parallel sparse solver, `MUMPS` (“MUltifrontal Massively Parallel Solver”). Since the first public release of `MUMPS` (March 2000), this research and software project has been the context of a tight and fruitful collaboration with J. Y. L'Excellent (INRIA-LIP-ENS Lyon) and the INRIA project GRAAL. Recent work on performance scalability, preprocessing of both symmetric and unsymmetric matrices, two-by-two pivots for symmetric indefinite matrices, and dynamic scheduling has been incorporated in the new improved version of the package (release 4.5.5, available since October 2005 at
http://

`MUMPS` is a package for solving linear systems of equations $Ax = b$, where the matrix $A$ is sparse and can be either unsymmetric, symmetric positive definite, or general symmetric. It uses a multifrontal technique, which is a direct method based on either the $LU$ or the $LDL^{T}$ factorization of the matrix. The main features of the `MUMPS` package include numerical pivoting during factorization, solution of the transposed system, input of the matrix in assembled format (distributed or centralized) or elemental format, error analysis, iterative refinement, scaling of the original matrix, and return of a Schur complement matrix. It also offers several built-in ordering algorithms and a tight interface to some external ordering packages such as `Scotch`, and is available in various arithmetics (real or complex, single or double).
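As a reminder of what a direct method based on the $LDL^{T}$ factorization does, the following Python sketch factorizes a small dense symmetric matrix and solves $Ax = b$ by forward substitution, diagonal scaling and backward substitution. It is only an illustration under strong simplifying assumptions: the matrix is dense, no pivoting is performed (so every pivot must be non-zero), and nothing here reflects the sparse multifrontal algorithm or the interface of `MUMPS`. The function names `ldlt` and `solve_ldlt` are hypothetical.

```python
def ldlt(a):
    """Dense LDL^T factorization of a symmetric matrix, without pivoting.
    Returns (L, d): L is unit lower triangular, d is the diagonal of D."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def solve_ldlt(L, d, b):
    """Solve (L D L^T) x = b using the factors returned by ldlt."""
    n = len(b)
    z = list(b)
    for i in range(n):                      # forward substitution: L z = b
        z[i] -= sum(L[i][k] * z[k] for k in range(i))
    y = [z[i] / d[i] for i in range(n)]     # diagonal solve: D y = z
    x = list(y)
    for i in reversed(range(n)):            # backward substitution: L^T x = y
        x[i] -= sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldlt(A)
x = solve_ldlt(L, d, [14.0, 21.0, 26.0])
```

Once the factors are available, each additional right-hand side only costs the three substitution sweeps, which is one reason direct solvers are attractive when the same matrix is reused.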

This year, we particularly worked on an out-of-core functionality, where computed factors are written to disk with a panel oriented scheme (i.e. each factor block is written panel by panel to improve overlapping between computation and I/O operations). It has been made available to some of our users, in order to get their feedback before making this functionality more widely available. We have also been working on reducing the size of communication buffers used during the factorization step by allowing the splitting of large messages into a sequence of small ones. This new functionality allows for more flexibility in both in-core and out-of-core contexts.

This work is supported by the French “Commissariat à l'Energie Atomique CEA/CESTA” in the context of structural mechanics and electromagnetism applications.

`PaStiX`(
http://
`FluidBox`(see section
). The
`PaStiX` library is planned to be released this year under the INRIA CeCILL licence.

The `PaStiX` library uses the graph partitioning and sparse matrix block ordering package `Scotch`. `PaStiX` is based on an efficient static scheduling and memory manager, in order to solve 3D problems with more than 50 million unknowns. The mapping and scheduling algorithm handles a combination of 1D and 2D block distributions. This algorithm computes an efficient static scheduling of the block computations for our supernodal parallel solver, which uses a local aggregation of contribution blocks. This is done by taking into account very precisely the computational costs of the BLAS 3 primitives, the communication costs and the cost of local aggregations. We also improved this static computation and communication scheduling algorithm to anticipate the sending of partially aggregated blocks, in order to free memory dynamically. By doing this, we are able to dramatically reduce the aggregated memory overhead while keeping good performance.

Another important point is that our study is suitable for any heterogeneous parallel/distributed architecture whose performance is predictable, such as clusters of SMP nodes. In particular, we now propose a high-performance version with a low memory overhead for SMP node architectures, which fully exploits the advantages of shared memory by using a hybrid MPI-thread implementation.

Direct methods are numerically robust, but very large three-dimensional problems may lead to systems that would require a huge amount of memory despite any memory optimization. One approach under study consists in defining an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods, and in particular should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods.

`HIPS` (Hierarchical Iterative Parallel Solver) is a scientific library that provides an efficient parallel iterative solver for very large sparse linear systems.

`HIPS` has been built on top of the `PHIDAL` library; it is based on the ordering and partitioning that were developed in `PHIDAL`, and it proposes some new hybrid direct-iterative algorithms.

The key point of the methods implemented in `HIPS` is to define an ordering and a partition of the unknowns that rely on a form of nested dissection ordering in which cross points in the separators play a special role (Hierarchical Interface Decomposition ordering). The subgraphs obtained by nested dissection correspond to the unknowns that are eliminated using a direct method, and the Schur complement system on the remaining unknowns (which correspond to the interface between the subgraphs viewed as subdomains) is solved using an iterative method (GMRES or Conjugate Gradient for the time being).
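The splitting between a direct and an iterative part can be written out explicitly. In the generic formulation below, the subscript $B$ denotes the unknowns interior to the subdomains and $C$ the interface unknowns; this notation is an assumption introduced here for illustration and is not taken from the `HIPS` documentation.

```latex
\begin{align*}
\begin{pmatrix} A_{B} & A_{BC} \\ A_{CB} & A_{C} \end{pmatrix}
\begin{pmatrix} x_{B} \\ x_{C} \end{pmatrix}
&=
\begin{pmatrix} b_{B} \\ b_{C} \end{pmatrix},
&
S &= A_{C} - A_{CB}\, A_{B}^{-1} A_{BC}, \\
S\, x_{C} &= b_{C} - A_{CB}\, A_{B}^{-1} b_{B}
&&\text{(interface system, solved iteratively)}, \\
x_{B} &= A_{B}^{-1} \left( b_{B} - A_{BC}\, x_{C} \right)
&&\text{(interior unknowns, direct solve)}.
\end{align*}
```

Because the subdomains are only coupled through the interface, $A_{B}$ is block diagonal with one block per subdomain, so the applications of $A_{B}^{-1}$ can be carried out independently on each subdomain.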

This special ordering and partitioning allows the use of dense block algorithms in both the direct and iterative parts of the solver, and provides a high degree of parallelism to these algorithms.

We propose several algorithmic variants to solve the Schur complement system that can be adapted to the geometry of the problem: typically, some strategies are more suitable for systems coming from the discretization of a 2D problem and others for a 3D problem; the choice of the method also depends on the numerical difficulty of the problem. Thus `HIPS` is a generic library that provides several methods to build an efficient preconditioner in many of these situations. It handles symmetric, unsymmetric, real or complex matrices. It also provides the scalar preconditioner based on the multistage ILUT factorization that was developed in `PHIDAL`. `HIPS` has been uploaded as a private project on InriaGForge, and a first release is planned during the year 2008.

Multiscale coupling methods are a powerful tool for studying local phenomena that occur at the atomic scale. Such methods are generally employed to study, for example, crack propagation, dislocations or nanoindentation. With these approaches, one couples an atomic description that models the material at the finest scale, such as molecular dynamics, with a macroscopic model of continuum mechanics. The use of a macroscopic model considerably reduces the number of unknowns to handle. Moreover, it is easier to apply force fields or complex boundary conditions to macroscopic models. Thus, this type of coupling can help to provide complex boundary conditions to the molecular dynamics domain.

LibMultiScale is a parallel C++ framework for multiscale coupling methods dedicated to material simulations. This framework is designed as a library providing an API that makes it possible to program coupled simulations. At present, the stable implemented coupling method is based on the bridging method of T. Belytschko and S. Xiao.

The coupled parts can be provided by existing projects. To this end, the API provides C++ templated interfaces that minimize the cost of integration, which takes the form of plugins or similar components. Such codes have been integrated to provide a functional prototype of the framework. For example, the molecular dynamics codes that have been integrated are Stamp (a CEA code) and LAMMPS (Sandia National Laboratories). The only continuum mechanics software, discretized by finite elements, is based on the libMesh framework.

This software is the result of a collaboration between ScAlApplix and the CEA/DPTA Ile de France. LibMultiScale is now distributed under the CeCILL-C open-source licence. LibMultiScale is available at (
http://

`Scotch`(
http://

The initial purpose of `Scotch` was to compute high-quality partitions and static mappings of valuated graphs representing parallel computations and target architectures of arbitrary topologies. The original contribution consisted in developing a “*divide and conquer*” algorithm in which processes are recursively mapped onto processors by using graph bisection algorithms that are applied both to the process graph and to the architecture graph. This allows the mapper to take into account the topology and heterogeneity of the valuated graph which models the interconnection network and its resources (processor speed, link bandwidth). As new multicore, multinode parallel machines tend to be less uniform in terms of memory latency and communication bandwidth, this feature may regain interest.

The software has then been extended in order to produce vertex separators instead of edge separators, using a multilevel framework. Recursive vertex separation is used to compute orderings of the unknowns of large sparse linear systems, which both preserve sparsity when factorizing the matrix and exhibit concurrency for computing and solving the factorized matrix in parallel. The original contribution has been to study and implement a tight coupling between the nested dissection and the approximate minimum degree methods; this work was carried out in collaboration with Patrick Amestoy, of ENSEEIHT-IRIT.

In version `4.0`, released in February 2006, new data structures and methods have been added to the `libScotch` library, which allow it to compute efficient orderings of native meshes, resulting in the handling of larger problems than with standard sequential graph partitioners. Meshes are represented as bipartite graphs, in which node vertices are connected to element vertices only, and vice versa. Since this structure is equivalent to a hypergraph, where nodes are connected to hyper-edges only, and vice versa, the mesh partitioning routines of `Scotch` turn it into a sequential hypergraph partitioner. This version was released as free/libre software under the LGPL license, in order to encourage members of the community to use it as a testbed for the quick and easy development of new partitioning and ordering methods. It has been downloaded more than 2400 times from the InriaGForge repository.
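The bipartite representation of a mesh is easy to build from an element connectivity table. The Python sketch below shows one possible construction; the function name, the tuple-based vertex labels and the dictionary-of-sets storage are illustrative assumptions and do not correspond to the `libScotch` data structures.

```python
def mesh_to_bipartite(elements):
    """Build the bipartite graph of a mesh.

    elements: list of tuples, each giving the node ids of one element.
    Returns an adjacency dict whose vertices are ("e", element_id) and
    ("n", node_id); element vertices are adjacent to node vertices only.
    """
    adjacency = {}
    for elem_id, nodes in enumerate(elements):
        elem_vertex = ("e", elem_id)
        adjacency.setdefault(elem_vertex, set())
        for node_id in nodes:
            node_vertex = ("n", node_id)
            adjacency[elem_vertex].add(node_vertex)
            adjacency.setdefault(node_vertex, set()).add(elem_vertex)
    return adjacency


# Two triangles sharing the edge between nodes 1 and 2.
graph = mesh_to_bipartite([(0, 1, 2), (1, 2, 3)])
```

Read as a hypergraph, each element vertex is a hyper-edge that contains exactly the node vertices it is adjacent to, which is why a mesh partitioner working on this structure also acts as a hypergraph partitioner.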

Version `5.0` of `Scotch`, which was released last August, was the first version to comprise parallel routines. This extension, called `PT-Scotch` (for “*Parallel Threaded* `Scotch`”), is based on a distributed memory model, and makes use of the MPI and, optionally, POSIX thread APIs. A distributed graph structure has been defined, which allows users to reserve vertex indices on each processor for future local adaptive refinement. Its parallel graph ordering routine provides orderings of the same quality as those yielded by the sequential `Scotch` ordering routine, while the competing software `ParMETIS` experiences a severe loss of quality when the number of processors increases. `Scotch` `5.0` has been released under the CeCILL-C free/libre software license, and is also registered at APP (“Agence pour la Protection des Programmes”).

`Scotch` can be called from `MUMPS`, `PaStiX` and `HIPS` as an external ordering library. It is also part of the latest release of Code_Aster, a GPLed thermal and mechanical analysis software developed by the French state-owned electricity producer EDF.

This year, many developments have been carried out and implemented in the `FluidBox` software, building on earlier work that opened up many doors.

One may list the development of stabilized and quasi-monotone centered RD schemes, and an approximation of the viscous terms that is consistent with what is done for the convective/hyperbolic part. This item has been worked out in collaboration with N. Villedieu and H. Deconinck from VKI and Jiri Dobes from CTU in Prague.

These schemes have all been extended to unsteady problems, quadrilateral meshes and viscous problems, as well as to 3D. Thanks to a careful analysis of the implicit phase of the scheme, we have been able to reduce the CPU cost by half.

We have also started to work on very high order residual distribution schemes. Lagrange interpolants of degree 3 and 4 are considered with either an upwind or a centered formulation. We have considerably reduced the algorithmic complexity of the stabilization term that one has to add in order to speed up convergence to steady state. A first version of an implicit scheme has been developed. It exploits the compactness of the stencil of residual distribution schemes. The improvement of the implicit phase will be a major axis of research in the coming years.

The work on shallow water flows has been continued. The scheme is now able to handle dry beds. We have shown that genuinely steady 2D solutions are numerically preserved by the scheme.

With C.W. Shu (Brown University), we have also developed a new method that lies between the residual distribution schemes, which need a continuous interpolant, and the Discontinuous Galerkin schemes, where the interpolant is discontinuous. This method is currently second order in space.

Mario Ricchiuto has written, in collaboration with H. Deconinck, a chapter for the Encyclopedia of Computational Mechanics.

During his visit to Bordeaux, H. Nishikawa worked with M. Ricchiuto on the approximation of model viscous problems. The idea is to transform the second-order system into a first-order one and to apply residual-type approximation techniques to the new system.

In collaboration with Tim Barth (NASA Ames Research Center, USA), R. Abgrall has started to develop a strategy for computing some statistical parameters that need to be introduced because some elements of a physical model are unknown. For example, the boundary might be uncertain because of imperfections, as might the inflow boundary conditions, or some parameters describing the equation of state or a turbulence model. In the approach we are working on, the main parameters will be the conditional expectations of the fluid description (density, velocity, pressure). The approach is non-intrusive. For now, some encouraging but preliminary results have been obtained for scalar hyperbolic and parabolic models.

Under this denomination, we cover relaxation schemes and MUSCL type schemes.

In relaxation schemes, one modifies the problem by adding a “relaxation” term that depends on a parameter that should be very small, if not zero. This yields a family of schemes indexed by this parameter. The advantage of this formal derivation (which can nevertheless be justified theoretically in many cases) is to put the nonlinearities in the relaxation term and to leave a very simple PDE in the rest of the problem. This permits an easier analysis and, from a more practical viewpoint, it is an avenue for developing schemes with provable properties, such as positivity of the density and the pressure.
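A classical textbook instance of this construction, given here only as an illustration and not necessarily the relaxation model used in our codes, is the Jin-Xin relaxation of a scalar conservation law $\partial_t u + \partial_x f(u) = 0$. The symbols $v$, $a$ and $\varepsilon$ are introduced for this example.

```latex
\begin{align*}
\partial_t u + \partial_x v &= 0, \\
\partial_t v + a^{2}\, \partial_x u &= \frac{1}{\varepsilon} \bigl( f(u) - v \bigr),
\qquad |f'(u)| \le a .
\end{align*}
```

The left-hand side is a linear system with constant wave speeds $\pm a$, while the nonlinearity $f$ appears only in the stiff source term. As $\varepsilon \to 0$, $v$ is driven towards $f(u)$ and the first equation formally recovers the original conservation law; the inequality on $a$ is the usual subcharacteristic stability condition.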

This year, we have employed this technique for solving multifluid flow problems and, in collaboration with CELIA, radiative transfer problems. In both cases, the physical stiffness (positivity of the partial volume, for example) could be easily resolved.

For MUSCL schemes, we have developed a numerical procedure for which we can prove that mass and pressure remain positive under a standard CFL-type condition. The method has been validated against many standard test cases.

In collaboration with CEA–CESTA, we are developing an interface tracking method using the level set method with the Ghost Fluid technique. The method has been implemented in 2D and validated against several benchmark tests, in particular for interfaces between compressible and incompressible fluids.

B. Braconnier has finished his thesis on the development of numerical procedures for the simulation of low Mach number flows. The method uses relaxation solvers. The scheme, which is implicit, uses the `PaStiX` library, and this strategy seems to be the most efficient. In the near future, we expect to be able to consider larger problems by using the iterative method library that is developed in the team.

In the case of phase transition, V. Perrier has been studying the structure of discontinuities in the seven-equation model and the five-equation model using traveling wave techniques. This work is carried out in collaboration with H. Guillard from the Smash project in Sophia Antipolis.

We have developed a parallel strategy by domain decomposition which is tuned for aerodynamics problems (THOT code of CEA/CESTA). The problem was to construct domain decomposition criteria which preserve the multiblock structure of the mesh. The code is fully operational and will be delivered to CEA/CESTA in the near future.

C. Dobrzynski has worked on a fully parallel mesh adaptation procedure that uses standard sequential mesh adaptation codes. The idea is to adapt the mesh on each processor without changing the interfaces; the interfaces are then modified. The main advantage is simplicity, because there is no need to parallelize the mesh generation tools (insert/delete, swap, etc.).

C. Dobrzynski has also developed an efficient tool for handling moving 2D and 3D meshes. Here, contrary to most ALE methods, the connectivity of the mesh changes in time as the objects within the computational domain move. The objective is to guarantee a high-quality mesh in terms of, for example, minimum angle. Other criteria, which depend on the physical problem under consideration, can also be handled. This meshing tool is currently being coupled with `FluidBox` in order to produce CFD applications. One target example is the simulation of the 3D flow over helicopter blades. A publication is in preparation.

We have also started to work on the definition of an anisotropic metric computed from the output of a residual distribution code. Once this is done, standard mesh adaptation methods will be used so that the numerical error of the solution is controlled.

A study of crack propagation in silica glasses with a coupling method between molecular dynamics and elasticity began in collaboration with the CEA Ile-de-France in December 2003. Simulations that follow crack propagation at the atomistic level lead to a huge number of atoms on a small domain. The coupling between two length scales allows us to treat larger domains with a smaller number of atoms. Nevertheless, 3D atomistic simulations involve several million atoms; they must be parallel and must be coupled with elasticity codes based on a finite element approximation.

Our algorithm to couple such models is based on the Bridging Method introduced by T. Belytschko. We have extended our previous work on the 1D analysis of the model to higher dimensions, and we have developed a parallel framework to compute and visualize the coupling algorithms. This framework allows us to couple finite element techniques with molecular dynamics. We validated the approach based on the Bridging Method on several multi-dimensional cases such as wave propagation and crack propagation. The coupling algorithm solves a linear coupling equation and redistributes the corrections among the degrees of freedom (atoms and finite element nodes). Optimized data structures have been used in several parts of the coupling process. For example, we have built an efficient algorithm based on an initial computation of the finite element shape functions in order to accelerate the interpolation of the fields at the atom positions. Another crucial service of the framework is the ability to control and forward the information on dynamic load balancing strategies. These strategies migrate atoms between processors; thus the communication scheme used to update the variables attached to the coupling system (such as the degrees of freedom) needs to be updated. Moreover, this framework integrates `EPSN`, which allows powerful monitoring.

We are now studying the effect of the coupling method on acoustic waves. More precisely, we have compared the wave reflection rates obtained from simulations with respect to various parameters such as the finite element size, the coupling zone size, etc. We have observed that these rates are not constant with respect to the wave frequency under study. We have developed a spectral formulation of the coupling scheme in order to explain this phenomenon. An article presenting the numerical results as well as the spectral formulation is being written.

In our developments of both stochastic and deterministic models, biological processes are combined to reach a good level of realism. For host-parasite systems, this makes a big difference compared with purely mathematical models, whose numerical results could hardly be compared with observations. Parallel numerical simulations mimic some of the dynamics observed in the field, and supply a usable tool to validate the models. This work is a collaborative effort in an interdisciplinary approach between population dynamics, mathematics and computer science.

A cooperation involving a biologist (Agnès Calonnec - INRA UMR Santé végétale 1065 - Villenave d'Ornon) and a PhD student in computer science (Gaël Tessier) began in October 2003. Using numerical methods and parallel techniques, we are interested in modeling the spread of *powdery mildew*, a vineyard disease. Correct prediction of this type of parasite epidemic requires a realistic simulator, and could have an industrial impact.

An architectural model of vine stocks is used for two purposes: the study of the growth of the stocks, and the study of the influence of their structure on the dispersal of powdery mildew. In this model, we consider a large number of infectious elements and several spatially heterogeneous environmental parameters. Indeed, the dispersal of powdery mildew is a multiscale mechanism that takes place within vine stocks, and along and across the rows of the vineyard. An initial version of a parallel simulator using MPI communications has been developed. We have characterized the implemented algorithms, evaluating in particular the communication costs and the load imbalance. Experiments were carried out on clusters of SMP nodes, with up to 128 processors. They revealed that the share of time spent in communications and synchronizations increases sharply for simulations that use 64 processors or more. Relative efficiency drops to 63% with 128 processors.

A hybrid approach mixing processes and threads has been considered: the idea is to benefit from the high speed of shared memory accesses by replacing the $n$ single-threaded processes of the previous parallel simulator by $n/p$ processes, each one containing one master thread responsible for inter-process MPI communications and $p$ simulation threads running inside the same SMP node. Simulation threads compute the growth of vine stocks and of colonies of powdery mildew, and the dispersal of aerial spores. Threads in the same process can exchange data via the shared memory, avoiding MPI communications. Furthermore, communications between nodes can be aggregated for all the threads of a node, and load balancing can be improved by exchanging vine stocks between the threads of a process. The implementation and the performance of this hybrid simulator have been presented elsewhere. A partial dynamic load balancing turned out to be necessary to reduce the cost of synchronizations between threads, and makes it possible to improve the scalability of the simulator.
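The effect of grouping $p$ simulation threads per process can be quantified on a small example. The Python sketch below takes a list of point-to-point exchanges between $n$ workers and counts how many stay inside a process (shared memory) and how many distinct process-to-process MPI messages remain once exchanges are aggregated per pair of processes. The function name, the contiguous numbering of workers and the ring communication pattern are assumptions chosen for illustration; they do not describe the communication pattern of the actual simulator.

```python
def classify_exchanges(exchanges, threads_per_process):
    """Classify worker-to-worker exchanges in a hybrid process/thread layout.

    exchanges: iterable of (source_worker, destination_worker) pairs, where
    workers are numbered 0..n-1 and worker w lives in process
    w // threads_per_process.
    Returns (shared_memory_count, aggregated_mpi_message_count).
    """
    shared_memory = 0
    mpi_pairs = set()  # one aggregated message per (source, destination) process pair
    for src, dst in exchanges:
        src_proc = src // threads_per_process
        dst_proc = dst // threads_per_process
        if src_proc == dst_proc:
            shared_memory += 1
        else:
            mpi_pairs.add((src_proc, dst_proc))
    return shared_memory, len(mpi_pairs)


n = 8
ring = [(w, (w + 1) % n) for w in range(n)]   # each worker sends to its right neighbour
pure_mpi = classify_exchanges(ring, 1)        # n single-threaded processes
hybrid = classify_exchanges(ring, 4)          # n/p = 2 processes of p = 4 threads
```

With single-threaded processes every exchange is an MPI message, whereas in the hybrid layout most exchanges become shared-memory accesses and only the traffic between the two processes goes through MPI.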

In this work we have to develop and implement methods that improve the MHD simulation code in order to enable high-resolution MHD simulations of ELMs (Edge Localized Modes).

In this work we have started to upgrade the `FluidBox` code so that it can handle MHD flows. Starting from a “standard” Residual Distribution scheme for ideal gases, we have adapted the scheme to handle the magnetic field and its coupling to the other flow variables. This has been done via the formulation we have developed, which starts from a Rusanov-like version of the first-order distribution. The rest is standard once the eigenstructure of the system has been computed. The novelty lies in how the geometrical constraint is handled. For this we have used a trick originally due to Munz et al. It consists in introducing a fictitious variable in the system (a Lagrange multiplier) together with a fictitious time. The first step is to solve the steady modified problem in fictitious time, and then to march in (true) time. Several test cases are being run in order to validate the method. In the future, we wish to adapt the high-order strategy, as well as the technique with discontinuous elements that we are developing in another context, to reach some of the goals assigned to the ASTER project. We also have to take into account the resistivity terms.

The improvements also include adaptive mesh refinement, a robust numerical MHD scheme and refinable cubic Hermite finite elements. These developments need to be consistent with the implicit time evolution scheme and the `PaStiX` solver. The implicit scheme is essential due to the large variety of time scales in the MHD simulations.

The new methods have to be implemented and evaluated in the `FluidBox` code and in the `JOREK` code, so as to optimize the exchange of expertise on numerical methods and MHD simulations. This work is supported by the ANR CIS 2006 project called ASTER (Adaptive mhd Simulation of Tokamak Elms for iteR).

One of our contributions was to propose an efficient parallel implementation of the semi-Lagrangian method using a *local* cubic spline interpolation method. Coupled with a time splitting
procedure, the *standard* cubic spline interpolation is a good compromise between accuracy and simplicity. Nevertheless, the standard method does not compute the spline coefficients locally.
Previous semi-Lagrangian simulators therefore relied on frequent global redistributions of the main data structure on the parallel machine, and the scalability of the parallel application
was severely limited by these costly collective communications. Our local spline method dramatically reduces the communication overhead. It has been used successfully in two codes: LOSS4D and
GYSELA5D. Notably, the GYSELA code, developed by CEA Cadarache, is the first *full-f* semi-Lagrangian code in the world. Our contribution to this code has led to high scalability, shown on a
test case with up to 4096 processors. This software provides an important alternative to the classical PIC and pure Eulerian simulators available in the international plasma community.
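
To make the locality argument concrete, here is a minimal sketch of one semi-Lagrangian step for 1D constant-velocity advection on a periodic grid. It is not the local spline method of the project: a cubic Lagrange interpolant is swapped in as the simplest local cubic interpolant, and the function names `cubic_lagrange` and `semi_lagrangian_step` are hypothetical. Each new value depends on only four neighbouring values, which is the kind of locality that lets a processor work from its own block plus a thin layer of neighbour data.

```python
import math


def cubic_lagrange(f, j, t):
    """Cubic Lagrange interpolation at grid position j + t (0 <= t < 1).

    Uses only the four periodic neighbours f[j-1], f[j], f[j+1], f[j+2].
    """
    n = len(f)
    fm1, f0, f1, f2 = f[(j - 1) % n], f[j % n], f[(j + 1) % n], f[(j + 2) % n]
    return (-t * (t - 1) * (t - 2) / 6 * fm1
            + (t + 1) * (t - 1) * (t - 2) / 2 * f0
            - (t + 1) * t * (t - 2) / 2 * f1
            + (t + 1) * t * (t - 1) / 6 * f2)


def semi_lagrangian_step(f, velocity, dt, dx):
    """Advance f by one step of df/dt + velocity * df/dx = 0 (periodic grid)."""
    shift = velocity * dt / dx          # displacement in grid units
    new_f = []
    for i in range(len(f)):
        foot = i - shift                # foot of the characteristic ending at node i
        j = math.floor(foot)
        new_f.append(cubic_lagrange(f, j, foot - j))
    return new_f
```

When the displacement is a whole number of cells, the step reduces to an exact periodic shift of the data, which gives a convenient sanity check.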

We have contributed to an adaptive method to solve the multi-dimensional Vlasov equation on a uniform grid. Although mathematicians have already studied wavelet transforms and adaptive
numerical schemes, coupling wavelets with the approximation of non-linear partial differential equations remains of great interest. The wavelet decomposition gives a sparse representation
and a natural criterion to perform local grid refinements. The algorithmic complexity and the memory consumption of such an adaptive simulator are asymptotically lower than those of a classical
dense simulator. Parallelizing such a method is worthwhile in order to deal with applications that manipulate very large data sets with several dimensions. Our work focused on the
OpenMP parallelization of OBIWAN4D, a semi-Lagrangian code that handles 4D data. We have been able to run simulations on a 1024^{4} uniform grid with this tool. Grids this large are typically
out of reach of classical non-adaptive Vlasov simulators.
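
The sparsity argument can be illustrated with a minimal 1D sketch. It uses the Haar wavelet, chosen here only for brevity and not necessarily the wavelet family used in OBIWAN4D; the helper names are hypothetical. Detail coefficients are large only where the data varies, so thresholding them yields both a compressed representation and a refinement indicator.

```python
def haar_forward(values):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns [mean, coarsest detail, ..., finest details].
    """
    coeffs = list(values)
    details = []
    while len(coeffs) > 1:
        half = len(coeffs) // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = diffs + details       # keep coarser levels in front
        coeffs = averages
    return coeffs + details


def haar_inverse(coeffs):
    """Rebuild the original values from the output of haar_forward."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(values)]
        pos += len(diffs)
        values = [v for a, d in zip(values, diffs) for v in (a + d, a - d)]
    return values


def threshold(coeffs, eps):
    """Drop detail coefficients whose magnitude is at most eps (keep the mean)."""
    return [c if i == 0 or abs(c) > eps else 0.0 for i, c in enumerate(coeffs)]
```

For a piecewise-constant signal such as `[1.0] * 8 + [5.0] * 8`, only the mean and one detail coefficient are non-zero, so 2 numbers stand in for 16 values.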

Current and future massively parallel supercomputers make it possible to envision the simulation of realistic problems involving complex geometries and multiple scales. One aim of the HOUPIC project is to develop a massively parallel Particle-In-Cell simulator that can handle such problems. The Particle-In-Cell method consists in solving the Vlasov equation with a particle method (the particles are advanced numerically using the equations of motion). To achieve this goal, new numerical methods need to be investigated for the field solver. This includes the use of hybrid grids that combine structured and unstructured mesh types. Mixing different mesh types leads to numerical complexities and parallelization problems. A fine analysis of load balancing issues is needed to enable scalability up to hundreds of processors. The aim of our ongoing work is to compare different field solvers and to investigate their efficient coupling to the particles. These self-consistent relativistic PIC solvers will be the first of their kind in this context and promise to have an impact on the simulation of realistic problems in accelerator and plasma physics.

The work carried out within the `Scotch` project (see section ) focused on three main axes.

The first one was the optimization of the parallel graph ordering routines of `PT-Scotch` (for “*Parallel Threaded* `Scotch`”), the parallel graph partitioning and ordering library. This work
was performed in the context of the PhD thesis of Cédric Chevalier, who graduated last September. To achieve efficient and scalable parallel graph partitioning, it is necessary to implement a
parallel multilevel framework in which distributed graphs are coarsened, by means of vertex matchings, down to a size that can be handled by a single processor. A sequential partition is
computed on this processor with the existing `Scotch` tool, after which the coarse solution is expanded and refined, level by level, until a partition of the original distributed graph is
obtained. In order for the coarse partition to reflect that of the finest level, graph coarsening routines must preserve the topological structure of the original graph. This is a difficult
task in parallel, as it requires much communication to send and process matching requests. Consequently, most parallel partitioners tend to favor local matchings over remote ones, which
introduces a bias in the construction of coarsened graphs and reduces overall partitioning quality. Thanks to new probabilistic matching algorithms designed by Cédric Chevalier, `PT-Scotch`
is able to quickly build unbiased distributed coarsened graphs, which improves the quality of its vertex separators and reduces memory consumption. These results have been presented in
several workshops, and two journal papers are in preparation, one of which has already been submitted.
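
One coarsening level of the multilevel framework described above can be sketched as follows. This is a sequential illustration with a plain random matching, not the parallel probabilistic matching of `PT-Scotch`; the function name `coarsen` and the dictionary-based graph layout are assumptions made for the example.

```python
import random


def coarsen(adj, vweight, seed=0):
    """One coarsening level by random matching.

    adj: {vertex: {neighbour: edge_weight}}, symmetric.
    vweight: {vertex: weight}.
    Returns (coarse_adj, coarse_vweight, fine_to_coarse).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)

    # Match every free vertex with a random free neighbour (or with itself).
    mate = {}
    for v in order:
        if v in mate:
            continue
        free = [u for u in adj[v] if u != v and u not in mate]
        mate[v] = rng.choice(free) if free else v
        mate[mate[v]] = v

    # Each matched pair (or singleton) becomes one coarse vertex.
    fine_to_coarse, coarse_vweight = {}, []
    for v in order:
        if v in fine_to_coarse:
            continue
        fine_to_coarse[v] = fine_to_coarse[mate[v]] = len(coarse_vweight)
        extra = vweight[mate[v]] if mate[v] != v else 0
        coarse_vweight.append(vweight[v] + extra)

    # Sum the weights of fine edges that join two different coarse vertices.
    coarse_adj = [dict() for _ in coarse_vweight]
    for v, neighbours in adj.items():
        cv = fine_to_coarse[v]
        for u, w in neighbours.items():
            cu = fine_to_coarse[u]
            if cu != cv:
                coarse_adj[cv][cu] = coarse_adj[cv].get(cu, 0) + w
    return coarse_adj, coarse_vweight, fine_to_coarse
```

Vertex weights are accumulated so that the balance of a coarse partition reflects the balance of the fine one, and edge weights are summed so that the coarse cut equals the fine cut of the projected partition.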

The second one is the design of scalable local optimization algorithms for the uncoarsening phase of the multilevel framework. Indeed, the best sequential local optimization algorithms do
not parallelize well, as they are intrinsically sequential, and attempts to relax this strong sequential constraint can lead to a severe loss of partition quality when the number of processors
increases. In version `5.0` of `Scotch`, we have temporarily overcome this difficulty by proposing a multisequential algorithm in which the sequential optimization is applied on each processor
to band graphs, which contain only the vertices that are at a small fixed distance (typically 3) from the projected separators. However, this approach is not scalable. Building again on band
graphs, we have therefore designed a sequential diffusion-based algorithm for graph bipartitioning, called the “jug of the Danaides”. This algorithm greatly improves the smoothness of
subdomain boundaries, is only three times slower than the classical Fiduccia-Mattheyses algorithm (and therefore much faster than the genetic algorithms that we experimented with before), and
is furthermore intrinsically parallel, which makes it an ideal candidate to replace our multisequential local refinement algorithm.
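
The selection of band vertices can be sketched with a breadth-first search bounded by the band width. This is only an illustration of the distance criterion stated above; the function name `band_vertices` and the adjacency-dictionary layout are hypothetical, and anything else a real band graph needs beyond vertex selection is left out.

```python
from collections import deque


def band_vertices(adj, separator, width=3):
    """Return {vertex: distance} for all vertices within `width` of the separator.

    adj: {vertex: iterable of neighbours}; separator: iterable of vertices.
    """
    dist = {v: 0 for v in separator}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        if dist[v] == width:
            continue                    # do not expand beyond the band
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist
```

On a path of 11 vertices with vertex 5 as separator and a width of 3, the band contains vertices 2 to 8; local refinement then only has to consider these vertices.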

These first two axes of research will be pursued next year, thanks to the arrival last September of a post-doc student, Jun-Ho Her, who started working on parallel graph bipartitioning algorithms.

The third research axis concerns the design of specific graph partitioning algorithms. Several applications, such as quantum chemistry or hierarchical interface decomposition (see Section ), need k-way partitions in which load balance takes into account not only the vertices belonging to the subdomains, but also the boundary vertices, which induce computations on each of the subdomains that share them. Preliminary work started this year with the internship of Stéphane Le Roy, and will be pursued by Charles-Edmond Bichot, a post-doctoral researcher who arrived in December.

In order to solve linear systems of equations coming from 3D problems with more than 50 million unknowns, which is now a reachable challenge on new SMP supercomputers, parallel solvers must keep good time scalability and must control the memory overhead caused by the extra structures required to handle communications.

**Static parallel supernodal approach.** In the context of new SMP node architectures, we proposed to fully exploit the advantages of shared memory. A relevant approach is then to use a hybrid
MPI-thread implementation. This approach, not yet explored in the framework of direct solvers, aims at efficiently solving 3D problems with many more than 50 million unknowns. The rationale
behind this hybrid implementation is that communications within an SMP node can be advantageously replaced by direct accesses to shared memory between the processors of the node, using
threads. In addition, the MPI communications between processes are grouped by SMP node. We have shown that this approach greatly reduces the memory required for communications. Many
factorization algorithms are now implemented in real or complex arithmetic, in single or double precision: LLt (Cholesky), LDLt (Crout) and LU with static pivoting (for non-symmetric matrices
having a symmetric pattern). The latter version is now integrated in the `FluidBox` software. A survey article on these techniques is in preparation and will be submitted to the SIAM Journal on
Matrix Analysis and Applications. It will present the detailed algorithms and the most recent results. We still have to add numerical pivoting techniques to improve the robustness
of our solver. Moreover, in collaboration with the MUMPS developers (see section ), we want to adapt out-of-core techniques to overcome physical memory constraints.

**Dynamic parallel multifrontal approach.** The memory usage of sparse direct solvers can be the bottleneck when solving large-scale problems involving sparse systems of linear equations of the
form Ax=b. If memory is not large enough to treat a given problem, disks must be used to store data that cannot fit in memory (*out-of-core* storage). In a previous work, we proposed a first
out-of-core extension of a parallel multifrontal approach based on the solver `MUMPS`, where only the computed factors were written to disk during the factorization. We have studied in detail
the minimum memory requirements of this parallel multifrontal approach and proposed several mechanisms to further decrease those memory requirements. This year, we focused on minimizing the
volume of I/O needed to perform the out-of-core factorization. We first showed that the state-of-the-art algorithms for minimizing the memory requirements of the multifrontal method are not
well suited to minimizing the volume of I/O. We then proposed an optimal I/O volume minimization algorithm for the sequential multifrontal method, which reduces the volume of I/O by a factor
of up to two compared with the existing memory minimization algorithms. Building on this result, we are now working on reducing the volume of I/O when using the different memory allocation
schemes that were introduced in a previous work on memory minimization for the multifrontal method. This work is performed in the context of the PhD of Emmanuel Agullo, in
collaboration with Jean-Yves L'Excellent (INRIA project GRAAL). In this context, a collaboration with Xiaoye S. Li and Esmond G. Ng (Lawrence Berkeley National
Laboratory, Berkeley, California, USA) has also started, in order to compare the multifrontal factorization with the left-looking one.

Once the factorization is finished (i.e. the factors are written to disk), we have to process the solution step, which needs to read back the factors produced during the factorization. This step is critical in the out-of-core context, since it becomes time-consuming. In addition, since the amount of computation of a solution step with a single right-hand side is low, the performance of the out-of-core solution step is bounded by the disk bandwidth. Thus, the objective of an out-of-core solution step is to run as close as possible to this bound. In the context of the thesis of Tzetomila Slavova (PhD at CERFACS), we proposed several scheduling strategies for the out-of-core solution step that aim at ensuring that factors are accessed in a way that favors I/O performance (i.e. data must be accessed sequentially within a given file).

Finally, in collaboration with O. Beaumont (INRIA team CEPAGE), we designed a new scheduling strategy to distribute the computational tasks among the processors. The proposed algorithm was
inspired by the work of Prasanna & Musicus on the scheduling of trees of parallel tasks on multiprocessors. This algorithm is characterized by a high locality of communications between
processors. It has been implemented within the software package `MUMPS` and has shown high efficiency. However, we observed a high memory consumption when using this new scheme. Thus, the
next step is to design bi-criteria algorithms that aim at finding a trade-off between efficiency and memory usage.

**Adaptation to NUMA architectures.** In the context of distributed NUMA architectures, work has recently begun, in collaboration with the INRIA RUNTIME team, to study optimization
strategies and to improve the scheduling of communications, threads and I/O. Our solvers will use the *Madeleine* and *Marcel* libraries in order to provide an experimental application to
validate those strategies. M. Faverge started a Ph.D. in October 2006 to study these aspects in the context of the NUMASIS ANR CIGC project. It has been shown that NUMA-aware allocation can
significantly improve efficiency. In the out-of-core context, new problems linked to the scheduling and the management of the computational tasks may arise (processors may be slowed down by
I/O operations). Thus, we have to design and study specific algorithms for this particular context (by extending our work on scheduling for heterogeneous platforms). The NUMASIS ANR CIGC
project also deals with the numerical study of seismic wave propagation in geological structures, and a Ph.D. started in 2007 (Fabrice Dupros from BRGM, in collaboration with the Magique-3D
INRIA project) on efficient parallel 3D simulations on NUMA architectures. We will use the new version of our direct solver `PaStiX`, adapted to these architectures, for these challenging
simulations.

The resolution of large sparse linear systems is often the most time-consuming step in scientific applications. Parallel sparse direct solvers are now able to solve efficiently real-life three-dimensional problems with several million equations, but they remain limited by their memory requirements. On the other hand, iterative methods require less memory, but they often fail to solve ill-conditioned systems. We have developed two approaches in order to find a trade-off between these two classes of methods.

In this work, we consider an approach which, we hope, will bridge the gap between direct and iterative methods. The goal is to provide a method that exploits the parallel blockwise algorithms used in the framework of high-performance sparse direct solvers. We want to extend these high-performance algorithms to develop robust parallel incomplete-factorization-based preconditioners for iterative methods such as GMRES or Conjugate Gradient. This work is supported by the ANR-CIS project “SOLSTICE”.

**Block ILUK preconditioner.** The first idea is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete
factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and, in particular,
should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods. In this way, we expect to be
able to solve systems with on the order of one hundred million unknowns, and even one billion unknowns. Another goal is to analyse and justify the choice of the parameters that define the
block sparse pattern in our incomplete factorization.

The driving rationale for this study is that it is easier to incorporate incomplete factorization methods into direct solution software than it is to develop new incomplete factorizations. Our main goal at this point is to achieve a significant reduction of the memory needed to store the incomplete factors (with respect to the complete factors) while keeping enough fill-in to make the use of BLAS3 primitives (in the factorization) and BLAS2 primitives (in the triangular solves) profitable.

We have shown the benefit of this approach over classic scalar implementations and also over direct factorizations. Indeed, on the AUDI problem (a reference 3D test case for direct solvers,
with about one million unknowns), we are able to solve the system in half the time required by the direct solver while using only one tenth of the memory (for a relative residual precision of
10^{-7}). We now expect to improve the convergence of our solver, which fails on more difficult problems.

Recently, we have focused on the critical problem of finding approximate supernodes of ILU(k) factorizations. The problem is to find a coarser block structure of the incomplete factors. The “exact” supernodes exhibited by the non-zero pattern of the incomplete factors are usually very small, and thus the resulting dense blocks are not large enough for an efficient use of the BLAS3 routines. A remedy to this problem is to merge supernodes that have nearly the same structure. The benefits of this approach have been demonstrated.
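
The idea of merging supernodes with nearly the same structure can be illustrated by a greedy sketch. It is not the amalgamation algorithm of the project: the merge criterion (extra stored zeros bounded by a fraction `tolerance` of the true non-zeros) and the name `amalgamate` are assumptions made for this example.

```python
def amalgamate(patterns, tolerance=0.2):
    """Greedily merge consecutive columns into blocks sharing one row pattern.

    patterns: list of sets of row indices, one per column, in elimination order.
    A column joins the current block when the explicit zeros that the common
    (union) pattern would store stay within `tolerance` times the true non-zeros.
    Returns a list of (number_of_columns, sorted_row_pattern).
    """
    blocks = []                         # [columns, union pattern, true non-zeros]
    for pattern in patterns:
        if blocks:
            cols, union, nnz = blocks[-1]
            merged = union | pattern
            stored = (cols + 1) * len(merged)
            exact = nnz + len(pattern)
            if stored - exact <= tolerance * exact:
                blocks[-1] = [cols + 1, merged, exact]
                continue
        blocks.append([1, set(pattern), len(pattern)])
    return [(cols, sorted(union)) for cols, union, _ in blocks]
```

A larger tolerance gives wider dense blocks, which suit BLAS3 kernels better, at the price of storing and computing on more explicit zeros.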

**Hybrid direct-iterative solver based on a Schur complement approach.** In recent years, a few Incomplete LU factorization techniques were developed with the goal of combining some of the
features of standard ILU preconditioners with the good scalability features of multilevel methods. The key feature of these techniques is to reorder the system in order to extract parallelism
in a natural way. Often a number of ideas from domain decomposition are utilized and mixed to derive parallel factorizations.

Under this framework, we developed, in collaboration with Yousef Saad (University of Minnesota), algorithms that generalize the notions of “faces” and “edges” of the “wire-basket” decomposition. The interface decomposition algorithm is based on defining a “hierarchical interface structure” (HID). This decomposition consists in partitioning the set of unknowns of the interface into components called connectors, which are grouped in “classes” of independent connectors.

In the context of robust preconditioning techniques, we have developed an approach that uses the HID ordering to define a new hybrid direct-iterative solver. The principle is to build a
decomposition of the adjacency matrix of the system into a set of small subdomains with overlap (the typical size of a subdomain is a few hundred to a few thousand nodes). We build this
decomposition from the nested dissection separator tree obtained using a sparse matrix reordering software such as `Scotch`. Thus, at a certain level of the separator tree, the subtrees are
considered as the interiors of the subdomains, and the union of the separators in the upper part of the elimination tree constitutes the interface between the subdomains.

The interiors of these subdomains are treated by a direct method. Solving the whole system is then equivalent to solving the Schur complement system on the interface between the subdomains, which has a much smaller dimension. We use the hierarchical interface decomposition (HID) to reorder and partition this system. Indeed, the HID gives a natural dense block structure of the Schur complement. Based on this partition, we define efficient block preconditioners that allow the use of BLAS routines and a high degree of parallelism thanks to the HID properties.
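
For reference, the reduction to the interface can be written out as follows. The notation is an assumption introduced here and does not come from the report: the subscript $I$ denotes the interior unknowns of all subdomains and $\Gamma$ the interface unknowns.

```latex
\begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix}
\begin{pmatrix} x_I \\ x_\Gamma \end{pmatrix}
=
\begin{pmatrix} b_I \\ b_\Gamma \end{pmatrix},
\qquad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma},

S \, x_\Gamma = b_\Gamma - A_{\Gamma I} A_{II}^{-1} b_I,
\qquad
x_I = A_{II}^{-1} \left( b_I - A_{I\Gamma} x_\Gamma \right).
```

Since the interiors of distinct subdomains are not coupled to each other, $A_{II}$ is block diagonal and each application of $A_{II}^{-1}$ amounts to independent direct solves, one per subdomain; only the system in $S$ is solved iteratively.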

We propose several algorithmic variants to solve the Schur complement system that can be adapted to the geometry of the problem: typically, some strategies are more suitable for systems
coming from the discretization of a 2D problem and others for a 3D problem; the choice of the method also depends on the numerical difficulty of the problem. For the moment, only a sequential
version of these techniques has been implemented, in a library named `HIPS`. It provides several methods to build an efficient preconditioner in many of these situations. `HIPS` has been
built on top of the `PHIDAL` library and thus also integrates the scalar preconditioners that were developed for the multistage ILUT factorization. We have also added the possibility to mix
dense block computations in the interior domains with an ILUT factorization of the Schur complement to precondition the system. We have shown that the memory could then be reduced by a factor
of 2 or 3, at a reasonable cost in time, compared with the incomplete dense block factorization of the Schur complement.

The Fast Multipole Method (FMM) is a hierarchical method which computes interactions for the N-body problem in O(N) time for any given precision. In order to compute energy and forces on large systems, we need to improve the computation speed of the method.

This has been achieved thanks to a matrix formulation of the main operator in the far field computation: this matrix formulation is implemented with BLAS routines (Basic Linear Algebra Subprograms). While it is straightforward to use level 2 BLAS (matrix-vector operations), the use of level 3 BLAS (matrix-matrix operations) is attractive because it is much more efficient. Thanks to a careful data storage scheme, we have therefore rewritten the algorithm in order to use level 3 BLAS, thus greatly improving the overall runtime. Other enhancements of the Fast Multipole Method, such as the use of the Fast Fourier Transform (the block FFT and the FFT with polynomial scaling), the use of rotations or the use of plane wave expansions, reduce the theoretical operation count. Comparison tests have shown that, for the precisions required in astrophysics or in molecular dynamics, our approach is either faster (compared to rotations and plane waves) or as fast and free of numerical instabilities (compared to the FFT-based method), hence justifying our BLAS approach. These results have been accepted for publication in the Journal of Computational Physics. Our BLAS version has then been extended to non-uniform distributions, which required a new octree data structure, named octree with indirection, that is efficient for both uniform and non-uniform distributions. We have also designed an efficient algorithm that detects uniform areas in structured non-uniform distributions, since these areas are more suitable for BLAS computations. These results have been published. Finally, an efficient parallel code of our BLAS version, based on hybrid MPI-thread programming, has been developed and validated on shared and distributed memory architectures.
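
The step from level 2 to level 3 BLAS can be pictured with a small pure-Python sketch. It assumes, for illustration only, that the same far-field operator `M` is applied to several expansion vectors, so that they can be packed as the columns of one matrix; the helper names are hypothetical, and in a real code the two inner routines would be calls to BLAS GEMV and GEMM.

```python
def matvec(M, v):
    """Matrix-vector product (the role played by a level 2 BLAS GEMV call)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmat(M, V):
    """Matrix-matrix product (the role played by a level 3 BLAS GEMM call)."""
    inner, cols = len(V), len(V[0])
    return [[sum(row[k] * V[k][c] for k in range(inner)) for c in range(cols)]
            for row in M]


def apply_one_by_one(M, expansions):
    """Apply the operator to each expansion with a separate matrix-vector product."""
    return [matvec(M, e) for e in expansions]


def apply_batched(M, expansions):
    """Pack the expansions as columns and apply the operator in one product."""
    packed = [list(col) for col in zip(*expansions)]   # packed[k][c] = expansions[c][k]
    product = matmat(M, packed)
    return [list(col) for col in zip(*product)]        # unpack the result columns
```

Both functions return the same vectors. The batched form performs the same arithmetic but reuses each entry of `M` across all columns, which is what allows an optimized GEMM to make better use of the cache hierarchy; the packing step is where the data storage layout matters.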

Thanks to a post-doctoral position within the ScAlApplix project, located in the *Laboratoire d'Astrophysique de Marseille* and supported by the *HALOBAR* ANR project (ANR-06-BLAN-0172), we
have been able to compare our FMM code to other serial and parallel codes dedicated to astrophysical comparisons. This will result in a forthcoming submission to an international journal.

This work started with a collaboration between the EDF/SINETICS team and the ScAlApplix project to design and develop techniques to optimize the efficiency of the codes used to simulate the physics of nuclear reactors. G. Caramel was recruited for one year as an associate engineer to work on this contract.

This collaboration has given rise to a new study on a large-scale parallel simulation code. Bruno Lathuilière started a thesis in November 2006 on domain decomposition methods applied to the solution of neutron transport equations. Many scientific fields are involved, such as reactor physics, numerical analysis and computer science. For a stationary (criticality) problem, one has to solve a generalized eigenvalue problem using the power algorithm on top of three nested levels of inexact iterative solvers.
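
The nesting of a power algorithm over an inexact inner solver can be sketched on a toy problem. Everything here is an assumption made for illustration: the generalized eigenvalue problem is written as A x = (1/k) F x, a single inner level (a fixed number of Jacobi sweeps) stands in for the three nested levels, and the names `jacobi_solve` and `dominant_k` are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def jacobi_solve(A, b, sweeps):
    """Approximate solution of A y = b by a fixed number of Jacobi sweeps."""
    n = len(b)
    y = [0.0] * n
    for _ in range(sweeps):
        y = [(b[i] - sum(A[i][j] * y[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return y


def dominant_k(A, F, outer_iterations=60, inner_sweeps=50):
    """Power iteration for the largest k such that A x = (1/k) F x."""
    n = len(A)
    x = [1.0 / n] * n                   # positive start vector with 1-norm equal to 1
    k = 1.0
    for _ in range(outer_iterations):
        y = jacobi_solve(A, matvec(F, x), inner_sweeps)   # y ~ A^{-1} F x
        k = sum(abs(v) for v in y)      # eigenvalue estimate, since ||x||_1 = 1
        x = [v / k for v in y]
    return k, x
```

With few inner sweeps, each outer iteration is cheaper but the operator actually iterated is only an approximation of A^{-1} F, so the inner accuracy has to be balanced against the outer convergence.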

At the end of the first year of his thesis, Bruno has been able to implement the algorithms he proposed in an industrial code, and the initial results, obtained on the simulation of the IAEA benchmark, are encouraging.

However, this approach to parallelizing the SPn approximation, used to solve the Boltzmann equation, will not be sufficient to obtain a complete simulation. As a second step, we therefore propose to improve the efficiency of the parallel code resulting from this first work by taking into account the heterogeneity of the reactor (which increases the number of energy groups and the order of anisotropy). Mesh refinement techniques and the coupling of different models should be adapted to neutron simulations.

**Redistribution of hierarchical grid.** In the context of the ANR MASSIM, we are now considering more complex data structures, such as hierarchical grids, for the purpose of parallel data
redistribution. In previous works, we have proposed a description model of distributed data that can represent scalar parameters, structured grids, points or unstructured meshes. We have
recently extended this model with hierarchical grids, which are designed in the RedGRID framework as a coarse grid (inheriting from the classical grid) with multiple recursive subgrids. We have
validated our model with the LOSS4D simulation, which uses a compression scheme based on a hierarchical finite element basis. Concerning the redistribution of hierarchical grids, we have
proposed two approaches based on algorithms already available in the RedGRID framework: the spatial approach and the placement approach. In the
first approach, one uses spatial information on the coarse grid to compute the pieces of blocks (with refined cells) that should be exchanged through the network. In the placement approach,
one assigns to each cell of the coarse grid a weight that depends on the amount of data in its sub-levels. One then uses this information to compute the placement of blocks from one code to
another with a strategy inspired by load-balancing algorithms. These algorithms have been implemented in the RedGRID framework and integrated into the EPSN steering environment. In future
works, we will compare and evaluate the benefits of such strategies in reducing the communication time when performing online parallel visualization of a hierarchical grid.
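
The placement approach can be pictured with a small sketch. It is an assumption-laden illustration, not the RedGRID algorithm: a cell is modelled as a nested dictionary, its weight as the number of cells in its subtree, and the placement as a greedy "heaviest block to least-loaded target" rule borrowed from classical load balancing. The names `cell_weight` and `place_blocks` are hypothetical.

```python
import heapq


def cell_weight(cell):
    """Weight of a coarse cell: itself plus all cells in its sub-levels.

    cell: {"children": [sub-cells]}; a missing "children" key marks a leaf.
    """
    return 1 + sum(cell_weight(child) for child in cell.get("children", []))


def place_blocks(weights, n_targets):
    """Assign each block to a target process, heaviest blocks first.

    weights: {block_id: weight}. Returns {block_id: target_rank}.
    """
    heap = [(0, rank) for rank in range(n_targets)]     # (current load, rank)
    heapq.heapify(heap)
    placement = {}
    for block, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)                # least-loaded target
        placement[block] = rank
        heapq.heappush(heap, (load + weight, rank))
    return placement
```

For blocks weighted 5, 4, 3 and 2 and two targets, this rule yields two loads of 7. A real redistribution would also have to account for where the data currently resides, which this sketch ignores.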

**Model for the steering of parallel-distributed simulations.** The model that we have proposed in the `EPSN` framework can only steer SPMD simulations efficiently. A natural development is
to consider more complex simulations, such as coupled SPMD codes, called M-SPMD (Multiple SPMD, like the multiscale simulation for “crack-propagation”), and client/server simulation codes. In
order to steer these kinds of simulations, we have designed an extension to the Hierarchical Task Model (HTM) that solves the coherency problem for such complex applications. This model has
been published. The `EPSN` framework has been extended to handle this new kind of simulation. In order to manage this extended model, we must improve the synchronization algorithms and the
request management. In the previous architecture, the synchronization phases are carried out by a coordinator (the `EPSN` proxy) with each simulation node. In the distributed case, these
phases are more complex because they are distributed across the simulation nodes and the coupled simulations. We therefore introduced a hierarchy of coordinators that determines the
synchronization dates used to interact coherently with the simulations. With this new software architecture, we have to route each request to the right simulations, namely those that hold the
data required by the request. In future works, we will validate this approach with the multiscale simulation for “crack-propagation” developed in the project.

**Compression and Visualisation of 4D data.** The main output of our most recent Vlasov simulators is typically a 4D+t data set. In the ANR MASSIM, we intend to visualize the 4D data
interactively during simulations. To reach this goal, we have plugged a compression scheme (that uses a hierarchical finite element basis) into the LOSS4D semi-Lagrangian code.
Indeed, the full 4D data structure is most of the time too large to be sent quickly over the network to the visualization process. We therefore provide, at each time step of the simulator, a
compressed hierarchical data structure that can be sent to the visualization tool through the EPSN framework. Furthermore, a visualization tool (named PlasmaViz, owned by the INRIA CALVI
project) is able to take the hierarchical data structure, extract slices, and plot them. The LOSS4D simulator has to perform specific extra calculations in order to export a compressed data
structure. A step beyond this is to couple an adaptive code directly with an adaptive visualization tool. To this end, we have developed a visualization tool, named *Wodca*, that can plot the
data produced by the adaptive OBIWAN4D code. This *Wodca* software prototype relies on a wavelet compression scheme, whereas PlasmaViz uses a hierarchical finite element scheme. The
compression rate and the algorithmic complexity of these two schemes are quite different. We expect to explore these differences in detail in the near future.

**EPSN framework.** Many improvements have been made to the software, both in the EPSN kernel and in the client part. Firstly, the low layers of RedGRID have been deeply modified, and more
precisely the ColCOWS layer. The communication algorithms have been optimized by using a hierarchical scheme. Currently, ColCOWS can deploy a distributed application on a cluster of 256 nodes.
This now takes less than 1 second, compared with more than 4 seconds before. Secondly, we have developed Qt4 plugins to help the integration of EPSN in Qt (client) applications such as SiMonE
(Simulation Monitoring for EPSN), PlasViz (used in MASSIM Project) and ParaView.

CEA research and development contracts:

Parallel resolution of multifluid flows (Benjamin Braconnier, Boniface Nkonga);

Relaxation method for Laser/Plasma interaction: Ti-Te model (Aubin Bellue, Boniface Nkonga);

Numerical tools for interfaces flows: compressible/incompressible (Marie Billaud, Boniface Nkonga);

Automatic distribution for parallel structured aerodynamic solver (Michael Lutz, Boniface Nkonga);

Numerical simulation of crack propagation in silica glass by coupling molecular dynamics and elasticity methods (Guillaume Anciaux, Olivier Coulaud, Jean Roman).

Feasibility study of the application of the graph reorderer and partitioner `PT-Scotch` and the hybrid direct-iterative solver `HIPS` to large 3D electromagnetism problems (Cédric Chevalier, Jérémie Gaidamour, Pascal Hénon, François Pellegrini, Jean Roman).

EDF research and development contracts:

Application of a domain decomposition method to the neutronic SPn equations (Bruno Lathuilière, Pierre Ramet, Jean Roman);

Improvement of the computation tool performances used for neutronic simulation of the EDF cores (Guilhem Caramel, Pierre Ramet, Jean Roman).

**Grant:** Conseil Régional d'Aquitaine, CNES and EADS – EXPERT project

**Dates:** 2004 – 2007

**Overview:** The objective of this work is to upgrade the numerical schemes in the aerodynamic modules of the ONERA code CEDRE using the know-how we have developed in residual distribution
schemes. The main difficulty is to adapt these methods to the data structure of CEDRE. The residual distribution schemes are tuned for cell-vertex data structures while CEDRE works with
cell-centered data structures. The scientific objective of this grant is to provide a bridge between residual distribution schemes and discontinuous Galerkin ones.

**Grant:** SNECMA

**Dates:** 2006-2009

**Partners:** Ecole Centrale Lyon, ONERA-DSNA, ENSAM, Université Aix-Marseille

**Overview:** The goal of the AEROCAV project is to study the noise produced by the circulation of air around elliptic or cylindrical cavities. This kind of noise is particularly intense around
aircraft wings during the take-off and landing phases. Our task is to analyse new schemes that are high-order accurate and can be used on general unstructured meshes. This is done within the
framework of residual distribution schemes.

**Grant:** ARC INRIA

**Dates:** 2005-2007

**Partners:** CALVI (INRIA LORIA - leader of the project), MC2 (INRIA Futurs Bordeaux), SIMPAF (INRIA Futurs Lille), MIP (Toulouse), CEA Cadarache

**Overview:** The description of magnetized plasmas uses a hierarchy of models; this leads to several open problems: the modeling and the role of dimensionless parameters, the mathematical
analysis of the models and of their asymptotic behavior when some parameters tend to infinity, and the numerical simulation of these (simplified) models. The role of this ARC is to cover this
range of problems, from the analysis to the numerical simulation.

**Grant:**ANR-06-CIS

**Dates:**2006 – 2009

**Partners:**CEA Cadarache.

**Overview:**The magnetohydrodynamic instability called ELM for Edge Localized Mode is commonly observed in the standard tokamak operating scenario. The energy losses the ELM will induce
in ITER plasmas are a real concern. However, the current understanding of what sets the size of these ELM induced energy losses is extremely limited. No numerical simulations of the complete
ELM instability, from its onset through its non-linear phase and its decay, exist in literature. Recently, encouraging results on the simulation of an ELM cycle have been obtained with the
JOREK code developed at CEA but at reduced toroidal resolution. The JOREK code uses a fully implicit time evolution scheme in conjunction with the PaStiX sparse matrix library. In this
project it is proposed to develop and implement methods to improve the MHD simulation code to enable high-resolution MHD simulations of ELMs. The ELM simulations are urgently needed to
improve our understanding of ELMs and to evaluate possible mechanisms to control the energy losses. The improvements include adaptive mesh refinement, a robust numerical MHD scheme and
refinable cubic Hermite finite elements. These developments need to be consistent with the implicit time evolution scheme and the PaStiX solver. The implicit scheme is essential due to the
large variety of time scales in the MHD simulations. The new methods will be implemented and evaluated both in the FluidBox code, developed by the ScAlApplix team, and in the JOREK code, in order to optimize
the exchange of expertise on numerical methods and MHD simulations.

The project is a collaboration between the Département de Recherche sur la Fusion Contrôlée (DRFC, CEA/Cadarache), the Laboratoire Bordelais de Recherche en Informatique (LaBRI) and Mathématiques Appliquées de Bordeaux (IMB) at the University of Bordeaux 1.

**Grant:**ANR - Programme blanc

**Dates:**2007 – 2010

**Partners:**Laboratoire d'Astrophysique de Marseille (leader of the project), Astroparticules et Cosmologie

**Overview:**We want to study the secular evolution of barred galaxies in the presence of both a gaseous component and a live and responsive halo. Although considerable steps have been
made in understanding the two influences independently, little, if anything, has been done about understanding their coupled effect and the outcome of the internal competition between angular
momentum emitters and absorbers when all partners are active. This is partly due to the fact that, so far, understanding one of the two influences was already a sufficient challenge. It is
also partly due to the fact that an all-round approach is necessary at this stage. Simulations need to be more efficient, which requires deep knowledge of N-body and hydrodynamic codes, as
well as of the ways to link them. The effects of the resonances need to be fully understood, necessitating expertise in analytical work on resonances, but also in orbital structure theory
including periodic orbits as well as chaotic ones. Software for the analysis of the resonances and the study of the orbital structure in the simulations is necessary. Finally, links with
observations need to be very tight, since they will be the only way to ensure that the solutions found are realistic and relevant to real galaxies, rather than abstract mathematical
models.

**Grant:**ANR MMSA - ARA Masses de données

**Dates:**2005 – 2008

**Partners:**IRMA (Strasbourg, UMR 7001), LSIIT (leader of the project, Strasbourg, UMR 7005)

**Overview:**Numerical simulation is a continuously growing area, especially with the increasing computational power of current computer technology, and it covers ever larger
scientific application fields. Yet monitoring tools are still seriously lacking, while developers and users increasingly want faster feedback on
simulation results. In this project, we are interested in large-scale simulations dealing with complex data (multivariate and multidimensional). Our aim is to build a platform/framework
that couples parallel and distributed simulations, such as those run on GRID'5000, with an interactive monitoring and visualization system. This platform will be validated on two types of large-scale
applications: plasma simulation and crack propagation simulation using multiscale approaches. For these applications, the simulation codes are very complex and need highly efficient
tools to represent the large amount of data, to redistribute the data using visualization and to control and validate the corresponding computation algorithms. Since results may be
multivariate and multidimensional, they also need specific data exploration and visualization tools.

**Grant:**ANR-05-CIGC-002

**Dates:**2006 – 2009

**Partners:**Bull, Total, BRGM, CEA, ID-Imag (leader of the project), PARIS (IRISA), Runtime (INRIA Futurs Bordeaux).

**Overview:**The multiprocessor machines of tomorrow will rely on a NUMA architecture that introduces multiple levels of hierarchy into computers (multiple modules, multicore chips,
hardware multithreading, etc.). To exploit these architectures, parallel applications must use powerful runtime supports that make it possible to distribute execution and data streams without
compromising portability. The NUMASIS project proposes to evaluate the functionalities provided by current systems, to understand their limitations, and to design and implement new mechanisms
for managing processes, data and communications within the basic software layers (operating system, middleware, libraries). The target algorithmic tools that we have selected are parallel sparse
linear solvers with application to seismology.

**Grant:**ANR-06-CIS

**Dates:**2006 – 2009

**Partners:**CERFACS, EADS IW, EDF R&D SINETICS, INRIA Rhône-Alpes and LIP, INPT/IRIT, CEA/CESTA, CNRS/GAME/CNRM.

**Overview:**New advances in high-performance numerical simulation require the continuing development of new algorithms and numerical methods. These technologies must then be implemented
and integrated into real-life parallel simulation codes in order to address critical applications that are at the frontier of our know-how. The solution of sparse systems of linear equations
of (very) large size is one of the most critical computational kernels in terms of both memory and time requirements. Three-dimensional partial differential equations (3D-PDE) are particularly
concerned by the availability of efficient sparse linear algorithms since the numerical simulation process often leads to linear systems of 10 to 100 million variables that need to be solved
many times. In a competitive environment where numerical simulation becomes extremely critical compared to physical experimentation, very precise models involving a very accurate
discretisation are more and more critical. The objective of our project is thus to both design and develop high-performance parallel linear solvers that can efficiently solve complex
multiphysics and multiscale problems of very large size. To demonstrate the impact of our research, the work produced in the project will be integrated in real simulation codes to perform
simulations that could not be considered with today's technologies.

**Grant:**ANR-06-CIS6-009

**Dates:**2006 – 2009

**Partners:**CEA/DSM (leader of the project), CELIA.

**Overview:**This research project aims at satisfying simulation needs in the fields of Astrophysics, Hot Dense Matter and Inertial Confinement Fusion. A large part of the scientific
production in these fields relies upon simulations of complex unsteady hydrodynamic flows, coupled to non-equilibrium transport and chemical kinetics. As the characteristic time scales of
transport may be much shorter than the fluid time scale, implicit numerical tools are required. Some of the numerical codes developed in the institutions now working together (CEA/DSM, INRIA,
CELIA) may now be regarded as mature. Nevertheless, large-scale exploitation of these codes presents issues whose resolution exceeds the isolated capabilities of the developing teams. A
coordinated action is thus necessary. These issues are:

Numerical methods for multimaterial, three-dimensional, compressible, unsteady flows.

Parallel algorithms. Specifically, efficient implicit solvers for nonlinear transport or diffusion equations.

Data management and visualisation.

Construction of, and efficient access to, large shared databases for equations of state and transport coefficients.

Source code management and software design in order to ensure cross-validation of codes, stability and long-term maintenance of software, and interoperability (portability of physical packages, exchange of numerical data, code linking).

**Grant:**European Commission

**Dates:**2006-2009

**Partners:**AIRBUS F and AIRBUS D, DASSAULT, ALENIA, DLR, ONERA, NLR, ARA, VKI, INRIA, Nanjing University, Universities of Stuttgart, Bergamo, Twente, Nottingham, Swansea, Charles (Prague),
Warsaw, CENAERO, ENSAM Paris

**Overview:**Computational Fluid Dynamics is a key enabler for meeting the strategic goals of future air transportation. However, the limitations of today's numerical tools reduce the scope
of innovation in aircraft development, keeping aircraft design at a conservative level. Within the 3rd Call of the 6th European Research Framework Programme, the specific targeted research
project ADIGMA has been initiated. The goal of ADIGMA is the development and utilization of innovative adaptive higher-order methods for the compressible flow equations enabling reliable,
mesh independent numerical solutions for large-scale aerodynamic applications in aircraft design. A critical assessment of the newly developed methods for industrial aerodynamic applications
will allow the identification of the best numerical strategies for integration as major building blocks for the next generation of industrial flow solvers. In order to meet the ambitious
objectives, a partnership of 22 organizations from universities, research organizations and the aerospace industry from 10 countries with well-proven expertise in CFD has been set up, guaranteeing
high-level research work with a clear path to industrial exploitation.

**Web:**
http://

**Grant:**France-Berkeley Fund

**Dates:**2007-2009.

**Partners:**CERFACS, ENSEEIHT-IRIT and GRAAL (INRIA project)

**Overview:**In the framework of the France-Berkeley Fund, we have been awarded a research grant to enable an exchange program involving both junior and senior scientists. The
collaboration will focus on massively parallel solvers for large sparse matrices and will reinforce the collaboration initiated by Emmanuel Agullo. On the French side, this project also
involves I. Duff (CERFACS) and J. L'Excellent (GRAAL, INRIA project).

**Partners:**ENSEEIHT-IRIT and GRAAL (INRIA project)

**Overview:**

The JAEA (Japan Atomic Energy Agency) develops its own sparse linear algebra solvers for its numerical simulation applications. The collaboration, initiated by Prof. M. Dayde (ENSEEIHT-IRIT), started with the organization of the REDIMPS workshop in Tokyo (May 2007), which focused mainly on the TLSE platform and the DIET middleware. However, during this workshop, researchers from JAEA highlighted the need to compare their approaches with the solvers we are developing.

Rémi Abgrall is scientific associate editor of the international journals “Mathematical Modelling and Numerical Analysis”, “Computers and Fluids”, “Journal of Computational Physics”, “Journal of Scientific Computing” and “Journal of Computing Science and Mathematics”. He is a member of the scientific committee of the international conference ICCFD and of the CFD committee of ECCOMAS. He is also a member of the scientific committee of CERFACS and of that of the ANR “Intensive Computation and Simulation” theme. He was a member of the scientific committee of the "Numerical flow models for controlled fusion" conference, Porquerolles, France, April 2007. He is a member of the Comité National des Universités, section 26, and of the board of the GAMNI group of SMAI.

Olivier Coulaud is a member of the scientific committee of the international conference VECPAR'08 and a member of the thematic CP6 committee for IDRIS, CINES and CCRT.

Cécile Dobrzynski has been one of the organisers of the CEMRACS
http://

Pierre Ramet is a member of the researchers' committee at CINES for thematic area number 6 (mathematics).

Jean Roman is President of the Project Committee of INRIA Futurs and a member of the National Evaluation Committee of INRIA. He has been a member of the scientific committees of the international conferences ADVCOMP07, CSC07 (SIAM), Preconditioning07 (SIAM) and EuroMicroPDP08 (IEEE), and of the national conference Renpar'07. He is a member of the ANR steering committee for the “Intensive Computation and Simulation” theme and of the “Strategic Committee for Intensive Computation” of the French Research Ministry.

In addition to the regular teaching activity of the university and ENSEIRB members of the team, Olivier Coulaud and Pascal Hénon teach at ENSEIRB (computer science engineering school).

Mario Ricchiuto has given lectures in the “Mastère Spécialisé en Ingénierie Aéronautique et Spatiale” organised by ENSAM, MatMéca, ENSEIRB, Institut de Cognitique and several local industrial partners. He also teaches at MatMéca.