BACCHUS is a joint team of INRIA Bordeaux - Sud-Ouest, LaBRI (Laboratoire Bordelais de Recherche en Informatique – CNRS UMR 5800, University of Bordeaux and IPB) and IMB (Institut
Mathématique de Bordeaux – CNRS UMR 5251, University of Bordeaux). BACCHUS was created on 1 January 2009.

The purpose of the `BACCHUS` project is to analyze and efficiently solve scientific computation problems that arise in complex research and industrial applications and that involve scaling. By scaling we
mean that the applications considered require an enormous computational power, of the order of tens or hundreds of teraflops, and that they handle huge amounts of data. Solving these kinds of
problems requires a multidisciplinary approach involving both applied mathematics and computer science.

Our major focus is fluid problems, and especially the simulation of *physical wave propagation problems* including fluid mechanics, inert and reactive flows, multimaterial and multiphase flows, acoustics, etc.
`BACCHUS` intends to help solve these problems by contributing to every step of the development chain, from the design of new high-performance, more
robust and more precise numerical schemes to the creation and implementation of optimized parallel algorithms and high-performance codes.

By taking into account architectural and performance concerns from the early stages of design and implementation, the high-performance software which will implement our numerical schemes will be able to run efficiently on most of today's major parallel computing platforms (UMA and NUMA machines, large networks of nodes, production GRIDs).

A large number of engineering problems involve fluid mechanics. They may involve the coupling of one or more physical models. An example is provided by aeroelastic problems, which have been
studied in detail by other INRIA teams. Another example is given by flows in pipelines where the fluid (a mixture of air–water–gas) does not have well-known physical properties, and there are
even more exotic situations that will be discussed later. Another application is the influence of fluid flow on noise production. Problems in aeroacoustics are indeed becoming more and more
important in everyday life. On some occasions, one needs specific numerical tools, *e.g.* to take into account a fluid's exotic equation of state, or because the amount of required computational resources becomes huge, as in unsteady flows. Specific tools are also needed
when one is interested in very specific physical quantities, such as the lift and drag of an airfoil, a situation where commercial tools can only provide a very crude answer.

It is a fact that there are many commercial codes. They allow users to simulate a lot of different flow types. The quality of the results is however far from optimal in many cases. Moreover,
the numerical technology implemented in these codes is often not the most recent. To give a few examples, consider the noise generated by wake vortices in supersonic flows (external
aerodynamics/aeroacoustics), or the direct simulation of a 3D compressible mixing layer in a complex geometry (as in combustion chambers). To our knowledge, because of the very different
temporal and physical scales that need to be captured, a direct simulation of these phenomena is not within the reach of the most recent technologies because the required computational resources are
currently unavailable.
*We need to invent specific algorithms for this purpose.*

In order to efficiently simulate these complex physical problems, we are working on some fundamental aspects of the numerical analysis of nonlinear hyperbolic problems.
*Our goal is to develop more accurate and more efficient schemes that can adapt to modern computer architectures*.

More precisely, *we are working on a class of numerical schemes*, known in the literature as Residual Distribution schemes,
*specifically tailored to unstructured and hybrid meshes*. They have the most compact stencil possible that is compatible with the expected order of accuracy. This
*accuracy is at least of second order, and it can go up to any order of accuracy, even though fourth order is considered for practical applications.* Since the stencil is compact, the
implementation on parallel machines becomes simple. These schemes are very flexible in nature, which is so far one of their most important advantages over other techniques. This feature has
allowed us to adapt the schemes to the requirements of different physical situations (*e.g.* different formulations allow either an efficient explicit time advancement for problems involving small time scales, or a fully implicit space-time variant which is unconditionally
stable and makes it possible to handle stiff problems where only the large time scales are relevant). This flexibility has also enabled us to devise a variant using the same data structure as the popular
Discontinuous Galerkin schemes, which are also part of our scientific focus.

The compactness of the second order version of the schemes enables us to use efficiently the high performance parallel linear algebra tools developed by the team. However, the high order versions of these schemes, which are under development, require modifications to these tools taking into account the nature of the data structure used to reach higher orders of accuracy. This leads to new scientific problems at the border between numerical analysis and computer science. In parallel to these fundamental aspects, we also work on adapting more classical numerical tools to complex physical problems such as those encountered in interface flows, turbulent or multiphase flows, geophysical flows, and material science.

We expect within a few years to be able to demonstrate the potential of our developments on applications ranging from the reproduction of the complex multidimensional interactions between tidal waves and estuaries, to unsteady aerodynamics and aeroacoustics associated with both external and internal compressible flows, compressible ideal and non-ideal MHD (in relation with the ITER project), and the behavior of complex materials. This will be achieved by means of a multidisciplinary effort involving our research on residual discretization schemes, the parallel advances in algebraic solvers and partitioners, and strong interactions with specialists in computer science, scientific computing, physics, mechanics, and mathematical modeling.

Our research in numerical algorithms has led to the development of the `RealfluiDS` platform, which is described in a dedicated section. New software developments are under way in the field of free surface flows and
complex materials modeling. These developments are performed in the code `SLOWS` (Shallow-water fLOWS) for free surface flows, and in the solver
`COCA` (CodeOxydationCompositesAutocicatrisants) for the simulation of the self-healing process in composite materials. These developments are described in later sections.

This work is supported by the EU-Strep IDIHOM, various research contracts and in part by the ANEMOS projects and the ANR-Emergence RealFluids grant. A large part of the team also benefits from the ERC grant ADDECCO.

Another topic of interest is the quantification of uncertainties in nonlinear problems. In many applications, the physical model is not known accurately. The typical example is that of
turbulence models in aeronautics. These models all depend on a number of parameters which can radically change the output of the simulation. Since it is impossible to lump the large number of
temporal and spatial scales of a turbulent flow into a few model parameters, these values are often calibrated to quantitatively reproduce a certain range of effects observed experimentally. A
similar situation is encountered in many applications such as real gas or multiphase flows, where the form of the equation of state suffers from uncertainties, and free surface flows with sediment
transport, where often both the hydrodynamic model and the sediment transport model depend on several parameters, and may have more than one formal expression. This type of uncertainty, called
*epistemic*, is associated with a lack of knowledge and could be reduced by further experiments and investigation. Another type of uncertainty, called *aleatory*, is related to
the intrinsic randomness of a physical measurement and cannot be reduced. The dependence of the numerical simulation on these uncertainties can be studied by uncertainty propagation
techniques such as the polynomial chaos methods developed in recent years. Different implementations exist, depending on whether the method is intrusive or not. The accuracy of
these methods is still a matter of research, as are the number of uncertainties they can handle and their versatility with respect to the structure of the random
variable pdfs. Our objective is to develop non-intrusive or semi-intrusive methods, within a unified framework for obtaining a reliable and accurate numerical solution at a
moderate computational cost.
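
To make the idea of a non-intrusive method concrete, the following is a minimal sketch of a polynomial chaos projection for a single Gaussian uncertain parameter. It is only an illustration under stated assumptions: the function `toy_model` is a hypothetical stand-in for a deterministic solver, the expansion is truncated at degree two, and the three-point Gauss-Hermite rule is chosen for brevity. The solver is treated as a black box and is simply run once per quadrature node.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal variable xi ~ N(0, 1)
# (probabilists' convention: the weights sum to 1).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

# Probabilists' Hermite polynomials He_0, He_1, He_2, orthogonal under N(0, 1)
# with E[He_k^2] = k!.
HERMITE = (
    lambda x: 1.0,
    lambda x: x,
    lambda x: x * x - 1.0,
)


def pc_coefficients(model):
    """Project model(xi) on He_0..He_2 using only black-box model evaluations."""
    samples = [model(x) for x in NODES]  # one deterministic run per node
    coeffs = []
    for k, he in enumerate(HERMITE):
        projection = sum(w * f * he(x) for w, f, x in zip(WEIGHTS, samples, NODES))
        coeffs.append(projection / math.factorial(k))
    return coeffs


def mean_and_variance(coeffs):
    """Output statistics read directly from the expansion coefficients."""
    mean = coeffs[0]
    variance = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)
    return mean, variance


def toy_model(xi):
    """Hypothetical solver output for an uncertain parameter 1 + 0.1 * xi."""
    return (1.0 + 0.1 * xi) ** 2


coeffs = pc_coefficients(toy_model)
mean, variance = mean_and_variance(coeffs)
```

For this quadratic toy model the three-point rule integrates every projection exactly, so the mean and variance obtained from the coefficients match the analytical values. With several uncertain parameters the number of solver runs grows quickly, which is one reason the cost of such methods is still a research question.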

This part of our activities is supported by the ERC grant ADDECCO, the ANR-MN project UFO and the associated team AQUARIUS.

Solving large sparse systems

Sparse direct solvers are mandatory when the linear system is very ill-conditioned; such a situation is often encountered in structural mechanics codes, for example. Therefore, to obtain an industrial software tool that must be robust and versatile, high-performance sparse direct solvers are mandatory, and parallelism is then necessary for reasons of memory capability and acceptable solving time. Moreover, in order to solve efficiently 3D problems with more than 50 million unknowns, which is now a reachable challenge with new multicore supercomputers, we must achieve good scalability in time and control memory overhead. Solving a sparse linear system by a direct method is generally a highly irregular problem that induces some challenging algorithmic problems and requires a sophisticated implementation scheme in order to fully exploit the capabilities of modern supercomputers.

In the `BACCHUS` project, we focused first on the block partitioning and scheduling problem for high-performance sparse parallel factorization with static pivoting for large sparse symmetric
systems. Our strategy is also suitable for non-symmetric sparse matrices with a symmetric pattern, and for general distributed heterogeneous architectures whose computation and communication
performance can be predicted in advance. This has led to software developments described elsewhere in this report.

In addition to the project activities on direct solvers, we also study some robust preconditioning algorithms for iterative methods. The goal of these studies is to overcome the huge memory consumption inherent to direct solvers in order to solve 3D problems of huge size (several million unknowns). Our studies focus on building generic parallel preconditioners based on ILU factorizations. The classical ILU preconditioners use scalar algorithms that do not exploit CPU power well and are difficult to parallelize. Our work aims at finding orderings and partitionings of the unknowns that lead to a dense block structure of the incomplete factors. Then, based on the block pattern, efficient parallel blockwise algorithms can be devised to build robust preconditioners that are also able to fully exploit the capabilities of modern high-performance computers.

In this context, we study two approaches.

The first approach is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and, in particular, should be very competitive in CPU time (effective use of processor power and good scalability) while avoiding the memory limitation encountered by direct methods. In this way, we expect to be able to solve systems on the order of a hundred million unknowns and even one billion unknowns. Another goal is to analyze and justify the parameters that can be used to define the block sparse pattern in our incomplete factorization. The driving rationale for this study is that it is easier to incorporate incomplete factorization methods into direct solution software than it is to develop new incomplete factorizations.

Our main goal at this point is to achieve a significant reduction of the memory needed to store the incomplete factors (with respect to the complete factors) while keeping enough fill-in to make the use of BLAS3 (in the factorization) and BLAS2 (in the triangular solves) primitives profitable.

In this approach, we focus on the critical problem of finding approximate supernodes of ILU(k) factorizations, that is, a coarser block structure of the incomplete factors.
The “exact” supernodes exhibited by the nonzero pattern of the incomplete factors are usually very small, so the resulting dense blocks are not large enough for an efficient use
of the BLAS3 routines. A remedy to this problem is to merge supernodes that have nearly the same structure. The benefits of this approach have been shown in earlier work. These algorithms are implemented in the
`PaStiX` library.
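
The merging of supernodes with nearly the same structure can be illustrated by the following minimal sketch. It is not the algorithm implemented in `PaStiX`: the function `merge_supernodes`, the greedy left-to-right strategy and the `tolerance` criterion (largest accepted fraction of explicitly stored zeros in a merged dense block) are assumptions chosen for illustration, and the column patterns form a toy example.

```python
def merge_supernodes(patterns, tolerance):
    """Greedily group consecutive columns into approximate supernodes.

    patterns[j] is the set of row indices (diagonal included) of the nonzeros
    of column j in the lower-triangular incomplete factor.
    tolerance is the largest accepted fraction of explicitly stored zeros
    in a merged dense block.
    """
    groups = []
    cols = [0]
    union = set(patterns[0])
    nnz = len(patterns[0])
    for j in range(1, len(patterns)):
        new_union = union | set(patterns[j])
        new_nnz = nnz + len(patterns[j])
        new_cols = cols + [j]
        # Entries stored if all columns of the group share the merged row set.
        stored = sum(sum(1 for r in new_union if r >= c) for c in new_cols)
        extra_zeros = stored - new_nnz
        if extra_zeros <= tolerance * stored:
            cols, union, nnz = new_cols, new_union, new_nnz
        else:
            groups.append(cols)
            cols, union, nnz = [j], set(patterns[j]), len(patterns[j])
    groups.append(cols)
    return groups


# Toy nonzero pattern of a 6 x 6 lower-triangular incomplete factor.
patterns = [
    {0, 1, 2, 5},
    {1, 2, 5},
    {2, 3, 5},
    {3, 4, 5},
    {4, 5},
    {5},
]
exact = merge_supernodes(patterns, 0.0)    # no stored zeros allowed
relaxed = merge_supernodes(patterns, 0.2)  # up to 20% stored zeros per block
```

With a zero tolerance only columns with identical structure are grouped, giving three small blocks. Accepting some stored zeros yields two larger blocks, which trades a little extra memory for dense kernels that operate on bigger matrices.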

The second technique makes use of a Schur complement approach.

In recent years, a few Incomplete LU factorization techniques were developed with the goal of combining some of the features of standard ILU preconditioners with the good scalability features of multilevel methods. The key feature of these techniques is to reorder the system in order to extract parallelism in a natural way. Often a number of ideas from domain decomposition are utilized and mixed to derive parallel factorizations.

Under this framework, we developed in collaboration with Yousef Saad (University of Minnesota) algorithms that generalize the notion of “faces” and “edges” of the “wire-basket” decomposition. The interface decomposition algorithm is based on defining a “hierarchical interface structure” (HID). This decomposition consists in partitioning the set of unknowns of the interface into components called connectors that are grouped in “classes” of independent connectors.

In the context of robust preconditioning techniques, we have developed an approach that uses the HID ordering to define a new hybrid direct-iterative solver. The principle is to build a
decomposition of the adjacency matrix of the system into a set of small sub-domains (the typical size of a sub-domain is a few hundred to a few thousand nodes) with overlap. We build
this decomposition from the nested dissection separator tree obtained using a sparse matrix reordering software such as
`Scotch`. Thus, at a certain level of the separator tree, the sub-trees are considered as the interior of the sub-domains and the union of the separators in the upper part of the
elimination tree constitutes the interface between the sub-domains.

The interiors of these sub-domains are treated by a direct method. Solving the whole system is then equivalent to solving the Schur complement system on the interface between the sub-domains, which has a much smaller dimension. We use the hierarchical interface decomposition (HID) to reorder and partition this system. Indeed, the HID gives a natural dense block structure of the Schur complement. Based on this partition, we define some efficient block preconditioners that allow the use of BLAS routines and a high degree of parallelism thanks to the HID properties.
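
For reference, the reduction to the interface can be written as follows. This is the standard Schur complement formulation; the notation is an assumption introduced here, with subscript $I$ for the interior unknowns of the sub-domains and $B$ for the interface unknowns.

```latex
\begin{pmatrix} A_{II} & A_{IB} \\ A_{BI} & A_{BB} \end{pmatrix}
\begin{pmatrix} x_I \\ x_B \end{pmatrix}
=
\begin{pmatrix} b_I \\ b_B \end{pmatrix},
\qquad
S = A_{BB} - A_{BI} A_{II}^{-1} A_{IB},

S \, x_B = b_B - A_{BI} A_{II}^{-1} b_I,
\qquad
x_I = A_{II}^{-1} \left( b_I - A_{IB} \, x_B \right).
```

The applications of $A_{II}^{-1}$ correspond to the direct solves on the sub-domain interiors, while the system in $S$ is the one solved iteratively with the block preconditioners built from the HID partition.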

We propose several algorithmic variants to solve the Schur complement system that can be adapted to the geometry of the problem: typically some strategies are more suitable for systems coming from the discretization of a 2D problem and others for a 3D problem; the choice of the method also depends on the numerical difficulty of the problem. This has led to software developments described elsewhere in this report.

Finding vertex separators for sparse matrix ordering is only one of the many uses of generic graph partitioning tools. For instance, finding balanced and compact domains in problem graphs is essential to the efficiency of parallel iterative solvers. Here again, because of the size of the problems at stake, parallel graph partitioning tools are mandatory to provide good load balance and minimal communication cost.

The execution of parallel applications implies communication between processes executed on different cores. On NUMA architectures, which are strongly heterogeneous in terms of latency and capacity, communication cost strongly depends on the distribution of tasks among cores. Architecture-aware load balancing must take into account both the characteristics of the parallel application (including for instance task processing costs and the amount of communication between tasks) and the topology of the target architecture (providing the power of each core and the cost of communication between every pair of cores). When processes are assumed to coexist for the entire duration of the program, this optimization problem is called mapping. A mapping is called static if it is computed prior to the execution of the program and is never modified at run-time.
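
One common way to formalize the communication part of the mapping problem is to sum, over all pairs of communicating tasks, the exchanged volume weighted by the distance between the cores they are assigned to. The sketch below evaluates this quantity for two candidate static mappings. The function `mapping_cost`, the task graph and the distance matrix are hypothetical illustrations and do not reproduce the interface or the exact cost function of `Scotch`; load balance is deliberately left out.

```python
def mapping_cost(comm_volume, distance, mapping):
    """Sum over communicating task pairs of volume times core-to-core distance."""
    return sum(
        volume * distance[mapping[a]][mapping[b]]
        for (a, b), volume in comm_volume.items()
    )


# Four tasks in a chain; tasks 0-1 and 2-3 exchange much more data than 1-2.
comm_volume = {(0, 1): 10, (1, 2): 1, (2, 3): 10}

# Two NUMA nodes with two cores each: distance 0 on the same core,
# 1 within a node, 4 across nodes (arbitrary illustrative values).
distance = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

topology_aware = {0: 0, 1: 1, 2: 2, 3: 3}  # heavy pairs stay inside a node
topology_blind = {0: 0, 1: 2, 2: 1, 3: 3}  # heavy pairs straddle the two nodes

cost_aware = mapping_cost(comm_volume, distance, topology_aware)
cost_blind = mapping_cost(comm_volume, distance, topology_blind)
```

Both mappings place exactly one task per core, so they are equally balanced, yet the topology-aware one is markedly cheaper in communication. A mapping tool searches for such assignments automatically on graphs far too large to inspect by hand.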

The sequential `Scotch` tool has been able to perform static mapping since its first version, but this feature was neither widely known nor widely used by the community. With the increasing need to map very large
problem graphs onto very large and strongly heterogeneous parallel machines (whether hierarchical NUMA clusters or GPU-based systems), there is an increasing demand for parallel static
mapping tools. Since, in the context of dynamic repartitioning, parallel partitioning software will have to run on their target architectures, parallel partitioning algorithms suitable for
efficient execution on such heterogeneous architectures have to be investigated.

Many simulations which model the evolution of a given phenomenon over time (turbulence and unsteady flows, for instance) need to re-mesh some portions of the problem graph in order to capture more accurately the properties of the phenomenon in areas of interest. This re-meshing is performed according to criteria which are closely linked to the ongoing computation and can involve large mesh modifications: while elements are created in critical areas, some may be merged in areas where the phenomenon is no longer critical.

Performing such re-meshing in parallel creates additional problems. In particular, splitting an element which is located on the frontier between several processors is not an easy task, because deciding when to split an element, and defining the direction along which to split it so as to best preserve numerical stability, require shared knowledge which is not available in distributed memory architectures. Ad hoc data structures and algorithms have to be devised so as to achieve these goals without resorting to extra communication and synchronization which would impact the running speed of the simulation.

Most work on parallel mesh adaptation attempts to parallelize in some way all the mesh operations: edge swap, edge split, point insertion, etc. This implies deep modifications of the (re)mesher and often leads to poor performance in terms of CPU time. Another approach proposes to base parallel re-meshing on an existing mesher and on load balancing, so as to be able to modify the elements located on the frontier between several processors.

In addition, the preservation of load balance in the re-meshed simulation requires dynamic redistribution of mesh data across processing elements. Several dynamic repartitioning methods have been proposed in the literature, which rely on diffusion-like algorithms and the solving of flow problems to minimize the amount of data to be exchanged between processors. However, integrating such algorithms into a global framework for handling adaptive meshes in parallel has yet to be done.

The main objective of the `BACCHUS` project is to analyze and solve scientific computing problems coming from complex research and industrial applications that require a scalable approach. This allows us to
validate the numerical schemes, the algorithms and the associated software that we develop. We have today three reference application domains which are fluid mechanics, material physics and the
MHD simulation dedicated to the ITER project.

In these three domains, we study and simulate phenomena that are by nature multiscale and multiphysics, and which require enormous computing power. A major part of this work leads to industrial collaborations, in particular with CNES, ONERA, and the French CEA/CESTA, CEA/Ile-de-France and CEA/Cadarache centers.


The numerical simulation of steady and unsteady flows is still a challenge due to the large margin of improvement in efficiency and accuracy of the underlying numerical schemes, and of their
computer implementation. The challenge is even greater when considering real applications involving complex geometries and large irregular unstructured grids. The numerical schemes developed in
`BACCHUS` are implemented using `Scotch`, `HIPS` and `PaStiX` whenever the type of problem and the CPU requirements make this useful.

One of our application fields is that of steady subsonic, transonic and supersonic flow problems, where the equation of state is for example that of air in standard conditions, or a more general one as in real gases and multiphase flows. This class of physical problems corresponds to “standard” aerodynamics, and the models are the Euler and Navier-Stokes equations, possibly with turbulent effects. Here we consider residual distribution and SUPG schemes.

Another field of application is that of *unsteady* problems for the same physical models. Depending on the application, the physical models considered involve the Navier-Stokes equations, or the nonlinear or linearized
Euler equations. The schemes we develop are Residual Distribution schemes and Discontinuous Galerkin schemes. Specific modifications, with respect to their steady counterparts, are made
in order to dramatically reduce the computational time while maintaining the desired accuracy.

Detached-Eddy Simulation (DES) is a hybrid technique proposed by Spalart et al. in 1997 as a numerically feasible and plausibly accurate approach for predicting massively separated flows. Traditionally, high Reynolds number separated flows have been predicted using the Reynolds Averaged Navier-Stokes equations (RANS). Although RANS models are considered the most practical turbulence handling technique for industrial problems, these models are not adapted to the massively separated flows widely encountered when dealing with iced bodies. Another growing approach, Large-Eddy Simulation (LES), offers the advantage of directly computing the dominant unsteady structures of the flow. Unfortunately, the computational cost of applying LES to complete configurations such as an airplane, a submarine, or a road vehicle remains prohibitive because of the resolution required in the boundary layers. The aim of DES is to combine the most favourable aspects of both techniques, i.e., the application of RANS models for predicting the attached boundary layer and of LES for the time-dependent three-dimensional large eddies. The cost scaling of the method is then affordable, since LES is not applied to resolve the relatively smaller structures that populate the boundary layer. Simulations of performance degradation due to icing have increased the demand for numerically feasible and accurate approaches for predicting massively separated flows around complex geometries. In this respect, flow field predictions obtained using DES are encouraging. To obtain the DES model formulation, the length scale of the Spalart-Allmaras (S-A) destruction term is modified to be the minimum of the distance to the closest wall and a length scale proportional to the local grid spacing. Alongside these encouraging results, weaknesses of DES have been discovered. Starting from a valid RANS solution, gradually refining the grid alters the solution in obscure ways. The grid is ambiguous and the DES equations fail to recognize that pure RANS behaviour was intended. Resolving the issue of ambiguous grids is a priority but has proven to be a persistent difficulty. A better understanding of the coupling mechanisms between the models is needed.
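
The modified length scale mentioned above can be written compactly. The formula below follows the usual statement of the original DES formulation; taking the grid spacing $\Delta$ as the largest cell dimension and the value of the constant $C_{DES}$ are assumptions here, since the text does not specify them.

```latex
\tilde{d} = \min\left( d, \; C_{DES} \, \Delta \right),
\qquad
\Delta = \max\left( \Delta x, \Delta y, \Delta z \right),
\qquad
C_{DES} \approx 0.65 .
```

Here $d$ is the distance to the closest wall. Near the wall $\tilde{d} = d$ and the model behaves as RANS; away from it $\tilde{d} = C_{DES} \Delta$ and the model acts as a subgrid-scale model. The grid ambiguity arises when refinement makes $C_{DES} \Delta$ smaller than $d$ inside a boundary layer that was meant to be treated in RANS mode.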

**Inflight icing:**

Every year, sudden aircraft performance degradation due to ice accretion causes several incidents and accidents. Icing is a serious and not yet totally mastered meteorological hazard due to supercooled water droplets that impact on aerodynamic surfaces. Icing results in performance degradations including substantial reduction of engine performance and stability, reduction in maximum lift and stall angle and an increase of drag. One of the most important challenges in understanding the performance degradation is the accurate prediction of complex and massively separated turbulent flows. Turbulent flows are currently modelled and computed using a variety of strategies. The majority of predictions around engineering geometries are obtained from solutions of the Reynolds Averaged Navier-Stokes (RANS) equations. These approaches are often acceptable in the thin shear layers where RANS methods have been calibrated. In other regimes, especially flows in which the turbulent eddies are not standard, i.e., not in the calibration range of the model, the performance of RANS models is, at best, uneven. This in turn motivates other strategies, one of them being Large-Eddy Simulation (LES). The application of LES to prediction of turbulent flows in practical configurations is increasing but the computational cost remains prohibitive. Within the past five years hybrid methods have emerged as a popular approach for predicting complex flows. Spalart et al. proposed DES as a cost-effective and plausibly accurate approach for predicting flows experiencing massive separation. Therefore, the overall objectives are the following:

Analysis of the DES approach;

Develop the DES model for the simulation of 3D turbulent flow;

Discuss the issues that impact the method, including the underlying RANS turbulence model and the simulation design for DES (grids and choice of time steps);

Use Airbus test cases to answer the following question: is it possible and advisable to use DES to quantify the performance degradation due to icing;

Potential benefits:

Help in the certification process;

Include the data in flight simulators to train pilots under icing operating conditions.

**Ice shedding:**

Current concerns about greenhouse gases are leading to changes in aircraft design, with an increased use of composite materials. This in turn offers new possibilities for the design of ice protection systems, thus renewing interest in de-icing simulation tools. To reduce fuel burn, aircraft manufacturers are investigating ice protection systems such as electro-thermal or electro-mechanical de-icing systems to replace anti-icing systems. By reducing the adhesive shear strength between ice and surface, de-icing systems remove the ice formed on the protected surfaces following a periodic cycle. This cycle is defined such that inter-cycle ice shapes remain acceptable from a performance point of view. One of the drawbacks of de-icing devices is that pieces of ice are shed into the flow. Knowledge of ice shedding trajectories would make it possible to assess the risk of impact on, or ingestion by, aircraft components located downstream. When the pieces leave the aircraft surface, they become projectiles that can hit and severely damage the aircraft surface or other components, such as the horizontal and vertical tails or the engines. Aircraft certification authorities, such as the FAA, have specific requirements for large ice fragment ingestion during engine certification. Control surfaces and wing flaps are also sensitive to ice shedding because they can be blocked by ice fragments. Aircraft manufacturers rely mainly on flight tests to evaluate the potential negative effects of ice shedding because of the lack of appropriate numerical tools. The random shape and size of shed ice particles, together with their rotation as they move, make it difficult for classical CFD tools to predict trajectories. The numerical simulation of a full unsteady viscous flow with a set of moving bodies immersed in it poses several difficulties for grid-based methods. These difficulties stem from the meshing procedure for complex geometries and from the regridding needed to track the body motion. A new approach that takes into account the effect of ice accretion on the flow field is used to solve the ice trajectory problem. The approach is based on mesh adaptation, a penalization method and level sets.

A challenging and important field of application is that of *free surface flows for geophysical applications* such as the propagation of tsunamis, and their interaction with complex coastal environments. A model often used to simulate these
phenomena is the so-called *shallow water* model, describing the dynamics of the depth and depth-averaged velocity of the water. This model, while bearing many similarities with the equations of compressible
gas dynamics, presents many peculiarities: the presence of source terms that model the effects of bathymetry variations and of friction on the bottom and that often control the dynamics of
the flow; the fact that dry states occur normally (unlike vacuum in gas dynamics); and the fact that the dynamics of these dry states in wave/coast interaction is one of the most important
outputs of the simulation. Our work aims at borrowing tools developed in the context of industrial/aeronautics applications for these environmental applications. In particular, we have
adapted to this model the residual schemes used for aeronautic applications, showing the considerable potential of this class of numerical schemes for
these applications.
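
For reference, a standard form of the shallow water system with the source terms discussed above is given below. The notation is an assumption introduced here: $h$ is the water depth, $\mathbf{u}$ the depth-averaged velocity, $b$ the bathymetry height, $g$ the gravitational acceleration and $\mathbf{S}_f$ a generic bottom friction term whose closure is left unspecified.

```latex
\begin{aligned}
\partial_t h + \nabla \cdot (h \mathbf{u}) &= 0, \\
\partial_t (h \mathbf{u}) + \nabla \cdot (h \mathbf{u} \otimes \mathbf{u})
  + \nabla \left( \tfrac{1}{2} g h^2 \right) &= - g h \nabla b - \mathbf{S}_f .
\end{aligned}
```

The analogy with compressible gas dynamics appears when $h$ is read as a density and $\tfrac{1}{2} g h^2$ as a pressure. A dry state corresponds to $h = 0$, and the lake-at-rest solution $\mathbf{u} = 0$, $h + b = \text{const}$ shows how the bathymetry source term must balance the pressure-like gradient.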

An important field of application consists in the use of real-gas thermodynamic models for the simulation of turbulent flows in turbine cascades. The aim is to demonstrate the potential
of Bethe-Zel'dovich-Thompson (BZT) fluids for turbine applications. BZT fluids are characterized by negative values of the fundamental derivative of gasdynamics for a range of temperatures and pressures in the vapor
phase, which leads to non-classical gasdynamic behaviors such as the disintegration of compression shocks. The non-classical phenomena typical of BZT fluids have several practical outcomes:
prominent among them is an active research effort to reduce losses caused by wave drag and shock/boundary layer interactions in turbomachines and nozzles, with particular application to the Organic Rankine Cycles (ORCs)
used to generate electric energy in low-power applications. The use of BZT fluids as ORC working fluids is potentially interesting because shock formation and the consequent losses could
ideally be avoided if the turbine expansion could take place entirely within, or very close to, a particular region called the
*inversion zone*, where the fundamental derivative of gasdynamics is negative. In fact, as recently investigated, rarefaction shock waves are physically admissible in the inversion region.
Within this project, several advances have been made in the thermodynamic modeling of the fluids, the numerical simulation of the fluid flow and the
cross-validation of the numerical results, and the robust optimization of some simple configurations. Here we consider a more classical
finite-volume scheme (HLL scheme with second-order spatial accuracy ensured by means of a MUSCL-type reconstruction).
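
The two ingredients named above can be sketched for a scalar conservation law in one dimension. This is only an illustration and not the solver used in this work: the minmod limiter, the Burgers flux and the wave-speed estimates (minimum and maximum of the two interface states) are assumptions chosen for brevity, whereas a real-gas flow solver applies the same formulas to the full vector of conserved variables with wave speeds derived from the equation of state.

```python
def minmod(a, b):
    """Slope limiter: zero at extrema, otherwise the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u):
    """Limited left/right states at the interior interfaces of a 1D grid.

    Returns (u_left, u_right) pairs for the interfaces between cells i and
    i + 1, for i = 1 .. len(u) - 3 (cells whose limited slopes are available).
    """
    slopes = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        slopes[i] = minmod(u[i] - u[i - 1], u[i + 1] - u[i])
    return [
        (u[i] + 0.5 * slopes[i], u[i + 1] - 0.5 * slopes[i + 1])
        for i in range(1, len(u) - 2)
    ]


def hll_flux(u_left, u_right, flux, s_left, s_right):
    """HLL numerical flux for given wave-speed estimates s_left <= s_right."""
    if s_left >= 0.0:
        return flux(u_left)   # all waves move right: upwind on the left state
    if s_right <= 0.0:
        return flux(u_right)  # all waves move left: upwind on the right state
    return (
        s_right * flux(u_left)
        - s_left * flux(u_right)
        + s_left * s_right * (u_right - u_left)
    ) / (s_right - s_left)


def burgers(u):
    """Flux of the inviscid Burgers equation, used as a scalar test problem."""
    return 0.5 * u * u


cell_averages = [0.0, 1.0, 2.0, 3.0, 4.0]
states = muscl_interface_states(cell_averages)
fluxes = [hll_flux(ul, ur, burgers, min(ul, ur), max(ul, ur)) for ul, ur in states]
```

On this linear profile the limited reconstruction recovers the exact interface values, which is the mechanism that gives second-order accuracy in smooth regions. Near extrema the limiter returns a zero slope and the scheme falls back to first order.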

In the context of a previous ANR project called ASTER (Adaptive MHD Simulation of Tokamak Elms for iteR), we have established a collaboration with the physicists of the CEA/DRFC group. The
magneto-hydrodynamic instability called ELM for Edge Localized Mode is commonly observed in the standard tokamak operating scenario. The energy losses the ELM will induce in ITER plasmas are a
real concern. However, the current understanding of what sets the size of these ELM induced energy losses is extremely limited. Recently, encouraging results on the simulation of an ELM cycle
have been obtained with the `JOREK` code developed at CEA, but at reduced toroidal resolution. The `JOREK` code uses a fully implicit time evolution scheme in conjunction with the `PaStiX` sparse matrix library.

To improve the order of the spatial representation of the variables and their gradients, the so-called Bezier finite elements have been developed and implemented in the `JOREK` code. This allows an accurate alignment of the finite elements with the magnetic geometry of tokamak plasmas. This alignment is necessary due to the large anisotropy of the physical behavior along and perpendicular to the magnetic field lines. The Bezier elements, an extension of the standard cubic Hermite elements, allow the local refinement of the elements. During a postdoctoral position, H. Sellama implemented an adaptive refinement and successfully applied it to a tearing instability test case and to the injection of pellets in the plasma.

The fully implicit time evolution scheme in the `JOREK` code leads to large sparse matrices which have to be solved at every time step. The MHD model leads to very badly conditioned matrices. In principle, the `PaStiX` library can solve these large sparse systems using a direct method. However, for large 3D problems the CPU time of the direct solver becomes too large. Iterative solution methods require a preconditioner adapted to the problem. Many of the commonly used preconditioners have been tested but no satisfactory solution has been found. Instead, a physics-based preconditioner has been constructed by using the diagonal block for each of the Fourier modes in the toroidal direction. This means the preconditioner represents the linear part of each harmonic but neglects the interaction between harmonics. This scheme leads to independent matrices that are factorized and solved in parallel using the `PaStiX` solver. A GMRES iterative solver with this preconditioner has proved to be an efficient solver for the non-linear MHD code. The developments of the `JOREK` code in the ASTER project have allowed simulating ELMs with a much improved accuracy in a real 3D geometry. The typical problem size has increased from

We develop two kinds of software. The first one consists of generic libraries that are used within application codes. These libraries comprise a sequential and parallel partitioner for large irregular graphs or meshes (`Scotch`), a middleware library for distributed mesh handling (`PaMPA`), and high performance direct or hybrid solvers for very large sparse systems of equations (`PaStiX` and `HIPS`). The second kind of software corresponds to dedicated software for fluid mechanics, including the team's historical code `RealfluiDS`, and the more recent developments `Aerosol`, `SLOWS`, and `COCA`.

For parallel software developments, we use the message passing paradigm (based on the MPI interface), sometimes combined with threads so as to exploit multi-core architectures at their best: in some computation kernels such as solvers, when processing elements reside on the same compute node, message buffer space can be saved because the aggregation of partial results can be performed directly in the memory of the receiving processing element. Memory savings can be tremendous, and help us achieve problem sizes which could not be reached before.

`RealfluiDS` is a software package dedicated to the simulation of inert or reactive flows. It is also able to simulate multiphase, multimaterial, MHD and turbulent flows. There exist 2D and 3D versions. The 2D version is used to test new ideas that are later implemented in the 3D one. This software implements the most recent residual distribution schemes. The code has been parallelized with and without overlap of the domains. An Uncertainty Quantification library has been added to the software. A partitioning tool exists in the package, which uses `Scotch`. In the coming years, all the know-how of `RealfluiDS` will be transferred to `Aerosol`.

The software AeroSol is jointly developed by the teams Bacchus and Cagire. It is a high order finite element library written in C++. The code has been designed to perform efficient computations with continuous and discontinuous finite element methods on hybrid and possibly curvilinear meshes. The distribution of the unknowns is handled by the software PaMPA, developed within the team Bacchus and the team Pumas. Maxime Mogé was hired on a young engineer position (IJD) obtained in the ADT OuBa HOP to take part in the parallelization of the library, and arrived on November 1st, 2011.

Current features include

**Development environment**: use of CMake for compilation, CTest for automatic tests and memory checking, lcov and gcov for code coverage reports.

**In/Out**: link with the XML library for handling parameter files. Reader for GMSH, and writer for the VTK-ASCII legacy format.

**Quadrature formula**: up to 11th order for lines, quadrangles, hexahedra, pyramids and prisms, up to 14th order for tetrahedra, up to 21st order for triangles.

**Finite elements**: up to fourth degree for Lagrange finite elements on lines, triangles and quadrangles.

**Geometry**: elementary geometrical functions for first order lines, triangles, quadrangles.

**Time iteration**: explicit Runge-Kutta up to fourth order, explicit Strong Stability Preserving schemes up to third order.

**Linear Solvers**: link with the external linear solver UMFPack.

**Memory handling**: discontinuous and continuous discretizations based on PaMPA for triangular and quadrangular meshes.

**Numerical schemes**: continuous Galerkin method for the Laplace problem (up to fifth order) with non-consistent time iteration or with direct matrix inversion. Scalar stabilized residual distribution schemes with explicit Euler time iteration have been implemented for steady problems.
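
To make the "Time iteration" item above more concrete, here is a minimal Python sketch of one explicit third-order Strong Stability Preserving Runge-Kutta step, in the classical Shu-Osher form. It is not taken from the AeroSol sources (which are in C++); the function names and the scalar test problem are assumptions chosen for illustration.

```python
def ssp_rk3_step(rhs, u, dt):
    """One third-order SSP Runge-Kutta step (Shu-Osher form).

    Each stage is a convex combination of forward Euler steps, which is
    what gives the scheme its strong stability preserving property.
    """
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))


def integrate(rhs, u0, dt, n_steps):
    """Advance u' = rhs(u) from u0 over n_steps steps of size dt."""
    u = u0
    for _ in range(n_steps):
        u = ssp_rk3_step(rhs, u, dt)
    return u
```

For instance, `integrate(lambda u: -u, 1.0, 0.01, 100)` approximates exp(-1) for the linear decay problem u' = -u. The same three-stage structure carries over to vectors of unknowns when `rhs` evaluates a spatial discretization.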

`SLOWS` (Shallow-water fLOWS) is a `C` platform allowing the simulation of free surface shallow water flows with friction. Arbitrary bathymetries are allowed, defined either by some complex piecewise analytical expression, or by

`COCA` (CodeOxydationCompositesAutocicatrisants) is a `fortran-90` code for the simulation of the oxidation process in self-healing composite materials, developed in collaboration with the Laboratoire des Composites ThermoStructuraux in Bordeaux (UMR-5801 LCTS). This process involves the chemical oxidation of some of the matrix components of the composite, and the production of a liquid oxide that flows and fills material cracks, acting as a diffusion barrier against oxygen and thus protecting the ceramic fibers of the material. `COCA` simulates this process using a finite element discretization of the model equations. In its current version only transverse cracks are available. `COCA` makes use of `PaStiX` to solve the algebraic systems arising from the discretization.

This work is supported by the French “Commissariat à l'Énergie Atomique CEA/CESTA” in the context of structural mechanics and electromagnetism applications.

`PaStiX`(
http://
`RealfluiDS`(see Section
). The `PaStiX` library is released under the INRIA CeCILL licence.

The `PaStiX` library uses the graph partitioning and sparse matrix block ordering package `Scotch`. `PaStiX` is based on an efficient static scheduling and memory manager, in order to solve 3D problems with more than 50 million unknowns. The mapping and scheduling algorithm handles a combination of 1D and 2D block distributions. This algorithm computes an efficient static scheduling of the block computations for our supernodal parallel solver, which uses a local aggregation of contribution blocks. This can be done by taking into account very precisely the computational costs of the BLAS 3 primitives, the communication costs and the cost of local aggregations. We also improved this static computation and communication scheduling algorithm to anticipate the sending of partially aggregated blocks, in order to free memory dynamically. By doing this, we are able to reduce the aggregated memory overhead, while keeping good performance.

Another important point is that our approach is suitable for any heterogeneous parallel/distributed architecture whose performance is predictable, such as clusters of multicore nodes. In particular, we now offer a high performance version with a low memory overhead for multicore node architectures, which fully exploits the advantage of shared memory by using a hybrid MPI-thread implementation.

Direct methods are numerically robust, but very large three-dimensional problems may lead to systems that would require a huge amount of memory despite any memory optimization. One approach under study consists in defining an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods, and in particular should be very competitive in CPU time (effective power used from processors and good scalability) while avoiding the memory limitation encountered by direct methods.

`HIPS`(Hierarchical Iterative Parallel Solver) is a scientific library that provides an efficient parallel iterative solver for very large sparse linear systems.

The key point of the methods implemented in `HIPS` is to define an ordering and a partition of the unknowns that relies on a form of nested dissection ordering in which cross points in the separators play a special role (Hierarchical Interface Decomposition ordering). The subgraphs obtained by nested dissection correspond to the unknowns that are eliminated using a direct method, and the Schur complement system on the remaining unknowns (which correspond to the interface between the subgraphs viewed as subdomains) is solved using an iterative method (GMRES or Conjugate Gradient for the time being). This special ordering and partitioning allows for the use of dense block algorithms both in the direct and iterative parts of the solver and provides a high degree of parallelism to these algorithms. The code provides a hybrid method which blends direct and iterative solvers. `HIPS` exploits the partitioning and multistage ILU techniques to enable a highly parallel scheme where several subdomains can be assigned to the same process. It also provides a scalar preconditioner based on the multistage ILUT factorization.
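
The hybrid direct/iterative idea described above can be sketched on a small dense example. The Python code below is not the `HIPS` API and makes several simplifying assumptions: the matrix is small, dense, symmetric positive definite, all interior unknowns are eliminated with a plain Gaussian elimination, and the Schur complement system on the interface is solved matrix-free with an unpreconditioned Conjugate Gradient. All names are hypothetical.

```python
def solve_dense(a, b):
    """Direct solve by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def block(a, rows, cols):
    return [[a[i][j] for j in cols] for i in rows]


def matvec(a, x):
    return [sum(v * w for v, w in zip(row, x)) for row in a]


def dot(x, y):
    return sum(v * w for v, w in zip(x, y))


def conjugate_gradient(apply_op, rhs, tol=1e-12, max_iter=200):
    """Unpreconditioned CG; apply_op(v) returns the operator times v."""
    x = [0.0] * len(rhs)
    r = rhs[:]
    p = r[:]
    rr = dot(r, r)
    stop = tol * dot(rhs, rhs) ** 0.5
    for _ in range(max_iter):
        if rr ** 0.5 <= stop:
            break
        ap = apply_op(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def hybrid_solve(a, b, interior, interface):
    """Eliminate interior unknowns directly, iterate on the interface."""
    a_ii = block(a, interior, interior)
    a_ig = block(a, interior, interface)
    a_gi = block(a, interface, interior)
    a_gg = block(a, interface, interface)
    b_i = [b[k] for k in interior]
    b_g = [b[k] for k in interface]

    def schur(v):
        # S v = A_gg v - A_gi A_ii^{-1} A_ig v, never forming S explicitly
        w = solve_dense(a_ii, matvec(a_ig, v))
        return [s - t for s, t in zip(matvec(a_gg, v), matvec(a_gi, w))]

    y = solve_dense(a_ii, b_i)
    rhs = [s - t for s, t in zip(b_g, matvec(a_gi, y))]
    x_g = conjugate_gradient(schur, rhs)
    x_i = solve_dense(a_ii, [s - t for s, t in zip(b_i, matvec(a_ig, x_g))])
    x = [0.0] * len(b)
    for k, v in zip(interior, x_i):
        x[k] = v
    for k, v in zip(interface, x_g):
        x[k] = v
    return x
```

In this sketch the interior block is treated as a single dense matrix; in a domain decomposition setting it is block diagonal, one block per subdomain, so the interior solves are independent and can run in parallel.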

`HIPS` can be used as a standalone program that reads a sparse linear system from a file; it also provides an interface to be called from any C, C++ or Fortran code. It handles symmetric, unsymmetric, real or complex matrices. Thus, `HIPS` is a software library that provides several methods to build an efficient preconditioner in almost all situations.

`HIPS` has been publicly available since August 2008 at
http://

`Scotch`(
http://

The initial purpose of `Scotch` was to compute high-quality partitions and static mappings of valuated graphs representing parallel computations and target architectures of arbitrary topologies. The original contribution consisted in developing a “*divide and conquer*” algorithm in which processes are recursively mapped onto processors by using graph bisection algorithms that are applied both to the process graph and to the architecture graph. This allows the mapper to take into account the topology and heterogeneity of the valuated graph which models the interconnection network and its resources (processor speed, link bandwidth). As new multicore, multinode parallel machines tend to be less uniform in terms of memory latency and communication bandwidth, this feature is regaining interest.
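
The recursive structure of this divide and conquer mapping can be sketched in a few lines of Python. This is a deliberately simplified toy, not the `Scotch` algorithm or API: tasks and processors are ordered lists rather than graphs, the "bisection" of the processors is a split in the middle, and the bisection of the tasks ignores communication edges and only balances load in proportion to the speed of each half of the machine. All names are hypothetical.

```python
def split_by_load(tasks, load, fraction):
    """Cut an ordered task list so the first part holds about
    `fraction` of the total load (toy stand-in for graph bisection)."""
    total = sum(load[t] for t in tasks)
    target = fraction * total
    best_cut, best_gap, acc = 0, abs(target), 0.0
    for cut in range(1, len(tasks) + 1):
        acc += load[tasks[cut - 1]]
        if abs(acc - target) < best_gap:
            best_cut, best_gap = cut, abs(acc - target)
    return tasks[:best_cut], tasks[best_cut:]


def dual_recursive_map(tasks, procs, load, speed):
    """Recursively bisect both the task list and the processor list."""
    if not tasks:
        return {}
    if len(procs) == 1:
        return {t: procs[0] for t in tasks}
    half = len(procs) // 2
    procs0, procs1 = procs[:half], procs[half:]
    # Share of the work that the first half of the machine should get.
    fraction = sum(speed[p] for p in procs0) / sum(speed[p] for p in procs)
    tasks0, tasks1 = split_by_load(tasks, load, fraction)
    mapping = dual_recursive_map(tasks0, procs0, load, speed)
    mapping.update(dual_recursive_map(tasks1, procs1, load, speed))
    return mapping
```

With four unit-load tasks and two processors of speeds 1 and 3, the sketch assigns one task to the slow processor and three to the fast one, which is the kind of heterogeneity awareness the paragraph above refers to. A real mapper would also minimize the communication cut at each bisection.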

The software has then been extended in order to produce vertex separators instead of edge separators, using a multilevel framework. Recursive vertex separation is used to compute orderings of the unknowns of large sparse linear systems, which both preserve sparsity when factorizing the matrix and exhibit concurrency for computing and solving the factored matrix in parallel.
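
A minimal Python sketch of an ordering built by recursive vertex separation is given below, assuming the simplest possible graph: a chain of unknowns (the graph of a tridiagonal matrix), where the middle vertex is a valid separator. The function name is hypothetical and the code is unrelated to the `Scotch` sources, which compute separators of general graphs with a multilevel method.

```python
def nested_dissection_chain(lo, hi):
    """Elimination order for unknowns lo..hi-1 of a chain graph.

    The middle vertex separates the chain into two independent halves;
    both halves are ordered first (recursively), the separator last.
    """
    if hi - lo <= 2:
        return list(range(lo, hi))
    mid = (lo + hi) // 2
    return (nested_dissection_chain(lo, mid)
            + nested_dissection_chain(mid + 1, hi)
            + [mid])
```

For seven unknowns this yields the order `[0, 2, 1, 4, 6, 5, 3]`: the two halves `[0, 2, 1]` and `[4, 6, 5]` have no direct coupling and can be eliminated concurrently, and the separator `3` is eliminated last. On a chain the benefit is concurrency only; the sparsity-preserving effect mentioned above appears on 2D and 3D meshes.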

Version `5.0` of `Scotch`, released in August 2007, was the first version to comprise parallel routines. This extension, called `PT-Scotch` (for “*Parallel Threaded* `Scotch`”), is based on a distributed memory model, and makes use of the MPI and, optionally, POSIX thread APIs. A distributed graph structure has been defined, which allows users to reserve vertex indices on each processor for future local adaptive refinement. Its parallel graph ordering routine provides orderings which are of the same quality as the ones yielded by the sequential `Scotch` ordering routine, while the competing software `ParMETIS` experiences a severe loss of quality when the number of processors increases. `Scotch` `5.0` was released under the CeCILL-C free/libre software license, and has been registered at APP (“Agence pour la Protection des Programmes”).

Version `5.1` of `Scotch`, released in September 2008, extended the parallel features of `PT-Scotch`, which can now compute graph partitions in parallel by means of a parallel recursive bipartitioning framework. Release `5.1.10` made `Scotch` the first full 64-bit implementation of a general purpose graph partitioner, so that `PT-Scotch` has been able to successfully break the “32-bit” barrier and partition a graph of more than 2 billion vertices, spread across 2048 processors, at the French CCRT computer center.

Version
`6.0`, about to be released, offers new sequential features: static mapping with fixed vertices, static remapping, and static remapping with fixed vertices.

`Scotch` has been integrated in numerous third-party software packages, which indirectly contribute to its diffusion. For instance, it is used by the `Zoltan` module of the `Trilinos` software (SANDIA Labs), by `Code_Aster Libre`, a GPLed thermal and mechanical analysis software developed by the French state-owned electricity producer EDF, by the parallel solvers `MUMPS` (ENSEEIHT/IRIT, LIP and LaBRI), `SuperLUDist` (U.C. Berkeley), `PaStiX` (LaBRI) and `HIPS` (LaBRI), as well as by several other scientific computing software packages.

`MMG3D` is a fully automatic tetrahedral remesher. Starting from a tetrahedral mesh, it produces quasi-uniform meshes with respect to a metric tensor field. This tensor prescribes a length and a direction for the edges, so that the resulting meshes will be anisotropic. The software is based on local mesh modifications, and an anisotropic version of the Delaunay kernel is implemented to insert vertices in the mesh. Moreover, `MMG3D` allows one to deal with rigid body motion and moving meshes. When a displacement is prescribed on a part of the boundary, a final mesh is generated such that the surface points are moved according to this displacement. `MMG3D` is used in particular in GAMMA for their mesh adaptation developments, but also at EPFL (maths department), Dassault Aviation, Lemma (a French SME), etc. `MMG3D` can be used in `FreeFem++` (
http://

`Montjoie` is a finite element code initially handling only quadrilateral/hexahedral elements. Because of the tensorization of these elements, efficient algorithms can be written for the computation of finite element matrices. It can handle tetrahedra, prisms, pyramids and hexahedra with continuous finite elements, edge elements and discontinuous Galerkin formulations. A local order of approximation can be used in each element of the mesh.

The development of PLATO (A platform for Tokamak simulation) (
http://

A (small) database corresponding to axi-symmetrical solutions of the equilibrium plasma equations for realistic geometrical and magnetic configurations (ToreSupra, JET and ITER). The construction of meshes is always an important and time-consuming task. Plato will provide meshes and solutions corresponding to equilibrium solutions that will be used as initial data for more complex computations.

A set of tools for the handling, manipulation and transformation of meshes and solutions using different discretisations (P1, Q1, P3, etc.).

Numerical templates allowing the use of 3D discretization schemes using finite element schemes in the poloidal plane and spectral Fourier or structured finite volume representations in the toroidal one.

Several applications (Ideal MHD and drift approximation) used in the framework of the Inria large scale initiative “FUSION”.

`PaMPA` (“Parallel Mesh Partitioning and Adaptation”) is a middleware library dedicated to the management of distributed meshes. Its purpose is to relieve solver writers from the tedious and error-prone task of writing again and again service routines for mesh handling, data communication and exchange, remeshing, and data redistribution. It is based on a distributed data structure that represents meshes as a set of *entities* (elements, faces, edges, nodes, etc.), linked by *relations* (that is, computation dependencies).

Version `0.1` allows users to declare a distributed mesh, declare values attached to the entities of the meshes (e.g. temperature attached to elements, pressures to the faces, etc.), exchange values between overlapping entities located at the boundaries of subdomains assigned to different processors, and iterate over the relations of entities (e.g. iterate over the faces of elements).
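
The entity/relation view of a mesh described above can be illustrated by a small, purely sequential Python sketch. It does not reproduce the `PaMPA` API or its distributed data structure: the class `ToyMesh`, its methods and the two-triangle example are hypothetical and only show how values attached to entities can be reached by iterating over relations.

```python
from collections import defaultdict


class ToyMesh:
    """Entities are (kind, index) pairs; relations link entities."""

    def __init__(self):
        self.relations = defaultdict(list)  # entity -> related entities
        self.values = defaultdict(dict)     # value name -> {entity: value}

    def link(self, src, dst):
        self.relations[src].append(dst)

    def set_value(self, name, entity, value):
        self.values[name][entity] = value

    def related(self, entity, kind):
        """Entities of the given kind related to `entity`."""
        return [e for e in self.relations.get(entity, []) if e[0] == kind]


def build_two_triangles():
    """Two triangles sharing face 2, with a pressure on each face."""
    mesh = ToyMesh()
    for elem, faces in {0: (0, 1, 2), 1: (2, 3, 4)}.items():
        for face in faces:
            mesh.link(("element", elem), ("face", face))
            mesh.link(("face", face), ("element", elem))
    for face in range(5):
        mesh.set_value("pressure", ("face", face), 100.0 + face)
    return mesh


def mean_face_pressure(mesh, elem):
    """Iterate over the faces of an element and average their pressure."""
    faces = mesh.related(("element", elem), "face")
    return sum(mesh.values["pressure"][f] for f in faces) / len(faces)
```

In a distributed setting, the shared face would be an overlapping entity whose value has to be exchanged between the processors owning the two triangles; that communication layer is exactly what the sketch leaves out.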

`PaMPA` is already used as the data structure manager for two solvers being developed at INRIA: `Plato` and `Aerosol`.

`PaMPA` will soon interface with `Scotch` for mesh redistribution, and with `MMG3D` to offer parallel remeshing features (in this particular example, for tetrahedral elements).

This year, many developments have been conducted and implemented in the `RealfluiDS` and `SLOWS` software, building on initial ideas from previous work, which have opened up many doors.

First of all, the parallel three-dimensional high order extension of the scheme has finally been validated on several external aerodynamics configurations, including the classical ONERA M6 wing case on a large mesh containing

Meanwhile, the improvement of the treatment of viscous terms has been investigated within the PhD theses of G. Baurin and D. DeSantis. The validation on laminar flows of a classical formulation based on a Petrov-Galerkin approach has shown its limitations. An improved formulation, based on a recovery of the solution gradient, has been proposed and tested. In both the second and third order cases, while showing the improvement in accuracy for steady state laminar flows, the results also show a slow iterative convergence, and a systematic small accuracy drop when the cell Reynolds number is of order one. These issues are currently under investigation, while the current formulation is being enhanced by adding a Spalart-Allmaras turbulence model. Contributions to these activities come from the PhD of Guillaume Baurin, who has implemented the third order version of our methods in a real industrial platform (N3S Natur of SAFRAN, developed by Incka), and from the PhD of Dante DeSantis, who is developing the turbulent implementation in `RealfluiDS` within the EU project IDIHOM.

Meanwhile, we are refining and validating the extension of the schemes to elements using improved approximations based on Bézier and NURBS polynomials. The initial two-dimensional implementation of third and fourth order schemes on curved meshes is now being enhanced by adding a local mesh refinement procedure, and is also currently being extended to three space dimensions. Contributions to this topic come from the PhD of Algiane Froehly.

R. Abgrall has extended the RDS formalism to Lagrangian hydrodynamics. The results are comparable to what can be obtained with more standard methods; a publication is in preparation.

Concerning time-dependent flows, earlier ideas have led to two main lines of development. On the one hand, the unconditionally second-order and stable space-time approach has been further validated and extended to higher orders of accuracy. The main advantage of this technique is its ability to preserve monotonicity unconditionally with respect to the time step. This has interesting applications in shallow water flows, for which the previously developed schemes did preserve the positivity of the water depth, but only with an inefficient implicit procedure constrained by an explicit-type time step restriction.

In parallel, the genuinely explicit formulation has been combined with the positivity-preserving approach to obtain a genuinely explicit positivity-preserving scheme for shallow water simulations. With a time step restriction quite close to that of the scheme it builds on, the proposed approach allows very efficient explicit time stepping, with a tenfold reduction of the computational time for the same accuracy level.

These developments are implemented in the `SLOWS` platform and are thoroughly summarized in a manuscript. We now have at our disposal a range of numerical tools allowing either classical temporal integration based on implicit multistep schemes, or unconditionally stable and positivity-preserving space-time schemes, or a genuinely explicit approach. Current developments aim at extending these tools to arbitrary accuracy, and at developing hybrid implicit/explicit approaches.

In this work, Héloïse Beaugendre, Boniface Nkonga and Christelle Wervaecke proposed a strongly coupled numerical formulation for the Spalart-Allmaras model, in the framework of Stabilized Finite Element Methods. Computations are performed for compressible Newtonian fluids (2D and 3D) on unstructured grids of high aspect ratio. Results are compared with experimental data and also with solutions obtained by different numerical strategies. The additional transport equations for the subscale model are often numerically weakly coupled to the Navier-Stokes equations through operator splitting. Here, these variables are strongly coupled for the transport process within a stabilized finite element formulation. The stabilization tensor is defined so as to reduce mesh dependencies and to remain consistent in the asymptotic limit of highly anisotropic meshes. Indeed, this tensor involves a measure of the local length scale

R. Abgrall and P.M. Congedo have made a detailed comparison between the semi-intrusive method developed last year, more classical non-intrusive polynomial chaos methods, and Monte Carlo results. The effectiveness of this method is illustrated for a modified version of the Kraichnan-Orszag three-mode problem, where a discontinuous pdf is associated with the stochastic variable, and for a nozzle flow with shocks. The results have been analyzed in terms of accuracy and probability measure flexibility. Finally, the importance of the probabilistic reconstruction in the stochastic space is demonstrated on an example where the exact solution is computable, the viscous Burgers equation. These results have been published.
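
To give a feel for the two classical reference approaches mentioned above, the following Python sketch estimates the mean of a function of one standard Gaussian variable in two ways: by Monte Carlo sampling, and by the quadrature used in non-intrusive spectral projection, where the mean is the zeroth polynomial chaos coefficient. This is only a toy under stated assumptions: a single Gaussian input, a three-point Gauss-Hermite rule, and hypothetical function names. It does not implement the semi-intrusive method, nor the discontinuous pdf cases studied in this work.

```python
import math
import random


def monte_carlo_mean(f, n_samples, seed=0):
    """Sample mean of f(X) for X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = sum(f(rng.gauss(0.0, 1.0)) for _ in range(n_samples))
    return total / n_samples


# Three-point Gauss-Hermite rule for the standard normal weight:
# nodes are the roots of He3(x) = x^3 - 3x.
GH3_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH3_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def pc_mean(f):
    """Zeroth polynomial chaos coefficient (the mean) by quadrature."""
    return sum(w * f(x) for x, w in zip(GH3_NODES, GH3_WEIGHTS))
```

For a smooth function such as `math.exp`, whose exact mean is exp(1/2), three deterministic evaluations in `pc_mean` already give an estimate within about one percent, whereas `monte_carlo_mean` needs many thousands of samples for comparable accuracy. That advantage degrades for discontinuous responses, which is one motivation for the comparisons reported above.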

Following these studies, two contributions have been obtained within the context of Gianluca Geraci's thesis. The first one is an adaptive strategy, inspired by the Harten multi-resolution framework, that has been developed in order to compute statistics efficiently. This preliminary work aims to show the potential of this approach, in order to evaluate the possibility of including this strategy in the semi-intrusive method developed in recent years. We obtained well-converged results at a lower computational cost, thanks to a reduction in the number of numerical evaluations.

P.M. Congedo investigated the possibility of performing a stochastic inverse analysis by using a hybrid method within a Polynomial Chaos/Genetic Algorithms framework. This strategy has been applied to the numerical simulation of a dense gas shock-tube. Previous theoretical and numerical studies have shown that a rarefaction shock wave (RSW) is relatively weak and that the prediction of its occurrence and intensity is highly sensitive to uncertainties on the initial flow conditions and on the fluid thermodynamic model. The objective of this work has been to introduce an innovative, flexible and efficient algorithm combining computational fluid dynamics (CFD), uncertainty quantification (UQ) tools and metamodel-based optimization in order to obtain a reliable estimate of the RSW probability of occurrence and to prescribe the experimental accuracy requirements ensuring the reproducibility of the measurements with sufficient confidence.

Uncertainty quantification tools have been used to perform some applied studies on epistemic uncertainties, in particular on some complex equations of state and some turbulence models. We have also started considering the influence of model parameter uncertainties in free surface models for long waves such as tsunamis, coupling the numerics developed in the team for shallow-water flows with the tools available for uncertainty quantification. This is certainly a field of application where these developments will prove very useful.

Within the activities of the associated team AQUARIUS (collaboration with Stanford University), two efficient global strategies for robust optimization have been developed. The first one is based on the extension of simplex stochastic collocation to the optimization space, while the second one consists of a hybrid strategy using ANOVA decomposition. The Simplex Stochastic Collocation (SSC) method has been developed for adaptive uncertainty quantification (UQ) in computational problems with random inputs. In this work, we showed how this formulation, based on simplex space representation, discretization of non-hypercube probability spaces and adaptive refinements, can be easily coupled with a well-known optimization method, namely the Nelder-Mead algorithm, also known as the Downhill Simplex Method. Numerical results showed that this method is very efficient for mono-objective optimization and minimizes the global number of deterministic evaluations needed to determine the optimal design. This method has then been applied to a realistic problem of robust optimization of a two-component race-car airfoil.

Two parallel lines of work on numerical models for advanced materials have seen important developments this year.

On the one hand, Rémi Abgrall and Pierre-Henri Maire (CEA CESTA) are extending to elastodynamics the Lagrangian method developed a couple of years ago and currently implemented in the CHIC code. The stress tensor is no longer diagonal, and here we consider the Wilkins model. The main difficulty is to understand the role of the second principle and how to deal with the von Mises criterion.

In parallel, Mario Ricchiuto and the group led by Gérard Vignoles at LCTS (UMR-5801 LCTS) have been developing a finite element numerical model of the evolution of the liquid oxide during the healing phase taking place in silicon-based composite materials similar to those used in SAFRAN's new aero-engines

Rémi Abgrall and Pierre-Henri Maire (CEA CESTA), with François Vilar (PhD at CELIA funded by a CEA grant, started in October 2009), are working on fully Lagrangian schemes within the Discontinuous Galerkin framework. The idea is to start from the formulation of the Euler equations in full Lagrange coordinates: the spatial derivatives are written in Lagrangian coordinates. The mesh elements are now curved, and we are working on the geometrical conservation law. The application to several standard test cases indicates the potential of the method.

Penalization methods are an efficient alternative to the explicit imposition of boundary conditions, but their accuracy is generally of first order. In this work we combine the ease of use of penalization techniques with the precision of unstructured anisotropic mesh adaptation. Level sets are used to describe the geometry, so that geometrical and topological changes due to physics are straightforward to follow. Navier-Stokes simulations are performed and a new way to impose a slipping wall boundary condition is proposed.

Work on high order mesh generation has been pushed further. Starting with a

Moreover, we have started to perform high order mesh adaptation. That means we are able to refine high order meshes where the error is maximum, and so we generate non-uniform meshes of order

In parallel to these developments, we have started work on a generalized formal approach to obtain discrete adjoint equations for residual-based and Petrov-Galerkin finite element schemes. We have shown that these discrete adjoint equations can now be used as a local error estimator for mesh refinement, giving these methods the same potential for adaptation as Galerkin schemes.

In the `RealfluiDS` code, the Rusanov scheme for Ideal MHD has already shown many times its robustness and its ability to capture discontinuities in 2D problems. But other spatial schemes could be interesting for applications in tokamak experiments, since we may not encounter strong shocks in these cases. Hence, depending on the type of problem, coupled schemes could be used. We have already developed the four well-known base RD schemes: Narrow, LDA, Rusanov and SU (an RD version of the SUPG scheme). Coupling may not be challenging, since a working shock sensor is already implemented for stabilized methods. Very high order of accuracy (at least third order) should be reachable in all cases; the main parts of this work have already been done for several types of elements. The non-dimensionalized equations of resistive MHD (with viscosity and heat transfer) have been added to the code with a Continuous Galerkin discretization. Also, second order implicit and explicit methods were developed in all cases. Once we succeed in ensuring a very good iterative convergence, taking into account the hyperbolic divergence cleaning technique in an unsteady context, we will be able to simulate plasma instabilities. This is really the key issue for now. These results will be presented in the PhD defense of R. Huart, planned for January 2012.

The `JOREK` code is now able to use several hundred processors routinely. Simulations of ELMs are produced taking into account the X-point geometry with both closed and open field lines. But a higher toroidal resolution is required for the resolution of the fine scale filaments that form during the ELM instability. The complexity of the tokamak's geometry and the fine mesh that is required lead to prohibitive memory requirements. In the current release, the memory scaling is not satisfactory: as one increases the number of processes for a given problem size, the memory footprint on each process does not decrease as much as one would expect.

In the context of the new ANR proposal (`ANEMOS` project), we are working to reduce memory consumption in the `JOREK` code. Compression techniques can be foreseen to reduce the footprint of the matrix without having to pay large computation expenses. Moreover, the storage of the factorized preconditioning matrix inside the direct solver also takes a large amount of memory. We have defined and developed a general programming interface for sparse linear solvers (
http://
`RealfluiDS` and `JOREK` for `HIPS` and `PaStiX`. Using this common interface, we are looking for a fair distribution of data over the parallel processes in order to reduce memory consumption. The effective parallelization of this assembly step is one of the main bottlenecks up to now, as far as memory usage is concerned. The GMRES driver is also a large consumer in terms of memory and we plan to consider an up-to-date parallel implementation of this step.

Most of the work within the `Scotch` project has been carried out in the context of the PhD of Sébastien Fourestier.

The first axis concerns dynamic repartitioning and remapping. A new set of sequential routines has been devised, which offers new features: mapping (including plain partitioning) with
fixed vertices, remapping, and remapping with fixed vertices. All of the above developments are about to be released in the major release
`6.0` of
`Scotch`. The parallel port of the remapping algorithms is being carried out, and will be part of release
`6.1`.

Work carried out in the Joint Laboratory for Petascale Computing (JLPC) between INRIA and UIUC resulted in the inclusion of
`Scotch` as a load balancer in the
`Charm++` parallel environment. A jointly written conference paper has been submitted on this subject. Another potential use of the remapping features of
`Scotch` concerns multi-phase mapping. Experiments are being carried out at UIUC regarding the use of
`Scotch` as a multi-phase mapper for the
`OpenAtom` scientific code.

This research topic deals with the design of efficient and scalable software tools for parallel dynamic remeshing. This is a joint work with Cécile Dobrzynski, in the context of the PhD of
Cédric Lachat, funded by a CORDI-S grant managed by the
`PUMAS`team.

`PaMPA` (see Section
) is a middleware library dedicated to the management of distributed meshes.

The software development of
`PaMPA` is ongoing. The internal data structure for representing meshes has been frozen, and developments are in progress. The first developments aimed at proving the efficiency of the
planned API for handling distributed meshes. A simple P1 FEM Laplacian solver has been written on top of
`PaMPA` by the PUMAS team to demonstrate how to iterate over
`PaMPA` entities (elements and nodes) and access values borne by the entities, so as to perform FEM computations. These features are available in version
`0.1`, which has not yet been distributed to other interested parties. Several potential new users, e.g. ONERA, are already willing to try out this version.
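To illustrate the kind of loop such a demonstration solver performs, here is a minimal sketch of a P1 FEM Laplacian in one dimension. It does not use the `PaMPA` API: plain Python lists and dictionaries stand in for the mesh entities and the values they bear, and the function name `solve_p1_laplacian`, the 1D setting, and the model problem (-u'' = 1 with homogeneous Dirichlet conditions) are assumptions chosen only to keep the example self-contained.

```python
def solve_p1_laplacian(num_elements):
    """Solve -u'' = 1 on [0, 1] with u(0) = u(1) = 0 using P1 elements."""
    h = 1.0 / num_elements
    # Mesh entities and the values they bear (hypothetical layout, not PaMPA's).
    nodes = [{"x": i * h} for i in range(num_elements + 1)]
    elements = [{"nodes": (e, e + 1)} for e in range(num_elements)]

    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n

    # Iterate over elements, then over the nodes attached to each element.
    for element in elements:
        a, b = element["nodes"]
        length = nodes[b]["x"] - nodes[a]["x"]
        local = ((1.0 / length, -1.0 / length),
                 (-1.0 / length, 1.0 / length))
        for p, i in enumerate((a, b)):
            rhs[i] += 0.5 * length  # integral of f = 1 against a hat function
            for q, j in enumerate((a, b)):
                matrix[i][j] += local[p][q]

    # Homogeneous Dirichlet conditions on the two boundary nodes.
    for i in (0, n - 1):
        matrix[i] = [0.0] * n
        matrix[i][i] = 1.0
        rhs[i] = 0.0

    # Dense Gaussian elimination without pivoting (adequate for this small system).
    for k in range(n):
        for i in range(k + 1, n):
            factor = matrix[i][k] / matrix[k][k]
            if factor != 0.0:
                for j in range(k, n):
                    matrix[i][j] -= factor * matrix[k][j]
                rhs[i] -= factor * rhs[k]

    # Back substitution.
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(matrix[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (rhs[i] - tail) / matrix[i][i]
    return [node["x"] for node in nodes], u
```

Calling `solve_p1_laplacian(4)` returns the node coordinates and the nodal values of the discrete solution, which can be compared with the exact solution u(x) = x(1 - x)/2. In a distributed setting, the element loop would run over the locally owned entities only.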

`PaMPA` is already used as the data structure manager for two solvers being developed at INRIA: the
`Plato` solver being developed by the PUMAS team, and the
`Aerosol` new-generation fluid dynamics solver being developed in the context of the PhD of Damien Genêt. The interaction with these users allows us to refine the interface to match
their needs.

This work now focuses on the core of the PhD of Cédric Lachat: interfacing
`PaMPA` with
`MMG3D` to demonstrate the ability of
`PaMPA` to perform parallel mesh adaptation.

New supercomputers incorporate many microprocessors, each of which includes one or more computational cores. These new architectures induce strongly hierarchical topologies. These are
called NUMA architectures. In the context of distributed NUMA architectures, in collaboration with the INRIA RUNTIME team, we study optimization strategies to improve the scheduling of
communications, threads and I/O. Sparse direct solvers are a basic building block of many numerical simulation algorithms. We have developed dynamic scheduling designed for NUMA architectures
in the
`PaStiX` solver. The data structures of the solver, as well as the communication patterns, have been modified to meet the needs of these architectures and of dynamic scheduling. We are
also interested in dynamically adapting the computation grain to make efficient use of multi-core architectures and shared memory. Experiments on several numerical test cases have been
performed to demonstrate the efficiency of the approach on different architectures.

In collaboration with the ICL team from the University of Tennessee, and the RUNTIME team from INRIA, we are evaluating how to replace the scheduling driver of the
`PaStiX` solver with one of the generic frameworks, DAGuE (see
http://

In
`HIPS`, we propose several algorithmic variants to solve the Schur complement system that can be adapted to the geometry of the problem: typically some strategies are more suitable for
systems coming from a 2D problem discretisation and others for a 3D problem; the choice of the method also depends on the numerical difficulty of the problem. We have a parallel version of
`HIPS` that provides fully iterative methods as well as hybrid methods that mix a direct factorization inside the domain and an iterative method on the Schur complement.

Graph and mesh partitioners are now able to deal with problems that have several billion unknowns. Solving linear systems is clearly the limiting step in reaching this scale in numerical simulations. During her PhD, Astrid Casadei will propose solutions for an efficient algorithmic coupling of direct and iterative methods that allows a powerful management of all the levels of parallelism. As a preliminary study, we focus on the memory issues that arise when building a Schur complement in our direct solver. During the factorization step, memory overhead may occur for two reasons. The first is due to the fan-in approach, that is to say the local storage of non-local contributions. The second is due to the coupling matrices (between the direct part and the Schur complement), which remain allocated during the whole computation and are freed only at the end. Our first idea to reduce memory consumption was to postpone the allocation of each block; thanks to a right-looking algorithm, a column-block can then be freed as soon as it has been treated. However, many blocks may be allocated very quickly, and a solution is to use a left-looking scheme when dealing with local contributions. Thus, we introduce a mixed version: a right-looking algorithm is used, except for local contributions in the direct part, where a left-looking scheme is applied. Some experiments have been performed, and first results show that substantial memory reductions can be achieved.
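The difference between the two update orders can be sketched on a dense Cholesky factorization. This is only an illustration under simplifying assumptions: the solver discussed above works on sparse column-blocks with fan-in communication, whereas the hypothetical functions below operate on scalar entries of a small dense symmetric positive definite matrix and do not model allocation. They show when updates are applied: the right-looking variant pushes the contributions of column k to all later columns as soon as k is finished (so the targets must exist early), while the left-looking variant gathers all contributions into column k only when k itself is processed.

```python
import math


def cholesky_right_looking(a):
    """Right-looking: finish column k, then update every later column at once."""
    n = len(a)
    work = [row[:] for row in a]
    for k in range(n):
        work[k][k] = math.sqrt(work[k][k])
        for i in range(k + 1, n):
            work[i][k] /= work[k][k]
        # Eager scatter of the contributions of column k (lower triangle only).
        for j in range(k + 1, n):
            for i in range(j, n):
                work[i][j] -= work[i][k] * work[j][k]
    # Keep only the lower triangular factor.
    return [[work[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def cholesky_left_looking(a):
    """Left-looking: column k is touched only when it is its turn."""
    n = len(a)
    factor = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Column k is "allocated" here, just before it is needed.
        column = [a[i][k] for i in range(n)]
        # Lazy gather of the contributions of all previous columns.
        for j in range(k):
            for i in range(k, n):
                column[i] -= factor[i][j] * factor[k][j]
        factor[k][k] = math.sqrt(column[k])
        for i in range(k + 1, n):
            factor[i][k] = column[i] / factor[k][k]
    return factor
```

Both functions return the same lower triangular factor; only the order in which entries are written differs, which is what drives the memory behaviour described above.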

**Dates:** 2008-2011

Transfer and development of the Residual Distribution schemes in the N3S Natur code (in collaboration with INCKA).

**Dates:** 2008-2011

Study and validation of very high order SUPG schemes for turbulent flow with the Spalart-Allmaras turbulence model in AETHER. The SA model and the Navier-Stokes equations are only weakly coupled.

**Grant:** ANR-11-MN

**Dates:** 2011 – 2015

**Partners:** CEA IRFM, JAD INRIA, Maison de la Simulation.

**Overview:**

The main goal of the project is to make significant progress in understanding the physics, largely unknown at present, of active control methods for plasma edge MHD instabilities, the Edge
Localized Modes (ELMs), which represent a particular danger with respect to heat and particle loads on Plasma Facing Components (PFC) in ITER. The project focuses in particular on the numerical
modeling of ELM control methods such as Resonant Magnetic Perturbations (RMPs) and pellet ELM pacing, both foreseen in ITER. The goals of the project are to improve understanding of the
related physics and to propose possible new strategies to improve the effectiveness of ELM control techniques. The tool for non-linear MHD modeling (the
`JOREK` code) will be extensively developed within the present project to include the corresponding new physical models, in conjunction with new developments in mathematics and computer science,
in order to make progress on the solutions urgently needed for ITER.

This proposal is the logical, but even more challenging, continuation of the previous ANR project ASTER (2006-2010). This work is part of the large-scale initiative supported by INRIA on magnetic fusion and is also included in a new LABEX VENUS proposal submitted in October 2011.


**Grant:** ANR Cosinus 2010

**Dates:** 2011–2012

**Partners:** INRIA Saclay-Ile de France (leader of the project), Paris 6, IFP (Rueil-Malmaison), CEA Saclay.

**Overview:** In this collaborative effort, we propose to develop parallel preconditioning techniques for the emerging hierarchical models of clusters of multi-core processors, as used for
example in future petascale machines. The preconditioning techniques are based on recent progress obtained in combining the well-known incomplete LU (ILU) factorization with tangential
filtering.

The track we are following to contribute to this goal is to investigate improved graph ordering techniques that favor the diagonal dominance of the matrices corresponding to the subdomains of the Schur complement. This amounts to integrating numerical values into the adjacency graph of the matrices, so that the importance of off-diagonal terms is taken into account when computing graph separators. The core of this work is planned to take place at the beginning of next year.
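As a rough illustration of what integrating numerical values into the adjacency graph can mean, the sketch below builds a symmetric graph from the sparsity pattern of a matrix and attaches a weight to each edge. The weighting rule used here, |a_ij| + |a_ji|, and the function name `weighted_adjacency` are assumptions made for the example; the text above does not specify the rule the project intends to use.

```python
def weighted_adjacency(entries):
    """Build a symmetric weighted graph from sparse matrix entries.

    `entries` maps (row, column) pairs to nonzero values. The result maps
    each vertex to a dictionary {neighbour: weight}, where the weight of
    edge {i, j} is |a_ij| + |a_ji| (an assumed rule, for illustration).
    """
    graph = {}
    for (i, j), value in entries.items():
        graph.setdefault(i, {})
        graph.setdefault(j, {})
        if i == j:
            continue  # diagonal terms create no edge
        weight = abs(value)
        graph[i][j] = graph[i].get(j, 0.0) + weight
        graph[j][i] = graph[j].get(i, 0.0) + weight
    return graph
```

With such weights, a partitioner that minimizes the weight of cut edges would tend to cut weak couplings and keep strongly coupled unknowns in the same subdomain.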

This project is a continuation of the PETAL project, which was funded by the ANR Cosinus 2008 call.

**Grant:** ANR MN 2011

**Dates:** 2011-2014

**Partners:** INRIA Bordeaux Sud-Ouest (leader), ENSAM Paris Tech, INPG, ONERA, Phimeca.

**Overview:** We are interested in the simulation and optimisation of flows with uncertainties on the data and/or the models. Only non-intrusive methods are considered in this project, in
order to easily re-use existing CFD codes, in particular those of the project members. We concentrate on the uncertainties occurring in turbulence models for external aerodynamics and those occurring
in thermodynamic models for organic fluids such as those used in ORC machines. The number of uncertainties can be arbitrary, so we aim at developing methods that can handle as many uncertainties
as possible, relying on good algorithms and massive parallelisation. Another aim is to be able to use experimental data to calibrate pdfs via Bayesian techniques. Epistemic uncertainty
in turbulence modeling is also an important topic for the project, for which a theoretical framework needs to be established.
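To make the term concrete: a method is non-intrusive when it only calls the existing solver as a black box on sampled inputs and post-processes the outputs. The sketch below uses plain Monte Carlo sampling, which is one of the simplest such approaches and is not necessarily the one developed in the project. The function `black_box_solver` is a hypothetical stand-in for a CFD code, and all names are assumptions made for the example.

```python
import random
import statistics


def black_box_solver(x):
    """Hypothetical stand-in for an existing CFD code: one input, one output."""
    return 2.0 * x + 1.0


def propagate_uncertainty(solver, sample_input, num_samples, seed=0):
    """Estimate the mean and variance of the solver output by Monte Carlo.

    The solver is never modified: it is only evaluated on inputs drawn
    from the supplied sampling function, which may follow any pdf.
    """
    rng = random.Random(seed)
    outputs = [solver(sample_input(rng)) for _ in range(num_samples)]
    return statistics.mean(outputs), statistics.variance(outputs)


def standard_normal(rng):
    """Example input pdf: a standard normal random variable."""
    return rng.gauss(0.0, 1.0)
```

A call such as `propagate_uncertainty(black_box_solver, standard_normal, 20000)` returns estimates of the output mean and variance. Since the solver evaluations are independent, they can be distributed over many processes, which is where massive parallelisation enters.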

Title: Industrialisation of High-Order Methods - A Top-Down Approach

Type: COOPERATION (TRANSPORTS)

Instrument: Specific Targeted Research Project (STREP)

Duration: October 2010 - September 2013

Coordinator: Deutsches Zentrum für Luft- und Raumfahrt (Germany)

Other partners: Dassault Aviation (France), EADS Deutschland GmbH, Cassidian Air System (Germany), CENAERO (Belgium), NUMECA (Belgium), ARA (UK), Swedish Defence Research Agency (Sweden), NLR (Netherlands), ONERA (France), TSAGI (Russia), VKI (Belgium), ENSAM (France), Imperial College London (UK), University of Bergamo (Italy), University of Brescia (Italy), University of Stuttgart (Germany), Poznan University of Technology (Poland), Warsaw University of Technology (Poland), Linköping University (Sweden), Université catholique de Louvain (Belgium).

See also:
http://

Abstract: The IDIHOM project is motivated by the increasing demand of the European aerospace industry to advance its CFD-aided design procedures and analysis by using accurate and fast numerical methods, so-called high-order methods. They will be assessed and improved in a top-down approach by utilizing industrially relevant complex test cases, so-called application challenges, in the general area of turbulent steady and unsteady aerodynamic flows, covering external and internal aerodynamics as well as aeroelastic and aeroacoustic applications.

Program: IDEAS program, European Research Council

Project acronym:
`ADDECCO`

Project title: ADaptive schemes for DEterministic and stoChastiC Flow PrOblems

Duration: December 2008-November 2013

Coordinator: Rémi Abgrall

Other partners: none

Abstract: The numerical simulation of complex compressible flow problems is still a challenge nowadays, even for the simplest physical models such as the Euler and Navier-Stokes equations for perfect gases. Researchers in scientific computing need to understand how to obtain efficient, stable, very accurate schemes on complex 3D geometries that are easy to code and to maintain, with good scalability on massively parallel machines. Many people work on these topics, but our opinion is that new challenges have to be tackled in order to combine the outcomes of several branches of scientific computing to get simpler algorithms of better quality without sacrificing their efficiency properties. In this proposal, we will tackle several hard points that must be overcome for this program to succeed.

We first consider the problem of how to design methods that can easily handle mesh refinement, in particular near the boundary, where the most interesting engineering quantities have to be evaluated. CAD tools are used to describe the geometry, then a mesh is generated, which is itself used by a numerical scheme. Hence, any mesh refinement process is not directly connected with the CAD. This situation prevents the spread of mesh adaptation techniques in industry, and we propose a method to overcome this even for steep problems.

Second, we consider the problem of handling the extremely complex patterns that occur in a flow because of boundary layers: it is not always sufficient to only increase the number of degrees of freedom or the formal accuracy of the scheme. We propose to overcome this with a class of very high order numerical schemes that can utilise solution-dependent basis functions.

Our third item is about handling unsteady uncertainties in the model, for example in the geometry or the boundary conditions. This needs to be done efficiently: the amount of computation increases a priori linearly with the number of uncertain parameters. We propose a non-intrusive method that is able to deal with general probability density functions (pdfs), and also able to handle pdfs that may evolve during the simulation, for example via a stochastic optimization algorithm. This will be combined with the first two items of this proposal. Since many random variables may be needed, the curse of dimensionality will be dealt with by means of multiresolution methods combined with sparse grid methods.

The aim of this proposal is to design, develop and evaluate solutions to each of these challenges. Currently, and to our knowledge, none of these problems has been dealt with for compressible flows with steep patterns, as in many modern industrial aerodynamics problems. We propose a work program that will lead to significant breakthroughs for flow simulations with a clear impact on numerical schemes and industrial applications. Our solutions, though developed and evaluated on flow problems, have a wider potential and could be considered for any physical problem that is essentially hyperbolic.

Partner 1: von Karman Institute (Belgium)

Topic: Uncertainty quantification for hypersonic flows

Partner 2: von Karman Institute (Belgium)

Topic: Numerical approximation of compressible flows with residual distribution schemes

Partner 3: School of Computing, Leeds University (England)

Topic: Numerical approximation of free surface flows with residual distribution schemes

Partner 4: ONERA (France)

Topic: Numerical approximation of compressible flows with residual distribution schemes

Title: Uncertainty quantification and numerical simulation of high Reynolds number flows

INRIA principal investigator: Pietro Marco Congedo

International Partner:

Institution: Stanford University (United States)

Laboratory: Department of Mechanical Engineering

Researcher: Gianluca Iaccarino

International Partner:

Institution: Stanford University (United States)

Laboratory: Department of Aeronautics and Astronautics

Researcher: Charbel Farhat

Duration: 2011 - 2013

See also:
http://

This research project deals with uncertainty quantification and numerical simulation of high Reynolds number flows. It is a challenging study that demands accurate and efficient numerical methods. It involves the INRIA team BACCHUS and the groups of Prof. Charbel Farhat from the Department of Aeronautics and Astronautics and Prof. G. Iaccarino from the Department of Mechanical Engineering at Stanford University. The first topic concerns the simulation of flows when only partial information about the physics or the simulation conditions (initial conditions, boundary conditions) is available. In particular, we are interested in developing methods to be used in complex flows where the uncertainties, represented as random variables, can have arbitrary probability density functions. The second topic focuses on the accurate and efficient simulation of high Reynolds number flows. Two different approaches are developed (one relying on the XFEM technology, and one on the Discontinuous Enrichment Method (DEM), with the coupling based on Lagrange multipliers). The purpose of the proposed project is twofold: i) to conduct a critical comparison of the approaches of the two groups (Stanford and INRIA) on each topic in order to create a synergy which will lead to improving the status of our individual research efforts in these areas; ii) to apply improved methods to realistic problems in high Reynolds number flow.

Within this project, several visits have taken place: Pietro Marco Congedo at Stanford in May 2011, Arnaud Krust in Vancouver in June 2011, Per Pettersson (PhD student at Stanford) in Bordeaux at INRIA in July 2011, Gianluca Geraci at Stanford in November 2011, Arnaud Krust at Stanford in November 2011, and Catherine Gorle (post-doctoral scientist at Stanford) in Bordeaux at INRIA in November 2011.

In the context of the MORSE associate team (Matrices Over Runtime Systems at Exascale, see
http://

Jianxian Qiu (Xiamen University, China), one week in November

Elena Vazquez-Cendon (Universidade de Santiago de Compostela, Spain), one week in November

François Morency (École de Technologie Supérieure, Montréal, Canada), from June to October

Birte Schmidtman (Kaiserslautern, Germany), from March to July 2011

Per Pettersson (PhD student) in Bordeaux at INRIA in July 2011 (2 weeks), funding: AQUARIUS team

Catherine Gorle (post-doctoral scientist at Stanford) in Bordeaux at INRIA in November 2011 (one week), funding: AQUARIUS team

Rémi Abgrall is associate editor of the international journals “Mathematics of Computation”, “Computers and Fluids”, “Journal of Computational Physics”, “Journal of Scientific Computing” and “Journal of Computing Science and Mathematics”. He is co-editor in chief of the “International Journal for Numerical Methods in Fluids”. He is a member of the scientific committee of the international conference ICCFD. He is a member of the CFD committee of ECCOMAS and of the scientific committee of ECCOMAS 2012. He is also a member of the scientific committee of CERFACS. He is a member of the GP1 group of Allistène. He is a member of the Comité National du CNRS, section 01. He is a member of the board of the GAMNI group of SMAI and currently heads it. He is a member of the board of Institut Polytechnique de Bordeaux.

Pierre Ramet and Rémi Abgrall are members of the GENCI scientific committee (Mathematics and Computer Sciences). R. Abgrall also belongs to the Fluid Mechanics committee.

François Pellegrini and Pierre Ramet have been members of the “commission consultative” for the LaBRI in 2010.

Pierre Ramet was on the decision board of the "MCIA" project (
*Mésocentre Aquitain : un environnement Mutualisé de Calcul Intensif en Aquitaine*).

Cécile Dobrzynski is one of the organizers of the seminar "Modélisation et Calcul" of the Institut de Mathématiques de Bordeaux. She is a member of the board of the GAMNI group of SMAI, of which she is the secretary. She is a member of the scientific committee for the organization of mini-symposia in collaboration between SMAI-GAMNI and AUM for CANUM 2012.

Licence : Algorithmique et Programmation, 32h, L3, ENSEIRB-MATMECA, FRANCE

Licence : Algorithmique et programmation pour le calcul scientifique, 58.66h, L3, Université Bordeaux 1, FRANCE

Licence : Calcul scientifique: résolution de grands systèmes creux, 34.66h, L3, Université Bordeaux 1, FRANCE

Licence: Algorithmique Numérique, 32h, L3, ENSEIRB-MATMECA, FRANCE

Licence : Langages en Fortran 90, 42.67h, L3, ENSEIRB-MATMECA, FRANCE

Licence : TER, 20h, L3, ENSEIRB-MATMECA, FRANCE

Master : Approximation numérique et problème industriel, 26h, M2, ENSEIRB-MATMECA, FRANCE

Master : Projet fin étude, 6h, M2, ENSEIRB-MATMECA, FRANCE

Master : TER, 18h, M1, ENSEIRB-MATMECA, FRANCE

Master: TER, 8h, M2, ENSEIRB-MATMECA, FRANCE

Master: Equilibrage et régulation de charge, 17.33h, M2, ENSEIRB-MATMECA, FRANCE

Master: Mise à niveau: Algorithmique et Programmation, 30h, M1, FRANCE

Master: Approximation numérique et problème industriel, 26h, M1, ENSEIRB-MATMECA, FRANCE

Master: Projet fin étude, 10h, M2, FRANCE

Master: TER, 8h, M2, ENSEIRB-MATMECA, FRANCE

Master: Projets Fluent, 20h, M1, ENSEIRB-MATMECA, FRANCE

Master: Calcul Haute Performance, 45h, M1, ENSEIRB-MATMECA, FRANCE

Master: Calcul Haute Performance, 36h, M2, ENSEIRB-MATMECA, FRANCE

Master : Approximation numérique et problème industriel, 26h, M2, ENSEIRB-MATMECA, FRANCE

Master : Analyse Numérique, 24h, M1, ENSEIRB-MATMECA, FRANCE

Post-graduate : Introduction to CFD, 18h, post-graduate master (Master IAS), ENSAM, FRANCE

HdR : Mario Ricchiuto, Contributions to the development of residual discretizations for hyperbolic conservation laws with application to shallow water flows, Université de Bordeaux, December 12, 2011

PhD : Pierre-Elie Normand, Application de méthodes d'ordre élevé en éléments finis pour l'Aérodynamique, December 14, 2011.

PhD in progress : Robin Huart, Simulation numérique d'écoulements magnétohydrodynamique par des schémas distribuant le résidu, adviser: R. Abgrall. Defence in January 2012.