Team Ecuador studies Algorithmic Differentiation (AD) of computer programs, blending :

We aim to produce AD code that can compete with hand-written sensitivity and adjoint programs used in the industry. We implement our algorithms into the tool Tapenade, one of the most popular AD tools at present.

Our research directions :

Algorithmic Differentiation (AD) differentiates
programs. The input of AD is
a source program

Any execution of

where each

which can be mechanically written as a sequence of instructions

The above computation of

Sensitivities, defined for a given direction

This expression is easily computed from right to left, interleaved with the original
program instructions. This is the tangent mode of AD.

Adjoints, defined after transposition (

This expression is most efficiently computed from right to left,
because matrixadjoint mode of AD, most effective for
optimization, data assimilation 34,
adjoint problems 27, or inverse problems.

Adjoint AD builds a very efficient program 29,
which computes the gradient in a time independent from the number of parameters tangent mode
would require running the tangent differentiated program

However, the inverse of their computation order. If the
original program overwrites a part of

Another research issue is to make the AD model cope with the constant evolution of modern language constructs. From the old days of Fortran77, novelties include pointers and dynamic allocation, modularity, structured data types, objects, vectorial notation and parallel programming. We keep developing our models and tools to handle these new constructs.

The most obvious example of a program transformation tool is certainly a compiler. Other examples are program translators, that go from one language or formalism to another, or optimizers, that transform a program to make it run better. AD is just one such transformation. These tools share the technological basis that lets them implement the sophisticated analyses 19 required. In particular there are common mathematical models to specify these analyses and analyze their properties.

An important principle is abstraction: the core of a compiler
should not bother about syntactic details of the compiled program.
The optimization and code generation phases must be independent
from the particular input programming language. This is generally achieved
using language-specific front-ends, language-independent middle-ends,
and target-specific back-ends.
In the middle-end, analysis can concentrate on the semantics
of a reduced set of constructs. This analysis operates
on an abstract representation of programs made of one
call graph, whose nodes are themselves flow graphs whose
nodes (basic blocks) contain abstract syntax trees for the individual
atomic instructions.
To each level are attached symbol tables, nested to capture scoping.

Static program analysis can be defined on this internal representation,
which is largely language independent. The simplest analyses on trees can be
specified with inference rules 22, 30, 20.
But many data-flow analyses are more complex, and better defined on graphs than on trees.
Since both call graphs and flow graphs may be cyclic, these global analyses will be solved iteratively.
Abstract Interpretation 23 is a theoretical framework to
study complexity and termination of these analyses.

Data flow analyses must be carefully designed to avoid or control
combinatorial explosion. At the call graph level, they can run bottom-up or top-down,
and they yield more accurate results when they take into account the different
call sites of each procedure, which is called context sensitivity.
At the flow graph level, they can run forwards or backwards, and
yield more accurate results when they take into account only the possible
execution flows resulting from possible control, which is called flow sensitivity.

Even then, data flow analyses are limited, because they are static and thus have very
little knowledge of actual run-time values. Far before reaching the very theoretical limit of
undecidability, one reaches practical limitations to how much information one can infer
from programs that use arrays 38, 24 or pointers.
Therefore, conservative over-approximations must be made, leading to
derivative code less efficient than ideal.

Scientific Computing provides reliable simulations
of complex systems. For example it is possible to simulate
the steady or unsteady 3D air flow around a plane that captures the physical phenomena
of shocks and turbulence. Next comes optimization,
one degree higher in complexity because it repeatedly simulates and
applies gradient-based optimization steps until an optimum is reached.
The next sophistication is robustness, that detects undesirable solutions which,
although maybe optimal, are very sensitive to uncertainty on design parameters or
on manufacturing tolerances. This makes second derivatives come into play.
Similarly Uncertainty Quantification can use second derivatives to evaluate how uncertainty on
the simulation inputs imply uncertainty on its outputs.

To obtain this gradient and possibly higher derivatives,
we advocate adjoint AD (cf. 3.1)
of the program that discretizes and solves the direct system.
This gives the exact gradient of the discrete function
computed by the program, which is quicker and more sound than differentiating
the original mathematical equations 27.
Theoretical results 26 guarantee convergence
of these derivatives when the direct program converges.
This approach is highly mechanizable. However, it requires
careful study and special developments of the AD model 31, 36
to master possibly heavy memory usage.
Among these additional developments, we promote in particular
specialized AD models for Fixed-Point iterations 28, 21,
efficient adjoints for linear algebra operators such as solvers, or exploitation
of parallel properties of the adjoint code.

Algorithmic Differentiation of programs gives sensitivities or gradients, useful for instance for :

A CFD program computes the flow around a shape, starting from a number of inputs that define the shape and other parameters. On this flow one can define optimization criteria e.g. the lift of an aircraft. To optimize a criterion by a gradient descent, one needs the gradient of the criterion with respect to all inputs, and possibly additional gradients when there are constraints. Adjoint AD is the most efficient way to compute these gradients.

Inverse problems aim at estimating the value of hidden parameters from other measurable values, that depend on the hidden parameters through a system of equations. For example, the hidden parameter might be the shape of the ocean floor, and the measurable values of the altitude and velocities of the surface. Figure 1 shows an example of an inverse problem using the glaciology code ALIF (a pure C version of ISSM 33) and its AD-adjoint produced by Tapenade.

One particular case of inverse problems is data assimilation 34
in weather forecasting or in oceanography.
The quality of the initial state of the simulation conditions the quality of the
prediction. But this initial state is not well known. Only some
measurements at arbitrary places and times are available.
A good initial state is found by solving a least squares problem
between the measurements and a guessed initial state which itself must verify the
equations of meteorology. This boils down to solving an adjoint problem,
which can be done though AD 37.
The special case of 4Dvar data assimilation is particularly challenging.
The 4th dimension in “4D” is time, as available measurements are distributed
over a given assimilation period. Therefore the least squares mechanism must be
applied to a simulation over time that follows the time evolution model.
This process gives a much better estimation of the initial state, because
both position and time of measurements are taken into account.
On the other hand, the adjoint problem involved is more complex,
because it must run (backwards) over many time steps.
This demanding application of AD justifies our efforts in
reducing the runtime and memory costs of AD adjoint codes.

Simulating a complex system often requires solving a system of Partial Differential Equations.
This can be too expensive, in particular for real-time simulations.
When one wants to simulate the reaction of this complex system to small perturbations around a fixed
set of parameters, there is an efficient approximation: just suppose that the system
is linear in a small neighborhood of the current set of parameters. The reaction of the system
is thus approximated by a simple product of the variation of the parameters with the
Jacobian matrix of the system. This Jacobian matrix can be obtained by AD.
This is especially cheap when the Jacobian matrix is sparse.
The simulation can be improved further by introducing higher-order derivatives, such as Taylor
expansions, which can also be computed through AD.
The result is often called a reduced model.

Some approximation errors can be expressed by an adjoint state. Mesh adaptation can benefit from this. The classical optimization step can give an optimization direction not only for the control parameters, but also for the approximation parameters, and in particular the mesh geometry. The ultimate goal is to obtain optimal control parameters up to a precision prescribed in advance.

Our research has an impact on environmental research: in Earth sciences, our gradients are used in inverse problems, to determine key properties in oceanography, glaciology, or climate models. For instance they determine basal friction coefficients of glaciers that are necessary to simulate their future evolution. Another example is to locate sources and sinks of CO2 by coupling atmospheric models and remote measurements.

Tapenade implements the results of our research about models and static analyses for AD. Tapenade can be downloaded and installed on most architectures. Alternatively, it can be used as a web server. Higher-order derivatives can be obtained through repeated application.

Tapenade performs sophisticated data-flow analysis, flow-sensitive and context-sensitive, on the complete source program to produce an efficient differentiated code. Analyses include Type-Checking, Read-Write analysis, and Pointer analysis. AD-specific analyses include the so-called Activity analysis, Adjoint Liveness analysis, and TBR analysis.

Data-Flow inversion is at the heart of the computation of gradients through AD. This amounts to providing (many of) the intermediate values computed at run-time, in reverse order. Most approaches use a stack, whose size may grow dramatically. Earth science applications such as climatology run on meshes of several million cells on thousands of time steps, yielding challenging stack sizes. Even using classical storage/recomputation tradeoffs, stack size may exceed the available main memory.

Most tools deal with this in an ad-hoc manner, identifying locations that consume a large part of the stack size, typically the “snapshots” taken before selected time steps. These snapshots are then stored in a specific way, i.e. directly on file. The way Tapenade's stack is organized allows for a more general and transparent approach. All storage needed by data-flow inversion goes to a unique stack. Without any intervention at the application level, this stack internally detects that its size becomes excessive, which triggers offloading of the deepest part of the stack to distant memory. Conversely, the stack is restored from the distant memory when needed. Offloading and restoring are triggered by the status of the stack, and are implemented in an asynchronous manner, allowing for time overlap between offloading/restoring on one hand, and derivative computation with elementary stack push/pop on the other hand.

We implemented an experimental version of this mechanism and started to use it on the adjoint of Dan Goldberg's “halfpipe-streamice” test case in the MIT GCM (Global Circulation Model). The much larger stack size permitted lets us experiment with different checkpointing patterns, with the effect of a significant speedup. More experiments are needed to validate this promising improvement.

In parallel, Jan Hueckelheim is looking at this asynchronous stack, written in C using POSIX pthreads and mutex'es, to experiment a code analysis tool to detect potential conflicts and data races in multithreaded C codes.

AD suffers from numerous limitations. Some limitations come from the constraints of application languages, or from coding styles adverse to AD or at least that make it inefficient. In this category of programming-related limitations, we have studied the question of data-flow reversal in languages with Garbage Collection (GC). Data-flow reversal means in particular restoring structures that have been deallocated. This is hard in presence of addresses, pointers, and even C pointer arithmetic. In a language with GC, how can we detect and undo a deallocation when it is done in a hidden manner and asynchronously? We proposed an extension of the AD model that deals with GC, we validated it on a simple Navier-Stokes solver application, and measured its performance. The article 13 describing this model and experiment has been published this year by ACM TOMS journal.

Other limitations are of a more abstract nature, and may come for instance from the mathematical equations underlying the code, or from the fundamental difference between computer's float numbers and real numbers. Categorizing these limitations is difficult, and a (very long) catalogue may prove useless. Jan Hueckelheim considered the question of what one can reasonably expect from AD. The assumption is that problematic cases are often not about AD producing wrong derivatives, but rather meaningful derivatives, only for a model different from what the user has in mind. Accurate knowledge of the context in which AD derivatives are meaningful will help users build a realistic expectation for AD tools. A joint article “Understanding Automatic Differentiation Pitfalls” is in preparation. A poster has been presented at the ICML 2023 workshop “Differentiable Almost Everything” in June.

We support users of Algorithmic Differentiation of large codes, mostly on CFD and Earth science.

In CFD, we continue support for the use of Tapenade to provide gradient computations in the Onera Platform “SoNice”. SoNice is a new CFD platform developed at ONERA by Bourasseau, Maugars, Content and colleagues, as a successor to Elsa. In the modular architecture of SoNice, Tapenade is used to build the differentiated counterpart of each module or operator. The implementation of each module is devised to be platform-independent, eventually targetting several multithreaded architectures through e.g. OpenMP or Cuda. This imposes technical constraints on the language differentiated by Tapenade, at various abstraction levels that range from AD models for multithreaded code, down to ways of handling fpp and cpp directives and automatically setting entry points and labels in a differentiated code to allow for some necessary post-processing.

We also continued support for Tapenade differentiation of the plasma Magneto-Hydro-Dynamic code SOLPS-ITER with Stefano Carli of KU Leuwen. A joint article 11 that describes this work was accepted this year for JCP.

In Earth science, this year main application is addressed by Shreyas Gaikwad, University of Texas at Austin, PhD student supervised by Patrick Heimbach, with support from Krishna Narayanan from Argonne. The goal is to produce the adjoint/gradient of all applications based on the MIT GCM code. These applications range from atmospheric to oceanic and glaciology. Last year's successful tangent and adjoint differentiation of MIT GCM was extended to several new validation test cases, and this year's work was to improve performance of the resulting differentiated code by carefully using the code optimization options of Tapenade. Still on MIT GCM, Dan Goldberg, University of Edinburgh, applies Tapenade to the glaciology test case “halfpipe-streamice”. Successful adjoint differentiation required improvements and extensions to Tapenade's handling of fixed-point loops. We are now working on performance improvement (in memory usage and CPU time) by exploring the available options for “checkpointing”, a classical storage/recomputation tradeoff of AD. This will also serve as an application for our new asynchronous stack save/restore mechanism (see section 7.1).

An article 12 was published in JOSS (Journal of Open Source Software) describing last year's application of Tapenade to SICOPOLIS, another similar glaciology code.

Still on Earth science, we support adjoint differentiation of the BEPS 32 code, a “biosphere” simulation model taking vegetation into account. This code is developed in part by the small company The Inversion Lab in Hamburg, Germany. We collaborate with Michael Vossbeck, engineer with The Inversion Lab, to provide the adjoint of BEPS by Tapenade. Differentiation with Tapenade is becoming a strategic part of the business of The Inversion Lab. Michael Vossbeck is progressively getting acquainted with the source code of Tapenade, with the objective of contributing to Tapenade development, and of being able to maintain it. This year, Michael Vossbeck improved the flexibility of the built-in auto-validation tool of Tapenade.

The progress in highly accurate schemes for compressible flows on unstructured meshes (together with advances in massive parallelization of these schemes) allows to solve problems previously out of reach. The four-years programme Norma, associating:

is supported by the French ANR and by the Russian Science Foundation, and is active till september 2024. Norma is a cooperation on the subject of extending Computational AeroAcoustics methods to simulate the noise emited by rotating machines (helicopters, aerial vehicles, unmanned aerial vehicles, wind turbines...). A detailed description with progress reports is available at this location.

Sections 7.5, 7.6, 7.7, 7.8 describe the 2023 contributions of Ecuador to the Norma programme.

Modelling turbulence is an essential aspect of CFD. The purpose of our work in hybrid RANS/LES (Reynolds Averaged Navier-Stokes / Large Eddy Simulation) is to develop new approaches for industrial applications of LES-based analyses. In the applications targeted (aeronautics, hydraulics), the Reynolds number can be as high as several tens of millions, far too high for pure LES models. However, certain regions in the flow can be predicted better with LES than with usual statistical RANS (Reynolds averaged Navier-Stokes) models. These are mainly vortical separated regions as assumed in one of the most popular hybrid models, the Detached Eddy Simulation (DES) model. Here, “hybrid” means that a blending is applied between LES and RANS. The development of hybrid models, in particular DES in the litterature, has raised the question of the domain of validity of these models. According to theory, these models should not be applied to flows involving laminar boundary layers (BL). But industrial flows are complex flows and often combine in a single flow regions of laminar BL, regions of fully developed turbulent BL, and regions of non-equilibrium vortical BL. It is then mandatory for industrial use that the new hybrid models give a reasonable prediction for all these types of flow. We concentrated on evaluating the behavior of hybrid models for laminar BL and for vortical wakes. While less predictive than pure LES on laminar BL, some hybrid models still give reasonable predictions for rather low Reynolds numbers.

An important result for this year is the proposal by Florian Miralles of a novel intermittency-based statistical two-equation model, and its extension to a novel intermittency-based hybrid RANS-LES model. The new model predicts accurately a larger class of turbulent flow, as demonstrated by its ability to predict (using relatively coarse grids) the drag crisis and the increase in Strouhal number when applied to the simulation of the flow around a circular cylinder from sub-critical to super-critical flow regimes (cf. 35). This work is a part of the PhD thesis of Florian Miralles, supported by the Norma programme and defended in november 2023 at university of Montpellier.

The Montpellier-Sophia team has been studying this new model in combination with the mesh adaptation method developed by Bastien Sauvage, for higher Reynolds number and wing geometries. A Norma report and a paper are in preparation.

The physical problem addressed by Norma involves a computational domain made of at least two components having different rotative motions. The numerical problem of their combination gave birth to many specialized schemes, such as the so-called sliding method, chimera method, immersed boundary method (IBM). The Ecuador team is studying a novel Chimera method, in cooperation with Lemma engineering (Sophia-Antipolis).

In 2023, B. Sauvage and Didier Chargy made progress in the computation of the Caradonna-Tung rotor, with a new adapted calculation on 3 million vertices. Results were presented in 15.

Thanks to anisotropic mesh adaptation, high-fidelity industrial calculations can be made to converge at second-order (in the steady case), even for rather singular cases, and convergence error can be evaluated. We are investigating the way to extend this to higher order. Our research relies on a fourth-order accurate three-dimensional vertex-centered Central Essentially Non Oscillating (CENO) scheme. In 2023, the consolidation and validation of the fourth-order accurate CENO in a mesh-adaptive version of NiceFlow has been advanced. A theory is being developped for proposing a h-p approach (a simultaneous adaptation of the mesh and of the truncation order) combining the CENO scheme and our anisotropic mesh adaption approach.

Reducing approximation errors as much as possible by modifying the mesh is a particular kind of optimal control problem. We formulate it exactly this way when we look for the optimal metric of the mesh, which minimizes a user-specified functional (goal-oriented mesh adaptation). In that case, the usual methods of optimal control apply, using adjoint states that can be produced by Algorithmic Differentiation.

The second volume of the book 25 on mesh adaptation by Alauzet, Loseille, Koobus and Dervieux has been published by Wiley this year.

The INRIA teams GammaO (Saclay) and Ecuador (Sophia-Antipolis) have continued to study the application of anisotropic mesh adaptation to unsteady flow simulation. The baseline approach is the Global Fixed Point algorithm combined with a timestep relying on Courant-type stability condition for explicit time-advancing. This remains the best approach for many transient flows. For many unsteady turbulent flows at high Reynolds number, this is however inefficient, since very small cells in the boundary layer impose a very small explicit time step. Implicit timestepping is compulsory. Indeed, the time steps which can be used are notably larger than those permitted with an explicit time advancing. Large time steps bring a higher CPU efficiency, but the time approximation accuracy becomes an issue. Too large time steps degrade the prediction, too small time steps increases the computational cost. The new development focuses on obtaining directly the space-time discretization that minimizes the global error, under the constraint of a prescribed space-time discretization complexity. Application to vortex shedding flows (see figure 2) illustrate the interest of this novel method (cf. 18, 14, and B. Sauvage, F. Alauzet, A. Dervieux, “A space and time fixed point mesh adaptation method”, in preparation).

With statistical turbulence models, only a part of turbulent industrial flows can be predicted at an affordable cost. The rest may involve large detached regions which cannot be accurately described by statistical modelling, or vortices producing noise that the engineer wants to predict accurately. The Large Eddy Simulation (LES) is a model which predicts a part of the vortices of industrial interest. LES relies on filtering too small vortices and on modeling their effect on the larger ones. But this approach is one or two orders of magnitude more computer intensive than statistical modeling and therefore cannot be routinely used by engineers. Germano proposed an analysis using two different filters in order to measure the efficiency of a family of LES models. Recently, Toosi and Larsson demonstrated that the Germano analysis in fact deals with the source term of LES modeling error. We introduce this error term in our mesh adaptation process for RANS flow in order to obtain an approach for adapting the mesh to hybrid RANS/LES flow calculations. Applications to vortex shedding flows are being performed (cf. B. Sauvage, B. Koobus, F. Alauzet, A. Dervieux, “A metric-based mesh adaptation for hybrid RANS/LES flow calculations”, in preparation).

It appears today that with the new algorithms for mesh adaptation, new efficient and accurate automatic tools are available for maximizing the quality of high-fidelity prediction. In the near future, the advantages of these tools will probably make them mandatory in industrial practice. For design optimization, automatic tools are already available. However, these tools today rely on a generation of non- adaptative or poorly adaptative CFD. It is important to propose (see 16 and figure 3) a composite algorithm which nicely combines the novel mesh adaptation technology with the most recent design optimization loops.

Ecuador participates in the Joint Laboratory for Exascale Computing (JLESC) together with colleagues at Argonne National Laboratory.

The team organized the 25th EuroAD workshop, at INRIA Sophia-Antipolis, June 13-14, 2023.

Alain Dervieux is Scientific Director for the LEMMA company.