Differentiable Optimization in Computational Fluid Dynamics

TROPICS (Transformations et Outils Informatiques pour le Calcul Scientifique), NUM

Team members:
Laurent Hascoët, INRIA, Researcher (DR INRIA, HDR)
Valérie Pascual, INRIA, Researcher (CR INRIA)
Marie-Line Ramfos, INRIA, Assistant (TR INRIA)
Alain Dervieux, INRIA, Researcher (DR INRIA, HDR)
Hicham Tber, INRIA, PostDoc (since September 1st)
Mauricio Araya-Polo, INRIA, PhD (until October 31st)
Benjamin Dauvergne, INRIA, PhD
Christophe Massol, INRIA, other category (until September 30th)
Bruno Koobus, French university, Faculty (Université de Montpellier 2)

Overall Objectives

The TROPICS team is at the junction of two research domains:

AD: On the one hand, we study software engineering techniques to analyze and transform programs semi-automatically. In the past, we developed semi-automatic parallelization strategies aiming at SPMD parallelization. Presently, we focus on Automatic Differentiation (AD). AD transforms a program P that computes a function F into a program P' that computes some derivatives of F analytically. In particular, the so-called reverse mode of AD yields gradients. However, this reverse mode remains very delicate to use, and its implementation requires carefully crafted algorithms.

CFD application of AD: On the other hand, we study the application of AD, and particularly of the adjoint method, to Computational Fluid Dynamics. This requires adapting the optimization strategies. This work applies to two real-life problems: optimal shape design and mesh adaptation.

The second aspect of our work (optimization in Scientific Computing) is thus at the same time the motivation and the application domain of the first aspect (program analysis and transformation, and computation of gradients through AD). Concerning AD, our goal is to automatically produce derivative programs that can compete with the hand-written sensitivity and adjoint programs which exist in the industry. We implement our ideas and algorithms in the tool Tapenade, which is developed and maintained by the project. Apart from being an AD tool, Tapenade is also a platform for other analyses and transformations of scientific programs. Tapenade is easily available: it can be used through a web server, and alternatively a version can be downloaded. Practical details can be found in the Software section.

Our present research directions are:

Modern numerical methods for finite elements or finite differences: multigrid methods, mesh adaptation.

Optimal shape design or optimal control in the context of fluid dynamics: for example, shape optimization of the wings of a supersonic aircraft to reduce the sonic boom. In this context, we study the optimization of unsteady processes and the use of higher-order derivatives for robust optimization.

Automatic Differentiation: differentiate particular algorithms in a specially adapted manner, validate the derivatives, and reduce run time and memory consumption when computing gradients ("adjoints") with the reverse mode and second-order derivatives, targeting application to large unsteady simulation codes.

Common tools for program analysis and transformation: adequate internal representation, Call Graphs, Flow Graphs, Data-Dependence Graphs.

Scientific Foundations

Automatic Differentiation

Participants: Mauricio Araya-Polo, Benjamin Dauvergne, Laurent Hascoët, Christophe Massol, Valérie Pascual, Hicham Tber.

Keywords: program transformation, automatic differentiation, scientific computing, simulation, optimization, adjoint models.

automatic differentiation (AD)

Automatic transformation of a program that returns a new program computing some derivatives of the given initial program, i.e. some combination of the partial derivatives of the program's outputs with respect to its inputs.

adjoint model

Mathematical manipulation of the Partial Differential Equations that define a problem, yielding new differential equations that define the gradient of the original problem's solution.

checkpointing

General technique, used in the reverse mode of AD, that trades duplicate execution of a part of the program for a reduction of the memory space used to store intermediate results. Checkpointing a code fragment amounts to running this fragment without any storage of intermediate values, thus saving memory space. Later, when such an intermediate value is required, the fragment is run a second time to obtain the required values.

Automatic or Algorithmic Differentiation (AD) differentiates programs. An AD tool takes as input a source computer program $P$ that, given a vector argument $X \in \mathbb{R}^n$, computes some vector function $Y = F(X) \in \mathbb{R}^m$. The AD tool generates a new source program that, given the argument $X$, computes some derivatives of $F$. In short, AD first assumes that $P$ represents all its possible run-time sequences of instructions, and it will in fact differentiate these sequences. Therefore, the control of $P$ is put aside temporarily, and AD will simply reproduce this control into the differentiated program. In other words, $P$ is differentiated only piecewise. Experience shows that this is reasonable in most cases, and going further is still an open research problem. Then, any sequence of instructions is identified with a composition of vector functions. Thus, for a given control:

$$P \text{ is } \{I_1; I_2; \dots I_p;\}, \qquad F = f_p \circ f_{p-1} \circ \dots \circ f_1,$$

where each $f_k$ is the elementary function implemented by instruction $I_k$. Finally, AD simply applies the chain rule to obtain derivatives of $F$. Let us call $X_k$ the values of all variables after each instruction $I_k$, i.e. $X_0 = X$ and $X_k = f_k(X_{k-1})$. The chain rule gives the Jacobian $F'$ of $F$:

$$F'(X) = f'_p(X_{p-1}) \cdot f'_{p-1}(X_{p-2}) \cdot \; \cdots \; \cdot f'_1(X_0)$$

which can be mechanically translated back into a sequence of instructions $I'_k$, and these sequences inserted back into the control of $P$, yielding program $P'$. This can be generalized to higher-order derivatives, Taylor series, etc.

In practice, the above Jacobian $F'(X)$ is often far too expensive to compute and store. Notice for instance that the chain-rule product above repeatedly multiplies matrices whose size is of the order of $m \times n$. Moreover, some problems are solved using only some projections of $F'(X)$. For example, one may need only sensitivities, which are $F'(X) \cdot \dot{X}$ for a given direction $\dot{X}$ in the input space. Using the chain-rule product, the sensitivity is

$$F'(X) \cdot \dot{X} = f'_p(X_{p-1}) \cdot f'_{p-1}(X_{p-2}) \cdot \; \cdots \; \cdot f'_1(X_0) \cdot \dot{X},$$

which is easily computed from right to left, interleaved with the original program instructions. This is the principle of the tangent mode of AD, which is the most straightforward mode and is of course available in Tapenade.

However, in optimization, data assimilation, adjoint problems, or inverse problems, the appropriate derivative is the gradient $F'^{*}(X) \cdot \bar{Y}$. Transposing the chain-rule product, the gradient is

$$F'^{*}(X) \cdot \bar{Y} = f_1'^{*}(X_0) \cdot f_2'^{*}(X_1) \cdot \; \cdots \; \cdot f_{p-1}'^{*}(X_{p-2}) \cdot f_p'^{*}(X_{p-1}) \cdot \bar{Y},$$

which is most efficiently computed from right to left, because matrix $\times$ vector products are so much cheaper than matrix $\times$ matrix products. This is the principle of the reverse mode of AD.

This turns out to make a very efficient program, at least theoretically (Section 3.4). The computation time required for the gradient is only a small multiple of the run time of $P$, and it is independent of the number of parameters $n$. In contrast, computing the same gradient with the tangent mode would require running the tangent differentiated program $n$ times.
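To make the two modes concrete, here is a minimal Python sketch. The three-instruction program, its state (a, b), and all function names are hypothetical and chosen only for illustration; the derivative routines are written by hand in the style an AD tool might produce, and are not actual Tapenade output.

```python
import math

# Hypothetical program P acting on the state X = (a, b):
#   I1: a = a * b      I2: b = sin(a)      I3: a = a + b * b

def f1(x):
    a, b = x
    return (a * b, b)

def f2(x):
    a, b = x
    return (a, math.sin(a))

def f3(x):
    a, b = x
    return (a + b * b, b)

# Tangent of each instruction: xd -> f_k'(x) . xd
def f1_tgt(x, xd):
    (a, b), (ad, bd) = x, xd
    return (ad * b + a * bd, bd)

def f2_tgt(x, xd):
    (a, b), (ad, bd) = x, xd
    return (ad, math.cos(a) * ad)

def f3_tgt(x, xd):
    (a, b), (ad, bd) = x, xd
    return (ad + 2.0 * b * bd, bd)

# Adjoint of each instruction: yb -> f_k'*(x) . yb (transposed Jacobian)
def f1_adj(x, yb):
    (a, b), (ab, bb) = x, yb
    return (b * ab, a * ab + bb)

def f2_adj(x, yb):
    (a, b), (ab, bb) = x, yb
    return (ab + math.cos(a) * bb, 0.0)

def f3_adj(x, yb):
    (a, b), (ab, bb) = x, yb
    return (ab, 2.0 * b * ab + bb)

PROGRAM = [(f1, f1_tgt, f1_adj), (f2, f2_tgt, f2_adj), (f3, f3_tgt, f3_adj)]

def tangent_mode(x, xd):
    """Propagate one direction xd, interleaved with the original instructions."""
    for f, f_tgt, _ in PROGRAM:
        xd = f_tgt(x, xd)      # uses X_{k-1}, so it runs before the update
        x = f(x)
    return x, xd

def reverse_mode(x, yb):
    """Run P forward while storing each X_{k-1}, then apply adjoints backward."""
    stored = []
    for f, _, _ in PROGRAM:
        stored.append(x)
        x = f(x)
    for _, _, f_adj in reversed(PROGRAM):
        yb = f_adj(stored.pop(), yb)
    return x, yb

x0 = (1.5, 0.7)
# Gradient of the final value of a: one reverse run ...
_, grad = reverse_mode(x0, (1.0, 0.0))
# ... versus n = 2 tangent runs, one per input direction.
grad_by_tangent = tuple(tangent_mode(x0, d)[1][0]
                        for d in ((1.0, 0.0), (0.0, 1.0)))
print(grad, grad_by_tangent)
```

Both computations give the same gradient; the reverse mode needs a single run whatever the number of inputs, at the price of keeping the intermediate states.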

We can observe that the $X_k$ are required in the inverse of their computation order. If the original program overwrites a part of $X_k$, the differentiated program must restore $X_k$ before it is used by $f'^{*}_{k+1}(X_k)$. There are two strategies for that:

Recompute All (RA): $X_k$ is recomputed when needed, restarting $P$ on input $X_0$ until instruction $I_k$. The TAF tool uses this strategy. The brute-force RA strategy has a quadratic time cost with respect to the total number of run-time instructions $p$.

Store All (SA): the $X_k$ are restored from a stack when needed. This stack is filled during a preliminary run of $P$ that additionally stores variables on the stack just before they are overwritten. The ADIFOR and Tapenade tools use this strategy. The brute-force SA strategy has a linear memory cost with respect to $p$.
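The cost difference between the two brute-force strategies can be sketched as follows. This is only an illustration under simplifying assumptions: the program is a hypothetical loop of p identical instructions that overwrite a single scalar x, and the function names are ours, not those of TAF, ADIFOR or Tapenade.

```python
import math

def step(x):
    """One run-time instruction I_k; it overwrites x."""
    return math.sin(x) * x

def step_adjoint(x_before, xb):
    """Adjoint of step; it needs the value of x before the overwrite."""
    return (math.cos(x_before) * x_before + math.sin(x_before)) * xb

def gradient_store_all(x0, p):
    """SA: push each value just before it is overwritten, pop it backward."""
    stack = []
    x = x0
    for _ in range(p):                 # forward sweep
        stack.append(x)
        x = step(x)
    peak_storage = len(stack)          # grows linearly with p
    xb = 1.0
    for _ in range(p):                 # backward sweep
        xb = step_adjoint(stack.pop(), xb)
    return xb, peak_storage

def gradient_recompute_all(x0, p):
    """RA: recompute X_{k-1} from X_0 each time the backward sweep needs it."""
    xb = 1.0
    recomputed_steps = 0
    for k in range(p, 0, -1):
        x = x0
        for _ in range(k - 1):         # restart P and run until I_{k-1}
            x = step(x)
            recomputed_steps += 1
        xb = step_adjoint(x, xb)
    return xb, recomputed_steps        # grows quadratically with p

grad_sa, peak_storage = gradient_store_all(0.8, 10)
grad_ra, recomputed_steps = gradient_recompute_all(0.8, 10)
print(grad_sa, peak_storage, grad_ra, recomputed_steps)
```

In this sketch, SA stores p values while RA stores none but re-executes p(p-1)/2 instructions, which is the linear-memory versus quadratic-time contrast described above.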

Both RA and SA strategies need a special storage/recomputation trade-off in order to be really profitable, and this makes them become very similar. This trade-off is called checkpointing. Since Tapenade uses the SA strategy, let us describe checkpointing in this context. The plain SA strategy applied to instructions $I_1$ to $I_p$ builds a differentiated program (see figure) in which an initial "forward sweep" runs the original program and stores intermediate values (black dots), followed by a "backward sweep" that computes the derivatives in the reverse order, using the stored values when necessary (white dots). Checkpointing a fragment $C$ of the program (see figure) works as follows. During the forward sweep, no value is stored while in $C$. Later, when the backward sweep needs values from $C$, the fragment is run again, this time with storage. One can see that the maximum storage space is roughly divided by 2. This also requires some extra memorization (a "snapshot") to restore the initial context of $C$. In the figure, this snapshot is shown by slightly bigger black and white dots.

Checkpoints can be nested. In that case, a clever choice of checkpoints can make both the memory size and the extra recomputations grow like only the logarithm of the size of the program.
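The effect of a single checkpoint on peak storage can be sketched as below. The assumptions are the same as for any toy model: a hypothetical program made of p identical scalar instructions, a checkpointed fragment C given by an index range with c_start < c_end, and a snapshot reduced to the single variable x. The names are illustrative and do not reflect Tapenade internals.

```python
import math

def step(x):
    """One run-time instruction; it overwrites x."""
    return math.sin(x) * x

def step_adjoint(x_before, xb):
    """Adjoint of step, using the value of x before the overwrite."""
    return (math.cos(x_before) * x_before + math.sin(x_before)) * xb

def gradient_checkpointed(x0, p, c_start, c_end):
    """Store-All reverse sweep with fragment C = steps c_start..c_end-1 checkpointed.

    Assumes 0 <= c_start < c_end <= p.
    Returns (gradient, peak stack size, number of re-executed steps).
    """
    stack = []
    peak = 0

    def push(value):
        nonlocal peak
        stack.append(value)
        peak = max(peak, len(stack))

    # Forward sweep: store everywhere except inside C.
    x = x0
    for k in range(p):
        if k == c_start:
            push(x)                    # snapshot: context needed to rerun C
        if not (c_start <= k < c_end):
            push(x)
        x = step(x)

    # Backward sweep over the steps located after C.
    xb = 1.0
    for _ in range(p - c_end):
        xb = step_adjoint(stack.pop(), xb)

    # Restore the snapshot and run C again, this time with storage.
    x = stack.pop()
    extra_steps = 0
    for _ in range(c_start, c_end):
        push(x)
        x = step(x)
        extra_steps += 1

    # Backward sweep over C, then over the steps located before C.
    for _ in range(c_end):
        xb = step_adjoint(stack.pop(), xb)
    return xb, peak, extra_steps

grad, peak, extra_steps = gradient_checkpointed(0.8, 10, 0, 5)
print(grad, peak, extra_steps)
```

With 10 steps and the first 5 checkpointed, the peak is 6 stored values (5 plus the snapshot) instead of 10, paid for by 5 re-executed steps. Nesting such checkpoints recursively is what leads to the logarithmic behavior mentioned above.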

Static Analyses and Transformation of Programs

Participants: Mauricio Araya-Polo, Benjamin Dauvergne, Laurent Hascoët, Christophe Massol, Valérie Pascual, Hicham Tber.

Keywords: static analysis, program transformation, compilation, abstract syntax tree, control flow graph, data flow analysis, data dependence graph, abstract interpretation.

abstract syntax tree

Tree representation of a computer program, that keeps only the semantically significant information and abstracts away syntactic sugar such as indentation, parentheses, or separators.

control flow graph

Representation of a procedure body as a directed graph, whose nodes, known as basic blocks, each contain a list of instructions to be executed in sequence, and whose arcs represent all possible control jumps that can occur at run-time.

abstract interpretation

Model that describes program static analyses as a special sort of execution, in which all branches of control switches are taken simultaneously, and where computed values are replaced by abstract values from a given semantic domain. Each particular analysis gives birth to a specific semantic domain.

data flow analysis

Program analysis that studies how a given property of variables evolves with execution of the program. Data Flow analyses are static, therefore studying all possible run-time behaviors and making conservative approximations. A typical data-flow analysis is to detect whether a variable is initialized or not, at any location in the source program.

data dependence analysis

Program analysis that studies the itinerary of values during program execution, from the place where a value is generated to the places where it is used, and finally to the place where it is overwritten. The collection of all these itineraries is often stored as a data dependence graph, and data flow analyses most often rely on this graph.

data dependence graph

Directed graph that relates accesses to program variables, from the write access that defines a new value to the read accesses that use this value, and conversely from the read accesses to the write access that overwrites this value. Dependences express a partial order between operations, that must be preserved to preserve the program's result.

The most obvious example of a program transformation tool is certainly a compiler. Other examples are program translators, that go from one language or formalism to another, or optimizers, that transform a program to make it run better. AD is just one such transformation. These tools use sophisticated analyses to improve the quality of the produced code. These tools share their technological basis. More importantly, there are common mathematical models to specify and analyze them.

An important principle is abstraction: the core of a compiler should not bother about syntactic details of the compiled program. In particular, it is desirable that the optimization and code generation phases be independent from the particular input programming language. This can generally be achieved through separate front-ends that produce an internal language-independent representation of the program, generally an abstract syntax tree. For example, compilers like gcc for C and g77 for Fortran 77 have separate front-ends but share most of their back-end.

One can go further. As abstraction goes on, the internal representation becomes more language independent, and semantic constructs such as declarations, assignments, calls, IO operations, can be unified. Analyses can then concentrate on the semantics of a small set of constructs. We advocate an internal representation composed of three levels.

At the top level is the call graph, whose nodes are the procedures. There is an arrow from node A to node B iff A possibly calls B. Recursion leads to cycles. The call graph captures the notions of visibility scope between procedures, which come from modules or classes.

At the middle level is the control flow graph. There is one flow graph per procedure, i.e. per node in the call graph. The flow graph captures the control flow between atomic instructions. Flow control instructions are represented uniformly inside the control flow graph.

At the lowest level are abstract syntax trees for the individual atomic instructions. Certain semantic transformations can benefit from the representation of expressions as directed acyclic graphs, sharing common sub-expressions.

To each basic block is associated a symbol table that gives access to properties of variables, constants, function names, type names, and so on. Symbol tables must be nested to implement lexical scoping.

Static program analyses can be defined on this internal representation, which is largely language independent. The simplest analyses on trees can be specified with inference rules. But many analyses are more complex, and are thus better defined on graphs than on trees. This is the case for data-flow analyses, which look for run-time properties of variables. Since flow graphs are cyclic, these global analyses generally require an iterative resolution. Data flow equations are a practical formalism to describe data-flow analyses. Another published formalism is more precise because it can distinguish separate instances of instructions. However, it is still based on trees, and its cost forbids application to large codes. Abstract Interpretation is a theoretical framework to study complexity and termination of these analyses.
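As an illustration of the iterative resolution of data flow equations on a cyclic flow graph, the following Python sketch computes which variables are certainly initialized at the entry of each basic block. The flow graph, the block names and the variable names are hypothetical; real analyses work on a much richer representation.

```python
def surely_initialized(cfg, defs, entry, all_vars):
    """Iteratively solve, for every basic block b other than the entry:

        IN(b) = intersection over predecessors q of (IN(q) | DEFS(q))

    cfg:  block name -> list of successor block names
    defs: block name -> set of variables assigned in the block
    Returns block name -> set of variables initialized on every path to it.
    """
    preds = {b: [] for b in cfg}
    for b, successors in cfg.items():
        for s in successors:
            preds[s].append(b)

    # Optimistic start (everything initialized), refined until a fixed point.
    state = {b: set(all_vars) for b in cfg}
    state[entry] = set()

    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            incoming = [state[q] | defs[q] for q in preds[b]]
            new = set.intersection(*incoming) if incoming else set()
            if new != state[b]:
                state[b] = new
                changed = True
    return state

# A loop: entry -> header -> body -> header, and header -> exit.
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
defs = {"entry": {"i", "s"}, "header": set(),
        "body": {"t", "s"}, "exit": set()}

result = surely_initialized(cfg, defs, "entry", {"i", "s", "t"})
print(result["exit"])
```

Because the loop body may execute zero times, the analysis conservatively reports that t is not guaranteed to be initialized at the exit block, whereas i and s are.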

Data flow analyses must be carefully designed to avoid or control combinatorial explosion. The classical solution is to choose a hierarchical model. In this model, information, or at least a computationally expensive part of it, is synthesized. Specifically, it is computed bottom up, starting on the lowest (and smallest) levels of the program representation and then recursively combined at the upper (and larger) levels. Consequently, this synthesized information must be made independent of the context (i.e., the rest of the program). When the synthesized information is built, it is used in a final pass, essentially top down and context dependent, that propagates information from the ``extremities'' of the program (its beginning or end) to each particular subroutine, basic block, or instruction.

Even then, data flow analyses are limited, because they are static and thus have very little knowledge of actual run-time values. Most of them are undecidable; that is, there always exists a particular program for which the result of the analysis is uncertain. This is a strong, yet very theoretical limitation. More concretely, there are always cases where one cannot decide statically that, for example, two variables are equal. This is even more frequent with two pointers or two array accesses. Therefore, in order to obtain safe results, conservative over-approximations of the computed information are generated. For instance, such approximations are made when analyzing the activity or the TBR ("To Be Restored") status of some individual element of an array. Static and dynamic array region analyses provide very good approximations. Otherwise, we make a coarse approximation such as considering all array cells equivalent.

When studying program transformations, one often wants to move instructions around without changing the results of the program. The fundamental tool for this is the data dependence graph. This graph defines an order between run-time instructions such that if this order is preserved by instruction rescheduling, then the output of the program is not altered. The data dependence graph is the basis for automatic parallelization. It is also useful in AD. Data dependence analysis is the static data-flow analysis that builds the data dependence graph.
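For straight-line code, the idea can be sketched in a few lines of Python. The four instructions and the helper names are hypothetical, and the pairwise test below is deliberately coarse: it may add redundant arcs, which does not change the set of schedules it accepts.

```python
def dependence_graph(instrs):
    """Build dependence arcs (i, j), i < j, between instructions.

    instrs: list of (written variables, read variables) pairs.
    An arc is added for a write followed by a read (flow dependence),
    a read followed by a write (anti dependence), or two writes
    (output dependence) on the same variable.
    """
    edges = set()
    for i, (writes_i, reads_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            writes_j, reads_j = instrs[j]
            if writes_i & reads_j or reads_i & writes_j or writes_i & writes_j:
                edges.add((i, j))
    return edges

def preserves_dependences(order, edges):
    """True if the rescheduled order keeps every dependent pair in order."""
    position = {instr: pos for pos, instr in enumerate(order)}
    return all(position[i] < position[j] for i, j in edges)

instrs = [
    ({"a"}, {"x"}),        # I0: a = x + 1
    ({"b"}, {"y"}),        # I1: b = y * 2
    ({"c"}, {"a", "b"}),   # I2: c = a + b
    ({"x"}, {"c"}),        # I3: x = c * c
]
edges = dependence_graph(instrs)
print(sorted(edges))
print(preserves_dependences([1, 0, 2, 3], edges))
print(preserves_dependences([0, 2, 1, 3], edges))
```

Swapping I0 and I1 is accepted because no arc relates them, whereas moving I2 before I1 is rejected because I2 reads the value of b written by I1.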

Automatic Differentiation and Computational Fluid Dynamics

Participants: Alain Dervieux, Laurent Hascoët, Bruno Koobus.

Keywords: computational fluid dynamics, linearization, optimization, adjoint methods, adjoint state, gradient.

linearization

The mathematical equations of Fluid Dynamics are Partial Differential Equations that are discretized and then solved by a computer program. Linearization of these equations, or alternatively linearization of the computer program, gives a model of the behavior of the flow when small perturbations are applied. This is useful when the perturbations are effectively small, as in acoustics, or when one wants the sensitivity of the system with respect to one parameter, as in optimization.

adjoint state

Consider a system of Partial Differential Equations that define some characteristics of a system with respect to some input parameters. Consider one particular scalar characteristic. Its sensitivity (or gradient) with respect to the input parameters can be defined as the solution of "adjoint" equations, deduced from the original equations through linearization and transposition. The solution of the adjoint equations is known as the adjoint state.
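The "linearization and transposition" steps can be written out in a generic form. The notation below is an assumption made for this sketch and does not come from the report: $\gamma$ denotes the input parameters, $W$ the state, $\Psi$ the state equations, $J$ the scalar characteristic, and $\partial\Psi/\partial W$ is assumed invertible.

```latex
% Assumed notation: \gamma = input parameters, W = state,
% \Psi = state equations, J = scalar characteristic, \Pi = adjoint state.
\begin{align*}
  &\Psi(\gamma, W) = 0, \qquad j(\gamma) = J(\gamma, W(\gamma))
     && \text{(state equations and characteristic)} \\
  &\frac{\partial \Psi}{\partial W}\,\frac{dW}{d\gamma}
     = -\frac{\partial \Psi}{\partial \gamma}
     && \text{(linearization)} \\
  &\left(\frac{\partial \Psi}{\partial W}\right)^{*} \Pi
     = \left(\frac{\partial J}{\partial W}\right)^{*}
     && \text{(adjoint equations, by transposition)} \\
  &\frac{dj}{d\gamma}
     = \frac{\partial J}{\partial \gamma}
       - \Pi^{*}\,\frac{\partial \Psi}{\partial \gamma}
     && \text{(gradient obtained from the adjoint state)}
\end{align*}
```

The last line follows from substituting the linearized state equations into the chain rule for $j$: a single adjoint solve yields the gradient with respect to all input parameters at once.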

Computational Fluid Dynamics is now able to make reliable simulations of very complex systems. For example it is now possible to simulate completely the 3D air flow around a plane that captures the physical phenomena of shocks and turbulence. The next step in CFD appears to be optimization. Optimization is one degree higher in complexity, because it repeatedly simulates, evaluates directions of optimization and applies optimization steps, until an optimum is reached.

We restrict ourselves here to gradient descent methods. One obvious risk is to fall into local minima before reaching the global minimum. We do not address this question, although we believe that more robust approaches, such as evolutionary approaches, could benefit from a coupling with gradient descent approaches. Another well-known risk is the presence of discontinuities in the optimized function. We investigate two kinds of methods to cope with discontinuities: we can devise AD algorithms that detect the presence of discontinuities, and we can design optimization algorithms that solve some of these discontinuities.

We investigate several approaches to obtain the gradient. There are actually two extreme approaches:

One can write an adjoint system, then discretize it and program it by hand. The adjoint system is a new system, deduced from the original equations, whose solution, the adjoint state, leads to the gradient. A hand-written adjoint is very sound mathematically, because the process starts back from the original equations. This process implies a new, separate implementation phase to solve the adjoint system. During this manual phase, mathematical knowledge of the problem can be translated into many hand-coded refinements. But this may take an enormous amount of engineering time. Except with special strategies, this approach does not produce an exact gradient of the discrete functional, and this can be a problem when using optimization methods based on descent directions.

A program that computes the gradient can be built by pure Automatic Differentiation in the reverse mode (cf. the Automatic Differentiation section). It is in fact the adjoint of the discrete functional computed by the software, which is piecewise differentiable. It produces exact derivatives almost everywhere. Theoretical results guarantee convergence of these derivatives when the functional converges. This strategy gives reliable descent directions to the optimization kernel, although the descent step may be tiny, due to discontinuities. Most importantly, the AD adjoint is generated by a tool. This saves a lot of development and debugging time. But this systematic approach leads to massive use of storage, requiring code transformation by hand to reduce memory usage. Mohammadi's work illustrates the advantages and drawbacks of this approach.

The drawback of AD is the amount of storage required. If the model is steady, can we use this important property to reduce the amount of storage needed? This is indeed possible: the computation of the adjoint state can use the iterated states in the direct order. Alternatively, most researchers use only the fully converged state to compute the adjoint. This is usually implemented by a hand modification of the code generated by AD, but this is delicate and error-prone. The TROPICS team investigates hybrid methods that combine these two extreme approaches.

Application Domains

Panorama

Automatic Differentiation of programs gives sensitivities or gradients, that are useful for many types of applications:

optimum shape design under constraints, multidisciplinary optimization, and more generally any algorithm based on local linearization,

inverse problems, such as parameter estimation and in particular variational data assimilation in climate sciences (meteorology, oceanography),

first-order linearization of complex systems, or higher-order simulations, yielding reduced models for simulation of complex systems around a given state,

mesh adaptation and mesh optimization with gradients or adjoints,

equation solving with the Newton method,

sensitivity analysis, propagation of truncation errors.

We will detail some of them in the next sections. These applications require an AD tool that differentiates programs written in classical imperative languages: Fortran 77, Fortran 95, C, or C++. We also consider our AD tool Tapenade as a platform to implement other program analyses and transformations. Tapenade does the tedious job of building the internal representation of the program, and then provides an API to build new tools on top of this representation. One application of Tapenade is therefore to build prototypes of new program analyses.

Multidisciplinary optimization

A CFD program computes the flow around a shape, starting from a number of inputs that define the shape and other parameters. From this flow, it computes an optimization criterion, such as the lift of an aircraft. To optimize the criterion by a gradient descent, one needs the gradient of the output criterion with respect to all the inputs, and possibly additional gradients when there are constraints. The reverse mode of AD is a promising way to compute these gradients.

Inverse problems

Inverse problems aim at estimating the value of hidden parameters from other measurable values that depend on the hidden parameters through a system of equations. For example, the hidden parameter might be the shape of the ocean floor, and the measurable values the altitude and speed of the surface. Another example is data assimilation in weather forecasting. The initial state of the simulation conditions the quality of the weather prediction. But this initial state is largely unknown. Only some measurements at arbitrary places and times are available. The initial state is found by solving a least squares problem between the measurements and a guessed initial state, which itself must verify the equations of meteorology. This rapidly boils down to solving an adjoint problem, which can be done through AD.

Linearization

Simulating a complex system often requires solving a system of Partial Differential Equations. This is sometimes too expensive, in particular in the context of real time. When one wants to simulate the reaction of this complex system to small perturbations around a fixed set of parameters, there is a very efficient approximate solution: just suppose that the system is linear in a small neighborhood of the current set of parameters. The reaction of the system is thus approximated by a simple product of the variation of the parameters with the Jacobian matrix of the system. This Jacobian matrix can be obtained by AD. This is especially cheap when the Jacobian matrix is sparse. The simulation can be improved further by introducing higher-order derivatives, such as Taylor expansions, which can also be computed through AD. The result is often called a reduced model.
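A first-order reduced model of this kind can be sketched as follows. The two-parameter "system" is a hypothetical stand-in for an expensive simulation, and its tangent routine is written by hand in the style of tangent-mode AD; none of the names come from the report.

```python
import math

def system(p):
    """Hypothetical system response for parameters p = [a, b]."""
    a, b = p
    return [a * b + math.sin(a), a * b * b]

def system_tangent(p, pd):
    """Hand-written tangent of system: returns the Jacobian times pd."""
    a, b = p
    ad, bd = pd
    return [ad * b + a * bd + math.cos(a) * ad,
            ad * b * b + 2.0 * a * b * bd]

def jacobian(p):
    """Assemble the Jacobian column by column, one tangent run per input."""
    n = len(p)
    columns = [system_tangent(p, [1.0 if i == j else 0.0 for i in range(n)])
               for j in range(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def reduced_model(p_ref, y_ref, jac, p):
    """Linear approximation y_ref + J . (p - p_ref) around the reference."""
    dp = [pi - ri for pi, ri in zip(p, p_ref)]
    return [y + sum(row[j] * dp[j] for j in range(len(dp)))
            for y, row in zip(y_ref, jac)]

p_ref = [1.0, 2.0]
y_ref = system(p_ref)
jac = jacobian(p_ref)            # computed once, reused for every perturbation

p_new = [1.001, 1.999]
approx = reduced_model(p_ref, y_ref, jac, p_new)
exact = system(p_new)
print(approx, exact)
```

Once the Jacobian is available, each new perturbation costs only a matrix-vector product; the approximation error is second order in the size of the perturbation, which is what higher-order derivatives would then reduce.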

Mesh adaptation

It has been noticed that some approximation errors can be expressed by an adjoint state. Mesh adaptation can benefit from this. The classical optimization step can give an optimization direction not only for the control parameters, but also for the approximation parameters, and in particular the mesh geometry. The ultimate goal is to obtain optimal control parameters up to a precision prescribed in advance.

Software

Tapenade

Participants: Laurent Hascoët (contact), Mauricio Araya-Polo, Nicolas Chleq (Development Engineer), Benjamin Dauvergne, Christophe Massol, Valérie Pascual, Hicham Tber.

Tapenade is the Automatic Differentiation tool developed by the TROPICS team. Tapenade progressively implements the results of our research about models and static analyses for AD. From this standpoint, Tapenade is a research tool. Our objective is also to promote the use of AD in the scientific computation world, including the industry. Therefore the team constantly maintains Tapenade to meet the demands of our industrial users. Tapenade can be used simply as a web server, available at the URL

http://tapenade.inria.fr:8080/tapenade/index.jsp

It can also be downloaded and installed from our FTP server

ftp://ftp-sop.inria.fr/tropics/tapenade/README.html

Documentation is available on our web page

http://www-sop.inria.fr/tropics/

and as an INRIA technical report (RT-0300)

http://hal.inria.fr/inria-00069880

Tapenade differentiates computer programs according to the model described in the Automatic Differentiation section. It supports three modes of differentiation:

the tangent mode, which computes a directional derivative $F'(X) \cdot \dot{X}$,

the vector tangent mode, which computes $F'(X) \cdot \dot{X}_n$ for many directions $\dot{X}_n$ simultaneously, and can therefore compute Jacobians, and

the reverse mode, which computes the gradient $F'^{*}(X) \cdot \bar{Y}$.

An obvious fourth mode could be the vector reverse mode, which is not yet implemented. Many other modes exist in other AD tools, computing for example higher-order derivatives or Taylor expansions. For the time being, we restrict ourselves to first-order derivatives and we focus our efforts on the reverse mode. But as we said before, we also view Tapenade as a platform to build new program transformations, in particular new differentiations.

Like any program transformation tool, Tapenade needs sophisticated static analyses in order to produce an efficient output. Concerning AD, the following analyses are a must, and Tapenade now performs them all:

Pointer destinations: For any static program transformation, and in particular differentiation, it is essential to have a precise knowledge of the possible destinations of each pointer at each code line. Otherwise one must make conservative assumptions that will lead to less efficient code. Our static pointer analysis finds precise information about pointer destinations, taking into account memory allocation and deallocation operations.

Activity: The end-user has the opportunity to specify which of the output variables must be differentiated (called the dependent variables), and with respect to which of the input variables (called the independent variables). Activity analysis propagates the dependent variables backward through the program, to detect all intermediate variables that possibly influence them. Conversely, activity analysis also propagates the independent variables forward through the program, to find all intermediate variables that possibly depend on them. Only the intermediate variables that both depend on the independent variables and influence the dependent variables are called active, and will receive an associated derivative variable. Activity analysis makes the differentiated program smaller and faster.

Adjoint Liveness and Read-Write: Programs produced by the reverse mode of AD show a very particular structure, due to their mechanism to restore intermediate values of the original program in the reverse order. This has deep consequences on the liveness and Read-Write status of variables, which we can exploit to remove unnecessary instructions and memory usage from the reverse (adjoint) program. This can make the adjoint program smaller and faster by up to 40%.

TBR: The reverse mode of AD, with the Store-All strategy, stores all intermediate variables just before they are overwritten. However this is often unnecessary, because derivatives of some expressions (e.g. linear expressions) only use the derivatives of their arguments and not the original arguments themselves. In other words, the local Jacobian matrix of an instruction may not need all the intermediate variables needed by the original instruction. The To Be Restored (TBR) analysis finds which intermediate variables need not be stored during the forward sweep, and therefore makes the differentiated program smaller in memory.
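Among the analyses listed above, activity analysis is simple enough to sketch on straight-line code. The following Python fragment is only an illustration under strong assumptions (no control flow, no arrays, no pointers, no procedure calls); the five-instruction program and the function name are hypothetical and do not reflect Tapenade's implementation.

```python
def active_assignments(instrs, independents, dependents):
    """Return the assigned variables that are active, in program order.

    instrs: list of (assigned variable, list of variables read).
    A variable is 'varied' if it possibly depends on an independent input,
    'useful' if it possibly influences a dependent output, and active
    if it is both.
    """
    # Forward propagation of the independent variables.
    varied_after = []
    current = set(independents)
    for lhs, rhs in instrs:
        if current & set(rhs):
            current = current | {lhs}
        else:
            current = current - {lhs}      # overwritten by a non-varied value
        varied_after.append(current)

    # Backward propagation of the dependent variables.
    useful_after = [None] * len(instrs)
    current = set(dependents)
    for k in range(len(instrs) - 1, -1, -1):
        useful_after[k] = current
        lhs, rhs = instrs[k]
        if lhs in current:
            current = (current - {lhs}) | set(rhs)

    return [lhs for k, (lhs, _) in enumerate(instrs)
            if lhs in varied_after[k] and lhs in useful_after[k]]

program = [
    ("t", ["x"]),        # t = x * 2
    ("u", ["c"]),        # u = c + 1     (c is not an independent input)
    ("v", ["t", "u"]),   # v = t * u
    ("w", ["t"]),        # w = sin(t)    (never used to compute y)
    ("y", ["v", "u"]),   # y = v + u
]
active = active_assignments(program, independents={"x"}, dependents={"y"})
print(active)
```

Here u is useful but not varied, and w is varied but not useful, so neither would receive a derivative variable; only t, v and y are active.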

Several other strategies are implemented in Tapenade to improve the differentiated code. For example, a data-dependence analysis allows Tapenade to move instructions around safely, gathering instructions to reduce cache misses. Also, long expressions are split in a specific way, to minimize duplicate sub-expressions in the derivative expressions.

The input languages of Tapenade are Fortran 77 and Fortran 95. Thanks to the language-independent internal representation of programs (see figure), we could relatively easily extend Tapenade to C, at least for simple programs. More work is still required to obtain a completely reliable differentiation of C.

There are two user interfaces for Tapenade. One is a simple command that can be called from a shell or from a Makefile. The other is interactive, using Java Swing components and HTML pages. This interface allows one to use Tapenade from Windows as well as Linux. The input interface lets one specify interactively the routine to differentiate, its independent inputs and dependent outputs. The output interface (see figure) displays the differentiated programs, with HTML links that implement source-code correspondence, as well as correspondence between error messages and locations in the source.

Tapenade is now available for Linux, Sun, and Windows XP platforms.

The accompanying figure shows the architecture of Tapenade. It is implemented mostly in Java, apart from the front-ends, which are separate and can be written in their own languages.

Notice the clear separation between the general-purpose program analyses, based on a general representation, and the differentiation engine itself. Other tools can be built on top of the Imperative Language Analyzer platform.

The end-user can specify properties of external or black-box routines. This is essential for real industrial applications that use many libraries. The source of these libraries is generally hidden. However, AD needs some information about these black-box routines in order to produce efficient code. tapenade lets the user specify this information in a separate signature file. Specifically for the reverse mode of AD, tapenade lets the user specify precisely which procedure calls must be checkpointed, to improve the overall performance of the differentiated program.

New Results

Program Analyses to improve Checkpointing in Reverse AD

Participants: Mauricio Araya-Polo, Benjamin Dauvergne, Laurent Hascoët.

Keywords: data-flow analyses, static analyses, reverse mode of AD, checkpointing, snapshots.

As described in , the reverse mode of AD is a very appealing approach to compute gradients, but it suffers from the need to restore intermediate values in the reverse of their original computation order. In our approach, this question is basically solved through storage, whose cost can be very high. We use checkpointing to mitigate this cost, trading extra recomputations for memory space. An efficient storage and checkpointing strategy is really the key to a wide usage of reverse AD in the industry. This is why we continue our modeling and experimentation efforts to find the best possible strategies.

This year we formalized the ``snapshot''. When a fragment of code is ``checkpointed'', it is executed twice: the snapshot is the set of variables which must be saved and restored to ensure that the second execution is equivalent to the first. There exist several formulae that give the snapshot of a given checkpoint, all based on common sense. However we observe that further improvement of these formulae is delicate, and common sense is not enough to justify the improvements. In particular, reducing a snapshot can lead to an increase of storage in other parts of the code, and the overall memory consumption may turn out to be worse.
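The role of the snapshot can be sketched on a toy example. The Python code below is a minimal illustration under stated assumptions: the program is a chain of identical steps split into a first fragment C (checkpointed) and a second fragment D, the step `x + 0.1*sin(x)` is arbitrary, and all function names are hypothetical. It does not reproduce the snapshot formulae studied by the team; here the snapshot is simply the single input of C.

```python
import math

def step(x):
    return x + 0.1 * math.sin(x)

def step_adjoint(x, x_bar):
    # derivative of step at x, applied to the incoming adjoint
    return (1.0 + 0.1 * math.cos(x)) * x_bar

def forward_taped(x, n, tape):
    """Run n steps, storing each value before it is overwritten."""
    for _ in range(n):
        tape.append(x)
        x = step(x)
    return x

def backward(x_bar, n, tape):
    """Reverse n steps, consuming the tape in reverse order."""
    for _ in range(n):
        x_bar = step_adjoint(tape.pop(), x_bar)
    return x_bar

def grad_store_all(x0, n_c, n_d):
    tape = []
    forward_taped(x0, n_c + n_d, tape)
    peak = len(tape)
    return backward(1.0, n_c + n_d, tape), peak

def grad_checkpointed(x0, n_c, n_d):
    snapshot = x0                      # what C needs in order to run again
    x = x0
    for _ in range(n_c):               # first execution of C: nothing taped
        x = step(x)
    tape = []
    forward_taped(x, n_d, tape)        # D is taped as usual
    peak = len(tape) + 1               # tape of D plus the snapshot
    x_bar = backward(1.0, n_d, tape)
    forward_taped(snapshot, n_c, tape) # second execution of C, now taped
    peak = max(peak, len(tape) + 1)
    x_bar = backward(x_bar, n_c, tape)
    return x_bar, peak
```

With 50 steps in each fragment, both functions return the same derivative, while the peak number of stored values drops from 100 to 51 at the price of running C twice.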

We study this question starting from a formal description of the interactions between all memory storage involved in checkpointing, and from it we derive a system of set equations that we solve formally, using Maple. After resolution, it turns out that there exists a continuum of optimal snapshots, none of which is strictly better (smaller) than the others. We are able to characterize these good solutions, and then to propose heuristics to select the most appropriate one. One solution appears to give the best results in most applications, and it is now the default strategy in tapenade. We presented these results at the ICCS 2006 conference in Reading, UK.

These results on snapshots are an additional contribution to the more general problem of selecting the best possible placement of checkpoints in a given code. This central problem is addressed from two different angles in the PhD research of Benjamin Dauvergne and Mauricio Araya.

Benjamin Dauvergne focuses on finding indicators that can be used to place checkpoints, i.e. to find the nested code fragments that will give the best performance if checkpointed. These indicators can result from a profiling run of the original program or of the differentiated program. One of this year's results is the development of a model that correctly predicts the memory and time behavior of a given placement of checkpoints. Benjamin Dauvergne presented his preliminary results at this spring's euro-AD workshop in Oxford, UK.

At the same time, Mauricio Araya experimented with the new functionality of tapenade that allows the user to select manually which procedure calls must be checkpointed. The results are quite interesting: on one large code, an improved placement of checkpoints reduced execution time by a factor of 10. On average, gains of 20% are commonplace. In turn, studying why one placement gives better results suggests heuristics that can be applied systematically. Mauricio Araya presented this new functionality of tapenade, together with the experiments on real codes, at the ECCOMAS conference in Egmond aan Zee, The Netherlands.

Target language extensions in TAPENADE

Participants: Nicolas Chleq, Laurent Hascoët, Christophe Massol, Valérie Pascual.

Keywords: Automatic Differentiation, Tapenade, Fortran90, C, pointer analysis, data-flow analysis, static analysis.

Following the evolution of programming practices among our end-users, we continuously adapt tapenade to new languages and additional programming constructs. This year's results concern the languages Fortran90 and C. Regarding programming constructs, this year's progress concerns pointers, dynamic allocation, array (``vectorial'') notation, and modular constructs.

The bulk of new developments related to Fortran90 is now complete. tapenade now completely handles the modular constructs of Fortran90, namely modules with public and private components, interfaces, renaming, overloading, and optional or default arguments of procedures. Like last year, these developments were driven by the large Fortran90 applications that we are working on, such as the oceanography code OPA 9.0 ( cf ). Inside tapenade, these constructs do not depend on the particular target language: our objective is that these developments will be reused when tapenade handles other modular languages such as C++.

tapenade also handles the array notation of Fortran90, also known as ``vectorial'' notation. This year's developments have dealt with the arbitrary combinations of WHERE, SUM, and MASK constructs, for which the differentiated code is far from intuitive. In particular we found an interesting duality between WHERE and MASK through reverse AD: the best adjoint code for a WHERE statement will often need SUM's with MASK's, and vice-versa.
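The duality can be made concrete with a small hand-derived example. The sketch below emulates two Fortran90 array statements with plain Python lists and writes their adjoints by hand; the function names and the particular statements (`WHERE (mask) a = c*b` with a scalar `c`, and `s = SUM(a, MASK=mask)`) are assumptions chosen for illustration, not output of tapenade.

```python
def where_scale(mask, a, b, c):
    """Emulates Fortran: WHERE (mask) a = c * b   (c is a scalar)."""
    return [c * bi if m else ai for m, ai, bi in zip(mask, a, b)]

def where_scale_adjoint(mask, b, c, a_bar):
    """Hand-written adjoint of where_scale.
    Returns (adjoint of the old a, adjoint of b, adjoint of c)."""
    # c_bar = SUM(b * a_bar, MASK=mask): the WHERE produces a masked SUM
    c_bar = sum(bi * abi for m, bi, abi in zip(mask, b, a_bar) if m)
    # WHERE (mask) b_bar = c * a_bar
    b_bar = [c * abi if m else 0.0 for m, abi in zip(mask, a_bar)]
    # WHERE (mask) a_bar = 0: overwritten entries no longer depend on old a
    a_bar_in = [0.0 if m else abi for m, abi in zip(mask, a_bar)]
    return a_bar_in, b_bar, c_bar

def masked_sum(mask, a):
    """Emulates Fortran: s = SUM(a, MASK=mask)."""
    return sum(ai for m, ai in zip(mask, a) if m)

def masked_sum_adjoint(mask, s_bar):
    """Hand-written adjoint of masked_sum, starting from a zero a_bar:
    WHERE (mask) a_bar = a_bar + s_bar. The masked SUM produces a WHERE."""
    return [s_bar if m else 0.0 for m in mask]
```

In the first pair, the adjoint of the scalar `c` gathers contributions from all selected elements, which is exactly a SUM with a MASK; in the second pair, the adjoint of the masked SUM scatters one scalar back to the selected elements, which is a WHERE.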

Christophe Massol has finished the development of the pointer static data-flow analysis in tapenade, including memory allocation and deallocation primitives. He has also adapted all subsequent analyses in tapenade, so that they give correct results on programs with pointers. Finally, the tangent differentiation module was also adapted to pointers, introducing the notion of a differentiated pointer variable, required when the pointed target variables are themselves differentiated.

Nicolas Chleq has developed a C front-end and back-end for tapenade. This development takes advantage of the existing architecture of tapenade, which is mostly language-independent in the middle-end: without any extension to the internal analysis and differentiation parts, we could differentiate a small C program. Yet more testing and validation are needed before a tapenade for C can be released.

Differentiation of large real applications

Participants: Samuel Buis (INRA, Avignon), Benjamin Dauvergne, Bruno Ferron (IFREMER, Brest), Laurent Hascoët, Hicham Tber, Arthur Vidard (LMC/IMAG, Grenoble).

Keywords: Automatic Differentiation, Tapenade, Variational Data Assimilation, Agronomy, Oceanography.

We studied the application of Automatic Differentiation to several very large scientific computation codes.

Bruno Ferron of IFREMER Brest gave us the latest Fortran90 version of the ocean simulation code OPA, version 9.0, 80,000 lines long, developed mainly by the LOCEAN lab. at Paris 6 university. We obtained a validated adjoint code for one test configuration of OPA, named GYRE. We were invited to present the results at the Data Assimilation meeting in Toulouse. This configuration simulates the behavior of a large rectangular basin of salt water, under the influence of the wind and of an initial vertical distribution of temperature, over 20 days. With tapenade, we computed the gradient of the heat flux across the northern boundary, with respect to the temperature field 20 days earlier.

Figure shows the computed gradient, which was validated automatically by comparison with divided differences, and validated as well by oceanographers, who recognized in it the classical shapes known as Rossby and Kelvin waves. Computing this gradient takes only about 7 times as long as the simulation itself.
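Validation by divided differences can be sketched as follows. This is a generic, minimal Python illustration, not the validation harness used for OPA: the helper `divided_difference_check`, the toy function `cost`, its hand-written gradient, and the step size `h` are all assumptions introduced here.

```python
import math

def divided_difference_check(f, grad, x, h=1e-6):
    """Return the largest relative gap between the components of `grad`
    and one-sided divided differences (f(x + h*e_i) - f(x)) / h."""
    f0 = f(x)
    worst = 0.0
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h                       # perturb one input at a time
        dd = (f(xp) - f0) / h
        scale = max(abs(dd), abs(grad[i]), 1e-30)
        worst = max(worst, abs(dd - grad[i]) / scale)
    return worst

def cost(x):
    # toy scalar function standing in for a simulation output
    return x[0] * x[1] + math.sin(x[2])

def cost_gradient(x):
    # gradient of `cost`, as an adjoint code would deliver it
    return [x[1], x[0], math.cos(x[2])]
```

A small returned value indicates agreement between the gradient and the divided differences. Note that this check needs one extra evaluation of the function per input component, which is why it serves for validation rather than for computing large gradients.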

Since September 2006, post-doctoral researcher Hicham Tber has been differentiating a much larger configuration of OPA, named NEMO, which simulates the North Atlantic basin over a longer period of time and with more realistic physics.

Colleagues from INRA in Avignon are starting to use tapenade on large agriculture simulation codes. We collaborate with them to understand their specific needs and to correct the errors they find in tapenade. They use AD in two main directions. One is sensitivity analysis for simulation codes such as STICS and ISBA, which simulate growing plants over a period of about one year. The other is the inverse problem of estimating ground parameters from satellite images, using simulation codes such as SAIL or MULTISAIL.

Optimal control

Participants: Frédéric Alauzet (Projet Gamma, INRIA-Rocquencourt), Francois Beux (Scuola Normale Superiore di Pisa, Italy), Alain Dervieux, Laurent Hascoët, Bruno Koobus, Massimiliano Martinelli (Universita di Pisa, Italy), Youssef Mesri (Université de Nice), Mariano Vázquez (Universitat de Girona, Spain).

Keywords: optimum design, optimal control, gradient, adjoint model.

In industry research groups, simulation is well mastered and the next frontier is optimization. This problem is very difficult, because the typical number of optimization parameters is very high, particularly in CFD optimal control. In an industrial context, the optimization of a dozen scalar parameters will not produce an optimal shape for an aircraft, because an accurate description of a shape requires hundreds of parameters. The optimization parameter can be a function defined on a surface or a volume. In the discrete case, the number of parameters depends on the discretization chosen, and is a priori large. Therefore, optimization requires enormous computing power. We discuss in why the reverse mode of AD is an elegant way to obtain the adjoints that optimization uses. The reverse mode, and the subsequent adjoint state, are in fact the best way to get the gradients when the number of parameters is large.
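The reason reverse mode suits a large number of parameters can be illustrated with a tiny tape-based implementation. The sketch below is a minimal Python illustration under stated assumptions: the `Tape` class, its `add` and `mul` operations, and the toy cost function (a sum of squares) are hypothetical and unrelated to the source-transformation approach of tapenade. It shows that one forward evaluation followed by one backward sweep yields all partial derivatives at once.

```python
class Tape:
    """Records each operation with its local partial derivatives."""

    def __init__(self):
        self.values = []    # value of every node, in creation order
        self.parents = []   # for every node: tuple of (parent index, partial)

    def new(self, value, parents=()):
        self.values.append(value)
        self.parents.append(parents)
        return len(self.values) - 1

    def add(self, i, j):
        return self.new(self.values[i] + self.values[j], ((i, 1.0), (j, 1.0)))

    def mul(self, i, j):
        return self.new(self.values[i] * self.values[j],
                        ((i, self.values[j]), (j, self.values[i])))

    def gradient(self, out):
        """One backward sweep: adjoints of all nodes with respect to `out`."""
        bar = [0.0] * len(self.values)
        bar[out] = 1.0
        for k in range(out, -1, -1):        # reverse of the creation order
            for p, partial in self.parents[k]:
                bar[p] += partial * bar[k]
        return bar


def sum_of_squares_gradient(x):
    """Gradient of J(x) = sum_i x_i^2 from a single reverse sweep."""
    tape = Tape()
    inputs = [tape.new(xi) for xi in x]
    acc = tape.mul(inputs[0], inputs[0])
    for i in inputs[1:]:
        acc = tape.add(acc, tape.mul(i, i))
    bar = tape.gradient(acc)
    return [bar[i] for i in inputs]
```

Whatever the length of `x`, the gradient comes from one pass over the tape, whereas divided differences or the tangent mode would need one run per parameter.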

In the European project HISAC on supersonic aircraft, the state equation cannot be solved accurately without strong anisotropic mesh adaptation. Therefore, we have to design a new algorithm for the simultaneous solution of shape optimisation and mesh adaptation.

Besides specific AD problems, practical application to control problems requires that we consider the following issues, discussed this year in , , :

efficient computation of a large-scale adjoint system

efficient optimization algorithms for large-scale systems

efficient preconditioners for this optimization.

Mesh adaptation

Participants: Frederic Alauzet, Alain Dervieux, Bruno Koobus, Adrien Loseille (Projet Gamma, INRIA-Rocquencourt), Youssef Mesri.

Keywords: optimization, mesh adaptation, adjoint.

This subject is addressed jointly by INRIA teams GAMMA (Rocquencourt), TROPICS, and SMASH. In this collaboration, GAMMA brings mesh and approximation expertise, TROPICS contributes AD and adjoint methods, SMASH works on approximation and CFD applications.

Solving the optimum problem with an AD-generated adjoint can also be applied in a slightly different context than optimal shape design, namely mesh adaptation. This is possible if we can map the mesh adaptation problem into a differentiable optimal control problem. To this end, we have introduced a new methodology that recasts the mesh adaptation problem as a purely functional one: the mesh is reduced to a continuous property of the computational domain, the continuous metric, and we minimize a continuous model of the error resulting from that metric. The problem of searching for an adapted mesh is thus transformed into the search for an optimal metric.

In the case of mesh interpolation minimization, the optimum is given by a closed-form formula, which gives access to a complete theory demonstrating that second-order accuracy can be obtained on the approximation of discontinuous fields. In the case of adaptation for Partial Differential Equations, an optimal control problem is obtained. Together with project-team GAMMA (Frédéric Alauzet, Adrien Loseille), TROPICS contributes this research to the HISAC IP European project, which involves 30 other partners in aeronautics.

Dissemination

Links with Industry, Contracts

Mike Giles from Oxford University and his student Devendra Ghate continue using tapenade to build second-order derivative programs in relation with Rolls-Royce turbine developments.

TROPICS participates in the European project HISAC, which started at the end of last year.

TROPICS participates in the project EVA-Flo: ``Evaluation et Validation Automatique pour le calcul FLOttant'', which is an ANR project accepted this year, and whose main contractor is ENS Lyon (Nathalie Revol).

TROPICS participates in the project LEFE, ``Les Enveloppes Fluides et l'Environnement'', which is a CNRS API project accepted this year.

TROPICS participates in the project NODESIM, ``Non-Deterministic Simulation for CFD-based design methodologies'', which is a European STREP project accepted this year.

TROPICS has submitted an INRIA-internal proposal for ``équipes associées'' with our partners in RWTH Aachen, whose results will be known shortly.

We are aware of regular use of tapenade by research groups in Argonne National Lab. (USA), Cranfield university (UK), Oxford university (UK), RWTH Aachen (Germany), and Humboldt university Berlin (Germany).

Conferences and workshops

Alain Dervieux presented his research on March 14 when receiving the Dassault prize of the French Academy of Sciences.

Laurent Hascoët presented AD and tapenade at the International Conference on High Performance Scientific Computing HPSC 2006 in Hanoi, Vietnam.

Alain Dervieux gave a presentation at the CANUM 2006 conference, where he co-organized a minisymposium on Optimum Design together with J. Sokolowsky.

Alain Dervieux and Bruno Koobus visited the Barcelona Computer Center, and gave two presentations, respectively on Optimal Shape Design and on the VMS turbulence model.

Laurent Hascoët gave an invited talk to present results on reverse AD of the OPA 9.0 oceanography code at the ``Colloque National sur l'Assimilation de Données'' in Toulouse.

Laurent Hascoët presented the team's results on the Data-Flow Equations of Checkpointing at ICCS 2006 in Reading, UK.

Laurent Hascoët is one of the organizers of the euro-AD workshops. This year, one edition took place in Oxford, UK, and one in Aachen, Germany. Benjamin Dauvergne presented some preliminary results at the Oxford edition in June.

Laurent Hascoët gave a lecture on Automatic Differentiation at the CEA-EDF-INRIA ``école d'été'' on numerical analysis in June.

Mauricio Araya gave a presentation and Alain Dervieux was co-author of another at ECCOMAS 2006 in Egmond aan Zee, The Netherlands.

Laurent Hascoët presented the team's research to colleagues at INRA Avignon, with the medium-term goal of introducing AD into the popular INRA codes.

The TROPICS team co-organizes the French-Indian workshop ``Numerical Simulation, Control and Design for Aeronautical and Space Applications'' at INRIA Sophia-Antipolis from November 29th through December 1st, 2006. Alain Dervieux and Laurent Hascoët will both make presentations.

Mauricio Araya defended his PhD thesis on ``Approaches to Assess Validity of Derivatives and to Improve Efficiency in Automatic Differentiation of Programs'', on November 24th in Sophia-Antipolis.

Optimisation Différentiable en Mécanique des Fluides Numérique. F. Courty. Ph. D. Thesis, Université Paris-sud, 2003.

Reverse automatic differentiation for optimum design: from adjoint state assembly to gradient computation. F. Courty, A. Dervieux, B. Koobus, L. Hascoët. Optimization Methods and Software, ISSN 1055-6788, 18(5), 2003, 615-627.

Optimization loops for shape and error control. A. Dervieux, L. Hascoët, M. Vázquez, B. Koobus. Recent Trends in Aerospace Design and Optimization, Tata-McGraw Hill, New Delhi, 2005, 363-373.

Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. A. Griewank. SIAM, Frontiers in Applied Mathematics, 2000.

The Adjoint Data-Flow Analyses: Formalization, Properties, and Applications. L. Hascoët, M. Araya-Polo. In H. M. Bücker, G. Corliss, P. Hovland, U. Naumann, B. Norris (eds.), Automatic Differentiation: Applications, Theory, and Tools, Lecture Notes in Computational Science and Engineering, Springer, 2005.

Adjoining Independent Computations. L. Hascoët, S. Fidanova, C. Held. In G. Corliss, C. Faure, A. Griewank, L. Hascoët, U. Naumann (eds.), Automatic Differentiation of Algorithms: From Simulation to Optimization, Computer and Information Science 35, Springer, New York, NY, 2001, 299-304.

Analyses statiques et transformations de programmes: de la parallélisation à la différentiation. L. Hascoët. Habilitation, Université de Nice Sophia-Antipolis, 2005.

``To Be Recorded'' Analysis in Reverse-Mode Automatic Differentiation. L. Hascoët, U. Naumann, V. Pascual. Future Generation Computer Systems, ISSN 0167-739X, 21(8), 2004.

TAPENADE 2.1 user's guide. L. Hascoët, V. Pascual. Technical report 300, INRIA, 2004. http://hal.inria.fr/inria-00069880

Multilevel optimization of a supersonic aircraft. M. Vázquez, A. Dervieux, B. Koobus. Finite Elements in Analysis and Design, ISSN 0168-874X, 40, 2004, 2101-2124.

Approaches to assess Validity of Derivatives and to Improve Efficiency in Automatic Differentiation of Programs. M. Araya-Polo. PhD, Université de Nice Sophia-Antipolis, 2006.

Positivity statements for a mixed-element-volume scheme on fixed and moving grids. P.-H. Cournède, B. Koobus, A. Dervieux. Revue européenne de mécanique numérique, ISSN 1779-7179, 15(7-8), 2006, 767-799.

Multilevel functional Preconditioning for shape optimisation. F. Courty, A. Dervieux. International Journal of CFD, ISSN 1061-8562, 20(7), 2006, 481-490.

Continuous metrics and mesh optimization. F. Courty, D. Leservoisier, P.-L. George, A. Dervieux. Applied Numerical Mathematics, ISSN 0168-9274, 56, 2006, 117-145.

Optimization loops for shape and error control. A. Dervieux, F. Courty, T. Roy, M. Vázquez, B. Koobus. In B. Bugeda et al. (eds.), Verification and validation methods for challenging multiphysics problems, CIMNE, Barcelona, 2006. Extended version in INRIA Research Report 5413.

A methodology for the shape optimization of flexible wings. M. Vázquez, A. Dervieux, B. Koobus. Engineering Computations, ISSN 0264-4401, 23(4), 2006, 344-367.

The Data-Flow Equations of Checkpointing in reverse Automatic Differentiation. B. Dauvergne, L. Hascoët. International Conference on Computational Science, ICCS 2006, Reading, UK, 2006.

Calculs de sensibilité par différentiation pour l'Aérodynamique. A. Dervieux, Y. Mesri, F. Courty, L. Hascoët, B. Koobus, M. Vázquez. Proceedings of CANUM06, ESAIM journal, to appear, 2006.

Data Representation Alternatives in Semantically Augmented Numerical Models. M. Fagan, L. Hascoët, J. Utke. 6th IEEE International Workshop on Source Code Analysis and Manipulation, SCAM 2006, Philadelphia, PA, USA, 2006.

Capacités actuelles de la Différentiation Automatique: l'adjoint d'OPA par TAPENADE. B. Ferron, L. Hascoët. Colloque National sur l'Assimilation de Données, Toulouse, France, 2006.

Enabling User-driven checkpointing strategies in Reverse-mode Automatic Differentiation. L. Hascoët, M. Araya-Polo. Proceedings of the ECCOMAS CFD 2006 conference, Egmond aan Zee, The Netherlands, 2006. Also INRIA Research Report #5930. https://hal.inria.fr/inria-00079223

Sonic Boom reduction by mesh adapted optimal control. F. Alauzet, Y. Mesri, A. Dervieux. Research Report, European project HISAC, 18th Month, 2.3, 2006.

Compilers: Principles, Techniques and Tools. A. Aho, R. Sethi, J. Ullman. Addison-Wesley, 1986.

A language and an integrated environment for program transformations. I. Attali, V. Pascual, C. Roudet. Research report 3313, INRIA, 1997. http://hal.inria.fr/inria-00073376

ADIFOR 3.0 overview. A. Carle, M. Fagan. Technical report CAAM-TR-00-02, Rice University, 2000.

Natural semantics on the computer. D. Clément, J. Despeyroux, L. Hascoët, G. Kahn. In K. Fuchi and M. Nivat, editors, Proceedings, France-Japan AI and CS Symposium, ICOT, 1986, 49-89. Also Information Processing Society of Japan, Technical Memorandum PL-86-6. Also INRIA research report #416. http://hal.inria.fr/inria-00076140

Reasoning about program transformations. J.-F. Collard. Springer, 2002.

Abstract Interpretation. P. Cousot. ACM Computing Surveys, ISSN 0360-0300, 28(1), 1996, 324-328.

Interprocedural Array Region Analyses. B. Creusillet, F. Irigoin. International Journal of Parallel Programming, ISSN 0885-7458, 24(6), 1996, 513-546.

Tangent linear and Adjoint Model Compiler, Users manual 1.2. R. Giering. 1997. http://www.autodiff.com/tamc

Automatic differentiation and iterative processes. J.C. Gilbert. Optimization Methods and Software, ISSN 1055-6788, 1, 1992, 13-21.

Adjoint methods for aeronautical design. M.-B. Giles. Proceedings of the ECCOMAS CFD Conference, 2001.

Reduced Gradients and Hessians from Fixed Point Iteration for State Equations. A. Griewank, Ch. Faure. Numerical Algorithms, ISSN 1017-1398, 30(2), 2002, 113-139.

Transformations automatiques de spécifications sémantiques: application: Un vérificateur de types incremental. L. Hascoët. Ph. D. Thesis, Université de Nice Sophia-Antipolis, 1987.

Automatic Differentiation of Navier-Stokes computations. P. Hovland, B. Mohammadi, C. Bischof. Technical report MCS-P687-0997, Argonne National Laboratory, 1997.

Improved estimates of vegetation biophysical variables from MERIS TOA images by using spatial and temporal constraints. C. Lauvernet, F. Baret, L. Hascoët, F.-X. LeDimet. Proceedings of the 9th International symposium on Physical measurements and signatures in remote sensing, ISPMSRS 2005, 2005.

Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects. F.X. LeDimet, O. Talagrand. Tellus, 38A, 1986, 97-110.

Practical application to fluid flows of automatic differentiation for design problems. B. Mohammadi. Von Karman Lecture Series, 1997.

Différentiation Automatique: application à un problème d'optimisation en météorologie. N. Rostaing. Ph. D. Thesis, Université de Nice Sophia-Antipolis, 1993.

Symbolic Bounds Analysis of Pointers, Array Indices, and Accessed Memory Regions. R. Rugina, M. Rinard. Proceedings of the ACM SIGPLAN'00 Conference on Programming Language Design and Implementation, ACM, 2000.