The TROPICS team is at the junction of two research domains:
AD: On one hand, we study software engineering techniques to analyze and transform programs semi-automatically. In the past, we developed semi-automatic parallelization strategies aiming at SPMD parallelization. Presently, we focus on Automatic Differentiation (AD). AD transforms a program $P$ that computes a function $F$ into a program $P'$ that computes some derivatives of $F$, analytically. In particular, the so-called reverse mode of AD yields gradients. However, the reverse mode remains very delicate to use, and its implementation requires carefully crafted algorithms.
CFD application of AD: On the other hand, we study the application of AD, and particularly of the adjoint method, to Computational Fluid Dynamics. This requires adapting the optimization strategies. This work applies to two real-life problems: optimal shape design and mesh adaptation.
The second aspect of our work (optimization in Scientific Computing) is thus at the same time the motivation and the application domain of the first aspect (program analysis and transformation, and computation of gradients through AD). Concerning AD, our goal is to automatically produce derivative programs that can compete with the hand-written sensitivity and adjoint programs that exist in industry. We implement our ideas and algorithms in the tool tapenade, which is developed and maintained by the project. Apart from being an AD tool, tapenade is also a platform for other analyses and transformations of scientific programs. tapenade is easily available: we provide a web server, and alternatively a version can be downloaded from our FTP server. Practical details can be found in section .
Our present research directions are:
Modern numerical methods for finite elements or finite differences: multigrid methods, mesh adaptation.
Optimal shape design or optimal control in the context of fluid dynamics: for example, shape optimization of the wings of a supersonic aircraft to reduce the sonic boom. In this context, we study the optimization of unsteady processes and the use of higher-order derivatives for robust optimization.
Automatic Differentiation: differentiate particular algorithms in a specially adapted manner, validate the derivatives, and reduce runtime and memory consumption when computing gradients (``adjoints'') with the reverse mode and second-order derivatives, targeting application to large non-stationary simulation codes.
Common tools for program analysis and transformation: adequate internal representation, Call Graphs, Flow Graphs, Data-Dependence Graphs.
Automatic Differentiation (AD): automatic transformation of a program that returns a new program computing some derivatives of the given initial program, i.e. some combination of the partial derivatives of the program's outputs with respect to its inputs.
Adjoint: mathematical manipulation of the Partial Differential Equations that define a problem, obtaining new differential equations that define the gradient of the original problem's solution.
Checkpointing: a general trade-off technique, used in the reverse mode of AD, that trades repeated execution of part of the program for memory space otherwise used to store intermediate results. Checkpointing a code fragment amounts to running this fragment without storing any intermediate values, thus saving memory space. Later, when such an intermediate value is required, the fragment is run a second time to obtain the required values.
Automatic or Algorithmic Differentiation (AD) differentiates programs. An AD tool takes as input a source computer program $P$ that, given a vector argument $X \in \mathbb{R}^n$, computes some vector function $Y = F(X) \in \mathbb{R}^m$. The AD tool generates a new source program that, given the argument $X$, computes some derivatives of $F$. In short, AD first assumes that $P$ represents all its possible run-time sequences of instructions, and it will in fact differentiate these sequences. Therefore, the control of $P$ is put aside temporarily, and AD will simply reproduce this control into the differentiated program. In other words, $P$ is differentiated only piecewise. Experience shows that this is reasonable in most cases, and going further is still an open research problem. Then, any sequence of instructions is identified with a composition of vector functions. Thus, for a given control:
$$P : \{I_1; I_2; \ldots I_{p-1}; I_p\}, \qquad F = f_p \circ f_{p-1} \circ \cdots \circ f_1,$$
where each $f_k$ is the elementary function implemented by instruction $I_k$. Finally, AD simply applies the chain rule to obtain derivatives of $F$. Let us call $X_k$ the values of all variables after each instruction $I_k$, i.e. $X_0 = X$ and $X_k = f_k(X_{k-1})$. The chain rule gives the Jacobian $F'$ of $F$:
$$F'(X) = f'_p(X_{p-1}) \cdot f'_{p-1}(X_{p-2}) \cdot \ldots \cdot f'_1(X_0),$$
which can be mechanically translated back into a sequence of instructions $I'_k$, and these sequences inserted back into the control of $P$, yielding program $P'$. This can be generalized to higher-order derivatives, Taylor series, etc.
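The composition of instructions and the chain rule described above can be illustrated with a minimal Python sketch. It assumes a hypothetical straight-line program on a two-variable state $X = (a, b)$, namely `a = a*b; b = sin(a); a = a + 2*b`, where each instruction $I_k$ is given as its elementary function $f_k$ together with a hand-written local Jacobian $f'_k(X_{k-1})$. The names `PROGRAM` and `run_with_jacobian` are illustrative only; the sketch just accumulates the product of local Jacobians while running the instructions, and is not how tapenade generates code.

```python
import math

# Hypothetical program P acting on the state X = [a, b].
# Each instruction I_k is a pair (f_k, jac_k): its elementary function
# and its local Jacobian f_k'(X_{k-1}), both written by hand here.

def f1(x):  # I1: a = a * b
    return [x[0] * x[1], x[1]]

def jac1(x):
    return [[x[1], x[0]],
            [0.0, 1.0]]

def f2(x):  # I2: b = sin(a)
    return [x[0], math.sin(x[0])]

def jac2(x):
    return [[1.0, 0.0],
            [math.cos(x[0]), 0.0]]

def f3(x):  # I3: a = a + 2*b
    return [x[0] + 2.0 * x[1], x[1]]

def jac3(x):
    return [[1.0, 2.0],
            [0.0, 1.0]]

PROGRAM = [(f1, jac1), (f2, jac2), (f3, jac3)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def run_with_jacobian(x0):
    """Return F(X) = f_p(...f_1(X)) and F'(X) = f_p'(X_{p-1}) ... f_1'(X_0)."""
    x = list(x0)
    jac = [[1.0, 0.0], [0.0, 1.0]]       # identity: Jacobian of the empty program
    for f, local_jac in PROGRAM:
        jac = matmul(local_jac(x), jac)  # left-multiply by f_k'(X_{k-1})
        x = f(x)                         # X_k = f_k(X_{k-1})
    return x, jac
```

For instance, `run_with_jacobian([1.0, 2.0])` returns a Jacobian whose first row is $(2 + 4\cos 2,\ 1 + 2\cos 2)$. On real codes, accumulating full matrices this way quickly becomes too expensive.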
In practice, the above Jacobian $F'(X)$ is often far too expensive to compute and store. Notice for instance that the chain-rule product above repeatedly multiplies matrices whose size is of the order of $m \times n$. Moreover, some problems are solved using only some projections of $F'(X)$. For example, one may need only sensitivities, which are $F'(X) \cdot \dot{X}$ for a given direction $\dot{X}$ in the input space. Using the chain rule, the sensitivity is
$$F'(X) \cdot \dot{X} = f'_p(X_{p-1}) \cdot f'_{p-1}(X_{p-2}) \cdot \ldots \cdot f'_1(X_0) \cdot \dot{X},$$
which is easily computed from right to left, interleaved with the original program instructions. This is the principle of the tangent mode of AD, which is the most straightforward one and is of course available in tapenade.
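The tangent mode can be sketched on a hypothetical three-instruction program `a = a*b; b = sin(a); a = a + 2*b`. In this illustrative Python version, each original statement is preceded by its derivative statement, which updates the direction $\dot{X} = (\dot{a}, \dot{b})$, here named `ad` and `bd`. The product of local Jacobians applied to $\dot{X}$ is thus evaluated from right to left, i.e. in program order. The naming and layout only loosely mimic AD-generated tangent code; they are assumptions, not tapenade output.

```python
import math

def f_tangent(a, b, ad, bd):
    """Tangent-mode sketch for the hypothetical program
        a = a*b ; b = sin(a) ; a = a + 2*b
    Returns (F(X), F'(X) . Xdot) with Xdot = (ad, bd)."""
    # I1' then I1: the derivative statement comes first,
    # because it needs the values of a and b before I1 overwrites a.
    ad = ad * b + a * bd
    a = a * b
    # I2' then I2
    bd = math.cos(a) * ad
    b = math.sin(a)
    # I3' then I3
    ad = ad + 2.0 * bd
    a = a + 2.0 * b
    return (a, b), (ad, bd)
```

Calling `f_tangent(1.0, 2.0, 1.0, 0.0)` yields the first column of the Jacobian, $(2 + 4\cos 2,\ 2\cos 2)$. Obtaining the full Jacobian this way takes one run per input direction.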
However, in optimization, data assimilation, adjoint problems, or inverse problems, the appropriate derivative is the gradient $F'^*(X) \cdot \overline{Y}$. Using the chain rule, the gradient is
$$F'^*(X) \cdot \overline{Y} = f'^*_1(X_0) \cdot f'^*_2(X_1) \cdot \ldots \cdot f'^*_{p-1}(X_{p-2}) \cdot f'^*_p(X_{p-1}) \cdot \overline{Y},$$
which is most efficiently computed from right to left, because matrix × vector products are much cheaper than matrix × matrix products. This is the principle of the reverse mode of AD.
This turns out to make a very efficient program, at least theoretically (Section 3.4). The computation time required for the gradient is only a small multiple of the run-time of $P$, and it is independent of the number of parameters $n$. In contrast, computing the same gradient with the tangent mode would require running the tangent differentiated program $n$ times.
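The reverse mode can be sketched on a hypothetical three-instruction program `a = a*b; b = sin(a); a = a + 2*b`. This illustrative Python version uses a stack in the Store-All style that tapenade adopts (described below): the forward sweep runs the original statements and pushes each variable just before it is overwritten, and the backward sweep pops these values while applying the transposed local Jacobians, in reverse order, to the weights $\overline{Y}$, here named `ab` and `bb`. The function and variable names are assumptions made for the example.

```python
import math

def f_reverse(a, b, ybar_a, ybar_b):
    """Store-All reverse-mode sketch for the hypothetical program
        a = a*b ; b = sin(a) ; a = a + 2*b
    Returns F'(X)^T . Ybar, with Ybar = (ybar_a, ybar_b)."""
    stack = []
    # Forward sweep: original statements, pushing each overwritten value.
    stack.append(a)
    a = a * b
    stack.append(b)
    b = math.sin(a)
    stack.append(a)
    a = a + 2.0 * b
    # Backward sweep: adjoint statements in reverse order.
    ab, bb = ybar_a, ybar_b
    # Adjoint of I3 (a = a + 2*b): restore a; ab is unchanged.
    a = stack.pop()
    bb += 2.0 * ab
    # Adjoint of I2 (b = sin(a)): restore b; a holds its value before I2.
    b = stack.pop()
    ab += math.cos(a) * bb
    bb = 0.0
    # Adjoint of I1 (a = a*b): restore a; b already holds its value before I1.
    a = stack.pop()
    bb += a * ab
    ab = b * ab
    return ab, bb
```

With $\overline{Y} = (1, 0)$, one call returns the whole gradient of the first output, $(2 + 4\cos 2,\ 1 + 2\cos 2)$, in a single backward sweep, whatever the number of inputs.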
We can observe that the $X_k$ are required in the inverse of their computation order. If the original program overwrites a part of $X_k$, the differentiated program must restore $X_k$ before it is used by $f'^*_{k+1}(X_k)$. There are two strategies for that:
Recompute All (RA): $X_k$ is recomputed when needed, by restarting $P$ on input $X_0$ until instruction $I_k$. The taf tool uses this strategy. The brute-force RA strategy has a quadratic time cost with respect to the total number of run-time instructions $p$.
Store All (SA): the $X_k$ are restored from a stack when needed. This stack is filled during a preliminary run of $P$ that additionally stores variables on the stack just before they are overwritten. The adifor and tapenade tools use this strategy. The brute-force SA strategy has a linear memory cost with respect to $p$.
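The cost difference between the two strategies can be shown with a small Python sketch. For simplicity it assumes that each instruction $I_k$ overwrites the whole state, which is a single number, so that every $X_{k-1}$ must be restored before the adjoint of $I_k$. The function names are hypothetical. Recompute All reruns a prefix of the program for each restoration, while Store All keeps one stack entry per instruction.

```python
from typing import Callable, List, Tuple

Step = Callable[[float], float]

def restore_recompute_all(steps: List[Step], x0: float) -> Tuple[List[float], int]:
    """RA: for k = p..1, rebuild X_{k-1} by rerunning I_1..I_{k-1} from X_0.
    Returns the restored states, in the order the backward sweep needs them,
    and the number of instruction executions spent on recomputation."""
    restored, executed = [], 0
    for k in range(len(steps), 0, -1):
        x = x0
        for f in steps[:k - 1]:
            x = f(x)
            executed += 1
        restored.append(x)
    return restored, executed

def restore_store_all(steps: List[Step], x0: float) -> Tuple[List[float], int]:
    """SA: push X_{k-1} before I_k overwrites it, then pop in reverse order.
    Returns the restored states and the peak stack size."""
    stack: List[float] = []
    x = x0
    for f in steps:
        stack.append(x)
        x = f(x)
    peak = len(stack)
    restored = [stack.pop() for _ in range(peak)]
    return restored, peak
```

On six instructions, both functions restore the same states, but RA spends $0 + 1 + \cdots + 5 = 15$ extra instruction executions (quadratic in $p$), while SA holds 6 values on its stack (linear in $p$).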
Both RA and SA strategies need a special storage/recomputation trade-off in order to be really profitable, and this makes them very similar in practice. This trade-off is called checkpointing. Since tapenade uses the SA strategy, let us describe checkpointing in this context. The plain SA strategy applied to instructions $I_1$ to $I_p$ builds the differentiated program sketched in the corresponding figure, where an initial ``forward sweep'' runs the original program and stores intermediate values (black dots), and is followed by a ``backward sweep'' that computes the derivatives in the reverse order, using the stored values when necessary (white dots). Checkpointing a fragment $C$ of the program is illustrated in the corresponding figure. During the forward sweep, no value is stored while in $C$. Later, when the backward sweep needs values from $C$, the fragment is run again, this time with storage. One can see that the maximum storage space is roughly divided by 2. This also requires some extra storage (a ``snapshot'') to restore the initial context of $C$. This snapshot is shown in the figure by slightly bigger black and white dots.
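Checkpointing a fragment $C$ can be sketched with the same simplified model, in which the state is a single number overwritten by every instruction. Here $C$ is assumed to be the first `split` instructions, so its snapshot is just $X_0$; in real programs a snapshot is a set of variables. The function name is hypothetical.

```python
from typing import Callable, List, Tuple

Step = Callable[[float], float]

def restore_with_checkpoint(steps: List[Step], x0: float,
                            split: int) -> Tuple[List[float], int, int]:
    """Store-All restoration where steps[:split] is a checkpointed fragment C.
    Returns restored states, instruction executions, and peak storage
    (stack entries plus the snapshot)."""
    stack: List[float] = []
    peak = executed = 0
    snapshot = x0                     # initial context of C
    x = x0
    for f in steps[:split]:           # forward sweep through C: nothing stored
        x = f(x)
        executed += 1
    for f in steps[split:]:           # rest of the forward sweep: store all
        stack.append(x)
        peak = max(peak, len(stack))
        x = f(x)
        executed += 1
    restored = [stack.pop() for _ in range(len(stack))]   # backward sweep after C
    x = snapshot                      # restore C's context and rerun it, storing
    for f in steps[:split]:
        stack.append(x)
        peak = max(peak, len(stack))
        x = f(x)
        executed += 1
    restored += [stack.pop() for _ in range(len(stack))]  # backward sweep over C
    return restored, executed, peak + 1
```

With six instructions and `split=3`, the peak drops from the 6 values plain Store All would keep to 3 stack entries plus the snapshot, at the cost of rerunning the 3 instructions of $C$.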
Checkpoints can be nested. In that case, a clever choice of checkpoints can make both the memory size and the recomputation overhead grow only like the logarithm of the run-time length of the program.
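Nesting can be illustrated by applying the same idea recursively, again with a single-number state overwritten by every instruction. The Python sketch below uses simple bisection: to reverse a segment, it advances to the midpoint without storing, reverses the second half, then restarts from the segment's snapshot to reverse the first half. This is an assumed, non-optimal schedule chosen for brevity (not, for example, a binomial schedule), and the function names are hypothetical. It still shows logarithmic behavior: when $p$ is a power of two, at most $\log_2 p + 1$ states are held at once, and $(p/2)\log_2 p$ instruction executions are spent.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[float], float]

def _reverse_segment(steps: List[Step], x: float, lo: int, hi: int,
                     stats: Dict[str, int], depth: int) -> List[float]:
    """Return X_{k-1} for k = hi..lo+1, given the snapshot x = X_lo."""
    if hi <= lo:
        return []
    if hi - lo == 1:
        # depth = number of snapshots alive along the recursion chain
        stats["peak"] = max(stats["peak"], depth)
        return [x]
    mid = (lo + hi) // 2
    y = x
    for f in steps[lo:mid]:           # advance to X_mid without storing
        y = f(y)
        stats["executed"] += 1
    restored = _reverse_segment(steps, y, mid, hi, stats, depth + 1)
    restored += _reverse_segment(steps, x, lo, mid, stats, depth + 1)
    return restored

def restore_nested(steps: List[Step], x0: float) -> Tuple[List[float], int, int]:
    """Bisection checkpointing: returns restored states, instruction
    executions, and the peak number of states held simultaneously."""
    stats = {"executed": 0, "peak": 0}
    restored = _reverse_segment(steps, x0, 0, len(steps), stats, 1)
    return restored, stats["executed"], stats["peak"]
```

Doubling $p$ from 8 to 16 raises the peak from 4 to 5 stored states, whereas plain Store All would go from 8 to 16.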
Abstract Syntax Tree: tree representation of a computer program that keeps only the semantically significant information and abstracts away syntactic sugar such as indentation, parentheses, or separators.
Control Flow Graph: representation of a procedure body as a directed graph whose nodes, known as basic blocks, each contain a list of instructions to be executed in sequence, and whose arcs represent all possible control jumps that can occur at run-time.
Abstract Interpretation: model that describes static program analyses as a special sort of execution, in which all branches of control switches are taken simultaneously, and where computed values are replaced by abstract values from a given semantic domain. Each particular analysis gives rise to a specific semantic domain.
Data Flow Analysis: program analysis that studies how a given property of variables evolves with execution of the program. Data flow analyses are static, therefore studying all possible run-time behaviors and making conservative approximations. A typical data flow analysis detects whether a variable is initialized or not at any location in the source program.
Data Dependence Analysis: program analysis that studies the itinerary of values during program execution, from the place where a value is generated to the places where it is used, and finally to the place where it is overwritten. The collection of all these itineraries is often stored as a data dependence graph, and data flow analyses most often rely on this graph.
Data Dependence Graph: directed graph that relates accesses to program variables, from the write access that defines a new value to the read accesses that use this value, and conversely from the read accesses to the write access that overwrites this value. Dependences express a partial order between operations, which must be respected to preserve the program's result.
The most obvious example of a program transformation tool is certainly a compiler. Other examples are program translators, that go from one language or formalism to another, or optimizers, that transform a program to make it run better. AD is just one such transformation. These tools use sophisticated analyses to improve the quality of the produced code. These tools share their technological basis. More importantly, there are common mathematical models to specify and analyze them.
An important principle is abstraction: the core of a compiler should not bother about syntactic details of the compiled program. In particular, it is desirable that the optimization and code generation phases be independent of the particular input programming language. This can generally be achieved through separate front-ends that produce an internal, language-independent representation of the program, generally an abstract syntax tree. For example, compilers like gcc for c and g77 for fortran77 have separate front-ends but share most of their back-end.
One can go further. As abstraction goes on, the internal representation becomes more language independent, and semantic constructs such as declarations, assignments, calls, IO operations, can be unified. Analyses can then concentrate on the semantics of a small set of constructs. We advocate an internal representation composed of three levels.
At the top level is the call graph, whose nodes are the procedures. There is an arrow from node $A$ to node $B$ iff $A$ possibly calls $B$. Recursion leads to cycles. The call graph captures the notions of visibility scope between procedures that come from modules or classes.
At the middle level is the control flow graph. There is one flow graph per procedure, i.e. per node in the call graph. The flow graph captures the control flow between atomic instructions. Flow control instructions are represented uniformly inside the control flow graph.
At the lowest level are abstract syntax trees for the individual atomic instructions. Certain semantic transformations can benefit from the representation of expressions as directed acyclic graphs, sharing common sub-expressions.
To each basic block is associated a symbol table that gives access to properties of variables, constants, function names, type names, and so on. Symbol tables must be nested to implement lexical scoping.
Static program analyses can be defined on this internal representation, which is largely language-independent. The simplest analyses on trees can be specified with inference rules. But many analyses are more complex, and are thus better defined on graphs than on trees. This is the case for data flow analyses, which look for run-time properties of variables. Since flow graphs are cyclic, these global analyses generally require an iterative resolution. Data flow equations are a practical formalism to describe data flow analyses. Another formalism is described in , which is more precise because it can distinguish separate instances of instructions. However, it is still based on trees, and its cost forbids application to large codes. Abstract Interpretation is a theoretical framework to study the complexity and termination of these analyses.
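As an illustration of data flow equations solved iteratively on a cyclic flow graph, here is a Python sketch of a ``definitely initialized variables'' analysis. It assumes a hypothetical flow graph given as a dictionary from basic-block names to the set of variables each block writes, plus a successor map, and iterates the equations $\mathrm{IN}(b) = \bigcap_{q \in \mathrm{pred}(b)} (\mathrm{IN}(q) \cup \mathrm{WRITE}(q))$ from an optimistic start until nothing changes. The function name and graph encoding are illustrative choices.

```python
from typing import Dict, FrozenSet, List, Set

def definitely_initialized(writes: Dict[str, Set[str]],
                           succ: Dict[str, List[str]],
                           entry: str,
                           params: FrozenSet[str] = frozenset()) -> Dict[str, Set[str]]:
    """Forward 'must' analysis solved iteratively to a fixpoint:
    IN(b) = intersection over predecessors q of (IN(q) | WRITE(q))."""
    preds: Dict[str, List[str]] = {b: [] for b in writes}
    for b, targets in succ.items():
        for s in targets:
            preds[s].append(b)
    universe = set(params).union(*writes.values())
    IN = {b: set(universe) for b in writes}     # optimistic start
    IN[entry] = set(params)                     # only parameters are set on entry
    changed = True
    while changed:                              # iterate until nothing changes
        changed = False
        for b in writes:
            if b == entry or not preds[b]:
                continue
            new = set(universe)
            for q in preds[b]:
                new &= IN[q] | writes[q]        # OUT(q) = IN(q) | WRITE(q)
            if new != IN[b]:
                IN[b] = new
                changed = True
    return IN
```

On a graph where `B0` sets `n`, `B1` is a loop header, the loop body `B2` sets `i` and `s`, and `B3` is the exit, the analysis reports that only `n` is definitely initialized at `B3`, because the loop may execute zero times.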
Data flow analyses must be carefully designed to avoid or control combinatorial explosion. The classical solution is to choose a hierarchical model. In this model, information, or at least a computationally expensive part of it, is synthesized. Specifically, it is computed bottom up, starting on the lowest (and smallest) levels of the program representation and then recursively combined at the upper (and larger) levels. Consequently, this synthesized information must be made independent of the context (i.e., the rest of the program). When the synthesized information is built, it is used in a final pass, essentially top down and context dependent, that propagates information from the ``extremities'' of the program (its beginning or end) to each particular subroutine, basic block, or instruction.
Even then, data flow analyses are limited, because they are static and thus have very little knowledge of actual run-time values. Most of them are undecidable; that is, there always exists a particular program for which the result of the analysis is uncertain. This is a strong, yet very theoretical limitation. More concretely, there are always cases where one cannot decide statically that, for example, two variables are equal. This is even more frequent with two pointers or two array accesses. Therefore, in order to obtain safe results, conservative over-approximations of the computed information are generated. For instance, such approximations are made when analyzing the activity or the TBR (``To Be Restored'') status of some individual element of an array. Static and dynamic array region analyses provide very good approximations. Otherwise, we make a coarse approximation such as considering all array cells equivalent.
When studying program transformations, one often wants to move instructions around without changing the results of the program. The fundamental tool for this is the data dependence graph. This graph defines an order between run-time instructions such that, if this order is preserved by instruction rescheduling, the output of the program is not altered. The data dependence graph is the basis for automatic parallelization. It is also useful in AD. Data dependence analysis is the static data flow analysis that builds the data dependence graph.
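For straight-line code, the dependences that constrain rescheduling can be computed from the variables each instruction writes and reads. The Python sketch below is a simplified illustration under that assumption (no control flow, no arrays or pointers, and every dependent pair is kept, even transitive ones): it lists flow (write then read), anti (read then overwrite) and output (write then overwrite) dependences, and checks whether a proposed order of the instructions preserves all of them. The function names and the example program are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Instr = Tuple[str, Set[str], Set[str]]   # (text, written vars, read vars)

def dependences(instrs: List[Instr]) -> List[Tuple[int, int, str]]:
    """All dependences i -> j (i before j in the original order)."""
    deps = []
    for (i, (_, wi, ri)), (j, (_, wj, rj)) in combinations(enumerate(instrs), 2):
        if wi & rj:
            deps.append((i, j, "flow"))     # value written by i is read by j
        if ri & wj:
            deps.append((i, j, "anti"))     # value read by i is overwritten by j
        if wi & wj:
            deps.append((i, j, "output"))   # value written by i is overwritten by j
    return deps

def preserves_dependences(instrs: List[Instr], order: List[int]) -> bool:
    """True if every dependence i -> j still has i scheduled before j."""
    pos = {k: n for n, k in enumerate(order)}
    return all(pos[i] < pos[j] for i, j, _ in dependences(instrs))

PROG: List[Instr] = [
    ("a = x + 1", {"a"}, {"x"}),
    ("b = y * 2", {"b"}, {"y"}),
    ("c = a + b", {"c"}, {"a", "b"}),
    ("x = 0",     {"x"}, set()),
]
```

Swapping the first two instructions of `PROG` is legal, but moving `x = 0` to the front would overwrite `x` before `a = x + 1` reads it.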
The mathematical equations of Fluid Dynamics are Partial Differential Equations, which are discretized and then solved by a computer program. Linearization of these equations, or alternatively linearization of the computer program, models the behavior of the flow when small perturbations are applied. This is useful when the perturbations are effectively small, as in acoustics, or when one wants the sensitivity of the system with respect to one parameter, as in optimization.
Consider a system of Partial Differential Equations that define some characteristics of a system with respect to some input parameters. Consider one particular scalar characteristic. Its sensitivity (or gradient) with respect to the input parameters can be defined as the solution of ``adjoint'' equations, deduced from the original equations through linearization and transposition. The solution of the adjoint equations is known as the adjoint state.
Computational Fluid Dynamics is now able to make reliable simulations of very complex systems. For example, it is now possible to simulate the complete 3D air flow around an aircraft, capturing the physical phenomena of shocks and turbulence. The next step in CFD appears to be optimization. Optimization is one degree higher in complexity, because it repeatedly simulates, evaluates directions of optimization, and applies optimization steps until an optimum is reached.
We restrict ourselves here to gradient descent methods. One risk is obviously to fall into local minima before reaching the global minimum. We do not address this question, although we believe that more robust approaches, such as evolutionary approaches, could benefit from a coupling with gradient descent approaches. Another well-known risk is the presence of discontinuities in the optimized function. We investigate two kinds of methods to cope with discontinuities: we can devise AD algorithms that detect the presence of discontinuities, and we can design optimization algorithms that deal with some of these discontinuities.
We investigate several approaches to obtain the gradient. There are actually two extreme approaches:
One can write an adjoint system, then discretize it and program it by hand. The adjoint system is a new system, deduced from the original equations, whose solution, the adjoint state, leads to the gradient. A hand-written adjoint is very sound mathematically, because the process starts back from the original equations. This process implies a new, separate implementation phase to solve the adjoint system. During this manual phase, mathematical knowledge of the problem can be translated into many hand-coded refinements. But this may take an enormous amount of engineering time. Except for special strategies (see ), this approach does not produce an exact gradient of the discrete functional, and this can be a problem when using optimization methods based on descent directions.
A program that computes the gradient can be built by pure Automatic Differentiation in the reverse mode (cf ). It is in fact the adjoint of the discrete functional computed by the software, which is piecewise differentiable. It produces exact derivatives almost everywhere. Theoretical results guarantee convergence of these derivatives when the functional converges. This strategy gives reliable descent directions to the optimization kernel, although the descent step may be tiny due to discontinuities. Most importantly, the AD adjoint is generated by a tool. This saves a lot of development and debugging time. But this systematic approach leads to massive use of storage, requiring code transformation by hand to reduce memory usage. Mohammadi's work illustrates the advantages and drawbacks of this approach.
The drawback of AD is the amount of storage required. If the model is steady, can we use this important property to reduce the amount of storage needed? Actually this is possible, as shown in , where computation of the adjoint state uses the iterated states in the direct order. Alternatively, most researchers (see for example ) use only the fully converged state to compute the adjoint. This is usually implemented by a hand modification of the code generated by AD, which is delicate and error-prone. The TROPICS team investigates hybrid methods that combine these two extreme approaches.
Automatic Differentiation of programs gives sensitivities or gradients, that are useful for many types of applications:
optimum shape design under constraints, multidisciplinary optimization, and more generally any algorithm based on local linearization,
inverse problems, such as parameter estimation and in particular variational data assimilation in climate sciences (meteorology, oceanography),
first-order linearization of complex systems, or higher-order simulations, yielding reduced models for simulation of complex systems around a given state,
mesh adaptation and mesh optimization with gradients or adjoints,
equation solving with the Newton method,
sensitivity analysis, propagation of truncation errors.
We will detail some of them in the next sections. These applications require an AD tool that differentiates programs written in classical imperative languages: fortran77, fortran95, c, or c++. We also consider our AD tool tapenade as a platform to implement other program analyses and transformations. tapenade does the tedious job of building the internal representation of the program, and then provides an API to build new tools on top of this representation. One application of tapenade is therefore to build prototypes of new program analyses.
A CFD program computes the flow around a shape, starting from a number of inputs that define the shape and other parameters. From this flow, it computes an optimization criterion, such as the lift of an aircraft. To optimize the criterion by a gradient descent, one needs the gradient of the output criterion with respect to all the inputs, and possibly additional gradients when there are constraints. The reverse mode of AD is a promising way to compute these gradients.
Inverse problems aim at estimating the value of hidden parameters from other measurable values that depend on the hidden parameters through a system of equations. For example, the hidden parameter might be the shape of the ocean floor, and the measurable values the altitude and speed of the surface. Another example is data assimilation in weather forecasting. The initial state of the simulation conditions the quality of the weather prediction. But this initial state is largely unknown. Only some measurements at arbitrary places and times are available. The initial state is found by solving a least-squares problem between the measurements and a guessed initial state, which itself must satisfy the equations of meteorology. This rapidly boils down to solving an adjoint problem, which can be done through AD.
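A toy version of this least-squares problem shows how the adjoint yields the gradient. The Python sketch assumes a hypothetical scalar model $x_{t+1} = \alpha\, x_t$ with one observation $y_t$ per time step and cost $J(x_0) = \frac{1}{2}\sum_t (x_t - y_t)^2$. The adjoint variable is propagated backward in time, $\lambda_t = \alpha\,\lambda_{t+1} + (x_t - y_t)$, and $\lambda_0 = dJ/dx_0$. The adjoint is written by hand here for clarity; the point of the text is that AD can generate it from the forward model code.

```python
from typing import List, Tuple

def cost_and_gradient(x0: float, alpha: float,
                      obs: List[float]) -> Tuple[float, float]:
    """Toy data assimilation: model x_{t+1} = alpha * x_t, one observation
    per time step. Returns J(x0) and dJ/dx0 computed by an adjoint sweep."""
    xs = [x0]
    for _ in range(len(obs) - 1):          # forward model run, states kept
        xs.append(alpha * xs[-1])
    J = 0.5 * sum((x - y) ** 2 for x, y in zip(xs, obs))
    lam = 0.0
    for t in reversed(range(len(obs))):    # backward adjoint run
        lam = alpha * lam + (xs[t] - obs[t])
    return J, lam
```

A descent method would then update $x_0$ along $-\lambda_0$; note that the backward sweep needs the stored forward states `xs`, which is the storage issue discussed above.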
Simulating a complex system often requires solving a system of Partial Differential Equations. This is sometimes too expensive, in particular in the context of real time. When one wants to simulate the reaction of this complex system to small perturbations around a fixed set of parameters, there is a very efficient approximate solution: just suppose that the system is linear in a small neighborhood of the current set of parameters. The reaction of the system is thus approximated by a simple product of the variation of the parameters with the Jacobian matrix of the system. This Jacobian matrix can be obtained by AD. This is especially cheap when the Jacobian matrix is sparse. The simulation can be improved further by introducing higher-order derivatives, such as Taylor expansions, which can also be computed through AD. The result is often called a reduced model.
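The linearized reduced model can be sketched in a few lines of Python. The two-parameter `model` below is a hypothetical stand-in for an expensive simulation, and its Jacobian is hand-derived, whereas the text obtains it by AD. The prediction $y(p_0) + J(p_0)\,\delta p$ is only accurate up to second-order terms in $\delta p$.

```python
import math
from typing import Callable, List

def model(p: List[float]) -> List[float]:
    """Hypothetical stand-in for an expensive simulation."""
    return [math.sin(p[0]) * p[1], p[0] ** 2 + p[1]]

def jacobian(p: List[float]) -> List[List[float]]:
    """Hand-derived Jacobian of `model` (an AD tool would generate this)."""
    return [[math.cos(p[0]) * p[1], math.sin(p[0])],
            [2.0 * p[0], 1.0]]

def reduced_model(p0: List[float]) -> Callable[[List[float]], List[float]]:
    """Linear model y(p0 + dp) ~ y(p0) + J(p0) . dp, built once at p0."""
    y0, J = model(p0), jacobian(p0)

    def predict(dp: List[float]) -> List[float]:
        return [y0[i] + sum(J[i][j] * dp[j] for j in range(len(dp)))
                for i in range(len(y0))]

    return predict
```

Once built, `predict` costs only a small matrix-vector product per query, which is what makes such reduced models attractive in real-time contexts.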
It has been noticed that some approximation errors can be expressed by an adjoint state. Mesh adaptation can benefit from this. The classical optimization step can give an optimization direction not only for the control parameters, but also for the approximation parameters, and in particular the mesh geometry. The ultimate goal is to obtain optimal control parameters up to a precision prescribed in advance.
tapenade is the Automatic Differentiation tool developed by the TROPICS team. tapenade progressively implements the results of our research about models and static analyses for AD. From this standpoint, tapenade is a research tool. Our objective is also to promote the use of AD in the scientific computation world, including industry. Therefore the team constantly maintains tapenade to meet the demands of our industrial users. tapenade can simply be used through a web server, available at the URL
http://tapenade.inria.fr:8080/tapenade/index.jsp
It can also be downloaded and installed from our FTP server
ftp://ftp-sop.inria.fr/tropics/tapenade/README.html
Documentation is available on our web page
http://www-sop.inria.fr/tropics/
and as an INRIA technical report (RT-0300)
http://hal.inria.fr/inria-00069880
tapenade differentiates computer programs according to the model described in section . It supports three modes of differentiation:
the tangent mode, which computes a directional derivative $F'(X) \cdot \dot{X}$,
the vector tangent mode, which computes $F'(X) \cdot \dot{X}_n$ for many directions $\dot{X}_n$ simultaneously, and can therefore compute Jacobians, and
the reverse mode, which computes the gradient $F'^*(X) \cdot \overline{Y}$.
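The vector tangent mode can be sketched by carrying a list of directions through a single run of a hypothetical program `a = a*b; b = sin(a); a = a + 2*b`. Seeding with the identity directions returns the columns of the Jacobian. The function name and the list-based layout are assumptions for the illustration, not tapenade's generated code.

```python
import math

def f_vector_tangent(a, b, dirs):
    """Vector tangent sketch for the hypothetical program
        a = a*b ; b = sin(a) ; a = a + 2*b
    dirs is a list of input directions (ad, bd); all are propagated
    in a single run, sharing the original computation."""
    ads = [d[0] for d in dirs]
    bds = [d[1] for d in dirs]
    ads = [ad * b + a * bd for ad, bd in zip(ads, bds)]   # I1'
    a = a * b                                              # I1
    bds = [math.cos(a) * ad for ad in ads]                 # I2'
    b = math.sin(a)                                        # I2
    ads = [ad + 2.0 * bd for ad, bd in zip(ads, bds)]      # I3'
    a = a + 2.0 * b                                        # I3
    return (a, b), list(zip(ads, bds))
```

With `dirs = [(1.0, 0.0), (0.0, 1.0)]`, the returned pairs are the two columns of the $2 \times 2$ Jacobian, while the original statements run only once.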
An obvious fourth mode could be the vector reverse mode, which is not yet implemented. Many other modes exist in other AD tools, computing for example higher-order derivatives or Taylor expansions. For the time being, we restrict ourselves to first-order derivatives and we focus our efforts on the reverse mode. But as we said before, we also view tapenade as a platform to build new program transformations, in particular new differentiations.
Like any program transformation tool, tapenade needs sophisticated static analyses in order to produce an efficient output. Concerning AD, the following analyses are a must, and tapenade now performs them all:
Pointer destinations: For any static program transformation, and in particular differentiation, it is essential to know precisely the possible destinations of each pointer at each code line. Otherwise, one must make conservative assumptions that lead to less efficient code. Our static pointer analysis finds precise information about pointer destinations, taking into account memory allocation and deallocation operations.
Activity: The end-user has the opportunity to specify which of the output variables must be differentiated (called the dependent variables), and with respect to which of the input variables (called the independent variables). Activity analysis propagates the dependent variables backward through the program, to detect all intermediate variables that possibly influence the dependents. Conversely, activity analysis also propagates the independent variables forward through the program, to find all intermediate variables that possibly depend on the independents. Only the intermediate variables that both depend on the independents and influence the dependents are called active, and will receive an associated derivative variable. Activity analysis makes the differentiated program smaller and faster.
Adjoint Liveness and Read-Write: Programs produced by the reverse mode of AD show a very particular structure, due to their mechanism to restore intermediate values of the original program in the reverse order. This has deep consequences on the liveness and Read-Write status of variables, which we can exploit to remove unnecessary instructions and memory usage from the reverse (adjoint) program. This makes the adjoint program smaller and faster, by up to 40%.
TBR: The reverse mode of AD, with the Store-All strategy, stores all intermediate variables just before they are overwritten. However, this is often unnecessary, because derivatives of some expressions (e.g. linear expressions) only use the derivatives of their arguments and not the original arguments themselves. In other words, the local Jacobian matrix of an instruction may not need all the intermediate variables needed by the original instruction. The To Be Restored (TBR) analysis finds which intermediate variables need not be stored during the forward sweep, and therefore makes the differentiated program smaller in memory.
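Two of these analyses, activity and TBR, can be illustrated on straight-line code with small Python sketches. Both rest on simplifying assumptions made for illustration (no control flow, no arrays, no pointers, and for TBR every variable is treated as active); tapenade's actual analyses work on the full flow graph. The hypothetical `active_lhs` propagates ``varied'' variables forward from the independents and ``useful'' variables backward from the dependents, and marks an assignment active when its left-hand side is both. The hypothetical `tbr_pushes` reports which overwrites must save the old value, given for each instruction the variables its local Jacobian needs (for example both factors of `a*b`, but nothing for a linear `a + 2*b`).

```python
from typing import List, Set, Tuple

Assign = Tuple[str, Set[str]]   # (assigned variable, set of variables)

def active_lhs(instrs: List[Assign], independents: Set[str],
               dependents: Set[str]) -> List[bool]:
    """Activity: is the variable assigned by each instruction active there?
    Here the set in each pair lists the variables read by the right-hand side."""
    varied, varied_after = set(independents), []
    for lhs, rhs in instrs:                  # forward: depends on an independent?
        if rhs & varied:
            varied.add(lhs)
        else:
            varied.discard(lhs)
        varied_after.append(set(varied))
    useful, useful_after = set(dependents), [set() for _ in instrs]
    for k in range(len(instrs) - 1, -1, -1):  # backward: influences a dependent?
        useful_after[k] = set(useful)
        lhs, rhs = instrs[k]
        if lhs in useful:
            useful.discard(lhs)
            useful |= rhs
    return [lhs in varied_after[k] and lhs in useful_after[k]
            for k, (lhs, _) in enumerate(instrs)]

def tbr_pushes(instrs: List[Assign]) -> List[int]:
    """TBR: indices of instructions whose overwritten value must be stored.
    Here the set in each pair lists the variables the local Jacobian needs."""
    tbr: Set[str] = set()
    pushes = []
    for k, (lhs, jac_uses) in enumerate(instrs):
        tbr |= jac_uses            # old values needed by this instruction's adjoint
        if lhs in tbr:             # overwriting a value still needed later
            pushes.append(k)
            tbr.discard(lhs)       # the saved copy covers it; the new value is fresh
    return pushes
```

For `t = x*2; c = 3; u = c + 1; y = t*u; z = x + 1` with independent `x` and dependent `y`, only `t` and `y` come out active. For `s = x + y; x = 2*x; p = s*s; s = 0`, plain Store-All would save a value before each of the four assignments, whereas TBR keeps only the push before the final overwrite of `s`, since `s*s` is the only nonlinear use.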
Several other strategies are implemented in tapenade to improve the differentiated code. For example, a data dependence analysis allows tapenade to move instructions around safely, gathering instructions to reduce cache misses. Also, long expressions are split in a specific way, to minimize duplicate sub-expressions in the derivative expressions.
The input languages of tapenade are fortran77 and fortran95. Thanks to the language-independent internal representation of programs, as shown in the corresponding figure, we could relatively easily extend tapenade to C, at least for simple programs. More work is still required to obtain a completely reliable differentiation of C.
There are two user interfaces for tapenade. One is a simple command that can be called from a shell or from a Makefile. The other is interactive, using java swing components and html pages. This interface allows one to use tapenade from windows as well as linux. The input interface lets one specify interactively the routine to differentiate, its independent inputs and dependent outputs. The output interface, shown in the corresponding figure, displays the differentiated programs, with html links that implement source-code correspondence, as well as correspondence between error messages and locations in the source.
tapenade is now available for linux, sun, and windows-xp platforms.
Figure shows the architecture of tapenade. It is implemented mostly in java, apart from the front-ends which are separated and can be written in their own languages.
Notice the clear separation between the general-purpose program analyses, based on a general representation, and the differentiation engine itself. Other tools can be built on top of the Imperative Language Analyzer platform.
The end-user can specify properties of external or black-box routines. This is essential for real industrial applications that use many libraries, whose source is generally hidden. However, AD needs some information about these black-box routines in order to produce efficient code. tapenade lets the user specify this information in a separate signature file. Specifically for the reverse mode of AD, tapenade lets the user specify finely which procedure calls must be checkpointed or not, to improve the overall performance of the differentiated program.
As described in , the reverse mode of AD is a very appealing approach to compute gradients, but it suffers from the need to restore intermediate values in the reverse of their original computation order. In our approach, this question is basically solved through storage, whose cost can be very high. We use checkpointing to mitigate this cost, trading extra recomputations for memory space. An efficient storage and checkpointing strategy is really the key to a wide usage of reverse AD in industry. This is why we continue our modeling and experimentation efforts to find the best possible strategies.
This year we formalized the ``snapshot''. When a fragment of code is ``checkpointed'', it is executed twice: the snapshot is the set of variables which must be saved and restored to ensure that the second execution is equivalent to the first. There exist several formulae that give the snapshot of a given checkpoint, all based on common sense. However we observe that further improvement of these formulae is delicate, and common sense is not enough to justify the improvements. In particular, reducing a snapshot can lead to an increase of storage in other parts of the code, and the overall memory consumption may turn out to be worse.
We study this question starting from a formal description of the interactions between all memory storage involved in checkpointing, and from it we derive a system of set equations that we solve formally, using Maple. After resolution, it turns out that there exists a continuum of optimal snapshots, none of which is strictly better (smaller) than the others. We are able to characterize these good solutions, and then to propose heuristics to select the most appropriate one. One solution appears to give the best results in most applications, and it is now the default strategy in tapenade. We presented these results at the ICCS 2006 conference in Reading, UK.
These results on snapshots are an additional contribution to the more general problem of selecting the best possible placement of checkpoints in a given code. This central problem is addressed from two different angles in the PhD research of Benjamin Dauvergne and Mauricio Araya.
Benjamin Dauvergne focuses on finding indicators that can be used to place checkpoints, i.e. to find the nested code fragments that will give the best performance if checkpointed. These indicators can result from a profiling run of the original program or of the differentiated program. One of this year's results is a model that correctly predicts the memory and time behavior of a given checkpoint placement. Benjamin Dauvergne presented his preliminary results at this spring's euro-AD workshop in Oxford, UK.
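As a rough illustration of what such a predictive model can look like, here is a toy cost model in Python for a flat sequence of procedure calls, each either checkpointed or not. All quantities are hypothetical (per-call cost, tape size, snapshot size, and a fixed `ADJOINT_FACTOR` for the backward sweep), and this is not the model developed in this work. A checkpointed call keeps only its snapshot during the forward sweep, but is run a second time, with taping, just before it is reversed.

```python
from itertools import product

ADJOINT_FACTOR = 2.0  # hypothetical: backward sweep of a call costs 2x its original run


def predict(calls, checkpointed):
    """Predict (time, peak_memory) of a reverse-mode run over a flat sequence of
    procedure calls. Each call is (cost, tape_size, snapshot_size) in arbitrary
    units; checkpointed[i] says whether call i is checkpointed."""
    time = 0.0
    stored = 0.0          # memory held at the current point of the forward sweep
    held_before = []      # memory already held when each call starts
    for (cost, tape, snap), ckp in zip(calls, checkpointed):
        held_before.append(stored)
        time += cost                        # first execution
        stored += snap if ckp else tape     # keep a snapshot, or the full tape
    peak = stored                           # memory at the turn between sweeps
    # The backward sweep visits calls in reverse order; the sums and maxima below
    # do not depend on the visiting order, so a forward loop is enough here.
    for (cost, tape, _), ckp, before in zip(calls, checkpointed, held_before):
        time += ADJOINT_FACTOR * cost       # backward sweep of the call
        if ckp:
            time += cost                    # re-execution from the snapshot, with taping
            peak = max(peak, before + tape) # rebuilt tape sits on top of earlier storage
        # For a non-checkpointed call, before + tape never exceeds the turn-point memory.
    return time, peak


def best_placement(calls, memory_budget):
    """Exhaustively try all 2^n placements; return the fastest one that fits."""
    best = None
    for choice in product((False, True), repeat=len(calls)):
        time, peak = predict(calls, choice)
        if peak <= memory_budget and (best is None or time < best[0]):
            best = (time, peak, choice)
    return best
```

With `calls = [(1.0, 10.0, 1.0), (5.0, 50.0, 2.0), (1.0, 8.0, 1.0)]`, no checkpointing costs time 21 for a peak memory of 68; under a budget of 60 the search checkpoints only the first call (time 22, memory 59), and under a budget of 55 it checkpoints the first two (time 27, memory 51). Exhaustive search is only feasible for a handful of calls, which is why indicators and heuristics are needed on real codes.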
At the same time, Mauricio Araya experimented with the new functionality of tapenade that allows the user to select manually which procedure calls must be checkpointed or not. The results are quite interesting: on one large code, an improved placement of checkpoints gained a factor 10 in execution time, and on average, gains of 20% are common. In turn, studying why one placement gives better results suggests heuristics that can be applied systematically. Mauricio Araya presented this new functionality of tapenade, together with the experiments on real codes, at the ECCOMAS conference in Egmond aan Zee, The Netherlands .
Following the evolution of programming practices among our end-users, we continuously adapt tapenade to new languages and additional programming constructs. This year's results concern the languages Fortran90 and C. As for programming constructs, this year's progress concerns pointers, dynamic allocation, array (``vectorial'') notation, and modular constructs.
The bulk of new developments related to Fortran90 is now complete. tapenade now completely handles the modular constructs of Fortran90, namely modules with public and private components, interfaces, renaming, overloading, and optional or default arguments of procedures. As last year, these developments were driven by the large Fortran90 applications that we are working on, such as the oceanography code OPA 9.0 ( cf ). Inside tapenade, these constructs do not depend on the particular target language: our objective is to reuse these developments when tapenade handles other modular languages such as C++.
tapenade also handles the array notation of Fortran90, also known as ``vectorial'' notation. This year's developments have dealt with arbitrary combinations of WHERE, SUM, and MASK constructs, for which the differentiated code is far from intuitive. In particular, we found an interesting duality between WHERE and MASK through reverse AD: the best adjoint code for a WHERE statement will often need SUM operations with a MASK argument, and vice versa.
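The duality can be checked on two elementary cases. The Python sketch below writes by hand (this is not tapenade output) the adjoints of the Fortran90 statements `WHERE (m) a = s`, with a scalar `s`, and `s = SUM(a, MASK=m)`, on plain lists. The adjoint of the first needs a masked sum, `sb = sb + SUM(ab, MASK=m)`, while the adjoint of the second needs a `WHERE`, `WHERE (m) ab = ab + sb`. Since both statements are linear, the usual dot-product test <L x, y> = <x, L^T y> validates each adjoint.

```python
import random


def where_assign(m, a, s):
    """Fortran: WHERE (m) a = s   (scalar s broadcast to the masked elements)."""
    return [s if mi else ai for mi, ai in zip(m, a)], s


def where_assign_adjoint(m, ab, sb):
    """Adjoint: sb = sb + SUM(ab, MASK=m); WHERE (m) ab = 0."""
    sb = sb + sum(abi for mi, abi in zip(m, ab) if mi)
    ab = [0.0 if mi else abi for mi, abi in zip(m, ab)]
    return ab, sb


def masked_sum(m, a, s):
    """Fortran: s = SUM(a, MASK=m)   (the incoming s is overwritten)."""
    return list(a), sum(ai for mi, ai in zip(m, a) if mi)


def masked_sum_adjoint(m, ab, sb):
    """Adjoint: WHERE (m) ab = ab + sb; sb = 0."""
    ab = [abi + sb if mi else abi for mi, abi in zip(m, ab)]
    return ab, 0.0


def dot(a, s, b, t):
    """Scalar product on the state (array a, scalar s)."""
    return sum(x * y for x, y in zip(a, b)) + s * t


def dot_test(seed=0, n=8):
    """Return |<L x, y> - <x, L^T y>| for both statement/adjoint pairs."""
    rng = random.Random(seed)
    m = [rng.random() < 0.5 for _ in range(n)]
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ab = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    s, sb = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    gaps = []
    for fwd, adj in ((where_assign, where_assign_adjoint),
                     (masked_sum, masked_sum_adjoint)):
        lhs = dot(*fwd(m, a, s), ab, sb)
        rhs = dot(a, s, *adj(m, ab, sb))
        gaps.append(abs(lhs - rhs))
    return gaps
```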
Christophe Massol has finished the development of the static data-flow analysis of pointers in tapenade, including memory allocation and deallocation primitives. He has also adapted all subsequent analyses in tapenade, so that they give correct results on programs with pointers. Finally, the tangent differentiation module was also adapted to pointers, introducing the notion of a differentiated pointer variable, which is required when the variables pointed to are themselves differentiated.
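The need for a differentiated pointer can be sketched in Python, using a hypothetical `Cell` object as a stand-in for a memory location and a Python reference as a pointer. In the tangent code, `pd` must point to the derivative of whatever `p` points to, so that a tangent assignment through `pd` updates the right derivative. The order of the two statements matters: the tangent statement uses the value of the target before it is overwritten.

```python
class Cell:
    """Hypothetical stand-in for a memory location holding one real value."""

    def __init__(self, value):
        self.value = value


def original(x0, y0, use_x):
    """Original code: p => x (or p => y); p = p*p; z = x + y."""
    x, y = Cell(x0), Cell(y0)
    p = x if use_x else y
    p.value = p.value * p.value            # assignment through the pointer
    return x.value + y.value


def tangent(x0, xd0, y0, yd0, use_x):
    """Tangent code: every differentiated variable gets a derivative cell, and the
    pointer p gets a differentiated pointer pd aimed at the matching derivative."""
    x, y = Cell(x0), Cell(y0)
    xd, yd = Cell(xd0), Cell(yd0)
    p, pd = (x, xd) if use_x else (y, yd)  # p => x implies pd => xd
    pd.value = 2.0 * p.value * pd.value    # derivative of p*p, before p is overwritten
    p.value = p.value * p.value
    return x.value + y.value, xd.value + yd.value
```

With `use_x` true, z = x² + y, so `tangent(3.0, 1.0, 5.0, 0.0, True)` returns `(14.0, 6.0)`; with `use_x` false, z = x + y², and the derivative with respect to x is 1.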
Nicolas Chleq has developed a C front-end and back-end for tapenade. This development takes advantage of the existing architecture of tapenade, whose middle-end is mostly language-independent: without any extension to the internal analysis and differentiation parts, we could differentiate a small C program. Still, more testing and validation is needed before a version of tapenade for C can be released.
We studied the application of Automatic Differentiation to several very large scientific computation codes.
Bruno Ferron of IFREMER Brest gave us the latest Fortran90 version of the ocean simulation code OPA, version 9.0, which is 80,000 lines long and was developed mainly by the LOCEAN lab. at Paris 6 University. We obtained a validated adjoint code for one test configuration of OPA, named GYRE. We were invited to present the results at the Data Assimilation meeting in Toulouse . This configuration simulates the behavior of a large rectangular basin of salt water over 20 days, under the influence of the wind and of an initial vertical distribution of temperature. With tapenade, we computed the gradient of the heat flux across the northern boundary with respect to the temperature field 20 days earlier.
Figure shows the computed gradient, which was validated automatically by comparison with divided differences, and validated as well by oceanographers who recognized on it classical shapes known as the Rossby and Kelvin waves. Computing this gradient takes only about 7 times as long as the simulation itself.
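Validation by divided differences compares the AD gradient with a finite-difference estimate along a random direction d: (F(x + h d) - F(x - h d)) / 2h should match the scalar product of the gradient at x with d. The Python sketch below uses centered differences and a hypothetical analytic test function `f`, with a hand-written gradient `grad_f` standing in for an AD-generated one; in practice the step h must balance truncation and round-off errors.

```python
import math
import random


def f(x):
    """Hypothetical scalar test function: sum of sin(x[i]) * x[i+1]."""
    return sum(math.sin(x[i]) * x[i + 1] for i in range(len(x) - 1))


def grad_f(x):
    """Hand-written gradient of f, standing in for an AD-generated gradient."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        g[i] += math.cos(x[i]) * x[i + 1]
        g[i + 1] += math.sin(x[i])
    return g


def check_gradient(func, grad, x, h=1e-6, seed=0):
    """Compare a centered divided difference along a random direction d with
    the scalar product grad(x).d; return (fd, ad, relative_error)."""
    rng = random.Random(seed)
    d = [rng.uniform(-1.0, 1.0) for _ in x]
    xp = [xi + h * di for xi, di in zip(x, d)]
    xm = [xi - h * di for xi, di in zip(x, d)]
    fd = (func(xp) - func(xm)) / (2.0 * h)
    ad = sum(gi * di for gi, di in zip(grad(x), d))
    return fd, ad, abs(fd - ad) / max(abs(ad), 1e-300)
```

A correct gradient gives a relative error many orders of magnitude below 1, while a wrong one, for instance a gradient scaled by 2, is flagged immediately.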
Since September 2006, post-doctoral researcher Hicham Tber has been working on the differentiation of a much larger configuration of OPA, named NEMO, which simulates the North Atlantic basin over a longer period of time and with more realistic physics.
Colleagues from INRA in Avignon are starting to use tapenade on large agricultural simulation codes. We collaborate with them to understand their specific needs and to correct the errors they find in tapenade. They use AD in two main directions. One is sensitivity analysis of simulation codes such as STICS and ISBA, which simulate plant growth over a period of about one year. The other is the inverse problem of estimating ground parameters from satellite images, using simulation codes such as SAIL or MULTISAIL .
In industrial research groups, simulation is well mastered and the next frontier is optimization. This problem is very difficult, because the typical number of optimization parameters is very high, particularly in CFD optimal control. In an industrial context, optimizing a dozen scalar parameters will not produce an optimal shape for an aircraft, because an accurate description of a shape requires hundreds of parameters. The optimization parameter can even be a function defined on a surface or a volume. In the discrete case, the number of parameters depends on the chosen discretization, and is a priori large. Therefore, optimization requires an enormous computing power. We discuss in why the reverse mode of AD is an elegant way to obtain the adjoints that optimization uses. The reverse mode, and the resulting adjoint state, are in fact the best way to get the gradients when the number of parameters is large.
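The usual cost comparison behind this claim can be summarized as follows, for a scalar cost function F of n parameters, with costs counted in runs of the original program. The constants c_T and c_R depend on the code and on the tool; the factor of about 7 observed on OPA is an instance of c_R.

```latex
\begin{aligned}
\text{divided differences:}\quad & \mathrm{cost}(\nabla F) \approx (n+1)\,\mathrm{cost}(F),\\
\text{tangent mode:}\quad & \mathrm{cost}(\nabla F) \approx n\,c_T\,\mathrm{cost}(F),\\
\text{reverse mode:}\quad & \mathrm{cost}(\nabla F) \approx c_R\,\mathrm{cost}(F) \quad \text{(independent of } n\text{)}.
\end{aligned}
```

For instance, with n = 1000 and a hypothetical c_T = 2, the tangent mode needs about 2000 runs of the original program, whereas the reverse mode with c_R = 7 needs about 7, whatever n. The price of the reverse mode is memory, which is what storage and checkpointing strategies address.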
In the European project HISAC on supersonic aircraft ( ), the state equation cannot be solved accurately without a strong anisotropic mesh adaptation. Therefore, we have to design a new algorithm for the simultaneous solution of shape optimization and mesh adaptation .
Besides specific AD problems, practical application to control problems requires that we consider the following issues, discussed this year in , , :
efficient computation of a large scale adjoint system
efficient optimization algorithms for large-scale systems
efficient preconditioners for this optimization.
This subject is addressed jointly by INRIA teams GAMMA (Rocquencourt), TROPICS, and SMASH. In this collaboration, GAMMA brings mesh and approximation expertise, TROPICS contributes AD and adjoint methods, SMASH works on approximation and CFD applications.
Solving the optimization problem with the innovative approach of an AD-generated adjoint can also be used in a context slightly different from optimal shape design, namely mesh adaptation. This is possible if we can map the mesh adaptation problem into a differentiable optimal control problem. To this end, we have introduced a new methodology that sets the mesh adaptation problem in a purely functional form: the mesh is reduced to a continuous property of the computational domain, the continuous metric, and we minimize a continuous model of the error resulting from that metric. The search for an adapted mesh is thus transformed into the search for an optimal metric.
In the case of interpolation error minimization, the optimum is given by a closed-form formula, which gives access to a complete theory demonstrating that second-order accuracy can be obtained when approximating discontinuous fields. In the case of adaptation for Partial Differential Equations, an optimal control problem is obtained. Together with project-team GAMMA (Frédéric Alauzet, Adrien Loseille), TROPICS contributes this research to the HISAC IP European project, which involves 30 other partners in aeronautics .
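The 1D case gives a feeling for how an optimal metric translates into an adapted mesh. For P1 interpolation of u, the local error on a cell of size h behaves like h²|u''|; minimizing the integral of (h²|u''|)^p under a fixed number of cells (the integral of 1/h) gives h proportional to |u''|^(-p/(2p+1)), so the node density 1/h is proportional to |u''|^(p/(2p+1)). The Python sketch below is our own 1D illustration, with a hypothetical test function u(x) = exp(5x): it equidistributes that density and compares the L1 interpolation error with a uniform mesh of the same size. It does not reproduce the multidimensional anisotropic theory developed with GAMMA.

```python
import math


def u_example(x):
    """Hypothetical field to interpolate."""
    return math.exp(5.0 * x)


def d2u_example(x):
    """Its second derivative u''."""
    return 25.0 * math.exp(5.0 * x)


def optimal_density(d2u, p):
    """1D node density for L^p-optimal P1 interpolation: |u''|^(p/(2p+1))."""
    return lambda x: abs(d2u(x)) ** (p / (2.0 * p + 1.0))


def equidistribute(density, n_cells, a=0.0, b=1.0, fine=20000):
    """Place n_cells+1 nodes so that every cell carries the same integral of density."""
    xs = [a + (b - a) * k / fine for k in range(fine + 1)]
    cum = [0.0]                                  # cumulative integral (midpoint rule)
    for k in range(fine):
        mid = 0.5 * (xs[k] + xs[k + 1])
        cum.append(cum[-1] + density(mid) * (xs[k + 1] - xs[k]))
    total = cum[-1]
    nodes = [a]
    k = 0
    for j in range(1, n_cells):
        target = total * j / n_cells
        while cum[k + 1] < target:
            k += 1
        t = (target - cum[k]) / (cum[k + 1] - cum[k])  # linear inversion in [xs[k], xs[k+1]]
        nodes.append(xs[k] + t * (xs[k + 1] - xs[k]))
    nodes.append(b)
    return nodes


def l1_interp_error(u, nodes, samples=50):
    """L1 norm of u minus its piecewise-linear interpolant on `nodes`."""
    err = 0.0
    for x0, x1 in zip(nodes, nodes[1:]):
        u0, u1 = u(x0), u(x1)
        h = (x1 - x0) / samples
        for s in range(samples):
            x = x0 + (s + 0.5) * h
            lin = u0 + (u1 - u0) * (x - x0) / (x1 - x0)
            err += abs(u(x) - lin) * h
    return err
```

With 20 cells and p = 1, the adapted mesh clusters nodes near x = 1, where u'' is largest; the asymptotic analysis predicts an error about 0.6 times that of the uniform mesh.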
Mike Giles from Oxford University and his student Devendra Ghate continue to use tapenade to build second-order derivative programs in relation with Rolls-Royce turbine developments.
TROPICS participates in the European project HISAC, which started at the end of last year.
TROPICS participates in the project EVA-Flo: ``Evaluation et Validation Automatique pour le calcul FLOttant'', an ANR project accepted this year, whose main contractor is ENS Lyon (Nathalie Revol).
TROPICS participates in the project LEFE, ``Les Enveloppes Fluides et l'Environnement'', which is a CNRS API project accepted this year.
TROPICS participates in the project NODESIM, ``Non-Deterministic Simulation for CFD-based design methodologies'', a European STREP project accepted this year.
TROPICS has submitted an INRIA-internal proposal for ``équipes associées'' (associate teams) with our partners in RWTH Aachen, whose results will be known shortly.
We are aware of regular use of tapenade by research groups at Argonne National Lab. (USA), Cranfield University (UK), Oxford University (UK), RWTH Aachen (Germany), and Humboldt University Berlin (Germany).
Alain Dervieux presented his research on March 14 when receiving the Dassault prize of the French Academy of Sciences.
Laurent Hascoët presented AD and tapenade at the International Conference on High Performance Scientific Computing HPSC 2006 in Hanoi, Vietnam.
Alain Dervieux gave a presentation at the CANUM 2006 conference, where he co-organized a minisymposium on Optimum Design together with J. Sokolowsky.
Alain Dervieux and Bruno Koobus visited the Barcelona Computer Center, and gave two presentations, respectively on Optimal Shape Design and on the VMS turbulence model.
Laurent Hascoët gave an invited talk to present results on reverse AD of the OPA 9.0 oceanography code at the ``Colloque National sur l'Assimilation de Données'' in Toulouse.
Laurent Hascoët presented the team's results on the Data-Flow Equations of Checkpointing at ICCS 2006 in Reading, UK.
Laurent Hascoët is one of the organizers of the euro-AD workshops. This year, one edition took place in Oxford, UK, and one in Aachen, Germany. Benjamin Dauvergne presented some preliminary results at the Oxford edition in June.
Laurent Hascoët gave a lecture on Automatic Differentiation at the CEA-EDF-INRIA summer school (``école d'été'') on numerical analysis in June.
Mauricio Araya gave a presentation and Alain Dervieux was co-author of another at ECCOMAS 2006 in Egmond aan Zee, The Netherlands.
Laurent Hascoët presented the team's research to colleagues at INRA Avignon, with the medium-term goal of introducing AD into the popular INRA codes.
The TROPICS team co-organizes the French-Indian workshop ``Numerical Simulation, Control and Design for Aeronautical and Space Applications'' at INRIA Sophia-Antipolis from November 29th through December 1st, 2006. Alain Dervieux and Laurent Hascoët will both make presentations.
Mauricio Araya defended his PhD thesis on ``Approaches to Assess Validity of Derivatives and to Improve Efficiency in Automatic Differentiation of Programs'', on November 24th in Sophia-Antipolis.