Section: Research Program
Algorithmic Differentiation and Scientific Computing
Participants: Alain Dervieux, Laurent Hascoët, Bruno Koobus.
Glossary
- linearization
In Scientific Computing, the mathematical model often consists of Partial Differential Equations, which are discretized and then solved by a computer program. Linearization of these equations, or alternatively of the computer program, predicts the behavior of the model when small perturbations are applied. This is useful when the perturbations are effectively small, as in acoustics, or when one wants the sensitivity of the system with respect to one parameter, as in optimization.
- adjoint state
Consider a system of Partial Differential Equations that defines some characteristics of a physical system with respect to some input parameters. Consider one particular scalar characteristic. Its sensitivity (or gradient) with respect to the input parameters can be defined as the solution of "adjoint" equations, deduced from the original equations through linearization and transposition. The solution of the adjoint equations is known as the adjoint state. Both notions are made concrete in the equations below.
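As a minimal illustration of both glossary entries (the notation is generic and assumed here for illustration, not taken from the cited works): let the model be a state equation \(\Psi(W,\gamma)=0\) with state \(W\) and input parameters \(\gamma\), and let \(j(\gamma)=J(W(\gamma),\gamma)\) be the scalar characteristic of interest. Linearization gives the effect of a small perturbation \(\delta\gamma\):
\[
\frac{\partial \Psi}{\partial W}\,\delta W + \frac{\partial \Psi}{\partial \gamma}\,\delta\gamma = 0 ,
\]
while transposition yields the adjoint state \(\Pi\) and, from it, the full gradient at the cost of a single adjoint solve, whatever the number of parameters:
\[
\Big(\frac{\partial \Psi}{\partial W}\Big)^{T}\Pi = \Big(\frac{\partial J}{\partial W}\Big)^{T},
\qquad
\frac{dj}{d\gamma} = \frac{\partial J}{\partial \gamma} - \Pi^{T}\,\frac{\partial \Psi}{\partial \gamma} .
\]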
Scientific Computing provides reliable simulations of complex systems. For example, it is possible to simulate the steady or unsteady 3D air flow around a plane, capturing physical phenomena such as shocks and turbulence. Next comes optimization, one degree higher in complexity because it repeatedly simulates and applies gradient-based optimization steps until an optimum is reached. The next level of sophistication is robustness, i.e., detecting and penalizing a solution that, although possibly optimal, is very sensitive to uncertainty in the design parameters or in the manufacturing tolerances. This brings second derivatives into play. Similarly, Uncertainty Quantification can use second derivatives to evaluate how uncertainty in the simulation inputs implies uncertainty in its outputs.
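As a hedged illustration of the latter point (this is the standard second-order Taylor moment propagation, assumed here for illustration rather than quoted from this program): for an output \(y=f(x)\) and inputs with mean \(\mu\) and covariance \(\Sigma\),
\[
\mathbb{E}[f(x)] \approx f(\mu) + \tfrac{1}{2}\,\operatorname{tr}\!\big(\nabla^{2} f(\mu)\,\Sigma\big),
\qquad
\operatorname{Var}[f(x)] \approx \nabla f(\mu)^{T}\,\Sigma\,\nabla f(\mu),
\]
so the Hessian \(\nabla^{2} f\) is precisely the second-derivative object that AD must supply.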
We investigate several approaches to obtain the gradient, lying between two extremes:
- One can write an adjoint system of mathematical equations, then discretize it and program it by hand. This is time-consuming. Moreover, although this approach is mathematically sound [32], it does not provide the gradient of the discretized function itself, which degrades the final convergence of gradient-descent optimization.
- One can apply adjoint AD (cf. 3.1) to the program that discretizes and solves the direct system. This gives exactly the adjoint of the discrete function computed by the program. Theoretical results [31] guarantee convergence of these derivatives when the direct program converges. This approach is highly mechanizable, but leads to massive use of storage and may require code transformation by hand [36], [39] to reduce memory usage; the sketch after this list illustrates the storage issue.
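To make the storage issue concrete, here is a minimal Python sketch of what adjoint AD conceptually produces for a toy time-stepping loop. All names are illustrative; a real AD tool such as Tapenade generates analogous code automatically. The point to notice is the tape, whose size grows linearly with the number of steps.

    def primal(x, n_steps, h=0.01):
        """Toy direct solver: repeated step w <- phi(w) = w - h*w**3."""
        w = x
        for _ in range(n_steps):
            w = w - h * w**3
        return w

    def primal_and_adjoint(x, ybar, n_steps, h=0.01):
        """Adjoint-AD sketch of `primal`.
        Forward sweep: stores every intermediate state on a tape.
        Backward sweep: pops the tape and applies the transposed
        derivative of each step, in reverse order.
        The tape holds n_steps values: this linear growth is the
        'massive use of storage' mentioned above."""
        tape = []
        w = x
        for _ in range(n_steps):            # forward sweep
            tape.append(w)                  # store the state the reverse sweep needs
            w = w - h * w**3
        xbar = ybar
        for _ in range(n_steps):            # backward sweep, reversed order
            wk = tape.pop()
            xbar = xbar * (1.0 - 3.0 * h * wk**2)   # transposed Jacobian of phi
        return w, xbar                      # output and gradient d(output)/d(input)

    y, dy_dx = primal_and_adjoint(2.0, 1.0, n_steps=1000)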
When the model is steady, or more generally when the computation uses a fixed-point iteration, tradeoffs exist between these two extremes [33], [27] that combine low storage consumption with the possibility of automated adjoint generation. We advocate incorporating them into the AD model and into AD tools.
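As a minimal sketch of one such tradeoff (illustrative code in the spirit of fixed-point adjoint schemes, not an implementation from [33] or [27]): when the direct computation iterates w = phi(w, gamma) to convergence, only the converged state needs to be stored, and the adjoint state is obtained by its own fixed-point iteration, so no tape of the direct iterations is required.

    def sqrt_and_gradient(gamma, tol=1e-12):
        """Fixed-point adjoint sketch on phi(w, g) = 0.5*(w + g/w),
        whose fixed point is w* = sqrt(g); the scalar output is j = w*.
        Only the converged w* is kept from the direct iteration."""
        # direct fixed-point iteration (no tape)
        w = 1.0
        while abs(0.5 * (w + gamma / w) - w) > tol:
            w = 0.5 * (w + gamma / w)
        # partial derivatives of phi, evaluated once at the converged point
        dphi_dw = 0.5 * (1.0 - gamma / w**2)
        dphi_dg = 0.5 / w
        # adjoint fixed point: wbar = dphi_dw^T * wbar + dj/dw  (here dj/dw = 1)
        wbar = 0.0
        while True:
            new = dphi_dw * wbar + 1.0
            if abs(new - wbar) < tol:
                wbar = new
                break
            wbar = new
        return w, dphi_dg * wbar        # (sqrt(gamma), d sqrt(gamma)/d gamma)

    w_star, grad = sqrt_and_gradient(2.0)   # grad ~ 1/(2*sqrt(2)) ~ 0.3536

The adjoint iteration converges under the same contraction property that makes the direct iteration converge, which is what allows these schemes to trade the tape of adjoint AD for a second, cheap fixed-point loop.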