The last decade has witnessed a remarkable convergence between several sub-domains of the calculus of variations, namely optimal transport (and its many generalizations), the infinite dimensional geometry of diffeomorphism groups, and inverse problems in imaging (in particular sparsity-based regularization). This convergence is due to (i) the mathematical objects manipulated in these problems, namely sparse measures (e.g. couplings in transport, edge locations in imaging, displacement fields for diffeomorphisms) and (ii) the use of similar numerical tools from non-smooth optimization and geometric discretization schemes. Optimal Transportation, diffeomorphisms and sparsity-based methods are powerful modeling tools that impact a rapidly expanding list of scientific applications and call for efficient numerical strategies. Our research program shows the important part played by the team members in the development of these numerical methods and their application to challenging problems.

Optimal Mass Transportation is a mathematical research topic which started two centuries ago with Monge's work on the “Théorie des déblais et des remblais” (see 106). This engineering problem consists in minimizing the transport cost between two given mass densities. In the 1940s, Kantorovich 113 introduced a powerful linear relaxation together with its dual formulation. The Monge-Kantorovich problem became a specialized research topic in optimization, and Kantorovich obtained the 1975 Nobel prize in economics for his contributions to resource allocation problems. Since the seminal discoveries of Brenier in the 90's 59, Optimal Transportation has received renewed attention from mathematical analysts, and the Fields Medal awarded in 2010 to C. Villani, who made important contributions to Optimal Transportation and wrote the modern reference monographs 152, 153, arrived at a culminating moment for this theory. Optimal Mass Transportation is today a mature area of mathematical analysis with a constantly growing range of applications. Optimal Transportation has also received a lot of attention from probabilists (see for instance the recent survey 118 for an overview of the Schrödinger problem, which is a stochastic variant of the Benamou-Brenier dynamical formulation of optimal transport). The development of numerical methods for Optimal Transportation and related problems is a difficult topic and comparatively underdeveloped. This research field has experienced a surge of activity in the last five years, with important contributions of the Mokaplan group (see the list of important publications of the team). We describe below a few recent and less recent Optimal Transportation concepts and methods which are connected to the future activities of Mokaplan:

Brenier's theorem 62 characterizes the unique optimal map as the gradient of a convex potential. As such, Optimal Transportation may be interpreted as an infinite dimensional optimisation problem under a “convexity constraint”, i.e. the solution of this infinite dimensional optimisation problem is a convex potential. This connects Optimal Transportation to “convexity constrained” non-linear variational problems such as, for instance, Newton's problem of the body of minimal resistance.
The value function of the optimal transport problem is also known to define a distance between source and target densities, called the Wasserstein distance, which plays a key role in many applications such as image processing.
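
As a minimal illustration of the Wasserstein distance (a sketch under simplifying assumptions, not a method from the works cited here): on the real line, for two samples of equal size with uniform weights, an optimal coupling matches the sorted samples. The function name `wasserstein_1d` and its parameters are hypothetical and chosen for this example only.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two uniform empirical measures on the
    real line with the same number of points (assumption of this sketch)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("both samples must be non-empty and of equal size")
    # In 1-D with a convex cost, matching sorted samples is optimal.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


if __name__ == "__main__":
    # Translating every point by 1 gives distance 1 for any p.
    print(wasserstein_1d([0.0, 1.0], [2.0, 1.0], p=2))
```

In higher dimensions no such sorting shortcut is available, which is one reason dedicated numerical solvers are needed.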

A formal substitution of the optimal transport map as the gradient of a convex potential in the mass conservation constraint (a Jacobian equation) gives a non-linear Monge-Ampère equation. Caffarelli 70 used this result to extend the regularity theory for the Monge-Ampère equation. In the last ten years, it also motivated new research on numerical solvers for non-linear degenerate elliptic equations (see 94, 122, 47, 46 and the references therein). Geometric approaches based on Laguerre diagrams and discrete data 127 have also been developed. Monge-Ampère based Optimal Transportation solvers have recently given the first linear-cost computations of (smooth) Optimal Transportation maps.
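
The substitution described above can be written out as follows. The notation is an assumption made for this sketch: $f$ and $g$ denote the source and target densities, and $\varphi$ the convex potential whose gradient is the transport map.

```latex
% Mass conservation (Jacobian equation) for a map T pushing f onto g:
%   g(T(x)) \det(DT(x)) = f(x).
% Substituting T = \nabla\varphi with \varphi convex gives the Monge-Ampère equation:
\[
  \det\!\big(D^2\varphi(x)\big)\, g\big(\nabla\varphi(x)\big) = f(x),
  \qquad \varphi \ \text{convex}.
\]
```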

In recent years, the classical Optimal Transportation problem has been extended in several directions. First, different ground costs measuring the “physical” displacement have been considered. In particular, well-posedness for a large class of convex and concave costs has been established by McCann and Gangbo 105. Optimal Transportation techniques have been applied, for example, to a Coulomb ground cost in quantum chemistry in relation with Density Functional Theory 90. Given the densities of electrons, Optimal Transportation models the potential energy and their relative positions. For more than 2 electrons (and therefore more than 2 densities), the natural extension of Optimal Transportation is the so-called Multi-marginal Optimal Transport (see 134 and the references therein). Another instance of multi-marginal Optimal Transportation arises in the so-called Wasserstein barycenter problem between an arbitrary number of densities 31. An interesting overview of this emerging field of optimal transport and its applications can be found in the recent survey of Ghoussoub and Pass 133.

Optimal transport has found many applications, starting from its relation with several physical models such as the semi-geostrophic equations in meteorology 110, 92, 91, 40, 121, mesh adaptation 120, the reconstruction of the early mass distribution of the Universe 102, 60 in astrophysics, and the numerical optimisation of reflectors following the Optimal Transportation interpretation of Oliker 69 and Wang 154. Extensions of OT such as multi-marginal transport have potential applications in Density Functional Theory (DFT), generalized solutions of Euler equations 61, and in statistics and finance 37, 104. Recently, there has been a surge of interest in applications of OT methods in imaging sciences 54, statistics 51 and machine learning 93. This is largely due to the emergence of fast numerical schemes to approximate the transportation distance and its generalizations, see for instance 43. Figure 1 shows an example of application of OT to color transfer. Figure 9 shows an example of application in computer graphics to interpolate between input shapes.

While the optimal transport problem, in its original formulation, is a static problem (no time evolution is considered), it makes sense in many applications to consider time evolution instead. This is relevant for instance in applications to fluid dynamics, or in medical imaging to perform registration of organs and model tumor growth.

In this perspective, optimal transport in Euclidean space corresponds to an evolution where each particle of mass moves along a straight line. This interpretation corresponds to the Computational Fluid Dynamics (CFD) formulation proposed by Brenier and Benamou in 39. Its solutions are time curves in the space of densities and geodesics for the Wasserstein distance. The CFD formulation relaxes the non-linear mass conservation constraint into a time dependent continuity equation; the cost function remains convex but is highly non-smooth. A remarkable feature of this dynamical formulation is therefore that it can be recast as a convex but non-smooth optimization problem. This convex dynamical formulation finds many non-trivial extensions and applications, see for instance 41. The CFD formulation also appears to be a limit case of Mean Field Games (MFGs), a large class of economic models introduced by Lasry and Lions 115, leading to a system coupling a Hamilton-Jacobi equation with a Fokker-Planck equation. In contrast, the Monge case, where the ground cost is the Euclidean distance, leads to a static system of PDEs 56.
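
For the quadratic cost, the dynamical formulation can be sketched as follows. The symbols are assumptions of this sketch: $\rho$ is the time dependent density, $v$ the velocity field, $m = \rho v$ the momentum, and $\rho_0$, $\rho_1$ the source and target densities.

```latex
\[
  W_2^2(\rho_0,\rho_1)
  = \min_{(\rho,v)} \int_0^1\!\!\int \rho(t,x)\,|v(t,x)|^2 \,dx\,dt
  \quad\text{s.t.}\quad
  \partial_t \rho + \operatorname{div}(\rho v) = 0,\ \
  \rho(0,\cdot)=\rho_0,\ \ \rho(1,\cdot)=\rho_1 .
\]
% Change of variables m = \rho v: the constraint becomes linear and the
% integrand jointly convex (but non-smooth at \rho = 0) in (\rho, m):
\[
  \min_{(\rho,m)} \int_0^1\!\!\int \frac{|m(t,x)|^2}{\rho(t,x)} \,dx\,dt
  \quad\text{s.t.}\quad
  \partial_t \rho + \operatorname{div}(m) = 0 .
\]
```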

Another extension is, instead of considering geodesics for the transportation metric (i.e. minimizing the Wasserstein distance to a target measure), to make the density evolve in order to minimize some functional. Computing the steepest descent direction with respect to the Wasserstein distance defines a so-called Wasserstein gradient flow, also known as a JKO gradient flow after its authors 112. This is a popular tool to study a large class of non-linear diffusion equations. Two interesting examples are the Keller-Segel system for chemotaxis 111, 85 and a model of congested crowd motion proposed by Maury, Santambrogio and Roudneff-Chupin 126. From the numerical point of view, these schemes are understood to be the natural analogue of implicit schemes for linear parabolic equations. The resolution is however costly, as it involves taking the derivative in the Wasserstein sense of the relevant energy, which in turn requires the resolution of a large scale convex but non-smooth minimization.
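
The analogy with implicit schemes can be made explicit through the time-discrete scheme below, written with notation assumed for this sketch: $\tau > 0$ is the time step, $E$ the energy functional and $\rho^k$ the density at step $k$.

```latex
\[
  \rho^{k+1} \in \operatorname*{argmin}_{\rho}\;
  \frac{1}{2\tau}\, W_2^2(\rho, \rho^{k}) + E(\rho).
\]
% Compare with the implicit Euler step for a gradient flow in a Hilbert space:
%   x^{k+1} \in \operatorname{argmin}_x \frac{1}{2\tau}\|x - x^k\|^2 + E(x).
```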

To tackle more complicated warping problems, such as those encountered in medical image analysis, one unfortunately has to drop the convexity of the functional involved in defining the gradient flow. This gradient flow can either be understood as defining a geodesic on the (infinite dimensional) group of diffeomorphisms 36, or on an (infinite dimensional) space of curves or surfaces 155. The de facto standard to define, analyze and compute these geodesics is the “Large Deformation Diffeomorphic Metric Mapping” (LDDMM) framework of Trouvé, Younes, Holm and co-authors 36, 109. While in the CFD formulation of optimal transport, the metric on infinitesimal deformations is just the

Besides image warping and registration in medical image analysis, a key problem in nearly all imaging applications is the reconstruction of high quality data from low resolution observations. This field, commonly referred to as “inverse problems”, is very often concerned with the precise location of features such as point sources (modeled as Dirac masses) or sharp contours of objects (modeled as gradients being Dirac masses along curves). The underlying intuition behind these ideas is the so-called sparsity model (either of the data itself, its gradient, or other more complicated representations such as wavelets, curvelets, bandlets 125 and learned representations 156).

The huge interest in these ideas started mostly from the introduction of convex methods to serve as proxy for these sparse regularizations. The most well known is the

However, the theoretical analysis of sparse reconstructions involving real-life acquisition operators (such as those found in seismic imaging, neuro-imaging, astro-physical imaging, etc.) is still mostly an open problem. A recent research direction, triggered by a paper of Candès and Fernandez-Granda 73, is to study directly the infinite dimensional problem of reconstruction of sparse measures (i.e. sums of Dirac masses) using the total variation of measures (not to be mistaken for the total variation of 2-D functions). Several works 72, 98, 95 have used this framework to provide theoretical performance guarantees by studying how the distance between neighboring spikes impacts noise stability.
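
One common way to write such a reconstruction problem is sketched below. The notation is an assumption of this sketch and is not taken from the cited works: $\Phi$ is the acquisition operator, $y$ the observations, $\lambda > 0$ a regularization parameter, and $|m|(X)$ the total variation of the measure $m$ on the domain $X$.

```latex
\[
  \min_{m \in \mathcal{M}(X)} \;
  \frac{1}{2}\,\| \Phi m - y \|^2 + \lambda\, |m|(X),
  \qquad\text{where}\quad
  |m|(X) = \sum_{i} |a_i|
  \ \ \text{for}\ \
  m = \sum_{i} a_i\, \delta_{x_i}
  \ \ \text{with distinct } x_i .
\]
```

On discrete measures the total variation thus reduces to the $\ell^1$ norm of the amplitudes, which is why it acts as a grid-free counterpart of sparse regularization.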

In image processing, one of the most popular methods is the total variation regularization 142, 66. It favors low-complexity images that are piecewise constant; see Figure 3 for some examples of how it can be used to solve image processing problems. Besides applications in image processing, sparsity-related ideas have also had a deep impact in statistics 147 and machine learning 34. As a typical example, for applications to recommendation systems, it makes sense to consider sparsity of the singular values of matrices, which can be relaxed using the so-called nuclear norm (a.k.a. trace norm) 33. The underlying methodology is to make use of low-complexity regularization models, which turns out to be equivalent to the use of partly-smooth regularization functionals 119, 149 enforcing the solution to belong to a low-dimensional manifold.

The dynamical formulation of optimal transport creates a link between optimal transport and geodesics on diffeomorphism groups. This formal link has at least two strong implications that Mokaplan will elaborate on: (i) the development of novel models that bridge the gap between these two fields; (ii) the introduction of novel fast numerical solvers based on ideas from both non-smooth optimization techniques and Bregman metrics, as highlighted in Section 3.2.3.

In a similar line of ideas, we believe a unified approach is needed to tackle both sparse regularization in imaging and various generalized OT problems. Both require solving related non-smooth and large scale optimization problems. Ideas from proximal optimization have proved crucial to address problems in both fields (see for instance 39, 140). Transportation metrics are also the correct way to compare and regularize variational problems that arise in image processing (see for instance the Radon inversion method proposed in 43) and machine learning (see 93). This unity in terms of numerical methods is once again at the core of Section 3.2.3.
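
As a small illustration of the proximal tools mentioned above (a generic textbook example, not a method taken from the cited works), the proximal operator of the scaled absolute value is the soft-thresholding map, which is the basic building block of many sparse regularization solvers. The function name `soft_threshold` is hypothetical.

```python
def soft_threshold(x, lam):
    """Entrywise proximal operator of lam * |.|, i.e. the minimizer over z of
    0.5 * (z - v)**2 + lam * abs(z) for each entry v of the list x."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    out = []
    for v in x:
        if v > lam:
            out.append(v - lam)   # shrink positive entries towards zero
        elif v < -lam:
            out.append(v + lam)   # shrink negative entries towards zero
        else:
            out.append(0.0)       # small entries are set exactly to zero
    return out


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

The exact zeros produced by the thresholding step are what makes such non-smooth penalties promote sparsity.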

The first layer of methodological tools developed by our team is a set of theoretical continuous models that aim at formalizing the problems studied in the applications. These theoretical findings will also pave the way to efficient numerical solvers that are detailed in Section 3.2.

(Participants: G. Carlier, J-D. Benamou, V. Duval, Xavier Dupuis (LUISS Guido Carli University, Roma)) The principal agent problem plays a distinguished role in the literature on asymmetric information and contract theory (with important contributions from several Nobel laureates such as Mirrlees, Myerson or Spence) and it has many important applications in optimal taxation, insurance and nonlinear pricing. The typical problem consists in finding a cost minimizing strategy for a monopolist facing a population of agents who have an unobservable characteristic. The principal therefore has to take into account the so-called incentive compatibility constraint, which is very similar to the cyclical monotonicity condition which characterizes optimal transport plans. In a special case, Rochet and Choné 141 reformulated the problem as a variational problem subject to a convexity constraint. For more general models, and using ideas from Optimal Transportation, Carlier 76 considered the more general

Our expertise:
We have already contributed to the numerical resolution of the Principal Agent problem in the case of the convexity constraint, see 81, 128, 130.

Goals:
So far, the mathematical PA model can be numerically solved for simple utility functions. A Bregman approach inspired by 43 is currently being developed 79 for more general functions. It would be extremely useful as a complement to the theoretical analysis. A new semi-discrete geometric approach is also being investigated, in which the method reduces to non-convex polynomial optimization.

(Participants: G. Carlier, J-D. Benamou, G. Peyré)
A challenging branch of emerging generalizations of Optimal Transportation arising in economics, statistics and finance concerns Optimal Transportation with conditional constraints. The martingale optimal transport problem 37, 104, which appears naturally in mathematical finance, aims at computing robust bounds on option prices as the value of an optimal transport problem where not only the marginals are fixed but the coupling should also be the law of a martingale, since it represents the prices of the underlying asset under the risk-neutral probability at the different dates. Note that as soon as more than two dates are involved, we are facing a multi-marginal problem.
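
In the two-date case, the problem described above can be sketched as follows. The notation is assumed for this sketch: $\mu$ and $\nu$ are the fixed marginals at the two dates, $c$ is the payoff of the option, and $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$.

```latex
\[
  \inf_{\pi \in \Pi(\mu,\nu)} \ \text{or} \ \sup_{\pi \in \Pi(\mu,\nu)}
  \int c(x,y)\, d\pi(x,y)
  \quad\text{s.t.}\quad
  \mathbb{E}_{\pi}\big[\, Y \,\big|\, X \,\big] = X ,
\]
% where (X, Y) has law \pi. The infimum and the supremum give respectively the
% lower and upper robust price bounds; the conditional expectation constraint
% is the martingale condition.
```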

Our expertise:
Our team has deep expertise on the topic of OT and its generalizations, including many existing collaborations between its members, see for instance 43, 48, 41 for some representative recent collaborative publications.

Goals:
This is a non-trivial extension of Optimal Transportation theory, and Mokaplan will develop numerical methods (in the spirit of entropic regularization) to address it. A popular problem in statistics is the so-called quantile regression problem. Recently, Carlier, Chernozhukov and Galichon 77 used an Optimal Transportation approach to extend quantile regression to several dimensions. In this approach again, not only fixed marginal constraints are present but also constraints on conditional means. As in the martingale Optimal Transportation problem, one has to deal with an extra conditional constraint. The duality approach usually breaks down under such constraints, and the characterization of optimal couplings is a challenging task both from a theoretical and a numerical viewpoint.
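
To make the idea of entropic regularization concrete, here is a minimal sketch of the classical Sinkhorn iterations for the standard discrete problem with two fixed marginals. It does not handle the martingale or conditional mean constraints discussed above and is not the team's solver; the function name `sinkhorn` and the parameters `eps` and `n_iter` are hypothetical.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Approximate an optimal coupling between discrete marginals a and b
    for the entropy-regularized transport problem with parameter eps."""
    n, m = len(a), len(b)
    # Gibbs kernel associated with the ground cost.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match each marginal.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The coupling is the kernel rescaled by the two vectors u and v.
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    plan = sinkhorn([0.5, 0.5], [0.25, 0.75], [[0.0, 1.0], [1.0, 0.0]], eps=0.5)
    for row in plan:
        print(row)
```

Each iteration only involves matrix-vector products with the kernel, which is what makes this family of schemes attractive at large scale. For very small `eps` the kernel entries underflow, and a log-domain variant would be needed.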

(Participants: G. Carlier, J-D. Benamou, M. Laborde, Q. Mérigot, V. Duval)
The connection between the static and dynamic transportation problems (see Section 2.3) opens the door to many extensions, most notably by leveraging the use of gradient flows in metric spaces. The flow with respect to the transportation distance was introduced by Jordan-Kinderlehrer-Otto (JKO) 112 and provides a variational formulation of many linear and non-linear diffusion equations. The prototypical example is the Fokker-Planck equation. We will explore this formalism to study new variational problems over probability spaces, and also to derive innovative numerical solvers.
The JKO scheme has been very successfully used to study evolution equations that have the structure of a gradient flow in the Wasserstein space. Indeed many important PDEs have this structure: the Fokker-Planck equation (as was first considered by 112), the porous medium equations, the granular media equation, just to give a few examples. It also finds application in image processing 65. Figure 4 shows examples of gradient flows.

Our expertise:
There is an ongoing collaboration between the team members on the theoretical and numerical analysis of gradient flows.

Goals:
We apply and extend our research on JKO numerical methods to treat various extensions:

(Participants: G. Carlier, J-D. Benamou, G. Peyré)
Congested transport theory in the discrete framework of networks has received a lot of attention since the 50's, starting with the seminal work of Wardrop. A few years later, Beckmann proved that equilibria are characterized as solutions of a convex minimization problem. However, this minimization problem involves one flow variable per path on the network, so its dimension quickly becomes too large in practice. An alternative is to consider continuous-in-space models of congested optimal transport, as was done in 80, which leads to very degenerate PDEs 57.

Our expertise:
Mokaplan members have contributed a lot to the analysis of congested transport problems and to optimization problems with respect to a metric which can be attacked numerically by fast marching methods 48.

Goals:
The case of general networks/anisotropies is still not well understood, general

(Participants: F-X. Vialard, J-D. Benamou, G. Peyré, L. Chizat)
A major issue with the standard dynamical formulation of OT is that it does not allow for variation of mass during the evolution, which is required when tackling medical imaging applications such as tumor growth modeling 68 or tracking elastic organ movements 144. Previous attempts 123, 138 to introduce a source term in the evolution typically lead to mass teleportation (propagation of mass with infinite speed), which is not always satisfactory.

Our expertise:
Our team has already established key contributions both to connect OT to fluid dynamics 39 and to define geodesic metrics on the space of shapes and diffeomorphisms 87.

Goals:
Lenaic Chizat's PhD thesis aims at bridging the gap between the dynamical OT formulation and LDDMM diffeomorphism models (see Section 2.3). This will lead to biologically-plausible evolution models that are both more tractable numerically than LDDMM competitors and benefit from strong theoretical guarantees associated with properties of OT.

(Participants: G. Carlier, J-D. Benamou)
The Optimal Transportation Computational Fluid Dynamics (CFD) formulation is a limit case of variational Mean-Field Games (MFGs), a new branch of game theory recently developed by J-M. Lasry and P-L. Lions 115 with an extremely wide range of potential applications 107. Non-smooth proximal optimization methods used successfully for Optimal Transportation can be used in the case of deterministic MFGs with singular data and/or potentials 42. They provide a robust treatment of the positivity constraint on the density of players.

Our expertise:
J.-D. Benamou pioneered, with Brenier, the CFD approach to Optimal Transportation. Regarding MFGs, on the numerical side, our team has already worked on the use of augmented Lagrangian methods in MFGs 41, and on the analytical side, 75 rigorously explored the optimality system for a singular CFD problem similar to the MFG system.

Goals:
We will work on the extension to stochastic MFGs. It leads to non-trivial numerical difficulties already pointed out in 30.

(Participants: G. Carlier, J-D. Benamou, Q. Mérigot, F. Santambrogio (U. Paris-Sud), Y. Achdou (Univ. Paris 7), R. Andreev (Univ. Paris 7))
Many models from PDEs and fluid mechanics have been used to describe people or vehicles moving in a congested environment.
These models have to be classified according to the dimension (1-D models are mostly used for cars on traffic networks, while 2-D models are most suitable for pedestrians), to the congestion effects (“soft” congestion standing for the phenomenon where high densities slow down the movement, “hard” congestion for the sudden effects when contacts occur, or a certain threshold is attained), and to the possible rationality of the agents.
Maury et al. 126 recently developed a theory for 2-D hard congestion models without rationality, first in a discrete and then in a continuous framework. This model produces a PDE that is difficult to attack with usual PDE methods, but it has been successfully studied via Optimal Transportation techniques, again related to the JKO gradient flow paradigm. Another possibility to model crowd motion is to use the mean field game approach of Lions and Lasry, which describes limits of Nash equilibria when the number of players is large. This also gives macroscopic models where congestion may appear, but this time a global equilibrium strategy is modelled rather than local optimisation by players as in the JKO approach. Numerical methods are starting to be available, see for instance 30, 64.

Our expertise:
We have developed numerical methods to tackle both the JKO approach and the MFG approach. The Augmented Lagrangian (proximal) numerical method can actually be applied to both models 41, JKO and deterministic MFGs.

Goals:
We want to extend our numerical approach to more realistic congestion models where the speed of agents depends on the density, see Figure 6 for preliminary results. Comparison with different numerical approaches will also be performed inside the ANR ISOTACE.
Extension of the Augmented Lagrangian approach to stochastic MFGs will be studied.

(Participants: F-X. Vialard, G. Peyré, B. Schmitzer, L. Chizat)
Diffeomorphic image registration is widely used in medical image analysis. This class of problems can be seen as the computation of a generalized optimal transport, where the optimal path is a geodesic on a group of diffeomorphisms. The major difference between the two approaches is that optimal transport leads to non-smooth optimal maps in general, whereas smoothness is compulsory in diffeomorphic image matching. On the other hand, optimal transport enjoys a convex variational formulation, whereas in LDDMM the minimization problem is non-convex.

Our expertise:
F-X. Vialard is an expert in diffeomorphic image matching (LDDMM) 150, 63, 148.
Our team has already studied flows and geodesics over non-Riemannian shape spaces, which allow for piecewise smooth deformations 87.

Goals:
Our aim is to bridge the gap between standard optimal transport and diffeomorphic methods by building new diffeomorphic matching variational formulations that are convex (geometric obstructions might however appear). A related perspective is the development of new registration/transport models in a Lagrangian framework, in the spirit of 145, 144, to obtain more meaningful statistics on longitudinal studies.

Diffeomorphic matching consists in the minimization of a functional that is the sum of a deformation cost and a similarity measure. The choice of the similarity measure is as important as that of the deformation cost. It is often chosen as a norm on a Hilbert space of functions, currents or varifolds. From a Bayesian perspective, these similarity measures are related to the noise model on the observed data, which is of geometric nature and is not taken into account when using Hilbert norms. Optimal transport fidelity terms have been used in the context of signal and image denoising 117, and it is an important question to extend these approaches to registration problems. Therefore, we propose to develop similarity measures that are geometric and computationally very efficient using entropic regularization of optimal transport.

Our approach is to use a regularized optimal transport to design new similarity measures on all of those Hilbert spaces. Understanding the precise connections between the evolution of shapes and probability distributions will be investigated to cross-fertilize both fields by developing novel transportation metrics and diffeomorphic shape flows.

The corresponding numerical schemes are however computationally very costly. Leveraging our understanding of the dynamic optimal transport problem and its numerical resolution, we propose to develop new algorithms. These algorithms will use the smoothness of the Riemannian metric to improve both accuracy and speed, using for instance higher order minimization algorithms on (infinite dimensional) manifolds.

(Participants: F-X. Vialard, G. Peyré, B. Schmitzer, L. Chizat)
The LDDMM framework has been advocated to enable statistics on the space of shapes or images that benefit from the estimation of the deformation. The resulting statistics strongly depend on the choice of the Riemannian metric. A possible direction consists in learning the right invariant Riemannian metric, as done in 151, where a correlation matrix (Figure 7) is learnt which represents the covariance matrix of the deformation fields for a given population of shapes.
In the same direction, a question of emerging interest in medical imaging is the analysis of time sequences of shapes (called longitudinal analysis) for early diagnosis of disease, see for instance 100. A key question is the inter-subject comparison of the organ evolution, which is usually done by transporting the time evolution to a common coordinate system via parallel transport or other more basic methods. Once again, the statistical results (Figure 8) strongly depend on the choice of the metric, or more generally on the connection that defines parallel transport.

Our expertise:
Our team has already studied statistics on longitudinal evolutions in 100, 101.

Goals:
Developing higher order numerical schemes for parallel transport (only
low order schemes are available at the moment) and
developing variational models to learn the metric or the connections for
improving statistical results.

(Participants: G. Peyré, V. Duval, C. Poon, Q. Denoyelle)
As detailed in Section 2.4, popular methods for regularizing inverse problems in imaging make use of variational analysis over infinite-dimensional (typically non-reflexive) Banach spaces, such as Radon measures or bounded variation functions.

Our expertise:
We have recently shown in 149 how, in the finite dimensional case, the non-smoothness of the functionals at stake is crucial to enforce the emergence of geometrical structures (edges in images or fractures in physical materials 53). We extended this result to a simple infinite dimensional setting, namely sparse regularization of Radon measures for deconvolution 95.
A deep understanding of those continuous inverse problems is crucial to analyze the behavior of their discrete counterparts, and in 96 we have taken advantage of this understanding to develop a fine analysis of the artifacts induced by discrete (i.e. which involve grids) deconvolution models.
These works are also closely related to the problem of limit analysis and yield design in mechanical plasticity, see 78, 53 for an existing collaboration between Mokaplan's team members.

Goals:
A current major front of research in the mathematical analysis of inverse problems is to extend these results for more complicated infinite dimensional signal and image models, such as for instance the set of piecewise regular functions. The key bottleneck is that, contrary to sparse measures (which are finite sums of Dirac masses), here the objects to recover (smooth edge curves) are not parameterized by a finite number of degrees of freedom.
The relevant previous works in this direction are the fundamental results of Chambolle, Caselles and co-workers 38, 32, 84. They however only deal with the specific case where there is no degradation operator and no noise in the observations. We believe that adapting these approaches using our construction of vanishing derivative pre-certificate 95 could lead to a solution to these theoretical questions.

(Participants: G. Peyré, J-M. Mirebeau, D. Prandi)
Modeling and processing natural images requires taking into account their geometry through anisotropic diffusion operators, in order to denoise and enhance directional features such as edges and textures 137, 97. This requirement is also at the heart of recently proposed models of cortical processing 136. A mathematical model for this processing is diffusion on a sub-Riemannian manifold. These methods assume a fixed, usually linear, mapping from the 2-D image to a lifted function defined on the product of space and orientation (which in turn is equipped with a sub-Riemannian manifold structure).

Our expertise:
J-M. Mirebeau is an expert in the discretization of highly anisotropic diffusions through the use of locally adaptive computational stencils 131, 97. G. Peyré has made several contributions to the definition of geometric wavelet transforms and directional texture models, see for instance 137. Dario Prandi has recently applied methods from sub-Riemannian geometry to image restoration 55.

Goals:
A first aspect of this work is to study non-linear, data-adaptive, lifting from the image to the space/orientation domain. This mapping will be implicitly defined as the solution of a convex variational problem. This will open both theoretical questions (existence of a solution and its geometrical properties, when the image to recover is piecewise regular) and numerical ones (how to provide a faithful discretization and fast second order Newton-like solvers). A second aspect of this task is to study the implication of these models for biological vision, in a collaboration with the UNIC Laboratory (directed by Yves Fregnac), located in Gif-sur-Yvette. In particular, the study of the geometry of singular vectors (or “ground states” using the terminology of 49) of the non-linear sub-Riemannian diffusion operators is highly relevant from a biological modeling point of view.

(Participants: G. Peyré, V. Duval, C. Poon) Scanner data acquisition is mathematically modeled as a (sub-sampled) Radon transform 108. It is a difficult inverse problem because inverting the Radon transform is ill-posed and the set of observations is often aggressively sub-sampled and noisy 143. Typical approaches 114 try to recover piecewise smooth solutions in order to recover precisely the position of the organ being imaged. There is however a very poor understanding of the actual performance of these methods, and little is known about how to enhance the recovery.

Our expertise:
We have obtained a good understanding of the performance of inverse problem regularization on compact domains for the localization of point sources 95.

Goals:
We aim at extending the theoretical performance analysis obtained for sparse measures 95 to the set of piecewise regular 2-D and 3-D functions.
Some interesting previous work of C. Poon et al. 139 (C. Poon is currently a postdoc in Mokaplan) has tackled related questions in the field of variable Fourier sampling for compressed sensing applications (which is a toy model for fMRI imaging). These approaches are however not directly applicable to Radon sampling, and require some non-trivial adaptations.
We also aim at better exploring the connection of these methods with optimal-transport based fidelity terms such as those introduced in 29.

(Participants: G. Peyré, F-X. Vialard, J-D. Benamou, L. Chizat)
Some applications in medical image analysis require tracking shapes whose evolution is governed by a growth process. A typical example is tumor growth, where the evolution depends on some typically unknown but meaningful parameters that need to be estimated. There exist well-established mathematical models 68, 135 of non-linear diffusions that take into account recently observed biological properties of tumors. Some related optimal transport models with mass variations have also recently been proposed 124, which are connected to so-called metamorphosis models in the LDDMM framework 50.

Our expertise:
Our team has strong experience in both dynamical optimal transport models and diffeomorphic matching methods (see Section 3.1.2).

Goals:
The close connection between tumor growth models 68, 135 and gradient flows for (possibly non-Euclidean) Wasserstein metrics (see Section 3.1.2) makes the application of the numerical methods we develop particularly appealing to tackle large scale forward tumor evolution simulation.
A significant departure from the classical OT-based convex models is however required.
The final problem we wish to solve is the backward (inverse) problem of estimating tumor parameters from noisy and partial observations.
This also requires setting up a meaningful and robust data fidelity term, which can be for instance a generalized optimal transport metric.

The above continuous models require a careful discretization, so that the fundamental properties of the models are transferred to the discrete setting. Our team aims at developing innovative discretization schemes as well as associated fast numerical solvers, that can deal with the geometric complexity of the variational problems studied in the applications. This will ensure that the discrete solution is correct and converges to the solution of the continuous model within a guaranteed precision. We give below examples for which a careful mathematical analysis of the continuous to discrete model is essential, and where dedicated non-smooth optimization solvers are required.

(Participants: J-D. Benamou, G. Carlier, J-M. Mirebeau, Q. Mérigot) Optimal transportation models as well as continuous models in economics can be formulated as infinite dimensional convex variational problems with the constraint that the solution belongs to the cone of convex functions. Discretizing this constraint is however a tricky problem, and usual finite element discretizations fail to converge.

Our expertise:
Our team is currently investigating new discretizations, see in particular the recent proposal 46 for the Monge-Ampère equation and 130 for general non-linear variational problems. Both offer convergence guarantees and are amenable to fast numerical resolution techniques such as Newton solvers.
Since 46 explains how to treat Transport Boundary Conditions for Monge-Ampère efficiently and in full generality, this is a promising, fast and new approach to compute Optimal Transportation viscosity solutions.
A monotone scheme is needed. One is based on the work of Froese and Oberman 103; a different and more accurate approach has been proposed by Mirebeau, Benamou and Collino 45.
As shown in 89, discretizing the constraint for a continuous function to be convex is not trivial.
Our group has largely contributed to solving this problem with G. Carlier 81, Quentin Mérigot 128 and J-M. Mirebeau 130. This problem is connected to the construction of monotone schemes for the Monge-Ampère equation.

Goals:
The current available methods are 2-D. They need to be optimized and parallelized. A non-trivial extension to 3-D is necessary for many applications. The notion of

(Participants: J-D. Benamou, G. Carlier, J-M. Mirebeau, G. Peyré, Q. Mérigot)
As detailed in Section 2.3, gradient flows for the Wasserstein metric (aka JKO gradient flows 112) provide a variational formulation of many non-linear diffusion equations. They also open the way to novel discretization schemes.
From a computational point of view, although the JKO scheme is constructive (it is based on the implicit Euler scheme), it has seldom been used in numerical practice because the Wasserstein term is difficult to handle (except in dimension one).
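For reference, one step of the JKO scheme can be written as follows. The notation is an assumption introduced here for illustration and does not come from the report: $\tau > 0$ is the time step, $\rho_k$ the density at step $k$, $\mathcal{E}$ the energy driving the flow and $W_2$ the quadratic Wasserstein distance.

```latex
\rho_{k+1} \in \operatorname*{argmin}_{\rho}\; \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) + \mathcal{E}(\rho)
```

This is the implicit Euler step for the gradient flow of $\mathcal{E}$, with the Euclidean distance replaced by $W_2$. For the entropy $\mathcal{E}(\rho) = \int \rho \log \rho$, the scheme converges to the heat equation as $\tau \to 0$. The term $W_2^2(\rho, \rho_k)$ is the one that makes each step costly, since evaluating it amounts to solving an optimal transport problem.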

Our expertise:

Solving one step of a JKO gradient flow is similar to solving an Optimal Transport problem. An approach based on a geometrical discretization of the Monge-Ampère operator has been proposed by Mérigot, Carlier, Oudet and Benamou in 44, see Figure 4. The Gamma-convergence of the discretization (in space) has been proved.

Goals:
We are also investigating the application to JKO gradient flows of other numerical approaches to Optimal Transport, based either on the CFD formulation or on the entropic regularization of the Monge-Kantorovich problem (see section 3.2.3).
An in-depth study and comparison of all these methods will be necessary.

(Participants: V. Duval, G. Peyré, G. Carlier, Jalal Fadili (ENSICaen), Jérôme Malick (CNRS, Univ. Grenoble)) While pervasive in the numerical analysis community, the problem of discretization and

Our expertise:
We have provided the first results on the discrete-to-continuous convergence in both sparse regularization variational problems 95, 96 and the static formulation of OT and Wasserstein barycenters 82.

Goals:
In a collaboration with Jérôme Malick (INRIA Grenoble), our first goal is to generalize the result of 95 to generic partly-smooth convex regularizers routinely used in imaging science and machine learning, a prototypal example being the nuclear norm (see 149 for a review of this class of functionals).
Our second goal is to extend the results of 82 to the novel class of entropic discretization schemes we have proposed 43, to lay out the theoretical foundation of these ground-breaking numerical schemes.

(Participants: G. Peyré, V. Duval, I. Waldspurger)
There has been a recent surge of interest in the imaging community in so-called “grid free” methods, where one tries to directly tackle the infinite dimensional recovery problem over the space of measures, see for instance 73, 95.
The general idea is that if the range of the imaging operator is finite dimensional, the associated dual optimization problem is also finite dimensional (for deconvolution, it corresponds to optimization over the set of trigonometric polynomials).
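As an illustration, the primal and dual problems can be written as below in the setting of sparse spikes deconvolution. The notation is an assumption made here and is not taken from the report: $\Phi$ is the imaging operator mapping a measure $m$ on the domain $\mathbb{T}$ to a finite dimensional observation $y$, $\lambda > 0$ is the regularization parameter and $|m|(\mathbb{T})$ is the total variation norm of $m$.

```latex
\min_{m \in \mathcal{M}(\mathbb{T})} \; \frac{1}{2}\,\|\Phi m - y\|^2 + \lambda\, |m|(\mathbb{T})
\qquad\text{and}\qquad
\max_{\|\Phi^* p\|_\infty \le 1} \; \langle y, p \rangle - \frac{\lambda}{2}\,\|p\|^2
```

The dual variable $p$ lives in the finite dimensional observation space. When $\Phi$ is an ideal low-pass filter, $\Phi^* p$ is a trigonometric polynomial, so the dual constraint bounds the modulus of a trigonometric polynomial by one.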

Our expertise:
We have provided in 95 a sharp analysis of the support recovery property of this class of methods for the case of sparse spikes deconvolution.

Goals:
A key bottleneck of these approaches is that, while being finite dimensional, the dual problem requires handling a constraint of polynomial positivity, which is notoriously difficult to manipulate (except in the very particular case of 1-D problems, which is the one exposed in 73). A possible, but very costly, methodology is to resort to Lasserre's SDP representation hierarchy 116. We will make use of these approaches and study how restricting the level of the hierarchy (to obtain fast algorithms) impacts the recovery performance (since this corresponds to only computing approximate solutions). We will pay particular attention to the recovery of 2-D piecewise constant functions (the so-called total variation of functions regularization 142), see Figure 3 for some illustrative applications of this method.

(Participants: G. Peyré, J-D. Benamou, G. Carlier, Jalal Fadili (ENSICaen))
Both sparse regularization problems in imaging (see Section 2.4) and dynamical optimal transport (see Section 2.3) are instances of large scale, highly structured, non-smooth convex optimization problems.
First order proximal splitting optimization algorithms have recently gained lots of interest for these applications because they are the only ones capable of scaling to giga-pixel discretizations of images and volumes and at the same time handling non-smooth objective functions. They have been successfully applied to optimal transport 39, 132, congested optimal transport 67 and to sparse regularizations (see for instance 140 and the references therein).
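To make the idea of proximal splitting concrete, here is a minimal sketch of the forward-backward iteration on a small sparse regularization problem (the Lasso). It is not the solver used in the cited works: the function names, the random test problem and the parameter values are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_objective(A, y, x, lam):
    """Value of 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def forward_backward(A, y, lam, n_iter=500):
    """Forward-backward splitting: a gradient step on the smooth term,
    followed by a proximal step on the non-smooth l1 term."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)         # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Hypothetical test problem: recover a 3-sparse vector from 40 measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = forward_backward(A, y, lam=0.1)
```

Each iteration only needs matrix-vector products and a componentwise operation, which is what lets this family of methods scale to very large discretizations.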

Our expertise:
The pioneering work of our team has shown how these proximal solvers can be used to tackle the dynamical optimal transport problem 39, see also 132. We have also recently developed new proximal schemes that can cope with non-smooth composite objective functions 140.

Goals:
We aim at extending these solvers to a wider class of variational problems, most notably optimization under divergence constraints 41. Another subject we are investigating is the extension of these solvers to both non-smooth and non-convex objective functionals, which are mandatory to handle more general transportation problems and novel imaging regularization penalties.

(Participants: G. Peyré, G. Carlier, L. Nenna, J-D. Benamou, Marco Cuturi (Kyoto Univ.))
The entropic regularization of the Kantorovich linear program for OT has been shown to be surprisingly simple and efficient, in particular for applications in machine learning 93. As shown in 43, this is a special instance of the general method of Bregman iterations, which is also a particular instance of first order proximal schemes according to the Kullback-Leibler divergence.
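The iterative projection view can be sketched on the simplest case, entropic OT between two discrete histograms, where alternating Kullback-Leibler projections onto the two marginal constraints reduce to the Sinkhorn scaling updates. This is a minimal sketch under stated assumptions, not the team's implementation: the function name, the value of `eps`, the iteration count and the two Gaussian test histograms are all illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT between histograms a and b for the cost matrix C.

    Each scaling update is a Kullback-Leibler (Bregman) projection of the
    current coupling onto one of the two marginal constraints.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # enforce the row marginal a
        v = b / (K.T @ u)                # enforce the column marginal b
    return u[:, None] * K * v[None, :]   # coupling diag(u) K diag(v)

# Hypothetical test data: two discretized Gaussian bumps on [0, 1].
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-(x - 0.2) ** 2 / 0.01)
a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.02)
b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2       # quadratic ground cost
P = sinkhorn(a, b, C)
```

For small values of `eps` the kernel underflows and the updates are usually carried out in the log domain; this sketch omits that for brevity.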

Our expertise:
We have recently shown 43 how Bregman projections 58 and Dykstra's algorithm 35 offer a generic optimization framework to solve a variety of generalized OT problems. Carlier and Dupuis 79 have designed a new method based on alternate Dykstra projections and applied it to the principal-agent problem in microeconomics.
We have applied this method in computer graphics in a paper accepted in SIGGRAPH 2015 146. Figure 9 shows the potential of our approach to handle giga-voxel datasets: the input volumetric densities are discretized on a

Goals:
Following some recent works (see in particular 86) we first aim at studying primal-dual optimization schemes according to Bregman divergences (that would go much beyond gradient descent and iterative projections), in order to offer a versatile and very effective framework to solve variational problems involving OT terms.
We then also aim at extending the scope of usage of this method to applications in quantum mechanics (Density Functional Theory, see 90) and fluid dynamics (Brenier's weak solutions of the incompressible Euler equation, see 61). The computational challenge is that realistic physical examples are of a huge size not only because of the space discretization of one marginal but also because of the large number of marginals involved (for incompressible Euler the number of marginals equals the number of time steps).

FreeForm Optics, Fluid Mechanics (Incompressible Euler, Semi-Geostrophic equations), Quantum Chemistry (Density Functional Theory), Statistical Physics (Schroedinger problem), Porous Media.

Full Waveform Inversion (Geophysics), Super-resolution microscopy (Biology), Satellite imaging (Meteorology)

Mean-field games, spatial economics, principal-agent models, taxation, nonlinear pricing.

1
César Barilla, Guillaume Carlier, Jean-Michel Lasry

We propose a (toy) MFG model for the evolution of residents and firms densities, coupled both by labour market equilibrium conditions and competition for land use (congestion). This results in a system of two Hamilton-Jacobi-Bellman and two Fokker-Planck equations with a new form of coupling related to optimal transport. This MFG has a convex potential which enables us to find weak solutions by a variational approach. In the case of quadratic Hamiltonians, the problem can be reformulated in Lagrangian terms and solved numerically by an IPFP/Sinkhorn-like scheme. We present numerical results based on this approach, these simulations exhibit different behaviours with either agglomeration or segregation dominating depending on the initial conditions and parameters.

2
Jean-David Benamou

We present an overview of the basic theory, modern optimal transportation extensions and recent algorithmic advances. Selected modelling and numerical applications illustrate the impact of optimal transportation in numerical analysis.

3
Guillaume Carlier, Katharina Eichinger, Alexey Kroshnin

In this paper, we investigate properties of entropy-penalized Wasserstein barycenters as a regularization of Wasserstein barycenters. After characterizing these barycenters in terms of a system of Monge-Ampère equations, we prove some global moment and Sobolev bounds as well as higher regularity properties. We finally establish a central limit theorem for entropic-Wasserstein barycenters.

4
Maria Colombo, Antonio De Rosa, Andrea Marchese, Paul Pegon, Antoine Prouff

We prove the stability of optimal traffic plans in branched transport. In particular, we show that any limit of optimal traffic plans is optimal as well. This result goes beyond the Eulerian stability proved in [Colombo, De Rosa, Marchese ; 2021], extending it to the Lagrangian framework.

5
Vincent Duval

Describing the solutions of inverse problems arising in signal or image processing is an important issue both for theoretical and numerical purposes. We propose a principle which describes the solutions to convex variational problems involving a finite number of measurements. We discuss its optimality on various problems concerning the recovery of Radon measures.

Thomas Gallouët, Quentin Merigot, Andrea Natale

When expressed in Lagrangian variables, the equations of motion for compressible (barotropic) fluids have the structure of a classical Hamiltonian system in which the potential energy is given by the internal energy of the fluid. The dissipative counterpart of such a system coincides with the porous medium equation, which can be cast in the form of a gradient flow for the same internal energy. Motivated by these related variational structures, we propose a particle method for both problems in which the internal energy is replaced by its Moreau-Yosida regularization in the L2 sense, which can be efficiently computed as a semi-discrete optimal transport problem. Using a modulated energy argument which exploits the convexity of the problem in Eulerian variables, we prove quantitative convergence estimates towards smooth solutions. We verify such estimates by means of several numerical tests.

7
Andrea Natale, Gabriele Todeschi

We construct Two-Point Flux Approximation (TPFA) finite volume schemes to solve the quadratic optimal transport problem in its dynamic form, namely the problem originally introduced by Benamou and Brenier. We show numerically that this type of discretization is prone to form instabilities in its most natural implementation, and we propose a variation based on nested meshes in order to overcome these issues. Despite the lack of strict convexity of the problem, we also derive quantitative estimates on the convergence of the method, at least for the discrete potential and the discrete cost. Finally, we introduce a strategy based on the barrier method to solve the discrete optimization problem.

10
Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, François-Xavier Vialard

It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexity of these recently proposed methods still degrades exponentially with the dimension. In this paper, thanks to an infinite-dimensional sum-of-squares representation, we derive a statistical estimator of smooth optimal transport which achieves a precision

15
Xavier Bacon, Guillaume Carlier

We use convex duality techniques to study a spatial Pareto problem with transport costs and derive a spatial second welfare theorem. The existence of an integrable equilibrium distribution of quantities is nontrivial and established under general monotonicity assumptions. Our variational approach also enables us to give a numerical algorithm à la Sinkhorn and present simulations for equilibrium prices and quantities in one-dimensional domains and a network of French cities.

16
Jean-David Benamou, Guillaume Chazareix, Wilbert L Ijzerman, Giorgi Rukhaia

We address the “freeform optics” inverse problem of designing a reflector surface mapping a prescribed source distribution of light to a prescribed far field distribution, for a finite light source. When the finite source reduces to a point source, the light source distribution has support only on the optics ray directions. In this setting the inverse problem is well posed for arbitrary source and target probability distributions. It can be recast as an Optimal Transportation problem and has been studied both mathematically and numerically. We are not aware of any similar mathematical formulation in the finite source case: i.e. the source has an “étendue” with support both in space and directions. We propose to leverage the well-posed variational formulation of the point source problem to build a smooth parameterization of the reflector and the reflection map. Under this parameterization we can construct a smooth loss/misfit function to optimize for the best solution in this class of reflectors. Both steps, the parameterization and the loss, are related to Optimal Transportation distances. We also take advantage of recent progress in the numerical approximation and resolution of these mathematical objects to perform a numerical study.

17
Matteo Bonforte, Jean Dolbeault, Bruno Nazaret, Nikita Simonov

The purpose of this work is to establish a quantitative and constructive stability result for a class of subcritical Gagliardo-Nirenberg-Sobolev inequalities which interpolates between the logarithmic Sobolev inequality and the standard Sobolev inequality (in dimension larger than three), or Onofri's inequality in dimension two. We develop a new strategy, in which the flow of the fast diffusion equation is used as a tool: a stability result in the inequality is equivalent to an improved rate of convergence to equilibrium for the flow. The regularity properties of the parabolic flow allow us to connect an improved entropy - entropy production inequality during an initial time layer to spectral properties of a suitable linearized problem which is relevant for the asymptotic time layer. Altogether, the stability in the inequalities is measured by a deficit which controls in strong norms (a Fisher information which can be interpreted as a generalized Heisenberg uncertainty principle) the distance to the manifold of optimal functions. The method is constructive and, for the first time, quantitative estimates of the stability constant are obtained, including in the critical case of Sobolev's inequality. To build the estimates, we establish a quantitative global Harnack principle and perform a detailed analysis of large time asymptotics by entropy methods.

18
Guillaume Carlier, Arnaud Dupuy, Alfred Galichon, Yifei Sun

In this paper, we describe a novel iterative procedure called SISTA to learn the underlying cost in optimal transport problems. SISTA is a hybrid between two classical methods, coordinate descent ("S"-inkhorn) and proximal gradient descent ("ISTA"). It alternates between a phase of exact minimization over the transport potentials and a phase of proximal gradient descent over the parameters of the transport cost. We prove that this method converges linearly, and we illustrate on simulated examples that it is significantly faster than both coordinate descent and ISTA. We apply it to estimating a model of migration, which predicts the flow of migrants using country-specific characteristics and pairwise measures of dissimilarity between countries. This application demonstrates the effectiveness of machine learning in quantitative social sciences.

19
Guillaume Carlier, Gero Friesecke, Daniela Vögler

We present a novel analogue for finite exchangeable sequences of the de Finetti, Hewitt and Savage theorem and investigate its implications for multi-marginal optimal transport (MMOT) and Bayesian statistics. If (Z_1, ..., Z_N) is a finitely exchangeable sequence of N random variables taking values in some Polish space X, we show that the law µ_k of the first k components has a representation of the form.

20
Guillaume Carlier

The aim of this short note is to give an elementary proof of linear convergence of the Sinkhorn algorithm for the entropic regularization of multi-marginal optimal transport. The proof simply relies on: i) the fact that Sinkhorn iterates are bounded, ii) strong convexity of the exponential on bounded intervals and iii) the convergence analysis of the coordinate descent (Gauss-Seidel) method of Beck and Tetruashvili.

21
Antonin Chambolle, Robert Tovey

FISTA is a popular convex optimisation algorithm which is known to converge at an optimal rate whenever the optimisation domain is contained in a suitable Hilbert space. We propose a modified algorithm where each iteration is performed in a subspace, and that subspace is allowed to change at every iteration. Analytically, this allows us to guarantee convergence in a Banach space setting, although at a reduced rate depending on the conditioning of the specific problem. Numerically we show that a greedy adaptive choice of discretisation can greatly increase the time and memory efficiency in infinite dimensional Lasso optimisation problems.

22
Yohann De Castro, Vincent Duval, Romain Petit

We introduce an algorithm to solve linear inverse problems regularized with the total (gradient) variation in a gridless manner. Contrary to most existing methods, that produce an approximate solution which is piecewise constant on a fixed mesh, our approach exploits the structure of the solutions and consists in iteratively constructing a linear combination of indicator functions of simple polygons.

23
Antonin Monteil, Paul Pegon

We consider first order local minimization problems

24
Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to be obtained for the estimated maps themselves. In this paper, we propose the first tractable algorithm for which the statistical

25
Aude Sportisse, Christophe Biernacki, Claire Boyer, Julie Josse, Matthieu Marbac Lourdelle, Gilles Celeux, Fabien Laporte

In recent decades, technological advances have made it possible to collect large data sets. In this context, model-based clustering is a very popular, flexible and interpretable methodology for data exploration in a well-defined statistical framework. One of the ironies of the increase of large datasets is that missing values are more frequent. However, traditional approaches (such as discarding observations with missing values or imputation methods) are not designed for clustering purposes. In addition, they rarely apply to the general case, though frequent in practice, of Missing Not At Random (MNAR) values, i.e. when the missingness depends on the unobserved data values and possibly on the observed data values. The goal of this paper is to propose a novel approach by embedding MNAR data directly within model-based clustering algorithms. We introduce a selection model for the joint distribution of data and missing-data indicator. It corresponds to a mixture model for the data distribution and a general MNAR model for the missing-data mechanism, which may depend on the underlying classes (unknown) and/or the values of the missing variables themselves. A large set of meaningful MNAR sub-models is derived and the identifiability of the parameters is studied for each of the sub-models, which is usually a key issue for any MNAR proposal. The EM and Stochastic EM algorithms are considered for estimation. Finally, we perform empirical evaluations for the proposed sub-models on synthetic data and we illustrate the relevance of our method on a medical register, the TraumaBase® dataset.

26
Robert Tovey, Vincent Duval

In this work we consider algorithms for reconstructing time-varying data into a finite sum of discrete trajectories, alternatively, an off-the-grid sparse-spikes decomposition which is continuous in time. Recent work showed that this decomposition was possible by minimising a convex variational model which combined a quadratic data fidelity with dynamical Optimal Transport. We generalise this framework and propose new numerical methods which leverage efficient classical algorithms for computing shortest paths on directed acyclic graphs. Our theoretical analysis confirms that these methods converge to globally optimal reconstructions which represent a finite number of discrete trajectories. Numerically, we show new examples for unbalanced Optimal Transport penalties, and for balanced examples we are 100 times faster in comparison to the previously known method.

27
Adrien Vacher, François-Xavier Vialard

Over the past few years, numerous computational models have been developed to solve Optimal Transport (OT) in a stochastic setting, where distributions are represented by samples. In such situations, the goal is to find a transport map that has good generalization properties on unseen data, ideally the closest map to the ground truth, unknown in practical settings. However, in the absence of ground truth, no quantitative criterion has been put forward to measure its generalization performance although it is crucial for model selection. We propose to leverage the Brenier formulation of OT to perform this task. Theoretically, we show that this formulation guarantees that, up to a distortion parameter that depends on the smoothness/strong convexity and a statistical deviation term, the selected map achieves the lowest quadratic error to the ground truth. This criterion, estimated via convex optimization, enables parameter and model selection among entropic regularization of OT, input convex neural networks and smooth and strongly convex nearest-Brenier (SSNB) models. Last, we make an experiment questioning the use of OT in Domain-Adaptation. Thanks to the criterion, we can identify the potential that is closest to the true OT map between the source and the target and we observe that this selected potential is not the one that performs best for the downstream transfer classification task.

28
Irène Waldspurger

Low-rank matrix recovery problems are inverse problems which naturally arise in various fields like signal processing, imaging and machine learning. They are non-convex and NP-hard in full generality. It is therefore a delicate problem to design efficient recovery algorithms and to provide rigorous theoretical insights on the behavior of these algorithms. The goal of these notes is to review recent progress in this direction for the class of so-called "non-convex algorithms", with a particular focus on the proof techniques. Although they aim at presenting very recent research works, these notes have been written with the intent to be, as much as possible, accessible to non-specialists. These notes were written for an eight-hour lecture at Collège de France. The original version, in French, is available online 1 and the videos of the lecture can be found on the Collège de France website.

G. Carlier is on the Editorial Board of Journal de l'Ecole Polytechnique, Applied Math and Opt., Journal of Mathematical Analysis and Applications, Mathematics and financial economics and Journal of dynamics and games. I. Waldspurger is associate editor for the IEEE Transactions on Signal Processing.

V. Duval is a member of the Comité de Suivi Doctoral (CSD) and the Comité des Emplois Scientifiques (CES) of the Inria Paris research center.

I. Waldspurger gave three talks for high school or undergraduate students.