

Section: New Results

Optimal control and zero-sum games

Fixed points of order preserving homogeneous maps and zero-sum games

Participants : Marianne Akian, Stéphane Gaubert.

The PhD work of Antoine Hochart [88] dealt with applications of non-linear fixed point theory to zero-sum games.

A highlight of his PhD is the characterization of the property of ergodicity for zero-sum games. In the special “zero-player” case, i.e., for a Markov chain equipped with an additive functional (payment) of the trajectory, the ergodicity condition entails that the mean payoff is independent of the initial state, for any choice of the payment. In the case of finite Markov chains, ergodicity admits several characterizations, including a combinatorial one (the uniqueness of the final class). This carries over to the two-player case: ergodicity is now characterized by the absence of certain pairs of conjugate invariant sets (dominions), and it can be checked using directed hypergraph algorithms. This leads to an explicit combinatorial sufficient condition for the solvability of the “ergodic equation”, which is the main tool in the numerical approach to the mean payoff problem. These results appeared in [52] for the case of bounded payments. A more general approach was developed in [87], in which zero-sum games are studied abstractly in terms of accretive operators. This allows one to show that the bias vector (the solution of the ergodic equation) is unique for a generic perturbation of the payments. More recent work includes the introduction of an abstract game allowing us to deal with general monotone additively homogeneous operators, and thus with unbounded payments.
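To make the ergodic equation concrete, here is a minimal Python sketch; it is not taken from the cited works. It assumes a finite game in which, at each state i, one player minimizes and then the other maximizes, so that the dynamic programming (Shapley) operator is T(x)_i = min_a max_b (r_i^{ab} + sum_j P_ij^{ab} x_j). The ergodic equation then reads T(u) = lambda + u, where lambda is the mean payoff and u the bias vector. The names `random_game`, `shapley_operator` and `solve_ergodic_equation` are hypothetical, the game data are random, and every transition probability is forced to be positive, an assumption made only so that the game is ergodic and relative value iteration converges.

```python
import random


def _row(rng, n):
    # Random probability vector mixed with the uniform one, so that every
    # entry is at least 0.5 / n (assumption made to guarantee ergodicity).
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [0.5 * wj / s + 0.5 / n for wj in w]


def random_game(n=3, n_a=2, n_b=2, seed=0):
    """Hypothetical random game: payments r[i][a][b], transitions P[i][a][b][j]."""
    rng = random.Random(seed)
    r = [[[rng.uniform(-1.0, 1.0) for _ in range(n_b)]
          for _ in range(n_a)] for _ in range(n)]
    P = [[[_row(rng, n) for _ in range(n_b)]
          for _ in range(n_a)] for _ in range(n)]
    return r, P


def shapley_operator(r, P, x):
    """T(x)_i = min_a max_b ( r[i][a][b] + sum_j P[i][a][b][j] * x[j] )."""
    n = len(x)
    return [
        min(
            max(
                r[i][a][b] + sum(P[i][a][b][j] * x[j] for j in range(n))
                for b in range(len(r[i][a]))
            )
            for a in range(len(r[i]))
        )
        for i in range(n)
    ]


def solve_ergodic_equation(r, P, iters=200):
    """Relative value iteration: returns (lam, u) with T(u) ~ lam + u, u[0] = 0."""
    u = [0.0] * len(r)
    lam = 0.0
    for _ in range(iters):
        t = shapley_operator(r, P, u)
        lam = t[0]                    # normalize by the first coordinate
        u = [ti - lam for ti in t]
    return lam, u


r, P = random_game()
lam, u = solve_ergodic_equation(r, P)
t = shapley_operator(r, P, u)
residual = max(abs(t[i] - lam - u[i]) for i in range(len(u)))
print("mean payoff:", lam, "bias:", u, "residual:", residual)
```

Since the same scalar `lam` appears in every coordinate of the equation, the mean payoff does not depend on the initial state, which is the ergodicity property discussed above. Without the positivity assumption on the transitions, the iteration may fail to converge and the ergodic equation may have no solution.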

Another series of results of the thesis concerns the case of finite action spaces, showing that the set of payments for which the bias vector is not unique coincides with the union of lower dimensional cells of a polyhedral complex, with an application to perturbation schemes in policy iteration [12].

A last result of the thesis is a representation theorem for “payment free” Shapley operators, showing that these are characterized by monotonicity and homogeneity axioms [13]. This extends to the two-player case known representation theorems for risk measures.

The operator approach to entropy games

Participants : Marianne Akian, Stéphane Gaubert.

Entropy games were recently introduced by Asarin et al. One player (Despot) wishes to minimize a measure of “freedom” given by a topological entropy, whereas the other player (Tribune) wishes to maximize it. In [25], we developed an operator approach for entropy games. We showed that they reduce to risk-sensitive type game problems, and deduced that entropy games in which Despot has only a few positions with non-trivial actions can be solved in polynomial time.

Probabilistic and max-plus approximation of Hamilton-Jacobi-Bellman equations

Participants : Marianne Akian, Eric Fodjo.

The PhD thesis of Eric Fodjo concerns stochastic control problems arising in particular in the modelling of portfolio selection with transaction costs. The dynamic programming method leads to a Hamilton-Jacobi-Bellman partial differential equation, on a space with a dimension at least equal to the number of risky assets. The curse of dimensionality does not allow one to solve these equations numerically in large dimension (greater than 5). We propose to tackle these problems with numerical methods combining policy iterations, probabilistic discretisations and max-plus discretisations, in order to increase the tractable dimension.
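As a rough illustration of what a max-plus discretisation looks like, the following Python sketch approximates a one-dimensional function from below by a max-plus linear combination of quadratic basis functions, f_h(x) = max_i (lambda_i - (c/2)(x - x_i)^2), with lambda_i = min_x (f(x) + (c/2)(x - x_i)^2). It is a sketch under stated assumptions, not the scheme of the thesis: the test function `f`, the curvature `c`, the centers, the grid and the function names are all hypothetical choices, and the minimum defining lambda_i is taken over a finite grid only.

```python
def maxplus_coefficients(f, centers, grid, c):
    """lambda_i = min over the grid of f(x) + (c/2) * (x - x_i)^2."""
    return [min(f(x) + 0.5 * c * (x - xi) ** 2 for x in grid) for xi in centers]


def maxplus_eval(coeffs, centers, c, x):
    """Max-plus linear combination: max_i ( lambda_i - (c/2) * (x - x_i)^2 )."""
    return max(lam - 0.5 * c * (x - xi) ** 2 for lam, xi in zip(coeffs, centers))


def f(x):
    # Hypothetical test function; f(x) + (c/2) x^2 is convex as soon as c >= 1.
    return abs(x) - 0.5 * x * x


c = 4.0                                      # curvature of the basis functions
grid = [i / 100 for i in range(-200, 201)]   # evaluation grid on [-2, 2]
centers = [i / 4 for i in range(-8, 9)]      # centers of the quadratic basis
coeffs = maxplus_coefficients(f, centers, grid, c)

# Largest gap between f and its max-plus approximation on the grid.
worst_gap = max(f(x) - maxplus_eval(coeffs, centers, c, x) for x in grid)
print("largest gap on the grid:", worst_gap)
```

By construction each term lambda_i - (c/2)(x - x_i)^2 lies below f at every grid point, so the approximation is a lower bound there; refining the set of centers reduces the gap.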

We consider fully nonlinear Hamilton-Jacobi-Bellman equations associated with finite horizon diffusion control problems involving a finite set-valued (or switching) control and possibly a continuum-valued control. In [47], we constructed a lower-complexity probabilistic numerical algorithm by combining the idempotent expansion properties obtained by McEneaney, Kaise and Han [91], [97] for solving such problems with a numerical probabilistic method such as the one proposed by Fahim, Touzi and Warin [74] for solving some fully nonlinear parabolic partial differential equations, when the volatility does not oscillate too much. In [38], [39] (also presented in [24]), we improved the method of Fahim, Touzi and Warin by introducing probabilistic schemes which are monotone without any restrictive condition, allowing one to solve fully nonlinear parabolic partial differential equations with general volatilities. We studied the convergence and obtained error estimates when the parameters and the value function are bounded. We are now studying the more general quadratic growth case.

Tropical-SDDP algorithms for stochastic control problems involving a switching control

Participants : Marianne Akian, Duy Nghi, Benoît Tran.

The PhD thesis of Benoît Tran, supervised by Jean-Philippe Chancelier (ENPC) and Marianne Akian, concerns the numerical solution of the dynamic programming equation of discrete time stochastic control problems.

Several methods have been proposed in the literature to bypass the curse of dimensionality for such an equation, by assuming a certain structure of the problem. Examples are the max-plus based method of McEneaney [98], [99], the stochastic dual dynamic programming (SDDP) algorithm of Pereira and Pinto [104], the mixed integer dynamic approximation scheme of Philpott, Faisal and Bonnans [60], and the probabilistic numerical method of Fahim, Touzi and Warin [74]. We propose to combine and compare these methods in order to solve more general structures, in particular problems involving a finite set-valued (or switching) control and a continuum-valued control, with the property that the value function associated to a fixed switching strategy is convex.
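The structure described above can be pictured on a toy example. The Python sketch below is not the algorithm studied in the thesis: it only shows a function given as a minimum, over finitely many switching modes, of convex functions, each of which is approximated from below by a maximum of affine cuts, as SDDP-type methods do for convex value functions. The two modes, their cost functions, the trial points and all function names are hypothetical.

```python
def tangent_cut(f, df, x0):
    """Affine cut (slope, intercept) tangent to the convex function f at x0."""
    slope = df(x0)
    return slope, f(x0) - slope * x0


def cut_value(cuts, x):
    """Polyhedral lower approximation: maximum of the affine cuts at x."""
    return max(slope * x + intercept for slope, intercept in cuts)


# Two hypothetical switching modes, each with a convex cost and its derivative.
modes = {
    "mode_0": (lambda x: (x - 1) ** 2, lambda x: 2 * (x - 1)),
    "mode_1": (lambda x: (x + 1) ** 2 + 0.5, lambda x: 2 * (x + 1)),
}

trial_points = [-2, -1, 0, 1, 2]
cuts = {
    name: [tangent_cut(f, df, x0) for x0 in trial_points]
    for name, (f, df) in modes.items()
}


def switching_value(x):
    """Exact value: minimum over the modes of the convex costs."""
    return min(f(x) for f, _ in modes.values())


def switching_lower_bound(x):
    """Minimum over the modes of the cut approximations (a lower bound)."""
    return min(cut_value(cuts[name], x) for name in cuts)


for x in [-1.5, 0.0, 0.5]:
    print(x, switching_lower_bound(x), switching_value(x))
```

The minimum over modes is in general not convex, which is why a single family of cuts would not be enough; keeping one family of cuts per mode preserves the lower-bound property, with equality at the trial points.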