

Section: New Results

Optimal control and zero-sum games

Fixed points of order preserving homogeneous maps and zero-sum games

Participants : Marianne Akian, Stéphane Gaubert.

In a series of joint works with Antoine Hochart, we apply methods of non-linear fixed point theory to zero-sum games.

A key issue is the solvability of the ergodic equation associated with a zero-sum game with finite state space: given a dynamic programming operator T associated with an undiscounted problem, one looks for a vector u, called the bias, and a scalar λ, the ergodic constant, such that T(u)=λe+u. The bias vector is of interest as it allows one to determine optimal stationary strategies.
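As an illustration, the ergodic equation can be solved for a toy game by a damped fixed-point (Krasnoselskii-Mann type) iteration. The payments and transitions below are hypothetical; the sketch only illustrates the structure T(u)=λe+u, not the results of the works cited here.

```python
import numpy as np

# Toy Shapley operator of a two-state zero-sum game (hypothetical payments):
#   T(u)_s = max_a min_b ( r[s, a, b] + u[nxt[s, a, b]] )
r = np.array([[[3.0, 1.0], [0.0, 2.0]],
              [[2.0, 4.0], [1.0, 0.0]]])      # payments r[s, a, b]
nxt = np.array([[[0, 1], [1, 0]],
                [[1, 0], [0, 1]]])            # next state nxt[s, a, b]

def T(u):
    vals = r + u[nxt]                         # shape (state, a, b)
    return vals.min(axis=2).max(axis=1)       # max over a of min over b

# Damped (Krasnoselskii-Mann) iteration, normalized so that u[0] = 0:
u = np.zeros(2)
for _ in range(200):
    v = 0.5 * (u + T(u))
    u = v - v[0]
lam = T(u)[0] - u[0]                          # ergodic constant
print(lam, u)                                 # here lam = 2, u = (0, 1)
```

Since T is order preserving and additively homogeneous, subtracting the first coordinate at each step is harmless, and the ergodic constant is read off from T(u)-u at the limit.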

In [14], we studied zero-sum games with perfect information and finite action spaces, and showed that the set of payments for which the bias vector is not unique (up to an additive constant) coincides with the union of lower-dimensional cells of a polyhedral complex; in particular, the bias vector is generically unique. We provided an application to perturbation schemes in policy iteration.

In [36], we apply game-theoretic methods to the study of the nonlinear eigenproblem for homogeneous order-preserving self-maps of the interior of the cone. We show that the existence and uniqueness of an eigenvector are governed by combinatorial conditions involving dominions (sets of states "controlled" by one of the two players). In this way, we characterize the situations in which the existence of an eigenvector holds independently of perturbations, and we solve an open problem raised in [91].

In [15], we provide a representation theorem for "payment free" Shapley operators, showing that these are characterized by monotonicity and homogeneity axioms. This extends to the two-player case known representation theorems for risk measures.
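The two axioms of this characterization can be checked numerically on a small example. The operator below, built from hypothetical transition data, is a minimal sketch of a payment-free Shapley operator.

```python
import numpy as np

rng = np.random.default_rng(42)

# A payment-free Shapley operator built from hypothetical transition data:
#   F(u)_s = max_a min_b u[nxt[s, a, b]]
nxt = rng.integers(0, 3, size=(3, 2, 2))

def F(u):
    return u[nxt].min(axis=2).max(axis=1)

# Numerical check of the two axioms of the representation theorem:
for _ in range(100):
    u = rng.normal(size=3)
    v = u + rng.uniform(0.0, 1.0, size=3)     # v >= u componentwise
    c = rng.normal()
    assert np.all(F(u) <= F(v) + 1e-12)       # monotonicity
    assert np.allclose(F(u + c), F(u) + c)    # additive homogeneity
```

Both properties hold exactly here because F only combines coordinates of u through minima and maxima, which is precisely the structure the representation theorem captures.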

Nonlinear fixed point methods to compute joint spectral radii of nonnegative matrices

Participants : Stéphane Gaubert, Nikolas Stott.

In [29], we introduce a non-linear fixed point method to approximate the joint spectral radius of a finite set of nonnegative matrices. We show in particular that the joint spectral radius is the limit of the eigenvalues of a family of non-linear risk-sensitive type dynamic programming operators. We develop a projective version of the Krasnoselskii-Mann iteration to solve these eigenproblems, and report experimental results on large-scale instances (several matrices in dimensions of order 1000 within a minute). The situation in which the matrices are not nonnegative is amenable to a similar approach [94].
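As a simplified illustration of the fixed-point approach (not the risk-sensitive operators of [29]), one may apply a projective Krasnoselskii-Mann iteration to the componentwise maximum of the matrices acting on a positive vector; the resulting nonlinear eigenvalue is only an upper bound on the joint spectral radius. The matrix pair below is the classical example whose joint spectral radius is the golden ratio.

```python
import numpy as np

# Classical pair of nonnegative matrices whose joint spectral radius is the
# golden ratio (about 1.618):
A = [np.array([[1.0, 1.0], [0.0, 1.0]]),
     np.array([[1.0, 0.0], [1.0, 1.0]])]

def T(x):
    # Componentwise maximum over the matrices: (T x)_j = max_i (A_i x)_j
    return np.max([M @ x for M in A], axis=0)

# Projective Krasnoselskii-Mann iteration on the simplex:
x = np.ones(2) / 2
for _ in range(100):
    y = T(x)
    y = y / y.sum()                  # projective step: renormalize
    x = 0.5 * (x + y)                # damped (Krasnoselskii-Mann) step
rho = (T(x) / x).max()               # nonlinear eigenvalue of T
print(rho)                           # here rho = 2.0, an upper bound on 1.618...
```

Tightening this bound to the exact joint spectral radius is what requires the family of risk-sensitive operators introduced in [29].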

Probabilistic and max-plus approximation of Hamilton-Jacobi-Bellman equations

Participants : Marianne Akian, Eric Fodjo.

The PhD thesis of Eric Fodjo concerns stochastic control problems arising, in particular, in the modeling of portfolio selection with transaction costs. The dynamic programming method leads to a Hamilton-Jacobi-Bellman partial differential equation on a space whose dimension is at least the number of risky assets. The curse of dimensionality prevents solving these equations numerically in large dimension (greater than 5). We propose to tackle these problems with numerical methods combining policy iterations, probabilistic discretizations, and max-plus discretizations, in order to increase the dimension that can be handled.

We consider fully nonlinear Hamilton-Jacobi-Bellman equations associated with finite-horizon diffusion control problems involving a finite set-valued (or switching) control and possibly a continuum-valued control. In [46], we constructed a lower-complexity probabilistic numerical algorithm by combining the idempotent expansion properties obtained by McEneaney, Kaise and Han [103], [109] for solving such problems with a probabilistic numerical method such as the one proposed by Fahim, Touzi and Warin [82] for solving some fully nonlinear parabolic partial differential equations, when the volatility does not oscillate too much. In [32], [33], we improved the method of Fahim, Touzi and Warin by introducing probabilistic schemes which are monotone without any restrictive condition, allowing one to solve fully nonlinear parabolic partial differential equations with general volatilities. We studied the convergence and obtained error estimates when the parameters and the value function are bounded. The more general quadratic growth case was studied in the PhD manuscript [12].
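The role of monotonicity can be illustrated on a toy one-dimensional equation with two volatilities. The explicit finite-difference scheme below is a minimal sketch (not the probabilistic schemes of [32], [33]); it is monotone only under a CFL-type restriction on the time step, which is exactly the kind of restrictive condition that those schemes remove.

```python
import numpy as np

# Toy 1-D equation with two volatilities (illustration of monotonicity only):
#   -u_t = max over sigma in {sig_lo, sig_hi} of (sigma^2 / 2) u_xx,
#   with terminal condition u(T, x) = x^2.
sig_lo, sig_hi = 0.2, 0.4
nx, nt, L, T = 201, 400, 4.0, 1.0
dx, dt = 2 * L / (nx - 1), T / nt
assert dt <= dx**2 / sig_hi**2        # CFL-type bound: keeps the scheme monotone
x = np.linspace(-L, L, nx)
u = x**2                              # terminal condition
for _ in range(nt):                   # march backward in time
    uxx = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    # the max over sigma follows the sign of u_xx:
    ham = 0.5 * np.where(uxx > 0.0, sig_hi**2, sig_lo**2) * uxx
    u[1:-1] = u[1:-1] + dt * ham      # boundary values kept frozen
print(u[nx // 2])                     # exact solution gives sig_hi^2 * T = 0.16
```

Monotonicity here means each updated value is a nondecreasing function of the neighboring values, which is the key property behind convergence to the viscosity solution.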

Tropical-SDDP algorithms for stochastic control problems involving a switching control

Participants : Marianne Akian, Duy Nghi Benoît Tran.

The PhD thesis of Benoît Tran, supervised by Jean-Philippe Chancelier (ENPC) and Marianne Akian, concerns the numerical solution of the dynamic programming equation of discrete-time stochastic control problems.

Several methods have been proposed in the literature to bypass the curse-of-dimensionality difficulty of such equations by assuming a certain structure of the problem. Examples are the max-plus based method of McEneaney [110], [111], the stochastic max-plus scheme proposed by Zheng Qu [118], the stochastic dual dynamic programming (SDDP) algorithm of Pereira and Pinto [116], the mixed integer dynamic approximation scheme of Philpott, Faisal and Bonnans [61], and the probabilistic numerical method of Fahim, Touzi and Warin [82]. We propose to combine and compare these methods in order to handle more general structures.

In a first work [35], we built a common framework for both SDDP and a discrete-time, finite-horizon version of Zheng Qu's algorithm for deterministic problems involving a finite set-valued (or switching) control and a continuum-valued control. We propose an algorithm that generates monotone approximations of the value function as a pointwise supremum, or infimum, of randomly selected basic (for example, affine or quadratic) functions. We give sufficient conditions that ensure almost sure convergence of the approximations to the value function.
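The pointwise-supremum idea can be sketched in one dimension: for a convex value function, tangent (affine) cuts generated at randomly selected points give a monotone improving lower approximation, in the spirit of SDDP. The target function below is a hypothetical stand-in for a true value function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical convex value function standing in for the true one:
V = lambda x: x**2
dV = lambda x: 2.0 * x

cuts = []                                     # (slope, intercept) pairs

def lower_approx(x):
    return max(a * x + b for a, b in cuts)

for _ in range(50):
    xk = rng.uniform(-1.0, 1.0)               # randomly selected trial point
    cuts.append((dV(xk), V(xk) - dV(xk) * xk))  # tangent cut, tight at xk

# The pointwise supremum of the cuts is a lower bound, exact at the trial
# points, and it only improves as cuts accumulate:
xs = np.linspace(-1.0, 1.0, 11)
err = max(V(x) - lower_approx(x) for x in xs)
print(err)                                    # small after 50 random cuts
```

Each new cut can only raise the approximation, which is the monotonicity property of the approximations generated in [35].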

Parametrized complexity of optimal control and zero-sum game problems

Participants : Marianne Akian, Stéphane Gaubert, Omar Saadi.

As mentioned above, the dynamic programming approach to optimal control and zero-sum game problems suffers from the curse of dimensionality. The aim of this PhD thesis is to unify different techniques for bypassing this difficulty, in order to obtain new algorithms and new complexity results.

As a first step, we worked to extend an algorithm proposed by Sidford et al. in [126]: a randomized value iteration that improves the usual complexity bounds of value iteration for discounted Markov decision processes (discrete-time stochastic control problems). In a joint work with Zheng Qu (Hong Kong University), we are extending this algorithm to the ergodic (mean payoff) case, exploiting techniques from non-linear spectral theory [48]; this extension also covers the two-player (zero-sum) case.
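The core idea of such randomized schemes can be sketched as follows: the expensive part of a value-iteration step, the expectation over transitions, is replaced by a Monte-Carlo estimate from sampled next states. The tiny MDP below uses hypothetical data and only illustrates sampled Bellman backups, not the algorithm of [126].

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny discounted MDP with hypothetical data: 3 states, 2 actions
P = rng.dirichlet(np.ones(3), size=(3, 2))    # P[s, a] = transition law
r = rng.uniform(0.0, 1.0, size=(3, 2))        # rewards
gamma = 0.9                                   # discount factor

def exact_backup(v):
    return np.max(r + gamma * P @ v, axis=1)  # full expectation over next states

def sampled_backup(v, m=2000):
    # Replace each expectation by a Monte-Carlo average over m sampled
    # next states: the core idea behind randomized value iteration
    q = np.empty((3, 2))
    for s in range(3):
        for a in range(2):
            samples = rng.choice(3, size=m, p=P[s, a])
            q[s, a] = r[s, a] + gamma * v[samples].mean()
    return q.max(axis=1)

v = np.zeros(3)
for _ in range(200):
    v = exact_backup(v)                        # fixed point of exact operator
print(np.abs(sampled_backup(v) - v).max())     # small: sampling error only
```

On this toy example the sampled backup agrees with the exact one up to Monte-Carlo error; the complexity gains of [126] come from making such sampled updates precise with variance-reduction arguments.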