FR

EN

Homepage Inria website
  • Inria login
  • The Inria's Research Teams produce an annual Activity Report presenting their activities and their results of the year. These reports include the team members, the scientific program, the software developed by the team and the new results of the year. The report also describes the grants, contracts and the activities of dissemination and teaching. Finally, the report gives the list of publications of the year.

  • Legal notice
  • Cookie management
  • Personal data
  • Cookies



Section: New Results

Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback–Leibler Control Cost

A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced in [5], published in SICON this year. The main idea is to solve not one, but an entire family of MDPs, parameterized by a scalar ζ that appears in the one-step reward function. For an MDP with d states, the family of relative value functions {hζ*:ζ} is the solution to an ODE, ddζhζ*=𝒱(hζ*), where the vector field 𝒱:RdRd has a simple form, based on a matrix inverse. Two general applications are presented: Brockett's quadratic-cost MDP model, and a generalization of the “linearly solvable” MDP framework of Todorov in which the one-step reward function is defined by Kullback–Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-reward optimality equations reduce to a simple eigenvector problem. Since then many authors have sought to apply this technique to control problems and models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (eg, the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. In [16] we introduce a technique to solve a more general class of action-constrained MDPs.