Section: New Results
Reachability in MDPs
Markov decision processes (MDPs) provide the appropriate formalism for the control of fully observable probabilistic systems. Three kinds of methods are available for their analysis: linear programming, policy iteration and value iteration. For large-scale systems, however, only value iteration remains practical, as it requires less memory than the other two methods. For quantitative problems such as optimal control maximizing the discounted reward of an MDP, value iteration comes with a stopping criterion that guarantees a user-specified error bound. Value iteration algorithms have also been proposed for the central problem of reachability, but neither a stopping criterion nor a convergence rate was known for them. In [37], we solved both problems and, building on these results, improved the bound on the number of iterations needed to adapt value iteration to exact computation.
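To make the setting concrete, here is a minimal sketch of plain value iteration for maximal reachability probabilities. It is an illustration only and not the algorithm of [37]: the dictionary encoding of the MDP, the function name `max_reach_value_iteration` and the toy example are assumptions introduced here. The sketch runs for a fixed number of iterations because, as noted above, classical value iteration for reachability offers no built-in criterion telling when the current values are within a given error of the optimum.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (an assumption of this sketch):
# mdp[state][action] is a list of (probability, successor) pairs.
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def max_reach_value_iteration(
    mdp: MDP, targets: Set[str], iterations: int
) -> Dict[str, float]:
    """Approximate the maximal probability of reaching `targets` from each state."""
    # Start with 1 on target states and 0 elsewhere; the iterates then
    # approach the optimal values from below.
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iterations):
        new_values = {}
        for state, actions in mdp.items():
            if state in targets:
                new_values[state] = 1.0
                continue
            # Bellman update: best expected value over the available actions.
            new_values[state] = max(
                (sum(p * values[t] for p, t in succ) for succ in actions.values()),
                default=0.0,
            )
        values = new_values
    return values


# Toy MDP: from s0, action "a" reaches goal with probability 0.5 and
# otherwise stays in s0; action "b" moves to an absorbing sink.
example: MDP = {
    "s0": {"a": [(0.5, "goal"), (0.5, "s0")], "b": [(1.0, "sink")]},
    "goal": {"stay": [(1.0, "goal")]},
    "sink": {"stay": [(1.0, "sink")]},
}

if __name__ == "__main__":
    print(max_reach_value_iteration(example, {"goal"}, 10))
```

In this toy example the optimal value at `s0` is 1, while the n-th iterate equals 1 - 0.5^n. The values computed so far only bound the optimum from below and do not by themselves say how far away it is, which is the gap a stopping criterion must close.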