Section: New Results

Optimal Decision Making under Uncertainty

Participants : Olivier Teytaud [correspondent] , Jean-Joseph Christophe, Adrien Couëtoux, Jérémie Decock, Nicolas Galichet, Manuel Loth, Marc Schoenauer, Michèle Sebag.

The UCT-SIG works on sequential optimization problems, in which a decision has to be made at each time step over a finite time horizon and the underlying problem involves uncertainty, in either an adversarial or a stochastic setting.

The most prominent application domain is now energy management at various time scales and, more generally, planning in uncertain environments. The main advances made this year include:

  • Work on metagaming/investment [12], where a macroscopic decision has to be made (e.g., investment decisions such as which plants should be built) prior to operational decisions (e.g., the unit commitment policy, i.e., the operational management of the system). This is a key part of our activity for 2014.

  • Bandit problems with risk [36]. Bandit problems are closely related to metagaming problems (they correspond to the unstructured case).

  • Theoretical work on the consistency of Monte Carlo Tree Search / Upper Confidence Tree in continuous domains [27]. Proving this consistency required a non-trivial extension.

  • Noisy optimization is a key part of our work [61], as it is crucial for direct policy search and, more generally, for dynamic optimization:

    • We have proven lower bounds under “locality assumptions” [33], which some practitioners informally invoke to justify the use of evolutionary algorithms.

    • In cases with strong noise (variance not decreasing to zero around the optimum), we proved log-log convergence for simple rules for choosing the number of resamplings [23].

  • Several submissions are joint works with Ailab, National Dong Hwa University, Hualien, Taiwan. The drafts can be found at http://www.lri.fr/~teytaud/indema.html.

  • In collaboration with Christian Schulte (KTH, Stockholm), one of the main contributors to the well-known general-purpose CP solver GECODE (http://www.gecode.org/), and within the Microsoft-Inria joint lab Adapt project, ideas from UCT have been integrated into GECODE to choose variable values during the exploration of the constraint tree. The most critical issue lay in defining a meaningful reward for a given node (variable = value) that could cope with the multiple restarts of the search: the deeper the failure, the larger the reward. This work therefore also pertains to the CRI-SIG (Section 6.4). Initial results have been obtained on job-shop scheduling problems [47], and more extensive results have been obtained on 3 benchmarks of the CP community [46].
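
The value-selection idea in the last item can be illustrated with a minimal Python sketch. It is not the GECODE integration and uses no GECODE API: the class ValueBandit, the UCB1 scoring rule, the exploration constant and the normalization of the reward as failure depth divided by a maximum depth are all assumptions made for illustration. The only element taken from the text is that a (variable = value) node earns a larger reward when the failure below it occurs deeper, and that statistics survive restarts.

```python
import math
from collections import defaultdict


class ValueBandit:
    """Hypothetical UCB1 statistics for (variable, value) nodes.

    The statistics are stored outside the search tree, so they can be
    reused unchanged after each restart of the search.
    """

    def __init__(self, exploration=1.0):
        self.exploration = exploration   # assumed exploration constant
        self.count = defaultdict(int)    # trials per (variable, value)
        self.total = defaultdict(float)  # cumulated reward per (variable, value)
        self.visits = defaultdict(int)   # trials per variable

    def select(self, variable, values):
        """Return an untried value if any, else the value with the best UCB1 score."""
        for value in values:
            if self.count[(variable, value)] == 0:
                return value

        def score(value):
            n = self.count[(variable, value)]
            mean = self.total[(variable, value)] / n
            bonus = self.exploration * math.sqrt(
                2.0 * math.log(self.visits[variable]) / n
            )
            return mean + bonus

        return max(values, key=score)

    def update(self, variable, value, failure_depth, max_depth):
        """Record one trial; the reward grows with the depth of the failure.

        Assumption: reward = failure_depth / max_depth, a value in [0, 1].
        """
        reward = failure_depth / max_depth
        self.count[(variable, value)] += 1
        self.total[(variable, value)] += reward
        self.visits[variable] += 1
```

In such a scheme, a solver would call select when branching on a variable and update once the subtree below the chosen value fails, passing the depth at which the failure occurred. Because the statistics are indexed by (variable, value) rather than by tree node, nothing has to be reset when the search restarts.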