Section: New Results

Optimal Decision Making under Uncertainty

Participants: Olivier Teytaud [correspondent], Jean-Joseph Christophe, Jérémie Decock, Nicolas Galichet, Marc Schoenauer, Michèle Sebag, Weijia Wang.

The UCT-SIG works on sequential optimization problems, where a decision has to be made at each time step along a finite time horizon and the underlying problem involves uncertainties, in either an adversarial or a stochastic setting.

After several years of success in the game of Go, the most prominent application domain here is now energy management, at various time scales, and more generally planning. Furthermore, the work in this SIG has also led to advances in continuous optimization at large, which somewhat overlap with the work of the OPT-SIG (see 6.3).

The main advances this year include:

Bandit-based Algorithms

Active learning for the identification of biological dynamical systems has been tackled using Multi-Armed Bandit algorithms [35]. Weijia Wang's PhD [5] summarizes the work done in TAO on Multi-objective Reinforcement Learning with MCTS algorithms. Differential Evolution was applied as an alternative approach to non-stationary bandit problems [45].
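For illustration, here is a minimal UCB1 bandit sketch in Python; it is not the specific algorithm of [35] or [45], and the Bernoulli arms and budget are assumptions for the example:

```python
import math
import random

def ucb1(reward_fns, budget):
    """Minimal UCB1: pull each arm once, then always pull the arm with the
    best mean-plus-exploration score. reward_fns is a list of zero-argument
    callables returning stochastic rewards in [0, 1]."""
    n = [0] * len(reward_fns)        # pull counts
    s = [0.0] * len(reward_fns)      # reward sums
    for t in range(budget):
        if t < len(reward_fns):
            arm = t                  # initialization: pull every arm once
        else:
            arm = max(range(len(reward_fns)),
                      key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        s[arm] += reward_fns[arm]()
        n[arm] += 1
    return max(range(len(reward_fns)), key=lambda i: s[i] / n[i])

# Toy usage: three Bernoulli arms with unknown means.
arms = [lambda p=p: float(random.random() < p) for p in (0.2, 0.5, 0.8)]
print(ucb1(arms, budget=1000))   # almost always selects arm 2
```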

Continuous optimization: parallelism, real-world, high-dimension and cutting-plane methods

Our work in continuous optimization extends testbeds as follows: (i) including higher dimensions (many testbeds in evolutionary algorithms consider dimension 40 or 100); (ii) taking into account computation time and not only the number of function evaluations (this makes a big difference in high dimension); (iii) including real-world objective functions; (iv) including parallelism, in particular parallel convergence rates for differential evolution and particle swarm optimization [21] (see the sketch below). We also have a parallel version of cutting-plane methods, which uses more than black-box evaluations of the objective function; conversely, our black-box methods do not require convexity or the existence of a gradient.
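As an illustration of point (iv), here is a sketch of differential evolution with each generation's candidates evaluated in parallel, so that wall-clock time rather than the evaluation count becomes the relevant budget; the objective, population size and other settings are placeholders:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def sphere(x):
    # Placeholder objective; a real-world function would dominate wall-clock time.
    return float(np.sum(x ** 2))

def parallel_de(f, dim=100, pop=40, gens=100, F=0.5, CR=0.9, workers=8, seed=0):
    """DE/rand/1/bin with the trial population evaluated in parallel.
    The speed-up is bounded by min(pop, workers): evaluations, not
    generations, are what parallelizes."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (pop, dim))
    with ProcessPoolExecutor(max_workers=workers) as ex:
        fX = np.array(list(ex.map(f, X)))
        for _ in range(gens):
            trials = np.empty_like(X)
            for i in range(pop):
                a, b, c = X[rng.choice([j for j in range(pop) if j != i],
                                       3, replace=False)]
                cross = rng.random(dim) < CR
                cross[rng.integers(dim)] = True      # at least one mutated coordinate
                trials[i] = np.where(cross, a + F * (b - c), X[i])
            fT = np.array(list(ex.map(f, trials)))   # parallel batch evaluation
            better = fT < fX
            X[better], fX[better] = trials[better], fT[better]
    return X[np.argmin(fX)], float(fX.min())

# Call under `if __name__ == "__main__":` so worker processes can spawn safely,
# e.g. best_x, best_f = parallel_de(sphere)
```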

Noisy optimization

We have been working on noisy optimization in discrete and continuous domains. In the discrete case, we have shown the impact of heavy tails, and that resampling can solve some published open problems in an anytime manner. In the continuous case, we have shown [16] that a classical evolutionary principle (namely a step-size proportional to the distance to the optimum) implies that the optimal rates cannot be reached: more precisely, the simple regret is then at best O(1/√n), with n the number of fitness evaluations, in the simple case of additive noise, whereas some published algorithms reach O(1/n). One of the most directly applicable of our works is bias correction when the objective function f has the form f(x) = 𝔼_ω f(x,ω) and is approximated by f̂(x) = (1/N) ∑_{i=1}^{N} f(x,ω_i) for a given finite sample ω_1, …, ω_N (see the sketch below). We have also worked on portfolios of noisy optimizers [20], [34].
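A toy sketch of the resampling/averaging estimator and of the selection bias it induces; the objective and noise model are assumptions for the example:

```python
import numpy as np

def f_hat(f, x, omegas):
    """Monte Carlo approximation fhat(x) = (1/N) sum_i f(x, omega_i).
    Pointwise unbiased for f(x) = E_omega f(x, omega), but selecting an
    argmin over fhat is biased: E[min_x fhat(x)] <= min_x E[fhat(x)],
    which is what bias correction must compensate for."""
    return float(np.mean([f(x, w) for w in omegas]))

# Toy objective with additive noise: f(x, omega) = x^2 + omega.
f = lambda x, w: x ** 2 + w
rng = np.random.default_rng(0)

# Resampling: averaging N independent evaluations divides the noise
# standard deviation by sqrt(N).
for N in (1, 100, 10000):
    omegas = rng.standard_normal(N)
    print(N, f_hat(f, 0.1, omegas))   # converges to 0.01 as N grows
```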

Discrete-time control with constrained action spaces.

While Direct Policy Search is a reliable approach for discrete-time control, it is not easily applicable in the case of a constrained high-dimensional action space. In the past, we have proposed DVS (Direct Value Search) for such cases [54]. The method is satisfactory, and we have additional mathematical results; in particular, we prove positive results for non-Markovian, non-convex problems, as well as polynomial-time decision making together with exact asymptotic consistency for non-linear transitions [24]. Related work [60] also proposes to directly learn the value function, in an RL context, using some trajectories known to be bad.
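A hedged sketch of one decision step in the spirit of Direct Value Search, under simplifying assumptions not taken from [54] (a linear transition and a linear learned valuation of the next state), so that the per-step decision reduces to a linear program and is therefore polynomial-time:

```python
import numpy as np
from scipy.optimize import linprog

def dvs_step(T, d, cost_c, value_w, A_ub, b_ub):
    """Pick the action u minimizing immediate cost plus a learned linear
    valuation of the next state, under linear constraints A_ub @ u <= b_ub.
    With next_state = T @ u + d, the objective is
    cost_c @ u + value_w @ (T @ u + d) = (cost_c + T.T @ value_w) @ u + const.
    All names and the linearity assumptions are illustrative only."""
    res = linprog(cost_c + T.T @ value_w, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * len(cost_c))
    return res.x   # chosen action for this time step
```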

Games.

While still lightly contributing to the game of Go with our Taiwanese partners [8], we obtained significant improvements in randomized artificial intelligence algorithms by decomposing the variance of the outcome into (i) the contribution of our random seed and (ii) the other random contributions, such as the opponent's random seed and/or the random part of the game. By optimizing our probability distribution over random seeds, we obtain significant improvements in, e.g., Phantom Go (see the sketch below). This is essentially a simple tool for learning opening books [44].
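A minimal sketch of the seed-selection idea: estimate the win rate of each of our seeds against a randomized opponent, then play only (a distribution over) the best ones. The `play_game` interface and all settings are assumptions for the example:

```python
import random

def select_seeds(play_game, n_seeds=100, games_per_seed=50, keep=5):
    """play_game(seed) plays one game with *our* randomness fixed to `seed`
    (the opponent and the game itself stay random) and returns 1 on a win.
    We keep the empirically best seeds; at play time, we draw our seed
    uniformly among them instead of uniformly among all seeds."""
    rate = lambda s: sum(play_game(s) for _ in range(games_per_seed)) / games_per_seed
    return sorted(range(n_seeds), key=rate, reverse=True)[:keep]

# At play time:
# my_seed = random.choice(select_seeds(play_game))
```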

Adversarial bandits.

High-dimensional adversarial bandits suffer from two main drawbacks: (i) computation time and (ii) the highly mixed nature of the obtained solution. We developed methods which focus on sparse solutions. These methods are provably consistent, faster when the Nash equilibrium is sparse, and provide highly sparse solutions [17].
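For reference, a standard EXP3 sketch for adversarial bandits; this is not the sparsity-oriented algorithm of [17], which would additionally concentrate the play on a small support when the Nash equilibrium is sparse:

```python
import math
import random

def exp3(n_arms, reward, horizon, gamma=0.1):
    """EXP3: exponential weights with importance-weighted reward estimates.
    reward(arm, t) in [0, 1] may be chosen adversarially; the returned
    normalized weights approximate a (generally dense) mixed strategy."""
    w = [1.0] * n_arms
    for t in range(horizon):
        total = sum(w)
        probs = [(1 - gamma) * wi / total + gamma / n_arms for wi in w]
        arm = random.choices(range(n_arms), weights=probs)[0]
        # Importance weighting keeps the reward estimate unbiased.
        w[arm] *= math.exp(gamma * reward(arm, t) / (probs[arm] * n_arms))
    total = sum(w)
    return [wi / total for wi in w]
```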