Section: Overall Objectives

Highlights of the Year

Extensions of Multi-Armed Bandits and Monte-Carlo Tree Search

Risk Avoidance. Exploration can compromise the safety of the agent or system in real-world contexts (e.g., controlling a power system or a robot). Risk-averse criteria have been pioneered in MAB, together with multi-objective reinforcement learning; see [12] and [19].
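To make the notion of a risk-averse criterion concrete, the sketch below ranks arms by an empirical conditional value-at-risk (CVaR), i.e., the mean of the worst fraction of observed rewards, instead of by the empirical mean. This is only an illustration under stated assumptions: the choice of CVaR, the function names, the parameter alpha, and the reward histories are all hypothetical and are not taken from [12] or [19]. The selection is purely greedy; a complete bandit algorithm would add an exploration mechanism.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail)."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_risk_averse_arm(history, alpha=0.2):
    """Return the arm whose observed rewards have the highest empirical CVaR.

    `history` maps an arm name to its list of observed rewards
    (hypothetical data layout, chosen for illustration).
    """
    return max(history, key=lambda arm: empirical_cvar(history[arm], alpha))


# Hypothetical reward histories: "risky" has the higher mean
# but an occasional large loss.
history = {
    "safe": [0.4, 0.5, 0.5, 0.6, 0.5],
    "risky": [1.0, 1.0, -1.0, 1.0, 1.0],
}
chosen = pick_risk_averse_arm(history, alpha=0.2)
```

With alpha = 1 the criterion reduces to the empirical mean, so the same function covers both the risk-neutral and the risk-averse ranking.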

Continuous Options. The Rapid Action Value Estimate (RAVE) has been extended to continuous settings [27].

Information Theory and Natural Gradient

Information-geometric Optimization: convergence results. Theoretical guarantees have been obtained for continuous optimization algorithms in the information-geometric optimization (IGO) framework. Previous improvement guarantees for gradient descent-based methods were valid only for infinitesimally small step sizes. Using information geometry and the natural gradient provides improvement guarantees for finite step sizes, as is the case in practice [22]. Along the same lines, geodesics in statistical manifolds have been used for estimation-of-distribution optimization algorithms.
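As an illustration of a natural-gradient update with a finite step size, the sketch below applies an IGO-style step to the mean of an isotropic Gaussian search distribution with fixed standard deviation. It is a minimal sketch under stated assumptions, not the algorithm analyzed in [22]: the function names, the truncation weights (equal weight on the better half of the samples), the sphere test function, and all parameter values are chosen here for illustration only.

```python
import random


def igo_mean_step(f, mean, sigma, step, n_samples, rng):
    """One natural-gradient step on the mean of N(mean, sigma^2 I), minimizing f.

    For a Gaussian with fixed covariance, the Fisher matrix w.r.t. the mean
    is I / sigma^2 and the gradient of the log-density is (x - mean) / sigma^2,
    so the natural gradient of the log-density is simply (x - mean).
    """
    samples = [
        [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        for _ in range(n_samples)
    ]
    ranked = sorted(samples, key=f)      # best (lowest f) first
    top = ranked[: n_samples // 2]       # assumed weighting: keep the better half
    weight = 1.0 / len(top)
    return [
        m + step * sum(weight * (x[i] - m) for x in top)
        for i, m in enumerate(mean)
    ]


def sphere(x):
    """Hypothetical test objective: squared Euclidean norm."""
    return sum(v * v for v in x)


rng = random.Random(0)
start = [3.0, -2.0]
mean = list(start)
for _ in range(100):
    # step=0.5 is a finite, non-infinitesimal step size
    mean = igo_mean_step(sphere, mean, sigma=0.3, step=0.5, n_samples=20, rng=rng)
```

Because the weights depend only on the ranking of the samples, the update is unchanged under any increasing transformation of the objective, which is one of the invariance properties the IGO framework is built around.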

Neural Network Training is a hard optimization problem, sensitive to the problem representation and the optimization trajectory. Within a Riemannian geometry framework, the use of an intrinsic Riemannian gradient has been shown to support an optimization approach that is invariant under affine transformations, with significant robustness improvements at the same cost as the state of the art [66]. This Riemannian approach has been applied to recurrent neural nets, with very satisfactory results on difficult symbolic sequences with non-local dependencies [65]. In the related field of stacked restricted Boltzmann machines, we have shown that the layer-wise training underlying the celebrated deep learning approach yields globally optimal results provided the inference (as opposed to generative) model is rich enough, with quantitative estimates [60]. This result is the first of its kind on layer-wise deep learning.