TAO - 2013 - Annual activity report

Project-Team Tao

Section: Overall Objectives

Highlights of the Year

Extensions of Multi-Armed Bandits and Monte-Carlo Tree Search

Risk Avoidance. Exploration can take a toll on the safety of the agent or system in real-world contexts (e.g., controlling a power system or a robot). Risk-averse criteria have been pioneered in MAB, together with multi-objective reinforcement learning; see [12] and [19].
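As an illustration of what a risk-averse criterion can look like in a bandit setting, the sketch below ranks arms by the empirical conditional value at risk (CVaR), i.e. the mean of the worst fraction of observed rewards, rather than by the empirical mean. This is only a minimal sketch under stated assumptions: the choice of CVaR, the parameter `alpha`, and the helper names `empirical_cvar` and `pick_arm` are hypothetical and are not taken from [12] or [19].

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumption: rewards are "higher is better", so the risk lies in the
    lowest values. alpha in (0, 1] is the tail fraction.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_arm(history, alpha=0.2):
    """Choose the next arm given per-arm reward histories.

    Each arm is pulled once first; afterwards the arm with the highest
    empirical CVaR is selected. No exploration bonus is included, so this
    is a greedy illustration, not a complete bandit algorithm.
    """
    for arm, rewards in enumerate(history):
        if not rewards:
            return arm
    scores = [empirical_cvar(rewards, alpha) for rewards in history]
    return max(range(len(history)), key=scores.__getitem__)
```

With `history = [[0.0, 10.0], [4.0, 5.0]]` and `alpha=0.5`, arm 0 has the higher mean (5.0 versus 4.5) but the lower tail value (0.0 versus 4.0), so the CVaR-based rule prefers arm 1 where a mean-based rule would prefer arm 0.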

Continuous Options. The Rapid Action Value Estimation (RAVE) heuristic has been extended to continuous settings [27].

Information Theory and Natural Gradient

Information-geometric Optimization: convergence results. Theoretical guarantees have been obtained for continuous optimization algorithms in the framework of information-geometric optimization (IGO). Previous improvement guarantees for gradient descent-based methods were valid only for infinitesimally small step sizes. Using information geometry and the natural gradient yields improvement guarantees for finite step sizes, which is the setting used in practice [22]. Along the same lines, geodesics in statistical manifolds have been used in estimation-of-distribution optimization algorithms.
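For reference, the natural gradient step referred to above can be written in its standard textbook form. The notation is chosen here for illustration and is not taken from [22]: \theta denotes the parameters of the search distribution P_\theta, J the objective to be maximized, \eta the step size, and F the Fisher information matrix.

```latex
\[
\theta_{t+1} = \theta_t + \eta \, F(\theta_t)^{-1} \nabla_\theta J(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{x \sim P_\theta}\!\left[
  \nabla_\theta \ln P_\theta(x) \, \nabla_\theta \ln P_\theta(x)^{\top}
\right].
\]
```

Preconditioning by F(\theta)^{-1} makes the update direction independent of how the distribution is parametrized, to first order in \eta. The guarantees discussed in this paragraph concern what happens when \eta is finite rather than infinitesimal.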

Neural Network Training is a hard optimization problem, sensitive to both the problem representation and the optimization trajectory. Within a Riemannian geometry framework, using the intrinsic Riemannian gradient has been shown to yield an optimization approach that is invariant under affine transformations, with significant robustness improvements at the same cost as the state of the art [66]. This Riemannian approach has been applied to recurrent neural networks, with very satisfactory results on difficult symbolic sequences with non-local dependencies [65]. In the related field of stacked restricted Boltzmann machines, we have shown that the layer-wise training underlying the celebrated deep learning approach yields globally optimal results provided the inference (as opposed to generative) model is rich enough, with quantitative estimates [60]. This result is the first of its kind on layer-wise deep learning.
