

Section: Overall Objectives

Highlights of the Year

Ergo-Robots: Large-scale life-long learning robot experiment

The FLOWERS team, in collaboration with University Bordeaux I/LaBRI, was a central actor of the exhibition “Mathematics: A Beautiful Elsewhere” at the Fondation Cartier pour l'Art Contemporain in Paris. The installation, called “Ergo-Robots/FLOWERS Fields”, was made in collaboration with artist David Lynch and mathematician Mikhail Gromov (IHES, France), and shows computational models of curiosity-driven learning, human-robot interaction, and self-organization of linguistic conventions. This exhibition, at the crossroads of science and art, allowed us to disseminate our work to the general public, explaining concepts related to learning mechanisms in humans and robots to a large audience (80,000 visitors). It was also an opportunity to experiment with and improve our technologies for life-long robot learning experimentation. For one of the first times in the world outside a laboratory, we demonstrated that it is possible to run experiments with learning robots quasi-continuously for 5 months, which opens stimulating scientific perspectives in the field of developmental robotics. The experiment was covered by general-audience radio stations, magazines and newspapers (France Inter, France Culture, RFI, Sciences et Avenir, Tangente, Financial Times, Daily Telegraph, Libération, ...).

More information is available at http://flowers.inria.fr/ergo-robots.php and http://fondation.cartier.com/ .

MACSi: Integrated system for curiosity-driven visual object discovery on ICub robot

Within the MACSi ANR project, conducted together with ISIR (UPMC, Paris), a complete cognitive architecture for humanoid robots interacting with objects and caregivers in a developmental robotics scenario has been integrated on the iCub robot [43] . The architecture is foundational to the MACSi project and to several research axes of FLOWERS: it is designed to support experiments in which a humanoid robot gradually enlarges its repertoire of known objects and skills by combining autonomous learning, social guidance and intrinsic motivation. This complex learning process requires the capability to learn affordances, i.e. the capacity for the robot to predict which actions are possible on scene elements. Several papers presenting the general framework, focusing on the elementary action, perception and interaction modules, have been published. This architecture is an important milestone of the project, enabling future experiments on object learning and recognition, object categorization, and the interaction between autonomous exploration and social guidance.
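The notion of affordance learning mentioned above can be illustrated with a toy sketch (this is not the MACSi architecture; the world rule, the single "size" feature, and the interval-based model are assumptions made purely for the example): the robot acts on objects, records outcomes, and builds a predictor of which actions succeed on which scene elements.

```python
# Illustrative sketch (not the MACSi code): affordance learning cast as
# predicting which actions succeed on which scene elements.
import random

random.seed(0)

def try_action(obj_size, action):
    # Hidden world rule the robot must discover by interacting:
    # small objects afford grasping, large ones afford pushing.
    return obj_size < 0.5 if action == "grasp" else obj_size >= 0.5

# Motor babbling: act on random objects and record the outcomes.
samples = [(size, action, try_action(size, action))
           for size in (random.random() for _ in range(200))
           for action in ("grasp", "push")]

# Minimal affordance model: per-action interval of sizes seen to succeed.
model = {}
for action in ("grasp", "push"):
    ok = [s for s, a, out in samples if a == action and out]
    model[action] = (min(ok), max(ok))

def predict(action, obj_size):
    # The robot's prediction of whether the action is possible on the object.
    lo, hi = model[action]
    return lo <= obj_size <= hi

print(predict("grasp", 0.1), predict("push", 0.9))
```

The same structure scales up in the real setting by replacing the scalar feature with perceptual descriptors and the interval model with a learned classifier.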

Algorithmic architecture for learning inverse models in high-dimensional robots

Through the design of the SAGG-RIAC algorithmic architecture, published in a major robotics journal [22] , we have produced a highly efficient intrinsically motivated goal exploration mechanism that allows active learning of inverse models in high-dimensional redundant robots. Based on active goal babbling, it allows a robot to efficiently and actively learn distributions of parameterized motor skills/policies that solve a corresponding distribution of parameterized tasks/goals. We conducted experiments with high-dimensional continuous sensorimotor spaces in three robotic setups: 1) learning the inverse kinematics of a highly redundant robotic arm, 2) learning omnidirectional locomotion with motor primitives on a quadruped robot, and 3) an arm learning to control a fishing rod with a flexible wire. We showed that 1) exploration in the task space can be much faster than exploration in the actuator space for learning inverse models in redundant robots; 2) selecting goals that maximize competence progress creates developmental trajectories driving the robot to progressively focus on tasks of increasing complexity, and is statistically significantly more efficient than selecting tasks randomly, as well as than standard active motor babbling methods; and 3) the architecture allows the robot to actively discover which parts of its task space it can learn to reach and which parts it cannot.
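The core idea of selecting goals by competence progress can be sketched in a few lines (a heavily simplified stand-in for SAGG-RIAC: the three named regions, their learning rates, and the epsilon-greedy choice are all illustrative assumptions, not the published architecture):

```python
# Toy sketch of competence-progress-based goal selection, in the spirit of
# SAGG-RIAC. Region names and the scalar "competence" model are illustrative.
import random

random.seed(1)

class Region:
    def __init__(self, learn_rate):
        self.learn_rate = learn_rate   # how fast competence grows if practiced
        self.competence = 0.0
        self.history = [0.0]

    def practice(self):
        # Competence saturates at 1; an unreachable region has learn_rate 0.
        self.competence = min(1.0, self.competence + self.learn_rate)
        self.history.append(self.competence)

    def progress(self):
        # Empirical competence progress over a sliding window.
        window = self.history[-5:]
        return abs(window[-1] - window[0])

regions = {"easy": Region(0.2), "hard": Region(0.02), "unreachable": Region(0.0)}
counts = {name: 0 for name in regions}

for step in range(200):
    # epsilon-greedy: mostly pick the region where competence improves fastest.
    if random.random() < 0.2:
        name = random.choice(list(regions))
    else:
        name = max(regions, key=lambda n: regions[n].progress())
    regions[name].practice()
    counts[name] += 1

# Progress-driven selection concentrates practice where learning is possible
# and spends little time on the unreachable region, whose progress stays flat.
print(counts)
```

This mirrors finding 3) above: regions where nothing can be learned yield no competence progress and are therefore largely ignored.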

Formalization of several links between intrinsic motivation architectures and statistical machine learning

We incorporated several key concepts of intrinsically motivated developmental learning, especially measures of learning progress for curiosity-driven exploration, into several standard machine learning formalisms. First, we introduced and formalized a general class of learning problems for which a developmental learning strategy is optimal [47] ; this class characterizes problems where life-long multitask learning under bounded resources is crucial. Second, within this formalization, we related the SAGG-RIAC architecture [22] to multi-armed bandit formalisms [47] , allowing us to study problems where several discrete choices must be made to accelerate learning. Third, we included empirical measures of learning progress in standard reinforcement learning problems, allowing the best exploration strategy to be chosen automatically [42] and extending R-max approaches for exploration in model-based RL to non-stationary problems [46] .
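The bandit view can be made concrete with a small sketch (an illustrative toy, not the formalism of the cited papers: the nearest-neighbour learner, the two named strategies, and all constants are assumptions): each exploration strategy is a bandit arm, and the reward for pulling an arm is the empirical learning progress, i.e. the recent reduction in prediction error, that it produces.

```python
# Toy sketch: exploration-strategy choice as a multi-armed bandit whose
# reward is empirical learning progress (recent error reduction).
import random
from collections import deque

random.seed(2)

target = lambda x: x ** 2                 # unknown function to learn
data = []                                 # collected (x, y) samples
test_xs = [i / 10 for i in range(11)]

def model_error():
    # Nearest-neighbour regression error as a crude competence measure.
    if not data:
        return 1.0
    return sum(abs(min(data, key=lambda p: abs(p[0] - x))[1] - target(x))
               for x in test_xs) / len(test_xs)

strategies = {
    "explore": lambda: random.uniform(0, 1),   # informative sampling
    "stuck":   lambda: 0.5,                    # resamples one point forever
}
recent = {n: deque(maxlen=5) for n in strategies}   # sliding reward window
pulls = {n: 0 for n in strategies}

for t in range(100):
    value = {n: (sum(r) / len(r) if r else 0.0) for n, r in recent.items()}
    # epsilon-greedy bandit over strategies, reward = learning progress.
    name = (random.choice(list(strategies)) if random.random() < 0.3
            else max(value, key=value.get))
    before = model_error()
    data.append((strategies[name](), None))
    data[-1] = (data[-1][0], target(data[-1][0]))
    # Clamp at zero: regressions at noise level are not "negative progress".
    recent[name].append(max(0.0, before - model_error()))
    pulls[name] += 1

print(pulls)   # the strategy that still yields progress gets chosen more
```

The sliding window is the important detail: a strategy that produced progress once but no longer does is forgotten, so the bandit tracks where learning is currently happening.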

Bridging black-box optimization and RL for skill learning in robots

This year, we made substantial advances in understanding the relationship between black-box optimization and reinforcement learning for direct policy search, in applying such methods to robotic manipulation, and in using them to model human behavior. The key discovery is that black-box optimization and reinforcement learning have converged towards the same set of algorithmic properties, such as parameter perturbation and reward-weighted averaging, allowing a direct comparison and integration of such algorithms (see “Relationship between Black-Box Optimization and Reinforcement Learning” below). On the one hand, this has enabled us to exploit principles from black-box optimization, such as covariance matrix adaptation, in the context of reinforcement learning. The resulting algorithm (PI²-CMAES) enables adaptive exploration and life-long learning in robots [63] , and in reaching experiments leads to proximo-distal maturation as an emergent property [60] (see “Emergent Proximo-Distal Maturation through Adaptive Exploration” below). On the other hand, it has allowed us to demonstrate that black-box optimization outperforms reinforcement learning for a particular class of policies [69] . This is an important result, as these types of policies are typically used for robotic skill learning: more efficient and robust black-box optimization algorithms may therefore be applied to learning with such policies, without compromising convergence speed or the cost of the final solution.
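The two shared algorithmic properties named above, parameter perturbation and reward-weighted averaging, can be sketched on a toy cost function (a generic sketch in the style of PI²/CEM-like updates, not the PI²-CMAES algorithm itself; the quadratic cost, temperature, and all constants are assumptions for illustration):

```python
# Minimal sketch of the shared update loop: perturb policy parameters,
# evaluate rollouts, then update by reward-weighted averaging.
import math
import random

random.seed(3)

theta = [2.0, -1.5]                   # policy parameters; optimum at [0, 0]
cost = lambda th: sum(v * v for v in th)

sigma, lam, h = 0.3, 10, 10.0         # exploration noise, rollouts, temperature
for iteration in range(60):
    samples, costs = [], []
    for _ in range(lam):
        # Parameter perturbation: sample around the current parameters.
        s = [v + random.gauss(0.0, sigma) for v in theta]
        samples.append(s)
        costs.append(cost(s))
    # Softmax weights: low cost -> high weight (reward-weighted averaging).
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1e-9
    w = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(w)
    theta = [sum(w[k] * samples[k][d] for k in range(lam)) / total
             for d in range(2)]

print([round(v, 3) for v in theta])   # close to the optimum [0, 0]
```

Methods on both sides of the BBO/RL divide instantiate exactly this loop and differ mainly in how the weights are computed and how the exploration noise (here a fixed sigma) is adapted, which is where covariance matrix adaptation enters.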

Algorithm for learning sequences of motion primitives

On the application side, we have also extended policy improvement algorithms to work with sequences of motion primitives, enabling 11-DOF manipulation robots to learn how to grasp under uncertainty through fine manipulation and to perform extended pick-and-place tasks [31] (see “Reinforcement Learning with Sequences of Motion Primitives for Robust Manipulation” below). We have also shown that learning variable impedance control is able to mimic the behavior of humans when exposed to force fields (see “Model-free Reinforcement Learning of Impedance Control in Stochastic Environments” below).

Algorithms for autonomous dimensionality reduction

In 2012, we made significant progress on incremental online learning algorithms capable of finding latent variables in high-dimensional sensory spaces, using either the principle of multimodal correspondence [24] or weak, self-generated supervision [40] . These advances will be key to further extending the applicability of artificial curiosity algorithms to high-dimensional sensorimotor spaces.
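What "incremental online" latent-variable discovery means can be illustrated with a classical building block (Oja's rule, used here as a generic stand-in, not the algorithms of [24] or [40]; the latent direction, dimensions, and learning rate are illustrative assumptions): a single linear unit tracks the dominant latent direction of a data stream one sample at a time, without ever storing the data.

```python
# Sketch of incremental latent-variable discovery: Oja's rule tracks the
# first principal component of streaming high-dimensional data online.
import math
import random

random.seed(4)

dim = 20
# A hidden 1-D latent variable z drives all observed dimensions (plus noise).
direction = [1.0 / math.sqrt(dim)] * dim

w = [random.gauss(0, 0.1) for _ in range(dim)]   # current component estimate
eta = 0.05
for t in range(3000):
    z = random.gauss(0, 1.0)
    x = [z * direction[d] + random.gauss(0, 0.05) for d in range(dim)]
    y = sum(w[d] * x[d] for d in range(dim))      # projection onto w
    # Oja's update: Hebbian term minus a normalizing decay.
    w = [w[d] + eta * y * (x[d] - y * w[d]) for d in range(dim)]

# w converges (up to sign) to the latent direction, with roughly unit norm.
align = abs(sum(w[d] * direction[d] for d in range(dim)))
print(round(align, 2))   # close to 1.0
```

The appeal for developmental robotics is the same as in the cited work: memory and computation per step are constant, so the reduction can run for the robot's whole lifetime alongside curiosity-driven exploration.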

The following paper obtained the Best Paper Award in the category “Computational Models of Cognitive Development” at the IEEE ICDL-Epirob international conference: [53]

Best Paper Award:

[53] C. Moulin-Frier, P.-Y. Oudeyer. Curiosity-driven phonetic learning. In IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob).