Section: New Results

Learning Algorithms for Autonomous Robots: Concepts and Algorithms

The SAGG-RIAC algorithm: competence based active learning of motor skills

Participants : Adrien Baranès, Pierre-Yves Oudeyer.

We have continued to develop and experiment with the Self-Adaptive Goal Generation - Robust Intelligent Adaptive Curiosity (SAGG-RIAC) algorithm, an intrinsically motivated goal exploration mechanism that allows high-dimensional redundant robots with various body schemas to efficiently and actively learn motor skills in their task space. The main idea is to push the robot to perform active babbling in the low-dimensional goal/task space, as opposed to motor babbling in the high-dimensional actuator space (possibly defined with motor primitives), by self-generating goals actively and adaptively in regions of the task space which provide a maximal competence improvement for reaching those goal states. Then, a lower-level active motor learning algorithm drives the robot to locally explore how to reach a given self-generated goal. We have conducted systematic experiments with high-dimensional continuous sensorimotor spaces on different robotic setups (a highly redundant robotic arm, a quadruped, and an arm controlling a fishing rod with a flexible wire) and show that 1) exploration in the task space can be much faster than exploration in the actuator space for learning inverse models in redundant robots; 2) selecting goals based on the maximal improvement heuristic creates developmental trajectories that drive the robot to progressively focus on areas of increasing complexity, and is statistically significantly more efficient than selecting goals randomly, as well as more efficient than several standard active motor babbling methods. These results were published in [13], [15], [17] and a journal publication is in preparation.
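The goal-selection idea described above can be illustrated with a minimal sketch. It is not the published SAGG-RIAC implementation: it assumes a 1-D goal space cut into fixed regions (the actual algorithm is described as adaptive), and it measures "competence improvement" as the absolute difference between the mean competence of the recent and older halves of a sliding window. The names `Region`, `choose_region` and `sample_goal`, and the parameters `window` and `epsilon`, are hypothetical.

```python
import random
from collections import deque


class Region:
    """A fixed interval of a 1-D goal space with a sliding window of competence values."""

    def __init__(self, low, high, window=10):
        self.low, self.high = low, high
        self.history = deque(maxlen=window)  # most recent competence measures

    def interest(self):
        # Assumed progress measure: |mean(recent half) - mean(older half)|.
        n = len(self.history)
        if n < 2:
            return 0.0
        values = list(self.history)
        half = n // 2
        older, recent = values[:half], values[half:]
        return abs(sum(recent) / len(recent) - sum(older) / len(older))


def choose_region(regions, rng, epsilon=0.2):
    """With probability epsilon pick uniformly, otherwise proportionally to interest."""
    weights = [r.interest() for r in regions]
    if rng.random() < epsilon or sum(weights) == 0.0:
        return rng.choice(regions)
    return rng.choices(regions, weights=weights, k=1)[0]


def sample_goal(region, rng):
    """Draw a goal uniformly inside the chosen region."""
    return rng.uniform(region.low, region.high)
```

In a full loop, the robot would attempt each sampled goal with its lower-level motor learner, measure a competence value (for instance the negative distance between the goal and the reached state), and append it to the `history` of the region containing the goal. The `epsilon` term keeps some uniform exploration so that regions whose progress has not been measured yet still receive goals.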

SGIM-D: Bootstrapping Intrinsically Motivated Learning with Human Demonstration

Participants : Mai Nguyen, Pierre-Yves Oudeyer.

We have studied the coupling of internally guided learning and social interaction, and more specifically how demonstrations improve learning by intrinsic motivation. We have designed Socially Guided Intrinsic Motivation by Demonstration (SGIM-D), an algorithm for learning mappings between high-dimensional spaces in continuous, non-preset, highly redundant environments. We have shown through a robot learning experiment involving a high-dimensional sensorimotor space related to fishing skills that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation to gain a wide repertoire while being specialised in specific subspaces. An article presenting aspects of this work received the second best student paper award at IEEE ICDL/Epirob 2011 [27].

Maturationally-Constrained Competence-Based Intrinsically Motivated Learning

Participants : Adrien Baranès, Pierre-Yves Oudeyer.

We have continued to develop computational models of the coupling of intrinsic motivations and physiological maturational constraints, showing that both mechanisms may have complex bidirectional interactions allowing the active control of the growth of complexity in motor development, which in turn directs an efficient learning and exploration process. The coupling relies on the Self-Adaptive Goal Generation - Robust Intelligent Adaptive Curiosity algorithm (SAGG-RIAC), which instantiates an intrinsically motivated goal exploration mechanism for motor learning of inverse models. We have then introduced a functional model of maturational constraints inspired by the myelination process in humans, and shown how it can be coupled with the SAGG-RIAC algorithm, forming a new system called McSAGG-RIAC2. Finally, we have conducted systematic experiments to evaluate qualitative and, more importantly, quantitative properties of these systems when applied to learning the forward and inverse kinematics of an unknown robotic arm of up to 60 dimensions, learning to walk with a 12-DOF quadruped controlled with 24-dimensional motor synergies, and learning to control a fishing rod involving a flexible/rope component. These results were published in [13], [15], [17] and a journal publication is in preparation.

Actor-Critic for Parallel Learning

Participants : Thomas Degris, Martha White, Richard Sutton.

Parallel learning is necessary for a robot to learn multiple tasks simultaneously while executing, in the environment, a behavior that is not necessarily directly related to the tasks being learned. In existing work, actor–critic methods form an interesting class of learning algorithms for control. First, these algorithms can be used with high-dimensional action spaces. Second, they sometimes also provide computational models for biological decision-making systems. At FLOWERS, we work on new actor–critic algorithms that are suitable for parallel learning, come with theoretical guarantees, are practical to use with robots, and are formulated in the general framework of reinforcement learning.
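For readers unfamiliar with the actor–critic structure, the following is a minimal sketch of the standard textbook one-step, on-policy, tabular variant with a softmax policy. It is not the new algorithm under development at FLOWERS, which targets parallel learning; the function names and the step sizes `alpha_actor`, `alpha_critic` and `gamma` are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, v, s, a, r, s_next, done,
                      alpha_actor=0.1, alpha_critic=0.2, gamma=0.9):
    """Apply one tabular actor-critic update in place and return the TD error.

    theta[s][b] is the actor's preference for action b in state s,
    v[s] is the critic's value estimate for state s.
    """
    target = r if done else r + gamma * v[s_next]
    delta = target - v[s]                 # TD error computed by the critic
    v[s] += alpha_critic * delta          # critic update
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # Gradient of log softmax: indicator(b == a) - pi(b | s).
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * delta * grad   # actor update
    return delta


def train_two_armed_task(n_steps=2000, seed=0):
    """Toy single-state task (an assumption): action 1 pays 1, action 0 pays 0."""
    rng = random.Random(seed)
    theta, v = [[0.0, 0.0]], [0.0]
    for _ in range(n_steps):
        probs = softmax(theta[0])
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if a == 1 else 0.0
        actor_critic_step(theta, v, 0, a, r, 0, True)
    return theta, v
```

The critic's TD error serves as the learning signal for the actor, so the policy is improved without ever enumerating the value of each action, which is one reason this family is used with large action spaces.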

Curiosity for Parallel Learning of Predictions and Tasks from the Continuous Interaction of a Robot with its Environment

Participants : Thomas Degris, Adam White, Pierre-Yves Oudeyer.

On the one hand, a robot needs a wide variety of knowledge to fully interact with its environment. On the other hand, a robot, like humans or animals, can only perform one behavior at a time in the real world to learn this vast amount of knowledge. A solution to scale up learning while keeping the interaction time with the real world realistic is to learn multiple elements of knowledge in parallel. The Horde architecture proposes a set of demons, each learning off-policy about new policies (i.e. skills) and predictions about these skills (i.e. partial models), all at the same time. The number of demons learning in parallel is limited only by memory and processing power, and not by the fact that there is only one sensorimotor interaction with the environment to learn from. At FLOWERS, we investigate what the behavior policy of the robot should be in order to speed up the learning of the demons. Our goal is to test whether Horde scales up to complex humanoid robots and whether, driven by intrinsic motivations, it can autonomously learn building blocks of knowledge for future, more complex behaviors.
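The principle of many demons learning off-policy from a single stream of experience can be sketched as follows. This is a simplified illustration under stated assumptions, not the Horde implementation: it uses tabular TD(0) with per-step importance sampling instead of gradient-TD methods with function approximation, and the ring-shaped toy environment, the `Demon` class and the `run` function are all hypothetical.

```python
import random


class Demon:
    """One learner: a target policy, a cumulant (pseudo-reward) and its own value table."""

    def __init__(self, n_states, target_policy, cumulant, gamma=0.9, alpha=0.1):
        self.v = [0.0] * n_states
        self.target_policy = target_policy  # function (s, a) -> probability
        self.cumulant = cumulant            # function (s, a, s_next) -> float
        self.gamma, self.alpha = gamma, alpha

    def update(self, s, a, s_next, behavior_prob):
        # Importance-sampling ratio corrects for acting with the behavior policy.
        rho = self.target_policy(s, a) / behavior_prob
        delta = self.cumulant(s, a, s_next) + self.gamma * self.v[s_next] - self.v[s]
        self.v[s] += self.alpha * rho * delta


def run(n_states=5, n_steps=20000, seed=0):
    """Random walk on a ring; two demons learn about policies never executed as such."""
    rng = random.Random(seed)

    def enters_zero(s, a, s_next):
        return 1.0 if s_next == 0 else 0.0

    demons = {
        "always_right": Demon(n_states, lambda s, a: 1.0 if a == 1 else 0.0, enters_zero),
        "always_left": Demon(n_states, lambda s, a: 1.0 if a == 0 else 0.0, enters_zero),
    }
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(2)                          # uniform random behavior policy
        s_next = (s + (1 if a == 1 else -1)) % n_states
        for demon in demons.values():                 # all demons share the same transition
            demon.update(s, a, s_next, behavior_prob=0.5)
        s = s_next
    return demons
```

Here the behavior policy is fixed and uniform, so every demon receives useful updates about half of the time. Replacing that fixed policy with one that favors transitions informative to the demons is the kind of choice the research question above is concerned with.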

Optimal Teaching on Sequential Decision Tasks

Participants : Manuel Lopes, Maya Cakmak.

A helpful teacher can significantly improve the learning rate of an autonomous learning agent. Teaching algorithms have been formally studied within the field of Algorithmic Teaching. These give important insights into how a teacher can select the most informative examples while teaching a new concept. However, the field has so far focused purely on classification tasks. In this paper we introduce a novel method for optimally teaching sequential decision tasks. We present an algorithm that automatically selects the set of most informative demonstrations and evaluate it on several navigation tasks. Next, we present a set of human subject studies that investigate the optimality of human teaching in these tasks. We evaluated examples naturally chosen by human teachers and found that humans are generally sub-optimal. Then, based on our proposed optimal teaching algorithm, we tried to elicit better teaching from humans by explaining the intuition behind the teaching algorithm in informal language prior to the teaching task. We found that this improves the examples elicited from human teachers on all considered tasks. This shows that a simple modification of the instructions given to human teachers has the potential to greatly improve the performance of the agent trained by the human [41].

Inverse Coordinated Reinforcement Learning

Participants : Manuel Lopes, Jonathan Sprauel.

We extended inverse reinforcement learning to the multi-agent case. Under this formalism, a team of agents can learn a task goal, encoded as a reward function, by observing another team executing that task. Our agents behave using local information and limited communication, following the coordinated reinforcement learning framework. We show that a team behavior can be learned using this formalism, and we evaluate how well this mechanism deals with changing initial conditions and numbers of agents [68].