Section: New Results

Robotic And Computational Models Of Human Development and Cognition

Computational Models Of Information-Seeking, Curiosity And Attention in Humans and Animals

Participants : Manuel Lopes, Pierre-Yves Oudeyer [correspondant] , Jacqueline Gottlieb, Celeste Kidd, Alvaro Ovalle, William Schueller, Sebastien Forestier, Nabil Daddaouda, Nicholas Foley.

This project involves a collaboration between the Flowers team, the Cognitive Neuroscience Lab of J. Gottlieb at Columbia Univ. (NY, US), and the developmental psychology lab of Celeste Kidd at Univ. Rochester, US, on the understanding and modeling of mechanisms of curiosity, attention and active intrinsically motivated exploration that until now have been little explored in neuroscience, machine learning and cognitive robotics.

It is organized around the study of the hypothesis that information gain (or control gain) could generate intrinsic reward in the brain (living or artificial), driving attention and exploration independently from material rewards, and allowing for autonomous lifelong acquisition of open repertoires of skills. The project combines expertise about attention and exploration in the brain and a strong methodological framework for conducting experimentations with monkeys, human adults (Gottlieb's lab) and children (Kidd's lab) together with computational modeling of curiosity/intrinsic motivation and learning in the Flowers team.

Such a collaboration paves the way towards what is now a central strategic objective of the Flowers team: designing and conducting experiments in animals and humans that are informed by computational/mathematical theories of information seeking, and that allow the predictions of these theories to be tested.


Curiosity can be understood as a family of mechanisms that evolved to allow agents to maximize their knowledge (or their control) of the useful properties of the world - i.e., the regularities that exist in the world - using active, targeted investigations. In other words, we view curiosity as a decision process that maximizes learning/competence progress (rather than minimizing uncertainty) and assigns value ("interest") to competing tasks based on their epistemic qualities - i.e., their estimated potential to allow discovery and learning about the structure of the world.

Because a curiosity-based system acts in conditions of extreme uncertainty (when the distributions of events may be entirely unknown) there is in general no optimal solution to the question of which exploratory action to take [29], [155], [162]. Therefore we hypothesize that, rather than using a single optimization process as has been the case in most previous theoretical work [131], curiosity is comprised of a family of mechanisms that include simple heuristics related to novelty/surprise and measures of learning progress over longer time scales [153], [110], [149]. These different components are related to the subject's epistemic state (knowledge and beliefs) and may be integrated with fluctuating weights that vary according to the task context. We will quantitatively characterize this dynamic, multi-dimensional system in the framework of Bayesian Reinforcement Learning, as described below.

Because of its reliance on epistemic currencies, curiosity is also very likely to be sensitive to individual differences in personality and cognitive functions. Humans show well-documented individual differences in curiosity and exploratory drives [143], [161], and rats show individual variation in learning styles and novelty seeking behaviors [128], but the basis of these differences is not understood. We postulate that an important component of this variation is related to differences in working memory capacity and executive control which, by affecting the encoding and retention of information, will impact the individual's assessment of learning, novelty and surprise and ultimately, the value they place on these factors [159], [171], [106], [175]. To start understanding these relationships, about which nothing is known, we will search for correlations between curiosity and measures of working memory and executive control in the population of children we test in our tasks, analyzed from the point of view of a computational model based on Bayesian reinforcement learning.

A final premise guiding our research is that essential elements of curiosity are shared by humans and non-human primates. Human beings have a superior capacity for abstract reasoning and building causal models, which is a prerequisite for sophisticated forms of curiosity such as scientific research. However, if the task is adequately simplified, essential elements of curiosity are also found in monkeys [143], [141] and, with adequate characterization, this species can become a useful model system for understanding the neurophysiological mechanisms.


Our studies have several highly innovative aspects, both with respect to curiosity and to the traditional research field of each member team.

  • Linking curiosity with quantitative theories of learning and decision making: While existing investigations examined curiosity in qualitative, descriptive terms, here we propose a novel approach that integrates quantitative behavioral and neuronal measures with computationally defined theories of Bayesian Reinforcement Learning and decision making.

  • Linking curiosity in children and monkeys: While existing investigations examined curiosity in humans, here we propose a novel line of research that coordinates its study in humans and non-human primates. This will address key open questions about differences in curiosity between species, and allow access to its cellular mechanisms.

  • Neurophysiology of intrinsic motivation: Whereas virtually all the animal studies of learning and decision making focus on operant tasks (where behavior is shaped by experimenter-determined primary rewards) our studies are among the very first to examine behaviors that are intrinsically motivated by the animals' own learning, beliefs or expectations.

  • Neurophysiology of learning and attention: While multiple experiments have explored the single-neuron basis of visual attention in monkeys, all of these studies focused on vision and eye movement control. Our studies are the first to examine the links between attention and learning, which are recognized in psychophysical studies but have been neglected in physiological investigations.

  • Computer science: biological basis for artificial exploration: While computer science has proposed and tested many algorithms that can guide intrinsically motivated exploration, our studies are the first to test the biological plausibility of these algorithms.

  • Developmental psychology: linking curiosity with development: While it has long been appreciated that children learn selectively from some sources but not others, there has been no systematic investigation of the factors that engender curiosity, or how they depend on cognitive traits.

Current results

In particular, new results in 2015 include:

Intrinsically motivated oculomotor exploration guided by uncertainty reduction and conditioned reinforcement in non-human primates

Using a novel oculomotor paradigm, combined with reinforcement learning (RL) simulations, we show that monkeys are intrinsically motivated to search for and look at reward-predictive cues, and that their intrinsic motivation is shaped by a desire to reduce uncertainty, a desire to obtain conditioned reinforcement from positive cues, and individual variations in decision strategy and the cognitive costs of acquiring information. The results suggest that free-viewing oculomotor behavior reveals cognitive and emotional factors underlying the curiosity-driven sampling of information. These results were published in [66].

Experiments in Active Categorization

An ongoing effort to characterize curiosity and exploration in an experimental setting consists in evaluating the manner in which diverse tasks or goals are selected. This includes monitoring what a test subject decides to learn, in what order, and how. This has been referred to as strategic learning [31]. Accordingly, it is of particular interest for the project to observe subjects' learning dynamics in relation to their learning progress [153]. The learning progress principle links the selection and ordering of tasks to the speed or rate of improvement a subject may achieve. It implies that, during free exploration, the subject would focus on tasks of a certain complexity on which it makes consistent progress. At the same time the subject would avoid: (1) trivial tasks that do not offer much learning due to their simplicity and (2) very complicated tasks where little or no progress is achieved.

We have been working on prototyping an experiment where the subject is presented with different stimulus classification tasks of varying difficulty. The goal of each task is to correctly predict and differentiate between different classes of stimuli. Two main aspects of the task are under the control of the subject: (1) the task that he/she wants to learn and (2) once a task is selected, which elements to explore in order to subsequently be able to predict future stimuli. Essentially the subject autonomously organizes which tasks to focus on and in what order. Therefore one of the objectives of this investigation is to analyze whether the learning dynamics are guided by the amount of progress the subject achieves in the tasks.
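The learning-progress principle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the model used in the experiment: the class name `LearningProgressSelector`, the window size, the epsilon-greedy exploration rate, and the use of the absolute difference between two sliding-window success rates as a progress measure are all assumptions made for the example.

```python
import random
from collections import deque


class LearningProgressSelector:
    """Hypothetical task selector: prefers tasks whose success rate is changing."""

    def __init__(self, n_tasks, window=10, epsilon=0.2, seed=0):
        self.window = window      # number of trials per comparison window (assumed)
        self.epsilon = epsilon    # probability of picking a task at random (assumed)
        self.rng = random.Random(seed)
        # Keep the last 2 * window outcomes (1 = correct, 0 = wrong) for each task.
        self.history = [deque(maxlen=2 * window) for _ in range(n_tasks)]

    def progress(self, task):
        """Absolute change in success rate between the older and the recent window."""
        outcomes = list(self.history[task])
        if len(outcomes) < 2 * self.window:
            return 0.0  # not enough data yet to estimate progress
        older = sum(outcomes[:self.window]) / self.window
        recent = sum(outcomes[self.window:]) / self.window
        return abs(recent - older)

    def choose(self):
        """Sample a task with probability proportional to its learning progress."""
        n_tasks = len(self.history)
        lp = [self.progress(t) for t in range(n_tasks)]
        if self.rng.random() < self.epsilon or sum(lp) == 0:
            return self.rng.randrange(n_tasks)  # random exploration
        return self.rng.choices(range(n_tasks), weights=lp)[0]

    def update(self, task, correct):
        """Record the outcome of one trial on the given task."""
        self.history[task].append(1 if correct else 0)
```

Under these assumptions, a task that is always solved and a task that is never solved both have zero estimated progress, so the selector concentrates on tasks where performance is still changing, which mirrors the avoidance of trivial and overly complicated tasks described above.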

Computational Models Of Tool Use and Speech Development: the Roles of Active Learning, Curiosity and Self-Organization

Participants : Pierre-Yves Oudeyer [correspondant] , Clement Moulin-Frier, Sébastien Forestier, Linda Smith.

Modeling Cognitive Development and Tool Use in Infants

A scientific challenge in developmental and social robotics is to model how autonomous organisms can develop and learn open repertoires of skills in high-dimensional sensorimotor spaces, given limited resources of time and energy. This challenge is important both from the fundamental and application perspectives. First, recent work in robotic modeling of development has shown that it could make decisive contributions to improve our understanding of development in human children, within cognitive sciences [131]. Second, these models are key for enabling future robots to learn new skills through lifelong natural interaction with human users, for example in assistive robotics [157].

In recent years, two strands of work have shown significant advances in the scientific community. On the one hand, algorithmic models of active learning and imitation learning combined with adequately designed properties of robotic bodies have allowed robots to learn how to control an initially unknown high-dimensional body (for example locomotion with a soft material body [3]). On the other hand, other algorithmic models have shown how several social learning mechanisms could allow robots to acquire elements of speech and language [118], allowing them to interact with humans. Yet, these two strands of models have so far mostly remained disconnected: models of sensorimotor learning were too “low-level” to reach capabilities for language, and models of language acquisition assumed strong language-specific machinery limiting their flexibility. Preliminary work has shown that there are strong connections between the mechanisms underlying hierarchical sensorimotor learning, artificial curiosity, and language acquisition [49].

Recent robotic modeling work in this direction has shown how mechanisms of active curiosity-driven learning could progressively self-organize developmental stages of increasing complexity in vocal skills sharing many properties with the vocal development of infants [37]. Interestingly, these mechanisms were shown to be exactly the same as those that can allow a robot to discover other parts of its body, and how to interact with external physical objects [152].

In such current models, the vocal agents do not associate sounds with meaning, and do not link vocal production to other forms of action. Other models of language acquisition assume that vocal production is mastered, and hand-code the meta-knowledge that sounds should be associated with referents or actions [118]. But understanding what kind of algorithmic mechanisms can explain the smooth transition between the learning of vocal sound production and the use of sounds as tools to affect the world is still largely an open question.

The goal of this work is to elaborate and study computational models of curiosity-driven learning that allow flexible learning of skill hierarchies, in particular for learning how to use tools and how to engage in social interaction, following those presented in [152], [3], [43], [37]. The aim is to make steps towards addressing the fundamental question of how speech communication is acquired through embodied interaction, and how it is linked to tool discovery and learning.

A first question that we study in this work is what type of mechanisms could be used for hierarchical skill learning, allowing a robot to handle new task spaces and new action spaces, where the action and task spaces initially given to the robot are continuous and high-dimensional and can be encapsulated as primitive actions to affect newly learnt task spaces.

We presented first results on that question at the 38th Annual Meeting of the Cognitive Science Society, Philadelphia, Pennsylvania, USA, August 10-13th [80]. In this work, we presented the HACOB (Hierarchical Active Curiosity-driven mOdel Babbling) architecture of algorithms that actively chooses which sensorimotor model to train in a hierarchy of models representing the environmental structure. We studied this architecture using a simulated robotic arm interacting with objects in a 2D environment (see Fig. 8). Studies of child development of tool use precursors showed successive but overlapping phases of qualitatively different types of behaviours [167]. We hypothesized that two mechanisms in particular play a role in the structuring of these phases: the intrinsic motivation to explore and the representation used to encode sensorimotor experience.

Figure 8. Left: simulated robotic environment with a 4 DOF robotic arm, 2 tools and a toy. Right: Observed behaviours of an agent: it first explores its arm to move its hand, then also explores how to move the stick and the toy.
IMG/Forestier_cogsci_setup.png IMG/Forestier_cogsci_ow.png

We showed that, with a hierarchical structure of sensorimotor models and with active model babbling as an intrinsic motivation to explore the sensorimotor models that have a high learning progress, overlapping phases of behaviours emerge autonomously in the developmental trajectories of agents. To our knowledge, this is the first model of curiosity-driven development of simple tool use and of the self-organization of overlapping phases of behaviours. In particular, our model explains why and how intrinsically motivated exploration of non-optimal methods to solve certain sensorimotor problems can be useful to discover how to solve other sensorimotor problems, in accordance with Siegler's overlapping waves theory, by scaffolding the learning of increasingly complex affordances in the environment.

In computational models of strategy selection for the problem of integer addition, Shrager and Siegler proposed a mechanism that maintains the concurrent exploration of alternative strategies with use frequencies that are proportional to their performance for solving a particular problem. This mechanism was also used by Chen and Siegler to interpret an experiment with 1.5- and 2.5-year-olds that had to retrieve an out-of-reach toy, and where they could use one of several available strategies that included leaning forward to grasp a toy with the hand or using a tool to retrieve the toy.

In a paper that we presented at the Sixth Joint IEEE International Conference on Developmental Learning and Epigenetic Robotics, Cergy-Pontoise, France, September 19-22nd [82], we studied tool use discovery and considered other mechanisms of strategy selection and evaluation. In particular, we presented models of curiosity-driven exploration where strategies are explored according to the learning progress/information gain they provide (as opposed to their current efficiency to actually solve the problem). In these models, we defined a curiosity-driven agent learning a hierarchy of different sensorimotor models in a simple 2D setup with a robotic arm, a stick and a toy. In a first phase, the agent learns from scratch how to use its robotic arm to control the tool and to catch the toy, and in a second phase with the same learning mechanisms, the agent has to solve three problems where the toy can only be reached with the tool (see Fig. 9). We showed that agents choosing strategies based on learning progress also display overlapping waves of behavior compatible with those observed in infants, and we suggested that curiosity-driven exploration could be at play in Chen and Siegler's experiment, and more generally in tool use discovery.
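The contrast between the two selection rules discussed above, use frequencies proportional to current performance versus exploration proportional to learning progress, can be made concrete with a small sketch. The function name `selection_probabilities`, the strategy names and all numerical values below are invented for illustration only; they are not taken from the cited experiments or models.

```python
def selection_probabilities(scores):
    """Turn non-negative scores into selection probabilities (uniform if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return {name: 1.0 / len(scores) for name in scores}
    return {name: value / total for name, value in scores.items()}


# Made-up values for two strategies of a hypothetical agent:
# current success rate of each strategy, and recent change in that success rate.
performance = {"hand": 0.75, "stick": 0.25}
learning_progress = {"hand": 0.0, "stick": 0.5}

# Performance-proportional rule: the currently more efficient strategy is used most.
by_performance = selection_probabilities(performance)

# Progress-proportional rule: the strategy that is still improving is explored most,
# even though it is currently the less efficient one.
by_progress = selection_probabilities(learning_progress)
```

With these assumed values, the performance-based rule favours the hand strategy, whereas the progress-based rule favours the stick strategy, which is the kind of exploration of currently non-optimal strategies that the text associates with curiosity-driven agents.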

Figure 9. Left: Chen and Siegler's experimental setup with 1.5- and 2.5-year-olds who have to pick the right tool to retrieve an interesting toy. Right: Simulated robotic setup with a 3 DOF robotic arm that has 2 strategies to retrieve a toy: either grasp it with the hand, or use the stick to pull the toy.
IMG/Forestier_icdl_chen.png IMG/Forestier_icdl_setup.png

Curiosity-driven developmental processes and their role in development and evolution of language

Infants’ own activities create and actively select their learning experiences. In a collaboration with Linda Smith, we have analyzed recent models of embodied information seeking and curiosity-driven learning and have shown that these mechanisms have deep implications for development and evolution. In [69], we have discussed how these mechanisms yield self-organized epigenesis with emergent ordered behavioral and cognitive developmental stages. We described a robotic experiment that explored the hypothesis that progress in learning, in and for itself, generates intrinsic rewards: the robot learners probabilistically selected experiences according to their potential for reducing uncertainty. In these experiments, curiosity-driven learning led the robot learner to successively discover object affordances and vocal interaction with its peers. We explained how a learning curriculum adapted to the current constraints of the learning system formed automatically, constraining learning and shaping the developmental trajectory. The observed trajectories in the robot experiment share many properties with those in infant development, including a mixture of regularities and diversities in the developmental patterns. Finally, we argued that such emergent developmental structures can guide and constrain evolution, in particular with regard to the origins of language.

Computational Models Of Developmental Exploration Mechanisms in Vocal Babbling and Arm Reaching in Infants

Participants : Pierre-Yves Oudeyer [correspondant] , Clement Moulin-Frier, Freek Stulp, Jules Borchard.

Proximodistal Freeing of DOFs in Motor Learning as an Emergent Property of Stochastic Optimization Principles

To harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. We have formulated and studied computationally the hypothesis that such patterns, such as the proximodistal freeing of degrees of freedom when learning to reach, can emerge spontaneously as the result of a family of stochastic optimization processes, without an innate encoding of a maturational schedule. In particular, we present simulated experiments with a 6-DOF arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we show that a proximodistal organization appears spontaneously, which we denote PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). We also compare the emergent structuration as different arm structures are used – from human-like to quite unnatural ones – to study the effect of different kinematic structures on the emergence of PDFF.

Emergent Jaw Predominance in Vocal Development through Stochastic Optimization

Infant vocal babbling relies strongly on jaw oscillations, especially at the stage of canonical babbling, which underlies the syllabic structure of world languages. We have proposed, modelled and analyzed a hypothesis to explain this predominance of the jaw in early babbling. This hypothesis states that general stochastic optimization principles, when applied to learning sensorimotor control, automatically generate ordered babbling stages with a predominant exploration of jaw movements in early stages, just as they generate proximo-distal organization of exploration in arm reaching as described in the paragraph above. In particular, such stochastic optimization principles predominantly explore jaw movement at the beginning of vocal learning, and when close to the rest position of the vocal tract, as it impacts the auditory effects more than other articulators.

Learning and Teaching in Adult-Child and Human-Robot Interaction

Participants : Anna-Lisa Vollmer [correspondant] , Pierre-Yves Oudeyer.

Pragmatic Frames

One of the big challenges in robotics today is to learn from human users who are inexperienced in interacting with robots, yet are often used to teaching skills flexibly to other humans and to children in particular. A potential route toward natural and efficient learning and teaching in Human-Robot Interaction (HRI) is to leverage the social competences of humans and the underlying interactional mechanisms. In this perspective, we propose “pragmatic frames” as flexible interaction protocols that provide important contextual cues to enable learners to infer new action or language skills and teachers to convey these cues. Following the concept developed in the field of developmental linguistics [117], we define a pragmatic frame to be an interaction protocol negotiated over time between interaction partners. We further specify that a pragmatic frame involves, in particular, an observable coordinated sequence of behaviors as well as relevant cognitive operations.

Figure 10 depicts the book reading frame Bruner observed in his studies on word learning.

Figure 10. Example of a learning/teaching pragmatic frame.

At home, a mother is sitting on the sofa with her child on her lap and she is holding a picture book in front of them. The mother points to the book and says “look!” to direct the child's attention. The child then gazes to the image. And the mother asks “What’s that?”, prompting the child's performance. The child answers with babble strings and smiles, maybe “auo”. “Yes, a pineapple!” The mother gives positive feedback and the correct label. “Anappo”, again babble strings and smiles. And the mother gives positive feedback. This stable sequence that the child is familiar with helps the child to participate and to pick up the only variable information he or she is supposed to learn. We argue that this frame also triggers the relevant cognitive functions to process the information.

Our results in 2016 have been twofold. First, in a paper published in Frontiers in Psychology [70], we have given a theoretical account of pragmatic frames as an alternative to the mapping metaphor, which posits that children learn a word by mapping it onto a concept of an object or event. We believe that the mapping metaphor cannot account for word learning, because even though children focus attention on objects, they do not necessarily remember the connection between the word and the referent unless it is framed pragmatically, that is, within a task. Word learning with pragmatic frames occurs as children accomplish a goal in cooperation with a partner. We elaborate on pragmatic frames, offer some initial parametrizations of the concept, and embed it in current language learning approaches.

Second, aiming at leveraging the concept of pragmatic frames for Human-Robot Interaction, we published an article in Frontiers in Neurorobotics [71] in which we study a selection of HRI work in the literature which has focused on learning–teaching interaction and analyze the interactional and learning mechanisms that were used in the light of pragmatic frames. This allows us to show that many of these works have already used in practice, but not always explicitly, basic elements of the pragmatic frames machinery. However, we also show that pragmatic frames have so far been used in a very restricted way as compared to how they are used in human–human interaction and argue that this has been an obstacle preventing robust natural multi-task learning and teaching in HRI. In particular, we explain that two central features of human pragmatic frames, mostly absent from existing HRI studies, are that (1) social peers use rich repertoires of frames, potentially combined together, to convey and infer multiple kinds of cues; (2) new frames can be learnt continually, building on existing ones, and guiding the interaction toward higher levels of complexity and expressivity. To conclude, we give an outlook on future research directions, describing the key challenges that need to be solved for leveraging pragmatic frames for robot learning and teaching.

Models of Self-organization of lexical conventions: the role of Active Learning and Active Teaching in Naming Games

Participants : William Schueller [correspondant] , Pierre-Yves Oudeyer.

How does language emerge, evolve and get transmitted between individuals? What mechanisms underlie the formation and evolution of linguistic conventions, and what are their dynamics? Computational linguistic studies have shown that local interactions within groups of individuals (e.g. humans or robots) can lead to self-organization of lexica associating semantic categories to words [168]. However, this self-organization still does not scale well to complex meaning spaces and a large number of possible word-meaning associations (or lexical conventions), suggesting high competition among those conventions.

In statistical machine learning and in developmental sciences, it has been argued that an active control of the complexity of learning situations can have a significant impact on the global dynamics of the learning process [30], [131], [140]. This approach has been mostly studied for single robotic agents learning sensorimotor affordances [153], [38]. However, active learning might represent an evolutionary advantage for language formation at the population level as well [49], [170].

Naming Games are a computational framework, elaborated to simulate the self-organization of lexical conventions in the form of a multi-agent model [169]. Through repeated local interactions between random couples of agents (designated speaker and hearer), shared conventions emerge. Interactions consist of uttering a word – or an abstract signal – referring to a topic, and evaluating the success or failure of communication.

However, in existing work, the processes involved in these interactions typically rely on random choices, especially for the choice of a communication topic.

The introduction of active learning algorithms in these models produces significant improvement of the convergence process towards a shared vocabulary, with the speaker [53], [46], [122] or the hearer [90] actively controlling vocabulary growth.

We study here how the convergence time and the maximum level of complexity scale with population size, for three different strategies (one with random topic choice and two with active topic choice) detailed in Figure 11.

Figure 11. Strategies: Choice of meaning m. Each of the two active strategies uses a parameter (α and n, respectively), which is set to its optimal value in our simulations.

As for the version of the Naming Game used in our work, the scenario of the interaction is described in [90]. Vocabulary is updated as described in the Minimal Naming Game, detailed in [177]. In our simulations, we choose to set N=M=W, where N is the population size, M the number of meanings, and W the number of possible words. The computed theoretical success ratio of communication is used to represent the degree of convergence toward a shared lexicon for the whole population. A value of 1 means that the population reached full convergence. The complexity level of an individual lexicon is measured as the total number of distinct associations between meanings and words in the lexicon, in other words its memory usage.
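To make the setting more concrete, here is a minimal Python sketch of a Naming Game with random topic choice, together with the two quantities discussed above. It is a simplified illustration and not the simulation code behind the reported results: the function names, the update rule (on success, both agents drop competing words for the topic; on failure, the hearer adopts the association), the absence of homonymy handling, and the way the success ratio is computed are all assumptions made for the example.

```python
import random


def naming_game(n_agents=10, n_meanings=10, n_words=10, steps=5000, seed=0):
    """Simplified Naming Game with random topic choice (assumed update rule)."""
    rng = random.Random(seed)
    # Each lexicon is a set of (meaning, word) associations.
    lexicons = [set() for _ in range(n_agents)]
    memory_trace = []  # average number of associations per agent over time
    for _ in range(steps):
        speaker, hearer = rng.sample(range(n_agents), 2)
        meaning = rng.randrange(n_meanings)  # random topic choice
        known = sorted(w for (m, w) in lexicons[speaker] if m == meaning)
        if known:
            word = rng.choice(known)
        else:
            word = rng.randrange(n_words)  # invent a word for this meaning
            lexicons[speaker].add((meaning, word))
        if (meaning, word) in lexicons[hearer]:
            # Success: both agents keep only this word for the topic.
            for agent in (speaker, hearer):
                lexicons[agent] = {(m, w) for (m, w) in lexicons[agent]
                                   if m != meaning or w == word}
        else:
            # Failure: the hearer adopts the association.
            lexicons[hearer].add((meaning, word))
        memory_trace.append(sum(len(lex) for lex in lexicons) / n_agents)
    return lexicons, memory_trace


def success_ratio(lexicons, n_meanings):
    """Probability that a random interaction succeeds given the current lexicons.

    Assumption of this sketch: a topic unknown to the speaker counts as a failure.
    """
    n = len(lexicons)
    total = 0.0
    for s in range(n):
        for h in range(n):
            if s == h:
                continue
            for m in range(n_meanings):
                words = [w for (mm, w) in lexicons[s] if mm == m]
                if words:
                    total += sum((m, w) in lexicons[h] for w in words) / len(words)
    return total / (n * (n - 1) * n_meanings)


if __name__ == "__main__":
    final_lexicons, trace = naming_game(n_agents=8, n_meanings=8, n_words=8)
    print("max average memory usage:", max(trace))
    print("success ratio:", success_ratio(final_lexicons, 8))
```

An active strategy would replace the single line that draws `meaning` at random with a rule based on the chooser's current lexicon; the rest of the loop would stay the same under these assumptions.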

Figure 12. Strategy comparisons, in terms of convergence time (theoretical success ratio) and complexity level (memory usage). In this case, the hearer is the one choosing the topic. M=W=N=40, averaged over 8 trials.
IMG/WSsrtheo.png IMG/WScomplexity.png

Figure 13. Scaling of maximum memory usage and convergence time for the different strategies, as a function of population size. In this case, the hearer is the one choosing the topic. M=W=N, averaged over 8 trials.
IMG/WSconv_time.png IMG/WSmax_mem.png

We show here (see Figures 12 and 13) that convergence time and maximum complexity are reduced with active topic choice, an effect that is amplified as larger populations are considered. The minimal counts strategy yields a strictly minimum complexity (equal to the complexity of a completed lexicon), while converging as fast as the success threshold strategy. Further work will deal with other variants of the Naming Game (with different vocabulary updates, population replacement, and different ratios for N, M and W). For the moment only the hearer's choice scenario is studied, because of its high robustness to changes in parameter values for the different strategies [90].