Section: New Results

Computational Models Of Human Development and Cognition

Computational Models Of Information-Seeking and Curiosity-Driven Learning in Humans and Animals

Participants : Pierre-Yves Oudeyer [correspondant] , William Schueller, Sebastien Forestier, Alvaro Ovalle.

This project involves a collaboration between the Flowers team, the Cognitive Neuroscience Lab of J. Gottlieb at Columbia Univ. (NY, US), and the developmental psychology lab of Celeste Kidd at Univ. Rochester, US, on the understanding and modeling of mechanisms of curiosity, attention and active intrinsically motivated exploration that until now have been little explored in neuroscience, machine learning and cognitive robotics.

It is organized around the study of the hypothesis that information gain (or control gain) could generate intrinsic reward in the brain (living or artificial), driving attention and exploration independently from material rewards, and allowing for autonomous lifelong acquisition of open repertoires of skills. The project combines expertise about attention and exploration in the brain and a strong methodological framework for conducting experimentations with monkeys, human adults (Gottlieb's lab) and children (Kidd's lab) together with computational modeling of curiosity/intrinsic motivation and learning in the Flowers team.

Such a collaboration paves the way towards a central strategic objective of the Flowers team: designing and conducting experiments in animals and humans informed by computational/mathematical theories of information seeking, and making it possible to test the predictions of these computational theories.


Curiosity can be understood as a family of mechanisms that evolved to allow agents to maximize their knowledge (or their control) of the useful properties of the world - i.e., the regularities that exist in the world - using active, targeted investigations. In other words, we view curiosity as a decision process that maximizes learning/competence progress (rather than minimizing uncertainty) and assigns value ("interest") to competing tasks based on their epistemic qualities - i.e., their estimated potential to allow discovery and learning about the structure of the world.

Because a curiosity-based system acts in conditions of extreme uncertainty (when the distributions of events may be entirely unknown) there is in general no optimal solution to the question of which exploratory action to take [31], [152], [160]. Therefore we hypothesize that, rather than using a single optimization process as has been the case in most previous theoretical work [120], curiosity comprises a family of mechanisms that include simple heuristics related to novelty/surprise and measures of learning progress over longer time scales [150], [98], [146]. These different components are related to the subject's epistemic state (knowledge and beliefs) and may be integrated with fluctuating weights that vary according to the task context. We will quantitatively characterize this dynamic, multi-dimensional system in the framework of Bayesian Reinforcement Learning, as described below.

Because of its reliance on epistemic currencies, curiosity is also very likely to be sensitive to individual differences in personality and cognitive functions. Humans show well-documented individual differences in curiosity and exploratory drives [137], [159], and rats show individual variation in learning styles and novelty seeking behaviors [115], but the basis of these differences is not understood. We postulate that an important component of this variation is related to differences in working memory capacity and executive control which, by affecting the encoding and retention of information, will impact the individual's assessment of learning, novelty and surprise and ultimately, the value they place on these factors [156], [168], [93], [174]. To start understanding these relationships, about which nothing is known, we will search for correlations between curiosity and measures of working memory and executive control in the population of children we test in our tasks, analyzed from the point of view of a computational model based on Bayesian reinforcement learning.

A final premise guiding our research is that essential elements of curiosity are shared by humans and non-human primates. Human beings have a superior capacity for abstract reasoning and building causal models, which is a prerequisite for sophisticated forms of curiosity such as scientific research. However, if the task is adequately simplified, essential elements of curiosity are also found in monkeys [137], [132] and, with adequate characterization, this species can become a useful model system for understanding the neurophysiological mechanisms.


Our studies have several highly innovative aspects, both with respect to curiosity and to the traditional research field of each member team.

  • Linking curiosity with quantitative theories of learning and decision making: While existing investigations examined curiosity in qualitative, descriptive terms, here we propose a novel approach that integrates quantitative behavioral and neuronal measures with computationally defined theories of Bayesian Reinforcement Learning and decision making.

  • Linking curiosity in children and monkeys: While existing investigations examined curiosity in humans, here we propose a novel line of research that coordinates its study in humans and non-human primates. This will address key open questions about differences in curiosity between species, and allow access to its cellular mechanisms.

  • Neurophysiology of intrinsic motivation: Whereas virtually all the animal studies of learning and decision making focus on operant tasks (where behavior is shaped by experimenter-determined primary rewards) our studies are among the very first to examine behaviors that are intrinsically motivated by the animals' own learning, beliefs or expectations.

  • Neurophysiology of learning and attention: While multiple experiments have explored the single-neuron basis of visual attention in monkeys, all of these studies focused on vision and eye movement control. Our studies are the first to examine the links between attention and learning, which are recognized in psychophysical studies but have been neglected in physiological investigations.

  • Computer science: biological basis for artificial exploration: While computer science has proposed and tested many algorithms that can guide intrinsically motivated exploration, our studies are the first to test the biological plausibility of these algorithms.

  • Developmental psychology: linking curiosity with development: While it has long been appreciated that children learn selectively from some sources but not others, there has been no systematic investigation of the factors that engender curiosity, or how they depend on cognitive traits.

Current results

In particular, new work and results in 2017 include:

Experiments in Active Categorization

In 2017, we implemented, ran and analyzed the human adult experiment piloted the year before. A distinguishing feature of curiosity is that, rather than seeking to obtain information in a known task context (e.g., reading the menu in a restaurant), curiosity has to discover regularities whose existence is a priori unknown. This raises the question of how active learners become interested in specific items: how do agents decide which task to be interested in, i.e., how do they allocate "study time", given that the underlying rewards or patterns are sparse and unknown? A theoretical solution to this problem is suggested by the optimal learning literature, and proposes that the allocation of resources may be based on the relative difficulty of competing tasks, or on the learning progress (LP) expected from engaging a task. While these strategies can make equivalent predictions in certain simple situations (e.g., when learning curves are known and concave), LP-based mechanisms are superior in open-ended environments that contain unlearnable tasks. In such situations, LP-based strategies assign lower value to tasks where little progress is made and allow the learner to disengage from such tasks, while performance-based mechanisms, by assigning higher value to the lower-competence task, can push the learner to labor in vain. In the present experiment we asked whether humans possess, and use, metacognitive abilities to guide performance-based or LP-based exploration in two contexts in which they could freely choose to learn about four competing tasks. Participants (n = 505, recruited via Amazon Mechanical Turk) were tested on a paradigm in which they could freely choose to engage with one of four different classification tasks. We are currently analyzing the results and working on computational models of the underlying cognitive and motivational mechanisms.
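The distinction between performance-based and LP-based task selection can be illustrated with a minimal sketch. The windowed progress estimate and the names below (`LPTaskSelector`, `window`, `epsilon`) are our own simplifying assumptions for illustration, not the model used in the actual analysis.

```python
import random

class LPTaskSelector:
    """Minimal learning-progress (LP) based task selection.

    LP for a task is approximated as the absolute difference between the
    mean performance over the most recent window and the window before it,
    so an unlearnable task (flat performance) eventually receives low
    value and the learner can disengage from it.
    """
    def __init__(self, n_tasks, window=10, epsilon=0.1):
        self.history = [[] for _ in range(n_tasks)]
        self.window = window
        self.epsilon = epsilon  # residual random exploration

    def learning_progress(self, task):
        h = self.history[task]
        if len(h) < 2 * self.window:
            return float('inf')  # under-sampled tasks stay interesting
        recent = sum(h[-self.window:]) / self.window
        older = sum(h[-2 * self.window:-self.window]) / self.window
        return abs(recent - older)

    def choose_task(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.history))
        lp = [self.learning_progress(t) for t in range(len(self.history))]
        return max(range(len(lp)), key=lambda t: lp[t])

    def record(self, task, performance):
        self.history[task].append(performance)
```

With such a rule, any task whose learning curve has flattened loses value, whereas a difficulty-based rule would keep the learner on the task with the lowest competence, even if it is unlearnable.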

Computational Models Of Tool Use and Speech Development: the Roles of Active Learning, Curiosity and Self-Organization

Participants : Pierre-Yves Oudeyer [correspondant] , Sébastien Forestier.

Modeling Speech and Tool Use Development in Infants

A scientific challenge in developmental and social robotics is to model how autonomous organisms can develop and learn open repertoires of skills in high-dimensional sensorimotor spaces, given limited resources of time and energy. This challenge is important both from the fundamental and application perspectives. First, recent work in robotic modeling of development has shown that it could make decisive contributions to improve our understanding of development in human children, within cognitive sciences [120]. Second, these models are key for enabling future robots to learn new skills through lifelong natural interaction with human users, for example in assistive robotics [154].

In recent years, two strands of work have shown significant advances in the scientific community. On the one hand, algorithmic models of active learning and imitation learning, combined with adequately designed properties of robotic bodies, have allowed robots to learn how to control an initially unknown high-dimensional body (for example locomotion with a soft material body [3]). On the other hand, other algorithmic models have shown how several social learning mechanisms could allow robots to acquire elements of speech and language [105], allowing them to interact with humans. Yet, these two strands of models have so far mostly remained disconnected: models of sensorimotor learning were too “low-level” to reach capabilities for language, and models of language acquisition assumed strong language-specific machinery limiting their flexibility. Preliminary work has shown strong connections between the mechanisms of hierarchical sensorimotor learning, artificial curiosity, and language acquisition [54].

Recent robotic modeling work in this direction has shown how mechanisms of active curiosity-driven learning could progressively self-organize developmental stages of increasing complexity in vocal skills sharing many properties with the vocal development of infants [39]. Interestingly, these mechanisms were shown to be exactly the same as those that can allow a robot to discover other parts of its body, and how to interact with external physical objects [149].

In such current models, the vocal agents do not associate sounds to meaning, and do not link vocal production to other forms of action. In other models of language acquisition, one assumes that vocal production is already mastered, and hand-codes the meta-knowledge that sounds should be associated to referents or actions [105]. But understanding what kind of algorithmic mechanisms can explain the smooth transition between learning to produce vocal sounds and using them as tools to affect the world is still largely an open question.

The goal of this work is to elaborate and study computational models of curiosity-driven learning that allow flexible learning of skill hierarchies, in particular for learning how to use tools and how to engage in social interaction, following those presented in [51], [3], [45], [39]. The aim is to make steps towards addressing the fundamental question of how speech communication is acquired through embodied interaction, and how it is linked to tool discovery and learning.

We take two approaches to study those questions. One approach is to develop robotic models of infant development by looking at the developmental psychology literature about tool use and speech, and trying to implement and test the psychologists' hypotheses about the learning mechanisms underlying infant development. Our second approach is to collaborate directly with developmental psychologists, to analyze together the data of their experiments and to develop other experimental setups that are well suited to answering modeling questions about the underlying exploration and learning mechanisms. We thus started to collaborate with Lauriane Rat-Fischer, a developmental psychologist working in Toulouse on the emergence of tool use in the first years of human life. We are currently analyzing together the behaviour of 22-month-old infants in a tool use task where the infants have to retrieve a toy put in the middle of a tube by inserting sticks into the tube and pushing the toy out. We are looking at the different actions of the infants with tools and toys, but also at their looking behaviour towards the tool, the toys or the experimenter, and we are trying to infer the goals and exploration strategies of the infants.

In our recent robotic modeling work, we showed that the Model Babbling learning architecture allows the development of tool use in a robotic setup, through several fundamental ideas. First, goal babbling is a powerful form of exploration to produce a diversity of effects by self-generating goals in a task space. Second, the possible movements of each object define a task space in which to choose goals, and the different task spaces form an object-based representation that facilitates prediction and generalization. Also, cross-learning between tasks updates all skills while exploring one in particular. A novel insight was that early development of tool use could happen without a combinatorial action planning mechanism: modular goal babbling in itself allowed the emergence of nested tool use behaviors.
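The core loop of goal babbling can be sketched in a deliberately minimal one-dimensional setting. The toy forward model and the nearest-neighbour inverse model below are illustrative assumptions, not the Model Babbling implementation used in the robotic setup.

```python
import math
import random

def forward(motor):
    """Toy forward model: a hypothetical 2-D motor command -> 1-D effect."""
    return math.sin(motor[0]) * motor[1]

def goal_babbling(n_iters=2000, sigma=0.1, seed=0):
    """Goal babbling: sample goals in the task (effect) space, then reuse
    and perturb the motor command whose past effect was closest to the
    goal (a nearest-neighbour inverse model)."""
    rng = random.Random(seed)
    # memory of (motor, effect) pairs; bootstrap with one random trial
    m0 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    memory = [(m0, forward(m0))]
    for _ in range(n_iters):
        goal = rng.uniform(-1, 1)  # self-generated goal in the task space
        nearest_m, _ = min(memory, key=lambda me: abs(me[1] - goal))
        # exploration noise around the best-known motor command
        m = [x + rng.gauss(0, sigma) for x in nearest_m]
        memory.append((m, forward(m)))
    return memory

def reach_error(memory, goal):
    """Distance between a goal and the closest effect ever produced."""
    return min(abs(e - goal) for _, e in memory)
```

The point of the sketch is that exploration is organized around self-generated goals in the effect space rather than around raw motor commands, which is what produces a diversity of effects.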

This year we extended this architecture so that the agent can imitate the caregiver's sounds in addition to exploring autonomously [78]. We hypothesized that these same algorithmic ingredients could allow a joint, unified development of speech and tool use. Our learning agent is situated in a simulated environment where a vocal tract and a robotic arm are to be explored with the help of a caregiver. The environment is composed of three toys, one stick that can be used as a tool to move toys, and a caregiver moving around. The caregiver helps in two ways. If the agent touches a toy, the caregiver produces this toy's name, but otherwise produces a distractor word as if it were talking to another adult. If the agent produces a sound close to a toy's name, the caregiver moves this toy within the agent's reach (Figure 8).

Figure 8. Agent's robotic and vocal environment. Left: the agent's 3-DOF arm, controlled with 21 parameters, grabs toys with its hand or uses the stick to reach toys. The caregiver brings a toy within reach if the agent says its name. Right: the agent's vocal environment, representing sounds as trajectories in the space of the first two formants. The agent's simulated vocal tract produces sounds given 28 parameters. When the agent touches a toy, the caregiver says the toy's name. Some sounds corresponding to random parameters are plotted in red, and some sounds produced when imitating the caregiver's /uye/ word in blue (best imitation in bold, error 0.3).

We show that our learning architecture based on Model Babbling allows agents to learn how to 1) use the robotic arm to grab a toy or a stick, 2) use the stick as a tool to get a toy, 3) produce toy names with the vocal tract, 4) use these vocal skills to get the caregiver to bring a specific toy within reach, and 5) choose the most relevant of those strategies to retrieve a toy that may be out of reach. Also, the grounded exploration of toys accelerates learning to produce accurate sounds for toy names, compared to distractor sounds without any meaning in the environment, once the caregiver is able to recognize these names and react by bringing the toys within reach. Our model is the first to allow the study of the early development of tool use and speech in a unified framework.

This model focuses on the role of one important form of body babbling where exploration is directed towards self-generated goals in free play, combined with imitation learning of a contingent caregiver. This model does not assume capabilities for complex sequencing and combinatorial planning which are often considered necessary for tool use. Yet, we show that the mechanisms in this model allow a learner to progressively discover how to grab objects with the hand, how to use objects as tools to reach further objects, how to produce vocal sounds, and how to leverage these vocal sounds to use a caregiver as a social tool to retrieve objects. Also, the discovery that certain sounds can be used as a social tool further guides vocal learning. This model predicts that infants learn to vocalize the name of toys in a natural play scenario faster than learning other words because they often choose goals related to those toys and engage caregiver’s help by trying to vocalize those toys’ names. We presented those results at the 39th Annual Conference of the Cognitive Science Society (CogSci 2017).

Computational Models Of Developmental Exploration Mechanisms in Vocal Babbling and Arm Reaching in Infants

Participants : Pierre-Yves Oudeyer [correspondant] , Clement Moulin-Frier, Freek Stulp, Jules Brochard.

Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization

To harness the complexity of their high-dimensional bodies during sensorimotor development, infants are guided by patterns of freezing and freeing of degrees of freedom. For instance, when learning to reach, infants free the degrees of freedom in their arm proximodistally, i.e. from joints that are closer to the body to those that are more distant. We formulated and studied computationally the hypothesis that such patterns can emerge spontaneously as the result of a family of stochastic optimization processes (evolution strategies with covariance-matrix adaptation), without an innate encoding of a maturational schedule. In particular, we ran simulated experiments with an arm where a computational learner progressively acquires reaching skills through adaptive exploration, and we showed that a proximodistal organization appears spontaneously, which we denoted PDFF (ProximoDistal Freezing and Freeing of degrees of freedom). We also compared this emergent organization between different arm morphologies – from human-like to quite unnatural ones – to study the effect of different kinematic structures on the emergence of PDFF. This work was published in the journal Developmental Science [74].
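The flavor of stochastic optimization behind this hypothesis can be sketched with a simple (1+lambda) evolution strategy learning to reach with a planar arm. The published experiments used CMA-ES; the fixed step-size decay, arm lengths and target below are simplifying assumptions for illustration only.

```python
import math
import random

def arm_endpoint(angles, lengths=(1.0, 0.8, 0.6)):
    """Forward kinematics of a planar arm, joints listed proximal to distal."""
    x = y = total = 0.0
    for a, l in zip(angles, lengths):
        total += a
        x += l * math.cos(total)
        y += l * math.sin(total)
    return x, y

def reach_cost(angles, target=(1.5, 1.0)):
    """Distance between the arm endpoint and a reaching target."""
    x, y = arm_endpoint(angles)
    return math.hypot(x - target[0], y - target[1])

def evolution_strategy(n_gens=200, pop=20, sigma=0.3, seed=1):
    """(1+lambda)-ES with a simple global step-size decay, a stand-in for
    the covariance-matrix-adapting ES used in the published experiments."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_cost = reach_cost(best)
    for _ in range(n_gens):
        for _ in range(pop):
            cand = [a + rng.gauss(0, sigma) for a in best]
            c = reach_cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
        sigma *= 0.99  # anneal exploration as skill improves
    return best, best_cost
```

In the published work, the per-dimension exploration amplitudes adapted by the algorithm are what reveal the proximodistal ordering; this sketch only shows the adaptive-exploration skeleton on which that analysis rests.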

Emergent Jaw Predominance in Vocal Development through Stochastic Optimization

Infant vocal babbling relies strongly on jaw oscillations, especially at the stage of canonical babbling, which underlies the syllabic structure of world languages. We have proposed, modelled and analyzed a hypothesis to explain this predominance of the jaw in early babbling. This hypothesis states that general stochastic optimization principles, when applied to learning sensorimotor control, automatically generate ordered babbling stages with a predominant exploration of jaw movements in early stages, just as they generate the proximodistal organization of exploration in arm reaching described in the paragraph above. In particular, such stochastic optimization principles predominantly explore jaw movements at the beginning of vocal learning, and when close to the rest position of the vocal tract, because the jaw impacts the auditory effects more than other articulators. This work was published in the journal IEEE Transactions on Cognitive and Developmental Systems [73].
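The core of the argument, that exploration concentrates on the articulator with the largest impact on the auditory effect near the rest position, can be illustrated with a toy sensitivity computation. The articulator-to-sound map and its weights below are purely hypothetical, chosen only so that the jaw dimension dominates near rest.

```python
import math
import random

def vocal_output(params):
    """Hypothetical articulator-to-sound map: by assumption, the first
    parameter (jaw) affects the output more strongly near the rest
    position than the tongue and lip parameters."""
    jaw, tongue, lips = params
    return 2.0 * math.sin(jaw) + 0.5 * tongue + 0.3 * lips

def exploration_gain(params, dim, sigma=0.1, n=500, seed=0):
    """Average change in auditory output obtained by perturbing a single
    articulator around a given configuration."""
    rng = random.Random(seed)
    base = vocal_output(params)
    total = 0.0
    for _ in range(n):
        p = list(params)
        p[dim] += rng.gauss(0, sigma)
        total += abs(vocal_output(p) - base)
    return total / n
```

Under this toy map, perturbing the jaw dimension at the rest position yields the largest average auditory change, which is why a stochastic optimizer allocating exploration by expected effect would sample it predominantly at first.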

Models of Self-organization of lexical conventions: the role of Active Learning and Active Teaching in Naming Games

Participants : William Schueller [correspondant] , Pierre-Yves Oudeyer.

How does language emerge, evolve and get transmitted between individuals? What mechanisms underlie the formation and evolution of linguistic conventions, and what are their dynamics? Computational linguistic studies have shown that local interactions within groups of individuals (e.g. humans or robots) can lead to the self-organization of lexica associating semantic categories to words [165]. However, these models still do not scale well to complex meaning spaces and large numbers of possible word-meaning associations (or lexical conventions), which induce high competition among those conventions.

In statistical machine learning and in developmental sciences, it has been argued that an active control of the complexity of learning situations can have a significant impact on the global dynamics of the learning process [120], [130], [139]. This approach has mostly been studied for single robotic agents learning sensorimotor affordances [150], [40]. However, active learning might represent an evolutionary advantage for language formation at the population level as well [54], [167].

The Naming Game is a computational framework, elaborated to simulate the self-organization of lexical conventions in the form of a multi-agent model [166]. Through repeated local interactions between random pairs of agents (a designated speaker and a hearer), shared conventions emerge. Interactions consist of uttering a word – or an abstract signal – referring to a topic, and evaluating the success or failure of the communication.

However, in existing work the processes involved in these interactions, especially the choice of a communication topic, are typically random.

The introduction of active learning algorithms in these models produces a significant improvement in the convergence towards a shared vocabulary, with the speaker [49], [60], [109] or the hearer [61] actively controlling vocabulary growth.
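As a baseline for these active variants, the basic Naming Game with random topic choice can be sketched as follows. The alignment rule (collapsing both vocabularies on success) is one common minimal variant of the game, and all parameter values are illustrative.

```python
import random

def naming_game(n_agents=20, n_meanings=5, n_steps=4000, seed=0):
    """Minimal Naming Game with random topic choice.

    Each agent holds a set of known words per meaning. On a failed
    interaction the hearer adopts the speaker's word; on a success both
    agents collapse their vocabulary for that meaning to the winning word.
    Returns the per-interaction success history (0/1).
    """
    rng = random.Random(seed)
    vocab = [[set() for _ in range(n_meanings)] for _ in range(n_agents)]
    next_word = 0
    successes = []
    for _ in range(n_steps):
        speaker, hearer = rng.sample(range(n_agents), 2)
        topic = rng.randrange(n_meanings)      # random topic choice
        if not vocab[speaker][topic]:          # speaker invents a word
            vocab[speaker][topic].add(next_word)
            next_word += 1
        word = rng.choice(sorted(vocab[speaker][topic]))
        if word in vocab[hearer][topic]:       # success: align
            vocab[speaker][topic] = {word}
            vocab[hearer][topic] = {word}
            successes.append(1)
        else:                                  # failure: hearer adopts
            vocab[hearer][topic].add(word)
            successes.append(0)
    return successes
```

Running this loop shows the characteristic dynamics: an early phase dominated by invention and failure, followed by convergence towards a shared lexicon and a success rate approaching 1.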

Figure 9. Definition of a local measure of convergence for the Naming Game: the Local Approximated Probability of Success (LAPS), using a limited memory of past interactions.

In the Naming Game, one measure is usually used to represent the state of convergence of the population: the success rate, i.e. the probability of success at a given time step. It increases over time, from 0 to 1. This measure is however global, and not accessible to individual agents; otherwise it would have been a perfect candidate for a functional whose maximization drives local behavior. Several other measures have been suggested, such as one based on local information gain, or entropy reduction [60]. Those measures, however, are either defined only in a very constrained case (without synonymy and homonymy, and with fixed and known numbers of words and meanings), or their minimization can actually block the process of convergence, as their evolution is not easily predictable.

Instead, we defined a local approximation of the success rate. For this, we need a representation of the state of the population, which we obtain by constructing an average vocabulary representing the population, using a partial memory of past interactions. This vocabulary is then used together with the agent's own vocabulary to compute a probability of success. A key element of this measure is the time scale associated with the memory: it allows the agent not only to define a degree of certainty for a given association, but also a degree of uncertainty at a higher level (word or meaning). This measure is local (available to an agent through its own knowledge only), but its convergence to 100% is tied to the global dynamics. In other words, we can use it as a functional to maximize at the local level in order to reinforce agreement at the population level.
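A possible sketch of such a measure, with hypothetical names and a deliberately simplified vocabulary representation (one preferred word per meaning, and a bounded memory of observed meaning-word pairs):

```python
from collections import Counter, deque

class LAPSAgent:
    """Sketch of a Local Approximated Probability of Success (LAPS).

    The agent keeps a bounded memory of (meaning, word) pairs observed in
    past interactions, treats it as an empirical picture of the population
    vocabulary, and scores how likely its own word choices are to succeed
    against a random partner drawn from that picture.
    """
    def __init__(self, time_scale=20):
        self.own = {}  # meaning -> the agent's own preferred word
        self.memory = deque(maxlen=time_scale)  # bounded past interactions

    def observe(self, meaning, word):
        self.memory.append((meaning, word))

    def laps(self, meanings):
        if not self.memory:
            return 0.0
        score = 0.0
        for m in meanings:
            words = [w for (mm, w) in self.memory if mm == m]
            if not words or m not in self.own:
                continue  # uncertain meanings contribute nothing
            counts = Counter(words)
            # empirical probability that the agent's own word for m
            # matches what a random partner would use
            score += counts[self.own[m]] / len(words)
        return score / len(meanings)
```

The `time_scale` (the `deque` bound) plays the role described above: a short memory tracks recent dynamics and leaves older conventions uncertain, a long one smooths over them.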

Active Topic Choice: LAPS and Multi-Armed Bandits

Usually, the topic used in an interaction of the Naming Game is picked randomly. A first way of introducing active control of complexity growth is through the mechanism of topic choice: choosing the topic according to past memory. This allows each agent to balance reinforcement of known associations and invention of new ones, which can be seen as an exploitation vs. exploration problem. It can speed up convergence, and even significantly lower local and global complexity: for example in [60], [61], where heuristics based on the number of past successful interactions were used.

However, we can now define new strategies that directly maximize the LAPS measure. At each step, the agent picking a topic chooses the one that yields the maximum expected increase of the LAPS measure. Since this expected value is computationally costly, we use a Multi-Armed Bandit algorithm. At the beginning, only one machine is available: the exploration machine. When the agent uses it, its parameters are updated through the Thompson Sampling algorithm, and a new machine is created with the exact same parameters, corresponding to the newly explored meaning. At any time, the number of machines available to the agent is thus equal to the number of already known meanings, plus one (the exploration machine).
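The bandit mechanism can be sketched as follows, assuming binary rewards with Beta priors; the reward definition and arm bookkeeping below are simplified stand-ins for the actual expected LAPS increase.

```python
import random

class TopicBandit:
    """Thompson-Sampling topic choice: one Beta-distributed arm per known
    meaning, plus a dedicated exploration arm. Using the exploration arm
    spawns a new per-meaning arm that inherits its parameters."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.arms = {'explore': [1.0, 1.0]}  # arm -> [alpha, beta] prior

    def choose(self):
        # Thompson Sampling: draw one sample per arm, play the best
        samples = {a: self.rng.betavariate(p[0], p[1])
                   for a, p in self.arms.items()}
        return max(samples, key=samples.get)

    def update(self, arm, reward, new_meaning=None):
        a, b = self.arms[arm]
        self.arms[arm] = [a + reward, b + (1 - reward)]
        if arm == 'explore' and new_meaning is not None:
            # the newly explored meaning inherits the exploration
            # arm's (updated) parameters
            self.arms[new_meaning] = list(self.arms['explore'])
```

The arm count matches the description above: one arm per already-known meaning, plus the exploration arm.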

This strategy can speed up the convergence process, but also significantly diminishes the global complexity – i.e. the maximum number of distinct word-meaning associations present in the population. See figure 10.

Figure 10. Measures of convergence (global probability of success) and global complexity (number of distinct word-meaning associations present in the population) for simulations using Random Topic Choice and MAB LAPS-maximization Topic Choice. The active topic choice strategy yields faster convergence, with less complexity. Parameters used: 60 agents, 40 meanings, 40 words, time scale for LAPS: 10 interactions.
Acceptance policy: Updating or not vocabulary based on memory of past interactions

Another way to control complexity in the Naming Game is to choose whether or not to trust other agents during a given interaction, by taking into account or discarding their word-meaning associations. In previous work, a purely stochastic acceptance of new information has been studied [97]. However, to be efficient, accepting or rejecting new information should depend on the memory provided by past interactions. To do so, we use a local approximation of the global agreement, based on recent information, as a functional to optimize at each interaction: the LAPS measure. We can show that for an appropriate time scale of this recent information, local complexity (the number of word-meaning associations to be remembered) remains low, without impacting the duration of the global agreement process. The exact dependence on parameters (time scale, population size, sizes of the meaning and word spaces) remains to be explored.

Structured meaning spaces exploration

In the models we have considered so far, meanings were always finite in number, without any structure or relative importance, and the whole meaning space was accessible from the start. We studied a scenario where meanings are not all available from the beginning, but are taken from a growing space: the known meanings plus the Adjacent Possible [131], [172]. In practice, we consider a graph of meanings and a starting meaning m0. The adjacent possible is the set of nodes connected to m0. Whenever a meaning from this set is explored, it is withdrawn from the adjacent possible, but all its neighbors not already known are added to it. In this case, Active Topic Choice helps to keep a quasi-linear pace of exploration, while agreeing on explored meanings. Random Topic Choice explores all available meanings before starting the agreement process: hence, on a large, possibly infinite meaning space, it is highly inefficient in terms of communication success. See figure 11.
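The adjacent-possible bookkeeping described above can be sketched as follows; the adjacency-list graph representation and the `choose` callback (which stands in for a topic choice strategy) are illustrative assumptions.

```python
def adjacent_possible_walk(graph, start, choose, n_steps):
    """Explore a meaning graph via the 'adjacent possible': only
    neighbours of already-known meanings can become topics.

    `graph` maps each meaning to its list of neighbours, and `choose`
    picks the next topic from the current frontier (e.g. a topic choice
    strategy). Returns the order in which meanings were explored.
    """
    known = {start}
    frontier = set(graph[start])  # the adjacent possible of `start`
    order = [start]
    for _ in range(n_steps):
        if not frontier:
            break
        m = choose(frontier)      # pick a topic from the frontier
        frontier.remove(m)        # explored: leaves the adjacent possible
        known.add(m)
        order.append(m)
        # newly reachable neighbours join the adjacent possible
        frontier |= set(graph[m]) - known
    return order
```

Plugging in different `choose` strategies reproduces the contrast discussed above: a choice rule informed by past interactions can pace exploration, whereas a uniform random rule rushes through the frontier before any agreement is established.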

Figure 11. Comparison of Random and Active Topic Choice on a structured meaning space. The space used is a balanced tree, with the initially accessible meaning being the root of the tree. On the left, evolution of global complexity (number of distinct word-meaning associations present in the population): Active Topic Choice helps keep a low complexity, with quasi-linear growth, whereas Random Topic Choice first reaches a maximum far higher than the final expected value. Parameters used: 10 agents, 100 meanings, 100 words. On the right, illustration of the status of a population in both cases, after half of the interactions needed to converge to global agreement. Nodes represent meanings, their size the number of agents having at least a word for them, and their color the level of agreement between all agents of the population for the given meaning. We can see that the Active Topic Choice population has not talked about all meanings, but agrees on all the ones that were used; whereas in the other case all meanings were used but almost no agreement was reached.
Interactive application for collaborative creation of a language: Experimenting how humans actively negotiate new linguistic conventions

How do humans agree on and negotiate linguistic conventions? This question is at the root of the domain of experimental semiotics [118], which is the context of our experiment/application. Typically, the experiments of this field consist of having human subjects play a game where they have to learn how to interact/collaborate through a new, unknown communication medium (such as abstract symbols). In recent years, such experiments have made it possible to observe how new conventions can form and evolve in populations of individuals, shedding light on the origins and evolution of languages [133], [116].

We consider a version of the Naming Game [177], [140], focusing on the influence of active learning/teaching mechanisms on the global dynamics. In particular, agreement is reached sooner when agents actively choose the topic of each interaction [49], [60], [61].

Through this experiment, we compare existing topic choice algorithms with actual human behavior. Participants interact through the mediation of a controlled communication system – a web application – by choosing words to refer to objects. Similar experiments have been conducted in previous work to study the agreement dynamics on a name for a single picture [106]. Here, we make several pictures or interaction topics available, and quantify the extent to which participants actively choose topics in their interactions.

  • Individual short experiment (implemented): each user interacts for about 3-4 min (<30 interactions) with a brand new population of 7 simulated agents. They take the role of one designated agent, and play the Naming Game as this agent. Each time they interact as speakers, they can select the topics of conversation from a set of 5 objects, and are offered 6 possible words to refer to them. Their choices influence the global emergence of a common lexical convention, reached when communications are successful. The goal is to maximize a score based on the number of successful interactions (among the 50 in total for each run). They can see a list of the past interactions, with the chosen topic, the chosen word, and whether the interaction was successful or not. This experiment allows us to directly measure whether there is a bias in the choice of topics, compared to random choice, based on memory of past interactions. Performance can then be compared to existing topic choice algorithms [49], [60], [61].

  • Collective creation of a language and conceptual exploration (under development): users interact with agents picked from a population which is kept for the whole duration of the experiment, common to all users. Meanings that can be used as topics are drawn from a bigger space than in the first experiment. The word space is a combination of a few basic available syllables (to avoid direct usage of known words). Users interact with a slowly increasing subset of this population, so that newcomers have the same level of influence within their own part of the experiment as people who interacted at the beginning of the day. Successfully communicating about certain meanings/objects unlocks new available meanings, and therefore we can observe the whole process of collective conceptual exploration. Linguistic conventions are set and learned/shared by users, through the interaction with simulated agents. Users never interact directly with each other, therefore no synchronization is needed. In other words, if one user decides not to finish the current interaction, it will not affect other users. We can measure in this scenario statistical properties of the language, such as frequency distributions, rate of exploration, as well as degree of convergence.

Figure 12. Example of a game with the existing interface. Play it here: http://naming-game.bordeaux.inria.fr

The experiment – available at http://naming-game.bordeaux.inria.fr – was presented at the Kreyon Conference in Rome, in September 2017, during a talk and as part of an interactive installation consisting of numerous scientific experiments. Too little data was collected to obtain significant results; to recruit more players and collect a larger amount of data, we plan to use crowdsourcing platforms.