The Flowers project-team, at Inria, University of Bordeaux and Ensta ParisTech, studies models of open-ended development and learning. These models are used as tools to help us understand better how children learn, as well as to build machines that learn like children, i.e. developmental artificial intelligence, with applications in educational technologies, automated discovery, robotics and human-computer interaction.
A major scientific challenge in artificial intelligence and cognitive sciences is to understand how humans and machines can efficiently acquire world models, as well as open and cumulative repertoires of skills over an extended time span. Processes of sensorimotor, cognitive and social development are organized along ordered phases of increasing complexity, and result from the complex interaction between the brain/body with its physical and social environment.
To advance the fundamental understanding of mechanisms of development, the FLOWERS team develops computational models that leverage advanced machine learning techniques such as intrinsically motivated deep reinforcement learning, in strong collaboration with developmental psychology and neuroscience. In particular, the team focuses on models of intrinsically motivated learning and exploration (also called curiosity-driven learning), with mechanisms enabling agents to learn to represent and generate their own goals, self-organizing a learning curriculum for efficient learning of world models and skill repertoire under limited resources of time, energy and compute. The team also studies how autonomous learning mechanisms can enable humans and machines to acquire grounded language skills, using neuro-symbolic architectures for learning structured representations and handling systematic compositionality and generalization.
Beyond leading to new theories and new experimental paradigms to understand human development in cognitive science, as well as new fundamental approaches to developmental machine learning, the team explores how such models can find applications in robotics, human-computer interaction, multi-agent systems, automated discovery and educational technologies. In robotics, the team studies how artificial curiosity combined with imitation learning can provide essential building blocks allowing robots to acquire multiple tasks through natural interaction with naïve human users, for example in the context of assistive robotics. The team also studies how models of curiosity-driven learning can be transposed in algorithms for intelligent tutoring systems, allowing educational software to incrementally and dynamically adapt to the particularities of each human learner, and proposing personalized sequences of teaching activities.
The work of FLOWERS is organized around the following axes:
Research in artificial intelligence, machine learning and pattern recognition has produced a tremendous amount of results and concepts in the last decades. A blooming number of learning paradigms - supervised, unsupervised, reinforcement, active, associative, symbolic, connectionist, situated, hybrid, distributed learning... - nourished the elaboration of highly sophisticated algorithms for tasks such as visual object recognition, speech recognition, robot walking, grasping or navigation, the prediction of stock prices, the evaluation of risk for insurances, adaptive data routing on the internet, etc... Yet, we are still very far from being able to build machines capable of adapting to the physical and social environment with the flexibility, robustness, and versatility of a one-year-old human child.
Indeed, one striking characteristic of human children is the nearly open-ended diversity of the skills they learn. They can not only improve existing skills, but also continuously learn new ones. While evolution certainly provided them with specific pre-wiring for certain activities such as feeding or visual object tracking, evidence shows that there are also numerous skills that they learn smoothly but that could not be “anticipated” by biological evolution, for example learning to ride a tricycle, using an electronic piano toy or using a video game joystick. By contrast, existing learning machines, and robots in particular, are typically only able to learn a single pre-specified task or a single kind of skill. Once this task is learnt, for example walking with two legs, learning is over. If one wants the robot to learn a second task, for example grasping objects in its visual field, then an engineer needs to manually re-program its learning structures: traditional approaches to task-specific machine/robot learning typically include engineering choices of the relevant sensorimotor channels, specific design of the reward function, choices about when learning begins and ends, and what learning algorithms and associated parameters shall be optimized.
As can be seen, this requires many important choices from the engineer, and one could hardly call this “autonomous” learning. By contrast, human children do not learn following anything like that process, at least during their very first years. Babies develop and explore the world by themselves, focusing their interest on various activities driven both by internal motives and by social guidance from adults who only have a folk understanding of their brains. Adults provide learning opportunities and scaffolding, but eventually young babies always decide for themselves what activity to practice or not. Specific tasks are rarely imposed on them. Yet, they steadily discover and learn how to use their body as well as its relationships with the physical and social environment. Also, the spectrum of skills that they learn continuously expands in an organized manner: they undergo a developmental trajectory in which simple skills are learnt first, and skills of progressively increasing complexity are subsequently learnt.
A link can be made to educational systems, where research in several domains has tried to study how to provide a good learning or training experience to learners. This includes the experiences that allow better learning, and the sequence in which they must be experienced. This problem is complementary to that of the learner who tries to progress efficiently: here the teacher has to use the limited time and motivational resources of the learner as efficiently as possible. Several results from psychology 75 and neuroscience 94 have argued that the human brain feels intrinsic pleasure in practicing activities of optimal difficulty or challenge. A teacher must exploit such activities to create positive psychological states of flow 86, fostering the individual's engagement in learning activities. Such a view is also relevant for reeducation issues, where inter-individual variability, and thus the personalization of interventions, are challenges of the same magnitude as those for the education of children.
A grand challenge is thus to be able to build machines that possess this capability to discover, adapt and develop continuously new know-how and new knowledge in unknown and changing environments, like human children. In 1950, Turing wrote that the child's brain would show us the way to intelligence: “Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's” 152. Maybe, in contrast to work in the field of Artificial Intelligence, which has focused on mechanisms trying to match the capabilities of “intelligent” human adults such as chess playing or natural language dialogue 100, it is time to take the advice of Turing seriously. This is what a new field, called developmental (or epigenetic) robotics, is trying to achieve 116, 156. The approach of developmental robotics consists in importing and implementing concepts and mechanisms from developmental psychology 121, cognitive linguistics 85, and developmental cognitive neuroscience 104, where there has been a considerable amount of research and theories to understand and explain how children learn and develop. A number of general principles underlie this research agenda: embodiment 78, 134, grounding 98, situatedness 148, self-organization 150, 129, enaction 154, and incremental learning 81.
Among the many issues and challenges of developmental robotics, two are of paramount importance: exploration mechanisms, and mechanisms for abstracting and making sense of initially unknown sensorimotor channels. Indeed, the typical space of sensorimotor skills that can be encountered and learnt by a developmental robot, like those encountered by human infants, is immensely vast and inhomogeneous. With a sufficiently rich environment and multimodal set of sensors and effectors, the space of possible sensorimotor activities is simply too large to be explored exhaustively in any robot's lifetime: it is impossible to learn all possible skills and represent all conceivable sensory percepts. Moreover, some skills are very basic to learn, others are very complicated, and many of them require the mastery of others in order to be learnt. For example, learning to manipulate a piano toy first requires knowing how to move one's hand to reach the piano and how to touch specific parts of the toy with the fingers. And knowing how to move the hand might require knowing how to track it visually.
Exploring such a space of skills randomly is bound to fail or to result at best in very inefficient learning 130. Thus, exploration needs to be organized and guided. The approach of epigenetic robotics is to take inspiration from the mechanisms that allow human infants to be progressively guided, i.e. to develop. There are two broad classes of guiding mechanisms which control exploration:
In infant development, one observes a progressive increase in the complexity of activities, with an associated progressive increase in capabilities 121. Children do not learn everything at one time: for example, they first learn to roll over, then to crawl and sit, and only when these skills are operational do they begin to learn how to stand. The perceptual system also develops gradually, increasing children's perceptual capabilities over time while they engage in activities like throwing or manipulating objects. This makes it possible to learn to identify objects in more and more complex situations and to learn more and more of their physical characteristics.
Development is therefore progressive and incremental, and this might be a crucial feature explaining the efficiency with which children explore and learn so fast. Taking inspiration from these observations, some roboticists and researchers in machine learning have argued that learning a given task could be made much easier for a robot if it followed a developmental sequence and “started simple” 68, 89. However, in these experiments, the developmental sequence was crafted by hand: roboticists manually built simpler versions of a complex task and put the robot successively in versions of the task of increasing complexity. And when they wanted the robot to learn a new task, they had to design a novel reward function.
Thus, there is a need for mechanisms that allow the autonomous control and generation of the developmental trajectory. Psychologists have proposed that intrinsic motivations play a crucial role. Intrinsic motivations are mechanisms that push humans to explore activities or situations that have intermediate/optimal levels of novelty, cognitive dissonance, or challenge 75, 86, 88. Further, the study of the critical role of intrinsic motivation as a lever of cognitive development, for everyone and at all ages, has now expanded to several fields of research: some close to its original area of study, such as special education or cognitive aging, and others farther away, such as neuropsychological clinical research. The role and structure of intrinsic motivation in humans have been made more precise thanks to recent discoveries in neuroscience showing the involvement of dopaminergic circuits in exploration behaviours and curiosity 87, 101, 145. Based on this, a number of researchers have begun in the past few years to build computational implementations of intrinsic motivation 130, 132, 143, 72, 102, 118, 144. While initial models were developed for simple simulated worlds, a current challenge is to build intrinsic motivation systems that can efficiently drive exploratory behaviour in high-dimensional unprepared real-world robotic sensorimotor spaces 132, 130, 133, 142. Specific and complex problems are posed by real sensorimotor spaces, in particular because they are both high-dimensional and (usually) deeply inhomogeneous. As an example of the latter issue, some regions of real sensorimotor spaces are often unlearnable due to inherent stochasticity or difficulty. In that case, heuristics based on the incentive to explore zones of maximal unpredictability or uncertainty, which are often used in the field of active learning 84, 99, typically lead to catastrophic results.
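The contrast between the two heuristics can be made concrete with a toy sketch. The error histories below are made-up numbers, and `learning_progress` is an illustrative helper rather than a function from any of the cited systems; it assumes progress is measured as the drop in mean prediction error between two consecutive windows.

```python
def learning_progress(errors, window=5):
    """Drop in mean prediction error between the two most recent windows."""
    if len(errors) < 2 * window:
        return 0.0
    older = sum(errors[-2 * window:-window]) / window
    recent = sum(errors[-window:]) / window
    return older - recent

# Hypothetical prediction-error histories (toy values, not measurements).
histories = {
    # Error shrinks with practice: the region is learnable.
    "learnable": [0.8 ** t for t in range(10)],
    # Error stays high with no trend: the region is dominated by noise.
    "unlearnable": [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, 0.9, 1.1, 1.05, 0.95],
}

# Uncertainty-driven heuristic: explore where the latest error is largest.
by_uncertainty = max(histories, key=lambda r: histories[r][-1])

# Progress-driven heuristic: explore where the error is decreasing fastest.
by_progress = max(histories, key=lambda r: learning_progress(histories[r]))

print(by_uncertainty, by_progress)
```

In this toy setting the uncertainty-driven rule keeps returning to the noisy region, where nothing can be learnt, while the progress-driven rule selects the region where practice still pays off.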
The issue of high dimensionality does not only concern motor spaces, but also sensory spaces, leading to the problem of correctly identifying, among typically thousands of quantities, those latent variables that have links to behavioral choices. In FLOWERS, we aim at developing intrinsically motivated exploration mechanisms that scale in those spaces, by studying suitable abstraction processes in conjunction with exploration strategies.
Social guidance is as important as intrinsic motivation in the cognitive development of human babies 121. There is a vast literature on learning by demonstration in robots, where the actions of humans in the environment are recognized and transferred to robots 67. Most such approaches are completely passive: the human executes actions and the robot learns from the acquired data. Recently, the notion of interactive learning has been introduced in 151, 77, motivated by the various mechanisms that allow humans to socially guide a robot 139. In an interactive context, the steps of self-exploration and social guidance are not separated: a robot learns by self-exploration and by receiving extra feedback from the social context 151, 109, 119.
Social guidance is also particularly important for learning to segment and categorize the perceptual space. Indeed, parents interact a lot with infants, for example teaching them to recognize and name objects or characteristics of these objects. Their role is particularly important in directing the infant's attention towards objects of interest, which makes it possible to simplify the perceptual space at first by pointing out a segment of the environment that can be isolated, named and acted upon. These interactions are then complemented by the children's own experiments on the objects, chosen according to intrinsic motivation, in order to improve their knowledge of the object, its physical properties and the actions that could be performed with it.
In FLOWERS, we aim to include intrinsic motivation systems in the self-exploration part, thus combining efficient self-learning with social guidance 126, 127. We also work on developing perceptual capabilities by gradually segmenting the perceptual space and identifying objects and their characteristics through interaction with the user 117 and robot experiments 103. Another challenge is to allow for more flexible interaction protocols with the user in terms of what type of feedback is provided and how it is provided 114.
Exploration mechanisms are combined with research in the following directions:
FLOWERS develops machine learning algorithms that can allow embodied machines to acquire cumulatively sensorimotor skills. In particular, we develop optimization and reinforcement learning systems which allow robots to discover and learn dictionaries of motor primitives, and then combine them to form higher-level sensorimotor skills.
In order to harness the complexity of perceptual and motor spaces, as well as to pave the way to higher-level cognitive skills, developmental learning requires abstraction mechanisms that can infer structural information out of sets of sensorimotor channels whose semantics is unknown, discovering for example the topology of the body or the sensorimotor contingencies (proprioceptive, visual and acoustic). This process is meant to be open-ended, progressing in continuous operation from initially simple representations towards abstract concepts and categories similar to those used by humans. Our work focuses on the study of various techniques for:
FLOWERS studies how adequate morphologies and materials (i.e. morphological computation), associated with relevant dynamical motor primitives, can greatly simplify the acquisition of apparently very complex skills such as full-body dynamic walking in bipeds. FLOWERS also studies maturational constraints, which are mechanisms that allow for the progressive and controlled release of new degrees of freedom in the sensorimotor space of robots.
FLOWERS studies mechanisms that allow a robot to infer structural information out of sets of sensorimotor channels whose semantics is unknown, for example the topology of the body and the sensorimotor contingencies (proprioceptive, visual and acoustic). This process is meant to be open-ended, progressing in continuous operation from initially simple representations to abstract concepts and categories similar to those used by humans.
FLOWERS studies how populations of interacting learning agents can collectively acquire cooperative or competitive strategies in challenging simulated environments. This differs from "Social learning and guidance" presented above: instead of studying how a learning agent can benefit from the interaction with a skilled agent, we rather consider here how social behavior can spontaneously emerge from a population of interacting learning agents. We focus on studying and modeling the emergence of cooperation, communication and cultural innovation based on theories in behavioral ecology and language evolution, using recent advances in multi-agent reinforcement learning.
Over the past decade, progress in the field of curiosity-driven learning has generated a lot of hope, especially with regard to a major challenge: the inter-individual variability of developmental learning trajectories, which is particularly critical during childhood and aging or in conditions of cognitive disorders. With the societal purpose of tackling social inequalities, FLOWERS aims to move this new research avenue forward by exploring how states of curiosity change across the lifespan and across neurodevelopmental conditions (neurotypical vs. learning disabilities), while designing new educational or rehabilitative technologies for curiosity-driven learning. Information gaps or learning progress, and the awareness of them, are the core mechanisms of this part of the research program because of their high value as “brain fuel”: they maintain the individual's intrinsic motivation and lead him/her to pursue his/her cognitive efforts for acquisition or rehabilitation. Accordingly, a main challenge is to understand these mechanisms in order to design supports for curiosity-driven learning, and then to embed them into (re)educational technologies. To this end, two lines of investigation are carried out in real-life settings (school, home, workplace, etc.): 1) the design of curiosity-driven interactive systems for learning and the study of their effectiveness; and 2) the automated personalization of learning programs through new algorithms maximizing learning progress in intelligent tutoring systems (ITS).
Neuroscience, Developmental Psychology and Cognitive Sciences. The computational modelling of life-long learning and development mechanisms carried out in the team aims centrally to contribute to our understanding of the processes of sensorimotor, cognitive and social development in humans. In particular, it provides a methodological basis to analyze the dynamics of the interaction across learning and inference processes, embodiment and the social environment, making it possible to formalize precise hypotheses and later test them in experimental paradigms with animals and humans. A paradigmatic example of this activity is the Neurocuriosity project, carried out in collaboration with the cognitive neuroscience lab of Jacqueline Gottlieb, where theoretical models of the mechanisms of information seeking, active learning and spontaneous exploration have been developed in coordination with experimental evidence and investigation, see https://
Personal and lifelong learning assistive agents. Many indicators show that the arrival of personal assistive agents in everyday life, ranging from digital assistants to robots, will be a major fact of the 21st century. These agents will range from purely entertainment or educational applications to social companions that many argue will be of crucial help in our society. Yet, to realize this vision, important obstacles need to be overcome: these agents will have to evolve in unpredictable environments and learn new skills in a lifelong manner while interacting with non-engineer humans, which is out of reach of current technology. In this context, the refoundation of intelligent systems that developmental AI is exploring opens potentially novel horizons to solve these problems. In particular, this application domain requires advances in artificial intelligence that go beyond the current state of the art in fields like deep learning. Currently these techniques require tremendous amounts of data in order to function properly, and they are severely limited in terms of incremental and transfer learning. One of our goals is to drastically reduce the amount of data required for this very potent field to work when humans are in the loop. We try to achieve this by making neural networks aware of their knowledge, i.e. we introduce the concept of uncertainty and use it as part of intrinsically motivated multitask learning architectures, combined with techniques of learning by imitation.
Educational technologies that foster curiosity-driven and personalized learning. Optimal teaching and efficient teaching/learning environments can be applied to aid teaching in schools, aiming both to increase achievement levels and to reduce the time needed. From a practical perspective, improved models could save millions of hours of students' time (and effort) in learning. These models should also predict the achievement levels of students in order to influence teaching practices. The challenges of the school of the 21st century, and in particular producing conditions for active learning that are personalized to the student's motivations, are shared with other applied fields. Special education for children with special needs, such as learning disabilities, has long recognized the difficulty of personalizing contents and pedagogies due to the great variability between and within medical conditions. In a more distant but related field, cognitive rehabilitation carers face the same challenges: today they propose standardized cognitive training or rehabilitation programs whose benefits are modest (some individuals respond to the programs, others respond little or not at all), as they are highly subject to inter- and intra-individual variability. Curiosity-driven technologies for learning and intelligent tutoring systems (ITS) could be a promising avenue to address these issues, which are common to (mainstream and specialized) education and cognitive rehabilitation.
Automated discovery in science. Machine learning algorithms integrating intrinsically-motivated goal exploration processes (IMGEPs) with flexible modular representation learning are very promising directions to help human scientists discover novel structures in complex dynamical systems, in fields ranging from biology to physics. The automated discovery project led by the FLOWERS team aims to boost the efficiency of these algorithms to enable scientists to better understand the space of dynamics of bio-physical systems, which could include systems related to the design of new materials or new drugs, with applications ranging from regenerative medicine to unraveling the chemical origins of life. As an example, Grizou et al. 96 recently showed how IMGEPs can be used to automate chemistry experiments addressing fundamental questions related to the origins of life (how oil droplets may self-organize into protocellular structures), leading to new insights about oil droplet chemistry. Such methods can be applied to a large range of complex systems in order to map the possible self-organized structures. The automated discovery project is intended to be interdisciplinary and to involve potentially non-expert end-users from a variety of domains. In this regard, we are currently collaborating with Poietis (a bio-printing company) and Bert Chan (an independent researcher in artificial life) to deploy our algorithms. To encourage the adoption of our algorithms by a wider community, we are also working on interactive software which aims to provide tools to easily use the automated exploration algorithms (e.g. curiosity-driven) in various systems.
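As a rough illustration of the goal exploration loop, the sketch below runs a simple population-based IMGEP on a stand-in system. Everything specific here is an assumption made for the example: the logistic map replaces a real bio-physical system, the two-dimensional outcome descriptor is hand-picked rather than learnt, and the function names are hypothetical.

```python
import math
import random

def logistic_outcome(r, x0, steps=200):
    """Iterate x <- r*x*(1-x) and describe the last 50 states by (mean, spread)."""
    x, tail = x0, []
    for t in range(steps):
        x = r * x * (1.0 - x)
        if t >= steps - 50:
            tail.append(x)
    return (sum(tail) / len(tail), max(tail) - min(tail))

def clip(value, low, high):
    return min(high, max(low, value))

def imgep(n_bootstrap=20, n_explore=200, sigma=0.05, seed=0):
    rng = random.Random(seed)
    memory = []  # list of (parameters, observed outcome)
    # Bootstrap phase: random parameters.
    for _ in range(n_bootstrap):
        theta = (rng.uniform(2.5, 4.0), rng.uniform(0.1, 0.9))
        memory.append((theta, logistic_outcome(*theta)))
    # Exploration phase: self-generated goals in outcome space.
    for _ in range(n_explore):
        goal = (rng.random(), rng.random())
        # Reuse the parameters whose outcome was closest to the goal...
        theta, _ = min(memory, key=lambda entry: math.dist(entry[1], goal))
        # ...and perturb them before running a new experiment.
        r = clip(theta[0] + rng.gauss(0.0, sigma), 2.5, 4.0)
        x0 = clip(theta[1] + rng.gauss(0.0, sigma), 0.1, 0.9)
        memory.append(((r, x0), logistic_outcome(r, x0)))
    return memory

memory = imgep()
print(len(memory), "experiments stored")
```

The stored outcomes form a map of the behaviours found so far. In a real application the outcome descriptor and the goal sampling strategy are the critical design choices, and the uniform goal sampling used here is only the simplest option.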
Human-Robot Collaboration. Robots play a vital role in industry and ensure the efficient and competitive production of a wide range of goods. They replace humans in many tasks which would otherwise be too difficult, too dangerous, or too expensive to perform. However, the new needs and desires of society call for manufacturing systems centered around personalized products and small-series production. Human-robot collaboration could widen the use of robots in these new situations if robots become cheaper, easier to program and safe to interact with. The most relevant systems for such applications would follow an expert worker and work with (some) autonomy, while always remaining under the supervision of the human and acting based on their task models.
Environment perception in intelligent vehicles. When working in simulated traffic environments, elements of FLOWERS research can be applied to the autonomous acquisition of increasingly abstract representations of both traffic objects and traffic scenes. In particular, the object classes of vehicles and pedestrians are of interest when considering detection tasks in safety systems, as well as scene categories (“scene context”) that have a strong impact on the occurrence of these object classes. As already indicated by several investigations in the field, results from present-day simulation technology can be transferred to the real world with little impact on performance. Therefore, applications of FLOWERS research that are suitably verified by real-world benchmarks have direct applicability in safety-system products for intelligent vehicles.
AI is a field of research that currently requires a lot of computational resources, which is a challenge as these resources have an environmental cost. In the team we try to address this challenge in two ways:
Our research activities are organized along two fundamental research axes (models of human learning and algorithms for developmental machine learning) and one application research axis (involving multiple domains of application, see the Application Domains section). This entails different dimensions of potential societal impact:
The team made major progress in developing the new application domain of automated discovery in the sciences. We formalized this new research area, and introduced proof-of-concept results showing how intrinsically motivated goal exploration algorithms can be used as a tool to explore, map and learn to represent a diversity of self-organized patterns in complex dynamical systems. This opens stimulating perspectives in domains ranging from biology to chemistry and physics.
This work was presented in two papers accepted for oral presentation (< 1.5 % acceptance rate) at the Neurips and ICLR 2020 conferences. The work was carried out by Mayalen Etcheverry (CIFRE PhD with the Poïetis company, for ICLR and Neurips) and Chris Reinke (Postdoc, for ICLR), and co-supervised by C. Moulin-Frier (Neurips) and PY Oudeyer (ICLR and Neurips). See 38 (ICLR paper) and 35, as well as the blog post https://
The team made major advances in developmental machine learning, introducing techniques enabling autonomous agents to use language as a cognitive tool to imagine goals in intrinsically motivated exploration. This enables new forms of creative exploration, where agents can imagine goals that are outside the distribution of goals known so far. This approach, conceptually rooted in developmental psychology ideas from Vygotsky, also leverages modular deep learning techniques enabling agents to generalize their understanding to new sentences. This work was published at Neurips 2020 43. First authors were Cédric Colas, Tristan Karch and Nicolas Lair, with co-supervision from PY Oudeyer, PF. Dominey and C. Moulin-Frier.
Together with the edTech industrial consortium Adaptiv'Maths (https://
PY Oudeyer was awarded an individual ANR Chair in Artificial Intelligence, and elected as Distinguished Speaker of the IEEE Computational Intelligence Society. C Moulin-Frier obtained an ANR JCJC grant, an Inria Exploratory Action and an Inria Cordi PhD grant in 2020 (see 10 for details).
An important challenge in developmental robotics is how robots can be intrinsically motivated to efficiently learn parametrized policies that solve parametrized multi-task reinforcement learning problems, i.e. learn the mappings between actions and the problems they solve, or the sensory effects they produce. This can be a robot learning how arm movements make physical objects move, or how movements of a virtual vocal tract modulate vocalization sounds. The way the robot collects its own sensorimotor experience has a strong impact on learning efficiency because, for most robotic systems, the involved spaces are high-dimensional, the mapping between them is non-linear and redundant, and there is limited time allowed for learning. If robots explore the world in an unorganized manner, e.g. randomly, learning algorithms will often be ineffective because only very sparse data points will be collected. Data are precious due to the high dimensionality and the limited time, yet not all data are equally useful due to non-linearity and redundancy. This is why learning has to be guided using efficient exploration strategies, allowing the robot to actively drive its own interaction with the environment in order to gather maximally informative data to optimize the parametrized policies. In recent years, work in developmental learning has explored various families of algorithmic principles which allow the efficient guiding of learning and exploration.
Explauto is a framework developed to study, model and simulate curiosity-driven learning and exploration in real and simulated robotic agents. Explauto’s scientific roots trace back to the Intelligent Adaptive Curiosity algorithmic architecture 131, which has been extended to a more general family of autonomous exploration architectures by 70 and recently expressed as a compact and unified formalism 123. The library is detailed in 125. In Explauto, interest models implement the strategies of active selection of particular problems/goals in a parametrized multi-task reinforcement learning setup to efficiently learn parametrized policies. The agent can have different available strategies, parametrized problems, models, sources of information, or learning mechanisms (for instance imitating by mimicking vs by emulation, or asking one teacher or another for help), and chooses between them in order to optimize learning (a process called strategic learning 128). Given a set of parametrized problems, a particular exploration strategy is to randomly draw goals/RL problems to solve in the motor or problem space. More efficient strategies are based on the active choice of learning experiments that maximize learning progress using bandit algorithms, e.g. maximizing improvement of predictions or of competences to solve RL problems 131. This automatically drives the system to explore and learn easy skills first, and then explore skills of progressively increasing complexity. Both random and learning progress strategies can act either on the motor or on the problem space, resulting in motor babbling or goal babbling strategies.
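The way a learning-progress strategy self-organizes a curriculum can be sketched in a few lines. This is a simplified stand-in, not Explauto's code: the three regions, their `DIFFICULTY` values, the toy `competence` curve and the epsilon-greedy rule are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical regions of a goal space; the value sets how much practice
# is needed (None marks a region where practice never helps).
DIFFICULTY = {"easy": 5.0, "hard": 50.0, "unlearnable": None}

def competence(region, n_trials):
    """Toy learner: competence saturates with practice, faster in easy regions."""
    d = DIFFICULTY[region]
    return 0.0 if d is None else 1.0 - math.exp(-n_trials / d)

def run(steps=200, window=5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    history = {r: [] for r in DIFFICULTY}  # competence after each trial
    order = []                             # sequence of chosen regions

    def progress(region):
        h = history[region]
        # Optimistic start: regions with too little data look maximally interesting.
        return math.inf if len(h) <= window else h[-1] - h[-1 - window]

    for _ in range(steps):
        if rng.random() < epsilon:
            region = rng.choice(list(DIFFICULTY))   # occasional random goal
        else:
            region = max(DIFFICULTY, key=progress)  # highest learning progress
        history[region].append(competence(region, len(history[region]) + 1))
        order.append(region)
    return order

order = run()
print({r: order.count(r) for r in DIFFICULTY})
```

After a short warm-up in every region, the greedy choice goes to the easy region while it improves quickly, then shifts to the hard region once the easy one saturates. The unlearnable region shows no progress, so it is only visited during warm-up and random draws.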
This library provides a high-level API for the easy definition of:
The library comes with several built-in environments. Two of them correspond to simulated environments: a multi-DoF arm acting on a 2D plane, and an under-actuated torque-controlled pendulum. The third one allows controlling real robots based on Dynamixel actuators using the Pypot library. Learning parametrized policies involves machine learning algorithms, which are typically regression algorithms to learn forward models, from motor controllers to sensory effects, and optimization algorithms to learn inverse models, from sensory effects, or problems, to the motor programs that reach them. We call these sensorimotor learning algorithms sensorimotor models. The library comes with several built-in sensorimotor models: simple nearest-neighbor look-up, non-parametric models combining classical regressions and optimization algorithms, online mixtures of Gaussians, and discrete Lidstone distributions. Explauto sensorimotor models are online learning algorithms, i.e. they are trained iteratively during the interaction of the robot with the environment in which it evolves. Explauto also provides a unified interface to define exploration strategies using the InterestModel class. The library comes with two built-in interest models: random sampling, and sampling that maximizes the learning progress in forward or inverse predictions.
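As an illustration of what an online sensorimotor model does, the sketch below implements the simplest case mentioned above, nearest-neighbor look-up, together with random motor babbling on a toy arm. It does not use or reproduce the Explauto API: `NearestNeighborModel`, `planar_arm` and their methods are hypothetical names introduced for this example.

```python
import math
import random


class NearestNeighborModel:
    """Hypothetical online sensorimotor model storing (motor, sensory) pairs."""

    def __init__(self):
        self.motors = []
        self.effects = []

    def update(self, m, s):
        """Online learning step: store one observed experiment."""
        self.motors.append(tuple(m))
        self.effects.append(tuple(s))

    @staticmethod
    def _nearest(points, query):
        return min(range(len(points)), key=lambda i: math.dist(points[i], query))

    def forward(self, m):
        """Forward model: predict the sensory effect of motor command m."""
        return self.effects[self._nearest(self.motors, m)]

    def inverse(self, s_goal):
        """Inverse model: stored motor command whose effect is closest to s_goal."""
        return self.motors[self._nearest(self.effects, s_goal)]


def planar_arm(m):
    """Toy 2-joint arm with unit-length links: joint angles -> end-effector position."""
    return (math.cos(m[0]) + math.cos(m[0] + m[1]),
            math.sin(m[0]) + math.sin(m[0] + m[1]))


rng = random.Random(0)
model = NearestNeighborModel()
for _ in range(500):  # random motor babbling
    m = (rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi))
    model.update(m, planar_arm(m))

m_star = model.inverse((1.0, 1.0))   # best known command for the goal (1, 1)
reached = planar_arm(m_star)         # effect actually obtained with that command
```

The inverse model here only returns commands that were already tried; the non-parametric models mentioned above would additionally optimize around such a starting point.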
Explauto environments now handle actions that depend on a current context, for instance in an environment where a robotic arm is trying to catch a ball: the arm trajectories will depend on the current position of the ball (the context). Also, if the dynamics of the environment change over time, a new sensorimotor model (Non-Stationary Nearest Neighbor) is able to cope with those changes by giving more weight to recent experiences. These new features are explained in Jupyter notebooks.
This library has been used in many experiments including:
Explauto is cross-platform and has been tested on Linux, Windows and Mac OS. It has been released under the GPLv3 license.
The Poppy Project team develops open-source 3D-printed robot platforms based on robust, flexible, easy-to-use and easy-to-reproduce hardware and software. In particular, the use of 3D printing and rapid prototyping technologies is a central aspect of this project, and makes it easy and fast not only to reproduce the platform, but also to explore morphological variants. Poppy targets three domains of use: science, education and art.
In the Poppy project we are working on the Poppy System, a new modular and open-source robotic architecture. It is designed to help people create and build custom robots. In an approach similar to Lego, it allows building robots or smart objects from standardized elements.
Poppy System is a unified system in which essential robotic components (actuators, sensors...) are independent modules connected with other modules through standardized interfaces:
Our ambition is to create an ecosystem around this system so communities can develop custom modules, following the Poppy System standards, which can be compatible with all other Poppy robots.
Poppy Ergo Jr is an open hardware robot developed by the Poppy Project to explore the use of robots in classrooms for learning robotics and computer science.
It is available as a 6- or 4-degree-of-freedom arm designed to be both expressive and low-cost. This is achieved by the use of FDM 3D printing and low-cost Robotis XL-320 actuators. A Raspberry Pi camera is attached to the robot so it can detect objects, faces or QR codes.
The Ergo Jr is controlled by the Pypot library and runs on a Raspberry Pi 2 or 3 board. Communication between the Raspberry Pi and the actuators is made possible by the Pixl board we have designed.
The Poppy Ergo Jr robot has several 3D-printed tools extending its capabilities. The tools currently available are a lampshade, a gripper and a pen holder.
With the release of a new Raspberry Pi board in early 2016, the Poppy Ergo Jr disk image was updated to support Raspberry Pi 2 and 3 boards. The disk image can be used seamlessly with either board.
Until recently, curiosity-driven exploration algorithms were based on classic learning algorithms, unable to handle high-dimensional problems (see explauto). Recent advances in the field of deep learning offer new algorithms able to handle such situations.
Deep explauto is an experimental library containing reference implementations of curiosity-driven exploration algorithms. Given the experimental nature of exploration algorithms, and the low maturity of the libraries and algorithms using deep learning, proposing black-box implementations of those algorithms that could be used blindly seems unrealistic.
Nevertheless, this library offers a set of objects, functions and examples that make it possible to quickly kickstart new experiments.
Orchestra is a set of tools meant to help in performing experimental campaigns in computer science. It provides you with simple tools to:
+ Organize a manual experimental workflow, leveraging git and lfs through a simple interface.
+ Collaborate with other people on a single experimental campaign.
+ Execute pieces of code on remote hosts such as clusters or clouds, in one line.
+ Automate the execution of batches of experiments and the presentation of the results through a clean web UI.
Many advanced tools exist on the net to handle similar situations. Most of them target very complicated workflows, e.g. DAGs of tasks. Those tools are very powerful but lack the simplicity needed by newcomers. Here, we propose a limited but very simple tool to handle one of the most common situations in experimental campaigns: the repeated execution of an experiment on variations of parameters.
In particular, we include three tools:
+ expegit: a tool to organize your experimental campaign results in a git repository using git-lfs (large file storage).
+ runaway: a tool to execute code on distant hosts parameterized with easy to use file templates.
+ orchestra: a tool to automate the use of the two previous tools on large campaigns.
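The use case targeted here, repeating one experiment over variations of its parameters, can be sketched in a few lines. The snippet below does not use or mimic the command-line interface of expegit, runaway or orchestra; `expand_grid`, `to_command` and the `train.py` script name are hypothetical and only illustrate how a parameter grid expands into a batch of runs.

```python
import itertools


def expand_grid(grid):
    """Yield one parameter dict per combination of values in the grid."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def to_command(script, params):
    """Format one run as a command line (the script name is a placeholder)."""
    args = " ".join(f"--{key}={value}" for key, value in sorted(params.items()))
    return f"python {script} {args}"


# Hypothetical campaign: 2 learning rates x 3 seeds = 6 executions
campaign = [to_command("train.py", p)
            for p in expand_grid({"lr": [0.1, 0.01], "seed": [0, 1, 2]})]
```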
Codebase from our CoRL2019 paper https://arxiv.org/abs/1910.07224
This github repository provides implementations for the following teacher algorithms:
- Absolute Learning Progress-Gaussian Mixture Model (ALP-GMM), our proposed teacher algorithm.
- Robust Intelligent Adaptive Curiosity (RIAC), from Baranes and Oudeyer, R-IAC: robust intrinsically motivated exploration and active learning.
- Covar-GMM, from Moulin-Frier et al., Self-organization of early vocal development in infants and machines: The role of intrinsic motivation.
Codebase from our arxiv paper https://arxiv.org/abs/2011.08463
This github repository provides implementations for AGAIN (Alp-Gmm and Inferred Progress Niches), our proposed Meta automatic curriculum learning teacher algorithm.
This project involves a collaboration between the Flowers team and the Cognitive Neuroscience Lab of J. Gottlieb at Columbia Univ. (NY, US), on the understanding and computational modeling of mechanisms of curiosity, attention and active intrinsically motivated exploration in humans.
It is organized around the study of the hypothesis that subjective meta-cognitive evaluation of information gain (or control gain or learning progress) could generate intrinsic reward in the brain (living or artificial), driving attention and exploration independently from material rewards, and allowing for autonomous lifelong acquisition of open repertoires of skills. The project combines expertise about attention and exploration in the brain and a strong methodological framework for conducting experiments with monkeys, human adults and children, together with computational modeling of curiosity/intrinsic motivation and learning.
Such a collaboration paves the way towards what is now a central strategic objective of the Flowers team: designing and conducting experiments in animals and humans that are informed by computational/mathematical theories of information seeking, and that allow testing the predictions of these theories.
Curiosity can be understood as a family of mechanisms that evolved to allow agents to maximize their knowledge (or their control) of the useful properties of the world - i.e., the regularities that exist in the world - using active, targeted investigations. In other words, we view curiosity as a decision process that maximizes learning/competence progress (rather than minimizing uncertainty) and assigns value ("interest") to competing tasks based on their epistemic qualities - i.e., their estimated potential to allow discovery and learning about the structure of the world.
Because a curiosity-based system acts in conditions of extreme uncertainty (when the distributions of events may be entirely unknown), there is in general no optimal solution to the question of which exploratory action to take 115, 133, 141. Therefore we hypothesize that, rather than using a single optimization process as has been the case in most previous theoretical work 95, curiosity is comprised of a family of mechanisms that include simple heuristics related to novelty/surprise and measures of learning progress over longer time scales 130, 71, 122. These different components are related to the subject's epistemic state (knowledge and beliefs) and may be integrated with fluctuating weights that vary according to the task context. Our aim is to quantitatively characterize this dynamic, multi-dimensional system in a computational framework based on models of intrinsically motivated exploration and learning.
Because of its reliance on epistemic currencies, curiosity is also very likely to be sensitive to individual differences in personality and cognitive functions. Humans show well-documented individual differences in curiosity and exploratory drives 113, 140, and rats show individual variation in learning styles and novelty seeking behaviors 90, but the basis of these differences is not understood. We postulate that an important component of this variation is related to differences in working memory capacity and executive control which, by affecting the encoding and retention of information, will impact the individual's assessment of learning, novelty and surprise and ultimately, the value they place on these factors 135, 149, 66, 153. To start understanding these relationships, about which nothing is known, we will search for correlations between curiosity and measures of working memory and executive control in the population of children we test in our tasks, analyzed from the point of view of computational models of the underlying mechanisms.
A final premise guiding our research is that essential elements of curiosity are shared by humans and non-human primates. Human beings have a superior capacity for abstract reasoning and building causal models, which is a prerequisite for sophisticated forms of curiosity such as scientific research. However, if the task is adequately simplified, essential elements of curiosity are also found in monkeys 113, 107 and, with adequate characterization, this species can become a useful model system for understanding the neurophysiological mechanisms.
Our studies have several highly innovative aspects, both with respect to curiosity and to the traditional research field of each member team.
We provide empirical evidence that humans are sensitive to variations in learning progress (LP), by means of a novel experimental paradigm and computational modeling. We showed that while humans rely on competence information to avoid easy tasks, models that include an LP component provide the best fit to task selection data. These results provide a new bridge between research on artificial and biological curiosity, reveal strategies that are used by humans but have not been considered in computational research, and provide new tools for probing how humans become intrinsically motivated to learn and acquire interests and skills on extended time scales. The results were submitted to the journal Nature Communications and are currently under revision.
Participants (
Our key questions were (1) how people self-organize their exploration over a set of activities of variable difficulty, and (2) whether they spontaneously adopt learning maximization objectives when they do not receive explicit instructions. To examine these questions, we manipulated the difficulty of the available activities as a within-participant variable, and the instructions participants received as an across-participant variable. Difficulty was controlled by the complexity of the categorization rule governing the food preferences in each activity. In the easiest activity (A1), individual monster-family members differed in only one feature and that feature governed their food preference (e.g., a short octopus liked steak and a tall octopus liked broccoli; 1-dimensional categorization). In the next easiest level (A2), family members varied along two features but only one feature determined preference (1-dimensional with an irrelevant feature). In the most difficult learnable activity (A3) food preferences were determined by a conjunction of 2 variable features (2-dimensional categorization). Finally, the 4th activity (A4) was random and unlearnable: individual monsters had two variable features, but their food preferences were assigned randomly each time a new monster was sampled.
Learning objectives were manipulated across two randomly selected participant groups. Participants in the “external goal” group (EG;
As shown in Fig. 5, our difficulty manipulation worked as expected in both groups. Average accuracy on the more difficult activities was lower than on easier activities.
To investigate whether LP played a role in self-determined study curriculum we fit the participants’ activity choices with a simple softmax choice model in which the utility of an activity was a linear combination of PC and LP:
PC and LP were dynamically evaluated for each activity
The bivariate form of the model that included both LP and PC (eq. 1) provided a superior fit to the data (average AIC score of
The fitted bivariate models were successful at qualitatively reproducing time-allocation patterns across learning activities and groups of participants (Fig. 6b). As shown in Fig. 6b, the fitted models captured the behavioral tendencies of individual subgroups in our data.
Next, we examined the fitted model coefficients from the bivariate models. The normalized coefficients
These differences are consistent with computational studies suggesting that sensitivities to PC and LP play distinct roles. A sensitivity to PC may motivate people to learn by steering them away from overly easy activities, while a sensitivity to LP may steer learners away from overly difficult or impossible activities. To examine this hypothesis, we analyzed how the two coefficients correlated with individual preferences for challenging over easier activities when the more challenging activity was, respectively, learnable or unlearnable. Figure 6 shows that while PC helped participants engage with more challenging activities, LP seemed to have this effect only in the context of A3 vs A1, but not A4 vs A3, suggesting that LP does not lead people into the unlearnability trap as PC does.
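The softmax choice model described above, in which the utility of an activity is a linear combination of PC and LP, can be sketched as follows. This is an illustrative reading only: the function name, the weight names `w_pc` and `w_lp`, the `temperature` parameter and the numerical values are assumptions, not the fitted model or the coefficients reported in this study.

```python
import math


def choice_probabilities(pc, lp, w_pc, w_lp, temperature=1.0):
    """Softmax over activities with utility u_i = w_pc * PC_i + w_lp * LP_i."""
    utilities = [w_pc * p + w_lp * l for p, l in zip(pc, lp)]
    u_max = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp((u - u_max) / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up values for activities A1..A4: A1 is mastered, A3 is improving,
# A4 is unlearnable (chance-level PC, no LP).
pc = [0.95, 0.80, 0.60, 0.50]
lp = [0.00, 0.05, 0.20, 0.00]
probs = choice_probabilities(pc, lp, w_pc=-1.0, w_lp=5.0)
```

With a negative weight on PC and a positive weight on LP, the sketch assigns the highest choice probability to the improving activity: the PC term pushes away from the mastered activity, and the LP term pushes away from the unlearnable one.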
Our previous work left open the question of what is an optimal time period for which humans can accurately estimate their progress in learning a skill that requires a prolonged exposure or practice time to be mastered. Thus, this ongoing project aims to investigate veridical time scales for estimating progress in competence in humans. For this, we designed a behavioral sensorimotor task that requires some extended amount of time to be learned. The task is presented in the form of a video game, similar to the arcade game Lunar Lander, where the objective is to control a spacecraft and safely land it on the surface. As participants practice the task, we will record their performance (e.g. time to complete a single trial, success rate etc.), which will allow us to examine the accuracy of their subjective judgments about their performance. The subjective judgments of performance, along with different control measures, will be obtained via verbal questionnaires.
Crucially, we control the time frames in which we solicit the participants' subjective judgments of learning, in order to examine the link between how much time has passed (5 minutes, 2 days, 5 days) and the accuracy of participants' beliefs about their ongoing progress. We designed a tool which can be used to administer this experiment remotely via the internet. Participants will be asked to log in and complete part of the task on 3 different days, each day consisting of the same 3 phases: (1) 5 minutes of practice, (2) a questionnaire, (3) optional practice. The last, optional practice phase can be engaged in by participants but is not required as part of the experiment. This will let us measure behaviorally the level of interest / enthusiasm participants have for playing the game, which is hypothesized to be a function of the learning progress they experience.
Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This project 58 proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent approaches and discusses future avenues.
Finding algorithms that allow agents to discover a wide variety of skills efficiently and autonomously remains a challenge of Artificial Intelligence. Intrinsically Motivated Goal Exploration Processes (IMGEPs) have been shown to enable real world robots to learn repertoires of policies producing a wide range of diverse effects. They work by enabling agents to autonomously sample goals that they then try to achieve. In practice, this strategy leads to an efficient exploration of complex environments with high-dimensional continuous actions. Until recently, it was necessary to provide the agents with an engineered goal space containing relevant features of the environment. In this article we show that the goal space can be learned using deep representation learning algorithms, effectively reducing the burden of designing goal spaces. Our results pave the way to autonomous learning agents that are able to build a representation of the world and use this representation to explore the world efficiently. We present experiments in two environments using population-based IMGEPs. The first experiments are performed on a simple, yet challenging, simulated environment. Then, another set of experiments tests the applicability of those principles on a real-world robotic setup, where a 6-joint robotic arm learns to manipulate a ball inside an arena, by choosing goals in a space learned from its past experience. This work was published in 28.
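The goal-sampling loop of a population-based IMGEP can be sketched as below. This is a simplified illustration under stated assumptions: the goal space is hand-defined as the outcome space of a toy arm (whereas the work described here learns it with deep representation learning), and the names `imgep`, `toy_arm`, `n_bootstrap` and `sigma` are hypothetical.

```python
import math
import random


def imgep(environment, n_params, sample_goal, n_bootstrap=20,
          n_iterations=200, sigma=0.1, seed=0):
    """Population-based goal exploration: sample a goal, reuse and perturb
    the policy whose past outcome was closest to it, then store the result."""
    rng = random.Random(seed)
    population = []  # list of (policy parameters, observed outcome)
    for i in range(n_iterations):
        if i < n_bootstrap:
            # Bootstrap phase: random policy parameters in [-1, 1]
            theta = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        else:
            goal = sample_goal(rng)
            nearest_theta, _ = min(population,
                                   key=lambda entry: math.dist(entry[1], goal))
            # Gaussian perturbation, clipped to the valid parameter range
            theta = [min(1.0, max(-1.0, p + rng.gauss(0.0, sigma)))
                     for p in nearest_theta]
        population.append((theta, environment(theta)))
    return population


def toy_arm(theta):
    """Toy 2-joint arm: normalised joint angles -> end-effector position."""
    a, b = math.pi * theta[0], math.pi * theta[1]
    return (math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b))


population = imgep(toy_arm, n_params=2,
                   sample_goal=lambda rng: (rng.uniform(-2, 2), rng.uniform(-2, 2)))
```

Replacing `sample_goal` with a sampler over a learned latent space, and the distance with a distance in that latent space, gives the overall shape of the approach described above.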
Autonomous agents using novelty-based goal exploration are often efficient in environments that require exploration. However, they get attracted to various forms of distracting unlearnable regions. To solve this problem, Absolute Learning Progress (ALP) has been used in reinforcement learning agents with predefined goal features and access to expert knowledge. This work extends those concepts to unsupervised image-based goal exploration.
We present the GRIMGEP framework: it provides a learned robust goal sampling prior that can be used on top of current state-of-the-art novelty seeking goal exploration approaches, enabling them to ignore noisy distracting regions while searching for novelty in the learnable regions. It clusters the goal space and estimates ALP for each cluster. These ALP estimates can then be used to detect the distracting regions, and build a prior that enables further goal sampling mechanisms to ignore them.
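The two-level sampling idea, a cluster-level ALP prior wrapped around a novelty-seeking sampler, can be sketched as follows. This is a toy illustration and not the GRIMGEP code: clusters are given rather than learned from images, ALP is estimated from two half-windows of a performance history, and all function names are hypothetical.

```python
import random
from statistics import mean


def absolute_learning_progress(performances):
    """|mean of the newer half - mean of the older half| of a performance history."""
    half = len(performances) // 2
    if half == 0:
        return 0.0
    return abs(mean(performances[half:]) - mean(performances[:half]))


def cluster_prior(history):
    """Map each cluster to a sampling probability proportional to its ALP."""
    alp = {c: absolute_learning_progress(p) for c, p in history.items()}
    total = sum(alp.values())
    if total == 0:
        return {c: 1.0 / len(alp) for c in alp}  # no signal yet: uniform prior
    return {c: a / total for c, a in alp.items()}


def sample_goal(goals, goal_clusters, novelty, prior, rng):
    """Pick a cluster from the ALP prior, then the most novel goal inside it.
    Assumes every cluster in the prior contains at least one candidate goal."""
    clusters = list(prior)
    chosen = rng.choices(clusters, weights=[prior[c] for c in clusters])[0]
    candidates = [i for i, c in enumerate(goal_clusters) if c == chosen]
    return goals[max(candidates, key=lambda i: novelty[i])]


# A noisy distractor has fluctuating performance but no progress on average
history = {"learnable": [0.1, 0.2, 0.6, 0.7], "distractor": [0.5, 0.1, 0.5, 0.1]}
prior = cluster_prior(history)
goal = sample_goal(goals=["g0", "g1", "g2"],
                   goal_clusters=["learnable", "learnable", "distractor"],
                   novelty=[0.2, 0.4, 0.9],
                   prior=prior, rng=random.Random(0))
```

In this toy case the distractor goal has the highest novelty score, yet it is never sampled because its cluster has zero ALP.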
We construct an image-based environment with distractors, on which we show that wrapping current state-of-the-art goal exploration algorithms with our framework allows them to concentrate on interesting regions of the environment and drastically improve performance.
This work is available as a preprint in 60 and the source code is available at https://
In this project, we investigate how autonomous multi-goal reinforcement learning agents can use language as a cognitive tool in order to creatively explore their environment and grow repertoires of skills. We follow a developmental approach inspired by how children learn to manipulate language, using it as a way to represent goals and to make plans in their heads.
We develop an algorithm called IMAGINE 43 enabling an intrinsically motivated agent to build a repertoire of skills only from natural language descriptions given by a Social Partner. In our setup, the agent starts without knowing any potential goal and acts randomly. As it reaches outcomes that are meaningful for the social partner, the social partner provides descriptions of the scene in natural language. The agent then converts these natural descriptions into targetable goals and learns to reach them.
This new learning algorithm offers several benefits over previous intrinsically motivated multi-goal reinforcement learning agents that do not use language to describe goals.
First, using linguistic descriptions as sole supervision removes the need to define hand-crafted reward functions for each of the reachable goals in the environment. In CURIOUS, for instance, the agent needed to have access to the description of each of the goal types as well as their associated reward functions in order to reach them. In IMAGINE, the agent builds its own internal reward function mapping natural language descriptions to binary rewards and uses this signal to train a goal-conditioned policy.
Second, using language to represent goals enables the agent to leverage language compositionality so as to imagine new goals, assembling pieces of descriptions communicated by the social partner in order to form new targetable goals. For instance, consider an agent that received the following descriptions: “Grasp red cat”, “Grow red cat” and “Grasp red plant”. This agent can imagine the goal “Grow red plant” and use it as a target in order to discover new outcomes in its environment. We call this mechanism goal imagination. We argue that goal imagination is key to be able to make creative discoveries because the corresponding targeted behaviors are out of the distribution of the outcomes communicated by the social partner. This sort of out-of-distribution goal generation can only be achieved with goals represented as language.
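The example above can be reproduced with a very small heuristic. The sketch below is an illustrative simplification, not the IMAGINE implementation: it treats two words as interchangeable when two known descriptions differ only in that word, and then substitutes interchangeable words to form descriptions that were never communicated. The function names are hypothetical.

```python
from itertools import combinations


def equivalence_pairs(descriptions):
    """Pairs of words found in the same slot of two descriptions
    that are otherwise identical."""
    pairs = set()
    for a, b in combinations(descriptions, 2):
        words_a, words_b = a.split(), b.split()
        if len(words_a) != len(words_b):
            continue
        diff = [i for i in range(len(words_a)) if words_a[i] != words_b[i]]
        if len(diff) == 1:
            i = diff[0]
            pairs.add((words_a[i], words_b[i]))
            pairs.add((words_b[i], words_a[i]))
    return pairs


def imagine_goals(descriptions):
    """Substitute equivalent words in known descriptions; keep only new goals."""
    known = set(descriptions)
    pairs = equivalence_pairs(descriptions)
    imagined = set()
    for description in descriptions:
        words = description.split()
        for i, word in enumerate(words):
            for source, target in pairs:
                if source == word:
                    candidate = " ".join(words[:i] + [target] + words[i + 1:])
                    if candidate not in known:
                        imagined.add(candidate)
    return imagined


new_goals = imagine_goals(["Grasp red cat", "Grow red cat", "Grasp red plant"])
```

Here "Grasp" and "Grow" are found to be interchangeable, as are "cat" and "plant", so the only description that is not already known is "Grow red plant".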
We carried out experiments in order to evaluate the benefits of goal imagination in intrinsically motivated learning. Experiments are split into two phases. In the first one, the agent interacts with the social partner, collects descriptions of goals and stores them in a set of known goal descriptions. The agent uses these descriptions paired with its observations in order to learn an internal reward function that detects when the goals represented by the descriptions are achieved in a given scene. Once this internal reward function is obtained, the agent uses its output (the reward signal) in order to train a goal-conditioned policy enabling it to reach any goal.
In the second phase, the social partner disappears and the agent starts imagining new goals by composing the descriptions stored in the set of known goals. The agent then targets these new goals and, by doing so, discovers new interactions. This creative goal exploration process can only be efficient if imagined goal descriptions have a sufficient probability of being meaningful in the environment. We therefore leveraged the construction grammar framework, used to model child language acquisition with discovery of word equivalence classes, in order to make sure that imagined goals follow the same construction rules as the descriptions communicated by the social partner. It is also important to note that, in order for goal imagination to work, the internal reward function trained from the social partner’s descriptions must generalize. In other words, it should be able to detect if imagined goals are reached without receiving any new description from the social partner. To this end, we developed an object-factored learning architecture coupled with attention mechanisms 47 that facilitates generalization to new descriptions.
Finally, we measured the success rate of agents on a wide set of different skills and observed that agents that do not imagine goals (that stop at phase 1) master a smaller set of skills than agents that do imagine goals.
We are interested in the autonomous acquisition of repertoires of skills. Language-conditioned reinforcement learning (LC-RL) approaches are great tools in this quest, as they allow us to express abstract goals as sets of constraints on the states. However, most LC-RL agents are not autonomous and cannot learn without external instructions and feedback. Besides, their direct language condition cannot account for the goal-directed behavior of pre-verbal infants and strongly limits the expression of behavioral diversity for a given language input. To resolve these issues, we propose a new conceptual approach to language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB decouples skill learning and language grounding via an intermediate semantic representation of the world (see Figure 13). To showcase the properties of LGB, we present a specific implementation called DECSTR. DECSTR is an intrinsically motivated learning agent endowed with an innate semantic representation describing spatial relations between physical objects (see Figure 14). In a first stage (G→B), it freely explores its environment and targets self-generated semantic configurations. In a second stage (
In open-ended continuous environments, robots need to learn multiple parameterised control tasks in hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn online task relationships and unbounded sequences of action primitives to control the different observables of the environment. Combining both goal-babbling with imitation learning, and active learning with transfer of knowledge based on intrinsic motivation, our algorithm self-organises its learning process. It chooses at any given time a task to focus on; and what, how, when and from whom to transfer knowledge. We show with a simulation and a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackle highly complex tasks. Task decomposition is also efficiently transferred across different embodied learners and by active imitation, where the robot requests just a small amount of demonstrations and the adequate type of information. The robot learns and exploits task dependencies so as to learn tasks of every complexity.
This work led to a publication in MDPI Applied Sciences 45.
In deep reinforcement learning, especially in approaches operating in symbolic observation spaces (where the inputs are not images but, for instance, the list of all objects' x-y positions), it is common to feed the agent's networks with a vector concatenating all the symbolic features. However, in practice there is a lot of redundant structure in this observation space: if the first object has a feature describing it as "red" or if the second object has a feature describing it as "red", there should be a prior (or inductive bias) in the architecture reflecting the fact that these two situations should be processed in the same way. All objects share the same semantics no matter in what order they are listed. We can call this the object-centered prior. In addition, for acting on collections of objects, an agent often has to process information about the relations between objects. We can call this the relational prior (or inductive bias). A detailed discussion of these inductive biases can be found in 73.
Since the structure "objects + relations" is naturally present in the world, a good idea is to implement it into the neural networks we are training. Set structures can be used for representing collections of objects, and the Deep Set architecture is well-suited for learning on sets. Graph structures can be used for representing collections of objects and their relations; the Graph Neural Network (GNN) family is well-suited for learning on graphs. Additionally, we should observe differences in performance and sample efficiency between architectures having only the object-centered prior and those that have both the object-centered and relational priors, in tasks that require processing of relational information.
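The object-centered prior can be illustrated with a minimal Deep Set forward pass: the same encoder is applied to every object, the encodings are summed, and a second network reads the pooled vector, so the output does not depend on the order in which objects are listed. The sketch below uses randomly initialised single-layer networks in pure Python; layer sizes and names are assumptions made for illustration, and no training is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random linear layer followed by tanh (untrained, for illustration only)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

    def layer(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return layer


def deep_set(objects, phi, rho):
    """rho(sum_i phi(object_i)): shared per-object encoder, then sum pooling."""
    encoded = [phi(obj) for obj in objects]
    pooled = [sum(column) for column in zip(*encoded)]
    return rho(pooled)


rng = random.Random(0)
phi = make_layer(3, 4, rng)   # shared weights: every object goes through phi
rho = make_layer(4, 2, rng)
# Each object: (x position, y position, is_red flag)
scene = [[0.1, 0.2, 1.0], [0.5, -0.3, 0.0], [-0.7, 0.9, 1.0]]
output = deep_set(scene, phi, rho)
shuffled_output = deep_set(scene[::-1], phi, rho)
```

Up to floating-point rounding in the sum, `output` and `shuffled_output` are identical. A flat network fed with the concatenated features has no such guarantee, and sum pooling alone does not model pairwise relations, which is where message-passing GNNs come in.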
We have tested this hypothesis in the case of learning to recognize spatial configurations of symbolic objects. For this purpose, we have created a benchmark dataset called SpatialSim that defines two tasks. The first task, called Identification, is learning to recognize a reference configuration of objects (up to an affine transformation) from a scene with the same objects but with their positions randomly reshuffled. The second task, called Comparison, consists in comparing two different configurations of objects and deciding if they are the same (up to an affine transformation).
In this context, we have trained architectures implementing increasing levels of relational computation: Deep Sets, Recurrent Deep Sets and Message-Passing GNNs. We have observed that the models with more relational computation perform better, especially in the Comparison task where Deep Set performance is very poor. This suggests that relational models are crucial for learning to compare configurations of objects.
This work has been presented as a spotlight talk at the Bridge Between Perception and Reasoning, Graph Neural Networks and Beyond workshop at ICLR 2020 63.
The previous work was concerned with symbolic objects described by their features such as position, orientation, etc. In a realistic setting we need to be able to learn to extract these object representations directly from raw images in an unsupervised representation learning scheme, and in a disentangled manner, such that each object is represented by a unique vector, and that each of that vector's coordinates represents a unique factor of variation (such as x or y position, color, etc). In the best case, this would recover the symbolic representations such as the ones used in the approach above.
Two architectures for object-centered unsupervised representation learning have been investigated: MONet 79 (an object-based variational autoencoder) and Contrastive-Structured World Models 108 (an architecture learning to extract objects from images by learning a world model expressed as an interaction graph). Integrating these approaches (along with mechanisms for object permanence) into an intrinsically motivated deep RL setting is still ongoing work.
The impact of object-centered architectures in a deep RL setting has also been investigated. We have benchmarked their importance in the language-imagination deep RL setting given in 8.2.4. We have observed dramatic improvements in sample efficiency in this setting when we use Deep Sets as opposed to flat, unstructured architectures (such as regular Multi-Layer Perceptrons).
In addition to that, we observe increased generalization performance in this setting (see Figures 15 and 16), suggesting that the bias that all objects should be represented and processed in the same way (and the weight-sharing that is implied by this bias in the neural networks) is helpful for transferring skills across objects.
These object-based architectures are robust to the number of objects, contrary to their flat counterparts. Additionally, architectures that present biases for encoding relations between objects demonstrate increased performance in tasks that require interaction between objects, such as grasping objects that are identified by their position relative to another object.
This work was presented at the Beyond Tabula Rasa in RL ICLR 2020 workshop 47.
In this work we considered the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we studied how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not initially know the capacities of its student, a key challenge for the teacher is to discover which environments are easy, difficult or unlearnable, and in what order to propose them to maximize the efficiency of learning over the learnable ones. To achieve this, this problem is transformed into a surrogate continuous bandit problem where the teacher samples environments in order to maximize absolute learning progress of its student. We presented ALP-GMM (see figure 17), a new algorithm modeling absolute learning progress with Gaussian mixture models. We also adapted existing algorithms and provided a complete study in the context of DRL. Using parameterized variants of the BipedalWalker environment, we studied their efficiency to personalize a learning curriculum for different learners (embodiments), their robustness to the ratio of learnable/unlearnable environments, and their scalability to non-linear and high-dimensional parameter spaces. Videos and code are available at https://
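The teacher loop described above can be illustrated with a minimal sketch. It assumes that absolute learning progress (ALP) is estimated as the absolute reward difference between a new task and the closest previously sampled task, and it replaces the Gaussian mixture model with fixed bins over a one-dimensional task space to stay dependency-free. The names `absolute_learning_progress` and `BinnedAlpTeacher`, and all parameter values, are hypothetical; this is not the ALP-GMM implementation.

```python
import math
import random


def absolute_learning_progress(history, task, reward):
    """ALP of a new (task, reward) pair: |reward - reward of the closest past task|."""
    if not history:
        return 0.0
    _, nearest_reward = min(history, key=lambda pair: math.dist(pair[0], task))
    return abs(reward - nearest_reward)


class BinnedAlpTeacher:
    """Toy teacher on the task space [0, 1); bins stand in for GMM components."""

    def __init__(self, n_bins=5, explore_prob=0.2, seed=0):
        self.n_bins = n_bins
        self.explore_prob = explore_prob  # share of uniformly random tasks
        self.rng = random.Random(seed)
        self.history = []  # list of (task, reward) pairs
        self.alp_per_bin = [[] for _ in range(n_bins)]

    def sample_task(self):
        mean_alp = [sum(a) / len(a) if a else 0.0 for a in self.alp_per_bin]
        # Explore uniformly at random sometimes, and always while no progress is known.
        if self.rng.random() < self.explore_prob or sum(mean_alp) == 0.0:
            return (self.rng.random(),)
        # Otherwise pick a bin proportionally to its mean ALP, then a task inside it.
        chosen = self.rng.choices(range(self.n_bins), weights=mean_alp)[0]
        low = chosen / self.n_bins
        return (low + self.rng.random() / self.n_bins,)

    def update(self, task, reward):
        alp = absolute_learning_progress(self.history, task, reward)
        self.history.append((task, reward))
        bin_index = min(int(task[0] * self.n_bins), self.n_bins - 1)
        self.alp_per_bin[bin_index].append(alp)
        return alp
```

In use, the teacher alternates `sample_task()`, one student episode on the sampled task, and `update(task, episodic_reward)`. Regions where rewards stop changing (mastered or unlearnable) accumulate low ALP and are sampled less often.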
Overall, this work demonstrated that LP-based teacher algorithms could successfully guide DRL agents to learn in difficult continuously parameterized environments with irrelevant dimensions and large proportions of unfeasible tasks. With no prior knowledge of its student's abilities and only loose boundaries on the task space, ALP-GMM, our proposed teacher, consistently outperformed random heuristics and occasionally even expert-designed curricula (see figure 18). This work was presented at CoRL 2019 138.
ALP-GMM, which is conceptually simple and has very few crucial hyperparameters, opens up exciting perspectives inside and outside DRL for curriculum learning problems. Within DRL, it could be applied to previous work on autonomous goal exploration through incremental building of goal spaces 111. In this case several ALP-GMM instances could scaffold the learning agent in each of its autonomously discovered goal spaces. Another domain of applicability is assisted education, for which the current state of the art relies heavily on expert knowledge 83 and is mostly applied to discrete task sets.
In this work we identified that a major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent, instead of randomly sampling tasks, as many could end up being either trivial or unfeasible. Since it is hard to get prior knowledge on such task spaces, many ACL algorithms explore the task space to detect progress niches over time, a costly tabula-rasa process that needs to be performed for each new learning agent, even though these agents might have similar capability profiles.
To address this limitation, we introduced the concept of Meta-ACL (see fig. 19), i.e. algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners, and formalized it in the context of black-box RL learners. We then presented AGAIN (see fig. 10), a first instantiation of Meta-ACL, and showcased its benefits for curriculum generation over classical ACL in multiple simulated environments including procedurally generated parkour environments with learners of varying morphologies. Videos and code are available at https://sites.google.com/view/meta-acl
This work is available as preprint 62 and will be submitted to ICML 2021. In future work, AGAIN could be improved by using adaptive approaches to build compact pre-test sets, e.g. using decision tree based test pruning methods, or by combining curriculum priors from multiple previously trained learners. While AGAIN is built on top of an existing ACL algorithm, developing an end-to-end Meta-ACL algorithm that generates curricula using a DRL teacher policy trained across multiple students is also a promising line of work to follow.
Additionally, this work opens up exciting new perspectives for transferring Meta-ACL methods to educational data mining. For example, in MOOC scenarios, given a previously trained pilot classroom, one could use Meta-ACL to infer adaptive curricula for new students.
Training autonomous agents able to generalize to a multiplicity of environments/tasks is a key desideratum in current Deep Reinforcement Learning (DRL) research.
In parallel to searching for DRL architectures able to learn sets of tasks, many works on Automatic Curriculum Learning (ACL) studied how to move away from the usual random task curriculum and instead use teacher algorithms.
In this work, we identified the following key challenges faced by ACL algorithms:
Based on these, we presented TeachMyAgent (TA), a benchmark of current ACL algorithms including 1) skill-specific unit-tests using variants of a procedural Box2D bipedal walker environment, and 2) a new procedural Parkour environment combining most ACL challenges, making it ideal for global performance assessment.
We then leveraged TeachMyAgent to conduct a comparative study of existing approaches, showcasing the competitiveness of expert-knowledge-free ACL approaches, and showing that our Parkour environment remains an open problem.
In order to compare the different ACL methods on each of the challenges introduced above, we leveraged the Stump Tracks environment introduced in 138 to create five experiments, each of them designed to highlight the ability of a teacher in one of the first five ACL challenges. Additionally, in order to analyse the expert knowledge challenge separately, we performed each of the experiments above in three prior knowledge conditions:
Using these 15 experiments, we compared 7 ACL methods in addition to a baseline teacher sampling tasks uniformly at random over the task space. In each experiment, a DRL student is trained for 20 million steps using an ACL algorithm to set the procedural generation of the environment at every episode. We monitored the DRL student's performance on a pre-defined test set composed of 100 tasks every 500,000 steps and reported the average percentage of mastered tasks (i.e. tasks on which the agent obtained an episodic reward greater than 230). Results are gathered in figure 21 and presented as an order of magnitude of the Random teacher (using 32 seeds per experiment).
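The evaluation metric described above can be written down in a few lines. The sketch below only encodes what the text states (a task is mastered when its episodic reward exceeds 230, the test set is evaluated every 500,000 steps over 20 million steps, and scores are averaged over seeds). The function names are hypothetical and not taken from the TeachMyAgent code base.

```python
MASTERY_THRESHOLD = 230        # episodic reward above which a task counts as mastered
TRAINING_STEPS = 20_000_000    # training budget per student
EVAL_PERIOD = 500_000          # steps between two evaluations on the test set
N_EVALUATIONS = TRAINING_STEPS // EVAL_PERIOD  # 40 evaluations per run


def percent_mastered(episodic_rewards, threshold=MASTERY_THRESHOLD):
    """Percentage of test tasks whose episodic reward is strictly above the threshold."""
    if not episodic_rewards:
        raise ValueError("the test set must contain at least one task")
    mastered = sum(1 for reward in episodic_rewards if reward > threshold)
    return 100.0 * mastered / len(episodic_rewards)


def mean_percent_mastered(rewards_per_seed):
    """Average the percentage of mastered tasks over independent seeds."""
    scores = [percent_mastered(rewards) for rewards in rewards_per_seed]
    return sum(scores) / len(scores)
```

Note that the comparison is strict: a reward of exactly 230 does not count as mastered, following the "greater than 230" wording.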
While these results highlighted the strengths and weaknesses of each method (e.g. the inability of ADR to deal with rugged difficulty landscapes or the inertia of GoalGAN in adapting the curriculum to forgetting students), they also showed how competitive expert-knowledge-free methods like ALP-GMM are, even when compared to methods having access to a high amount of prior knowledge.
We then assessed the different ACL methods' performances on our Box2D Parkour track environment (see figure 22) which features most of the previously discussed ACL challenges: 1) most tasks are unfeasible, 2) before each run, unknown to the teacher, the student's embodiment is uniformly sampled among three morphologies (bipedal walker, fish and chimpanzee), requiring the teacher to adapt curriculum generation to its current student's abilities, and 3) tasks are generated through a CPPN-based PCG, creating a rich task space with a rugged difficulty landscape and hardly-definable prior knowledge.
We trained a DRL student for 20 million steps with 48 different seeds (16 per morphology) and monitored the percentage of mastered tasks as in our Stump Tracks experiments. As little expert knowledge is accessible, our results (figure 23) showed poor performance from expert-knowledge-dependent teachers (e.g. ADR or SPDL). Additionally, overall results were quite low, especially for the seeds using our chimpanzee embodiment, where none of the ACL methods managed to master more than of the test set. This thus leaves our Parkour track as an open challenge for future design of ACL methods.
We study perception in the scenario of an embodied agent equipped with first-person sensors and a continuous motor space with multiple degrees of freedom. We consider the commutative properties of action sequences with respect to sensory information perceived by such an embodied agent. We introduce the Sensory Commutativity Probability (SCP) criterion which measures how much an agent’s degree of freedom affects the environment in embodied scenarios. We show how to compute this criterion in different environments, including realistic robotic setups. We empirically illustrate how SCP and the commutative properties of action sequences can be used to learn about objects in the environment and improve sample efficiency in Reinforcement Learning.
Our research was published in the Workshop on Learning in Artificial Open Worlds at ICML20 42 and NeurIPS 2020 Workshop on BabyMind 41.
Transferring the functioning of our brain to artificial intelligence as quickly as possible is an ambitious goal that would help advance the state of the art in AI and robotics. In this perspective, we propose to start from hypotheses derived from an empirical study in human-robot interaction and to verify whether they hold in the same way for children and for a basic reinforcement learning algorithm 40. Thus, we check whether receiving help from an expert when solving a simple close-ended task (the Towers of Hanoi) accelerates the learning of this task, depending on whether the intervention is canonical or requested by the player. Our experiments have allowed us to conclude that, whether the help is requested or not, a Q-learning algorithm benefits from expert help in the same way as children do.
One of the most ambitious goals in Artificial Intelligence (AI) is the realization of a so-called Artificial General Intelligence (AGI), i.e. AI that is not limited to the realization of a predefined set of tasks but is able to generalize its capabilities to any cognitive task that can be solved by human intelligence. This is obviously a long-term objective but recent advances in AI have revived research in this field, with the vast majority of contributions focusing on .
However, although AGI is fundamentally related to the characteristics of human intelligence, research in this field rarely considers the processes that may have guided the emergence of complex cognitive capacities during the evolution of the species. Research in Human Behavioral Ecology (HBE) 76 seeks to understand how the behaviors characterizing human nature can be conceived as adaptive responses to major changes in the structure of our ecological niche.
However, very little work in AI proposes to study how these long-term environmental dynamics can potentially guide and improve the acquisition of complex behaviors in artificial systems (see however recent contributions 155, including from our research group 138, 24). Moreover, to our knowledge, modern AI methods for learning behaviors in sequential environments have not yet been applied to test hypotheses in HBE (although it has been recently proposed 92).
As a first step in our project, we conducted a targeted yet extensive literature review on HBE, in particular works studying the effect that climate complexity has had on the emergence of adaptability, cooperation and cultural repertoire in human evolution. In parallel, we have reviewed the state of the art in the study of open-ended skill acquisition, in particular in the AI sub-fields of multi-agent reinforcement learning and meta reinforcement learning. We have compiled our review in a position paper that summarizes the project's objectives 61.
An important objective at this stage was to justify the proposed exchange of ideas between the two fields by identifying their commonalities in terms of research challenges. In Figure 24, we introduce a conceptual framework that recognizes important ecological components, as well as the feedforward and feedback links that relate them. This figure was presented in our preprint 61.
In our next steps, we plan to improve the state of the art in meta RL and multi-agent RL by leveraging hypotheses from HBE. Simultaneously, in a similar spirit to our group's proposal of using multi-agent RL as a computational tool for studying language development 124, we will employ RL as a computational tool for evaluating HBE hypotheses. In particular, our review has identified the following research challenges:
Kidlearn is a research project studying how machine learning can be applied to intelligent tutoring systems (ITS). It aims at developing methodologies and software which adaptively personalize sequences of learning activities to the particularities of each individual student. Our systems aim at proposing to the student the right activity at the right time, maximizing concurrently their learning progress and their motivation. In addition to contributing to the efficiency of learning and motivation, the approach is also designed to reduce the time needed to design ITS.
We continued to develop an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activities to maximize skills acquired by students, taking into account the limited time and motivational resources. At a given point in time, the system proposes to the students the activity which makes them progress fastest. We introduced two algorithms that rely on the empirical estimation of the learning progress: RiARiT and ZPDES.
The system is based on the combination of three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on empirical estimation of learning progress provided by specific activities to particular students. Second, it uses state-of-the-art Multi-Armed Bandit (MAB) techniques to efficiently manage the exploration/exploitation challenge of this optimization process. Third, it leverages expert knowledge to constrain and bootstrap initial exploration of the MAB, while requiring only coarse guidance information from the expert and allowing the system to deal with didactic gaps in its knowledge. The system was evaluated in several large-scale experiments relying on a scenario where 7-8 year old schoolchildren learn how to decompose numbers while manipulating money 83. Systematic experiments were also presented with simulated students.
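The combination of the three ingredients above can be sketched as a small bandit. This is an illustrative simplification under stated assumptions, not the RiARiT or ZPDES implementation: learning progress is estimated as the gain in success rate between the older and the more recent half of a sliding window, expert knowledge is reduced to an optional starting weight per activity, and exploration is a uniform mixing term. The class name, the window size and the exploration rate are all hypothetical.

```python
import random
from collections import deque


class LearningProgressBandit:
    """Toy activity selector driven by an empirical learning-progress estimate."""

    def __init__(self, activities, expert_prior=None, window=6,
                 explore_rate=0.2, seed=0):
        self.activities = list(activities)
        # Hypothetical coarse expert guidance: a starting weight per activity.
        self.prior = dict(expert_prior or {})
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)
        self.outcomes = {a: deque(maxlen=window) for a in self.activities}

    def record(self, activity, success):
        """Store the outcome (success or failure) of one exercise."""
        self.outcomes[activity].append(1.0 if success else 0.0)

    def learning_progress(self, activity):
        """Recent success rate minus older success rate, floored at zero."""
        results = list(self.outcomes[activity])
        if len(results) < 2:
            return 0.0
        half = len(results) // 2
        older, recent = results[:half], results[half:]
        progress = sum(recent) / len(recent) - sum(older) / len(older)
        return max(0.0, progress)

    def probabilities(self):
        """Mix progress-proportional sampling with uniform exploration."""
        weights = [self.prior.get(a, 0.0) + self.learning_progress(a)
                   for a in self.activities]
        total = sum(weights)
        n = len(self.activities)
        if total == 0.0:
            return [1.0 / n] * n
        return [(1.0 - self.explore_rate) * w / total + self.explore_rate / n
                for w in weights]

    def choose(self):
        """Sample the next activity to propose to the student."""
        return self.rng.choices(self.activities, weights=self.probabilities())[0]
```

With this estimate, an activity that the student already masters (constant success) and one that is out of reach (constant failure) both yield zero progress, so the bandit concentrates on activities where the success rate is currently rising.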
An experiment was held between March 2018 and July 2019 in order to test the Kidlearn framework in classrooms in Bordeaux Metropole. 600 students from Bordeaux Metropole participated in the experiment. This study had several goals. The first goal was to evaluate the impact of the Kidlearn framework on motivation and learning compared to an Expert Sequence without machine learning. The second goal was to observe the impact of using learning progress to select exercise types within the ZPDES algorithm compared to a random policy. The third goal was to observe the impact of combining ZPDES with the ability to let children make different kinds of choices during the use of the ITS.
The last goal was to use the psychological and contextual data measures to see whether correlations can be observed between the evolution of the students' psychological state, their profile, their motivation and their learning. The different observations showed that, generally, algorithms based on ZPDES provided a better learning experience than an expert sequence. In particular, they provide a more motivating and enriching experience to self-determined students. Details of these new results, as well as the overall results of this project, are presented in Benjamin Clément's PhD thesis 82 and are currently being prepared for publication.
The algorithms developed during the Kidlearn project and Benjamin Clément's thesis 82 are being used in an innovation partnership for the development of a pedagogical assistant based on artificial intelligence intended for teachers and students of cycle 2. The algorithms are being written in TypeScript for the needs of the project. The team's expertise in creating the pedagogical graph and defining the graph parameters used by the algorithms is also a crucial part of its role in the project. One of the team's main goals here is to transfer technologies developed in the team to a project with the perspective of industrial scaling, and to assess the impact and feasibility of such scaling.
Few digital interventions targeting numeracy skills have been evaluated with individuals with autism spectrum disorder (ASD) 55, 120. Yet, some children and adolescents with ASD have learning difficulties and/or a significant academic delay in mathematics. While ITS are successfully developed for typically developing students to personalize the learning curriculum and thereby foster the motivation-learning coupling, they are rarely, if ever, offered today to students with specific needs. The objective of this pilot study is to test the feasibility of a digital intervention using an ITS with high school students with ASD and/or intellectual disability (ID). This application (KidLearn) provides calculation training through currency exchange activities, with a dynamic exercise sequence selection algorithm (ZPDES). 24 students with ASD and/or ID enrolled in specialized classrooms were recruited and divided into two groups: 14 students used the KidLearn application, and 10 students received a control application. Pre-post evaluations show that students using KidLearn improved their calculation performance, and had a higher level of motivation at the end of the intervention than the control group. These results encourage the use of an ITS with students with specific needs to teach numeracy skills, but need to be replicated on a larger scale. Adjustments to the interface and teaching method are suggested to improve the impact of the application on students with autism. (Paper is submitted).
Because it cuts across all cognitive activities, such as learning tasks, attention is a hallmark of good cognitive health throughout life, and more particularly in the current context of a societal crisis of attention. Recent work has shown the great potential of computerized attention training, with efficient transfer of training to other cognitive activities, over a wide spectrum of individuals (children, the elderly, individuals with cognitive pathologies such as Attention Deficit Hyperactivity Disorder). Despite these promising results, a major hurdle remains: the high inter-individual variability in responding to such interventions. Some individuals are good responders (significant improvement) to the intervention, others respond variably, and finally some respond poorly, not at all, or occasionally. A central limitation of computerized attention training systems is that the training sequences operate in a linear, non-personalized manner: difficulty increases in the same way and along the same dimensions for all subjects. However, different subjects in principle require a progression at a different, personalized pace along the different dimensions that characterize attentional training exercises.
To tackle the issue of inter-individual variability, the present project proposes to apply some principles from intelligent tutoring systems (ITS) to the field of attention training. In this context, we have already developed automatic curriculum learning algorithms, such as those developed in the KidLearn project, which make it possible to customize the learner's path according to his/her progress and thus optimize his/her learning trajectory while stimulating his/her motivation through the progress made. ITS are widely identified in intervention research as a successful way to address the challenge of personalization, but no studies to date have actually been conducted for attention training. Thus, whether ITS, and in particular personalization algorithms, can optimize the number of responders to an attention training program remains an open question.
To investigate this question, a web platform has been designed for planning and implementing remote behavioural studies. This tool provides means for registering recruited participants remotely and executing complete experimental protocols: from presenting instructions and obtaining informed consent, to administering behavioural tasks and questionnaires, potentially throughout multiple sessions spanning days or weeks. Several studies using this tool will be conducted during the following months.
Since 2019, via the renewal of the Idex cooperation fund (between the University of Bordeaux and the University of Waterloo, Canada) led by the Flowers team and also involving F. Lotte from the Potioc team, we have continued our work on the development of new curiosity-driven interaction systems.
Although experiments have been slowed down by the health situation, progress has been made in this area of application of the FLOWERS team's work. In particular, three studies have been completed.
The first study concerns a new interactive educational application to foster curiosity-driven question-asking in children. This study was performed during the Master 2 internship of Mehdi Alaimi, co-supervised by H. Sauzéon, E. Law and PY Oudeyer. It addresses a key challenge for 21st-century schools, i.e., teaching diverse students with varied abilities and motivations for learning, such as curiosity within educational settings. Among the variables eliciting a state of curiosity, one is known as the "knowledge gap", which is a motor for curiosity-driven exploration and learning. It leads to question-asking, which is an important factor in the curiosity process and the construction of academic knowledge. However, children’s questions in the classroom are infrequent and rarely require deep reasoning. To improve children’s curiosity, we developed a digital application aiming to foster curiosity-related question-asking from texts and their perception of curiosity. To assess its efficiency, we conducted a study with 95 fifth grade students of Bordeaux elementary schools. Two types of interventions were designed, one focusing children on the construction of low-level (i.e. convergent) questions and one focusing them on high-level (i.e. divergent) questions with the help of prompts or question-starter models. We observed that both interventions increased the number of divergent questions and the question fluency performance, while they did not significantly improve the curiosity perception despite the high intrinsic motivation scores they elicited in children. The curiosity-trait score positively impacted the divergent question score under the divergent condition, but not under the convergent condition. The overall results supported the efficiency and usefulness of digital applications for fostering children’s curiosity, which we need to explore further.
The overall results are published in CHI'20 33.
The second study investigates the neurophysiological underpinnings of curiosity and the opportunities of their use for Brain-computer interactions 34. Understanding the neurophysiological mechanisms underlying curiosity, and therefore being able to identify the curiosity level of a person, would provide useful information for researchers and designers in numerous fields such as neuroscience, psychology, and computer science. A first step to uncovering the neural correlates of curiosity is to collect neurophysiological signals during states of curiosity, in order to develop signal processing and machine learning (ML) tools to distinguish curious states from non-curious ones. Thus, we ran an experiment in which we used electroencephalography (EEG) to measure the brain activity of participants as they were induced into states of curiosity, using trivia question-and-answer chains. We used two ML algorithms, i.e. Filter Bank Common Spatial Pattern (FBCSP) coupled with Linear Discriminant Analysis (LDA), as well as a Filter Bank Tangent Space Classifier (FBTSC), to classify the curious EEG signals from the non-curious ones. Global results indicate that both algorithms obtained better performance in the 3-to-5 s time windows, suggesting an optimal time window length of 4 seconds for estimating curiosity states from EEG signals.
Finally, the third study investigates the role of intrinsic motivation in spatial learning in children (paper in progress). In this study, state curiosity is manipulated as a preference for a level of uncertainty during the exploration of new environments. To this end, a series of virtual environments has been created and is presented to children. During encoding, participants explore routes in environments according to three levels of uncertainty (low, medium, and high), using a virtual reality headset and controllers, and are later asked to retrace their travelled routes. The exploration area and the wayfinding score, i.e. the route overlap between the encoding and retrieval phases (an indicator of spatial memory accuracy), are measured. Neuropsychological tests are also performed. Preliminary results showed better performance under the medium uncertainty condition in terms of exploration area and wayfinding score. These first results support the idea that curiosity states are a learning booster.
At the end of 2020, we started an industrial collaboration project with EvidenceB on this topic (CIFRE contract of Rania Abdelghani currently submitted to the ANRT). The overall objective of the thesis is to propose new educational technologies driven by epistemic curiosity, allowing children to express themselves more and learn better. To this end, a central question of the work will be to specify the impact on student performance of the self-questioning aroused by states of curiosity. Another objective will be to create new educational technologies that promote an active education of students based on their curiosity, and to study their pedagogical impact in real situations (schools).
Integrating computer science (CS) into school curricula has become a worldwide preoccupation. Therefore, we present a CS and Robotics integration model and its validation through a large-scale pilot study in the administrative region of the Canton Vaud in Switzerland. Approximately 350 primary school teachers followed a mandatory CS continuing professional development program (CPD) of adapted format with a curriculum scaffolded by instruction modality. This included CS Unplugged activities that aim to teach CS concepts without the use of screens, and Robotics Unplugged activities that employed physical robots, without screens, to learn about robotics and CS concepts. Teachers evaluated the CPD positively and their representation of CS improved. Voluntary adoption rates reached 97 percent during the CPD and 80 percent the following year. These results, combined with the underpinning literature, support the generalisability of the model to other contexts. This work was published in 23 and led by our colleagues at EPFL.
With the outlook of improving communication and social abilities of people with ASD, we propose to extend the paradigm of robot-based imitation games to ASD teenagers. In this paper 52, we present an interaction scenario adapted to ASD teenagers, propose a computational architecture using the recent machine learning algorithm OpenPose for human pose detection, and present the results of our basic testing of the scenario with human caregivers. These results are preliminary due to the small number of sessions (1) and participants (4). They include a technical assessment of the performance of OpenPose, as well as a preliminary user study to confirm that our game scenario could elicit the expected response from subjects.
Intrinsically motivated goal exploration algorithms (IMGEPs) enable machines to discover repertoires of action policies that produce a diversity of effects in complex environments.
In robotics, these exploration algorithms have been shown to allow real world robots to acquire skills such as tool use 9170.
In other domains such as chemistry and physics, they open the possibility to automate the discovery of novel chemical or physical structures produced by complex dynamical systems 137.
However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. Recent work has shown how unsupervised deep learning approaches could be used to learn goal space representations 136 but they have used precollected data to learn the representations.
This project studies how IMGEPs can be extended and used for automated discovery of behaviours of dynamical systems in physics or chemistry without using assumptions or knowledge about such systems.
As a first step towards this goal we choose Lenia 80, a simulated high-dimensional complex dynamical system, as a target system.
Lenia is a continuous cellular automaton where diverse visual structures can self-organize (Fig.26, c).
It consists of a two-dimensional grid of cells where the state of each cell is a real-valued scalar activity.
The state of cells evolves over discrete time steps.
The activity change is computed by integrating the activity of neighbouring cells.
Lenia's behavior is controlled by its initial pattern and several settings that control the dynamics of the activity change.
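The update rule described above can be sketched as follows. The text only states that the activity is real-valued, that time is discrete, and that the change is computed by integrating neighbouring cells. Everything else here is an assumption made for illustration: the uniform square neighbourhood with wrap-around borders, the bell-shaped growth function, the clipping of activities to [0, 1], and the values of `mu`, `sigma`, `radius` and `dt`.

```python
import math


def growth(potential, mu=0.15, sigma=0.015):
    """Bell-shaped growth in [-1, 1]: positive only when the potential is near mu."""
    return 2.0 * math.exp(-((potential - mu) ** 2) / (2.0 * sigma ** 2)) - 1.0


def neighbourhood_mean(grid, row, col, radius):
    """Average activity of the cells around (row, col), with wrap-around borders."""
    n_rows, n_cols = len(grid), len(grid[0])
    total, count = 0.0, 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the cell itself is not part of its neighbourhood
            total += grid[(row + dr) % n_rows][(col + dc) % n_cols]
            count += 1
    return total / count


def step(grid, radius=1, dt=0.1):
    """One discrete time step: add the scaled growth and clip activities to [0, 1]."""
    return [
        [
            min(1.0, max(0.0, grid[r][c]
                         + dt * growth(neighbourhood_mean(grid, r, c, radius))))
            for c in range(len(grid[0]))
        ]
        for r in range(len(grid))
    ]
```

Repeatedly applying `step` to an initial grid yields one trajectory of the system. In this sketch the initial grid and the arguments of `step` and `growth` play the role of the initial pattern and of the settings that control the dynamics.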
Lenia can produce diverse patterns with different dynamics.
Most interesting, spatially localized coherent patterns that resemble in their shapes microscopic
We could successfully accomplish this goal 38 based on two key contributions of our research:
1) the usage of compositional pattern producing networks (CPPNs) for the generation of initial states for Lenia, and
2) the development of a novel IMGEP algorithm that learns goal representations online during the exploration of the system.
The initial state plays a key role in the generation of patterns in dynamical systems.
IMGEPs sample these initial states and apply random perturbations to them during the exploration.
For Lenia this state is a two-dimensional grid of cells.
Directly sampling the grid cells at random results in initial patterns that resemble white noise.
Such random states result mainly in the emergence of global patterns that spread over the whole state space, complicating the search for spatially localized patterns.
We solved the sampling problem for the initial states by using compositional pattern producing networks (CPPNs) 147.
CPPNs are recurrent neural networks that allow the generation of structured initial states (Fig.26, a).
The CPPNs are used as part of the system parameters which are explored by the algorithms.
They are defined by their network structure (number of neurons, connections between neurons) and their connection weights.
They include a mechanism for random mutation of the weights and structure.
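As an illustration of how a CPPN turns cell coordinates into a structured pattern, here is a minimal sketch. It is a deliberate simplification and not the networks used in the project: the topology is fixed and feedforward, only the weights are mutated, and the names `init_cppn`, `mutate_cppn` and `render` as well as the choice of inputs and activation functions are assumptions.

```python
import numpy as np

def init_cppn(rng, n_hidden=8):
    """Random weights for a tiny fixed-topology network (illustrative simplification)."""
    return {"w1": rng.normal(size=(3, n_hidden)), "w2": rng.normal(size=(n_hidden, 1))}

def mutate_cppn(cppn, rng, sigma=0.1):
    """Gaussian perturbation of the weights; structural mutation is omitted here."""
    return {name: w + rng.normal(scale=sigma, size=w.shape) for name, w in cppn.items()}

def render(cppn, size=64):
    """Evaluate the network at every cell coordinate to obtain a pattern in [0, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size), indexing="ij")
    inputs = np.stack([xs, ys, np.hypot(xs, ys)], axis=-1)  # shape (size, size, 3)
    hidden = np.sin(inputs @ cppn["w1"])                     # smooth periodic features
    out = 1.0 / (1.0 + np.exp(-(hidden @ cppn["w2"])))       # sigmoid keeps values in [0, 1]
    return out[..., 0]

rng = np.random.default_rng(0)
parent = init_cppn(rng)
child = mutate_cppn(parent, rng)
pattern, mutated_pattern = render(parent), render(child)
```

Because the output is a smooth function of the coordinates, neighbouring cells receive correlated values, unlike independent per-cell sampling.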
We proposed an online goal space learning IMGEP (IMGEP-OGL), which learns the goal space incrementally during the exploration process.
A variational autoencoder (VAE) is used to encode Lenia patterns into an 8-dimensional latent representation used as goal space.
The training procedure of the VAE is integrated in the goal sampling exploration process: the VAE is first initialized with random weights.
It is then retrained periodically, after a fixed number of explorations and for a fixed number of epochs, on the patterns identified so far during the exploration.
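The interleaving of goal sampling and representation learning can be sketched as below. This is a toy illustration under strong assumptions and not the IMGEP-OGL implementation: `toy_system` is a hypothetical stand-in for Lenia, PCA (computed with an SVD) replaces the VAE to keep the example dependency-free, the first explorations use random parameters instead of a randomly initialized encoder, and the values of `n_random`, `retrain_every` and the mutation scale are arbitrary.

```python
import numpy as np

def toy_system(params):
    """Hypothetical stand-in for Lenia: maps 2 parameters to a 32-dim observation."""
    grid = np.linspace(0.0, 1.0, 32)
    return np.sin(6.0 * params[0] * grid) + params[1] * grid ** 2

def fit_encoder(observations, dim=2):
    """PCA via SVD, used here in place of a VAE."""
    mean = observations.mean(axis=0)
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    return mean, vt[:dim]

def encode(encoder, observations):
    """Project observations (one per row) into the learned goal space."""
    mean, components = encoder
    return (observations - mean) @ components.T

def explore(n_explorations=200, n_random=20, retrain_every=50, seed=0):
    rng = np.random.default_rng(seed)
    params_hist, obs_hist, encoder = [], [], None
    for i in range(n_explorations):
        if i < n_random:
            params = rng.uniform(-1.0, 1.0, size=2)  # bootstrap with random parameters
        else:
            reached = encode(encoder, np.array(obs_hist))
            goal = rng.uniform(reached.min(axis=0), reached.max(axis=0))  # sample a goal
            nearest = np.argmin(np.linalg.norm(reached - goal, axis=1))   # closest known outcome
            params = params_hist[nearest] + rng.normal(scale=0.1, size=2)  # mutate its parameters
        params_hist.append(params)
        obs_hist.append(toy_system(params))
        if i + 1 == n_random or (i + 1) % retrain_every == 0:
            encoder = fit_encoder(np.array(obs_hist))  # retrain on all patterns found so far
    return np.array(params_hist), np.array(obs_hist), encoder

params, observations, encoder = explore()
```

Since the encoder changes over time, the sketch re-encodes the whole history before each goal is sampled, so that goals and past outcomes always live in the current goal space.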
We evaluated the novel IMGEP-OGL against other exploration algorithms by comparing the diversity of their identified patterns. Diversity is measured by the spread of the exploration in an analytic behavior space.
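One simple way to quantify such a spread is to discretize the behavior space into bins and count how many bins contain at least one discovered pattern. The sketch below assumes this bin-counting measure, a rectangular space given by `low` and `high`, and `n_bins` bins per dimension; the source only states that diversity is measured by the spread of the exploration.

```python
import numpy as np

def diversity(points, low, high, n_bins=10):
    """Number of distinct bins reached in a regularly discretized behavior space."""
    points = np.asarray(points, dtype=float)
    scaled = (points - np.asarray(low)) / (np.asarray(high) - np.asarray(low))
    # Points on or beyond the upper boundary are assigned to the last bin.
    idx = np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)
    return len({tuple(row) for row in idx})

low, high = np.zeros(2), np.ones(2)
clustered = [[0.01, 0.01], [0.02, 0.03], [0.04, 0.02]]
spread = [[0.05, 0.05], [0.55, 0.15], [0.95, 0.95]]
clustered_score = diversity(clustered, low, high)  # all three points share one bin
spread_score = diversity(spread, low, high)        # each point falls in its own bin
```

With this measure, an algorithm that keeps rediscovering similar patterns scores low regardless of how many explorations it performs.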
We compared different exploration algorithms to the novel IMGEP-OGL:
1) Random exploration of system parameters,
2) IMGEP-HGS: IMGEP with a hand-defined goal space,
3) IMGEP-PGL: IMGEP with a goal space learned by a VAE trained on a precollected dataset of Lenia patterns, and
4) IMGEP-RGS: IMGEP with a VAE with random weights that defines the goal space.
The system parameters consisted of a CPPN that generates the initial state for Lenia and 6 further settings defining Lenia's dynamics.
The CPPNs were initialized and mutated by a random process that defines their structure and connection weights.
The other Lenia settings were initialized by sampling from a uniform distribution and mutated by sampling from a Gaussian distribution centred on their original values.
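The initialization and mutation of the scalar settings can be illustrated as follows. The setting names, their ranges, the mutation scale `rel_sigma` and the clipping of mutated values to the valid range are all hypothetical choices made for this sketch; the source does not specify them.

```python
import numpy as np

# Hypothetical setting names and ranges, for illustration only.
RANGES = {"setting_a": (0.0, 1.0), "setting_b": (1.0, 20.0)}

def init_settings(rng):
    """Draw each setting from a uniform distribution over its range."""
    return {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in RANGES.items()}

def mutate_settings(settings, rng, rel_sigma=0.05):
    """Draw each setting from a Gaussian centred on its current value."""
    mutated = {}
    for name, value in settings.items():
        lo, hi = RANGES[name]
        # Assumption: mutated values are clipped back into the valid range.
        mutated[name] = float(np.clip(rng.normal(value, rel_sigma * (hi - lo)), lo, hi))
    return mutated

rng = np.random.default_rng(0)
settings = init_settings(rng)
mutated = mutate_settings(settings, rng)
```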
The diversity of identified patterns in the analytic behavior space shows that IMGEP approaches with VAE-learned goal spaces (PGL, OGL) identified the highest diversity of patterns overall (Fig. 27, a).
They were followed by the IMGEP with a hand-defined goal space (HGS).
Random exploration and the IMGEP with a random goal space (RGS) had the lowest performance.
The advantage of learned goal space approaches (PGL, OGL) over all other approaches was even stronger for the diversity of animal patterns, i.e. the main goal of our exploration (Fig. 27, b).
Our goal was to investigate new techniques based on intrinsically motivated goal exploration for the automated discovery of patterns and behaviors in complex dynamical systems.
We introduced a new algorithm (IMGEP-OGL) which is capable of learning unsupervised goal space representations during the exploration of an unknown system.
Our results for Lenia, a high-dimensional complex dynamical system, show its superior performance over hand-defined goal spaces or random exploration.
It shows the same performance as a learned goal space based on precollected data, showing that such a precollection of data is not necessary.
We furthermore introduced the use of CPPNs for the successful initialization of the initial states of the dynamical systems.
Both advances allowed us to explore an unknown and high-dimensional dynamical system which shares many similarities with different physical or chemical systems.
This work is published at ICLR 2020 38. The project website with videos and additional results can be found at https://automated-discovery.github.io/, and the source code is available at https://github.com/flowersteam/automated_discovery_of_lenia_patterns.
In the previous paper 38, the problem of automated
diversity-driven
In this project, we follow the proposed experimental testbed of Reinke et al.(2020) 38 on a continuous game-of-life system (Lenia, 80). We provide empirical evidence that the discoveries of an IMGEP operating in a
monolithic
To address these limits, the contributions of this project are threefold.
First, we formulate the problem of meta-diversity search as follows: an artificial “discovery assistant” incrementally learns a set of diverse BC spaces in an outer loop; and searches to discover diverse patterns within each of them in an inner loop. With minimal external feedback, a successful discovery assistant should be able to efficiently specialize the exploration strategy toward a particular type of diversity, corresponding to the initially unknown preferences of the human evaluator.
Second, we present
HOLMES
Finally, we show how this architecture can easily be leveraged to
drive
To conclude, this work shows that integrating flexible modular representation learning with intrinsically-motivated goal exploration processes for
meta-diversity
An initial version of this work was presented at the ICLR 2020 workshop "Beyond tabula rasa in Reinforcement Learning" 46. The final version of this work is published at NeurIPS 2020 35. The project website with videos and additional results can be found at http://mayalenE.github.io/holmes/, and the source code is available at http://mayalenE.github.io/holmes/.
This work was led by Daniel Cattaert, Aymar de Rugy and their collaborators at Incia, with contributions from Pierre-Yves Oudeyer.
Objective. Neuro-mechanical models are essential to increase our understanding of the fundamental mechanisms underlying natural sensorimotor control, and to foster robotic designs using them. Yet, the complexity of those models is such that current optimization methods are unsuited to establish the range of useful behaviors they could produce, and their associated parameter settings. Our goal is to provide both using recent advances in developmental machine learning.
Approach. We designed a simplified neuro-mechanical model that nevertheless retains the complexity that makes current optimization methods fail. This model consists of a single (elbow) joint actuated by two muscles and their associated spindles, alpha and gamma motoneurons, receiving simple (non-dynamic) step commands. To establish the range of movements this system is capable of, we used a goal exploration process that builds a repertoire of valid actions through iterative sampling of target behaviors, combined with stochastic variation of the parameter settings that elicited the closest behaviors in this repertoire. Results obtained with this process were compared to those obtained with alternative optimization methods.
Main results. The goal exploration was found to widely outperform optimization methods in terms of its capacity to rapidly establish a repertoire of valid actions, and to find a large range of behaviors not otherwise found. The resulting repertoire also provides diverse parameter sets for any given actions, akin to what is observed in natural control. Families of solutions originating from few initial seeds should also be exploitable to generate novel behaviors through interpolation.
Significance. The proposed method provides rich perspectives to explore the structure and settings of lower-level neural circuitry, and their associated descending commands, to produce a wide range of useful behaviors. Comparison of behavioral space obtained after selective manipulation of various elements of neuro-mechanical models should also help understand natural control, and promote its emulation in robotics.
An article on this work is currently under review.
Additionally, 35 showed that adding a human in the exploration loop can be key to obtaining interesting mappings. Designing interactive algorithms is thus an important step towards the adoption of automated exploration and discovery of complex systems, as users who previously relied on hand-made heuristics would still need to inject their expert knowledge into the exploration process.
Building on these results, we propose to design interactive software that provides tools to easily use exploration algorithms (e.g. curiosity-driven ones) on various systems. This project faces several challenges, including supporting any complex system (numerical or physical), providing a scalable architecture, and offering a user-friendly interface with efficient and modular visualisation tools.
We propose to use a microservice architecture and leverage Docker to make the software easy to install and modify for users who are not computer scientists. We separate the front-end application, in which the user creates experiments and interacts with them, from the automated discovery process, which makes scalability issues easier to deal with. We chose Python for the machine learning code (as it offers a large community and efficient tools) and recent web tools (e.g. Angular) to provide user-friendly interfaces. See figure 30 for an overview of the functional architecture of our software.
In the context of the project 8.7, we started a collaboration with Bert Chan, an independent researcher on Artificial Life and author of the Lenia system 80. During this collaboration, Bert Chan will help us design versions of IMGEP usable by scientist end-users who are not ML experts, which is the aim of project 8.7.4. Having created the Lenia system himself, he is highly interested in using our algorithms to automatically explore the space of possible emerging structures and will provide valuable insights into end-user habits and concerns. Additionally, we will work with him to expand the set of discovered structures in continuous CAs, as a continuation of project 8.7.2.
Together with Segula Technologies and Sorbonne Université, ENSTA Paris has been working on eXplainable Artificial Intelligence (XAI) in order to make machine learning more interpretable.
While opaque decision systems such as Deep Neural Networks have great generalization and prediction skills, their functioning does not allow obtaining detailed explanations of their behaviour. The objective is to fight the trade-off between performance and explainability by combining connectionist and symbolic paradigms 74.
Broad consensus exists on the importance of interpretability for AI models. However, since the domain has only recently become popular, there is no collective agreement on the different definitions and challenges that constitute XAI. The first step is therefore to summarize previous efforts made in this field. We presented a taxonomy of XAI techniques in 19 and we are currently working on a prediction model that itself generates an explanation of its rationale in natural language while keeping performance as close as possible to the state of the art 74.
Symbolic artificial intelligence methods are experiencing a come-back in order to provide deep representation methods the explainability they lack. In this area, a survey on RDF stores to handle ontology-based triple databases has been contributed 106, as well as the use of neural-symbolic tools that aim at integrating both neural and symbolic representations 74.
A large part of the explainable Artificial Intelligence (XAI) literature focuses on feature relevance techniques to explain a deep neural network (DNN) output, or on explaining models that ingest image source data. However, assessing how XAI techniques can help understand models beyond classification tasks, e.g. for reinforcement learning (RL), has not been extensively studied. With this project 26, we review recent works aiming to attain Explainable Reinforcement Learning (XRL), a relatively new subfield of Explainable Artificial Intelligence, intended to be used in general public applications, with diverse audiences, requiring ethical, responsible and trustable algorithms. In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box. We evaluate mainly studies directly linking explainability to RL, and split these into two categories according to the way the explanations are generated: transparent algorithms and post-hoc explainability. We also review the most prominent XAI works through the lens of how they could potentially enlighten the further deployment of the latest advances in RL, in the demanding present and future of everyday problems. We published this review
In the last years, Artificial Intelligence (AI) has achieved a notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview 19 examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss about a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to XAI with a reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.
The use of artificial intelligence (AI) in a variety of research fields is speeding up multiple digital revolutions, from shifting paradigms in healthcare, precision medicine and wearable sensing, to public services and education offered to the masses around the world, to future cities made optimally efficient by autonomous driving. When a revolution happens, the consequences are not obvious straight away, and to date, there is no uniformly adapted framework to guide AI research to ensure a sustainable societal transition. To answer this need, here we analyze three key challenges to interdisciplinary AI research, and deliver three broad conclusions 27: 1) future development of AI should not only impact other scientific domains but should also take inspiration and benefit from other fields of science, 2) AI research must be accompanied by decision explainability, dataset bias transparency as well as development of evaluation methodologies and creation of regulatory agencies to ensure responsibility, and 3) AI education should receive more attention, efforts and innovation from the educational and scientific communities. Our analysis is of interest not only to AI practitioners but also to other researchers and the general public as it offers ways to guide the emerging collaborations and interactions toward the most fruitful outcomes.
Ethics Guidelines for Trustworthy AI advocate for AI technology that is, among other things, more inclusive. Explainable AI (XAI) aims at making state-of-the-art opaque models more transparent, and defends AI-based outcomes endorsed with a rationale explanation, i.e., an explanation that targets non-technical users. XAI and Responsible AI principles hold that the audience's expertise should be included in the evaluation of explainable AI systems. However, AI has not yet reached all publics and audiences, some of which may need it the most. One example of a domain where accessibility has not been much influenced by the latest AI advances is cultural heritage. In this project 44, we propose including minorities as special users and evaluators of the latest XAI techniques. In order to define catalytic scenarios for collaboration and improved user experience, we pose some challenges and research questions yet to be addressed by the latest AI models likely to be involved in such synergy.
This project is a collaboration with the SISTM team from Inria Bordeaux. Modelling the dynamics of epidemics helps to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lockdown, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty of predicting long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms, such as deep reinforcement learning, might bring significant value. However, the specificity of each domain (epidemic modelling on the one hand, solving optimization problems on the other) requires strong collaborations between researchers from different fields of expertise.
This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym).
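To make the idea of wrapping an epidemiological model and a cost function behind a Gym-style interface concrete, here is a toy sketch. It does not use or reproduce the EpidemiOptim API: the class `SirLockdownEnv`, the discrete-time SIR dynamics, the binary lockdown action, the cost weights and all parameter values are assumptions made for illustration, and the class only mimics the `reset`/`step` convention without depending on the `gym` package.

```python
import numpy as np

class SirLockdownEnv:
    """Toy SIR model exposing Gym-style reset/step; not the EpidemiOptim API."""

    def __init__(self, beta=0.3, gamma=0.1, lockdown_factor=0.3, horizon=100):
        self.beta, self.gamma = beta, gamma
        self.lockdown_factor, self.horizon = lockdown_factor, horizon

    def reset(self):
        self.state = np.array([0.99, 0.01, 0.0])  # susceptible, infected, recovered fractions
        self.t = 0
        return self.state.copy()

    def step(self, action):
        s, i, r = self.state
        # Action 1 (lockdown) reduces the transmission rate; action 0 leaves it unchanged.
        beta = self.beta * (self.lockdown_factor if action == 1 else 1.0)
        new_infections = beta * s * i
        recoveries = self.gamma * i
        self.state = np.array([s - new_infections, i + new_infections - recoveries, r + recoveries])
        self.t += 1
        # The cost combines a health term (new infections) and an economic term (lockdown).
        reward = -(new_infections + 0.01 * action)
        done = self.t >= self.horizon
        return self.state.copy(), reward, done, {}

def rollout(env, policy):
    """Run one episode with a policy mapping observations to actions."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return obs, total

final_free, return_free = rollout(SirLockdownEnv(), lambda obs: 0)
final_lock, return_lock = rollout(SirLockdownEnv(), lambda obs: 1)
```

Any optimizer that can interact with `reset` and `step` (for example a reinforcement learning agent) can then search for intervention policies without knowing the details of the underlying epidemiological model.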
Together with the Hybrid team at INCIA, CNRS (Sébastien Mick, Daniel Cattaert, Florent Paclet, Aymar de Rugy) and Pollen Robotics (Matthieu Lapeyre, Pierre Rouanet), the Flowers team continued to work on a project related to the design and study of myoelectric robotic prosthesis. The ultimate goal of this project is to enable an amputee to produce natural movements with a robotic prosthetic arm (open-source, cheap, easily reconfigurable, and that can learn the particularities/preferences of each user). This will be achieved by 1) using the natural mapping between neural (muscle) activity and limb movements in healthy users, 2) developing a low-cost, modular robotic prosthetic arm and 3) enabling the user and the prosthesis to co-adapt to each other, using machine learning and error signals from the brain, with incremental learning algorithms inspired from the field of developmental and human-robot interaction.
We investigated how participants controlling a humanoid robotic arm's 3D endpoint position by moving their own hand are influenced by the robot's postures. We hypothesized that control would be facilitated (impeded) by biologically plausible (implausible) postures of the robot. Background: Kinematic redundancy, whereby different arm postures achieve the same goal, is such that a robotic arm or prosthesis could theoretically be controlled with fewer signals than constitutive joints. However, congruency between a robot's motion and our own is known to interfere with movement production. Hence, we expect the human-likeness of a robotic arm's postures during endpoint teleoperation to influence controllability. Method: Twenty-two able-bodied participants performed a target-reaching task with a robotic arm whose endpoint's 3D position was controlled by moving their own hand. They completed a two-condition experiment corresponding to the robot displaying either biologically plausible or implausible postures. Results: Upon initial practice in the experiment's first part, endpoint trajectories were faster and shorter when the robot displayed human-like postures. However, these effects did not persist in the second part, where performance with implausible postures appeared to have benefited from initial practice with plausible ones. Conclusion: Humanoid robotic arm endpoint control is impaired by biologically implausible joint coordinations during initial familiarization but not afterwards, suggesting that the human-likeness of a robot's postures is more critical for control in this initial period. Application: These findings provide insight for the design of robotic arm teleoperation and prosthesis control schemes, in order to favor better familiarization and control from their users. This work has been published (HAL identifier hal-03001362).
For a vehicle to navigate autonomously, it needs to perceive its surroundings and estimate the future state of the relevant traffic agents with which it might interact as it navigates across public road networks. Predicting the future state of the perceived entities is a challenge, as these might appear to move in a stochastic manner. However, their motion is constrained to an extent by context, in particular the road network structure. Conventional machine learning methods are mainly trained using data from the perceived entities without considering roads; as a result, trajectory prediction is difficult. In this paper, maps representing the road structure are included in the machine learning process. For this purpose, 3D LiDAR points and maps in the form of binary masks are used. These are fed to a recurrent artificial neural network, an LSTM encoder-decoder based architecture, to predict the motion of the interacting traffic agents. A comparison between the proposed solution and one that is only sensor driven (LiDAR) is included. For this purpose, the NuScenes dataset, which includes annotated 3D point clouds, is used. The results demonstrate the importance of context to enhance prediction performance, as well as the capability of our machine learning framework to incorporate map information.
Our results were published at the 2020 VTC conference 49.
Image captioning models have been able to generate grammatically correct and human understandable sentences. However, most of the captions convey limited information, as the model used is trained on datasets that do not caption all possible objects existing in everyday life. Due to this lack of prior information, most of the captions are biased towards only a few objects present in the scene, hence limiting their usage in daily life. In this paper 39, we attempt to show the biased nature of the currently existing image captioning models and present a new image captioning dataset, Egoshots, consisting of 978 real life images with no captions. We further exploit state-of-the-art pre-trained image captioning and object recognition networks to annotate our images and show the limitations of existing works. Furthermore, in order to evaluate the quality of the generated captions, we propose a new image captioning metric, object based Semantic Fidelity (SF). Existing image captioning metrics can evaluate a caption only in the presence of their corresponding annotations; however, SF allows evaluating captions generated for images without annotations, making it highly useful for real life generated captions.
Deep reinforcement learning (DRL) techniques give robotics research an AI boost in many applications. In order to simultaneously accommodate complex robotic behaviour simulation and DRL algorithm verification, a new simulation platform, namely RobotDrlSim, is proposed 51. First, we design a standardized API interfacing mechanism for coordinating diverse environments on the RobotDrlSim platform, where the PyBullet simulator is equipped with an API to form a physics engine for robotics simulation. Second, benchmark DRL models are included in the baseline library for evaluation. Third, real-time human-robot interactions can be captured and imported to drive the RobotDrlSim tasks, which provides a large data stream for reinforcement learning. Experiments show that cutting-edge DRL algorithms developed in Python can be seamlessly deployed to the robots, and that human interactions can be leveraged to train the robots. RobotDrlSim is suited to efficiently developing DRL algorithms for artificial intelligence models of robots, and it is especially suitable for robotics education.
We developed planning algorithms for an autonomous electric car for Renault SAS, in the continuation of the previous ADCC project. We improved our planning algorithm in order to move toward navigation on open roads, in particular with the ability to reach higher speeds than previously possible, and to deal with more road intersection cases (roundabouts) and with multiple-lane roads (overtaking, insertion...).
Financing of a postdoc grant for a 2 year project with Ubisoft and Région Aquitaine.
Financing of the PhD grant of Rémy Portelas by Microsoft Research.
Financing of the CIFRE PhD grant of Adrien Bennetot by Segula Technologies.
Financing of the CIFRE PhD grant of Mayalen Etcheverry by Poietis.
Financing (in progress) of the CIFRE PhD grant of Maxime Adolphe by Onepoint.
Financing of the CIFRE PhD grant of Rania Abdelghani by EvidenceB.
Financing of the CIFRE PhD grant of Vyshakh Palli-Thazha by Renault.
Financing of the CIFRE PhD grant of Florence Carton by CEA.
Financing of the CIFRE PhD grant of Hugo Caselles-Dupré by Softbank Robotics.
Financing of a one-year postdoctoral position (recruitment in progress) and of the app development by the International Foundation for Applied Research on Disability (FIRAH).
The School+ project consists of a set of educational technologies to promote inclusion for children with Autism Spectrum Disorder (ASD). School+ primarily aims at encouraging the acquisition of socio-adaptive behaviours at school while promoting self-determination (intrinsic motivation), and has been created according to the methods of User-Centred Design (UCD).
At the request of the stakeholders of school inclusion (child, parents, teachers, and clinicians), the Flowers team is working on adding an interactive tool for collaborative and shared monitoring of the school inclusion of each child with ASD. This new app will be assessed in terms of user experience (usability and elicited intrinsic motivation), self-efficacy of each stakeholder and educational benefit for the child. This project includes the Academie de Bordeaux – Nouvelle Aquitaine, the CRA (Health Center for ASD in Aquitania), and the ARI association.
Idex Bordeaux-Univ. Waterloo
Didier Roy and PY Oudeyer have created a collaboration with LSRO EPFL and Pr Francesco Mondada, about Robotics and education.
Didier Roy has created a collaboration with HEP Vaud (Teachers High School) and Bernard Baumberger and Morgane Chevalier, about Robotics and education. Scientific discussions and shared professional training.
Didier Roy has created a collaboration with Biorob - EPFL, LEARN - EPFL, and Canton de Vaud, about Robotics and Computer Science education. Scientific discussions and shared professional training.
PY Oudeyer and H Sauzéon started a collaboration with Daphne Bavelier's research group at the University of Geneva on using machine learning for personalizing exercises in attention training educational software.
PY Oudeyer started a collaboration with Maxime Gasse (MILA, Montreal, Canada), Damien Grasset and Guillaume Gaudron (IRT Saint-Exupery, Toulouse), in the context of the project DEEL, on causal theory and reinforcement learning.
Kevyn Collins-Thompson, Univ. Michigan.
Germán Kruszewski (Facebook AI Research; Title: "The quest for compositional learning"), Guillermo Valle (Univ. Oxford, UK; Title: "Simplicity bias and generalization in deep neural networks"), Ferran Alet (MIT, US; Title: "Meta-learning curiosity algorithms"), Solange Denervaud (Univ. Geneva, Switzerland; Title: "Error monitoring during learning: Neural and behavioral comparison studies of Montessori and traditionally-schooled students"), Hugo Cisneros (CIIRC, CTU in Prague; Title: "Artificial evolution and emergence in complex systems"), Remy van Trijt (Sony CSL Paris, France; Title: "Fluid Construction Grammar").
PY Oudeyer collaborated with Aymar de Rugy, Daniel Cattaert, Mathilde Couraud, Sébastien Mick and Florent Paclet (INCIA, CNRS/Univ. Bordeaux) about the design of myoelectric robotic prostheses based on the Poppy platform, on the design of algorithms for co-adaptation learning between the human user and the prosthesis, and on the use of goal exploration algorithm to study the behaviour of models of neuromuscular systems. This was funded by a PEPS CNRS grant.
C Moulin-Frier obtained an ANR JCJC grant. The project is entitled "ECOCURL: Emergent communication through curiosity-driven multi-agent reinforcement learning". The project starts in Feb 2021 for a duration of 48 months. It will fund a PhD student (36 months) and a Research Engineer (18 months) as well as 4 Master internships (one per year).
Clément Moulin-Frier obtained an Exploratory Action from Inria. The project is entitled "ORIGINS: Grounding artificial intelligence in the origins of human behavior". The project starts in October 2020 for a duration of 24 months. It funds a post-doc position (24 months). Eleni Nisioti has been recruited on this grant.
Didier Roy is a collaborator in the Inria Exploratory Action AIDE "Artificial Intelligence Devoted to Education", led by Frédéric Alexandre (Inria Mnemosyne Project-Team), Margarida Romero (LINE Lab) and Thierry Viéville (Inria Mnemosyne Project-Team, LINE Lab). The aim of this Exploratory Action is to explore to what extent approaches or methods from cognitive neuroscience, linked to machine learning and knowledge representation, could help to better formalize human learning as studied in educational sciences. AIDE is a four-year project running from mid-2020 until 2024.
https://team.inria.fr/mnemosyne/aide/
Partners:
The Adaptiv'Math solution comes from an innovation partnership for the development of a pedagogical assistant based on artificial intelligence. This partnership takes place in the context of a call for projects from the Ministry of Education to develop a pedagogical platform to propose and manage mathematical activities intended for teachers and students of cycle 2.
The role of the Flowers team is to work on the AI of the proposed solution to personalize the pedagogical content for each student. This contribution is based on the work done during the Kidlearn Project and the thesis of Benjamin Clement 82, in which algorithms have been developed to manage and personalize sequences of pedagogical activities. One of the main goals of the team here is to transfer technologies developed in the team to a project with the perspective of industrial scaling.
Clément Moulin-Frier has started a project with InriaTech and the startup Gloo, located in Bordeaux.
Julius Taylor, supervised by Clément Moulin-Frier, obtained an Inria Cordi PhD grant. He started his PhD thesis in November 2020 on model-based emergent communication in multi-agent reinforcement learning.
Hélène Sauzéon and Clément Moulin-Frier obtained a post-doctoral grant for the project entitled "Personalized Intelligent Tutorial Systems (ITS) for attention training: Modelling of personalization algorithms and effectiveness study".
Masataka Sawayama started his postdoctoral position in January 2021.
Clément Moulin-Frier has co-organized the 1st SMILES (Sensorimotor Interaction, Language and Embodiment of Symbols) workshop at ICDL 2020, Nov 2020, Valparaiso / Virtual, Chile. https://sites.google.com/view/smiles-workshop/
PY Oudeyer was a member of the program committee for ICLR, AAAI and NeurIPS.
Clément Moulin-Frier has reviewed for the ICRA conference.
Cédric Colas has reviewed for the ICML, ICLR and NeurIPS conferences.
PY Oudeyer was a reviewer for ICLR, AAAI and NeurIPS.
Didier Roy was a reviewer for PRUNE Conference (Poitiers) and RNRE (IFE ENS Lyon).
PY Oudeyer was a member of the editorial board of IEEE Transactions on Cognitive and Developmental Systems and Frontiers in Neurorobotics.
PY Oudeyer was co-editor of a Research Topic on "Modeling Play in Early Infant Development" in Frontiers in Neurorobotics 31,
as well as of a Research Topic on "Intrinsically Motivated Open-Ended Learning in Autonomous Robots" in Frontiers in Neurorobotics 30.
Clément Moulin-Frier is co-editing a Research Topic in Frontiers: "Emergent Behavior in Animal-inspired Robotics".
Clément Moulin-Frier has reviewed for the Journal of Artificial Intelligence Research (JAIR).
Mayalen Etcheverry has reviewed for the Applied Intelligence (APIN) journal.
Rémy Portelas has reviewed for the IEEE Robotics and Automation Letters (RA-L) and the KI – Künstliche Intelligenz journal.
PY Oudeyer reviewed for the journals IEEE Transactions on Cognitive and Developmental Systems, Journal of the Royal Society Interface, Child Development, Frontiers in Psychology, Handbook of Computational Psychology, and Motivation and Emotion.
Cédric Colas has given an invited talk on the EpidemiOptim project at DeepMind, in the context of an internal seminar.
PY Oudeyer gave a keynote talk at the EGC conference in Brussels, on developmental machine learning, Jan. 2020, https://www.egc.asso.fr/non-classe/conferences-invitees-egc-2020.html
PY Oudeyer gave an invited talk at the Deep RL Workshop of NeurIPS 2020, on intrinsically motivated goal-conditioned reinforcement learning, Dec. 2020, https://slideslive.com/38938095/machines-that-invent-their-own-problems?ref=account-folder-62083-folders
PY Oudeyer gave an invited seminar at the MIT embodied AI seminar, on developmental machine learning, Deep RL and artificial curiosity, April 2020, https://www.youtube.com/watch?v=Jx6-DKXgAKU
PY Oudeyer gave a keynote talk at the Crossmodal Learning Center Autumn School, on Developmental Machine Learning, Curiosity and Deep RL, Dec. 2020. https://www.crossmodal-learning.org/home.html
PY Oudeyer gave an invited talk at the Brain and Cognition seminar at the University of Geneva, on curiosity-driven learning in humans and machines, Oct. 2020. https://listes.unige.ch/sympa/arc/brain-and-cognition/2020-09/msg00000/BC_SEPT_OCT_NOV_DEC_2020.pdf
Didier Roy has given invited talks at the Adaptiv'math project webinar, the Class'code AI webinar and the EPFL learning sciences conference, on Flowers research, on AI for education and on education to AI.
Didier Roy has given invited talks at Réseau Canopé "Mardis du numérique" at Toulon, on computer science basics and activities to teach computer science, robotics and AI.
Didier Roy has given invited talks at the IFE ENS Lyon RNRE Conference.
Didier Roy was invited to participate in the CIDREE European Expert Meeting at IFE ENS Lyon. http://www.cidree.org/cidree-expert-meeting-lyon-january-13-14-2020/. CIDREE is the Consortium of Institutions for Development and Research in Education in Europe.
PY Oudeyer was editor of the Cognitive and Developmental Systems newsletter of the Cognitive and Developmental Systems Technical Committee of the IEEE CIS Society.
PY Oudeyer was elected as Distinguished Speaker of the IEEE Computational Intelligence Society.
PY Oudeyer was a reviewer for the European Commission (FET program), and the ANR.
PY Oudeyer has been a member of the piloting committees of the consortium projects Adaptiv'Maths and Perseverons (eFran) on educational technologies.
PY Oudeyer gave a course on developmental reinforcement learning at ENSEIRB master on AI and machine learning (3h), nov. 2019.
PY Oudeyer gave a course on developmental learning at CogMaster cognitive science master (8h), nov. 2019.
PY Oudeyer gave a course on developmental learning at ENSC/ENSEIRB "option robot" master (3h), dec. 2019.
During the latest academic year, Hélène Sauzéon taught 96h in the BS and master degrees in cognitive science (Department of Mathematics & interaction, University of Bordeaux). She was (co-)responsible for 9 teaching units (3 in BS and 6 in Master).
N Díaz Rodríguez taught, at ENSTA, a total of 3.25h in ROB313, 27h in IN104, 10.5h in IN102 and 21h in IA301. She also gave 42h in IG.2410 at the engineering school ISEP, and a 3h course on Continual Learning and State Representation Learning within the reinforcement learning course of the ENSEIRB master on AI and machine learning, nov. 2019.
Rémy Portelas and Tristan Karch gave a first year introductory course on programming at Université de Bordeaux (64h), sep. 2020 to jan. 2021.
Didier Roy gave courses on computer science basics, and on computer science, robotics and AI activities for education, to Canton de Vaud teachers.
Clément Moulin-Frier gave courses on Robotics and AI at University Pompeu Fabra (Barcelona, Spain, Jan 2020, 10 hours) and Centre de Recherches Interdisciplinaires (Paris, France, Apr 2020, 12 hours).
Maxime Adolphe gave courses on basics of AI (18h) at Ecole Nationale Supérieure de Cognitique (ENSC), sep. 2020 to jan. 2021.
PY Oudeyer was a member of the admissibility jury of the CR1 competition at Inria Bordeaux Sud-Ouest.
PY Oudeyer was a reviewer in the PhD juries of Shoko Ota (OIST, Okinawa, Japan, Title: "Intrinsic Motivation in Creative Activity"); Benoit Choffin (Univ. Paris Saclay, Title: "Algorithmes d'espacement adaptatif de l'apprentissage pour l'optimisation de la maitrise a long terme de composante de connaissance"); Alexis Jacq (EPFL, Switzerland, Title: "Mutual understanding in educational human-robot collaborations"); Thomas Moerland (Univ. Delft, Holland, Title: "The intersection of planning and learning").
PY Oudeyer was in the PhD "comité de suivi" of Ahmed Akazia (Univ. Paris VI), Alexandre Chenu (Univ. Paris VI), Sylvia Pagliarni (Univ. Bordeaux), Arash Rashidi (Univ. Bordeaux), Effie Segas (Univ. Bordeaux).
Hélène Sauzéon organized a selection committee for the recruitment of an Assistant Professor in rehabilitative science (University of Bordeaux).
Hélène Sauzéon was an external member of a selection committee for the recruitment of an Assistant Professor in cognitive psychology (University of Toulouse – Le Mirail).
Hélène Sauzéon carried out several expert evaluations of applications, such as HDR (ED SP2, University of Bordeaux) or local career advancement (University of Bordeaux).
N. Díaz Rodríguez was an invited jury member (President) for the PhD thesis "Deep Learning for Abnormal Movement Detection using Wearable Sensors: Case Studies on Stereotypical Motor Movements in Autism and Freezing of Gait in Parkinson's Disease" at the University of Trento, Italy, May 2019.
C Moulin-Frier was in the PhD "comité de suivi" of Marc-Antoine Georges (Université Grenoble Alpes).
C Moulin-Frier was reviewer of the PhD of Sock Ching Low, entitled "Giving Centre Stage to Top-Down Inhibitory Mechanisms for Selective Attention", University Pompeu Fabra, Spain, Dec. 2020.
Didier Roy was managing editor of a 370-page computer science school textbook for kindergarten and elementary schools (collaboration Inria/EPFL/Canton de Vaud, Switzerland).
Didier Roy was managing editor of the EPFL MOOC "E-NUM", a major contribution to training people, especially teachers, in computer science and digital sociology (collaboration Inria/EPFL/Canton de Vaud, Switzerland).
Didier Roy and PY Oudeyer published an illustrated educational book on artificial intelligence and robotics for 7-8 year old children, Nathan, see https://www.nathan.fr/catalogue/fiche-produit.asp?ean13=9782092593295 and https://dproy.wordpress.com/.
Didier Roy was one of the authors of the Inria White Paper "Education and Digital, Challenges and Issues" https://hal.inria.fr/hal-03051329
Didier Roy was interviewed by Jérémy Dres, which was reported in a chapter of his comic "Les défis de l'intelligence artificielle".
PY. Oudeyer and S. Forestier were interviewed and appeared in a video documentary on Netflix, called "Babies", on models of infant development and curiosity-driven learning, https://en.wikipedia.org/wiki/Babies_(TV_series).
PY Oudeyer was interviewed by S. Paoli, which was reported in a chapter of the book "Ce qui vient", http://www.editionslesliensquiliberent.fr/livre-Ce_qui_vient-9791020908940-1-1-0-1.html.
H. Sauzéon was interviewed to present her research activities in a general-audience blog post at https://www.inria.fr/fr/helene-sauzeon-psychologie-realite-virtuelle.
H. Sauzéon was interviewed to present the results of the AIANA educational software to a general audience at https://www.inria.fr/fr/bilan-du-logiciel-aiana-des-resultats-dapprentissage-ameliores
C Moulin-Frier and L Chevillot wrote a general-audience web article on the new Exploratory Action ORIGINS: Grounding Artificial Intelligence in the Origins of Human Behavior. https://www.inria.fr/fr/origins-ancrer-lintelligence-artificielle-dans-les-origines-des-comportements-humains
Cédric Colas helped design a web interface for the EpidemiOptim project: https://epidemioptim.bordeaux.inria.fr/. Users can interact with lock-down intervention strategies trained with machine learning to mitigate health and economic costs in the context of simulated COVID-19 epidemics. They can see the effect of various intervention strategies, observe how these strategies react to different parameters (sensitivity towards health vs economic costs) and design their own intervention strategies.
Mayalen Etcheverry wrote an interactive blogpost on the paper of Reinke et al. (2020) "Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems" published at ICLR 2020. https://developmentalsystems.org/intrinsically_motivated_discovery_of_diverse_patterns.
Didier Roy has reviewed contents of the Class'code IAI MOOC.
P.-Y. Oudeyer, B. Clément and L. Teodorescu made interventions as part of the "Le Procès du robot" animation at Cap Sciences. The goal was to present in layman's terms the research done at the lab to an audience of junior high school students, and to foster discussion among them around an imagined scenario about the legal responsibility of a domestic robot having caused a minor accident in a home. The web page of the intervention can be found here: https://www.cap-sciences.net/vous-etes/espace-enseignants/proces-robot.
Pierre-Yves Oudeyer made several popular science interventions at Ecole Primaire AygueMarine (Ayguemorte-les-Graves) and at College de Cadillac (of which he is "parrain scientifique" in the context of "Maison des sciences").