Section: New Results

Autonomous Development of Representations

Open-ended bootstrapping of new sensorimotor representations

Participant : Alexander Gepperth.

We have explored a novel approach to the open-ended development of internal representations in autonomous agents, addressing in particular the transfer of knowledge between different modalities or abstraction levels. We propose a self-organized neural learning paradigm, termed PROPRE (projection-prediction), that is driven by predictability: competitive advantages are given to those feature-sensitive elements that are inferable from activity in a reference representation, which may be innate or previously formed by learning. To generate and adapt the new induced representations, PROPRE implements a bi-directional interaction of clustering ("projection") and inference ("prediction"); the key ingredient is an easy-to-compute online measure of predictability, by which the projection step is encouraged to favor sensitivity to predictable clusters. We demonstrated the potential of this paradigm in several simulation experiments with synthetic inputs. We showed that induced representations are indeed significantly more sensitive to predictable stimuli, that they are continuously adapted to changing input statistics, and that their behavior under severe resource constraints is favorable.

The contribution of context information to object detection in intelligent vehicles

Participants : Alexander Gepperth, Michael Garcia Ortiz.

In this work package, we explored the potential contribution of multimodal context information to object detection in an "intelligent car". The car platform used incorporates several sophisticated processing subsystems, both for the detection of objects from local visual patterns and for the estimation of global scene properties, e.g., the shape of the road area or the 3D position of the ground plane (sometimes denoted "scene context" or simply "context"). Annotated data recorded on this platform is publicly available as the "HRI RoadTraffic" vehicle video dataset, which formed the basis for this investigation.

In order to quantify the contribution of context information, we investigated whether it can be used to infer object identity with little or no reference to local patterns of visual appearance. Using a challenging vehicle detection task based on the "HRI RoadTraffic" dataset, we trained selected algorithms to estimate object identity from context information alone. In the course of our performance evaluations, we also analyzed the effect of typical real-world conditions (added noise, high dimensions, environmental variation) on context model performance.

As a principal result, we showed that learning context models is feasible with all tested algorithms, and that object identity can be estimated from context information with accuracy similar to that of local pattern recognition methods. We also found that the use of basis function representations (also known as "population codes") allows the simplest, and therefore most efficient, learning methods to perform best in the benchmark, suggesting that the use of context is feasible even in systems operating under strong performance constraints.
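The population-code idea can be sketched briefly: a scalar context feature is expanded into the activities of Gaussian basis functions, after which even a plain linear learner can fit nonlinear dependencies on the feature. The unit count, range and width below are illustrative choices, not the values used in the experiments:

```python
import numpy as np

def population_code(x, n_units=10, lo=0.0, hi=1.0, sigma=0.1):
    """Encode a scalar feature as the activities of Gaussian basis
    functions with centers spread uniformly over [lo, hi] (a
    "population code").  Parameters are illustrative."""
    centers = np.linspace(lo, hi, n_units)
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
```

The unit whose center is nearest to the input responds most strongly, so the scalar is turned into a smooth, localized activity pattern that a linear readout can exploit.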

Discovering object concept through developmental learning

Participants : Natalia Lyubova, David Filliat.

The goal of this work is to design a visual system for a humanoid robot. Taking inspiration from children's perception and following the principles of developmental robotics, the robot should detect and learn objects from interactions with people and from experiments it performs with objects, avoiding the use of image databases or of a separate training phase. In our model, all knowledge is therefore acquired iteratively from low-level features, building up hierarchical object models that are robust to changes in the environment, the background and camera motion.

In our scenario, people in front of the robot interact with objects to encourage the robot to focus on them. We therefore assume that the robot is attracted by motion, and we segment candidate objects by clustering the optical flow. Additionally, the depth information from a Kinect is used to filter the visual input according to the constraints of the robot's working area, and to refine the object contours obtained from motion segmentation.
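The combination of the two cues can be sketched as a per-pixel mask: a pixel belongs to a candidate object when it both moves and lies inside the working area. The motion threshold and depth range below are hypothetical values, and the real system clusters the flow field rather than thresholding it:

```python
import numpy as np

def segment_moving_object(flow, depth, motion_thresh=1.0,
                          depth_range=(0.3, 1.2)):
    """Toy version of the segmentation step: keep pixels that both move
    (optical-flow magnitude above a threshold) and lie inside the
    robot's working area (Kinect depth range, in metres; all values
    are illustrative)."""
    magnitude = np.linalg.norm(flow, axis=-1)       # flow: H x W x 2
    moving = magnitude > motion_thresh
    in_range = (depth > depth_range[0]) & (depth < depth_range[1])
    return moving & in_range
```

Pixels that move but are too far away (e.g. a person walking in the background) are rejected by the depth test, which is the filtering role the Kinect plays here.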

The appearance of objects is encoded following the Bag of Visual Words approach with incremental dictionaries. We combine several complementary features to maximize the completeness of the encoded information (SURF descriptors and superpixels with associated colors) and construct pairs and triples of these features to capture local geometry. These features make it possible to decide whether the current view has already been seen. A multi-view object model is then constructed by associating recognized views with views tracked during object motion.
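An incremental dictionary can be sketched with a simple distance-threshold rule: a descriptor is assigned to its nearest existing visual word, or becomes a new word if none is close enough. The rule and the `radius` value are our own illustration; the actual quantization used in the system is not specified here:

```python
import numpy as np

class IncrementalDictionary:
    """Toy incremental bag-of-visual-words dictionary (illustrative)."""

    def __init__(self, radius=0.5):
        self.words = []          # list of 1-D descriptor arrays
        self.radius = radius

    def quantize(self, desc):
        """Return the word index for a descriptor, creating a new word
        if no existing word is within `radius`."""
        desc = np.asarray(desc, dtype=float)
        if self.words:
            dists = [np.linalg.norm(desc - w) for w in self.words]
            i = int(np.argmin(dists))
            if dists[i] < self.radius:
                return i                        # existing word
        self.words.append(desc)
        return len(self.words) - 1              # new word created

    def histogram(self, descriptors):
        """Normalized bag-of-words histogram for one object view."""
        idx = [self.quantize(d) for d in descriptors]
        h = np.bincount(idx, minlength=len(self.words)).astype(float)
        return h / h.sum()
```

Because the dictionary grows only when a genuinely new descriptor appears, no offline training phase over an image database is needed, matching the developmental setting described above.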

This system is implemented on the iCub humanoid robot, which detects objects in the visual space and characterizes their appearance, their relative position and their occurrence statistics. Ten objects were presented in the current experiment; each of them was manipulated by a person for 1-2 minutes. Once the vocabulary had reached a sufficient size, the robot was able to reliably recognize human hands and most of the objects.

Scaling-up Knowledge for a Cognizant Robot

Participants : Thomas Degris, Joseph Modayil.

A cognizant robot is a robot with a deep and immediately accessible understanding of its interaction with the environment, an understanding that the robot can use to flexibly adapt to novel situations. Such a robot will need a vast amount of situated, revisable, and expressive knowledge to display flexible intelligent behaviors. Instead of relying on human-provided knowledge, we consider the case where a robot autonomously acquires pertinent knowledge directly from everyday interaction with its environment. We study how existing ideas in reinforcement learning theory can be used to formalize knowledge, and we use reinforcement learning techniques to enable a robot to maintain and improve its own knowledge. We consider a robot performing a continual learning process that scales up knowledge acquisition to a large number of facts, skills and predictions. This knowledge has semantics grounded in sensorimotor experience and can then be used for more abstract processes such as planning. We see the development of more cognizant robots as a necessary step towards broadly competent robots.
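One way such grounded predictive knowledge can be formalized is as a value function learned online by temporal-difference methods: the robot predicts the discounted sum of a sensor-derived signal from its current feature vector. The sketch below is a generic linear TD(0) learner, our own minimal illustration rather than the exact formulation of the paper:

```python
import numpy as np

def td_prediction(features, cumulants, gamma=0.9, alpha=0.1):
    """Learn one grounded prediction with linear TD(0): the value of a
    feature vector approximates the expected discounted sum of a
    sensor-derived signal (the "cumulant").  Illustrative sketch."""
    w = np.zeros(features.shape[1])
    for t in range(len(cumulants) - 1):
        v, v_next = w @ features[t], w @ features[t + 1]
        delta = cumulants[t] + gamma * v_next - v    # TD error
        w += alpha * delta * features[t]             # online weight update
    return w
```

Because each prediction is defined entirely in terms of the sensorimotor stream, thousands of such learners can in principle run in parallel, which is how knowledge acquisition scales up.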

A paper on this work, "Scaling-up Knowledge for a Cognizant Robot", was accepted at the AAAI Spring Symposium 2012 "Designing Intelligent Robots: Reintegrating AI".

Learning parallel combinations of motor primitives from demonstration and linguistic guidance with non-negative matrix factorization

Participants : Olivier Mangin, Pierre-Yves Oudeyer.

We have developed and experimentally evaluated a novel approach to joint language and motor learning from demonstration. It enables the discovery of a dictionary of motor and linguistic primitives that can be combined in parallel to represent the training data, as well as novel skills expressed as combinations of known skills. These methods and our experimental results address two main issues of developmental robotics: 1) symbol grounding for language learning; 2) achieving compositionality in motor learning from demonstration, which enables re-using knowledge and thus scaling to complex tasks. In particular, we are interested in learning motor primitives that are active in parallel, a less explored way of combining such primitives. To address these challenges, we have explored the use of non-negative matrix factorization to discover motor primitives from histogram representations of data acquired from real demonstrations of dancing movements. Initial results were presented in [30] and further results are presented in an article under review.
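The factorization step can be sketched with plain multiplicative-update NMF (Lee and Seung's rule): a nonnegative data matrix V is approximated as W H, where the columns of W play the role of primitives and the columns of H give their parallel activation strengths. The dimensions and iteration count below are generic illustrations, not the setup of the actual experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k, n_iter=500, eps=1e-9):
    """Multiplicative-update NMF: find W (n x k) and H (k x m) with
    nonnegative entries such that V is approximately W @ H."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # updates preserve nonnegativity and decrease ||V - WH||
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because both factors are constrained to be nonnegative, each demonstration histogram is explained as an additive mixture of primitives, which is exactly the parallel-combination structure sought here.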