Section: New Results

Representation Learning

Participants : David Filliat [correspondant] , Celine Craye, Yuxin Chen, Clement Masson, Adrien Matricon, Freek Stulp.

Incremental Learning of Object-Based Visual Saliency

Searching for objects in an indoor environment can be drastically improved if a task-specific visual saliency is available. We describe a method to learn such an object-based visual saliency in an intrinsically motivated way using an environment exploration mechanism. We first define saliency in a geometrical manner and use this definition to discover salient elements given an attentive but costly observation of the environment. These elements are used to train a fast classifier that predicts salient objects given large-scale visual features. To make learning faster and more accurate, we use intrinsic motivation, based on uncertainty and novelty detection, to drive observation selection. Our approach has been tested on RGB-D images, runs in real time, and outperforms several state-of-the-art methods for indoor object detection. We published these results in two conferences [78], [77].

Cross-situational noun and adjective learning in an interactive scenario

Learning word meanings during natural interaction with a human faces noise and ambiguity that can be resolved by analysing regularities across different situations. We propose a model of this cross-situational learning capacity and apply it to learning nouns and adjectives from noisy and ambiguous speech and continuous visual input. We compared two different topic models for this task: Non-Negative Matrix Factorization and Latent Dirichlet Allocation. We present experiments on learning object names and color names that show the performance of these models on realistic data, and we show how active learning can be used to speed up learning by letting the learner choose the objects to be described. We published these results in a conference paper [75].

Learning representation with gated auto-encoders

We investigated algorithms able to learn relevant visual or multi-modal features from data recorded while the robot performs a task. Representation learning is currently a very active research field, mainly focused on deep learning. It investigates how to compute more meaningful features from raw high-dimensional input data, providing a more abstract representation from which it should be easier to make decisions or deductions (e.g., classification, prediction, control, reinforcement learning). In the context of robotics, it is particularly interesting to apply representation learning in a temporal and multi-modal approach that exploits vision and proprioception, so as to find features that are relevant for building models of the robot itself, of its actions and of their effects on the environment.

Among the many existing approaches, we decided to explore the use of gated auto-encoders [104], a particular kind of neural network that includes multiplicative connections, as they seem well adapted to this problem. Preliminary experiments have been carried out with gated auto-encoders to learn transformations between two images. We observed that Gated Auto-Encoders (GAE) can successfully find compact representations of simple transformations such as translation, rotation or scaling between two small images. This is however not directly scalable to realistic images, such as those acquired by a robot's camera, because of the number of parameters, memory size and computational power it would require (unless the image is drastically downsampled, which induces a significant loss of information). In addition, the transformation taking one image to the next can be a combination of transformations due to the movement of several objects in the field of view, composed with the global movement of the camera. The number of possible transformations to model is therefore exponential, and the basic GAE architecture is not suited to this.
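The scaling issue can be made concrete with a minimal sketch of a gated auto-encoder forward pass. It assumes a factored formulation, in which both images are projected onto filters, multiplied element-wise and pooled into mapping units; this is an illustrative assumption, not necessarily the exact model used in our experiments. The class name FactoredGAE, the helper gae_parameter_count, the layer sizes (200 factors, 50 mapping units) and the image sizes are all hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gae_parameter_count(n_pixels, n_factors, n_mappings):
    """Weights of a factored GAE: two filter banks plus one pooling matrix."""
    return 2 * n_factors * n_pixels + n_mappings * n_factors


class FactoredGAE:
    """Illustrative factored gated auto-encoder (forward pass only, untrained)."""

    def __init__(self, n_pixels, n_factors, n_mappings, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(0.0, 0.01, (n_factors, n_pixels))    # filters applied to image x
        self.V = rng.normal(0.0, 0.01, (n_factors, n_pixels))    # filters applied to image y
        self.W = rng.normal(0.0, 0.01, (n_mappings, n_factors))  # pooling over factors

    def encode(self, x, y):
        # Multiplicative connections: element-wise product of the two filter responses.
        return sigmoid(self.W @ ((self.U @ x) * (self.V @ y)))

    def decode(self, x, m):
        # Reconstruct y from x and the transformation code m.
        return self.V.T @ ((self.U @ x) * (self.W.T @ m))


# A 13x13 patch stays small, while a 640x480 image needs over a hundred million weights.
small = gae_parameter_count(13 * 13, 200, 50)
large = gae_parameter_count(640 * 480, 200, 50)
print(small, large)
```

With these assumed sizes, the parameter count grows linearly with the number of pixels, from 77,600 weights for a 13x13 patch to 122,890,000 for a 640x480 image, which illustrates why the basic architecture does not transfer directly to full camera images.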

Incremental Learning in high dimensions

Participants : Alexander Gepperth [correspondant] , Cem Karaoguz.

Incremental learning in data spaces of high dimensionality

Existing incremental learning algorithms in robotics, such as LWPR, have achieved a relatively high degree of usability thanks to their reduced number of free model parameters. Such algorithms are usually applied with very good success to low-dimensional tasks such as grasping, as the incremental learning paradigm is very appropriate to the robotics domain in general, especially in interactive scenarios. On the other hand, the partitioning of input space performed by LWPR and related approaches is no longer applicable when the data dimension exceeds 50 elements, since the covariance matrices they use grow quadratically in size w.r.t. data dimensionality. The incremental treatment of visual information is therefore difficult, particularly for the recognition and classification of objects or obstacles in general. To remedy this, we developed PROPRE [130], an incremental learning algorithm of fixed model complexity that can easily deal with data dimensionalities of 10,000 and beyond. Its only assumption is the one that is explicitly made for LWPR: that the data has structure, i.e., lies on a low-dimensional sub-manifold. We demonstrated the feasibility of the algorithm on several realistic datasets: MNIST and a much more challenging visual pedestrian pose recognition task from the intelligent vehicle domain [65].
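The quadratic growth mentioned above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one full covariance-like matrix per local model, stored as 64-bit floats; the function names are hypothetical and the storage format is an assumption, not a description of any particular LWPR implementation.

```python
def covariance_entries(dim: int) -> int:
    """Number of entries in a full dim x dim matrix."""
    return dim * dim


def covariance_megabytes(dim: int, bytes_per_entry: int = 8) -> float:
    """Memory for one such matrix, assuming 8-byte floats by default."""
    return covariance_entries(dim) * bytes_per_entry / 1e6


for dim in (50, 1000, 10000):
    print(dim, covariance_entries(dim), covariance_megabytes(dim))
```

Under these assumptions, a single matrix holds 2,500 entries at dimension 50 but 100 million entries (800 MB) at dimension 10,000, and a partitioning approach needs one such matrix per local model.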

Incremental learning with two memory systems

In order to increase PROPRE's ability to react quickly to changes in data statistics (e.g., a newly added visual class) while at the same time avoiding fast forgetting, a second, short-term memory system was proposed for PROPRE in [65]. This short-term memory is filled when task failures occur and is used to re-train the incremental long-term memory at a later time and on a slower time scale. In this way, abrupt changes in data statistics may be reacted to immediately, whereas the long-term memory retains its stability, which ensures that any forgetting happens gradually, on a well-defined time scale.
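The interplay between the two memories can be illustrated with a deliberately simplified sketch. It is not the PROPRE implementation: the long-term memory is replaced here by one slowly moving prototype per class, the short-term memory by a bounded buffer of failure cases, and all names and parameters (TwoMemoryLearner, slow_rate, capacity, radius) are hypothetical.

```python
from collections import deque

import numpy as np


class TwoMemoryLearner:
    """Hypothetical sketch: a fast failure buffer in front of a slow learner."""

    def __init__(self, slow_rate=0.05, capacity=100, radius=0.5):
        self.prototypes = {}                      # long-term memory: label -> prototype
        self.short_term = deque(maxlen=capacity)  # short-term memory: recent failures
        self.slow_rate = slow_rate                # small step size = slow time scale
        self.radius = radius                      # how close a stored failure must be

    def long_term_predict(self, x):
        if not self.prototypes:
            return None
        return min(self.prototypes,
                   key=lambda c: np.linalg.norm(x - self.prototypes[c]))

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # Recent failures take precedence, giving an immediate reaction to change.
        for stored_x, label in reversed(self.short_term):
            if np.linalg.norm(x - stored_x) < self.radius:
                return label
        return self.long_term_predict(x)

    def observe(self, x, label):
        x = np.asarray(x, dtype=float)
        # The short-term memory is filled only when the task fails.
        if self.predict(x) != label:
            self.short_term.append((x, label))

    def consolidate(self):
        # Later, replay stored failures into the long-term memory in small steps.
        for stored_x, label in self.short_term:
            if label not in self.prototypes:
                self.prototypes[label] = stored_x.copy()
            else:
                p = self.prototypes[label]
                p += self.slow_rate * (stored_x - p)
```

In this sketch, a sample of a new class is answered correctly right after the first failure, because the buffer is consulted first, while the long-term prototypes only change when consolidate is called.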

Steps towards incremental deep learning

Since PROPRE is a neural architecture with just one hidden layer, its capacity is limited. Steps were therefore taken to create deeper hierarchies with PROPRE, in a fashion analogous to current deep learning approaches. First of all, it was shown that a deep PROPRE architecture can achieve the same classification accuracy on MNIST as a shallow one but at a significantly lower computational cost [86]. Furthermore, it was shown that a deep PROPRE architecture is capable of change detection at multiple levels, a prerequisite for incremental learning [87]. Next steps will consist of creating regular deep PROPRE architectures and testing them on currently accepted machine learning benchmark tasks.

Real-world application of incremental learning

In [88], the incremental PROPRE algorithm was applied to object recognition and detection problems in the domain of intelligent vehicles. It was shown that, by re-casting pedestrian detection as an incremental learning problem where the background class is added only after learning the pedestrian class, the number of model resources required for representing the background is reduced, and better accuracy can be obtained.

Measuring Uncertainty in Deep Learning Networks

Participants : Florian Golemo [correspondant] , Manuel Lopes.

As a precursor to the main objective of the IGLU project, we investigated methods that would enable deep neural networks to judge their knowledge about a domain.

Neural networks, especially deep ones, have been shown to be able to model arbitrarily complex problems, and thus offer powerful tools for machine learning. Yet they have a significant flaw: they are not inherently able to represent the certainty of their predictions. Adding a measure of uncertainty to neural networks would allow this technology to be applied to autonomous exploration and open-ended learning tasks.

Thus the goal of this project was to find a method to measure how much knowledge a neural network has about an unlabeled data item (a measure of uncertainty), and to apply this new measure in an active learning context. The objective of the latter was to demonstrate that the measure is efficient at selecting interesting data, so as to optimally extend the system's own capabilities.

We were successful in finding a measure of uncertainty that reliably distinguishes data that the network has seen before from data that is generally unfamiliar to the network. This measure was obtained by computing the entropy of the network's last layer across a batch of stochastic samples generated by adding Poisson noise to the inputs.
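A minimal sketch of such a measure is given below. It makes several assumptions that go beyond the description above: the last layer is taken to be a softmax, the entropy is computed on the softmax output averaged over the noisy copies, and the network is replaced by a small random linear model. The names predictive_entropy, predict_proba, n_samples and lam are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predictive_entropy(predict_proba, x, n_samples=50, lam=1.0, rng=None):
    """Entropy of the mean output over Poisson-perturbed copies of input x.

    predict_proba maps an array of shape (n, d) to class probabilities (n, k).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Stochastic samples: the same input with independent Poisson noise added.
    noisy = x[None, :] + rng.poisson(lam, size=(n_samples, x.size))
    mean_probs = predict_proba(noisy).mean(axis=0)
    return float(-np.sum(mean_probs * np.log(mean_probs + 1e-12)))


# Stand-in "network": a random linear layer followed by a softmax (assumption).
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 4))


def toy_network(batch):
    return softmax(batch @ weights)


score = predictive_entropy(toy_network, rng.random(16), rng=rng)
print(score)  # between 0 (confident) and log(4) (maximally uncertain)
```

In an active learning loop, unlabeled items with the highest score would be the candidates to query first.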

The measure, however, failed to outperform random sampling in several active learning scenarios. Yarin Gal published related work as part of his dissertation [129] after this project was concluded. He explained that deep neural networks are very effective at cancelling out input noise. He suggested using existing "Dropout" layers for stochastic sampling instead, but reached the same conclusion of using the last-layer entropy as a measure of uncertainty.

Learning models by minimizing complexity

We introduce COCOTTE (COnstrained Complexity Optimization Through iTerative merging of Experts), an iterative algorithm for discovering discrete, meaningful parameterized skills and learning explicit models of them from a set of behaviour examples. We show that forward-parameterized skills can be seen as smooth components of a locally smooth function and, framing the problem as the constrained minimization of a complexity measure, we propose an iterative algorithm to discover them. This algorithm fits well in the developmental robotics framework, as it does not require any external definition of a parameterized task, but discovers skills parameterized by the action from data. We show an application of our method to a simulated setup featuring a robotic arm interacting with an object. This work was published in a conference paper [83].