Section: New Software and Platforms
Tools for robot learning, control and perception
CARROMAN
Functional Description
This software implements a control architecture for the Meka humanoid robot. It integrates the Stanford Whole Body Control into the M3 architecture provided with the Meka robot, and provides clear, easy-to-use interfaces through the URBI scripting language. It offers a modular library of control modes and basic skills for manipulating objects and for detecting objects and humans, which other research projects can reuse, extend and enhance. An example would be to locate a cylindrical object on a table using stereo vision and grasp it using position and force control.
Aversive++
Functional Description
Aversive++ is a C++ library that eases micro-controller programming. Its aim is to provide an interface simple enough for building complex applications, and optimized enough for small micro-controllers to execute them. The library is also multiplatform: it is designed to provide the same API for a simulator (named SASIAE) and for AVR-based and ARM-based micro-controllers.
DMP-BBO
Black-Box Optimization for Dynamic Movement Primitives
Functional Description
The DMP-BBO Matlab library is a direct consequence of the insight that black-box optimization outperforms reinforcement learning when using policies represented as Dynamic Movement Primitives. It implements several variants of the PIBB algorithm for direct policy search. The dmp-bbo C++ library has been extended to include the “unified model for regression”. The implementations of several of the function approximators have been made real-time compatible.
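To illustrate the kind of direct policy search described above, the following is a minimal sketch of a black-box update by reward-weighted averaging, in the spirit of PIBB. It is not the DMP-BBO API: the function names (`pibb_update`, `optimize`), the eliteness parameter `h`, the exploration decay schedule and the isotropic Gaussian exploration are all assumptions made for illustration, and the policy parameters are treated as a plain vector rather than as the parameters of a Dynamic Movement Primitive.

```python
import math
import random


def pibb_update(mean, sigma, cost_fn, n_samples=20, h=10.0, rng=None):
    """One update: perturb the parameters, evaluate each rollout as a
    black box, then average the samples weighted by their (low) cost."""
    rng = rng or random.Random()
    # Exploration: isotropic Gaussian perturbations of the current mean.
    samples = [[m + rng.gauss(0.0, sigma) for m in mean]
               for _ in range(n_samples)]
    # Black-box evaluation: only one scalar cost per sample is used.
    costs = [cost_fn(s) for s in samples]
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # avoid division by zero
    # Map costs to weights: the lowest cost gets the largest weight.
    weights = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Reward-weighted averaging gives the new parameter mean.
    return [sum(w * s[i] for w, s in zip(weights, samples))
            for i in range(len(mean))]


def optimize(cost_fn, mean, sigma=0.5, n_updates=50, decay=0.95, seed=0):
    """Repeat the update while shrinking the exploration magnitude."""
    rng = random.Random(seed)
    for _ in range(n_updates):
        mean = pibb_update(mean, sigma, cost_fn, rng=rng)
        sigma *= decay  # assumed schedule: simple exponential decay
    return mean
```

In a DMP setting, `cost_fn` would execute the movement generated by the sampled parameters and return its total cost; here any function from a parameter list to a scalar works.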
KERAS-QR
KERAS with Quick Reset
Keywords: Library - Deep learning
Multimodal
Functional Description
The Python code provides a minimal set of tools and associated libraries to reproduce the experiments in [98], together with the choreography datasets. The code is primarily intended for reproducing the multimodal learning experiment mentioned above. It has already been reused in several experiments by other members of the team and is expected to play an important role in further collaborations. The public availability of the code is also expected to encourage further experimentation by other scientists with data from other domains, thus increasing both the impact of the aforementioned publication and the knowledge of the algorithm's behavior.
Of 3-D point cloud
Functional Description
This software scans the 3-D point cloud of a scene to find objects and match them against a database of known objects. The process consists of three stages: the segmentation stage finds the objects in the point cloud, the feature extraction stage computes discriminating properties, and the classification stage uses these properties for object recognition.
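The three stages can be sketched as a small pipeline. This is an illustrative sketch only: the source does not say which algorithms the software uses, so the region-growing segmentation, the bounding-box descriptor, the nearest-neighbour classifier and all function names below are assumptions.

```python
import math


def segment(points, radius=0.5):
    """Stage 1: group points into objects by region growing, linking
    any two points that lie within `radius` of each other."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append([points[i] for i in cluster])
    return clusters


def extract_features(cluster):
    """Stage 2: a crude descriptor, the sorted bounding-box extents."""
    extents = [max(p[k] for p in cluster) - min(p[k] for p in cluster)
               for k in range(3)]
    return tuple(sorted(extents))


def classify(features, database):
    """Stage 3: return the label of the closest database descriptor."""
    return min(database, key=lambda label: math.dist(features, database[label]))


def recognize(points, database, radius=0.5):
    """Run segmentation, feature extraction and classification in turn."""
    return [classify(extract_features(c), database)
            for c in segment(points, radius)]
```

Here `database` is assumed to map an object label to a reference descriptor, for example `{"plate": (0.0, 1.0, 1.0), "pole": (0.0, 0.0, 2.0)}`. A real system would use richer descriptors and a trained classifier.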
PEDDETECT
Functional Description
PEDDETECT implements real-time person detection in indoor or outdoor environments. It can grab image data directly from one or several USB cameras, as well as from pre-recorded video streams. It detects multiple persons in 800x600 color images at frame rates above 15 Hz, depending on available GPU power. In addition, it classifies the pose of each detected person into one of four categories: "seen from the front", "seen from the back", "facing left" and "facing right". The software makes use of advanced feature computation and nonlinear SVM techniques, which are accelerated using the CUDA interface to GPU programming to achieve high frame rates. It was developed in the context of an ongoing collaboration with Honda Research Institute USA, Inc.
ThifloNet
Keywords: Deep learning - Policy Learning
Scientific Description
We created a software architecture that combines a state-of-the-art computer vision system with a policy learning framework. This system is able to perceive a visual scene, given by a still image, extract facts (“predicates”), and propose an optimal action to achieve a given goal. Both systems are chained into a pipeline that is trained by presenting images and demonstrating an optimal action. By providing this information, both the predicate recognition model and the policy learning model are updated.
Our architecture is based on the recent work of Lerer, A., Gross, S., & Fergus, R., 2016 ("Learning Physical Intuition of Block Towers by Example"). They created a large network able to identify physical properties of stacked blocks. Analogously, our vision system uses the same network layout (without the image prediction auxiliary output), with an added output layer for predicates, sized according to the expected number and arity of predicates. The vision subsystem is not trained with a common cross-entropy or MSE loss function, but instead receives its loss from the policy learning subsystem. The policy learning module calculates the loss as the optimal combination of predicates for the given expert action.
By using this combination of systems, the architecture as a whole requires significantly fewer data samples than other systems (which exclusively utilize neural networks). This makes the approach more feasible for real-life applications with actual live demonstrations.
Functional Description
The neural network consists of ResNet-50 (the currently best-performing computer vision system), a 50-layer network, followed by 2 layers that convert the ResNet output to predicates and a varying number of output neurons, corresponding to the estimated number of n-ary predicates. The network was pretrained on the ImageNet dataset. The policy learning module incorporates the ACE tree learning tool and a wrapper in Prolog.
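As a rough illustration of how the size of the output layer could follow from the number and arity of predicates, the sketch below enumerates one output neuron per grounded predicate. The predicate names (`on`, `clear`), the choice to allow every ordered argument tuple (including repeated arguments) and the thresholding step are assumptions for illustration, not the encoding actually used by ThifloNet.

```python
from itertools import product


def ground_predicates(objects, predicates):
    """List every grounded predicate; each one maps to an output neuron.

    `predicates` is a list of (name, arity) pairs. With n objects, a
    predicate of arity k contributes n**k neurons under this encoding.
    """
    atoms = []
    for name, arity in predicates:
        for args in product(objects, repeat=arity):
            atoms.append((name, args))
    return atoms


def decode(activations, atoms, threshold=0.5):
    """Turn output activations into the facts considered true."""
    return [atom for atom, a in zip(atoms, activations) if a >= threshold]


blocks = ["red", "blue", "green", "yellow"]
atoms = ground_predicates(blocks, [("on", 2), ("clear", 1)])
print(len(atoms))  # 4**2 + 4**1 = 20 output neurons
```

The decoded facts are what a downstream policy learner would consume; under this assumed encoding, fewer blocks in the scene simply means fewer grounded predicates.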
Our example domain consists of 2-4 cubes colored in red, blue, green, and yellow and randomly stacked on top of each other in a virtual 3D environment. The dataset used for training and testing contains a total of 30000 elements, each with an image of the scene, the correct predicates, a list of the blocks that are present, and the corresponding expert action that would lead to stacking the blocks into a tower.