Section: New Results

Deep Learning and Information Theory

Convergence proofs for recurrent networks

In his PhD, defended in December 2017 under the supervision of Yann Ollivier [3], Pierre-Yves Massé obtained the first rigorous convergence results for the online training of recurrent neural networks, by viewing such training through the lens of dynamical systems.

Fast algorithms for recurrent networks

Corentin Tallec (in his ongoing PhD) and Yann Ollivier produced UORO, a new, faster algorithm for the online training of recurrent networks, which is guaranteed to converge locally and requires only linear time [49].
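
The key idea behind UORO is to maintain, at every time step, an unbiased rank-one approximation of the Jacobian of the hidden state with respect to the parameters, so that memory and time costs stay linear in the number of parameters. The sketch below is our own illustration of this update on a toy tanh RNN whose only trainable parameter is the recurrent matrix; the variable names, stabilization constants and dummy task are assumptions, not the authors' implementation.

    # Illustrative sketch of the rank-one trick behind UORO, for a plain tanh
    # RNN whose only trainable parameter is the recurrent matrix W (the input
    # matrix U is kept fixed for brevity).
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, eps = 8, 4, 1e-7                    # state size, input size, stabilizer
    W = rng.normal(scale=0.3, size=(n, n))    # trainable recurrent weights
    U = rng.normal(scale=0.3, size=(n, m))    # fixed input weights
    s = np.zeros(n)                           # hidden state s_t
    s_tld = np.zeros(n)                       # rank-one factors: outer(s_tld, w_tld)
    w_tld = np.zeros((n, n))                  #   approximates ds_t/dW without bias
    lr = 0.05

    for t in range(1000):
        x = rng.normal(size=m)                # dummy input stream
        target = np.zeros(n)                  # dummy target (drives the state to 0)

        s_new = np.tanh(W @ s + U @ x)        # forward step s_{t+1} = F(s_t, x_t, W)
        D = 1.0 - s_new ** 2                  # derivative of tanh at the pre-activation

        # Propagate the rank-one approximation through the step.
        Js_stld = D * (W @ s_tld)             # (dF/ds) applied to s_tld
        nu = rng.choice([-1.0, 1.0], size=n)  # random signs keep the estimate unbiased
        nu_JW = np.outer(nu * D, s)           # nu^T (dF/dW) for this simple cell
        rho0 = np.sqrt(np.linalg.norm(w_tld) / (np.linalg.norm(Js_stld) + eps)) + eps
        rho1 = np.sqrt(np.linalg.norm(nu_JW) / (np.linalg.norm(nu) + eps)) + eps
        s_tld = rho0 * Js_stld + rho1 * nu
        w_tld = w_tld / rho0 + nu_JW / rho1

        # Online gradient step: dL/dW is estimated as (dL/ds . s_tld) * w_tld.
        dL_ds = s_new - target                # gradient of 0.5 * ||s_new - target||^2
        W -= lr * (dL_ds @ s_tld) * w_tld
        s = s_new

The random signs make the rank-one estimate unbiased in expectation, which is the property the local convergence analysis of [49] builds on.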

An explanation for LSTMs

The LSTM structure is currently the most popular recurrent network architecture. However, it is quite complex and very much ad hoc. Corentin Tallec (in his ongoing PhD) and Yann Ollivier derived this architecture from first principles in a very simple axiomatic setting, simply by requiring that the model be invariant to arbitrary time deformations (such as accelerations or decelerations) of the data.
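
Concretely, invariance to time deformations forces the state update to take a leaky, gated form in which a learned gate plays the role of the local rate of time deformation; the gating mechanisms of the LSTM arise from this requirement. Below is a minimal sketch of such a gated update; the code, variable names and dimensions are our own illustration, not the paper's.

    # Sketch of the gated "leaky" update suggested by time-warping invariance:
    # the learned gate z interpolates between keeping the previous state and
    # writing a new candidate state, acting as the local time-deformation rate.
    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def time_warp_invariant_step(h_prev, x, params):
        """One step h_t = z_t * h_candidate + (1 - z_t) * h_{t-1}."""
        Wz, Uz, bz, Wh, Uh, bh = params
        z = sigmoid(Wz @ h_prev + Uz @ x + bz)          # gate ~ local time-warp rate
        h_cand = np.tanh(Wh @ h_prev + Uh @ x + bh)     # candidate new state
        return z * h_cand + (1.0 - z) * h_prev          # leaky, gated interpolation

    # Toy usage with random parameters (sizes are arbitrary).
    rng = np.random.default_rng(0)
    n, m = 6, 3
    params = (rng.normal(size=(n, n)), rng.normal(size=(n, m)), np.zeros(n),
              rng.normal(size=(n, n)), rng.normal(size=(n, m)), np.zeros(n))
    h = np.zeros(n)
    for x in rng.normal(size=(5, m)):
        h = time_warp_invariant_step(h, x, params)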

Bayesian neural networks

The Bayesian approach to neural networks makes several suggestions. First, it suggests artificially adding a very specific amount of noise during training, as a protection against overfitting. This has to be done carefully (Langevin dynamics), in relation to the Fisher information metric. Gaetan Marceau-Caron and Yann Ollivier demonstrated that this approach can be applied efficiently to neural networks [28] (best paper award at GSI 2017).
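
Concretely, each gradient step is perturbed by Gaussian noise whose magnitude is calibrated to the step size and to the (approximate) Fisher metric. The sketch below shows the structure of such an update with a fixed diagonal Fisher approximation; the notation, the noise-scaling convention and the toy loss are our own assumptions, not the implementation of [28].

    # Sketch of a Langevin-style update preconditioned by a diagonal Fisher
    # approximation (held fixed here for simplicity, which lets us omit the
    # drift-correction term of position-dependent preconditioners).
    import numpy as np

    def natural_langevin_step(theta, grad, fisher_diag, lr, n_data, rng, eps=1e-8):
        precond = 1.0 / (fisher_diag + eps)                      # inverse diagonal Fisher
        noise = rng.normal(size=theta.shape)
        return (theta
                - lr * precond * grad                            # natural-gradient step
                + np.sqrt(2.0 * lr * precond / n_data) * noise)  # calibrated Langevin noise

    # Toy usage on the average loss 0.5 * ||theta||^2 over n_data points.
    rng = np.random.default_rng(0)
    theta = rng.normal(size=10)
    for _ in range(100):
        grad = theta                                             # gradient of the toy loss
        fisher_diag = np.ones_like(theta)                        # trivial metric for the demo
        theta = natural_langevin_step(theta, grad, fisher_diag,
                                      lr=0.1, n_data=1000, rng=rng)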

Second, a Bayesian viewpoint can help select the right size for each layer of a neural network. A comparison with a theoretical model of an infinitely large network suggests ways to adapt learning rates, as well as criteria to select or deselect neurons or even whole layers (preliminary results in a preprint by Pierre Wolinski (PhD), Yann Ollivier and Guillaume Charpiat, in preparation).

Kalman filtering and information geometry

Filtering and optimization have been brought much closer by the following result [48]: the natural gradient in optimization is mathematically identical to the Kalman filter, for all probabilistic (machine learning) models. Even though both methods had been known for decades and are important references in their respective fields, they had not previously been brought together. The result extends to the non-i.i.d. setting (recurrent neural networks).
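
In our notation (an illustrative summary, not a restatement of the precise theorem of [48]), the online natural gradient step on an observation (x_t, y_t) of a probabilistic model p_theta reads:

    % Illustrative notation (ours): online natural gradient with Fisher matrix J
    % and learning rate \eta_t on the data point (x_t, y_t).
    \theta_{t+1} = \theta_t - \eta_t \, J(\theta_t)^{-1}
                   \nabla_\theta \bigl[ -\log p_\theta(y_t \mid x_t) \bigr],
    \qquad
    J(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x_t)}
                \bigl[ \nabla_\theta \log p_\theta(y \mid x_t)\,
                       \nabla_\theta \log p_\theta(y \mid x_t)^{\top} \bigr],

where J is the Fisher information matrix. The result of [48] identifies this update, for a suitable schedule of the learning rate and an online update of J, with the update performed by a Kalman filter applied to the parameter theta.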

Computer vision

Our computer vision activity is run jointly with the Looking at People (LaP) challenge program [46]. We edited a book with Springer, a collection of tutorials and papers on gesture recognition [54], to which we contributed a survey chapter on deep-learning methods [34]; a shorter version of this chapter was published at the FG conference [17].

Several papers were published this year analyzing past LaP challenges. The “first impressions” challenge aimed at detecting personality traits from a few seconds of video. In [8], we demonstrate how deep residual networks attain state-of-the-art performance on that task and lend themselves well to identifying which parts of the image are responsible for the final decision (interpretability). We also analyzed last year's challenge on apparent age estimation from still images and proposed improvements based on deep residual networks [15]. A similar methodology based on deep residual networks was applied to apparent personality trait analysis [24], [8].

Flexible deep learning architectures suitable to genetic data

Genetic data is usually given in the form of matrices, one dimension standing for the different individuals studied and the other for the DNA sites. These dimensions vary, depending on the individual sample size and on the DNA sequence length. On the other hand, standard deep learning architectures require data of fixed size. We consequently search for suitable, flexible architectures, with, as an application, the prediction of the demographic history of a population from its genetic data (changes in the number of individuals over time). Théophile Sanchez, now a PhD student, presented his work at the Junior Conference on Data Science and Engineering at Paris-Saclay [33]. To our knowledge, this is the first attempt in the population genetics field to learn automatically from the raw data.
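
One simple way to obtain such flexibility is to combine convolutions along the DNA-site axis with permutation-invariant pooling over individuals, whose ordering carries no information; the output then has a fixed size whatever the shape of the input matrix. The sketch below is our own illustration of this general idea, not the architecture presented in [33].

    # Sketch of a shape-agnostic summary of a genotype matrix: convolve each
    # individual's sequence along the site axis, then pool over individuals.
    import numpy as np

    def flexible_summary(genotypes, conv_kernel):
        """genotypes: (n_individuals, n_sites) 0/1 matrix of any shape."""
        n_ind, _ = genotypes.shape
        # 1D convolution of each individual's sequence along the site axis.
        feats = np.stack([
            np.convolve(genotypes[i], conv_kernel, mode="valid")
            for i in range(n_ind)
        ])                                           # (n_ind, n_sites - k + 1)
        # Permutation-invariant pooling over individuals, then over sites,
        # yielding a fixed-size vector regardless of the input matrix shape.
        per_site = feats.mean(axis=0)                # average over individuals
        return np.array([per_site.mean(), per_site.max(), per_site.std()])

    # Two samples of different shapes map to summaries of the same size.
    rng = np.random.default_rng(0)
    kernel = rng.normal(size=5)
    small = rng.integers(0, 2, size=(10, 200)).astype(float)
    large = rng.integers(0, 2, size=(50, 1200)).astype(float)
    print(flexible_summary(small, kernel).shape, flexible_summary(large, kernel).shape)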

Image segmentation and classification

Emmanuel Maggiori, a PhD student in the Titane team at Inria Sophia-Antipolis, mainly supervised by Yuliya Tarabalka and co-supervised by Guillaume Charpiat, defended his PhD thesis [73] on remote sensing image segmentation with neural networks. This year, an architecture was proposed to deal with high-resolution images; a benchmark was built and made public (as such benchmarks are lacking in the remote sensing community); and segmentation predictions were turned into a vectorial representation by suitable automatic polygonization [9], [25], [26].

Through a collaboration with the company Armadillo within the ADAMme project, we have also worked on image classification with multiple tags. The database consists of 40 million images, with thousands of different possible tags (each image is associated with 10 tags on average). We started from a pre-trained ResNet network and adapted it to our task. A demonstration of our results was given at the annual review meeting of the project.
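
Adapting a pre-trained classification network to multi-tag prediction essentially amounts to replacing its final softmax layer by one sigmoid output per tag, trained with a binary cross-entropy loss. A minimal sketch of such an adaptation is given below; the network choice, number of tags and hyperparameters are illustrative assumptions rather than the project's code, and API details may vary across torchvision versions.

    # Sketch of multi-label fine-tuning of a pre-trained ResNet.
    import torch
    import torch.nn as nn
    import torchvision

    n_tags = 5000                                         # illustrative number of tags
    model = torchvision.models.resnet50(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, n_tags)    # replace the softmax head

    criterion = nn.BCEWithLogitsLoss()                    # independent per-tag sigmoids
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    def training_step(images, tag_targets):
        """images: (B, 3, 224, 224) tensor; tag_targets: (B, n_tags) 0/1 tensor."""
        optimizer.zero_grad()
        logits = model(images)                            # (B, n_tags) raw scores
        loss = criterion(logits, tag_targets.float())
        loss.backward()
        optimizer.step()
        return loss.item()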

Non-rigid image alignment

Automatic image alignment was also studied. In remote sensing, the task consists in aligning satellite or aerial images with ground truth data such as OpenStreetMap's cadastral maps. This task is crucial because such ground truth data is practically never well registered but spatially deformed, preventing any further use by machine learning tools. Based on an analysis of classical multi-scale frameworks, a deep learning architecture was proposed to perform this task. This work is currently under submission to CVPR. On a related topic, in a collaboration with the start-up company Therapixel, we have been studying the registration of 3D medical images, without any ground truth or template.

Video analysis

Time coherency is usually poorly handled in video analysis with neural networks. We have studied, on three different applications, ways to take it better into account. First, in a collaboration with the Vision Institute, we studied ways of incorporating neural networks into reinforcement learning approaches for tracking microbes with a motorized microscope. Second, in a collaboration with the SATIE team, we worked on the incorporation of optical flow for crowd density estimation. Finally, in a collaboration with the Parietal team, we are studying how to link brain fMRI signals to the videos people are watching.