Section: New Results

Statistical learning methodology and theory

Participants: Gilles Celeux, Serge Cohen, Christine Keribin, Michel Prenat, Sylvain Arlot, Benjamin Auder, Jean-Michel Poggi, Neska El Haouij, Kevin Bleakley, Matthieu Lerasle.

In 2018, Sylvain Arlot wrote a book chapter on supervised statistical learning from a mathematical point of view. The text describes the general prediction problem and its two key examples, regression and binary classification. It then studies two kinds of learning rules: empirical risk minimizers, which naturally lead to convex risks in classification, and local averaging rules, for which a universal consistency result can be obtained. Finally, it identifies the limits of learning in order to underline its challenges. The text ends with some useful probabilistic tools and some exercises.
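As an illustration of a local averaging rule for binary classification, here is a minimal sketch of a k-nearest-neighbour classifier. It is not taken from the chapter: the function names, the Euclidean metric and the default k are assumptions made for the example. The rule averages the 0/1 labels of the k training points closest to the query and predicts 1 when that average exceeds 1/2.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_average(train_x, train_y, x, k):
    """Average of the labels of the k training points nearest to x."""
    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], x))
    neighbours = order[:k]
    return sum(train_y[i] for i in neighbours) / k


def knn_classify(train_x, train_y, x, k=3):
    """Plug-in classifier: predict 1 when the local average exceeds 1/2."""
    return 1 if knn_average(train_x, train_y, x, k) > 0.5 else 0


# Toy data (hypothetical): two well-separated groups on the real line.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = [0, 0, 0, 1, 1, 1]
prediction_low = knn_classify(train_x, train_y, (0.05,), k=3)
prediction_high = knn_classify(train_x, train_y, (1.05,), k=3)
```

Universal consistency results for such rules typically require k to grow with the sample size n while k/n tends to zero; the fixed k above is only for illustration.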

Gilles Celeux and Serge Cohen have started research in collaboration with Agnès Grimaud (UVSQ) on clustering hyperspectral images under spatial constraints. This is a one-class classification problem in which distances between spectral images are given by the χ² distance, while spatial homogeneity is associated with a single-link distance. This year they developed a hybrid hierarchical clustering procedure: sub-clusters respecting spatial consistency are constructed first, and these sub-clusters are then merged without taking spatial constraints into account. This strategy leads to a more realistic segmentation of spectral images.
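The two-stage idea can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the χ² convention (no 1/2 factor, raw spectra), the 4-neighbour grid, the fixed threshold, the merging by mean spectra and all function names are hypothetical choices. Stage 1 links adjacent pixels whose χ² distance is below a threshold, which amounts to cutting a spatially constrained single-link hierarchy; stage 2 merges the resulting sub-clusters by spectral similarity alone.

```python
from itertools import combinations


def chi2_distance(p, q):
    """Chi-squared distance between two non-negative spectra (one convention)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def spatial_subclusters(image, threshold):
    """Stage 1: link 4-neighbour pixels whose chi2 distance is <= threshold.

    image is a 2D list (rows x cols) of spectra; returns a 2D list of labels.
    """
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and \
                        chi2_distance(image[r][c], image[rr][cc]) <= threshold:
                    parent[find(r * cols + c)] = find(rr * cols + cc)
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]


def mean_spectrum(spectra):
    """Band-wise mean of a list of spectra."""
    n = len(spectra)
    return [sum(vals) / n for vals in zip(*spectra)]


def merge_subclusters(image, labels, n_clusters):
    """Stage 2: merge sub-clusters by mean spectrum, ignoring spatial position.

    Assumes n_clusters >= 1.
    """
    members = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            members.setdefault(lab, []).append(image[r][c])
    final = {lab: lab for lab in members}
    while len(members) > n_clusters:
        # Pick the pair of clusters with the closest mean spectra.
        a, b = min(combinations(members, 2),
                   key=lambda pair: chi2_distance(mean_spectrum(members[pair[0]]),
                                                  mean_spectrum(members[pair[1]])))
        members[a].extend(members.pop(b))
        for lab, target in final.items():
            if target == b:
                final[lab] = a
    return [[final[lab] for lab in row] for row in labels]


# Toy image (hypothetical): two regions of material A separated by a column of B.
A, B = [10.0, 1.0], [1.0, 10.0]
image = [[A, A, B, A], [A, A, B, A]]
sub = spatial_subclusters(image, threshold=1.0)
segmentation = merge_subclusters(image, sub, n_clusters=2)
```

In this toy image the two A regions are not spatially connected, so stage 1 keeps them apart, and stage 2 then reunites them because their spectra coincide.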

Gilles Celeux continued his collaboration with Jean-Patrick Baudry on model-based clustering. Last year, they started work on assessing model-based clustering methods on cytometry data sets. These data sets are of interest because they require combining clustering and classification tasks in a unified framework. This year, this work was completed, and the approach performed well in comparison with state-of-the-art procedures.

Gilles Celeux has continued research on missing data for model-based clustering in collaboration with Christophe Biernacki (Modal team, Inria Lille) and Julie Josse (École Polytechnique). This year, they implemented several algorithms to estimate their logistic model for mixture analysis involving not missing-at-random mixtures.

In the framework of MASSICCC, Benjamin Auder and Gilles Celeux have started research on the graphical representation of model-based clusters. The aim is to better display the proximity between clusters. This work leads to a simple procedure for representing the proximity between clusters without any additional assumptions.

After having proved the consistency and asymptotic normality of Latent Block Model estimators with V. Brault and M. Mariadassou, Christine Keribin has worked on the behavior of the ICL and BIC model criteria in this model, and in particular on their probable asymptotic equivalence.

Christine Keribin has started a new collaboration with Christophe Biernacki (Inria Modal Team) to study the ability of co-clustering to serve as a good regularized method for clustering in high dimension (HD). This work was presented at the CMStatistics 2018 conference.

J.-M. Poggi (with R. Genuer) published a survey paper entitled “Arbres CART et Forêts aléatoires, Importance et sélection de variables” as a chapter of the book “Apprentissage Statistique et Données Massives”, published by Technip.

J.-M. Poggi and N. El Haouij (with R. Ghozi, S. Sevestre Ghalila and M. Jaïdane) proposed a random forest-based method for selecting physiological functional variables in order to classify the stress level during real-world driving. The contribution of this study is twofold. On the methodological side, it considers physiological signals as functional variables and offers a procedure for data processing and variable selection. On the applied side, the proposed method provides a “blind” procedure for classifying the driver's stress level that does not depend on expert-based studies of physiological signals. This work has been published in Statistical Methods & Applications.

J.-M. Poggi and N. El Haouij (with R. Ghozi, S. Sevestre Ghalila and M. Jaïdane) provide a system and database to assess driver's attention, called AffectiveROAD. A paper presenting it has been published in the proceedings of the 33rd ACM Symposium on Applied Computing (SAC'18).