Section: New Results

Data Science

  • High Energy Physics The focus of the period has been to extend to new issues the collaboration with the High Energy Physics (HEP) experiments, which started with the success of the 2014 HiggsML challenge [18]. V. Estrade's PhD aims to advance domain adaptation methods in the specific context of uncertainty quantification and calibration. So far, transfer learning has been addressed only with classical, additive and differentiable objective functions as performance criteria. However, learning to discover, as exemplified by HEP, relies on more global and more difficult criteria, related to the Area Under the ROC Curve (AUC) and Neyman-Pearson learning. CERN funds another PhD (A. Pol), on anomaly detection. Another promising theme has emerged with the ongoing organization of a tracking challenge (TrackML) [56], [72], which focuses on extreme scaling of ML image processing.

  • Personal Semantics Our algorithm for inducing a taxonomy from a set of domain terms, which placed first in the international Taxonomy Induction task of the SemEval 2015 conference in Denver, has been improved with a robust technique for discovering the domain vocabulary of a new topic, based on a directed crawler that we created. We have built hundreds of taxonomies for personal themes (hobbies, illnesses) that can be integrated into our Personal Semantics platform PTraces, and we have deployed and evaluated these taxonomies. We have also introduced newer machine learning methods, such as Latent Dirichlet Allocation, for better recognition of domain vocabularies [55], [71].

  • Distributed system observation The work on the automated analysis and description of distributed systems has been pursued through the continued development of the GAMA multi-agent framework https://github.com/gama-platform/gama/wiki. The simulation framework has been applied to the study of a new protocol for MOOC management [6]. Philippe Caillou is associated with the ANR young researcher project ACTEUR, coordinated by Patrick Taillandier (IDEES, Rouen university). Within this project, the BDI cognitive agent model has been improved in terms of both flexibility and ease of use for the non-expert modeler [50].

  • Computational social sciences Thomas Schmitt's PhD focuses on the matching of job offers and applicant CVs. An informal collaboration with the Qapa agency (FUI proposal underway) gave us access to the 2012-2016 logs of their activity (CVs, job announcements and application clicks). This wealth of data delivered some unexpected findings, e.g., regarding the differences between what people do (the clicks) and what they state (the documents). In [49], with Philippe Caillou and Michèle Sebag, a deep neural network system, MAJORE (MAtching JObs and REsumes), was proposed; it is trained to match the metric properties extracted from the collaborative filtering matrix and to address the cold start problem. A further research perspective, in collaboration with J.-P. Nadal from EHESS, is to build an observatory of the job demand dynamics.

    The Cartolabe project, started in Feb. 2016 (F. Louistisserand's engineer position), applies machine learning techniques to build an interpretable representation from large collections of scientific articles. The goal is to use the raw textual data, together with the results of the pre-processing chain provided by ANHALYTICS, to define a topology on authors, scientific themes and teams, and to constrain its 2D projection to be semantically admissible. The collaboration with AVIZ is key to making this map scalable and navigable. The perspective for 2017 is to support visual querying of the map (locating all author names relevant to a given request) and to display the temporal evolution of the research activities.

    Amiqap studies the relation between quality of life at work and company performance, using both survey data on individual workers (collected in 2013 by DARES, the statistical service of the French Ministry of Labor) and administrative data on companies provided by SECAFI, a union body. The study is run by a team within TAO (Philippe Caillou, Isabelle Guyon, Michèle Sebag and Paola Tubaro, plus post-doctoral researcher Olivier Goudet and intern Diviyan Kalainathan) in collaboration with the social science and economics (SES) department of Mines ParisTech, the RITM economics research center (Univ. Paris Sud) and the think-tank La Fabrique de l'Industrie. In its first stage, the exploratory analysis delivered some unexpected results, e.g., the existence of an "industry worker cluster" and the non-monotonic relationship between autonomy, salary and subjective satisfaction. A summary of these findings has been released online on the website of La Fabrique de l'Industrie, as a complement to their book on the same topic (published in October 2016). The exploratory analysis of the SECAFI data (as yet unpublished) complements the above and shows how workers' satisfaction correlates with companies' financial and social performance indicators, though with marked differences across industries. The key question concerns the nature of this relationship: is satisfaction a cause of performance, an effect of it, or are both driven by a confounding variable (the industrial sector)? Further research (Diviyan Kalainathan's PhD, O. Goudet's post-doc) will focus on the use and extension of causal modelling algorithms on this issue; these perspectives attract considerable interest from the ministry (DARES) and from big industrial players who wish to assess the relevance of their HR policies.
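
    The confounder question raised for the Amiqap data can be illustrated on purely synthetic data. The sketch below is not the analysis carried out in the project: the sector names, offsets, noise levels and function names are all invented for illustration, and stratifying by sector is only one elementary check, not one of the causal modelling algorithms mentioned above. It generates satisfaction and performance scores that are independent within each sector but share a sector-level offset, then compares the pooled correlation with the per-sector correlations.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_per_sector=2000, seed=0):
    """Synthetic rows (sector, satisfaction, performance).

    Assumption for illustration only: the sector shifts both variables,
    while inside a sector they are drawn independently.
    """
    rng = random.Random(seed)
    # Hypothetical sectors: (satisfaction offset, performance offset)
    sectors = {"A": (0.0, 0.0), "B": (2.0, 2.0), "C": (4.0, 4.0)}
    rows = []
    for name, (s_off, p_off) in sectors.items():
        for _ in range(n_per_sector):
            satisfaction = s_off + rng.gauss(0, 1)
            performance = p_off + rng.gauss(0, 1)
            rows.append((name, satisfaction, performance))
    return rows


def pooled_and_stratified(rows):
    """Return the pooled correlation and a dict of per-sector correlations."""
    pooled = pearson([r[1] for r in rows], [r[2] for r in rows])
    by_sector = {}
    for name in sorted({r[0] for r in rows}):
        sub = [r for r in rows if r[0] == name]
        by_sector[name] = pearson([r[1] for r in sub], [r[2] for r in sub])
    return pooled, by_sector


rows = simulate()
pooled, by_sector = pooled_and_stratified(rows)
print(f"pooled correlation: {pooled:.2f}")
for name, r in by_sector.items():
    print(f"sector {name}: {r:.2f}")
```

    In this toy setting the pooled correlation is strongly positive while the within-sector correlations stay close to zero, which is the signature of a correlation carried entirely by the sector. Real data would rarely be this clean, and a stratified correlation cannot by itself distinguish cause from effect within a sector.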