

Section: New Results

Data Science

  • High Energy Physics The success of the 2014 HiggsML challenge has created a willingness among the High Energy Physics experiments to collaborate in a structured way. A working group has been set up and new challenges are currently being explored. A yearly workshop has been established, with a first edition, DataScience@LHC, held at CERN on 9-13 Nov. 2015.

    The challenge exemplifies a new machine learning task [58][56]: learning to discover, that is, evaluating the significance of a scientific discovery. It can be formally cast as a two-class classification problem, but with two major departures from the regular setting. 1) Discovery: labeled training examples of the positive class (signals) are not available and must be obtained from simulation. The learning machine can then address the “inverse problem” of predicting which events are signals in real data. 2) Evaluation: because the classes are enormously imbalanced and overlapping, the objective function of the classifier is a metric derived from a statistical test.
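To make the evaluation point concrete, here is a minimal sketch of such a metric. It assumes the objective is the approximate median significance (AMS) used to score the HiggsML challenge; the text above does not name the metric. The function names `ams` and `selected_sums`, the toy events, and the default regularization term `b_reg = 10` are illustrative choices for this sketch, not the challenge's reference code.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance (assumed metric, see lead-in).

    s: sum of weights of true signal events selected by the classifier
    b: sum of weights of background events selected by the classifier
    b_reg: regularization term that stabilizes the score when b is small
    """
    if s < 0 or b < 0 or b_reg < 0:
        raise ValueError("s, b and b_reg must be non-negative")
    if b + b_reg == 0:
        raise ValueError("b + b_reg must be positive")
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def selected_sums(labels, weights, selected):
    """Sum the weights of selected events, split into signal (label 1) and background (label 0)."""
    s = sum(w for y, w, keep in zip(labels, weights, selected) if keep and y == 1)
    b = sum(w for y, w, keep in zip(labels, weights, selected) if keep and y == 0)
    return s, b


# Hypothetical toy events: simulated labels, event weights, classifier decisions.
labels = [1, 0, 0, 1, 0]
weights = [0.5, 20.0, 30.0, 0.5, 50.0]
selected = [True, True, False, True, False]

s, b = selected_sums(labels, weights, selected)
score = ams(s, b)
```

When the selected signal is small compared to the background, this quantity behaves like s / sqrt(b), which is why a classifier tuned for accuracy on heavily imbalanced classes is not necessarily the one that maximizes it.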

  • Personal Semantics Our algorithm for inducing a taxonomy from a set of domain terms placed first in the international Taxonomy Induction task, part of the SemEval 2015 conference in Denver. Since then, we have developed a robust technique for discovering the domain vocabulary of a new topic, using a directed crawler that we created. We are currently creating hundreds of taxonomies for personal themes (hobbies, illnesses) that can be integrated into our Personal Semantics platform, PTraces. The challenges for the coming year will be deploying and evaluating the taxonomies, and introducing newer machine learning methods, such as Latent Dirichlet Allocation, to better recognize domain vocabularies.

  • Distributed system observation The work on the automated analysis and description of distributed systems [59] [60] has been pursued through the continued development of the GAMA multi-agent framework https://github.com/gama-platform/gama/wiki . The simulation framework has been applied to the study of a new anytime reverse auction protocol [53]. Philippe Caillou is associated with the young researcher ANR project ACTEUR, coordinated by Patrick Taillandier (IDEES, Rouen university). Within this project, a new BDI cognitive agent model, designed to be easy to use for non-computer scientists, has been proposed [29] and applied to Rouen traffic simulation [57]. Finally, agent behavior has been extracted from human player logs to study the perception of emotive behaviors in board games [37].

  • Digital humanities The Amiqap and Cartolabe projects both start in 2016. The Cartolabe project applies machine learning techniques to identify comprehensible structures in unstructured data. The goal is to use raw textual data and underspecified ontologies to provide intuitive access to the relevant research activities of a large research organisation. Amiqap studies the relation between worker well-being and company performance, in collaboration with the sociology department of Mines ParisTech and La Fabrique de l'Industrie for the research, and with Secafi and DARES for the data.

    These activities will benefit from the arrival in 2016 of Paola Tubaro (CNRS researcher in sociology and economics).