Section: Application Domains
High Energy Physics (HEP)
This is joint work with the Laboratoire de l’Accélérateur Linéaire (LAL) https://www.lal.in2p3.fr and the ATLAS and CMS collaborations at CERN. Our principal collaborators at LAL are David Rousseau and Balázs Kégl. The project started in 2015 with the organization of a large worldwide machine learning challenge that attracted nearly 2000 participants. Its theme was to use machine learning to improve the statistical significance of the discovery of the Higgs boson in a particular decay channel. The outcome of the challenge had a strong impact on the methodology used by HEP researchers. It introduced new ways of conducting cross-validation to avoid over-fitting, as well as state-of-the-art learning machines such as XGBoost and deep neural networks. The challenge setting was deliberately simplified so that participants with no prior knowledge of physics could easily take part. Following this success, we decided to dig deeper and reintroduce further difficulties into the problem, including systematic noise.
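The "statistical significance" targeted by the challenge can be made concrete with the approximate median significance (AMS) of a selection containing s expected signal events and b expected background events. The sketch below uses the regularized form AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)). Treat the default b_reg = 10 as an assumption about how the Higgs challenge scored submissions. The function name ams is also illustrative.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance of a selection.

    s: expected number of signal events selected.
    b: expected number of background events selected.
    b_reg: regularization term (assumed default 10) that stabilizes the
    score when very few background events are selected.
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    bb = b + b_reg
    if bb <= 0:
        raise ValueError("b + b_reg must be positive")
    # log1p keeps precision when s is much smaller than b + b_reg
    radicand = 2.0 * ((s + bb) * math.log1p(s / bb) - s)
    # max() guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, radicand))


if __name__ == "__main__":
    # Hypothetical selections: more signal at fixed background scores higher
    print(ams(5.0, 100.0))
    print(ams(10.0, 100.0))
```

When s is much smaller than b + b_reg, the expression reduces to roughly s / sqrt(b + b_reg), the familiar signal-over-root-background figure. A classifier's selection threshold would typically be tuned to maximize it.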
SystML. (Cécile Germain, Isabelle Guyon, Michèle Sebag, Victor Estrade, Arthur Pesah): Preliminary explorations were conducted by Arthur Pesah, an intern from ENSTA, and by Victor Estrade as an M2 intern. Victor Estrade started his PhD on this subject in September 2016. The SystML project aims to tackle the problem of systematic noise from three angles:
Arthur Pesah and Victor Estrade performed exploratory work that used Siamese networks and adversarial learning to align the distributions generated by simulators with those of real data. Results were good on toy data and bioinformatics data but disappointing on HEP data. Victor Estrade is now turning to another technique, tangent propagation, which trains neural networks to be robust to “noise” in given directions of feature space.
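Tangent propagation adds a penalty to the training loss on the derivative of the model output along chosen "tangent" directions of feature space. The sketch below is a deliberate simplification and not the project's implementation. It uses a logistic regression in pure Python instead of a neural network. The toy data are hypothetical: feature 1 plays the role of a systematic, and the nuisance direction is assumed to be t = (0, 1).

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical toy sample: feature 0 carries the physics signal, while
    feature 1 mimics a systematic that correlates with the label only in
    the simulated training data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x0 = (1.0 if y else -1.0) + rng.gauss(0.0, 1.0)
        x1 = (0.5 if y else -0.5) + rng.gauss(0.0, 0.5)
        data.append(((x0, x1), y))
    return data


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, tangent, lam, lr=0.1, epochs=200):
    """Full-batch gradient descent on log-loss plus a tangent-prop penalty.

    The logit is s(x) = w.x + b, whose directional derivative along the
    tangent vector t is w.t at every x. Penalizing lam * (w.t)**2 therefore
    makes the output insensitive to shifts of x along t.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in data:
            # derivative of the log-loss with respect to the logit
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0] / n
            gw[1] += err * x[1] / n
            gb += err / n
        # gradient of the penalty lam * (w.t)^2 is 2 * lam * (w.t) * t
        wt = w[0] * tangent[0] + w[1] * tangent[1]
        gw[0] += 2.0 * lam * wt * tangent[0]
        gw[1] += 2.0 * lam * wt * tangent[1]
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = make_data(400, seed=1)
    nuisance = (0.0, 1.0)  # assumed direction of the systematic
    w_plain, _ = train(data, nuisance, lam=0.0)
    w_tp, _ = train(data, nuisance, lam=4.0)
    print("without penalty:", w_plain)
    print("with tangent prop:", w_tp)
```

Without the penalty, the model leans heavily on the nuisance feature. With it, the weight along t is driven close to zero while the signal feature is still used. The penalty strength lam trades fit on the simulated sample against invariance. It must also stay small enough relative to lr (here 2 * lr * lam < 2) for full-batch gradient descent to remain stable. In a neural network, the directional derivative depends on x and would be computed by automatic differentiation (a Jacobian-vector product) rather than in closed form.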
TrackML. (Isabelle Guyon): A new challenge is in preparation with LAL and the ATLAS and CMS collaborations. The instantaneous luminosity of the Large Hadron Collider at CERN is expected to increase, so that the number of parasitic collisions can reach 200 interactions per bunch crossing, almost a factor of 10 above the current level. In addition, the experiments plan a 10-fold increase of the readout rate. This will be a challenge for the ATLAS and CMS experiments, in particular for tracking, which will be performed with a new all-silicon tracker in both experiments. On the software side, the increased combinatorial complexity will have to be handled within a flat budget at best. To reach out to computer science specialists, a Tracking Machine Learning challenge (TrackML) is being set up for 2017, building on the experience of the successful Higgs Boson Machine Learning challenge in 2015. Participants are given the coordinates of “hits”, which are excitations of the detectors along particle trajectories. The goal of the challenge is to devise fast software that “connects the dots” to reconstruct particle trajectories. TAO contributes by preparing the challenge platform on Codalab, the challenge protocol, and baseline methods.
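The "connect the dots" task can be illustrated with a deliberately naive baseline. The sketch below is hypothetical and is not one of the challenge's baseline methods. It assumes a 2D transverse view with straight tracks emerging from the collision point at the origin, whereas real tracks bend into helices in the experiments' magnetic field. It groups hits that share the same azimuthal angle within a tolerance tol.

```python
import math


def connect_the_dots(hits, tol=0.01):
    """Toy track finder: group 2D hits (x, y) by azimuthal angle.

    Assumes (hypothetically) straight tracks emerging from the origin, so
    all hits of one track share phi = atan2(y, x). Returns one integer
    track label per hit, in input order.
    """
    if not hits:
        return []
    phi = [math.atan2(y, x) for x, y in hits]
    order = sorted(range(len(hits)), key=lambda i: phi[i])
    labels = [0] * len(hits)
    track = 0
    labels[order[0]] = track
    for prev, cur in zip(order, order[1:]):
        if phi[cur] - phi[prev] > tol:  # angular gap: start a new track
            track += 1
        labels[cur] = track
    # The first and last groups may be one track split at the -pi/+pi seam
    if track > 0 and (phi[order[0]] + 2.0 * math.pi) - phi[order[-1]] <= tol:
        labels = [0 if lab == track else lab for lab in labels]
    return labels


if __name__ == "__main__":
    # Hypothetical event: three tracks crossing five detector layers each
    hits = [(r * math.cos(a), r * math.sin(a))
            for a in (0.3, 1.2, -2.0) for r in (1.0, 2.0, 3.0, 4.0, 5.0)]
    print(connect_the_dots(hits))
```

Sorting by angle makes this O(n log n) in the number of hits. However, with up to 200 interactions per bunch crossing many tracks overlap in angle and curve in the field. That is where such a naive grouping breaks down and more elaborate pattern recognition is required.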