Section: Application Domains

High Energy Physics (HEP)

The project started in 2015 with the organization of the Higgs boson ML challenge, in collaboration with the Laboratoire de l'Accélérateur Linéaire (LAL) (David Rousseau and Balázs Kégl) and the ATLAS and CMS projects at CERN. These collaborations have been at the forefront of the growing interaction between Machine Learning and High Energy Physics, including the co-organisation of the Weizmann Hammers and Nails 2017 workshops [44], DataScience@HEP at Fermilab, and the Connecting The Dots series.

  1. SystML (Cécile Germain, Isabelle Guyon, Michèle Sebag, Victor Estrade, Arthur Pesah): Experimental data involve two types of uncertainty: statistical uncertainty (due to natural fluctuations) and systematic uncertainty (due to "known unknowns", such as the imprecise characterization of physics parameters). The SystML project addresses experimental uncertainties along three lines: i) better calibrating simulators; ii) learning post-processors that filter out systematic noise; iii) anticipating the impact of systematic noise (e.g., on statistical tests) and integrating it into the decision process.

    V. Estrade's PhD, focusing on the second approach, searches for new data representations that are insensitive to systematic uncertainty. Drawing on the domain adaptation literature, two strategies have been investigated: i) an agnostic approach, in which adversarial supervised learning is used to design a representation that is invariant with respect to the physics parameters; ii) an approach based on prior knowledge.
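    One way to make the adversarial strategy i) concrete is the minimax objective sketched below. The notation is introduced here for illustration only and is an assumption, not taken from the thesis: x denotes the event features, y the class label, z the nuisance (physics) parameter, φ_θ the encoder producing the representation, f_ψ the classifier, r_ω an adversary that tries to predict z from the representation, ℓ and ℓ' two loss functions (e.g., cross-entropy and squared error), and λ > 0 a trade-off weight.

    ```latex
    \hat{\theta}, \hat{\psi} = \arg\min_{\theta,\psi} \max_{\omega}
      \Big[ \mathcal{L}_{\mathrm{clf}}(\theta,\psi) - \lambda\, \mathcal{L}_{\mathrm{adv}}(\theta,\omega) \Big],
    \qquad
    \mathcal{L}_{\mathrm{clf}} = \mathbb{E}\big[\ell\big(f_\psi(\phi_\theta(x)),\, y\big)\big],
    \quad
    \mathcal{L}_{\mathrm{adv}} = \mathbb{E}\big[\ell'\big(r_\omega(\phi_\theta(x)),\, z\big)\big]
    ```

    Since only the second term depends on ω, maximizing over ω amounts to the adversary minimizing its own loss, i.e. recovering z as well as it can; the encoder and classifier, in turn, minimize the classification loss while being rewarded for making the adversary fail. The larger λ, the more classification accuracy is traded for invariance of the representation with respect to z.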

  2. TrackML (Cécile Germain, Isabelle Guyon):

    A Tracking Machine Learning challenge (TrackML) [79], [51] is being set up for the first quarter of 2018. The methods currently employed for tracking particles at the LHC (Large Hadron Collider) at CERN will soon be outdated, due to the improved detector apparatus and the associated explosion in combinatorial complexity. LAL and the TAU team have taken a leading role in stimulating both the ML and HEP communities to renew the physicists' toolkit in preparation for the advent of the next generation of particle detectors.

    TrackML is about recognizing trajectories in the 3D images of proton collisions at the Large Hadron Collider (LHC) at CERN. Think of it as a picture of fireworks: the time information is lost, but all particle trajectories have roughly the same origin, so there is a correspondence between arc length and time ordering. Given the coordinates of the particle impacts on the detectors (3D points), the problem is to “connect the dots”, i.e. to return all sets of points belonging to putative particle trajectories [16]. From the machine learning point of view, beyond simple clustering, the problem can be treated as a latent variable problem, a tracking problem, or a pattern de-noising problem. A very large dataset (100 GB) has been built by the ATLAS and CMS collaborations specifically for the challenge.

    TrackML will be conducted in two phases, the first favoring innovation over efficiency and the second aiming at real-time reconstruction. The challenge is supported by Kaggle.
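As a toy illustration of the “connect the dots” task in TrackML, the sketch below groups 3D hits into candidate tracks. It rests on strong simplifying assumptions that do not hold at the LHC: tracks are taken to be straight lines starting at the origin (real tracks are bent into helices by the magnetic field), there is no noise model, and the angular tolerance `tol` is arbitrary. Hits are processed by increasing distance from the origin, a stand-in for the lost time ordering, and each hit joins the open track whose last hit points in the closest direction. The function names `direction`, `angular_gap` and `connect_the_dots` are hypothetical; this is not a method used in the challenge.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def direction(p: Point) -> Tuple[float, float]:
    """Polar angle theta and azimuth phi of a hit, seen from the origin.

    Assumes the hit does not lie exactly at the origin.
    """
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp against rounding
    phi = math.atan2(y, x)
    return theta, phi


def angular_gap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Distance between two (theta, phi) directions, phi difference wrapped to [-pi, pi]."""
    dtheta = a[0] - b[0]
    dphi = math.remainder(a[1] - b[1], 2 * math.pi)
    return math.hypot(dtheta, dphi)


def connect_the_dots(hits: List[Point], tol: float = 0.05) -> List[List[int]]:
    """Greedily group hit indices into candidate straight-line tracks (toy model)."""
    origin = (0.0, 0.0, 0.0)
    # Distance from the origin is used as a proxy for the lost time ordering.
    order = sorted(range(len(hits)), key=lambda i: math.dist(hits[i], origin))
    tracks: List[List[int]] = []
    for i in order:
        d = direction(hits[i])
        best: Optional[List[int]] = None
        best_gap = tol
        for track in tracks:
            gap = angular_gap(d, direction(hits[track[-1]]))
            if gap < best_gap:  # closest compatible track so far
                best, best_gap = track, gap
        if best is None:
            tracks.append([i])  # no compatible track: seed a new one
        else:
            best.append(i)
    return tracks
```

For instance, with two straight tracks whose hits are given in shuffled order, `connect_the_dots([(2.0, 0.0, 0.0), (0.0, 1.0, 1.0), (3.0, 0.0, 0.0), (0.0, 3.0, 3.0), (1.0, 0.0, 0.0), (0.0, 2.0, 2.0)])` returns `[[4, 0, 2], [1, 5, 3]]`, each track listing its hits from innermost to outermost. A realistic approach would instead have to fit curved trajectories, cope with noise hits and detector inefficiencies, and avoid comparing every hit against every open track, which is precisely where the combinatorial explosion mentioned above comes from.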