Section: New Results
Challenges
Participants: Cécile Germain, Isabelle Guyon, Adrien Pavao, Anne-Catherine Letournel, Michèle Sebag
PhD: Zhengying Liu, Lisheng Sun, Balthazar Donon
Collaborations: D. Rousseau (LAL), André Elisseeff (Google Zurich), Jean-Roch Vlimant (CERN), Antoine Marot and Benjamin Donnot (RTE), Kristin Bennett (RPI), Magali Richard (Université de Grenoble).
The Tau group uses challenges (scientific competitions) as a means of stimulating research in machine learning and of engaging a diverse community of engineers, researchers, and students to learn and contribute to advancing the state of the art. The Tau group is community lead of the open-source Codalab platform, hosted by Université Paris-Saclay. The project grew in 2019 and now includes an engineer dedicated full time to administering the platform and developing challenges (Adrien Pavao), financed by a new project just starting with the Région Ile-de-France. This project will also receive the support of the Chaire Nationale d'Intelligence Artificielle of Isabelle Guyon for the next four years.
Following the highly successful ChaLearn AutoML Challenges (NIPS 2015 – ICML 2016 [111] – PKDD 2018 [113]), a series of challenges on the theme of AutoDL [129] was run in 2019 (see http://autodl.chalearn.org), addressing the problem of tuning the hyperparameters of Deep Neural Networks, including the topology of the network itself. Co-sponsored by Google Zurich, the series required participants to upload their code to the Codalab platform. It included two challenges in computer vision, AutoCV and AutoCV2, to promote automatic machine learning for image and video processing, in collaboration with University of Barcelona [45]. It also included challenges in speech processing (AutoSpeech), text processing (AutoNLP), weakly supervised learning (AutoWeakly) and time series (AutoSeries), co-organized with 4Paradigm. The series culminated in the launch of the AutoDL challenge, which combines multiple modalities (presently ongoing). The winners of each challenge open-sourced their code. GPU cloud resources were donated by Google. AutoDL was an official NeurIPS 2019 competition.
As part of the High Energy Physics activities of the team, the first phase of TrackML [79], [80] was run on, and co-sponsored by, Kaggle until September 2018. The second phase was run on Codalab until March 2019 and required code submission; algorithms were then ranked by combining accuracy and speed. The best submissions largely outperform the existing solutions. The challenge was presented at NeurIPS [46] and at a CERN workshop (https://indico.cern.ch/event/813759/). I. Guyon and C. Germain are on the organizing committee, and M. Schoenauer is a member of the Advisory Committee. The TAU team, in collaboration with CERN, has taken a leading role in stimulating both the ML and HEP communities to address the combinatorial complexity explosion created by the next generation of particle detectors.
A new challenge series in Reinforcement Learning was started with the company RTE France, on the theme “Learning to run a power network” [134] (L2RPN, http://l2rpn.chalearn.org). The goal is to test the potential of Reinforcement Learning to solve a real-world problem of great practical importance: controlling electricity transportation in smart grids while keeping people and equipment safe. The first edition was run in Spring 2019 and was part of the official selection of the IJCNN 2019 conference. It ran on the Codalab platform, coupled with the open-source PyPower simulator of power grids, interfaced with the Gym RL framework developed by OpenAI. In this gamified environment, the participants had to create a proper controller for a small grid of 14 nodes. Not all of them used RL, but some combinations of RL and human expertise proved to be competitive. In 2020, we will launch a new edition of the challenge with a more powerful simulator that renders the grid more realistic and is capable of simulating a 118-node grid within our computational constraints. This competition has already been accepted as part of the official program of IJCNN 2020.
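To make the setting concrete, the following is a minimal sketch of the kind of reset/step control loop that a Gym-style interface exposes, with a rule-based controller standing in for human expertise. Everything in it is an assumption made for illustration: `ToyGridEnv`, its three-line dynamics, the reward values, and the two policies are hypothetical and are not the L2RPN environment, the PyPower simulator, or any participant's solution.

```python
import random


class ToyGridEnv:
    """Hypothetical toy grid with a Gym-style reset/step interface.

    This is NOT the L2RPN simulator: the dynamics below are invented
    purely to illustrate the agent/environment loop.
    """

    def __init__(self, n_lines=3, capacity=1.0, horizon=60, seed=0):
        self.n_lines = n_lines
        self.capacity = capacity      # flow above this value overloads a line
        self.horizon = horizon        # maximum number of time steps
        self.rng = random.Random(seed)
        self.flows = []
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        self.flows = [0.5] * self.n_lines
        return tuple(self.flows)

    def step(self, action):
        """Apply an action, advance one time step.

        action: index of the line to relieve, or None to do nothing.
        Returns (observation, reward, done, info).
        """
        if action is not None:
            self.flows[action] = max(0.0, self.flows[action] - 0.2)
        # Demand grows a little on every line at each time step.
        self.flows = [f + self.rng.uniform(0.01, 0.05) for f in self.flows]
        self.t += 1
        overloaded = max(self.flows) > self.capacity
        done = overloaded or self.t >= self.horizon
        reward = -10.0 if overloaded else 1.0
        return tuple(self.flows), reward, done, {"overloaded": overloaded}


def greedy_controller(obs):
    """Rule-based policy: relieve the most heavily loaded line."""
    return max(range(len(obs)), key=lambda i: obs[i])


def do_nothing(obs):
    """Baseline policy: never act."""
    return None


def run_episode(env, policy):
    """Run one episode and return the cumulative reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    print("greedy:", run_episode(ToyGridEnv(seed=0), greedy_controller))
    print("do nothing:", run_episode(ToyGridEnv(seed=0), do_nothing))
```

In this toy setting the rule-based controller keeps every line below capacity for the whole episode, whereas the do-nothing baseline eventually overloads a line. A learned policy would simply replace `greedy_controller` in `run_episode`.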
The HADACA project (EIT Health) aims to run a series of challenges to promote and encourage innovations in data analysis and personalized medicine. Université de Grenoble organized a challenge on matrix factorization (https://www.medinfo-lyon.org/en/matrixen) using Codalab. The challenge gathered transdisciplinary instructors (researchers and professors), students, and health professionals (clinicians). The HADACA project contributed to creating a large dataset to assess tumor heterogeneity in cancer research, as well as to developing innovative pedagogical methods to introduce students to big data analysis in health. One of the products of HADACA is the ChaGrade platform (https://chagrade.lri.fr/), a tool allowing instructors to easily use challenges in the classroom, grade them as homework, and monitor submissions and progress. HADACA will be pursued in 2020 by a sequel project, also funded by EIT Health, called COMETH. The objective of COMETH will be to create an environment, based on Codalab, to conduct systematic benchmarks. As a synergistic activity, Tau is also engaged in a collaboration with the Rensselaer Polytechnic Institute (RPI, New York, USA) to use challenges in the classroom, as part of their health-informatics curriculum.
It is important to introduce challenges in ML teaching. This has been done (and is ongoing) in I. Guyon's Licence and Master courses [38]: some assignments given to Master students are to design small challenges, which are then given to Licence students in labs, and both types of students seem to love it. Codalab has also been used to implement reinforcement learning homework in the form of challenges by Victor Berger and Heri Rakotoarison for the class of Michèle Sebag. Along similar lines, F. Landes proposed a challenge in the context of S. Mallat's course at Collège de France. Finally, in collaboration with aiforgood.org, Heri Rakotoarison has put in place a hackathon for the conference Data Science Africa (https://codalab.lri.fr/competitions/522).
In terms of dissemination, four books were published in 2019 in the Springer series on challenges in machine learning, see http://www.chalearn.org/books.html.