Section: New Results

Algorithm Selection and Configuration

Automatic algorithm selection and configuration (hyper-parameter selection), depending on the problem instance at hand, is a pervasive research topic in TAO, for both fundamental and practical reasons: to automatically deliver peak performance on (nearly) every new problem instance, and to understand the specifics of a problem instance and the algorithm skills w.r.t. these specifics.

Algorithm recommendation

A collaborative filtering approach called Alors (Algorithm Recommender System) has been proposed to achieve algorithm selection [10], following [81] in considering that a problem instance "likes better" the algorithms that perform well on it. Alors tackles a cold-start recommendation problem and makes it possible to independently assess the quality of the benchmark data (representativeness of the problem instances w.r.t. the algorithm portfolio) and the quality of the meta-features used to describe the problem instances. Experiments on SAT, CSP and ML benchmarks yield state-of-the-art performance in the former two domains; these good results contrast with the poor results obtained on the ML domain, attributed to the comparatively poor quality of the ML meta-features.
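The "likes better" view and the cold-start setting can be illustrated with a minimal sketch. It is not the Alors model: a nearest-neighbour lookup in meta-feature space stands in for the recommender, and the function names, the toy data and the choice of k are assumptions made for illustration only.

```python
import math


def rank_preferences(runtimes):
    """Turn one instance's runtimes into ranks (1 = best liked, i.e. fastest)."""
    order = sorted(range(len(runtimes)), key=lambda a: runtimes[a])
    ranks = [0] * len(runtimes)
    for position, algo in enumerate(order, start=1):
        ranks[algo] = position
    return ranks


def recommend(new_features, known_features, known_runtimes, k=2):
    """Recommend an algorithm index for an instance with no recorded runs.

    Assumption: the k known instances closest in meta-feature space vote
    with their ranks; the algorithm with the lowest mean rank wins.
    """
    by_distance = sorted(
        range(len(known_features)),
        key=lambda i: math.dist(new_features, known_features[i]),
    )
    neighbours = by_distance[:k]
    n_algos = len(known_runtimes[0])
    mean_rank = [
        sum(rank_preferences(known_runtimes[i])[a] for i in neighbours) / len(neighbours)
        for a in range(n_algos)
    ]
    return min(range(n_algos), key=lambda a: mean_rank[a])


# Toy benchmark (hypothetical): 4 instances, 2 meta-features, 3 algorithms.
known_features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
known_runtimes = [
    [1.0, 5.0, 9.0],
    [2.0, 4.0, 8.0],
    [9.0, 3.0, 1.0],
    [7.0, 6.0, 2.0],
]
```

In such a scheme the recommendation can only be as good as the meta-features used to find neighbours, which is consistent with the contrast reported between the SAT/CSP and ML domains.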

AutoML and AutoDL

Isabelle Guyon organized the AutoML challenge (paper in preparation), proposing a series of algorithm selection and configuration problems of increasing difficulty. Following this successful challenge, a new challenge will be organized in collaboration with Google Zurich, specifically targeting the selection of deep network architectures (AutoDL: Automatic Deep Learning) in five domains: Image; Video; Audio; Text; Customer demographic descriptors.

The expected result of the challenge is to relieve data scientists of the burden of designing a good architecture ("black art"), and to enforce the reproducibility of the results. In particular, this challenge will encourage advances regarding a few key research questions:

  • How to make optimization algorithms more efficient without introducing more tunable parameters?

  • How to efficiently automate the tuning of many hyper-parameters?

  • How to automatically design or optimize a network architecture for a particular problem?

  • How to further automate the learning process by directly learning how to learn?

Per Instance Algorithm Configuration for Continuous Optimization

Nacim Belkhir's PhD thesis (defended on Nov. 30, 2017) was centered on PIAC (Per Instance Algorithm Configuration) in the context of continuous optimization. After a detailed study of the features that had been proposed in the literature, he studied the dependency of the PIAC results on the size of the sample on which these features are computed. The rationale is that the function evaluations used to compute the features must be counted against the budget when addressing a new target instance. He demonstrated that PIAC based on very small sample sets (down to 50 times the dimension) can nevertheless help improve the overall results of the optimization procedure [18], as illustrated in particular by winning the single-objective track of the GECCO 2017 Black Box Competition.
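The budget accounting behind this rationale can be sketched as follows. The three summary statistics are placeholders and not the features studied in the thesis; the function names, the search bounds and the uniform sampling are assumptions.

```python
import random
import statistics


def sample_features(objective, dim, bounds=(-5.0, 5.0), factor=50, seed=0):
    """Compute toy features from factor * dim uniformly sampled points.

    Returns the features and the number of evaluations they cost.
    """
    rng = random.Random(seed)
    n_samples = factor * dim
    values = []
    for _ in range(n_samples):
        x = [rng.uniform(*bounds) for _ in range(dim)]
        values.append(objective(x))
    features = {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
    }
    return features, n_samples


def remaining_budget(total_budget, evaluations_used):
    """Evaluations left for the optimizer once feature sampling is paid for."""
    if evaluations_used > total_budget:
        raise ValueError("feature sampling exceeds the total budget")
    return total_budget - evaluations_used


def sphere(x):
    """Hypothetical target instance: the sphere function."""
    return sum(xi * xi for xi in x)


features, cost = sample_features(sphere, dim=3)
budget_left = remaining_budget(total_budget=3000, evaluations_used=cost)
```

With dimension 3 and a factor of 50, the features cost 150 evaluations, so an assumed total budget of 3000 leaves 2850 for the optimization itself.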

Feature-based Algorithm Selection in Combinatorial Optimization

In the first part of his PhD (to be defended in Feb. 2018, see also Section 4.2), François Gonard designed ASAP, an Algorithm Selection algorithm that combines a global pre-scheduler and a per-instance algorithm selector, to take advantage of the diversity of the problem instances on the one hand and of the algorithms on the other hand. ASAP participated in two competitions: the 2016 ICON challenge [35], in which it obtained a Special Mention for its originality (and obtained excellent results on half of the problems); and the 2017 OASC challenge, where two versions of ASAP obtained the best overall performances [23].
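One possible way of combining a pre-scheduler with a per-instance selector is sketched below. The order of operations (fixed schedule first, selected algorithm on the remaining time) is an assumption, as are the function name and the toy runtimes; the source does not describe how ASAP combines the two components.

```python
def run_schedule_then_selected(runtimes, schedule, selected, timeout):
    """Simulate a pre-schedule followed by the per-instance selected algorithm.

    runtimes: dict mapping algorithm name to the time it needs on this
              instance (float("inf") if it never solves it).
    schedule: list of (algorithm, time_slice) pairs shared by all instances;
              the slices are assumed to sum to less than `timeout`.
    selected: algorithm chosen for this instance by the selector.
    Returns (solved, elapsed_time).
    """
    elapsed = 0.0
    for algo, time_slice in schedule:
        if runtimes[algo] <= time_slice:
            return True, elapsed + runtimes[algo]
        elapsed += time_slice
    remaining = timeout - elapsed
    if runtimes[selected] <= remaining:
        return True, elapsed + runtimes[selected]
    return False, timeout


# Hypothetical instances: the first is solved during the pre-schedule,
# the second only by the selected algorithm.
easy = {"a": 2.0, "b": 50.0, "c": float("inf")}
hard = {"a": 30.0, "b": 50.0, "c": float("inf")}
schedule = [("c", 5.0), ("a", 5.0)]
```

The pre-schedule catches instances that some algorithm solves quickly, whatever the selector predicts, while the selector handles the instances that need a long run of the right algorithm.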

Deep Learning calibration

In a nascent collaboration with Olivier Teytaud (who left TAO for Google Zurich in 2016), we proposed [40] an online scheme for Deep Learning hyper-parameter tuning that detects and early-stops unpromising runs using extrapolation of learning curves [64], takes advantage of parallelism, and offers optimality guarantees within the multiple hypothesis testing framework.
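The idea of stopping a run based on an extrapolated learning curve can be sketched as follows. The curve model loss(t) = a + b / t, the function names and the margin parameter are assumptions; the statistical guarantees of the proposed scheme are not reproduced here.

```python
def extrapolate_loss(losses, horizon):
    """Fit loss(t) ~ a + b / t by least squares and predict the loss at `horizon`.

    losses[i] is the validation loss after epoch i + 1; at least two
    values are needed for the fit.
    """
    n = len(losses)
    xs = [1.0 / t for t in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a + b / horizon


def should_stop(losses, horizon, best_final_loss, margin=0.0):
    """Stop a run whose predicted final loss is worse than the best known one."""
    return extrapolate_loss(losses, horizon) > best_final_loss + margin


# Hypothetical run whose loss follows 0.5 + 1 / t exactly.
partial_curve = [0.5 + 1.0 / t for t in range(1, 6)]
```

On this toy curve the extrapolated loss at epoch 100 is about 0.51, so the run would be stopped if another run had already reached 0.3 and kept if the best known loss were 0.6.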

Learning Rate Adaptation in Stochastic Gradient Descent

Based on an analogy with CMA-ES step-size adaptation (comparison with random walks), an original mechanism was proposed for adapting the learning rate of stochastic gradient descent [52]. As increasing the learning rate can increase the number of catastrophic events (exploding gradients or loss values), a change detection test is used to detect such events and backtrack to safe regions. First experiments on small-size problems (MNIST and CIFAR10) validate the approach. Interestingly, the same mechanism can be applied to the Adam optimizer and also improves on its basic version.
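The detection part can be sketched with a Page-Hinkley test monitoring the training loss. The choice of this particular test is an assumption, since the text above does not name the test used, and the parameter values and toy loss sequences are illustrative only.

```python
class PageHinkley:
    """Page-Hinkley test for an upward change in the mean of a stream."""

    def __init__(self, delta=0.01, threshold=1.0):
        self.delta = delta          # tolerated drift per observation
        self.threshold = threshold  # alarm level
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.min_cumulative = 0.0

    def update(self, value):
        """Feed one observation; return True if a change is detected."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cumulative += value - self.mean - self.delta
        self.min_cumulative = min(self.min_cumulative, self.cumulative)
        return self.cumulative - self.min_cumulative > self.threshold


def first_alarm(losses, detector):
    """Return the index of the first loss that triggers the alarm, or None."""
    for step, loss in enumerate(losses):
        if detector.update(loss):
            return step
    return None


# Hypothetical loss traces: a stable run, and a run that blows up at step 20.
stable = [1.0] * 30
exploding = [1.0] * 20 + [10.0] * 3
```

In a training loop, an alarm would trigger restoring the last checkpoint taken before the jump and lowering the learning rate, which is the backtracking to safe regions mentioned above.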

Domain Adaptation

The subject of V. Estrade's PhD is to advance domain adaptation methods in the specific context of uncertainty quantification and calibration in High Energy Physics (HEP) analysis. The problem consists of learning a representation that is insensitive to perturbations induced by nuisance parameters. The need for adversarial techniques, which assume a completely knowledge-free approach, has been questioned. Our results [32], [43] contrast the superior performance obtained by incorporating a priori knowledge (Tangent Propagation approach) on a problem with well-separated classes (MNIST data) with a real-case setting in HEP.
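The principle of Tangent Propagation can be sketched as a penalty on the derivative of the model output along the direction in which a nuisance parameter moves the input. The finite-difference approximation, the function names and the toy linear model are assumptions; in practice the derivative would typically be obtained by automatic differentiation and added to the training loss.

```python
def tangent_penalty(model, x, tangent, eps=1e-4):
    """Squared directional derivative of `model` at `x` along `tangent`.

    A small value means the output barely changes when the input is
    perturbed along the nuisance direction.
    """
    shifted = [xi + eps * ti for xi, ti in zip(x, tangent)]
    directional = (model(shifted) - model(x)) / eps
    return directional ** 2


def toy_model(x):
    """Hypothetical model that depends on the first input only."""
    return 3.0 * x[0]


point = [1.0, 2.0]
insensitive = tangent_penalty(toy_model, point, tangent=[0.0, 1.0])
sensitive = tangent_penalty(toy_model, point, tangent=[1.0, 0.0])
```

The penalty vanishes when the nuisance direction only affects an input the model ignores, and is close to 9 (the squared slope) when it affects the first input.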