Critical problems of the 21st century like the search for highly energy efficient or even carbon-neutral, and cost-efficient systems, or the design of new molecules against extensively drug-resistant bacteria crucially rely on the resolution of challenging numerical optimization problems. Such problems typically depend on noisy experimental data or involve complex numerical simulations where derivatives are not useful or not available and the function is seen as a black-box.

Many of those optimization problems are in essence multiobjective (one needs to optimize several conflicting objectives simultaneously, like minimizing the cost of an energy network and maximizing its reliability), and most of the challenging black-box problems are non-convex and non-smooth and combine difficulties related to ill-conditioning, non-separability, and ruggedness (a term that characterizes functions that can be non-smooth but also noisy or multi-modal). Additionally, the objective function can be expensive to evaluate, that is, one function evaluation can take several minutes to hours (it can involve, for instance, a CFD simulation).

In this context, the use of randomness combined with proper adaptive mechanisms that particularly satisfy several invariance properties (affine invariance, invariance to monotonic transformations) has proven to be one key component for the design of robust global numerical optimization algorithms 53, 37.

The field of adaptive stochastic optimization algorithms has witnessed some important progress over the past 15 years. On the one hand, subdomains like medium-scale unconstrained optimization may be considered as “solved” (particularly, the CMA-ES algorithm, an instance of Evolution Strategy (ES) algorithms, stands out as a state-of-the-art method) and considerably better standards have been established in the way benchmarking and experimentation are performed. On the other hand, multiobjective population-based stochastic algorithms became the method of choice to address multiobjective problems when a set of best possible compromises is sought.
In all cases, the resulting algorithms have been naturally transferred to industry (the CMA-ES algorithm is now regularly used in companies such as Bosch, Total, ALSTOM, ...) or to other academic domains where difficult problems need to be solved, such as physics, biology 58, geoscience 46, or robotics 49.

Very recently, ES algorithms attracted considerable attention in Machine Learning with the OpenAI article Evolution Strategies as a Scalable Alternative to Reinforcement Learning. It shows that the training time for difficult reinforcement learning benchmarks could be reduced from 1 day (with standard RL approaches) to 1 hour using ES 55. A few years ago, another impressive application of CMA-ES, “Computer Sim Teaches Itself To Walk Upright” (published at the conference SIGGRAPH Asia 2013), was presented in the press in the UK.

Several of these important advances around adaptive stochastic optimization algorithms rely to a great extent on work initiated or achieved by the founding members of RandOpt, particularly related to the CMA-ES algorithm and to the Comparing Continuous Optimizers (COCO) platform.

Yet, the field of adaptive stochastic algorithms for black-box optimization is relatively young compared to the “classical optimization” field that includes convex and gradient-based optimization. For instance, the state-of-the-art algorithms for unconstrained gradient-based optimization like quasi-Newton methods (e.g. the BFGS method) date from the 1970s 35, while the stochastic derivative-free counterpart, CMA-ES, dates from the early 2000s 39. Consequently, in some subdomains with important practical demands, not even the most fundamental and basic questions are answered:

Additionally, the development of stochastic adaptive methods for black-box optimization has been mainly driven by heuristics and practice, rather than by a general theoretical framework, and validated by intensive computational simulations. Undoubtedly, this has been an asset, as the scope of possibilities for design was not restricted by mathematical frameworks for proving convergence. In effect, powerful stochastic adaptive algorithms for unconstrained optimization like the CMA-ES algorithm emerged from this approach. At the same time, naturally, theory strongly lags behind practice. For instance, the striking empirically observed performance of CMA-ES contrasts with how little is theoretically proven about the method. This situation is clearly not satisfactory. On the one hand, theory generally lifts performance assessment from an empirical level to a conceptual one, rendering results independent of the problem instances where they have been obtained. On the other hand, theory typically provides insights that change perspectives on some algorithm components. Also, theoretical guarantees generally increase the trust in the reliability of a method and make it easier for wider communities to accept it.

Finally, as discussed above, the development of novel black-box algorithms strongly relies on scientific experimentation, and it is quite difficult to conduct proper and meaningful experimental analysis. This has been well known for more than two decades and is summarized in this quote from Johnson in 1996:

“the field of experimental analysis is fraught with pitfalls. In many ways, the implementation of an algorithm is the easy part. The hard part is successfully using that implementation to produce meaningful and valuable (and publishable!) research results.” 42

Since then, quite some progress has been made to set better standards in conducting scientific experiments and benchmarking. Yet, some domains still suffer from poor benchmarking standards and from the generic problem of the lack of reproducibility of results. For instance, in multiobjective optimization, it is (still) not rare to see comparisons between algorithms made by solely visually inspecting Pareto fronts after a fixed budget. In Bayesian optimization, good performance seems often to be due to insider knowledge not always well described in papers.

In the context of black-box numerical optimization previously described, the scientific positioning of the RandOpt team is at the intersection between theory, algorithm design, and applications. Our vision is that the field of stochastic black-box optimization should reach the same level of maturity as gradient-based convex mathematical optimization. This entails major algorithmic developments for constrained, multiobjective and large-scale black-box optimization and major theoretical developments for analyzing current methods including the state-of-the-art CMA-ES.

The specificity of black-box optimization is that methods are intended to solve problems characterized by "non-properties": non-linear, non-convex, non-smooth, non-Lipschitz. This contrasts with gradient-based optimization; it poses challenges when developing theoretical frameworks and also makes it compulsory to complement theory with empirical investigations.

Our ultimate goal is to provide software that is useful for practitioners. We see theory as a means to this end (rather than an end in itself) and we also firmly believe that parameter tuning is part of the algorithm designer's task.

This shapes, on the one hand, four main scientific objectives for our team:

On the other hand, the above motivates our objectives with respect to dissemination and transfer:

The lines of research we intend to pursue are organized along four axes, namely developing novel theoretical frameworks, developing novel algorithms, setting novel standards in scientific experimentation and benchmarking, and applications.

Stochastic black-box algorithms typically optimize non-convex, non-smooth functions. This is possible because the algorithms rely on weak mathematical properties of the underlying functions: the algorithms do not use the derivatives—hence the function does not need to be differentiable—and, additionally, often do not use the exact function value but instead how the objective function ranks candidate solutions (such methods are sometimes called function-value-free).
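As a rough illustration of what "function-value-free" means, the sketch below performs one step of a simple rank-based mean update. It assumes a fixed isotropic Gaussian and equal weights on the better half of the samples; the function name `rank_based_mean_update` and all parameter values are hypothetical and chosen for illustration only, and this is not the update rule of CMA-ES. Because only the ranking of the candidates is used, the step is unchanged when the objective is composed with a strictly increasing function.

```python
import numpy as np

def rank_based_mean_update(f, mean, sigma, popsize, rng):
    """One step of a comparison-based mean update (illustrative only)."""
    # Sample candidate solutions from an isotropic Gaussian around the mean.
    candidates = mean + sigma * rng.standard_normal((popsize, mean.size))
    # Only the ranking of the f-values is used, never the values themselves.
    order = np.argsort([f(x) for x in candidates])
    best_half = candidates[order[: popsize // 2]]
    # Move the mean to the average of the better half of the samples.
    return best_half.mean(axis=0)

def sphere(x):
    return float(np.dot(x, x))

def transformed_sphere(x):
    # Strictly increasing transformation of sphere: same ranking of candidates.
    return float(np.exp(sphere(x)) - 7.0)

mean0 = np.array([1.0, -2.0, 0.5])
# Same random seed for both runs, so both see the same candidate solutions.
m1 = rank_based_mean_update(sphere, mean0, 0.3, 10, np.random.default_rng(1))
m2 = rank_based_mean_update(transformed_sphere, mean0, 0.3, 10,
                            np.random.default_rng(1))
```

With identical random seeds, `m1` and `m2` coincide, which is the invariance to monotonic transformations in its simplest form.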
(To illustrate a comparison-based update, consider an algorithm that samples

In the previous equation

The previous update moves the mean towards the

Additionally, adaptive stochastic optimization algorithms typically have a complex state space which encodes the parameters of a probability distribution (e.g. mean and covariance matrix of a Gaussian vector) and other state vectors. This state space is a manifold. While the algorithms are Markov chains, the complexity of the state space means that standard Markov chain theory tools do not directly apply. The same holds for tools stemming from stochastic approximation theory or Ordinary Differential Equation (ODE) theory, where it is usually assumed that the underlying ODE (obtained by proper averaging and by letting the learning rate go to zero) has its critical points inside the search space. In contrast, in the cases we are interested in, the critical points of the ODEs are at the boundary of the domain.

Last, since we aim at developing theory that on the one hand allows us to analyze the main properties of state-of-the-art methods and on the other hand is useful for algorithm design, we need to be careful not to use simplifications that would allow a proof to be done but would not capture the important properties of the algorithms. In that respect, one tricky point is to develop theory that accounts for invariance properties.

To face those specific challenges, we need to develop novel theoretical frameworks exploiting invariance properties and accounting for peculiar state-spaces. Those frameworks should allow researchers to analyze one of the core properties of adaptive stochastic methods, namely linear convergence on the widest possible class of functions.

We are planning to approach the question of linear convergence from three different complementary angles, using three different frameworks:

We expect those frameworks to be complementary in the sense that the assumptions required are different. Typically, the ODE framework should allow for proofs under the assumptions that learning rates are small enough while it is not needed for the Markov chain framework. Hence this latter framework captures better the real dynamics of the algorithm, yet under the assumption of scaling-invariance of the objective functions. Also, we expect some overlap in terms of function classes that can be studied by the different frameworks (typically convex-quadratic functions should be encompassed in the three frameworks). By studying the different frameworks in parallel, we expect to gain synergies and possibly understand what is the most promising approach for solving the holy grail question of the linear convergence of CMA-ES. We foresee for instance that similar approaches like the use of Foster-Lyapunov drift conditions are needed in all the frameworks and that intuition can be gained on how to establish the conditions from one framework to another one.

We are planning on developing algorithms in the subdomains with strong practical demand for better methods of constrained, multiobjective, large-scale and expensive optimization.

Many of the algorithm developments we propose rely on the CMA-ES method. While this seems to restrict our possibilities, we want to emphasize that CMA-ES has become, over the years, a family of methods that nowadays includes various techniques and developments from the literature to handle non-standard optimization problems (noisy, large-scale, ...). The core idea of all CMA-ES variants, namely the mechanism to adapt a Gaussian distribution, has furthermore been shown to derive naturally from first principles with only minimal assumptions in the context of derivative-free black-box stochastic optimization 53, 37. This is a strong justification for relying on the CMA-ES premises while new developments naturally include new techniques typically borrowed from other fields.
While CMA-ES is now a full family of methods, for visibility reasons, we continue to refer often to “the CMA-ES algorithm”.

Many (real-world) optimization problems have constraints related to technical feasibility, cost, etc.
Constraints are classically handled in the black-box setting either via rejection of solutions violating the constraints—which can be quite costly and even lead to quasi-infinite loops—or by penalization with respect to the distance to the feasible domain (if this information can be extracted) or with respect to the constraint function value 33. However, the penalization coefficient is a sensitive parameter that needs to be adapted in order to achieve a robust and general method 34.
Yet, the question of how to properly handle constraints is largely unsolved. Previous constraint-handling techniques for CMA-ES were ad hoc and driven by many heuristics 34. Also, only recently was it pointed out that linear convergence properties should be preserved when addressing constrained problems 27.

Promising approaches, though, rely on using augmented Lagrangians 27, 28. The augmented Lagrangian, here, is the objective function optimized by the algorithm. Yet, it depends on coefficients that are adapted online. The adaptation of those coefficients is the difficult part: the algorithm should be stable and the adaptation efficient. We believe that the theoretical frameworks developed (particularly the Markov chain framework) will be useful to understand how to design the adaptation mechanisms. Additionally, the question of invariance will also be at the core of the design of the methods: augmented Lagrangian approaches break the invariance to monotonic transformations of the objective function, yet understanding the maximal invariance that can be achieved seems to be an important step towards understanding what adaptation rules should satisfy.
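To make the role of the coefficients concrete, here is a minimal sketch of the textbook augmented Lagrangian for a single inequality constraint g(x) <= 0, together with the classical multiplier update. The names `augmented_lagrangian`, `update_multiplier`, `gamma` (Lagrange coefficient) and `omega` (penalty coefficient) are assumptions made for this illustration; the online adaptation rules studied in the cited works are different and are not reproduced here.

```python
def augmented_lagrangian(f_val, g_val, gamma, omega):
    """Textbook augmented Lagrangian for one inequality constraint g(x) <= 0.

    f_val, g_val: objective and constraint values at the same point x.
    gamma: Lagrange coefficient, omega > 0: penalty coefficient.
    """
    if gamma + omega * g_val >= 0:
        # Constraint is active or violated: linear plus quadratic penalty.
        return f_val + gamma * g_val + 0.5 * omega * g_val ** 2
    # Constraint is comfortably satisfied: constant offset only.
    return f_val - gamma ** 2 / (2 * omega)

def update_multiplier(g_val, gamma, omega):
    """Classical first-order multiplier update (assumed here for illustration)."""
    return max(0.0, gamma + omega * g_val)

# Example: f(x) = x^2 with constraint g(x) = 1 - x <= 0, evaluated at x = 1.
value_at_optimum = augmented_lagrangian(1.0, 0.0, gamma=2.0, omega=1.0)
```

The two branches agree where `gamma + omega * g_val` equals zero, so the function passed to the optimizer is continuous in the constraint value.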

In the large-scale setting, we are interested to optimize problems with the order of

In this context, algorithms with a quadratic scaling (internal and in terms of number of function evaluations needed to optimize the problem) cannot be afforded. In CMA-ES-type algorithms, we typically need to restrict the model of the covariance matrix to have only a linear number of parameters to learn such that the algorithms scale linearly in terms of internal complexity, memory and number of function evaluations to solve the problem. The main challenge is thus to have rich enough models for which we can efficiently design proper adaptation mechanisms. Some first large-scale variants of CMA-ES have been derived. They include the online adaptation of the complexity of the model 26, 25. Yet, the type of Hessian matrices they can learn is restricted and not fully satisfactory. Different restricted families of distributions are conceivable and it is an open question which can be effectively learned and which are the most promising in practice.
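The sketch below illustrates, under stated assumptions, what a covariance model with a linear number of parameters can look like. It counts free parameters for three model choices and samples from a "diagonal plus rank-one" Gaussian in linear time. This particular family and the names `num_parameters` and `sample_diagonal_plus_rank_one` are chosen for illustration; the text does not commit to one restricted family.

```python
import numpy as np

def num_parameters(n, model):
    """Free parameters of a covariance model in dimension n."""
    if model == "full":
        return n * (n + 1) // 2      # symmetric n x n matrix
    if model == "diagonal":
        return n                     # one scale per coordinate
    if model == "diagonal+rank-one":
        return 2 * n                 # scales plus one direction vector
    raise ValueError(model)

def sample_diagonal_plus_rank_one(mean, d, v, rng):
    """Sample from N(mean, D (I + v v^T) D) with D = diag(d), in O(n) time."""
    z = rng.standard_normal(mean.size)
    # (I + v v^T)^(1/2) = I + c v v^T with c = (sqrt(1 + |v|^2) - 1) / |v|^2
    v2 = float(v @ v)
    c = (np.sqrt(1.0 + v2) - 1.0) / v2 if v2 > 0 else 0.0
    return mean + d * (z + c * (v @ z) * v)

n = 1000
counts = {m: num_parameters(n, m)
          for m in ("full", "diagonal", "diagonal+rank-one")}
sample = sample_diagonal_plus_rank_one(
    np.zeros(n), np.ones(n), np.full(n, 0.1), np.random.default_rng(0))
```

In dimension 1000, the full model has 500500 free parameters while the two restricted models have 1000 and 2000, which is the gap that motivates restricted models in the large-scale setting.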

Another direction we want to pursue is exploring the use of large-scale variants of CMA-ES to solve reinforcement learning problems 55.

Last, we are interested in investigating the very-large-scale setting. One approach consists in doing optimization in subspaces. This entails the efficient identification of relevant subspaces and the restriction of the optimization to those subspaces.

Multiobjective optimization, i.e., the simultaneous optimization of multiple objective functions, differs from single-objective optimization in particular in its optimization goal. Instead of aiming at converging to the solution with the best possible function value, in multiobjective optimization, a set of solutions is sought. This set, called the Pareto set, contains all trade-off solutions in the sense of Pareto-optimality: no solution exists that is better in all objectives than a Pareto-optimal one. Because converging towards a set differs from converging to a single solution, it is no surprise that we might lose many good convergence properties if we directly apply search operators from single-objective methods. However, this is what has typically been done so far in the literature. Indeed, most of the research in stochastic algorithms for multiobjective optimization focused instead on the so-called selection part, which decides which solutions should be kept during the optimization, a question that has been considered solved for many years in the case of single-objective stochastic adaptive methods.
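The notion of trade-off solutions can be sketched in a few lines. The code below assumes minimization of all objectives and uses the common definition of Pareto dominance (no worse in every objective and strictly better in at least one); the helper names `dominates` and `non_dominated` and the sample points are hypothetical.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Four objective vectors; (3, 3) is dominated by (2, 2).
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(points)
```

The remaining three points are mutually incomparable, which is why the goal is a set rather than a single best solution.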

We therefore aim at rethinking search operators and adaptive mechanisms to improve existing methods. We expect that we can obtain orders of magnitude better convergence rates for certain problem types if we choose the right search operators. We typically see two angles of attack. On the one hand, we will study methods based on scalarizing functions that transform the multiobjective problem into a set of single-objective problems. Those single-objective problems can then be solved with state-of-the-art single-objective algorithms. Classical methods for multiobjective optimization fall into this category, but they all solve multiple single-objective problems one after another (from scratch) instead of dynamically changing the scalarizing function during the search. On the other hand, we will improve on currently available population-based methods such as the first multiobjective versions of the CMA-ES. Here, research is needed on an even more fundamental level, such as trying to understand success probabilities observed during an optimization run or how we can introduce non-elitist selection (the state of the art in single-objective stochastic adaptive algorithms) to increase robustness regarding noisy evaluations or multi-modality. The challenge here, compared to single-objective algorithms, is that the quality of a solution is no longer independent of other sampled solutions, but can potentially depend on all known solutions (in the case of three or more objective functions), resulting in a noisier evaluation than the relatively simple function-value-based ranking within single-objective optimizers.
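As an illustration of the scalarization approach, the sketch below turns a toy bi-objective problem into several single-objective problems using the weighted Tchebycheff function, one common choice of scalarizing function assumed here. The toy objectives, the weights, and the crude grid search standing in for a single-objective optimizer are all assumptions made for this example.

```python
import numpy as np

def objectives(x):
    """Toy bi-objective problem on the real line (assumed for illustration)."""
    return np.array([x ** 2, (x - 2.0) ** 2])

def tchebycheff(fvals, weights, ideal):
    """Weighted Tchebycheff scalarization of an objective vector."""
    return float(np.max(weights * np.abs(fvals - ideal)))

grid = np.linspace(0.0, 2.0, 201)
ideal = np.zeros(2)
solutions = {}
for w in (0.2, 0.5, 0.8):
    weights = np.array([w, 1.0 - w])
    scores = [tchebycheff(objectives(x), weights, ideal) for x in grid]
    # A crude grid search stands in for a single-objective optimizer.
    solutions[w] = float(grid[int(np.argmin(scores))])
```

Each weight yields a different trade-off solution. Here every scalarized problem is solved from scratch, which is exactly the practice the paragraph above proposes to replace by changing the scalarizing function dynamically during the search.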

In the so-called expensive optimization scenario, a single function evaluation might take several minutes or even hours in a practical setting. Hence, the available budget in terms of the number of function evaluations to find a solution is very limited in practice. To tackle such expensive optimization problems, one needs to exploit the first few function evaluations in the best possible way. To this end, typical methods couple the learning of a surrogate (or meta-model) of the expensive objective function with traditional optimization algorithms.
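The coupling of a surrogate with an optimizer can be sketched as follows, assuming a full quadratic model fitted by ordinary least squares. The function names and the stand-in objective `expensive_f` are hypothetical; this is a generic sketch, not the surrogate-assisted method developed by the team.

```python
import numpy as np

def quadratic_features(X):
    """Features [1, x_i, x_i * x_j (i <= j)] for each row of X."""
    X = np.atleast_2d(X)
    n = X.shape[1]
    cross = [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack([np.ones(len(X)), X] + cross)

def fit_quadratic_surrogate(X, y):
    """Least-squares fit; returns a callable surrogate model."""
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return lambda Z: quadratic_features(Z) @ coef

def expensive_f(x):
    # Stand-in for an expensive simulation (assumption for illustration).
    return 3.0 + x[0] ** 2 + 2.0 * x[1] ** 2 + x[0] * x[1]

rng = np.random.default_rng(0)
X_train = rng.standard_normal((12, 2))       # 12 "expensive" evaluations
y_train = np.array([expensive_f(x) for x in X_train])
surrogate = fit_quadratic_surrogate(X_train, y_train)

X_new = rng.standard_normal((5, 2))
predicted = surrogate(X_new)                 # cheap predictions
```

An optimizer could then rank new candidates with `surrogate` and spend true evaluations only on the most promising ones. Since the stand-in objective is itself quadratic, the fit here is exact up to rounding, which will not be the case for real problems.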

In the context of expensive optimization and CMA-ES, which usually shows its full potential when the number

Numerical experimentation is needed as a complement to theory to test novel ideas, hypotheses, the stability of an algorithm, and/or to obtain quantitative estimates. Optimally, theory and experimentation go hand in hand, jointly guiding the understanding of the mechanisms underlying optimization algorithms. Though performing numerical experimentation on optimization algorithms is crucial and a common task, it is non-trivial and it is easy to fall into (common) pitfalls, as stated by J. N. Hooker in his seminal paper 40.

In the RandOpt team we aim at raising the standards for both scientific experimentation and benchmarking.

On the experimentation aspect, we are convinced that there is common ground over how scientific experimentation should be done across many (sub-)domains of optimization, in particular with respect to the visualization of results, testing extreme scenarios (parameter settings, initial conditions, etc.), how to conduct understandable and small experiments, how to account for invariance properties, performing scaling up experiments and so forth. We therefore want to formalize and generalize these ideas in order to make them known to the entire optimization community with the final aim that they become standards for experimental research.

Extensive numerical benchmarking, on the other hand, is a compulsory task for evaluating and comparing the performance of algorithms. It puts algorithms to a standardized test and makes it possible to recommend which algorithms should preferably be used in practice. To ease this part of optimization research, we have been developing the Comparing Continuous Optimizers platform (COCO) since 2007, which automates the tedious task of benchmarking. It is a game changer in the sense that the freed time can now be spent on the scientific part of algorithm design (instead of implementing the experiments, visualization, statistical tests, etc.) and it opened novel perspectives in algorithm testing. COCO implements a thorough, well-documented methodology that is based on the above-mentioned general principles for scientific experimentation.

Also due to the freely available data from 300+ algorithms benchmarked with the platform, COCO became a quasi-standard for single-objective, noiseless optimization benchmarking. It is therefore natural to extend the reach of COCO towards other subdomains (particularly constrained optimization, many-objective optimization) which can benefit greatly from an automated benchmarking methodology and standardized tests without (much) effort. This entails particularly the design of novel test suites and rethinking the methodology for measuring performance and more generally evaluating the algorithms. Particularly challenging is the design of scalable non-trivial testbeds for constrained optimization where one can still control where the solutions lie. Other optimization problem types we are targeting are expensive problems (and the Bayesian optimization community in particular, see our AESOP project), optimization problems in machine learning (for example parameter tuning in reinforcement learning), and the collection of real-world problems from industry.

Another aspect of our future research on benchmarking is to investigate the large amounts of benchmarking data we have collected with COCO over the years. Extracting information about the influence of algorithms on the best performing portfolio, clustering algorithms of similar performance, or the automated detection of anomalies in terms of good/bad behavior of algorithms on a subset of the functions or dimensions are some of the ideas here.

Last, we want to expand the focus of COCO from automatized (large) benchmarking experiments towards everyday experimentation, for example by allowing the user to visually investigate algorithm internals on the fly or by simplifying the set up of algorithm parameter influence studies.

Applications of black-box algorithms occur in various domains. Industry but also researchers in other academic domains have a great need to apply black-box algorithms on a daily basis. Generally, we do not target a specific application domain and are interested in black-box applications stemming from various origins. To us, this is intrinsic to the nature of the methods we develop, which are general-purpose algorithms. Hence our strategy with respect to applications can be considered as opportunistic, and our main selection criterion when approached by colleagues who want to develop a collaboration around an application is whether we find the application interesting and valuable: that means the application brings new challenges and/or gives us the opportunity to work on topics we already intended to work on, and it brings, in our judgement, an advancement to society in the application domain.

The concrete applications related to industrial collaborations we are currently dealing with are:

We are concerned about our CO2 footprint. This year we attended online conferences only and had all our international collaboration meetings online. We discourage attending faraway overseas conferences. Once the covid situation allows travel again, we will still be happy to travel less than in the past and to attend some conferences from home.

We develop general purpose optimization methods that apply in difficult optimization contexts where little is required on the function to be optimized. Application domains include optimization and design of renewable systems and climate change.

Our main method CMA-ES is transferred and widely used. The code stemming from the team is widely downloaded (see Section 7). Among the uses of our method and our code, we naturally find problems in the domains of carbon dioxide capture 51, 48, 52, solar energy 44, 45, and wind-thermal power systems 54.

Those publications attest to the impact of our research results with respect to research questions and engineering design related to climate change and renewable energy.

In the context of our collaboration with the company Storengy, we can report another, even more immediate environmental impact. Storengy plans to use our CMA-ES method (and variants developed during the project) to optimize the underground storage of hydrogen—as a CO2-neutral alternative to natural gas.

We published the paper 13 that summarizes our work on benchmarking over ten years and its implementation in the COCO platform. It highlights the impact of this research by providing usage and dataset numbers. We report 300+ algorithm data sets available online, 140+ workshop papers written with the support of the software, and 1800+ citations to the various papers documenting software, test problems, and experimental procedure.

We have been invited to write an article for the ACM SIGEVO newsletter 24 after receiving the ACM impact award for our 2010 GECCO paper 38. It acknowledges the impact of research on benchmarking, the COCO platform and the organization of our Black-Box-Optimization workshops.

The RandOpt team maintains and further develops the two software libraries CMA-ES and COCO with around 149 commits in 2021. The shutdown of gforge.inria.fr created a considerable overhead in particular for the maintenance of COCO.

As an indicator of the research and software impact, Figure 1 shows weekly downloads during the second half of 2021 of software packages developed by the RandOpt team and of the cmaes package developed by Masashi Shibata (a direct competitor of the cma module, more tailored to machine learning applications but also based on RandOpt research results). The cma module currently has almost 30,000 weekly downloads; the cmaes module has close to 200,000 weekly downloads. The cma module from RandOpt has been downloaded more than 2 million times overall.

External collaborators: Youhei Akimoto (Tsukuba University), Tobias Glasmachers (Ruhr University, Bochum)

We have proven the global linear convergence of the (1+1)-ES with one-fifth success rule as step-size adaptation on a class of functions that embed smooth strongly convex functions and positively homogeneous functions. Because of the invariance to monotonic transformations, the study holds for non-continuous and non-convex functions. Arguably, our study provides the first proof of the linear convergence of an adaptive evolution strategy without modifying its underlying updates (to make a proof work) on such a wide class of functions 20.
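For readers unfamiliar with the algorithm, the following is a minimal sketch of a (1+1)-ES with a one-fifth success rule. The step-size factors (multiply by `alpha` on success, by `alpha ** (-1/4)` on failure) are one common parameterization assumed here for illustration; the exact variant analyzed in 20 may differ.

```python
import numpy as np

def one_plus_one_es(f, x0, sigma0, iterations, rng, alpha=1.5):
    """(1+1)-ES with a one-fifth success rule for the step-size (sketch)."""
    x, sigma = np.asarray(x0, dtype=float), float(sigma0)
    fx = f(x)
    history = [fx]
    for _ in range(iterations):
        candidate = x + sigma * rng.standard_normal(x.size)
        fc = f(candidate)
        if fc <= fx:                  # success: accept and enlarge the step
            x, fx = candidate, fc
            sigma *= alpha
        else:                         # failure: shrink the step
            sigma *= alpha ** (-0.25)
        history.append(fx)
    return x, sigma, history

def sphere(x):
    return float(np.dot(x, x))

x_best, sigma_end, history = one_plus_one_es(
    sphere, np.ones(5), 1.0, 2000, np.random.default_rng(0))
```

The step-size is stationary when roughly one in five candidates is accepted, hence the name. Plotting the logarithm of `history` against the iteration count should give an approximately straight, decreasing line on the sphere function, which is the empirical signature of linear convergence.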

Over the past years, we have developed a methodology to analyze the linear convergence of adaptive comparison-based algorithms, including Evolution Strategies, by studying the stability of underlying Markov chains. This methodology allows us to derive convergence on so-called scaling-invariant functions. Yet this class of functions had not been studied in the past, so we needed to derive important mathematical properties that are required to conduct our convergence studies. Based on the work of the master thesis of Armand Gissler, we published a theoretical analysis of the link between scaling-invariant functions and positively homogeneous ones 14.
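As a small numerical illustration, the sketch below checks scaling-invariance on random samples. It assumes the usual definition with respect to the origin: f(x) <= f(y) holds if and only if f(rho x) <= f(rho y) for every rho > 0. The test function, a strictly increasing transformation of a positively homogeneous function, is an assumption chosen for the example.

```python
import numpy as np

def f(x):
    # Strictly increasing transformation of the positively homogeneous norm:
    # neither convex nor positively homogeneous itself.
    return float(np.arctan(np.linalg.norm(x) ** 0.5))

rng = np.random.default_rng(0)
consistent = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    rho = float(np.exp(rng.uniform(-3.0, 3.0)))
    # Scaling-invariance w.r.t. 0: the comparison of x and y survives rescaling.
    consistent &= (f(x) <= f(y)) == (f(rho * x) <= f(rho * y))
```

A comparison-based algorithm only sees which of two points is better, so on such a function its behavior at scale rho mirrors its behavior at scale 1. A sampled check of this kind is of course no proof.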

We have analyzed the linear convergence of a step-size adaptive

We investigated the combination of Augmented Lagrangian methods with Evolution Strategies for constrained optimization problems in 16. We experimented on a small set of benchmark problems and exhibited failure cases of the Augmented Lagrangian technique in this context. These preliminary results suggest that surrogate modeling of the constraints may overcome some of these difficulties.

During an internship, Clément Micol from ENSTA investigated the distribution of the incumbent mean of the sample distribution in

During his internship, Jingyun Yang from Ecole Polytechnique developed a new version of the surrogate-assisted lq-CMA-ES of 36 that employs a quadratic, tri-diagonal surrogate as its fourth model.

External collaborators: Youhei Akimoto (Tsukuba University), Tea Tušar (Jozef Stefan Institute)

A central theme for the team is the design, analysis, and benchmarking of multi-objective optimization algorithms. In 2021, we have progressed in several ways on those aspects.

In terms of algorithm design, we improved the COMO-CMA-ES algorithm from 57 towards an algorithm that aims at converging to the entire Pareto front. Restarts and new stopping criteria make it possible to also handle multi-modal objective functions effectively. A publication is in progress. In parallel, in her thesis, Eugénie Marescaux studied the convergence of the ensuing solver from a theoretical standpoint and was able to prove, under proper assumptions, that the algorithm converges towards the entire Pareto front 21. In addition, lower bounds on how fast multiobjective algorithms can converge towards the optimal hypervolume indicator have been established 17. The COMO-CMA-ES algorithm has also contributed, to a large extent, to the PhD thesis of Cheikh Touré 18.
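The hypervolume indicator mentioned above can be computed in a few lines for two objectives. The sketch below assumes minimization and a user-chosen reference point; the function name `hypervolume_2d` and the sample points are hypothetical, and this is not the implementation used in the team's software.

```python
def hypervolume_2d(points, reference):
    """Area dominated by the points and bounded by the reference point.

    Both objectives are minimized; points not strictly better than the
    reference point in both objectives contribute nothing.
    """
    pts = sorted(p for p in points
                 if p[0] < reference[0] and p[1] < reference[1])
    area, best_f2 = 0.0, reference[1]
    for f1, f2 in pts:                # sweep by increasing first objective
        if f2 < best_f2:              # point is non-dominated so far
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
hv = hypervolume_2d(front, reference=(5.0, 5.0))
hv_with_dominated = hypervolume_2d(front + [(3.0, 3.0)], reference=(5.0, 5.0))
```

Adding the dominated point (3, 3) leaves the value unchanged, which reflects that the indicator only rewards non-dominated solutions.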

To document and promote our benchmarking efforts with the COCO platform, we have published the description of the bi-objective bbob-biobj and bbob-biobj-ext test suites in the Evolutionary Computation journal 12. New empirical benchmarking results from the classical Direct Multisearch (DMS) and MultiGLODS algorithms have been prepared and described as well 15.

Our publicly available Python modules, the MO-archiving module and the module implementing the COMO-CMA-ES algorithm, slowly started to get picked up after their release in 2020 with, on average, 1–2 downloads per day on PyPI, the Python Package Index.

External collaborators: Olaf Mersmann (TH Köln), Raymond Ros (U Paris-Saclay), Tea Tušar (Jozef Stefan Institute)

Benchmarking is an important task in optimization in order to assess and compare the performance of algorithms as well as to motivate the design of better solvers. We are leading the benchmarking of derivative-free solvers in the context of difficult problems: we have been developing methodologies and testbeds and have assembled them into a platform that automates the benchmarking process. This is a continuing effort that we are pursuing in the team.

The COCO platform, developed at Inria since 2007, first in the TAO team and then in RandOpt, aims at automating numerical benchmarking experiments and the visual presentation of their results. The platform consists of an experimental part to generate benchmarking data (in various programming languages) and a postprocessing module (in Python), see Figure 2. At the interface between the two, we provide data sets from numerical experiments of 300+ algorithms and algorithm variants from various fields (quasi-Newton, derivative-free optimization, evolutionary computing, Bayesian optimization) and for various problem characteristics (noiseless/noisy optimization, single-/multi-objective optimization, continuous/mixed-integer, ...).

The main innovations and methodological ideas of the platform have been published in the paper 13.

We have been using the platform in the past to initiate workshop papers during the ACM-GECCO conference as well as to collect algorithm data sets from the entire optimization community (300+ so far over the different test suites). The next workshop in this series is going to take place in 2022 and we also held a workshop in 2021.

In this context, we constantly improve and extend the software and provide additional data from benchmarking experiments. In 2021, four new data sets were collected from the BBOB-2021 participants; they are available via the COCO postprocessing module.

The largest effort, this year, went into the finalization of a new test suite for constrained optimization (work still in progress).

Due to the covid-19 pandemic, no physical visits to other teams have been made.

The three permanent members are frequent reviewers for major journals in Evolutionary Computation. Anne Auger is a frequent reviewer for mathematical optimization journals (JOGO, SIAM OPT). We additionally review papers in Machine Learning related to optimization for JMLR and Machine Learning.