

Section: New Results

New results for stochastic bandits

M. Brégère and G. Stoltz, in collaboration with P. Gaillard (Sierra team) and Y. Goude (EDF), provided in [5] a methodology based on a linear bandit model for managing (influencing) electricity consumption by sending tariff incentives. The main contribution is the modeling of the problem itself: consumption is described by a generalized additive model depending on the probabilistic allocation of tariffs picked and on the context (type of day, hour of the day, weather conditions, etc.). The mathematical results, on the other hand, are direct extensions of earlier results for the LinUCB algorithm (see Li et al., 2010; Chu et al., 2011; Abbasi-Yadkori et al., 2011); a minimal sketch of such an index policy is given below. Simulations on realistic data are provided: evaluating bandit algorithms requires a data simulator, which we built from an open data set of household electricity consumption in London.
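
The following is a minimal sketch of the standard LinUCB policy (Li et al., 2010), shown only to illustrate the kind of optimistic index-based algorithm that [5] extends; the feature construction, dimensions, and the toy reward model are assumptions for illustration, not the generalized additive consumption model of the paper.

```python
# Minimal LinUCB sketch; NOT the exact algorithm of [5], only the classical index policy it builds on.
import numpy as np

class LinUCB:
    def __init__(self, dim, reg=1.0, alpha=1.0):
        self.A = reg * np.eye(dim)      # regularized design (Gram) matrix
        self.b = np.zeros(dim)          # cumulated reward-weighted features
        self.alpha = alpha              # width of the confidence bonus

    def select(self, features):
        """Pick the action whose feature vector has the largest optimistic index."""
        theta = np.linalg.solve(self.A, self.b)           # ridge-regression estimate
        A_inv = np.linalg.inv(self.A)
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in features]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Rank-one update of the design matrix after observing the reward."""
        self.A += np.outer(x, x)
        self.b += reward * x

# Hypothetical usage: each candidate action (tariff allocation) comes with a feature
# vector combining the allocation and calendar/weather covariates (all toy values here).
rng = np.random.default_rng(0)
bandit = LinUCB(dim=5, alpha=0.5)
for t in range(1000):
    candidate_features = rng.normal(size=(3, 5))          # one feature vector per allocation
    a = bandit.select(candidate_features)
    reward = candidate_features[a] @ np.ones(5) + rng.normal(scale=0.1)  # toy linear reward
    bandit.update(candidate_features[a], reward)
```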

A second important result was obtained by H. Hadiji: he characterized the cost of adapting to the unknown (Hölder) smoothness of the payoff function in continuum-armed bandits [14]. He first rewrote and slightly extended the regret lower bounds exhibited by Locatelli and Carpentier (2018), and then exhibited an algorithm with matching regret upper bounds. Unlike virtually all previous algorithms for X-armed bandits, which zoom in as time passes, this algorithm zooms out as time passes. This solves a problem that had been open for several years.
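
To fix the setting, here is a minimal baseline for continuum-armed bandits: run UCB1 on a fixed uniform discretization of the arm space [0, 1]. This sketch is only meant to illustrate the problem; it is not the adaptive, zooming-out algorithm of [14], and the number of bins and the test payoff function are assumptions.

```python
# Fixed-discretization UCB1 baseline for continuum-armed bandits (illustration only).
import numpy as np

def ucb_on_grid(payoff, horizon, num_bins, rng):
    centers = (np.arange(num_bins) + 0.5) / num_bins        # bin midpoints in [0, 1]
    counts = np.zeros(num_bins)
    sums = np.zeros(num_bins)
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= num_bins:
            k = t - 1                                        # play each bin once first
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)        # UCB1 exploration bonus
            k = int(np.argmax(means + bonus))
        reward = payoff(centers[k]) + rng.normal(scale=0.1)  # noisy payoff evaluation
        counts[k] += 1
        sums[k] += reward
        total_reward += reward
    return total_reward

rng = np.random.default_rng(0)
payoff = lambda x: 1.0 - abs(x - 0.3) ** 0.5                 # a toy Hölder-smooth payoff
print(ucb_on_grid(payoff, horizon=5000, num_bins=20, rng=rng))
```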

Also, H. Hadiji and G. Stoltz, in collaboration with P. Ménard (SequeL team) and A. Garivier, submitted a revised version of their results on simultaneous optimality (from both distribution-dependent and distribution-free viewpoints) for a variant of the KL-UCB algorithm in the case of vanilla K-armed stochastic bandits [22].
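
As a reminder of the index this line of work builds on, here is a minimal sketch of the KL-UCB index for Bernoulli arms (Garivier and Cappé, 2011), computed by bisection; the exploration rate log(t) and the bisection depth are standard choices for illustration, not necessarily those of the variant studied in [22].

```python
# KL-UCB index for Bernoulli arms (illustration of the classical index, not the variant of [22]).
import math

def bernoulli_kl(p, q):
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean such that pulls * kl(mean, q) <= log(t), found by bisection."""
    if pulls == 0:
        return 1.0
    level = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):                 # bisection on the increasing map q -> kl(mean, q)
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

# Example: index of an arm with empirical mean 0.4 after 30 pulls, at round t = 1000.
print(kl_ucb_index(0.4, 30, 1000))
```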