Section: New Results

Sequential learning

Participants: Pierre Gaillard, Gilles Stoltz.

Bandit problems

The article [30] revisits the asymptotically optimal results of Lai and Robbins and of Burnetas and Katehakis from a non-asymptotic viewpoint. A preliminary attempt, mentioned in the 2011 annual report, dealt essentially with the case of Bernoulli distributions over the arms. Here we obtain regret bounds that achieve the stated optimality for larger models: regular exponential families and finitely supported distributions.
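For reference, the asymptotic lower bound that sets the optimality target can be written as follows. This is a sketch of the classical statement, not a result taken from [30]; the notation (model D, arm distributions \nu_a, best mean \mu^*, gaps \Delta_a, number of pulls N_a(T), regret R_T) is assumed here for illustration.

```latex
% Assumed notation: \mathcal{D} is the model, \nu_a the distribution of arm a,
% \mu^* the largest expectation among the arms, \Delta_a = \mu^* - E(\nu_a),
% N_a(T) the number of pulls of arm a up to round T.
\begin{align*}
\mathcal{K}_{\inf}(\nu_a, \mu^*)
  &= \inf \bigl\{ \mathrm{KL}(\nu_a, \nu') : \nu' \in \mathcal{D},\ E(\nu') > \mu^* \bigr\}, \\
\liminf_{T \to \infty} \frac{\mathbb{E}[N_a(T)]}{\log T}
  &\geq \frac{1}{\mathcal{K}_{\inf}(\nu_a, \mu^*)}
  \qquad \text{for every arm $a$ with } \Delta_a > 0, \\
\liminf_{T \to \infty} \frac{R_T}{\log T}
  &\geq \sum_{a \,:\, \Delta_a > 0} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a, \mu^*)},
  \qquad R_T = \sum_{a} \Delta_a \, \mathbb{E}[N_a(T)].
\end{align*}
```

The bound holds for consistent strategies, that is, strategies whose regret is sub-polynomial on every problem of the model. A non-asymptotic analysis aims at a finite-time upper bound whose leading term matches the right-hand side of the last inequality.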

Theoretical results for the prediction of arbitrary sequences

In [24] we generalize and unify several notions of regret under a single banner: adaptive regret (regret against a fixed convex combination on subintervals of time); shifting regret (regret against a slowly evolving target sequence of convex combinations); and discounted regret (where each instance is weighted according to how recent it is). We recover, and sometimes improve, earlier bounds.
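The three notions can be written side by side as below. This is an illustrative formalization under assumed notation (linear losses, weight vectors p_t in the simplex, discount factors \beta_{t,T}); the exact definitions used in [24] may differ in details.

```latex
% Assumed notation: p_t is the convex combination played at round t,
% \ell_t the loss vector of round t, \Delta the simplex of convex weights,
% \beta_{t,T} \ge 0 a discount factor that grows with the recency of round t.
\begin{align*}
\text{adaptive:}\quad
  & \max_{1 \le r \le s \le T} \Bigl( \sum_{t=r}^{s} p_t \cdot \ell_t
      - \min_{q \in \Delta} \sum_{t=r}^{s} q \cdot \ell_t \Bigr), \\
\text{shifting:}\quad
  & \sum_{t=1}^{T} p_t \cdot \ell_t - \sum_{t=1}^{T} q_t \cdot \ell_t
  \quad \text{for a sequence } (q_t) \text{ with small total variation }
  \sum_{t=2}^{T} \lVert q_t - q_{t-1} \rVert, \\
\text{discounted:}\quad
  & \max_{q \in \Delta} \sum_{t=1}^{T} \beta_{t,T} \, \bigl( p_t \cdot \ell_t - q \cdot \ell_t \bigr).
\end{align*}
```

In each case the comparator is a convex combination; what changes is whether it is fixed on a subinterval, allowed to drift over time, or evaluated with weights that favor recent rounds.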

Forecasting of the production data of oil reservoirs

We applied our sequential aggregation techniques to a new data set, in partnership with IFP Energies nouvelles. The goal was to aggregate sequentially the forecasts of about 100 base experts in order to predict several production quantities of oil wells (gas/oil ratio, cumulative oil extracted, water cut). The results were obtained with the help of an intern, Charles-Pierre Astolfi, and are described in the technical report [27], which is to be turned into a regular journal or conference paper next year.
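To make the idea of sequential aggregation concrete, here is a minimal sketch of one standard rule, the exponentially weighted average forecaster with square loss. It is an assumption chosen for illustration: the report does not state which aggregation rules were used in [27], and the function name, the learning rate eta, and the data layout are all hypothetical.

```python
import numpy as np


def ewa_aggregate(expert_forecasts, observations, eta=1.0):
    """Aggregate expert forecasts sequentially with exponential weights.

    expert_forecasts: array of shape (T, K); entry [t, k] is the forecast
        of expert k for round t.
    observations: array of shape (T,); the value revealed after round t.
    eta: learning rate (hypothetical default; it needs tuning on real data).

    Returns (predictions, last_weights), where last_weights are the convex
    weights used to form the final prediction.
    """
    expert_forecasts = np.asarray(expert_forecasts, dtype=float)
    observations = np.asarray(observations, dtype=float)
    n_rounds, n_experts = expert_forecasts.shape

    cumulative_loss = np.zeros(n_experts)
    predictions = np.empty(n_rounds)
    weights = np.full(n_experts, 1.0 / n_experts)

    for t in range(n_rounds):
        # Weights depend only on losses observed strictly before round t.
        # Subtracting the minimum keeps the exponentials numerically stable.
        unnormalized = np.exp(-eta * (cumulative_loss - cumulative_loss.min()))
        weights = unnormalized / unnormalized.sum()

        # The aggregated forecast is a convex combination of the experts.
        predictions[t] = weights @ expert_forecasts[t]

        # The observation is revealed; update each expert's square loss.
        cumulative_loss += (expert_forecasts[t] - observations[t]) ** 2

    return predictions, weights
```

With uniform initial weights the first prediction is the plain average of the expert forecasts; afterwards, experts with a smaller cumulative loss receive exponentially larger weight. For roughly 100 experts and quantities such as the gas/oil ratio, the inputs would be one row of expert forecasts per time step and the measured value for that step.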