Section: New Results
Sequential learning with limited feedback; in particular, bandit problems
Participants: Gilles Stoltz, Jia Yuan Yu.
Some of the results cited below are summarized, or stated as open problems, in the habilitation thesis.
We achieved three contributions. The first, described in a conference paper, revisits the asymptotically optimal results of Lai and Robbins and of Burnetas and Katehakis in a non-asymptotic way. The second, stated in a journal article, is concerned with obtaining fast convergence rates for the regret in the case of a continuum of arms (under, of course, some regularity and topological assumptions on the mean-payoff function).
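The non-asymptotic revisiting mentioned above concerns index policies whose upper confidence bounds are built from Kullback-Leibler divergences rather than from Hoeffding-type terms. As a purely illustrative sketch (the arm distributions, the log t exploration level, and all tuning constants below are our own choices, not the paper's), a KL-based index for Bernoulli arms can be computed by binary search and plugged into the usual index-policy loop:

```python
import math
import random

def kl_bern(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_index(mean, n, t):
    """Largest q with n * kl(mean, q) <= log(t), found by binary search
    (kl_bern(mean, .) is increasing on [mean, 1])."""
    target = math.log(max(t, 2)) / n
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bern(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def kl_bandit(means, T, rng):
    """Index policy with KL-based upper confidence bounds, Bernoulli arms."""
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        if t <= K:                      # pull each arm once to initialize
            a = t - 1
        else:
            a = max(range(K),
                    key=lambda k: kl_index(sums[k] / counts[k], counts[k], t))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
    return counts

rng = random.Random(0)
counts = kl_bandit([0.7, 0.5, 0.3], 3000, rng)
```

On this instance the best arm (mean 0.7) should receive the overwhelming majority of the pulls, the suboptimal arms being sampled only logarithmically often.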
The third contribution is detailed in a conference paper and started from the following observation. Typical results in the bandit literature had the form: if the regularity of the mean-payoff function is known (or if a bound on it is known), then the regret is small. Actually, results usually took the following weaker form: when the algorithm is tuned with suitable parameters, the regret is small against a certain class of stochastic environments. The question was thus to design an adaptive procedure that, given one unknown environment (with unknown regularity), ensures that the regret is asymptotically small; even better, the desired aim was to control the regret in some uniform manner (in a distribution-free sense, up to the regularity parameters). As described in this conference paper, a solution was achieved in the case of Lipschitz environments.
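The known-regularity baseline that the adaptive procedure improves upon can be made concrete. A standard reduction, when the Lipschitz constant L and the horizon T are known, is to discretize the arm space [0, 1] into m bins and run a finite-armed index policy on the bin centres; the tuning of m below balances the estimation cost against the discretization error and is only one illustrative choice (the test function, the UCB1-style index, and all constants are ours; the point of the paper is precisely to dispense with the knowledge of L):

```python
import math
import random

def lipschitz_ucb(f, L, T, rng):
    """Discretization baseline for a continuum of arms on [0, 1] when the
    Lipschitz constant L is known: run a UCB1-style policy on m bin
    centres, with m balancing estimation error ~ sqrt(m T) against
    discretization error ~ L T / m."""
    m = max(1, int(round(L ** (2 / 3) * T ** (1 / 3))))
    centres = [(i + 0.5) / m for i in range(m)]
    counts, sums = [0] * m, [0.0] * m
    for t in range(1, T + 1):
        if t <= m:                      # pull each bin once to initialize
            a = t - 1
        else:
            a = max(range(m),
                    key=lambda k: sums[k] / counts[k]
                    + math.sqrt(2 * math.log(t) / counts[k]))
        # Bernoulli payoff with mean f(centre); f must take values in [0, 1]
        r = 1.0 if rng.random() < f(centres[a]) else 0.0
        counts[a] += 1
        sums[a] += r
    return centres, counts

rng = random.Random(0)
centres, counts = lipschitz_ucb(lambda x: 1.0 - abs(x - 0.3), 1.0, 2000, rng)
best = centres[max(range(len(counts)), key=counts.__getitem__)]
```

With the tent-shaped mean-payoff function above (peak at 0.3, Lipschitz constant 1), the most-pulled bin should lie close to the maximizer.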
Approachability in games with partial monitoring
A conference paper explains how we re-obtained, in a simpler, more direct, and computationally efficient manner, a result proven by Perchet in his PhD thesis: the necessary and sufficient condition for the approachability of a closed convex set under partial monitoring.
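For context, approachability in the classical full-monitoring setting can already be illustrated on a toy example; the partial-monitoring condition studied in the paper is not implemented here. A classical Blackwell-approachability strategy for the nonpositive orthant of the regret vector is the regret-matching rule of Hart and Mas-Colell, sketched below (the 2x2 payoff matrix, the uniform adversary, and the horizon are our own illustrative choices):

```python
import random

# Payoff matrix for the row player: U[a][b], two actions for each player.
U = [[1.0, 0.0],
     [0.0, 1.0]]

def regret_matching(T, rng):
    """Regret matching: play each action with probability proportional to
    the positive part of its cumulative regret. This is a Blackwell
    strategy approaching the nonpositive orthant of the regret vector."""
    cum_regret = [0.0, 0.0]            # cumulative regret vs. each fixed action
    for _ in range(T):
        pos = [max(r, 0.0) for r in cum_regret]
        s = sum(pos)
        p = [x / s for x in pos] if s > 0 else [0.5, 0.5]
        a = 0 if rng.random() < p[0] else 1
        b = rng.randrange(2)           # adversary plays uniformly here
        for k in range(2):
            cum_regret[k] += U[k][b] - U[a][b]
    return [r / T for r in cum_regret]

rng = random.Random(0)
avg = regret_matching(20000, rng)
# Euclidean-style distance of the average regret vector to {x : x <= 0}
dist = max(avg[0], avg[1], 0.0)
```

Blackwell's theorem guarantees that this distance vanishes at rate O(1/sqrt(T)), uniformly over the adversary's moves.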