Section: New Results
Sequential learning with limited feedback; in particular, bandit problems
Participants: Gilles Stoltz, Jia Yuan Yu.
Some of the results cited below are summarized or stated as open problems in the habilitation thesis [11].
Bandit problems
We achieved three contributions. The first is described in the conference paper [27]: it revisits, in a non-asymptotic way, the asymptotically optimal results of Lai and Robbins and of Burnetas and Katehakis. The second is stated in the journal article [19] and is concerned with obtaining fast convergence rates for the regret in the case of a continuum of arms (under some regularity and topological assumptions on the mean-payoff function).
The third one is detailed in [24] and started from the following observation: typical results in the bandit literature were of the following form: if the regularity of the mean-payoff function is known to the learner, then the regret can be controlled at a given rate. The contribution of [24] is to obtain such guarantees when this regularity is unknown.
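As background for the regret results discussed above, the following is a minimal sketch of a KL-UCB-style index policy for Bernoulli bandits, in the spirit of the non-asymptotic optimality analyses mentioned for [27]. The arm probabilities, horizon, and bisection depth are illustrative choices of ours, not taken from the cited papers.

```python
import math
import random

def bernoulli_kl(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, count, t):
    """Upper-confidence index: the largest q with count * kl(mean, q) <= log(t),
    computed by bisection (a standard KL-UCB-style construction)."""
    level = math.log(max(t, 1)) / count
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def run_bandit(probs, horizon, rng):
    """Play `horizon` rounds on Bernoulli arms; return the pull counts."""
    k = len(probs)
    counts = [0] * k
    sums = [0.0] * k
    # initialization: pull each arm once
    for a in range(k):
        sums[a] += 1.0 if rng.random() < probs[a] else 0.0
        counts[a] = 1
    for t in range(k + 1, horizon + 1):
        indices = [kl_ucb_index(sums[a] / counts[a], counts[a], t)
                   for a in range(k)]
        a = max(range(k), key=indices.__getitem__)  # arm with highest index
        sums[a] += 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
    return counts

# Illustrative two-armed instance: the better arm should dominate the pulls.
counts = run_bandit([0.3, 0.7], 2000, random.Random(0))
```

On such an instance the index policy concentrates its pulls on the better arm while still sampling the other arm at a logarithmic rate, which is the behavior that the optimality results above quantify.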
Approachability in games with partial monitoring
The conference paper [28] explains how to recover, in a simpler, more straightforward, and computationally efficient manner, a result proven by Perchet in his PhD thesis: the necessary and sufficient condition for the approachability of a closed convex set under partial monitoring.
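For intuition about approachability, here is a hedged sketch of Blackwell's classical strategy in the simpler full-monitoring setting (the papers above treat the harder partial-monitoring case). The 2x2 toy game with vector payoffs and the target set (the nonpositive orthant) are illustrative assumptions of ours, not taken from [28].

```python
import math
import random

# Toy 2x2 game with vector payoffs in R^2; the target set C is the
# nonpositive orthant {x : x1 <= 0 and x2 <= 0}.  (Illustrative choice:
# the mixed action p = 1/2 yields the zero vector against both columns,
# so C is approachable.)
PAYOFF = {
    (0, 0): (1.0, -1.0), (0, 1): (-1.0, 1.0),
    (1, 0): (-1.0, 1.0), (1, 1): (1.0, -1.0),
}

def project_on_orthant(x):
    """Euclidean projection of x onto the nonpositive orthant."""
    return tuple(min(v, 0.0) for v in x)

def mixed_payoff(p, j):
    """Expected payoff vector when playing row 1 with probability p vs column j."""
    a0, a1 = PAYOFF[(0, j)], PAYOFF[(1, j)]
    return tuple((1 - p) * u + p * v for u, v in zip(a0, a1))

def blackwell_action(xbar):
    """Blackwell's step: if the average payoff xbar lies outside C, pick a
    mixed action whose payoff, against every column, lies on the C-side of
    the hyperplane through the projection of xbar (here: grid search over p)."""
    proj = project_on_orthant(xbar)
    u = tuple(a - b for a, b in zip(xbar, proj))  # direction away from C
    if max(abs(v) for v in u) < 1e-12:
        return 0.5  # already (numerically) inside C
    def worst(p):
        return max(sum(ui * (mi - pi)
                       for ui, mi, pi in zip(u, mixed_payoff(p, j), proj))
                   for j in (0, 1))
    return min((i / 100 for i in range(101)), key=worst)

def run(horizon, rng):
    """Play against a uniformly random opponent; return the final distance
    of the average payoff vector to the target set."""
    total = [0.0, 0.0]
    for t in range(1, horizon + 1):
        xbar = tuple(s / (t - 1) for s in total) if t > 1 else (1.0, 1.0)
        p = blackwell_action(xbar)
        i = 1 if rng.random() < p else 0
        j = rng.randrange(2)
        g = PAYOFF[(i, j)]
        total[0] += g[0]
        total[1] += g[1]
    xbar = tuple(s / horizon for s in total)
    return math.dist(xbar, project_on_orthant(xbar))

dist = run(2000, random.Random(0))
```

Under full monitoring the distance of the average payoff to the convex target set vanishes at rate O(1/sqrt(T)); the point of [28] is to characterize, and attain efficiently, when such convergence remains possible with only partial monitoring of the opponent's actions.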