Section: New Results

Sequential learning with limited feedback; in particular, bandit problems

Participants: Gilles Stoltz, Jia Yuan Yu.

Some of the results cited below are summarized or stated as open problems in the habilitation thesis [11].

Bandit problems

We made three contributions. The first is described in the conference paper [27]: it revisits, in a non-asymptotic way, the asymptotically optimal results of Lai and Robbins and of Burnetas and Katehakis. The second is stated in the journal article [19] and is concerned with obtaining fast convergence rates for the regret in the case of a continuum of arms (under suitable regularity and topological assumptions on the mean-payoff function f).
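The optimality benchmark of Lai and Robbins is usually matched by index policies that compare empirical means through a Kullback-Leibler divergence. As an illustration only (the cited paper should be consulted for the actual algorithm and bounds), here is a minimal KL-UCB-style sketch for Bernoulli arms, where the index of an arm is the largest mean still compatible, in KL divergence, with the observations:

```python
import math
import random

def kl_bernoulli(p, q):
    """Binary KL divergence kl(p, q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_index(mean, pulls, t):
    """Largest q >= mean with pulls * kl(mean, q) <= log t, by bisection."""
    level = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def kl_ucb(means, horizon, rng=None):
    """Run the index policy on Bernoulli arms; return the pull counts."""
    rng = rng or random.Random(0)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    for a in range(K):  # pull each arm once to initialize
        counts[a], sums[a] = 1, float(rng.random() < means[a])
    for t in range(K + 1, horizon + 1):
        a = max(range(K),
                key=lambda i: kl_index(sums[i] / counts[i], counts[i], t))
        counts[a] += 1
        sums[a] += float(rng.random() < means[a])
    return counts
```

Arms with clearly suboptimal means are then pulled only logarithmically often, which is the behavior the non-asymptotic analysis quantifies.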

The third contribution is detailed in [24] and started from the following observation. Typical results in the bandit literature had the following form: if the regularity of the mean-payoff function f is known (or if a bound on it is known), then the regret is small. Actually, results usually took an even weaker form: when the algorithm is tuned with certain parameters, the regret is small against a certain class of stochastic environments. The question was thus to design an adaptive procedure that, faced with a single unknown environment (of unknown regularity), ensures that the regret is asymptotically small; better still, the aim was to control the regret uniformly (in a distribution-free sense, up to the regularity parameters). As described in this conference paper, a solution was achieved in the case of Lipschitz environments.
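When the Lipschitz constant L is known, the classical non-adaptive baseline discretizes the arm space and runs a finite-armed policy on the bin centers; the adaptivity question above is precisely how to do without knowing L. A minimal sketch of that baseline (the bin count balances the discretization error, of order L·T/K, against the K-armed regret, of order sqrt(K·T); the payoff noise model is illustrative):

```python
import math
import random

def lipschitz_ucb(f, L, horizon, rng=None):
    """Uniform discretization of [0, 1] for an L-Lipschitz mean-payoff f,
    followed by UCB1 over the bin midpoints; returns the average payoff."""
    rng = rng or random.Random(0)
    # balance L * horizon / K against sqrt(K * horizon)
    K = max(1, round((L * L * horizon) ** (1 / 3)))
    centers = [(k + 0.5) / K for k in range(K)]
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            a = t - 1  # initialization: pull each bin once
        else:
            a = max(range(K),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = f(centers[a]) + rng.uniform(-0.1, 0.1)  # bounded payoff noise
        counts[a] += 1
        sums[a] += r
        total += r
    return total / horizon
```

The tuning of K depends explicitly on L, which is exactly the kind of prior knowledge the adaptive procedure of [24] dispenses with.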

Approachability in games with partial monitoring

The conference paper [28] explains how we re-obtained, in a simpler, more direct, and computationally efficient manner, a result proven by Perchet in his PhD thesis: the necessary and sufficient condition for the approachability of a closed convex set under partial monitoring.
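The partial-monitoring setting of [28] is beyond a short sketch, but the underlying notion can be illustrated in the classical full-monitoring case: Blackwell approachability of the negative orthant of regret vectors yields no-regret play, for instance via Hart and Mas-Colell's regret matching, which plays each action with probability proportional to its positive cumulative regret. A minimal sketch (the payoff matrix and opponent sequence in the test are purely illustrative):

```python
import random

def regret_matching(payoff, opp_seq, rng=None):
    """Approach the negative orthant of regret vectors by regret matching.
    payoff[i][j] is the player's payoff for action i against opponent action j;
    returns the per-action average regrets after the opponent's sequence."""
    rng = rng or random.Random(0)
    K = len(payoff)
    cum_regret = [0.0] * K
    for j in opp_seq:
        pos = [max(r, 0.0) for r in cum_regret]
        s = sum(pos)
        if s > 0:
            # sample an action proportionally to positive regrets
            u, acc, i = rng.random() * s, 0.0, K - 1
            for k in range(K):
                acc += pos[k]
                if u <= acc:
                    i = k
                    break
        else:
            i = rng.randrange(K)  # no positive regret: play uniformly
        gained = payoff[i][j]
        for k in range(K):
            cum_regret[k] += payoff[k][j] - gained
    return [r / len(opp_seq) for r in cum_regret]
```

The guarantee is that the average regret vector converges to the negative orthant; characterizing which closed convex sets admit such a guarantee when only partial feedback on the opponent's action is observed is the result re-obtained in [28].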