SIERRA - 2011 - Rapport annuel d'activité

SIERRA

SIERRA - 2011

Project Team Sierra

Members

Overall Objectives

Scientific Foundations

Application Domains

Application Domains

Software

New Results

Partnerships and Cooperations

Dissemination

Bibliography

Previous |

Home | Next next

Section: New Results

Deviations of Stochastic Bandit Regret

Participant : Jean-Yves Audibert.

Collaboration with: Antoine Salomon (École des Ponts)

The work [19] studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays $n$ is known beforehand by the agent, previous works exhibit a policy such that with probability at least $1 - 1 / n$ , the regret of the policy is of order $log n$ . They have also shown that such a property is not shared by the popular ucb1 policy. This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.

Previous |

Home | Next next