DYOGENE - 2013 - Annual Activity Report

Section: New Results

Two-target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards

In [16], we consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards of the arms are independent and uniformly distributed over $[0, 1]$. Rewards 1 and 0 are referred to as a success and a failure, respectively. We propose a novel algorithm in which the decision to exploit an arm is based on two successive targets: the number of successes before the first failure, and the total number of successes before the $m$-th failure, where $m$ is a fixed parameter. For a large parameter $m$ and a known time horizon $n$, this two-target algorithm achieves a long-term average regret in $\sqrt{2n}$. This regret is optimal and strictly less than the $2\sqrt{n}$ regret achieved by the best previously known algorithms. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments illustrate the performance of the algorithm for finite time horizons.
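The two-target rule described above can be sketched as a small simulation. This is only an illustrative reading of the description, not the implementation from [16]: the function name `two_target_bandit` and the arguments `target1` and `target2` are hypothetical, and the two target values are left as free parameters because the text above does not state how they are chosen. An arm is discarded if its first run of successes is shorter than `target1`, or if it has collected fewer than `target2` successes by its $m$-th failure; otherwise it is exploited until the horizon.

```python
import random


def two_target_bandit(n, m, target1, target2, rng=None):
    """Simulate a two-target policy for n rounds; return the total reward.

    Assumptions (not taken from the source): target1 and target2 are
    supplied by the caller, and arm means are drawn Uniform[0, 1].
    """
    rng = rng or random.Random()
    total = 0
    t = 0
    while t < n:
        p = rng.random()          # mean reward of a fresh arm
        successes = 0             # successes observed on this arm
        failures = 0              # failures observed on this arm
        exploiting = False        # True once both targets are met
        while t < n:
            reward = 1 if rng.random() < p else 0
            t += 1
            total += reward
            if exploiting:
                continue          # keep playing this arm to the horizon
            if reward == 1:
                successes += 1
                continue
            failures += 1
            # First target: length of the initial run of successes.
            if failures == 1 and successes < target1:
                break             # discard the arm, draw a new one
            # Second target: successes accumulated by the m-th failure.
            if failures == m:
                if successes < target2:
                    break         # discard the arm, draw a new one
                exploiting = True
    return total


if __name__ == "__main__":
    n = 10_000
    reward = two_target_bandit(n, m=3, target1=10, target2=150,
                               rng=random.Random(0))
    # Regret measured against an ideal arm of mean reward 1.
    print("regret:", n - reward)
```

The target values in the usage lines are arbitrary placeholders. For reference, the two regret rates quoted above differ by a constant factor: $\sqrt{2n} / (2\sqrt{n}) = 1/\sqrt{2} \approx 0.71$.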
