Section: Research Program
Multi-armed bandit problems, prediction with limited feedback
We are interested in settings in which the feedback obtained on the predictions is limited, in the sense that it does not fully reveal what actually happened.
Bandit problems
This is also a sequential problem in which some regret is to be minimized.
However, the problem is stochastic: a large number of arms, possibly indexed by a continuous set like
A generalization of regret: the approachability of sets
Approachability is the ability to control random walks. At each round, the first player obtains a vector payoff that depends on their own action and on the action of the opponent. The aim is to ensure that the average of the vector payoffs converges to a given convex set. Blackwell and others obtained necessary and sufficient conditions for the existence of such strategies, in both the full-information and the bandit cases.
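As an illustration of the setting described above, the following Python sketch approaches the nonpositive orthant when the vector payoff is the vector of regrets, a strategy often called regret matching. It is a minimal sketch under stated assumptions, not a reference implementation: the payoff matrix `PAYOFF` (a rock-paper-scissors game rescaled to [0, 1]), the function names, and the choice to average the vector payoff over the player's mixed action instead of sampling an action are all hypothetical and chosen only to keep the example short and deterministic.

```python
import math

# Hypothetical payoff matrix: PAYOFF[a][b] is the first player's payoff in [0, 1]
# when the player picks action a and the opponent picks action b.
PAYOFF = [
    [0.5, 0.0, 1.0],
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.5],
]


def mixed_action(cum_regret):
    """Play each action with probability proportional to the positive part
    of its cumulative regret (uniform if no regret is positive)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        # The average vector payoff is already in the target set.
        return [1.0 / n] * n
    return [p / total for p in positive]


def distance_to_nonpositive_orthant(v):
    """Euclidean distance from v to the set of vectors with no positive entry."""
    return math.sqrt(sum(max(x, 0.0) ** 2 for x in v))


def run_approachability(payoff, opponent_actions):
    """Return the average regret vector after facing the given opponent actions.

    Assumption: the vector payoff at each round is the expected regret under
    the player's mixed action, so no randomness is involved.
    """
    n = len(payoff)
    cum_regret = [0.0] * n
    for b in opponent_actions:
        p = mixed_action(cum_regret)
        expected = sum(p[a] * payoff[a][b] for a in range(n))
        for i in range(n):
            # Regret of not having played action i against b.
            cum_regret[i] += payoff[i][b] - expected
    rounds = len(opponent_actions)
    return [r / rounds for r in cum_regret]
```

In this three-action example with payoffs in [0, 1], each regret component lies in [-1, 1], so for any sequence of T opponent actions the distance from the returned average to the orthant is at most sqrt(3/T). Calling `run_approachability(PAYOFF, [(t // 7) % 3 for t in range(10000)])` and passing the result to `distance_to_nonpositive_orthant` gives a way to check this numerically.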
Some of these results can be extended to games with signals (games with partial monitoring), where at each round the only feedback obtained by the first player is a random signal drawn from a distribution that depends on the action profile of the two players, while the opponent still has full monitoring.
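The feedback protocol of a game with signals can be sketched in Python as follows. This only illustrates one round of play. The signal structure `SIGNAL_LAW`, the signal names, and the function `play_round` are hypothetical and do not come from any particular game studied in the literature.

```python
import random

# Hypothetical signal structure for a 2x2 game: SIGNAL_LAW[(a, b)] maps each
# possible signal to its probability under the action profile (a, b).
# Player action 0 is uninformative; player action 1 reveals the opponent's
# action, but only with probability 0.8.
SIGNAL_LAW = {
    (0, 0): {"blank": 1.0},
    (0, 1): {"blank": 1.0},
    (1, 0): {"saw_0": 0.8, "blank": 0.2},
    (1, 1): {"saw_1": 0.8, "blank": 0.2},
}


def play_round(a, b, signal_law, rng):
    """Play one round and return what each side observes."""
    law = signal_law[(a, b)]
    signals = list(law)
    weights = [law[s] for s in signals]
    # Draw the signal from the distribution attached to the action profile.
    signal = rng.choices(signals, weights=weights, k=1)[0]
    player_feedback = signal      # the first player sees the signal only
    opponent_feedback = (a, b)    # the opponent has full monitoring
    return player_feedback, opponent_feedback
```

With this structure, a player who always picks action 0 learns nothing about the opponent, which is why strategies under partial monitoring must take into account which actions are informative.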