MAGNET - 2018 - Annual Activity Report

Project-Team Magnet

Section: New Results

Nonstochastic Bandits with Composite Anonymous Feedback

We investigate a nonstochastic bandit setting in which the loss of an action is not charged to the player immediately, but is instead spread adversarially over at most d consecutive steps. As a result, the instantaneous loss observed by the player at the end of each round is a sum of up to d loss components of previously played actions. Unlike in the standard bandit setting with delayed feedback, the player cannot observe the individual delayed losses, only their sum. Our main contribution is a general reduction that transforms a standard bandit algorithm into one that can operate in this harder setting. We also show how the regret of the transformed algorithm can be bounded in terms of the regret of the original algorithm. Our reduction cannot be improved in general: we prove a lower bound on the regret of any bandit algorithm in this setting that matches, up to logarithmic factors, the upper bound obtained via our reduction. Finally, we show how our reduction extends to more complex bandit settings, such as combinatorial linear bandits and online bandit convex optimization [10].
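To make the feedback model concrete, the following is a minimal Python sketch of how the composite anonymous losses described above could be simulated. It illustrates only the setting, not the reduction from [10]. The function name `composite_feedback`, the layout of `components`, and the convention that the first component of a loss is charged in the round the action is played are all assumptions made for this illustration.

```python
from typing import List, Sequence


def composite_feedback(
    actions: Sequence[int],
    components: Sequence[Sequence[Sequence[float]]],
    d: int,
) -> List[float]:
    """Return the loss the player observes at the end of each round.

    components[t][a][s] is the (hypothetical) part of the loss of action a,
    if played at round t, that is charged at round t + s, for s in 0..d-1.
    The player never sees these components, only the per-round sums.
    """
    horizon = len(actions)
    observed = [0.0] * horizon
    for t, action in enumerate(actions):
        for s in range(d):
            # Components that would be charged after the last round are dropped.
            if t + s < horizon:
                observed[t + s] += components[t][action][s]
    return observed


if __name__ == "__main__":
    # Two actions, three rounds, each loss spread over at most d = 2 steps.
    components = [
        [[0.25, 0.5], [0.0, 0.0]],      # round 0
        [[0.5, 0.25], [0.125, 0.125]],  # round 1
        [[0.0, 0.5], [0.25, 0.0]],      # round 2
    ]
    actions = [0, 1, 0]
    print(composite_feedback(actions, components, d=2))
    # prints [0.25, 0.625, 0.125]
```

In round 1 of this example the player observes 0.625, the sum of a delayed component from the action played in round 0 and an immediate component from the action played in round 1, with no way to tell the two apart. This is what separates the setting from standard delayed feedback.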
