Section: Partnerships and Cooperations

International Initiatives

With CWI

  • Title: Non-parametric sequential prediction project

    • Centrum Wiskunde & Informatica (CWI), Amsterdam (NL) - Peter Grünwald

  • Duration: 2016 - 2018

  • Start year: 2016

  • Abstract: The aim is to develop the theory of learning for sequential decision-making problems under uncertainty.

    In 2017, this collaboration involved D. Ryabko, É. Kaufmann, J. Ridgway, M. Valko, and O. Maillard. A post-doc funded by Inria was recruited in Fall 2016.

  • https://project.inria.fr/inriacwi/projects/non-parametric-sequential-prediction-project/


  • Title: Educational Bandits

  • International Partner (Institution - Laboratory - Researcher):

    • Carnegie Mellon University (United States) - Department of Computer Science, Theory of computation lab - Emma Brunskill

  • Start year: 2015

  • See also: https://project.inria.fr/eduband/

  • Education can transform an individual's capacities and the opportunities available to them. The proposed collaboration will build on and develop novel machine learning approaches towards enhancing (human) learning. Massive open online courses (MOOCs) are enabling many more people to access education, but mostly operate using status quo teaching methods. Even more important than access is the opportunity for online software to radically improve the efficiency, engagement, and effectiveness of education. Existing intelligent tutoring systems (ITSs) have had some promising successes, but mostly rely on learning sciences research to construct hand-built strategies for automated teaching. Online systems make it possible to actively collect substantial amounts of data about how people learn, and offer a huge opportunity to substantially accelerate progress in improving education. An essential aspect of teaching is providing the right learning experience for the student, but it is often unknown a priori exactly how this should be achieved. This challenge can often be cast as an instance of decision making under uncertainty. In particular, prior work by Brunskill and colleagues demonstrated that reinforcement learning (RL) and multi-armed bandits (MAB) can be very effective approaches to the problem of automated teaching. The proposed collaboration is thus intended to explore the interactions between the fields of online education and RL and MAB. On the one hand, we will define novel RL and MAB settings and problems in online education. On the other hand, we will investigate how solutions developed in RL and MAB could be integrated into ITSs and MOOCs and improve their effectiveness.


Participants: Pierre Perrault, Julien Seznec, Michal Valko, Émilie Kaufmann, Odalric Maillard.

  • Title: Adaptive allocation of resources for recommender systems

  • Inria contact: Michal Valko

  • International Partner (Institution - Laboratory - Researcher):

    • Universität Potsdam (Germany) - Alexandra Carpentier

  • Start year: 2017

  • We plan to improve a practical scenario of resource allocation in market surveys, such as product appraisals and music recommendation. In practice, the market is typically divided into segments: geographic regions, age groups, etc. These groups are then queried for their preferences according to some fixed rule specifying a number of queries per group. This testing is costly and non-adaptive. Yet some groups are easier to estimate than others, although which ones is impossible to know a priori. Our challenge is to adaptively allocate the optimal number of samples to each group and thereby improve the efficiency of market studies by providing sample-efficient solutions.
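To illustrate the idea of adaptive allocation described above, here is a minimal toy sketch (the allocation rule, function names, and numbers are illustrative assumptions, not the method developed in the collaboration): at each step, the next query goes to the group whose mean estimate is currently the most uncertain, so noisier groups automatically receive more samples.

```python
import math
import random
import statistics

def adaptive_allocation(sample, n_groups, budget, n_init=2):
    """Toy adaptive allocation: repeatedly query the group whose mean
    estimate is most uncertain (largest empirical std / sqrt(n))."""
    # initialize every group with a few queries
    obs = [[sample(g) for _ in range(n_init)] for g in range(n_groups)]
    for _ in range(budget - n_groups * n_init):
        # uncertainty proxy for each group's mean estimate
        unc = [statistics.pstdev(o) / math.sqrt(len(o)) + 1e-9 for o in obs]
        g = max(range(n_groups), key=lambda i: unc[i])
        obs[g].append(sample(g))
    return [len(o) for o in obs]

random.seed(1)
# group 1 is much noisier, so it should receive far more queries
spread = [0.1, 1.0]
counts = adaptive_allocation(lambda g: random.gauss(0.0, spread[g]), 2, 200)
```

A fixed rule would split the 200 queries 100/100; the adaptive rule spends most of the budget on the hard-to-estimate group.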

Informal International Partners

Adobe Research

  • Branislav Kveton Collaborator

  • Zheng Wen Collaborator

  • Sharan Vaswani Collaborator

  • M. Valko collaborated with Adobe Research on online influence maximization in social networks. This led to a publication in NIPS 2017.

Massachusetts Institute of Technology

  • Victor-Emmanuel Brunel Collaborator

  • M. Valko collaborated with V.-E. Brunel on the estimation of low rank determinantal point processes useful for diverse recommender systems.

Universität Potsdam

  • Alexandra Carpentier Collaborator

  • M. Valko collaborated with A. Carpentier on adaptive estimation of block-diagonal matrices with application to market segmentation. This collaboration was formalized in September 2017 with the creation of a North-European associate team.

University of California, Berkeley

  • Victor Gabillon Collaborator

  • M. Valko collaborated with V. Gabillon on sample complexities in environments of unknown type.

University of Southern California

  • Haipeng Luo Collaborator

  • M. Valko collaborated with H. Luo on online submodular minimization.

Adobe Research

  • Mohammad Ghavamzadeh Collaborator

  • A. Lazaric collaborated with Adobe Research on active learning for accurate estimation of linear models. This led to a publication in ICML 2017.

Stanford University

  • Carlos Riquelme Collaborator

  • A. Lazaric collaborated with Carlos Riquelme on active learning for accurate estimation of linear models. This led to a publication in ICML 2017.

Stanford University

  • Emma Brunskill Collaborator

  • A. Lazaric collaborated with Emma Brunskill on exploration-exploitation with options in reinforcement learning. This led to a publication in NIPS 2017.

University of California, Irvine

  • Anima Anandkumar Collaborator

  • Kamyar Azizzadenesheli Collaborator

  • A. Lazaric collaborated with A. Anandkumar and K. Azizzadenesheli on exploration-exploitation in reinforcement learning with state clustering. This led to a submission to AI&Stats 2018.

University of Leoben

  • Ronald Ortner Collaborator

  • A. Lazaric collaborated with R. Ortner on exploration-exploitation in reinforcement learning with regularized optimization. This will lead to a submission to ICML 2018.

Politecnico di Milano

  • Marcello Restelli Collaborator

  • Matteo Pirotta collaborated with M. Restelli on several topics in reinforcement learning. This led to publications at ICML 2017 and NIPS 2017.

Lancaster University

  • B. Balle Collaborator

  • O. Maillard collaborated with B. Balle on spectral learning of Hankel matrices. This led to a publication at ICML.

Mila, Université de Montréal

  • A. Courville Collaborator

  • F. Strub and O. Pietquin collaborated with A. Courville on deep reinforcement learning for language acquisition. This led to several papers at IJCAI, CVPR, and NIPS, as well as the GuessWhat?! dataset and protocol and the HOME dataset.

University of Uberlândia, Brazil

  • C. Felicio Collaborator

  • Ph. Preux supervised this PhD on recommender systems. This led to the defense of C. Felicio and a paper at UMAP.

International Initiatives

  • SequeL

  • Title: The multi-armed bandit problem

  • International Partner (Institution - Laboratory - Researcher):

    • University of Leoben (Austria) - Peter Auer

  • Duration: 2014 - 2018

  • Start year: 2014

  • In a nutshell, the collaboration focuses on nonparametric algorithms for active learning problems, mainly involving theoretical analysis of reinforcement learning and bandit problems beyond the traditional settings of finite-state MDPs (for RL) or i.i.d. rewards (for bandits). Peter Auer from the University of Leoben is a worldwide leader in the field, having introduced the UCB approach around 2000, along with its finite-time analysis. Today, SequeL is likely the largest research group working in this field in the world, enjoying worldwide recognition. SequeL and P. Auer's group have been collaborating for several years now; they have co-authored papers, visited each other (sabbatical stay, post-doc), and co-organized workshops; the STREP CompLACS partially funds this very active collaboration.

International Initiatives

  • Title: Contextual multi-armed bandits with hidden structure

  • International Partner (Institution - Laboratory - Researcher):

    • IISc Bangalore (India) – Aditya Gopalan

  • Duration: 2015 - 2017

  • Recent advances in Multi-Armed Bandit (MAB) theory have yielded key insights into, and driven the design of applications in, sequential decision making in stochastic dynamical systems. Notable among these are recommender systems, which have benefited greatly from the study of contextual MABs incorporating user-specific information (the context) into the decision problem from a rigorous theoretical standpoint. In the proposed initiative, the key features of (a) sequential interaction between a learner and the users, and (b) a relatively small number of interactions per user with the system, motivate the goal of efficiently exploiting the underlying collective structure of users. The state-of-the-art lacks a well-grounded strategy with provably near-optimal guarantees for general, low-rank user structure. Combining expertise in the foundations of MAB theory together with recent advances in spectral methods and low-rank matrix completion, we target the first provably near-optimal sequential low-rank MAB algorithms.