SEQUEL - 2012 - Annual activity report

Section: Partnerships and Cooperations

European Initiatives

FP7 Projects

PASCAL-2

Participants: the whole SequeL team is involved.

Title: Pattern Analysis, Statistical Modeling, and Computational Learning
Type: Cooperation (ICT), Network of Excellence (NoE)
Coordinator: Univ. Southampton
Other partners: Many European organizations, universities, and research centers.
Web site: http://www.pascal-network.org/
Duration: March 2008 - February 2013

PASCAL-2 Pump Priming Programme

Participants: Mohammad Ghavamzadeh, Rémi Munos.

Title: Sparse Reinforcement Learning in High Dimensions
Type: PASCAL-2 Pump Priming Programme
Partners: Inria Lille - Nord Europe, Shie Mannor (Technion, Israel)
Web site: http://sites.google.com/site/sparserl/home
Duration: November 2009 - September 2012
Abstract: With the explosive growth and ever increasing complexity of data, developing theory and algorithms for learning with high-dimensional data has become an important challenge in statistical machine learning. Although significant advances have been made in recent years, most of the research efforts have been focused on supervised learning problems. We propose to design, analyze, and implement reinforcement learning algorithms for high-dimensional domains. We will investigate the possibility of using the recent results in l1-regularization and compressive sensing in reinforcement learning.
Activity report: The project ended early this year. The publications produced within the project are listed at https://sites.google.com/site/sparserl/publications .

CompLACS

Participants: Mohammad Ghavamzadeh, Nathan Korda, Prashanth Lakshmanrao Anantha Padmanabha, Alessandro Lazaric, Rémi Munos, Philippe Preux, Daniil Ryabko, Michal Valko.

Title: Composing Learning for Artificial Cognitive Systems
Type: Cooperation (ICT), Specific Targeted Research Project (STREP)
Coordinator: University College London
Other partners: University College London, United Kingdom (John Shawe-Taylor, Stephen Hailes, David Silver, Yee Whye Teh), University of Bristol, United Kingdom (Nello Cristianini), Royal Holloway, United Kingdom (Chris Watkins), Radboud Universiteit Nijmegen, The Netherlands (Bert Kappen), Technische Universität Berlin, Germany (Manfred Opper), Montanuniversität Leoben, Austria (Peter Auer), Max Planck Institute for Biological Cybernetics, Germany (Jan Peters).
Web site: http://www.complacs.org/
Duration: March 2011 - February 2015
Abstract: One of the aspirations of machine learning is to develop intelligent systems that can address a wide variety of control problems of many different types. However, although the community has developed successful technologies for many individual problems, these technologies have not previously been integrated into a unified framework. As a result, the technology used to specify, solve and analyse one control problem typically cannot be reused on a different problem. The community has fragmented into a diverse set of specialists with particular solutions to particular problems. The purpose of this project is to develop a unified toolkit for intelligent control in many different problem areas. This toolkit will incorporate many of the most successful approaches to a variety of important control problems within a single framework, including bandit problems, Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), continuous stochastic control, and multi-agent systems. In addition, the toolkit will provide methods for the automatic construction of representations and capabilities, which can then be applied to any of these problem types. Finally, the toolkit will provide a generic interface for specifying problems and analysing performance, by mapping intuitive, human-understandable goals into machine-understandable objectives, and by mapping algorithm performance and regret back into human-understandable terms.
Activity report: We worked on Work Package 2 (multi-armed bandits and extensions) and designed hierarchical bandit-based planning algorithms for MDPs and POMDPs.
