Section: Partnerships and Cooperations

National Initiatives

ANR ExTra-Learn

Participants : Alessandro Lazaric, Jérémie Mary, Rémi Munos, Michal Valko.

  • Title: Extraction and Transfer of Knowledge in Reinforcement Learning

  • Type: National Research Agency (ANR-9011)

  • Coordinator: Inria Lille (A. Lazaric)

  • Duration: 2014-2018

  • Abstract: ExTra-Learn is directly motivated by the evidence that one of the key features that allows humans to accomplish complicated tasks is their ability to build knowledge from past experience and transfer it while learning new tasks. We believe that integrating transfer of learning into machine learning algorithms will dramatically improve their learning performance and enable them to solve complex tasks. We identify the reinforcement learning (RL) framework as the most suitable candidate for this integration. RL formalizes the problem of learning an optimal control policy from experience collected directly from an unknown environment. Nonetheless, practical limitations of current algorithms have encouraged research to focus on how to integrate prior knowledge into the learning process. Although this improves the performance of RL algorithms, it dramatically reduces their autonomy. In this project we pursue a paradigm shift from designing RL algorithms that incorporate prior knowledge to methods able to incrementally discover, construct, and transfer “prior” knowledge in a fully automatic way. In more detail, three main elements of RL algorithms would significantly benefit from transfer of knowledge. (i) For every new task, RL algorithms need to explore the environment for a long time, which results in slow learning in large environments. Transfer learning would enable RL algorithms to dramatically reduce the exploration of each new task by exploiting its resemblance to tasks solved in the past. (ii) RL algorithms evaluate the quality of a policy by computing its state-value function. Whenever the number of states is too large, approximation is needed. Since approximation may cause instability, designing suitable approximation schemes is particularly critical. While this is currently done by a domain expert, we propose to perform this step automatically by constructing features that incrementally adapt to the tasks encountered over time. This would significantly reduce human supervision and increase the accuracy and stability of RL algorithms across different tasks. (iii) In order to deal with complex environments, hierarchical RL solutions have been proposed, where state representations and policies are organized over a hierarchy of subtasks. This requires a careful definition of the hierarchy, which, if not properly constructed, may lead to very poor learning performance. The ambitious goal of transfer learning is to automatically construct a hierarchy of skills that can be effectively reused over a wide range of similar tasks.

  • Activity Report: Research in ExTra-Learn focused on how to effectively transfer knowledge from an external expert, as in apprenticeship learning. This is an important step towards automatic transfer because it addresses the problem of how an expert's knowledge can be integrated into the learning process. This investigation led to the publication of two papers at IJCAI'15. In 2015 a number of other activities have also started. Ronan Fruit was recruited for a PhD that started in December. The main focus of the PhD will be transfer in multi-armed bandits, in particular in non-stationary systems where the task can change multiple times. Pierre-Victor Chaumier will start a long internship on transfer in RL, with a focus on applications to Atari games. Romain Warlop started a Cifre PhD in July (co-supervised by A. Lazaric, J. Mary, and Ph. Preux), with a focus on how to use transfer learning in recommendation systems. We expect these activities to significantly advance the research in the project during 2016.


Participant : Olivier Pietquin.

  • Acronym: KEHATH

  • Title: Advanced Quality Methods for Post-Edition of Machine Translation

  • Type: ANR

  • Coordinator: Lingua & Machina

  • Duration: 2014-2017

  • Other partners: Univ. Lille 1, Laboratoire d'Informatique de Grenoble (LIG)

  • Abstract: The translation community has seen a major change over the last five years. Thanks to progress in the training of statistical machine translation engines on corpora of existing translations, machine translation has become good enough that it is now advantageous for translators to post-edit machine outputs rather than translate from scratch. However, the current enhancement of machine translation (MT) systems from human post-edition (PE) is rather basic: the post-edited output is added to the training corpus and the translation model and language model are re-trained, with no clear view of how much has been improved and how much is left to be improved. Moreover, the final PE result is the only feedback used: available technologies do not take advantage of logged sequences of post-edition actions, which provide information about the cognitive processes of the post-editor. The KEHATH project intends to address these issues in two ways. Firstly, we will optimise advanced machine learning techniques in the MT+PE loop. Our goal is to boost the impact of PE, that is, to reach the same performance with less PE or better performance with the same amount of PE. In other words, we want to improve machine translation learning curves. For this purpose, active learning and reinforcement learning techniques will be proposed and evaluated. Along with this, we will have to face challenges such as the heterogeneity of MT systems (statistical and/or rule-based) and ML scalability, so as to improve domain-specific MT. Secondly, since quality prediction (QP) on MT outputs is crucial for translation project managers, we will implement and evaluate in real-world conditions several confidence estimation and error detection techniques previously developed at a laboratory scale. A shared concern will be to work on continuous domain-specific data flows to improve both MT and the performance of indicators for quality prediction.
The overall goal of the KEHATH project is straightforward: gain additional machine translation performance as fast as possible in each and every new industrial translation project, so that post-edition time and cost are drastically reduced. Basic research is the best way to reach this goal, for an industrial impact that is powerful and immediate.


Participants : Olivier Pietquin, Bilal Piot.

  • Acronym: MaRDi

  • Title: Man-Robot Dialogue

  • Type: ANR

  • Coordinator: Univ. Lille 1 (Olivier Pietquin)

  • Duration: 2012-2016

  • Other partners: Laboratoire d'Informatique d'Avignon (LIA), CNRS - LAAS (Toulouse), Acapela group (Toulouse)

  • Abstract: In the MaRDi project, we study the interaction between humans and machines as a situated problem in which human users and machines share the same environment. In particular, we investigate how the physical environment of robots interacting with humans can be used to improve the performance of spoken interaction, which is known to be imperfect and sensitive to noise. To achieve this objective, we study three main problems. First, how to interactively build a multimodal representation of the current dialogue context from perception and proprioception signals. Second, how to automatically learn a strategy of interaction using methods such as reinforcement learning. Third, how to provide expressive feedback to users about how confident the machine is in its behaviour, and how to reflect its current state (including its physical state).

National Partners

  • Inria Bordeaux - Sud-Ouest

    • B. Piot and O. Pietquin worked with T. Munzer and M. Lopes on Inverse Reinforcement Learning with Relational Domains. It led to a publication at IJCAI 2015 [24].

  • CentraleSupélec

    • B. Piot and O. Pietquin worked with M. Geist on Inverse Reinforcement Learning with Relational Domains and Dialogue Management. It led to a conference publication at IJCAI 2015 [24] and a workshop publication at MLIS 2015 [29].

  • Inria Nancy - Grand Est

    • J. Perolat, B. Piot and O. Pietquin worked with Bruno Scherrer on Stochastic Games. It led to a conference publication at ICML 2015 [28].

  • CMLA - ENS Cachan.

    • Julien Audiffren Collaborator

      M. Valko, A. Lazaric, and M. Ghavamzadeh work with Julien on Semi-Supervised Apprenticeship Learning. We finalized and published a max-entropy algorithm that outperforms the approach that does not use unlabeled data.

  • LTCI, Institut Télécom-ParisTech, France.

    • Charanpal Dhanjal, Stefan Clemençon Collaborator

      Romaric Gaudel has collaborated with Charanpal and Stefan since 2010 on topics related to Matrix Factorization. In the past we applied our work to sequential recommendation and to sequential clustering. This year, the collaboration has led to a publication at the AAAI'15 conference [16].