Section: Partnerships and Cooperations
National Initiatives
ANR BOLD
Participants : Émilie Kaufmann, Michal Valko, Pierre Ménard, Xuedong Shang, Omar Darwiche Domingues.
 Title:
 Type:
 Coordinator:
 Duration:
 Abstract:

Reactive machine learning algorithms adapt to data generating processes, typically do not require large computational power and, moreover, can be translated into offline (as opposed to online) algorithms if needed. Introduced in the 1930s in the context of clinical trials, online ML algorithms have attracted considerable theoretical interest over the last 15 years because of their applications to the optimization of recommender systems, click-through rates, and planning in congested networks, to name just a few. In practice, however, such algorithms are not used as much as they could be, because the traditional low-level modelling assumptions they are based upon turn out not to be appropriate.
Instead of trying to complicate and arbitrarily generalise a framework unfit for potential applications, we will tackle this problem from another perspective. We will seek a better understanding of the simple original problem and extend it in the appropriate directions. There are currently three main barriers to a broader development of online learning, which this project aims to overcome. 1) The classical “one step, one decision, one reward” paradigm is unfit. 2) Optimality is defined with respect to worst-case generic lower bounds, and the mechanics behind online learning are not fully understood. 3) Algorithms were designed for non-strategic, non-interactive environments.
The project gathers four partners: ENS Paris-Saclay, University of Toulouse, Inria Lille and Université Paris Descartes.
ANR BoB
Participant : Michal Valko.
 Title:
 Type:
 Coordinator:
 Duration:
 Abstract:

Bayesian methods are a popular class of statistical algorithms for updating scientific beliefs. They turn data into decisions and models, taking into account uncertainty about models and their parameters. This makes Bayesian methods popular among applied scientists such as biologists, physicists, or engineers. However, at the heart of Bayesian analysis lie 1) repeated sweeps over the full dataset considered, and 2) repeated evaluations of the model that describes the observed physical process. The current trends toward large-scale data collection and complex models thus raise two main issues. Experiments, observations, and numerical simulations in many areas of science nowadays generate terabytes of data, as does the LHC in particle physics for instance. Simultaneously, knowledge creation is becoming more and more data-driven, which requires new paradigms addressing how data are captured, processed, discovered, exchanged, distributed, and analyzed. For statistical algorithms to scale up, reaching a given performance must require as few iterations and as little access to data as possible. It is not only experimental measurements that are growing at a rapid pace. Cell biologists tend to have scarce data but large-scale models of tens of nonlinear differential equations to describe complex dynamics. In such settings, evaluating the model once requires numerically solving a large system of differential equations, which may take minutes for some tens of differential equations on today’s hardware. Iterative statistical processing that requires a million sequential runs of the model is thus out of the question. In this project, we tackle the fundamental cost-accuracy trade-off for Bayesian methods, in order to produce generic inference algorithms that scale favorably with the number of measurements in an experiment and the number of runs of a statistical model. We propose a collection of objectives with different risk-reward trade-offs to tackle these two goals.
In particular, for experiments with large numbers of measurements, we further develop existing subsampling-based Monte Carlo methods, while developing a novel decision theory framework that includes data constraints. For expensive models, we build an ambitious programme around Monte Carlo methods that leverage determinantal processes, a rich class of probabilistic tools that lead to accurate inference with limited model evaluations. In short, using innovative techniques such as subsampling-based Monte Carlo and determinantal point processes, we propose in this project to push the boundaries of the applicability of Bayesian inference.
ANR Badass
Participants : Odalric-Ambrym Maillard, Émilie Kaufmann.
 Title:
 Type:
 Coordinator:
 Duration:
 Abstract:

Motivated by the fact that a number of modern applications of sequential decision making require developing strategies that are especially robust to changes in the stationarity of the signal, and in order to anticipate and impact the next generation of applications of the field, the BADASS project intends to push the theory and application of multi-armed bandits (MAB) to the next level by incorporating nonstationary observations while retaining near optimality against the best (not necessarily constant) decision strategy. Since a nonstationary process typically decomposes into chunks associated with some possibly hidden variables (states), each corresponding to a stationary process, handling nonstationarity crucially requires exploiting the (possibly hidden) structure of the decision problem. For the same reason, a MAB for which arms can be arbitrary nonstationary processes is powerful enough to capture MDPs and even partially observable MDPs as special cases, and it is thus important to address the issue of nonstationarity jointly with that of structure. In order to advance these two nested challenges from a solid theoretical standpoint, we intend to focus on the following objectives: (i) To broaden the range of optimal strategies for stationary MABs: current strategies are only known to be provably optimal in a limited range of scenarios for which the class of distributions (structure) is perfectly known; also, recent heuristics that may adapt to the class need to be further analyzed. (ii) To strengthen the literature on pure sequential prediction (focusing on a single arm) for nonstationary signals via the construction of adaptive confidence sets and a novel measure of complexity: traditional approaches consider a worst-case scenario and are thus overly conservative and nonadaptive to simpler signals.
(iii) To embed low-rank matrix completion and spectral methods in the context of reinforcement learning, and further study models of structured environments: promising heuristics in the context of e.g. contextual MABs or Predictive State Representations require stronger theoretical guarantees.
This project will result in the development of a novel generation of strategies to handle nonstationarity and structure that will be evaluated in a number of test beds and validated by a rigorous theoretical analysis. Beyond the significant advancement of the state of the art in MAB and RL theory and the mathematical value of the program, the JCJC BADASS project is expected to strategically impact societal and industrial applications, ranging from personalized healthcare and e-learning to computational sustainability or rain-adaptive riverbank management, to cite a few.
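As an illustration of what handling nonstationarity in a bandit can look like, the sketch below implements a sliding-window variant of UCB, a standard baseline for piecewise-stationary rewards. It is not the BADASS project's method: the function name, the `mean_fn` interface, the Bernoulli rewards and the window size are all assumptions made for this example.

```python
import math
import random
from collections import deque


def sliding_window_ucb(mean_fn, n_arms, horizon, window, seed=0):
    """Run UCB on the last `window` observations only; return the arms played.

    `mean_fn(t, arm)` is a hypothetical environment hook that gives the
    (possibly time-varying) success probability of `arm` at round `t`.
    """
    rng = random.Random(seed)
    history = deque()        # (arm, reward) pairs currently inside the window
    pulls = [0] * n_arms     # per-arm counts restricted to the window
    sums = [0.0] * n_arms    # per-arm reward sums restricted to the window
    played = []
    for t in range(horizon):
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            # An arm with no observation in the window is played first.
            arm = untried[0]
        else:
            log_term = math.log(min(t + 1, window))
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / pulls[a]
                + math.sqrt(2 * log_term / pulls[a]),
            )
        reward = 1.0 if rng.random() < mean_fn(t, arm) else 0.0
        history.append((arm, reward))
        pulls[arm] += 1
        sums[arm] += reward
        if len(history) > window:
            # Forget the oldest observation so stale data cannot dominate.
            old_arm, old_reward = history.popleft()
            pulls[old_arm] -= 1
            sums[old_arm] -= old_reward
        played.append(arm)
    return played
```

Because statistics older than `window` rounds are discarded, the index of an arm whose mean has changed is rebuilt from recent data only. The price is extra exploration in stationary phases, which is one reason the choice of window (or of a more adaptive mechanism) matters.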
Grant of Fondation Mathématique Jacques Hadamard
Participants : Michal Valko, Ronan Fruit.
 Title:

Theoretically grounded efficient algorithms for high-dimensional and continuous reinforcement learning
 Type:
 PI:
 Criteo contact:
 Duration:
 Abstract:

While learning how to behave optimally in an unknown environment, a reinforcement learning (RL) agent must trade off the exploration needed to collect new information about the dynamics and reward of the environment against the exploitation of the experience gathered so far to gain as much reward as possible. A good measure of the agent's performance is the regret, which measures the difference between the performance of an optimal policy and the actual rewards accumulated by the agent. Two common approaches to the exploration-exploitation dilemma with provably good regret guarantees are the optimism in the face of uncertainty principle and Thompson Sampling. While these approaches have been successfully applied to small environments with a finite number of states and actions (the tabular scenario), existing approaches for large or continuous environments either rely on heuristics and come with no regret guarantees, or can be proved to achieve small regret but cannot be implemented efficiently. In this project, we propose to make a significant contribution to the understanding of large and/or continuous RL problems by developing and analyzing new algorithms that perform well both in theory and in practice.
This research line can have a practical impact in all applications requiring continuous interaction with an unknown environment. Recommendation systems belong to this category and, by definition, can be modeled as a sequence of repeated interactions between a learning agent and a large (possibly continuous) environment.
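The notions of regret, optimism in the face of uncertainty and Thompson Sampling mentioned in the abstract can be made concrete in the simplest possible setting, a Bernoulli multi-armed bandit (a single-state special case of the tabular scenario). The following is a minimal sketch under that assumption; the function names, the arm means and the uniform baseline are illustrative choices and not part of the project.

```python
import math
import random


def run_bandit(means, horizon, choose, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds; return the pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms   # number of times each arm was played
    wins = [0] * n_arms    # number of rewards equal to 1, per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = choose(t, pulls, wins, rng)
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        regret += best - means[arm]   # expected loss of this decision
    return regret


def ucb1(t, pulls, wins, rng):
    """Optimism: play the arm with the highest upper confidence bound."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm   # every arm is played once before using the index
    return max(
        range(len(pulls)),
        key=lambda a: wins[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a]),
    )


def thompson(t, pulls, wins, rng):
    """Thompson Sampling with a uniform Beta(1, 1) prior on each arm mean."""
    samples = [
        rng.betavariate(wins[a] + 1, pulls[a] - wins[a] + 1)
        for a in range(len(pulls))
    ]
    return max(range(len(pulls)), key=lambda a: samples[a])


def uniform_policy(t, pulls, wins, rng):
    """Baseline that ignores past rewards entirely."""
    return rng.randrange(len(pulls))


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]   # hypothetical arm means
    for name, policy in [("uniform", uniform_policy), ("UCB1", ucb1),
                         ("Thompson", thompson)]:
        print(name, run_bandit(means, 2000, policy))
```

On such a small problem, both adaptive policies should accumulate far less regret than the uniform baseline, whose regret grows linearly with the horizon. The difficulty targeted by the project is that neither the per-arm counters nor the per-arm posteriors used here carry over directly to large or continuous state spaces.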
With CIRAD and CGIAR
Participants : Philippe Preux, Odalric-Ambrym Maillard, Romain Gautron.
 Title:
 Duration:
 Abstract:

We study how reinforcement learning may be used to recommend practices to smallholder farmers in underdeveloped countries. In such countries, agriculture remains mostly a non-mechanized activity, carried out on very small fields.
This is a very challenging application for RL: data is scarce, and the recommendations made to farmers must be of good quality. We cannot simply learn by making millions of bad recommendations to people who rely on them to live and feed their families. Modeling the problem as an RL problem is yet another challenge.
We believe it is valuable to challenge RL with such complex tasks. Solving games with RL is nice and fun, but we should also assess the ability of RL to solve real, risky tasks.
This pioneering work is done within Romain Gautron's PhD, in collaboration with CIRAD and the CGIAR, and in relation with the Africa Rising program.
Project CNRS-INSERM REPOS
Participants : Émilie Kaufmann, Clémence Réda [INSERM].
 Title:

Repositionnement de médicaments basé sur leurs effets transcriptionnels par des approches de réseaux géniques (Drug repositioning based on transcriptional effects, using gene network approaches)
 Type:
 PI:
 Duration:
 Abstract:

Drug repurposing consists in studying molecules that are already commercialized and finding other therapies in which they may be effective. The quality of therapeutic compounds is often assessed by their affinity to a given protein, but it can also be assessed in terms of their impact at the transcriptomic level. The aim of this project is to develop a method for selecting which drugs could be used for a given disease based on their ability to reverse the transcriptomic signature of a pathological phenotype. We will propose a new method based on algorithms for sequential decision making (bandit algorithms) to adaptively select which drug should be explored, where exploring a drug means performing simulations to propagate the perturbation (using for example gene regulatory networks) and estimating the transcriptomic impact of the perturbation induced by the drug. These simulations will hinge on existing gene expression data that are already available for many drugs, but also on new transcriptomic data generated for a mouse model of a rare disease called the Ondine syndrome.
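To make the idea of adaptively selecting which drug to explore more tangible, here is a minimal sketch of successive elimination, a classical bandit procedure for identifying the best arm. It is not the method developed in the project. The `simulate` callback stands in for one noisy simulation of a drug's ability to reverse the signature, and it is assumed here, purely for illustration, to return a score in [0, 1]; `make_simulator` and its scores are hypothetical.

```python
import math
import random


def successive_elimination(simulate, n_drugs, delta=0.05, max_rounds=10_000):
    """Return the drugs that remain plausible candidates for the best score.

    Each round runs one simulation per remaining drug, then discards every
    drug whose upper confidence bound falls below the best lower bound.
    """
    active = list(range(n_drugs))
    sums = [0.0] * n_drugs
    for r in range(1, max_rounds + 1):
        if len(active) == 1:
            break
        for d in active:
            sums[d] += simulate(d)
        # Hoeffding radius for scores in [0, 1], with a union bound over
        # drugs and rounds so that all intervals hold with prob. >= 1 - delta.
        radius = math.sqrt(math.log(4 * n_drugs * r * r / delta) / (2 * r))
        means = {d: sums[d] / r for d in active}   # each active drug has r samples
        best = max(means.values())
        active = [d for d in active if means[d] + radius >= best - radius]
    return active


def make_simulator(scores, seed=0):
    """Hypothetical stand-in for a simulation: a Bernoulli draw per drug."""
    rng = random.Random(seed)
    return lambda d: 1.0 if rng.random() < scores[d] else 0.0
```

The appeal of this kind of procedure in the setting described above is that clearly poor candidates stop consuming simulations early, so the simulation budget concentrates on the drugs that are hard to tell apart.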
National Partners


M. Valko collaborated with V. Perchet on structured bandit problems. They co-supervise a PhD student (P. Perrault).

O.-A. Maillard collaborates with V. Perchet on automated feature learning. They co-supervise a PhD student (R. Ouhamma).

E. Kaufmann collaborated with V. Perchet and E. Boursier on multi-player bandits.


Institut de Mathématiques de Toulouse, then Ecole Normale Supérieure de Lyon

Participation in the Inria Project Lab (IPL) “HPC – Big Data”: Started in 2018, this IPL gathers a dozen Inria team-projects, mixing researchers in HPC with researchers in machine learning and data science. SequeL's contribution to this project concerns how we can take advantage of HPC for our computational needs regarding deep learning and deep reinforcement learning, and how such learning algorithms might be redesigned or reimplemented to take advantage of HPC architectures.

Participation in the Inria Project Lab (IPL) “HYAIAI”: Started in 2019, this IPL gathers Magnet and SequeL in Lille, Tau in Saclay, Lacodam in Rennes, Orpailleur and Multispeech in Nancy. The goal of this IPL is to study machine learning combining symbolic and numeric approaches, to obtain interpretable AI systems.