SequeL means “Sequential Learning”. As such, SequeL focuses on the task of learning in artificial systems (either hardware or software) that gather information over time. Such systems are called *(learning) agents* (or learning machines) in the following.
These data may be used to estimate some parameters of a model, which in turn, may be used for selecting actions in order to perform some long-term optimization task.

For the purpose of model building, the agent needs to represent information collected so far in some compact form and use it to process newly available data.

The acquired data may result from an observation process of an agent in interaction with its environment (the data thus represent a perception). This is the case when the agent makes decisions (in order to attain a certain objective) that impact the environment, and thus the observation process itself.

Hence, in SequeL, the term **sequential** refers to two aspects:

the **sequential acquisition of data**, from which a model is learned (supervised and unsupervised learning),

the **sequential decision making task**, based on the learned model (reinforcement learning).

Examples of sequential learning problems include:

**supervised learning** tasks, which deal with the prediction of some response given a certain set of observations of input variables and responses; new sample points keep on being observed;

**unsupervised learning** tasks, which deal with clustering objects that arrive as a flow; the (unknown) number of clusters typically evolves over time, as new objects are observed;

**reinforcement learning and control** tasks, which deal with the control (a policy) of some system that has to be optimized; we do not assume the availability of a model of the system to be controlled.

In all these cases, we mostly assume that the process can be considered stationary for at least a certain amount of time, and that it evolves slowly.

We wish to have anytime algorithms: at any moment, a prediction may be required or an action may be selected, making full use (and hopefully the best use) of the experience already gathered by the learning agent.

The perception of the environment by the learning agent (using its sensors) is generally neither the best one to make a prediction, nor the best one to take a decision (we deal with Partially Observable Markov Decision Problems). The perception therefore has to be mapped in some way to a better, more relevant, state (or input) space.

Finally, an important issue regarding prediction is its evaluation: how wrong may we be when we make a prediction? For real systems to be controlled, this question cannot simply be left unanswered.

To sum up, in SequeL, the main issues regard:

the learning of a model: we focus on models that map some input space to some output space,

the observation to state mapping,

the choice of the action to perform (in the case of sequential decision problems),

the performance guarantees,

the implementation of usable algorithms,

all that being understood in a *sequential* framework.

SequeL is primarily grounded on two domains:

the problem of decision under uncertainty,

statistical analysis and statistical learning, which provide the general concepts and tools to solve this problem.

To help the reader who is unfamiliar with these questions, we briefly present key ideas below.

The phrase “decision under uncertainty” refers to the problem of taking decisions when we have full knowledge of neither the situation nor the consequences of the decisions, and when those consequences may be non-deterministic.

We introduce two specific sub-domains, namely Markov decision processes, which model sequential decision problems, and bandit problems.

Sequential decision processes occupy the heart of the SequeL project; a detailed presentation of this problem may be found in Puterman's book.

A Markov Decision Process (MDP) is defined as a tuple (S, A, P, r), where S is the state space, A is the action space, P is the probabilistic transition kernel, and r : S × A × S → R is the reward function.

In the MDP (S, A, P, r), at each time step t the agent observes the current state s_t, selects an action a_t, receives a reward r_t, and the system moves to a new state s_{t+1} drawn according to P(· | s_t, a_t).

The history of the process up to time t is the sequence h_t = (s_0, a_0, r_0, …, s_{t−1}, a_{t−1}, r_{t−1}, s_t); in general, a policy π maps histories to actions (or to distributions over actions).

We move from an MD process to an MD problem by formulating the goal of the agent, that is, what the sought policy π has to optimize. A common choice is to maximize the expected discounted sum of rewards, V^π(s) = E[Σ_{t≥0} γ^t r_t | s_0 = s, π],

where γ ∈ [0, 1) is a discount factor.

In order to maximize a given functional in a sequential framework, one usually applies Dynamic Programming (DP), which introduces the optimal value function V*(s) = max_π V^π(s).

We say that a policy π is optimal if it attains the optimal value in every state, *i.e.*, if V^π(s) = V*(s) for all s ∈ S.

We say that a (deterministic stationary) policy π is greedy with respect to a value function V if, in every state, it selects an action maximizing the one-step look-ahead value, π(s) ∈ argmax_a Σ_{s'} P(s' | s, a) [r(s, a, s') + γ V(s')],

where the sum runs over the possible next states s'.

The goal of Reinforcement Learning (RL), as well as that of dynamic programming, is to design an optimal policy (or a good approximation of it).

The well-known Dynamic Programming equation (also called the Bellman equation) provides a relation between the optimal value function at a state s and at its possible successor states: V*(s) = max_a Σ_{s'} P(s' | s, a) [r(s, a, s') + γ V*(s')].

The benefit of introducing this concept of optimal value function relies on the property that, from the optimal value function V*, it is straightforward to derive an optimal behaviour: any policy that is greedy with respect to V* is optimal.
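To make the Bellman backup concrete, here is a minimal value-iteration sketch (the toy two-state MDP, its transition probabilities, rewards, and discount factor are all made up for illustration; this is not code from the project):

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions. P[a][s][s'] are transition
# probabilities, R[a][s] expected immediate rewards; all numbers made up.
P = np.array([[[0.9, 0.1],
               [0.4, 0.6]],   # action 0
              [[0.2, 0.8],
               [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],     # reward for action 0 in states 0, 1
              [0.0, 2.0]])    # reward for action 1 in states 0, 1
gamma = 0.9                   # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')
        Q = R + gamma * P @ V         # shape (n_actions, n_states)
        V_new = Q.max(axis=0)         # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values, greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
```

The returned policy is greedy with respect to V*, and hence (by the property above) optimal for this toy MDP.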

In short, most of the reinforcement learning methods developed so far build on one (or both) of the two following approaches:

Bellman's dynamic programming approach, based on the introduction of the value function. It consists in learning a “good” approximation of the optimal value function, and then using it to derive a greedy policy w.r.t. this approximation. The hope (well justified in several cases) is that the performance of this greedy policy is close to the optimal performance. **Approximate dynamic programming** addresses the problem of estimating performance bounds (*e.g.*, the loss in performance incurred by following a policy that is greedy w.r.t. an approximation of the optimal value function, rather than the optimal policy).

Pontryagin's maximum principle approach, based on sensitivity analysis of the performance measure w.r.t. some control parameters. This approach, also called **direct policy search** in the Reinforcement Learning community, aims at directly finding a good feedback control law in a parameterized policy space without trying to approximate the value function. The method consists in estimating the so-called **policy gradient**, *i.e.* the sensitivity of the performance measure (the value function) w.r.t. some parameters of the current policy. The idea is that an optimal control problem is replaced by a parametric optimization problem in the space of parameterized policies. As such, deriving a policy gradient estimate leads to performing a stochastic gradient method in order to search for a locally optimal parametric policy.
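A minimal sketch of this policy-gradient idea on a two-armed problem (a score-function, REINFORCE-style estimator; the softmax policy, reward means, learning rate and step count are all made up for illustration, not the project's software): sampled actions and rewards yield an unbiased gradient estimate, which is followed by stochastic gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true expected rewards, unknown to the learner (made up).
true_means = np.array([0.2, 0.8])

def policy(theta):
    """Softmax policy over two actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_policy(theta, a):
    """d/dtheta log pi(a) = e_a - pi for a softmax policy."""
    g = -policy(theta)
    g[a] += 1.0
    return g

theta = np.zeros(2)
lr = 0.1
for _ in range(5000):
    p = policy(theta)
    a = rng.choice(2, p=p)                       # sample an action
    r = rng.normal(true_means[a], 0.1)           # noisy reward
    theta += lr * r * grad_log_policy(theta, a)  # stochastic gradient ascent
```

After training, the policy concentrates on the action with the higher expected reward.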

Finally, many extensions of Markov decision processes exist, among which the Partially Observable MDPs (POMDPs), where the current state does not contain all the information required to decide for sure on the best action.

Bandit problems illustrate the fundamental difficulty of decision making in the face of uncertainty: A decision maker must choose between what seems to be the best choice (“exploit”), or to test (“explore”) some alternative, hoping to discover a choice that beats the current best choice.

The classical example of a bandit problem is deciding what treatment to give each patient in a clinical trial, when the effectiveness of the treatments is initially unknown and the patients arrive sequentially. Bandit problems became popular with the seminal paper of Robbins (1952), after which they found applications in diverse fields such as control, economics, statistics, and learning theory.

Formally, a K-armed bandit problem is defined by K reward distributions, one per arm, initially unknown to the decision maker. At each round, the decision maker pulls one arm and receives a reward drawn from the corresponding distribution. The goal is to maximize the sum of rewards obtained or, equivalently, to minimize the regret with respect to the best fixed strategy in hindsight (*i.e.*, when the arm giving the highest expected reward is pulled all the time).

The name “bandit” comes from imagining a gambler playing with K slot machines. The gambler can pull the arm of any of the machines, which produces a random payoff as a result: when arm k is pulled, the random payoff is drawn from the distribution associated with arm k. Since the payoff distributions are initially unknown, the gambler must use exploratory actions to learn the utility of the individual arms. However, exploration has to be carefully controlled, since excessive exploration may lead to unnecessary losses. Hence, to play well, the gambler must carefully balance exploration and exploitation. Auer *et al.* introduced the algorithm UCB (Upper Confidence Bounds), which follows what is now called the “optimism in the face of uncertainty” principle. The algorithm works by computing upper confidence bounds for all the arms and then choosing the arm with the highest such bound. They proved that the expected regret of their algorithm increases at most at a logarithmic rate with the number of trials, and that the algorithm achieves the smallest possible regret up to some sub-logarithmic factor (for the considered family of distributions).
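The UCB index itself is simple to state; the following sketch (a hypothetical Bernoulli bandit with made-up arm means, not the authors' code) pulls, at each round, the arm maximizing the empirical mean plus an exploration bonus:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Bernoulli bandit with made-up success probabilities.
means = np.array([0.3, 0.5, 0.7])
K, T = len(means), 5000

counts = np.zeros(K)   # number of pulls per arm
sums = np.zeros(K)     # cumulative reward per arm

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                        # pull each arm once to initialize
    else:
        # UCB index: empirical mean + exploration bonus
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < means[arm]     # Bernoulli draw
    counts[arm] += 1
    sums[arm] += reward
```

Because the bonus shrinks as an arm is pulled, suboptimal arms are pulled only logarithmically often, and the best arm ends up receiving the vast majority of the pulls.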

Many of the problems of machine learning can be seen as extensions of classical problems of mathematical statistics to their (extremely) non-parametric and model-free cases. Other machine learning problems are founded on such statistical problems. Statistical problems of sequential learning are mainly those that are concerned with the analysis of time series. These problems are as follows.

Given a series of observations x_1, …, x_n issued from some unknown stochastic process, the problem is to predict the next outcome x_{n+1} or, more generally, its probability distribution. Consistent prediction is possible under various assumptions on the process generating the data, such as stationarity.

Alternatively, rather than making some assumptions on the data, one can change the goal: the predicted probabilities should be asymptotically as good as those given by the best reference predictor from a certain pre-defined set.

Another dimension of complexity in this problem concerns the nature of the observations themselves: they may range over a finite alphabet, be real-valued, or belong to more general spaces.

Given a series of observations x_1, …, x_n issued from some unknown stochastic process, the problem is to decide whether the process satisfies a given statistical hypothesis.

The problem of hypothesis testing can also be studied in its general formulation: given two (abstract) hypotheses about the process generating the data, one has to decide which of them holds.

A stochastic process is generating the data. At some point, the process distribution changes. In the “offline” situation, the statistician observes the resulting sequence of outcomes and has to estimate the point or points at which the change(s) occurred. In the online setting, the goal is to detect the change as quickly as possible.

These are classical problems in mathematical statistics, and probably among the last remaining statistical problems not adequately addressed by machine learning methods. The reason is perhaps that the problem is rather challenging. Thus, most methods available so far are parametric methods concerning piecewise constant distributions, where the change in distribution is associated with a change in the mean. However, many applications, including DNA analysis and the analysis of (user) behaviour data, fail to comply with this kind of assumption. Thus, our goal here is to provide completely non-parametric methods allowing for any kind of change in the time-series distribution.
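To make the classical mean-shift baseline concrete, here is a minimal offline sketch (made-up data; deliberately a parametric, mean-shift-only method of the kind the nonparametric goal above generalizes): it scans the sequence and compares the empirical means of the windows just before and just after each candidate change point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical piecewise-i.i.d. sequence with a mean shift at t = 300 (made up).
x = np.concatenate([rng.normal(0.0, 1.0, 300),
                    rng.normal(1.5, 1.0, 300)])

def detect_change(x, w=50, threshold=1.0):
    """Offline scan statistic: at each t, compare the means of the w samples
    before and after t; report the t maximizing the discrepancy if it exceeds
    the threshold. Detects mean shifts only (a parametric baseline)."""
    n = len(x)
    best_t, best_stat = None, 0.0
    for t in range(w, n - w):
        stat = abs(x[t:t + w].mean() - x[t - w:t].mean())
        if stat > best_stat:
            best_t, best_stat = t, stat
    return (best_t, best_stat) if best_stat > threshold else (None, best_stat)

t_hat, stat = detect_change(x)
```

A method of this kind misses any change that leaves the mean unchanged (e.g., a change in variance or in the dependence structure), which is precisely what motivates fully nonparametric alternatives.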

The problem of clustering, while being a classical problem of mathematical statistics, belongs to the realm of unsupervised learning. For time series, this problem can be formulated as follows: given several time-series samples, each generated by one of a number of unknown stochastic processes, group together those samples that were generated by the same process.

The online version of the problem allows for the number of observed time series to grow with time, in general, in an arbitrary manner.

Semi-supervised learning (SSL) is a field of machine learning that studies learning from both labeled and unlabeled examples. This learning paradigm is extremely useful for solving real-world problems, where data is often abundant but the resources to label them are limited.

Furthermore, *online* SSL is suitable for adaptive machine learning systems. In the classification case, learning is viewed as a repeated game against a potentially adversarial nature. At each step of this game, the learner observes an example and has to predict its label.

The challenge of the game is that we only exceptionally observe the true label; the learner must therefore exploit the unlabeled examples, together with the few labels revealed, to keep improving its predictions.

Large-scale kernel ridge regression is limited by the need to store a large kernel matrix. Similarly, large-scale graph-based learning is limited by storing the graph Laplacian. Furthermore, if the data come online, at some point no finite storage is sufficient and per step operations become slow.

Our challenge is to design sparsification methods that give guaranteed approximate solutions with reduced storage requirements.
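As one standard sparsification device, a Nyström-type sketch replaces the full n × n kernel matrix by its interaction with m ≪ n landmark points, reducing storage from O(n²) to O(nm). The following is a minimal illustration on made-up data, with uniform landmark sampling for simplicity (the project's work relies on more refined, leverage-score-based sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression problem (made-up data and parameters).
n, m, lam, sigma = 2000, 50, 1e-2, 0.3  # samples, landmarks, ridge, bandwidth
X = rng.uniform(-3, 3, n)
y = np.sin(X) + 0.1 * rng.normal(size=n)

def k(a, b):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Nystrom sketch: keep only m uniformly sampled landmarks instead of the
# full n x n kernel matrix (O(nm) memory instead of O(n^2)).
landmarks = rng.choice(X, size=m, replace=False)
Knm = k(X, landmarks)          # n x m cross-kernel
Kmm = k(landmarks, landmarks)  # m x m landmark kernel

# Kernel ridge regression restricted to the span of the landmarks:
# alpha = (Knm^T Knm + lam * Kmm)^{-1} Knm^T y
alpha = np.linalg.solve(Knm.T @ Knm + lam * Kmm, Knm.T @ y)

def predict(x_new):
    return k(np.atleast_1d(x_new), landmarks) @ alpha

mse = np.mean((predict(X) - y) ** 2)
```

Only the m landmark points and the m coefficients need to be stored at prediction time, which is what makes such sketches attractive in the online setting.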

The spectrum of applications of our research is very wide: it ranges from the core of our research, that is, sequential decision making under uncertainty, to the application of the components used to solve this decision-making problem.

To be more specific, we work on computational advertising and recommendation systems; these problems are considered as sequential matching problems in which resources available in limited amounts have to be matched to meet some users' expectations. The sequential approach we advocate paves the way to better tackling the cold-start problem and non-stationary environments. More generally, these approaches are applied to the optimization of budgeted resources under uncertainty, in a time-varying environment, including constraints on computation time (typically, a decision has to be made in less than 1 ms in a recommendation system). Another field of application of our research is education, which we consider as a sequential matching problem between a student and educational contents.

The algorithms to solve these tasks heavily rely on tools from machine learning, statistics, and optimization. Hence, we also apply our work to more classical supervised learning and prediction tasks, as well as to unsupervised learning tasks. The whole range of methods is used, from decision forests to kernel methods to deep learning. For instance, we have recently used deep learning on images. We also have a line of work related to software development, studying how machine learning can improve the quality of software being developed. More generally, we apply our research to data science.

Under the supervision of O. Pietquin and J. Mary, F. Strub and collaborators (among which the University of Montreal) have introduced the **GuessWhat?!** game to study visually grounded dialogues interleaving vision and natural language. A dataset of 150k human-human dialogues has been collected and is freely available on the Internet. Supervised learning baselines and state-of-the-art reinforcement learning algorithms have been implemented and are available as open-source code. This work resulted in publications in prestigious conferences: a spotlight at CVPR 2017, an oral at IJCAI 2017, and another spotlight at NIPS 2017. Spotlight presentations concern fewer than 3.5% of submissions to NIPS, and 5% of submissions to CVPR.

Under the supervision of M. Valko and A. Lazaric, D. Calandriello and collaborators have provided the first algorithms breaking the quadratic barrier for nonparametric learning. An open-source implementation is available on the Internet. The work has been published in prestigious conferences: AISTATS, ICML and NIPS.

*Bayesian Policy Gradient and Actor-Critic Algorithms*

Keywords: Machine learning - Incremental learning - Policy Learning

Functional Description: This software supplements our Bayesian policy gradient framework with a new actor-critic learning model in which a Bayesian class of non-parametric critics, based on Gaussian process temporal difference learning, is used. Such critics model the action-value function as a Gaussian process, allowing Bayes’ rule to be used in computing the posterior distribution over action-value functions, conditioned on the observed data. Appropriate choices of the policy parameterization and of the prior covariance (kernel) between action-values allow us to obtain closed-form expressions for the posterior distribution of the gradient of the expected return with respect to the policy parameters. We perform detailed experimental comparisons of the proposed Bayesian policy gradient and actor-critic algorithms with classic Monte-Carlo based policy gradient methods, as well as with each other, on a number of reinforcement learning problems.

Contact: Michal Valko

*GuessWhat?! Visual object discovery through multi-modal dialogue*

Keywords: Deep learning - Dialogue System

Functional Description: This project trains an AI to play the GuessWhat?! game. Thus, you can train an AI to ask questions and to answer questions about images. You can also perform basic visual reasoning. This project is a testbed for future interactive dialogue systems.

Partner: Université de Montréal

Contact: Florian Strub

Publications: GuessWhat?! Visual object discovery through multi-modal dialogue; End-to-end optimization of goal-driven and visually grounded dialogue systems (Harm de Vries *et al.*)

*Sequential sampling for kernel matrix approximation*

Keyword: Machine learning

Contact: Daniele Calandriello

URL: http://

*Optimistic Optimization in R*

Keywords: Black-box optimization - Machine learning

Contact: Mickael Binois

**Thompson Sampling for Linear-Quadratic Control Problems**,

We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success in a wide range of problems from multi-armed bandit to linear bandit, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters and the frequency of switches in the control policy. This trade-off results in an overall regret bound that is sublinear in the number of steps.

**Exploration–Exploitation in MDPs with Options**,

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions.

**Regret Minimization in MDPs with Options without Prior Knowledge**,

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options). Recent works leveraged the mapping of Markov decision processes (MDPs) with options to semi-MDPs (SMDPs) and introduced SMDP-versions of exploration-exploitation algorithms (e.g., RMAX-SMDP and UCRL-SMDP) to analyze the impact of options on the learning performance. Nonetheless, the PAC-SMDP sample complexity of RMAX-SMDP can hardly be translated into equivalent PAC-MDP theoretical guarantees, while the regret analysis of UCRL-SMDP requires prior knowledge of the distributions of the cumulative reward and duration of each option, which are hardly available in practice. In this paper, we remove this limitation by combining the SMDP view together with the inner Markov structure of options into a novel algorithm whose regret performance matches UCRL-SMDP's up to an additive regret term. We show scenarios where this term is negligible and the advantage of temporal abstraction is preserved. We also report preliminary empirical results supporting the theoretical findings.

**Is the Bellman Residual a Bad Proxy?**,

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the Bellman residual ‖T_* v − v‖ instead, so that both criteria can be compared on an equal footing.

**Faut-il minimiser le résidu de Bellman ou maximiser la valeur moyenne ?** (Should we minimize the Bellman residual or maximize the mean value?),

**Transfer Reinforcement Learning with Shared Dynamics**,

This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas, the first one is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and for each sample, the reward entry is changed by its reward estimate. The second idea consists in adopting the optimism in the face of uncertainty principle and to use upper bound reward estimates. Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.

**Trading Off Rewards and Errors in Multi-armed Bandits**,

In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.

**Online Influence Maximization Under Independent Cascade Model with Semi-bandit Feedback**,

We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of “best influencers” in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights on the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.

**Boundary Crossing for General Exponential Families**,

We consider parametric exponential families of dimension K on the real line. We study a variant of boundary crossing probabilities coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that the maximum-likelihood estimate of the parameter crosses a given boundary.

**The Non-stationary Stochastic Multi-armed Bandit Problem**, Allesiardo, Féraud, Maillard

**Linear Thompson Sampling Revisited**,

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound whose rate is slightly worse than the best known one, our proof shows that the probability of sampling *optimistic* parameters controls the regret. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of this additional regret factor.
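A minimal sketch of linear Thompson sampling (the parameter, arm set, noise level and scaling constant below are all made up for illustration; this is the generic scheme, not the paper's analyzed algorithm): at each step, a parameter is sampled from a Gaussian centered at the regularized least-squares estimate, and the arm that is best for this sampled parameter is played.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear bandit: reward = <theta_star, x> + noise (made up).
d, T, lam, v = 2, 3000, 1.0, 0.2
theta_star = np.array([0.8, 0.6])
arms = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.7, 0.7]])  # arm 2 has the highest expected reward

V = lam * np.eye(d)            # regularized design matrix
b = np.zeros(d)                # sum of reward-weighted features
pulls = np.zeros(len(arms))

for t in range(T):
    theta_hat = np.linalg.solve(V, b)  # regularized least-squares estimate
    # Sample a perturbed parameter (the "posterior" sample of TS)
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * np.linalg.inv(V))
    arm = int(np.argmax(arms @ theta_tilde))
    x = arms[arm]
    reward = x @ theta_star + 0.1 * rng.normal()
    V += np.outer(x, x)
    b += reward * x
    pulls[arm] += 1
```

As the design matrix V grows, the sampling distribution concentrates and the algorithm settles on the best arm, while the random perturbation keeps a constant probability of being optimistic early on.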

**Active Learning for Accurate Estimation of Linear Models**,

We explore the sequential decision-making problem where the goal is to estimate a number of linear models uniformly well, given a shared budget of random contexts independently sampled from a known distribution. For each incoming context, the decision-maker selects one of the linear models and receives an observation that is corrupted by the unknown noise level of that model. We present Trace-UCB, an adaptive allocation algorithm that learns the models' noise levels while balancing contexts accordingly across them, and prove bounds for its simple regret in both expectation and high probability. We extend the algorithm and its bounds to the high-dimensional setting, where the number of linear models times the dimension of the contexts is more than the total budget of samples. Simulations with real data suggest that Trace-UCB is remarkably robust, outperforming a number of baselines even when its assumptions are violated.

**Learning the Distribution with Largest Mean: Two Bandit Frameworks**,

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process. For both of them (regret minimization and best arm identification) we present recent, asymptotically optimal algorithms. We compare the behaviors of the sampling rule of each algorithm as well as the complexity terms associated with each problem.

**On Bayesian Index Policies for Sequential Resource Allocation**,

This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite Horizon Gittins indices provide a justification for the kl-UCB+ and kl-UCB-H+ algorithms, whose asymptotic optimality is also established.

**Multi-Player Bandits Models Revisited**,

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing works assume that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

**Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings**,

Setting up the future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, in handling the decentralized decision-making of Spectrum Access, applied to IoT networks, as well as the learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and dynamically change channels. In the studied scenario, stochastic MAB learning provides up to a 16% gain in terms of successful transmission probabilities, and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.

**Efficient Tracking of a Growing Number of Experts**,

We consider a variation on the problem of prediction with expert advice, where new forecasters that were unknown until then may appear at each round. As often in prediction with expert advice, designing an algorithm that achieves near-optimal regret guarantees is straightforward, using aggregation of experts. However, when the comparison class is sufficiently rich, for instance when the best expert and the set of experts itself changes over time, such strategies naively require to maintain a prohibitive number of weights (typically exponential with the time horizon). By contrast, designing strategies that both achieve a near-optimal regret and maintain a reasonable number of weights is highly non-trivial. We consider three increasingly challenging objectives (simple regret, shifting regret and sparse shifting regret) that extend existing notions defined for a fixed expert ensemble; in each case, we design strategies that achieve tight regret bounds, adaptive to the parameters of the comparison class, while being computationally inexpensive. Moreover, our algorithms are anytime, agnostic to the number of incoming experts and completely parameter-free. Such remarkable results are made possible thanks to two simple but highly effective recipes: first, the “abstention trick” that comes from the specialist framework and makes it possible to handle the least challenging notions of regret, but is limited when addressing more sophisticated objectives; second, the “muting trick” that we introduce to give more flexibility. We show how to combine these two tricks in order to handle the most challenging class of comparison strategies.
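For reference, the fixed-ensemble baseline that these strategies extend, the exponentially weighted average forecaster, can be sketched in a few lines (the experts, their accuracies and the data below are all made up; this is the textbook algorithm, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: K experts predict a binary sequence; expert 3 is right
# 90% of the time, the others guess uniformly at random (made-up numbers).
T, K, good = 500, 5, 3
eta = np.sqrt(8 * np.log(K) / T)  # standard tuning of the learning rate

w = np.ones(K)       # one weight per expert
cum = np.zeros(K)    # cumulative loss of each expert
loss_alg = 0.0       # cumulative loss of the aggregated forecaster
for t in range(T):
    outcome = rng.integers(0, 2)
    preds = rng.integers(0, 2, size=K)
    preds[good] = outcome if rng.random() < 0.9 else 1 - outcome
    p = w / w.sum()
    loss_alg += abs(float(p @ preds) - outcome)  # absolute loss of the mixture
    losses = (preds != outcome).astype(float)
    cum += losses
    w *= np.exp(-eta * losses)  # exponential weight update

regret = loss_alg - cum.min()   # regret against the best fixed expert
```

With this tuning, the regret against the best fixed expert is at most of order the square root of T log K, which is the guarantee the growing-ensemble strategies above generalize while keeping the number of maintained weights reasonable.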

**Monte-Carlo Tree Search by Best Arm Identification**,

Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary depth, that operate by summarizing all deeper levels of the tree into confidence intervals at depth one, and applying a best arm identification procedure at the root. We prove new sample complexity guarantees with a refined dependence on the problem instance. We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees.

**Learning Nash Equilibrium for General-Sum Markov Games from Batch Data**,

This paper addresses the problem of learning a Nash equilibrium in general-sum Markov games from batch data, i.e., from a fixed set of sampled trajectories, without further interaction with the environment.

**Spectral Learning from a Single Trajectory under Finite-State Policies**,

We present spectral methods of moments for learning sequential models from a single trajectory, in stark contrast with the classical literature that assumes the availability of multiple i.i.d. trajectories. Our approach leverages an efficient SVD-based learning algorithm for weighted automata and provides the first rigorous analysis for learning many important models using dependent data. We state and analyze the algorithm under three increasingly difficult scenarios: probabilistic automata, stochastic weighted automata, and reactive predictive state representations controlled by a finite-state policy. Our proofs include novel tools for studying mixing properties of stochastic weighted automata.

**Distributed Adaptive Sampling for Kernel Matrix Approximation**,

Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or k-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix

**Second-Order Kernel Online Convex Optimization with Adaptive Sketching**,

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only

**Efficient Second-order Online Kernel Learning with Adaptive Embedding**,

Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only first order methods accumulate

**Zonotope Hit-and-run for Efficient Sampling from Projection DPPs**,

Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines ideas from combinatorial geometry, linear programming, and Monte Carlo methods to sample from DPPs with a fixed sample cardinality, also called projection DPPs. Our sampler leverages the ability of the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general. Our empirical results demonstrate that this extends to sampling projection DPPs, i.e., our sampler is more sample-efficient than previous approaches which in turn translates to faster convergence when dealing with costly-to-evaluate functions, such as summary extraction in our experiments.

**Universality of Bayesian mixture predictors**,

The problem considered is that of sequential probability forecasting for finite-valued time series. The data are generated by an unknown probability distribution over the space of all one-way infinite sequences. This measure is known to belong to a given set C, but the latter is completely arbitrary (uncountably infinite, without any structure given). Performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and that it is attained by a convex combination of countably many measures from the set C (a Bayesian mixture). This was previously known only for the case when the best achievable asymptotic error is 0. It also contrasts with previous results showing that in the non-realizable case all Bayesian mixtures may be suboptimal, while there is a predictor that achieves the optimal performance.

**Hypotheses Testing on Infinite Random Graphs**,

Drawing on recent results that provide the formalism necessary to define stationarity for infinite random graphs, this paper initiates the study of statistical and learning questions pertaining to these objects. Specifically, a criterion for the existence of a consistent test for complex hypotheses is presented, generalizing the corresponding results on time series. As an application, it is shown how one can test that a tree has the Markov property or, more generally, estimate its memory.

**Independence Clustering (Without a Matrix)**,

The independence clustering problem is considered in the following formulation: given a set

**End-to-end Optimization of Goal-driven and Visually Grounded Dialogue Systems**,

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to capture the planning problem intrinsic to dialogue, as well as its grounded nature, which makes the context of a dialogue larger than the sole history. This is why only chit-chat and question-answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
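The policy gradient principle behind this approach can be illustrated on a one-step toy problem. This is the vanilla REINFORCE update on a bandit-like task, not the paper's full dialogue architecture; the action set and rewards are made up for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
theta = np.zeros(2)          # preferences over two candidate actions
rewards = [0.0, 1.0]         # action 1 completes the task (illustrative)
lr = 0.1

for _ in range(500):
    p = softmax(theta)
    a = rng.choice(2, p=p)   # sample an action from the current policy
    r = rewards[a]
    grad_log_pi = -p
    grad_log_pi[a] += 1.0    # gradient of log pi(a) w.r.t. theta
    theta += lr * r * grad_log_pi  # REINFORCE update

final_policy = softmax(theta)    # probability mass shifts to the rewarded action
```

In the dialogue setting, the policy generates a whole sequence of utterances and the reward only arrives when the target object is (or is not) discovered, which is exactly why a supervised next-utterance objective falls short.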

**Online Learning and Transfer for User Adaptation in Dialogue Systems**,

We address the problem of user adaptation in Spoken Dialogue Systems. The goal is to quickly adapt online to a new user given a large amount of dialogues collected with other users. Previous works using Transfer for Reinforcement Learning tackled this problem when the number of source users remains limited. In this paper, we overcome this constraint by clustering the source users: each user cluster, represented by its centroid, is used as a potential source in the state-of-the-art Transfer Reinforcement Learning algorithm. Our benchmark compares several clustering approaches, including one based on a novel metric. All experiments are conducted on a negotiation dialogue task, and their results show significant improvements over baselines.

**GuessWhat?! Visual Object Discovery Through Multi-modal Dialogue**,

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines for the introduced tasks.

**LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task**,

This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017. We propose two neural post-editing models: a mono-source model with a task-specific attention mechanism, which performs particularly well in a low-resource scenario; and a chained architecture which makes use of the source sentence to provide extra context. This latter architecture manages to slightly improve our results when more training data is available. We present and discuss our results on two datasets (en-de and de-en) that are made available for the task.

**A Multi-Armed Bandit Model Selection for Cold-Start User Recommendation**,

How can we effectively recommend items to a user about whom we have no information? This is known as the cold-start problem; in this paper, we focus on the cold-user case. Most existing works handle the cold-start problem by exploiting whatever information is available about the user. But what happens if we have no information at all? Recommender systems usually maintain a substantial number of prediction models available for analysis, and recommendations to new users yield uncertain returns. Assuming a number of alternative prediction models is available to select items to recommend to a cold user, this paper introduces a multi-armed-bandit-based model selection method, named PdMS. Compared with two baselines, PdMS improves performance as measured by the nDCG. These improvements are demonstrated on real, public datasets.

**A Large-scale Study of Call Graph-based Impact Prediction using Mutation Testing**,

In software engineering, impact analysis consists in predicting the software elements (e.g. modules, classes, methods) potentially impacted by a change in the source code. Impact analysis is required to optimize the testing effort. In this paper, we propose a framework to predict error propagation. Based on 10 open-source Java projects and 5 classical mutation operators, we create 17,000 mutants and study how the errors they introduce propagate. This framework enables us to analyze impact prediction based on four types of call graph. Our results show that call-graph sophistication indeed increases the completeness of impact prediction. However, and surprisingly to us, the most basic call graph gives the best trade-off between precision and recall for impact prediction.

**Correctness Attraction: A Study of Stability of Software Behavior under Runtime Perturbation**,

Can the execution of software be perturbed without breaking the correctness of the output? In this paper, we devise a novel protocol to answer this rarely investigated question. In an experimental study, we observe that many perturbations do not break correctness in ten subject programs. We call this phenomenon “correctness attraction”. The uniqueness of this protocol is that it considers a systematic exploration of the perturbation space as well as perfect oracles to determine the correctness of the output. To this extent, our findings on the stability of software under execution perturbations have a level of validity that has never been reported before in the scarce related work. A qualitative manual analysis enables us to set up the first taxonomy ever of the reasons behind correctness attraction.

**A generative model for sparse, evolving digraphs**,

Generating graphs that are similar to real ones is an open problem, not least because the notion of similarity is elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs similar to real ones. Since real graphs evolve, and this evolution is important to study in order to understand the underlying dynamical system, we also tackle the problem of generating series of graphs. We propose SEDGE, an algorithm extending SDG, meant to generate series of graphs similar to a real series. We consider graphs that are representations of software programs, and experiments show that both algorithms outperform existing approaches.

**A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks**,

This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, that does not require the knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities.

**Modulating early visual processing by language**,

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by a linguistic input. Specifically, we introduce Conditional Batch Normalization (CBN) as an efficient mechanism to modulate convolutional feature maps by a linguistic embedding. We apply CBN to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (MODERN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.

**FiLM: Visual Reasoning with a General Conditioning Layer**,

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
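The feature-wise affine transformation that defines a FiLM layer is essentially a one-liner. In the paper, gamma and beta are produced by a conditioning network from, e.g., a question embedding; that network is omitted in this minimal numpy sketch, and the shapes and values below are illustrative.

```python
import numpy as np

def film(x, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature
    map of x with coefficients conditioned on side information.
    x: (batch, channels, H, W); gamma, beta: (batch, channels)."""
    return gamma[:, :, None, None] * x + beta[:, :, None, None]

x = np.ones((1, 2, 4, 4))          # a dummy feature map
gamma = np.array([[2.0, 0.0]])     # scale channel 0, suppress channel 1
beta = np.array([[0.0, 1.0]])
y = film(x, gamma, beta)           # channel 0 -> 2.0, channel 1 -> 1.0
```

Because the modulation is per feature map rather than per spatial location, the extra cost is negligible, which is what makes FiLM practical as a general conditioning layer.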

**Learning Visual Reasoning Without Strong Priors**,

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively. Index Terms: Deep Learning, Language and Vision. Note: a full paper extending this study is available at http://arxiv.org/abs/1709.07871, with additional references, experiments, and analysis.

**HoME: a Household Multimodal Environment**,

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.

contract with http://

Title: Sequential Machine Learning for Adaptive Educational Systems

Duration: Mar. 2018 – Feb. 2021

Abstract: Adaptive educational content denotes technologies that adapt to the difficulties encountered by students. With the rise of digital content in schools, the mass of data coming from education both enables and calls for machine learning methods. Since 2010, Lelivrescolaire.fr has been developing learning materials for teachers and students through a collaborative creation process. For instance, during the school year 2015/2016, students completed more than 8,000,000 exercises on its homework platform Afterclasse.fr. Our approach is based on sequential machine learning: the algorithm learns to recommend exercises that adapt to students gradually as they answer.

contract with “OtherLang”; PI: Romaric Gaudel

Title: Tool to support foreign language practice

Duration: 2 months

Abstract: OtherLang develops an application to learn a foreign language by reading documents and interacting with other people. During the timeline of the contract, SequeL brought its expertise on Recommender Systems, which may be used either to recommend documents to users or to recommend users to users.

contract with “Sidexa”; PI: Jérémie Mary and then Philippe Preux

Title: Vision applied to the segmentation and recognition of car body parts

Duration: 3 months

Abstract: We investigate deep learning to perform car body segmentation. As the results are very good, a second contract will follow up on this one in 2018.

contract with “Renault”; PI: Philippe Preux

Title: State of the art in reinforcement learning regarding autonomous car control and path planning.

Duration: 3 months (Jan–Mar 2017)

Abstract: This work consisted in surveying the literature related to autonomous car control and reinforcement learning.

contract with Renault; PI: Philippe Preux

Title: Control of an autonomous vehicle

Duration: 3 years (12/2017–11/2020)

Abstract: This contract accompanies the CIFRE grant on the same topic. This work is done in collaboration with the NON-A project-team.

contract with “Criteo”; PI: Philippe Preux

Title: Computational advertising

Duration: 3 years (12/2017–11/2020)

Abstract: This contract accompanies the CIFRE grant on the same topic. The goal is to investigate reinforcement learning and deep learning for the problem of ad selection on the Internet.

contract with “Orange Labs”; PI: Philippe Preux

Title: Sequential Learning and Decision Making under Partial Monitoring

Duration: Oct. 2014 – Sep. 2017

Abstract: This contract accompanies the CIFRE grant on the same topic. In applications such as recommendation systems or computational advertising, the return collected from the user is partial: (s)he clicks on one item, or no item at all. We study this setting, in which only partial information is gathered, and in particular how to learn to behave optimally in it.

contract with “Orange Labs”; PI: Olivier Pietquin

Title: Inter User Transfer in dialogue systems

Duration: 3 years

Abstract: This contract accompanies the CIFRE grant on the same topic. The research aims at developing new algorithms to learn fast adaptation strategies for dialogue systems when a new user starts using them, given data collected from previous interactions with other users. In particular, it addresses the cold-start problem encountered when a new user faces the system, before samples can be collected to optimize the interaction strategy.

contract with “55”; PI: Jérémie Mary

Title: Novel Learning and Exploration-Exploitation Methods for Effective Recommender Systems

Duration: Oct. 2015 – Sep. 2018

Abstract: This contract accompanies the CIFRE grant on the same topic. In this Ph.D. thesis, we intend to deal with the exploration-exploitation trade-off in recommender systems by developing novel and more sophisticated recommendation strategies in which the collection of data and the improvement of performance are considered as a single process, where the trade-off between the quality of the data and the performance of the recommendation strategy is optimized over time. This work also considers tensor methods (one mode of the tensor can be time), with the goal of scaling them to the level of recommender systems.

*Title*: Bayesian statistics for expensive models and tall data

*Type*: National Research Agency

*Coordinator*: CNRS (Rémi Bardenet)

*Duration*: 2016-2020

*Abstract*:

Bayesian methods are a popular class of statistical algorithms for updating scientific beliefs. They turn data into decisions and models, taking into account uncertainty about models and their parameters. This makes Bayesian methods popular among applied scientists such as biologists, physicists, or engineers. However, at the heart of Bayesian analysis lie 1) repeated sweeps over the full dataset considered, and 2) repeated evaluations of the model that describes the observed physical process. The current trends toward large-scale data collection and complex models thus raise two main issues. Experiments, observations, and numerical simulations in many areas of science nowadays generate terabytes of data, as does the LHC in particle physics for instance. Simultaneously, knowledge creation is becoming more and more data-driven, which requires new paradigms addressing how data are captured, processed, discovered, exchanged, distributed, and analyzed. For statistical algorithms to scale up, reaching a given performance must require as few iterations and as little access to data as possible. It is not only experimental measurements that are growing at a rapid pace. Cell biologists tend to have scarce data but large-scale models of tens of nonlinear differential equations to describe complex dynamics. In such settings, evaluating the model once requires numerically solving a large system of differential equations, which may take minutes for some tens of differential equations on today’s hardware. Iterative statistical processing that requires a million sequential runs of the model is thus out of the question. In this project, we tackle the fundamental cost-accuracy trade-off for Bayesian methods, in order to produce generic inference algorithms that scale favourably with the number of measurements in an experiment and the number of runs of a statistical model. We propose a collection of objectives with different risk-reward trade-offs to tackle these two goals.
In particular, for experiments with large numbers of measurements, we further develop existing subsampling-based Monte Carlo methods, while developing a novel decision theory framework that includes data constraints. For expensive models, we build an ambitious programme around Monte Carlo methods that leverage determinantal processes, a rich class of probabilistic tools that lead to accurate inference with limited model evaluations. In short, using innovative techniques such as subsampling-based Monte Carlo and determinantal point processes, we propose in this project to push the boundaries of the applicability of Bayesian inference.

*Title*: BAnDits for non-Stationarity and Structure

*Type*: National Research Agency

*Coordinator*: Inria Lille (O. Maillard)

*Duration*: 2016-2020

*Abstract*: Motivated by the fact that a number of
modern applications of sequential decision making require
developing strategies that are especially robust to change in the
stationarity of the signal, and in order to anticipate and impact
the next generation of applications of the field, the BADASS
project intends to push theory and application of MAB to the next
level by incorporating non-stationary observations while retaining
near optimality against the best not necessarily constant decision
strategy. Since a non-stationary process typically decomposes into
chunks associated with some possibly hidden variables (states),
each corresponding to a stationary process, handling
non-stationarity crucially requires exploiting the (possibly
hidden) structure of the decision problem. For the same reason, a
MAB for which arms can be arbitrary non-stationary processes is
powerful enough to capture MDPs and even partially observable MDPs
as special cases, and it is thus important to jointly address the
issue of non-stationarity together with that of structure. In
order to advance these two nested challenges from a solid
theoretical standpoint, we intend to focus on the following
objectives: *(i)* To broaden the range of optimal
strategies for stationary MABs: current strategies are only known
to be provably optimal in a limited range of scenarios for which
the class of distribution (structure) is perfectly known; also,
recent heuristics possibly adaptive to the class need to be
further analyzed. *(ii)* To strengthen the literature on
pure sequential prediction (focusing on a single arm) for
non-stationary signals via the construction of adaptive confidence
sets and a novel measure of complexity: traditional approaches
consider a worst-case scenario and are thus overly conservative
and non-adaptive to simpler signals. *(iii)* To embed the
low-rank matrix completion and spectral methods in the context of
reinforcement learning, and further study models of structured
environments: promising heuristics in the context of
e.g. contextual MABs or Predictive State Representations require
stronger theoretical guarantees.

This project will result in the development of a novel generation of strategies to handle non-stationarity and structure that will be evaluated in a number of test beds and validated by a rigorous theoretical analysis. Beyond the significant advancement of the state of the art in MAB and RL theory and the mathematical value of the program, this JCJC BADASS is expected to strategically impact societal and industrial applications, ranging from personalized health-care and e-learning to computational sustainability or rain-adaptive river-bank management, to name a few.

*Title*: Extraction and Transfer of Knowledge in Reinforcement Learning

*Type*: National Research Agency (ANR-9011)

*Coordinator*: Inria Lille (A. Lazaric)

*Duration*: 2014-2018

*Abstract*: ExTra-Learn is directly motivated by the
evidence that one of the key features that allows humans to
accomplish complicated tasks is their ability of building
knowledge from past experience and transfer it while learning new
tasks. We believe that integrating transfer of learning in machine
learning algorithms will dramatically improve their learning
performance and enable them to solve complex tasks. We identify in
the reinforcement learning (RL) framework the most suitable
candidate for this integration. RL formalizes the problem of
learning an optimal control policy from the experience directly
collected from an unknown environment. Nonetheless, practical
limitations of current algorithms encouraged research to focus on
how to integrate prior knowledge into the learning
process. Although this improves the performance of RL algorithms,
it dramatically reduces their autonomy. In this project we pursue
a paradigm shift from designing RL algorithms incorporating prior
knowledge, to methods able to incrementally discover, construct,
and transfer “prior” knowledge in a fully automatic way. In more
detail, three main elements of RL algorithms would significantly
benefit from transfer of knowledge. *(i)* For every new
task, RL algorithms need to explore the environment for a long
time, which corresponds to slow learning processes for large
environments. Transfer learning would enable RL algorithms to
dramatically reduce the exploration of each new task by exploiting
its resemblance with tasks solved in the past. *(ii)* RL
algorithms evaluate the quality of a policy by computing its
state-value function. Whenever the number of states is too large,
approximation is needed. Since approximation may cause
instability, designing suitable approximation schemes is
particularly critical. While this is currently done by a domain
expert, we propose to perform this step automatically by
constructing features that incrementally adapt to the tasks
encountered over time. This would significantly reduce human
supervision and increase the accuracy and stability of RL
algorithms across different tasks. *(iii)* In order to
deal with complex environments, hierarchical RL solutions have
been proposed, where state representations and policies are
organized over a hierarchy of subtasks. This requires a careful
definition of the hierarchy, which, if not properly constructed,
may lead to very poor learning performance. The ambitious goal of
transfer learning is to automatically construct a hierarchy of
skills, which can be effectively reused over a wide range of
similar tasks.

*Activity Report*: Research in ExTra-Learn continued in
investigating how knowledge can be transferred into reinforcement
learning algorithms to improve their performance. Pierre-Victor
Chaumier did a four-month internship in SequeL studying how to
transfer neural networks across different games in the
Atari platform. Unfortunately, the preliminary results we obtained
were not very positive. We investigated different transfer models,
from basic transfer of a fully trained network, to co-train over
multiple games and retrain with initialization from a previous
network. In most of the cases, the improvement from transfer was
rather limited and in some cases even negative transfer effects
appeared. This seems to be intrinsic to the neural network
architecture, which tends to overfit a single task and
generalizes poorly to alternative tasks. Another activity was
related to the study of macro-actions in RL. We proved for the
first time under which conditions macro-actions can actually
improve the learning speed of an RL exploration-exploitation
algorithm. This is the first step towards the automatic
identification and construction of useful macro-actions across
multiple tasks.

*Acronym*: KEHATH

*Title*: Advanced Quality Methods for Post-Edition of Machine Translation

*Type*: ANR

*Coordinator*: Lingua & Machina

*Duration*: 2014-2017

*Other partners*: Univ. Lille 1, Laboratoire d'Informatique de Grenoble (LIG)

*Abstract*: The translation community has seen a major
change over the last five years. Thanks to progress in the
training of statistical machine translation engines on corpora of
existing translations, machine translation has become good enough
so that it has become advantageous for translators to post-edit
machine outputs rather than translate from scratch. However,
current enhancements of machine translation (MT) systems from human
post-edition (PE) are rather basic: the post-edited output is
added to the training corpus and the translation model and
language model are re-trained, with no clear view of how much has
been improved and how much is left to be improved. Moreover, the
final PE result is the only feedback used: available technologies
do not take advantage of logged sequences of post-edition
actions, which inform on the cognitive processes of the
post-editor. The KEHATH project intends to address these issues
in two ways. Firstly, we will optimise advanced machine learning
techniques in the MT+PE loop. Our goal is to boost the impact of
PE, that is, reach the same performance with less PE or better
performance with the same amount of PE. In other words, we want to
improve machine translation learning curves. For this purpose,
active learning and reinforcement learning techniques will be
proposed and evaluated. Along with this, we will have to face
challenges such as MT systems heterogeneity (statistical and/or
rule-based), and ML scalability so as to improve domain-specific
MT. Secondly, since quality prediction (QP) on MT outputs is
crucial for translation project managers, we will implement and
evaluate in real-world conditions several confidence estimation
and error detection techniques previously developed at a
laboratory scale. A shared concern will be to work on continuous
domain-specific data flows to improve both MT and the performance
of indicators for quality prediction. The overall goal of the
KEHATH project is straightforward: gain additional machine
translation performance as fast as possible in each and every new
industrial translation project, so that post-edition time and cost
are drastically reduced. Basic research is the best way to reach
this goal, for an industrial impact that is powerful and
immediate.

*Title*: Bandits pour l'Internet des Objets

*Type*: CNRS PEPS project

*Coordinator*: CNRS (E. Kaufmann)

*Duration*: April-December 2017

*Abstract*:
In order to improve the quality and minimise the energy cost of communications between connected objects and their base stations, this project aims to adapt recent advances in cognitive radio to the specificities of Internet-of-Things communications. Given the congestion of the frequency spectrum, these objects need to learn adaptively when and on which frequency to communicate. For this task, we propose the use of so-called multi-armed bandit algorithms, already known in the cognitive-radio context but not always suited to the specificities of Internet-of-Things communications. We will introduce new multi-player bandit algorithms that capture the coordination required between the many objects, in addition to the learning of the quality of the frequency channels. We will then consider a new, adversarial-bandit model to describe communications in standards such as LoRa, in which objects receive acknowledgement messages from the base stations, leading to algorithms that minimise the latency of these communications.
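As a minimal illustration of the channel-selection task described in this abstract, here is a sketch of the classical UCB1 index policy applied to hypothetical channels. The channel success probabilities, horizon, and function names below are illustrative assumptions, not part of the project:

```python
import math
import random

def ucb1_channel_selection(success_prob, horizon, seed=0):
    """Pick a radio channel at each step with UCB1; reward 1 = successful transmission.

    success_prob: hypothetical per-channel success probabilities (unknown to the learner).
    Returns (counts, means): attempts and empirical success rate per channel.
    """
    rng = random.Random(seed)
    n_channels = len(success_prob)
    counts = [0] * n_channels   # transmissions attempted per channel
    means = [0.0] * n_channels  # empirical success rate per channel
    for t in range(1, horizon + 1):
        if t <= n_channels:
            c = t - 1  # try each channel once first
        else:
            # UCB1 index: empirical mean + exploration bonus
            c = max(range(n_channels),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < success_prob[c] else 0.0
        counts[c] += 1
        means[c] += (reward - means[c]) / counts[c]  # incremental mean update
    return counts, means

counts, means = ucb1_channel_selection([0.2, 0.5, 0.8], horizon=5000)
# the best channel (index 2) should receive the bulk of the transmissions
print(counts)
```

The exploration bonus shrinks as a channel is tried more often, so transmissions concentrate on the best channel while still occasionally probing the others.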

ENS Paris-Saclay

M. Valko collaborated with V. Perchet on structured bandit problems. They co-supervise a PhD student (P. Perrault).

Institut de Mathématiques de Toulouse

E. Kaufmann collaborated with Aurélien Garivier on sequential testing and structured bandit problems.

CentraleSupélec Rennes

E. Kaufmann co-advises Lilian Besson, who works at CentraleSupélec with Christophe Moy. Christophe, Lilian and Émilie worked together on a PEPS project about bandits for the Internet of Things. One paper was published at the CROWNCOM conference, and another has been submitted to the ALT conference.

Program: H2020

Project acronym: BabyRobot

Project title: Child-Robot Communication and Collaboration

Duration: 01/2016 - 12/2018

Coordinator: Alexandros Potamianos (Athena Research and Innovation Center in Information Communication and Knowledge Technologies, Greece)

Other partners: Institute of Communication and Computer Systems (Greece), The University of Hertfordshire Higher Education Corporation (UK), Universitaet Bielefeld (Germany), Kungliga Tekniska Högskolan (Sweden), Blue Ocean Robotics ApS (Denmark), Univ. Lille (France), Furhat Robotics AB (Sweden)

Abstract: The crowning achievement of human communication is our unique ability to share intentionality, and to create and execute joint plans. Using this paradigm we model human-robot communication as a three-step process: sharing attention, establishing common ground and forming shared goals. Prerequisites for successful communication are being able to decode the cognitive state of people around us (mind-reading) and building trust. Our main goal is to create robots that analyze and track human behavior over time in the context of their surroundings (situational) using audio-visual monitoring, in order to establish common ground and mind-reading capabilities. In BabyRobot we focus on typically developing and autistic-spectrum children as the user population. Children have unique communication skills, are quick and adaptive learners, and are eager to embrace new robotic technologies. This is especially relevant for special education, where the development of social skills is delayed or never fully achieved without intervention or therapy. Thus our second goal is to define, implement and evaluate child-robot interaction application scenarios for developing specific socio-affective, communication and collaboration skills in typically developing and autistic-spectrum children. We will support, not supplant, the therapist or educator, working hand-in-hand to create a low-risk environment for learning and cognitive development. Breakthroughs in core robotic technologies are needed to support this research, mainly in the areas of motion planning and control in constrained spaces, gestural kinematics, sensorimotor learning and adaptation. Our third goal is to push beyond the state of the art in core robotic technologies to support natural human-robot interaction and collaboration for edutainment and healthcare applications. Creating robots that can establish communication protocols and form collaboration plans on the fly will have impact beyond the application scenarios investigated here.

Program: CHIST-ERA

Project acronym: DELTA

Project title: Dynamically Evolving Long-Term Autonomy

Duration: October 2017 - December 2021

Coordinator: Anders Jonsson (PI)

Inria coPI: Michal Valko

Other partners: UPF Spain, MUL Austria, ULG Belgium

Abstract: Many complex autonomous systems (e.g., electrical distribution networks) repeatedly select actions with the aim of achieving a given objective. Reinforcement learning (RL) offers a powerful framework for acquiring adaptive behaviour in this setting, associating a scalar reward with each action and learning from experience which action to select to maximise long-term reward. Although RL has produced impressive results recently (e.g., achieving human-level play in Atari games and beating the human world champion in the board game Go), most existing solutions only work under strong assumptions: the environment model is stationary, the objective is fixed, and trials end once the objective is met. The aim of this project is to advance the state of the art of fundamental research in lifelong RL by developing several novel RL algorithms that relax the above assumptions. The new algorithms should be robust to environmental changes, both in terms of the observations that the system can make and the actions that the system can perform. Moreover, the algorithms should be able to operate over long periods of time while achieving different objectives. The proposed algorithms will address three key problems related to lifelong RL: planning, exploration, and task decomposition. Planning is the problem of computing an action selection strategy given a (possibly partial) model of the task at hand. Exploration is the problem of selecting actions with the aim of mapping out the environment rather than achieving a particular objective. Task decomposition is the problem of defining different objectives and assigning a separate action selection strategy to each. The algorithms will be evaluated in two realistic scenarios: active network management for electrical distribution networks, and microgrid management. A test protocol will be developed to evaluate each individual algorithm, as well as their combinations.
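The RL setting sketched in this abstract (a scalar reward for each action, and learning from experience which action maximises long-term reward) can be illustrated with tabular Q-learning on a toy chain environment. The environment and all parameters below are illustrative assumptions, not the project's algorithms or benchmarks:

```python
import random

def q_learning_chain(n_states=5, episodes=300, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP.

    States 0..n_states-1; actions 0 = left, 1 = right; reward 1 on reaching
    the rightmost state, 0 elsewhere. Optimistic initialisation (Q = 1)
    encourages early exploration.
    """
    rng = random.Random(seed)
    Q = [[1.0, 1.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # one-step temporal-difference target; no bootstrap from the terminal state
            target = r if s2 == n_states - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
# after training, "right" should have the higher value in every non-terminal state
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])
```

The project's three problems map onto this loop: planning corresponds to computing the greedy policy from Q, exploration to the epsilon-greedy (or optimistic) action choice, and task decomposition to maintaining separate value functions per objective.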

Program: CHIST-ERA

Project acronym: IGLU

Project title: Interactively Grounded Language Understanding

Duration: 11/2015 - 10/2018

Coordinator: Jean Rouat (Université de Sherbrooke, Canada)

Other partners: UMONS (Belgium), Inria (France), Univ-Lille (France), KTH (Sweden), Universidad de Zaragoza (Spain)

Abstract: Language is an ability that develops in young children through joint interaction with their caretakers and their physical environment. At this level, human language understanding could be referred to as interpreting and expressing semantic concepts (e.g. objects, actions and relations) through what can be perceived (or inferred) from the current context in the environment. Previous work in the field of artificial intelligence has failed to address the acquisition of such perceptually grounded knowledge in virtual agents (avatars), mainly because of the lack of physical embodiment (the ability to interact physically) and of dialogue and communication skills (the ability to interact verbally). We believe that robotic agents are more appropriate for this task, and that interaction is such an important aspect of human language learning and understanding that pragmatic knowledge (identifying or conveying intention) must be present to complement semantic knowledge. Through a developmental approach, where knowledge grows in complexity while driven by multimodal experience and language interaction with a human, we propose an agent that will incorporate models of dialogues, human emotions and intentions as part of its decision-making process. This will lead to anticipation and reaction not only based on its internal state (own goals and intentions, perception of the environment), but also on the perceived state and intention of the human interactant. This will be possible through the development of advanced machine learning methods (combining developmental, deep and reinforcement learning) to handle large-scale multimodal inputs, besides leveraging state-of-the-art technological components involved in a language-based dialogue system available within the consortium. Evaluations of learned skills and knowledge will be performed using an integrated architecture in a culinary use case, and novel databases enabling research in grounded human language understanding will be released.

Title: Non-parametric sequential prediction project

Centrum Wiskunde & Informatica (CWI), Amsterdam (NL) - Peter Grünwald

Duration: 2016 - 2018

Start year: 2016

Abstract: The aim is to develop the theory of learning for sequential decision-making problems under uncertainty.

In 2017, this collaboration involved D. Ryabko, É. Kaufmann, J. Ridgway, M. Valko, O. Maillard. A post-doc funded by Inria was recruited in Fall 2016.

https://

Title: Educational Bandits

International Partner (Institution - Laboratory - Researcher):

Carnegie Mellon University (United States) - Department of Computer Science, Theory of computation lab - Emma Brunskill

Start year: 2015

See also: https://

Education can transform an individual's capacity and the opportunities available to them. The proposed collaboration will build on and develop novel machine learning approaches towards enhancing (human) learning. Massive open online classes (MOOCs) are enabling many more people to access education, but mostly operate using status-quo teaching methods. Even more important than access is the opportunity for online software to radically improve the efficiency, engagement and effectiveness of education. Existing intelligent tutoring systems (ITSs) have had some promising successes, but mostly rely on learning-sciences research to construct hand-built strategies for automated teaching. Online systems make it possible to actively collect substantial amounts of data about how people learn, and offer a huge opportunity to substantially accelerate progress in improving education. An essential aspect of teaching is providing the right learning experience for the student, but it is often unknown a priori exactly how this should be achieved. This challenge can often be cast as an instance of decision-making under uncertainty. In particular, prior work by Brunskill and colleagues demonstrated that reinforcement learning (RL) and multi-armed bandits (MAB) can be very effective approaches to the problem of automated teaching. The proposed collaboration is thus intended to explore the potential interactions between the fields of online education and RL and MAB. On the one hand, we will define novel RL and MAB settings and problems in online education. On the other hand, we will investigate how solutions developed in RL and MAB could be integrated in ITSs and MOOCs and improve their effectiveness.

Title: Adaptive allocation of resources for recommender systems

Inria contact: Michal Valko

International Partner (Institution - Laboratory - Researcher):

Universität Potsdam (Germany) - A. Carpentier

Start year: 2017

We plan to improve a practical scenario of *resource
allocation in market surveys*, such as product appraisals and music
recommendation. In practice, the market is typically divided into
segments (geographic regions, age groups, etc.). These groups are
then queried for preference with some fixed rule for the number of
queries per group. This testing is *costly and
non-adaptive*: some groups are easier to estimate
than others, but this is impossible to know a priori. Our challenge
is to **adaptively allocate the optimal number of samples** to
each group and improve the efficiency of market studies by
providing *sample-efficient* solutions.
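One simple instance of such an adaptive scheme, shown here as an illustrative sketch (the group noise levels, budget, and function names are hypothetical, not the project's method), queries at each step the group whose preference estimate currently has the largest standard error:

```python
import math
import random

def adaptive_allocation(group_stds, budget, seed=0):
    """Sequentially allocate preference queries to market segments.

    Each query to group i returns a noisy preference score with (unknown)
    standard deviation group_stds[i]; noisier groups should end up with
    more samples than a fixed equal-split rule would give them.
    """
    rng = random.Random(seed)
    k = len(group_stds)
    samples = [[] for _ in range(k)]
    for i in range(k):  # two initial queries per group to get a variance estimate
        samples[i] = [rng.gauss(0.0, group_stds[i]) for _ in range(2)]

    def stderr(i):
        # empirical standard error of the mean for group i
        xs = samples[i]
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return math.sqrt(var / len(xs))

    for _ in range(budget - 2 * k):
        i = max(range(k), key=stderr)  # query the least well-estimated group
        samples[i].append(rng.gauss(0.0, group_stds[i]))
    return [len(s) for s in samples]

counts = adaptive_allocation([0.1, 1.0], budget=400)
# the noisier group should receive most of the budget
print(counts)
```

Asymptotically this rule approaches the oracle allocation, which is proportional to each group's standard deviation, without knowing the standard deviations in advance.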

**Adobe Research**

Branislav Kveton *Collaborator*

Zheng Wen *Collaborator*

Sharan Vaswani *Collaborator*

M. Valko collaborated with Adobe Research on online influence maximization in social networks. This led to a publication in NIPS 2017.

**Massachusetts Institute of Technology**

Victor-Emmanuel Brunel *Collaborator*

M. Valko collaborated with V.-E. Brunel on the estimation of low rank determinantal point processes useful for diverse recommender systems.

**Universität Potsdam**

Alexandra Carpentier *Collaborator*

M. Valko collaborated with A. Carpentier on adaptive estimation of block-diagonal matrices with application to market segmentation. This collaboration was formalized in September 2017 through the creation of a North-European associate team.

**University of California, Berkeley**

Victor Gabillon *Collaborator*

M. Valko collaborated with V. Gabillon on sample complexities in unknown types of environments.

**University of Southern California**

Haipeng Luo *Collaborator*

M. Valko collaborated with H. Luo on online submodular minimization.

**Adobe Research**

Mohammad Ghavamzadeh *Collaborator*

A. Lazaric collaborated with Adobe Research on active learning for accurate estimation of linear models. This led to a publication in ICML 2017.

**Stanford University**

Carlos Riquelme *Collaborator*

A. Lazaric collaborated with Carlos Riquelme on active learning for accurate estimation of linear models. This led to a publication in ICML 2017.

**Stanford University**

Emma Brunskill *Collaborator*

A. Lazaric collaborated with Emma Brunskill on exploration-exploitation with options in reinforcement learning. This led to a publication in NIPS 2017.

**University of California, Irvine**

Anima Anandkumar *Collaborator*

Kamyar Azzizade *Collaborator*

A. Lazaric collaborated with A. Anandkumar and K. Azzizade on exploration-exploitation in reinforcement learning with state clustering. This led to a submission to AI&Stats 2018.

**University of Leoben**

Ronald Ortner *Collaborator*

A. Lazaric collaborated with R. Ortner on exploration-exploitation in reinforcement learning with regularized optimization. This will lead to a submission to ICML 2018.

**Politecnico di Milano**

Marcello Restelli *Collaborator*

Matteo Pirotta collaborated with M. Restelli on several topics in reinforcement learning. This led to publications at ICML 2017 and NIPS 2017.

**Lancaster University**

B. Balle *Collaborator*

O. Maillard collaborated with B. Balle on spectral learning of Hankel matrices. This led to a publication at ICML.

**Mila, Université de Montréal**

A. Courville *Collaborator*

F. Strub and O. Pietquin collaborate with A. Courville on deep reinforcement learning for language acquisition. This led to several papers at IJCAI, CVPR, and NIPS, as well as the GuessWhat?! dataset and protocol, and the HoME dataset.

**University of Uberlandia, Brazil**

C. Felicio *Collaborator*

Ph. Preux supervised this PhD on recommender systems. This led to C. Felicio's defense and a paper at UMAP.

**SequeL**

Title: The multi-armed bandit problem

International Partner (Institution - Laboratory - Researcher):

University of Leoben (Austria) - Peter Auer

Duration: 2014 - 2018

Start year: 2014

In a nutshell, the collaboration focuses on nonparametric algorithms for active learning problems, mainly involving theoretical analysis of reinforcement learning and bandit problems beyond the traditional settings of finite-state MDPs (for RL) or i.i.d. rewards (for bandits). Peter Auer from the University of Leoben is a worldwide leader in the field, having introduced the UCB approach around 2000, along with its finite-time analysis. Today, SequeL is likely the largest research group working in this field in the world, enjoying worldwide recognition. SequeL and P. Auer's group have been collaborating for a couple of years now; they have co-authored papers, visited each other (sabbatical stay, post-doc), and co-organized workshops; the STREP CompLACS partially funds this very active collaboration.

**Contextual multi-armed bandits with hidden structure**

Title: Contextual multi-armed bandits with hidden structure

International Partner (Institution - Laboratory - Researcher):

IISc Bangalore (India) – Aditya Gopalan

Duration: 2015 - 2017

Recent advances in Multi-Armed Bandit (MAB) theory have yielded key insights into, and driven the design of applications in, sequential decision making in stochastic dynamical systems. Notable among these are recommender systems, which have benefited greatly from the study of contextual MABs incorporating user-specific information (the context) into the decision problem from a rigorous theoretical standpoint. In the proposed initiative, the key features of (a) sequential interaction between a learner and the users, and (b) a relatively small number of interactions per user with the system, motivate the goal of efficiently exploiting the underlying collective structure of users. The state of the art lacks a well-grounded strategy with provably near-optimal guarantees for general, low-rank user structure. Combining expertise in the foundations of MAB theory together with recent advances in spectral methods and low-rank matrix completion, we target the first provably near-optimal sequential low-rank MAB algorithms.

Harm de Vries, PhD student, University of Montreal, Canada, Jan-Jun 2017

Mohammad Sadegh Talebi Mazraeh Shahi, PhD student, KTH Royal Institute of Technology, Sweden, Jun-Sep 2017

Xuedong Shang, master student, ENS Rennes, Feb–Jun 2017

Iuliia Olkhovskaia, master student, Moscow Institute of Physics and Technology, Russia, Feb–Jul 2017

Georgios Papoudakis, master student, Aristotle University of Thessaloniki, Greece, May–Sep 2017

Subhojyoti Mukherjee, master student, Indian Institute of Technology, Sep-Nov 2017

Mahsa Asadi, Shiraz University, Iran, Sep-Dec 2017

*Visually grounded interaction and language*, workshop at NIPS 2017, organized by Florian Strub, Harm de Vries, Abhishek Das, Satwik Kottur, Stefan Lee, Mateusz Malinowski, Olivier Pietquin, Devi Parikh, Dhruv Batra, Aaron C Courville, Jérémie Mary. URL: https://

O. Maillard: Workshop of the working group *Sequential Structured Statistical Learning*, May 17 2017 at Institut des Hautes Etudes Scientifiques (Bures-sur-Yvette). URL: https://

Members of SequeL have been involved in the following program committees in 2017:

Senior PC for International Joint Conference on Artificial Intelligence (IJCAI 2017)

Senior PC for ACM KDD 2017

International Conference on Artificial Intelligence and Statistics (AI & STATS 2017)

PC member for the International Conference on Learning Theory (COLT 2017)

European Conference on Machine Learning (ECML 2017)

1st Workshop on Transfer in Reinforcement Learning (TiRL) 2017

The Third International Conference on Machine Learning, Optimization and Big Data (MOD 2017)

French conferences:

Extraction et Gestion de Connaissances (EGC),

Journées Francophones de Planification, Décision, Apprentissage (JFPDA)

Journées de la Société Francophone de Classification (SFC)

Conférence francophone sur l'Apprentissage Automatique (CAp)

Édouard Oyallon received a “best NIPS reviewer award”.

Members of SequeL have reviewed papers for the following conferences:

AI&Stats, COLT, ECML, ICML, IJCAI, NIPS, ALT.

Automatica

IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Software Engineering

International Federation of Automatic Control

Bernoulli Journal

Journal of Machine Learning Research

IEEE Transactions on Signal Processing

R. Gaudel,
*Recommendation as a Sequential Process*, Presented on February 1st, 2017, at Séminaire CMLA, Paris, France
(*CMLA 2017*)

R. Gaudel,
*Recommendation as a Sequential Process*, Presented on January 10th, 2017, at Séminaire ENSAI, Rennes (Bruz), France
(*ENSAI 2017*)

A. Lazaric,
*Spectral Methods for Reinforcement Learning*, Presented on April 10, 2017, at Amazon, Berlin, Germany

M. Valko,
*SequeL, graphs in ML, and online recommender systems*, Presented on November 9th, 2017 at Plateau Inria Euratechnologies in Lille, France
(*Euratechnologies 2017*)

M. Valko,
*Sequential sampling for kernel matrix approximation and online learning*
Presented on September 19th, DeepMind, London, UK
(*DeepMind 2017*)

M. Valko,
*Active learning on networks and online influence maximization*, Presented on September 18th, 2017, Decision Theory and Network Science: Methods and Applications, Lancaster, UK
(*STOR-i 2017*)

M. Valko,
*Side observation in graph bandits*, Presented on July 11th, 2017, ICML 2017 workshop on Picky Learners, Sydney, Australia (*ICML 2017*)

M. Valko,
*Distributed sequential sampling for kernel matrix approximation*, Presented on June 28th, 2017, L'Institut de Mathématiques de Toulouse, France
(*IMT 2017*)

M. Valko,
*Online sequential solutions for recommender systems*, Presented on June 14th, 2017 at Journées Scientifiques Inria 2017 in Nice, France
(*JS 2017*)

M. Valko,
*Where is Justin Bieber?*, Presented on March 30th, 2017 at Dating day in Lille, France
(*Dating 2017*)

M. Valko,
*Distributed sequential sampling for kernel matrix approximation*, Presented on March 22nd, 2017, for Universität Potsdam at Amazon
(*Berlin 2017*)

É. Kaufmann was a member of the committee of experts for hiring junior faculty in the maths department of Université de Lille 1

J. Mary was a member of the industrial transfer commission of Inria Lille

Alessandro Lazaric was a reviewer for an NSFC-ISF research grant

Philippe Preux is a member of the evaluation committee and participates in the hiring, promotion, and evaluation juries of Inria:

Inria CR1 hiring committee

Inria Lille CR2 hiring committee

Inria committee for researcher promotion

Inria committee for PEDR

Philippe Preux was a member of the hiring committees for 1 professor and 2 associate professors at the Université de Lille 3

Philippe Preux was a member of the committee for PhD grant of the “Pôle Métropolitain de la Côte d'Opale”

Philippe Preux reviewed a proposal for ANRT (and declined invitation from ANR)

M. Valko is an elected member of the evaluation committee and participates in the hiring, promotion, and evaluation juries of Inria, notably

Hiring committee for junior researchers at Inria Saclay (2017)

Inria work group for deontological ethics (2017)

Selection committee for Inria award for scientific excellence of junior and confirmed researchers (2017)

M. Valko was a member of the national Inria acceptance committee for hiring junior researchers

M. Valko was a member of the committee of experts for hiring junior faculty at CMLA, ENS Paris-Saclay

*R. Gaudel* was a member of the Board of CRIStAL.

Philippe Preux is:

“délégué scientifique adjoint” of the Inria center in Lille

member of the Inria evaluation committee (CE)

member of the Inria internal scientific committee (COSI)

member of the scientific committee of CRIStAL

the head of the “Data Intelligence” thematic group at CRIStAL

Master: É. Kaufmann, 2017/2018 Fall: Machine Learning, 18h eq TD, M2 Maths/Finances, Université de Lille 1

Master: É. Kaufmann, 2016/2017 Spring: Data Mining, 36h eq TD, M1 Maths/Finances, Université de Lille 1

Master: A. Lazaric, 2017/2018 Fall: Reinforcement Learning, 36h eqTD, M2, ENS Cachan

Master: M. Valko, 2017/2018 Fall: Graphs in Machine Learning, 36h eqTD, M2, ENS Cachan

PhD in progress: Marc Abeille, Exploration-exploitation in reinforcement learning, started Sept. 2014, advisor: Remi Munos, Alessandro Lazaric

PhD in progress: Merwan Barlier, Human-in-the-loop reinforcement learning for dialogue systems, started Oct. 2014, advisor: Olivier Pietquin

PhD in progress: Alexandre Bérard, Deep learning for post-editing and automatic translation, started Oct. 2014, advisor: Olivier Pietquin

PhD in progress: Lilian Besson, Bandit approach to improve Internet Of Things Communications, started Oct. 2016, advisor: Émilie Kaufmann, Christophe Moy (CentraleSupélec Rennes)

PhD in progress: Daniele Calandriello, Efficient Sequential Learning in Structured and Constrained Environment, Inria, started Oct. 2014, advisor: Michal Valko, Alessandro Lazaric

PhD in progress: Ronan Fruit, Exploration-exploitation in hierarchical reinforcement learning, Inria, started Dec. 2015, advisor: Daniil Ryabko, Alessandro Lazaric

PhD: Pratik Gajane, Multi-armed bandits with unconventional feedback, started Oct. 2014, defended Nov. 14th, 2017, advisor: Philippe Preux

PhD in progress: Guillaume Gautier, DPPs in ML, started Oct. 2016, advisor: Michal Valko; Rémi Bardenet

PhD in progress: Jean-Bastien Grill, Création et analyse d'algorithmes efficaces pour la prise de décision dans un environnement inconnu et incertain, Inria/ENS Paris/Lille 1, started Oct. 2014, advisor: Rémi Munos, Michal Valko

PhD in progress: Édouard Leurent, Autonomous vehicle control: application of machine learning to contextualized path planning, started Oct. 2017, advisor: Odalric Maillard, Philippe Preux, Denis Efimov (NON-A), Wilfrid Perruquetti (NON-A)

PhD in progress: Sheikh Waqas Akhtar, Bandits for non-stationarity and structure, started Oct. 2017, advisor: Odalric Maillard, Daniil Ryabko.

PhD in progress: Julien Perolat, Reinforcement learning: the multi-player case, started Oct. 2014, advisor: Olivier Pietquin

PhD in progress: Pierre Perrault, Online Learning on Streaming Graphs, started Sep. 2017, advisor: Michal Valko; Vianney Perchet

PhD in progress: Mathieu Seurin, Multi-scale rewards in reinforcement learning, started Oct. 2017, advisor: Olivier Pietquin, Philippe Preux

PhD in progress: Julien Seznec, Sequential Learning for Educational Systems, started Mar. 2017, advisor: Michal Valko; Alessandro Lazaric, Jonathan Banon

PhD in progress: Xuedong Shang, Adaptive methods for optimization in stochastic environments, started Oct. 2017, advisor: Émilie Kaufmann, Michal Valko

PhD in progress: Florian Strub, Reinforcement Learning for visually grounded interaction, started Jan. 2016, advisors: Olivier Pietquin and Jeremie Mary

PhD in progress: Kiewan Villatel, Deep Learning for Conversion Rate Prediction in Online Advertising, started Oct. 2017, advisor: Philippe Preux

PhD and HDR juries:

É. Kaufmann, *Navikumar Modi*, CentraleSupélec Rennes, May 2017

A. Lazaric:

*Stefano Paladino*, Politecnico di Milano, Dec 2017

*Michael Castronovo*, Université de Liège, March 2017

*Raffaello Camoriano*, Università di Genova, April 2017

*Claire Vernade*, Télécom ParisTech, October 2017

Ph. Preux:

Cricia Zilda Felicio Paixao, University of Uberlandia, Brazil

Thibault Gisselbrecht, LIP 6, UPMC, Paris

Pratik Gajane, CRIStAL, Lille

M. Valko: *Clément Bouttier*, Université Toulouse 3 Paul Sabatier, June 2017

PhD mid-term evaluation:

M. Valko: *Thibault Liétard*, Université Lille, September 2017

CNRS published an article about zonotope sampling presented at ICML (see http://

Julien Seznec published an article in *Les Echos* discussing ML for education (November 2017).

Émilie Kaufmann gave a popularization talk about bandit algorithms aimed at high school/prepa students at the MathPark seminar, organized at IHP in Paris (April 2017).

*Avec GuessWhat?! quand l’humain joue, l’ordinateur s’initie au langage*, https://

Florian Strub and Mathieu Seurin demonstrated GuessWhat?! during the celebrations of Inria's 50th anniversary (November 2017).

Philippe Preux:

was interviewed for an article on *L’intelligence artificielle, est-ce vraiment de l’intelligence ?* in *BioTech.info*, Jan. 2017.

participated in a debate about Artificial Intelligence, as part of the franceIA tour (Euratechnologies, Lille).

was interviewed by AFP in relation to AlphaGo.

was interviewed for an article on AI and games, published in *Le Figaro*.

gave an interview that led to a publication in ATOS Connexion, the ATOS internal journal.

was interviewed on Artificial Intelligence in a video by NordEka (to be made available on YouTube).

was selected to be portrayed at the “Soirée partenaires de l'université de Lille”, in November.

was a member of the organization committee of the celebrations of Inria's 50th anniversary in Lille.

co-organized a meet-up on big data and machine learning at Inria.

M. Valko,
*Comment maximiser la détection des influenceurs sur les réseaux sociaux ?*, popularization talk, Presented on May 30th, 2017 at 13 France
(*Inria 13:45 2017*)