SequeL means “Sequential Learning”. As such, SequeL focuses on the task of learning in artificial systems (either hardware or software) that gather information over time. Such systems are called *(learning) agents* (or learning machines) in the following.
These data may be used to estimate some parameters of a model, which in turn, may be used for selecting actions in order to perform some long-term optimization task.

For the purpose of model building, the agent needs to represent information collected so far in some compact form and use it to process newly available data.

The acquired data may result from an observation process of an agent in interaction with its environment (the data thus represent a perception). This is the case when the agent makes decisions (in order to attain a certain objective) that impact the environment, and thus the observation process itself.

Hence, in SequeL, the term **sequential** refers to two aspects:

the **sequential acquisition of data**, from which a model is learned (supervised and unsupervised learning),

the **sequential decision making task**, based on the learned model (reinforcement learning).

Examples of sequential learning problems include:

supervised learning tasks deal with the prediction of some response given a certain set of observations of input variables and responses. New sample points keep being observed.

unsupervised learning tasks deal with clustering objects that arrive as a stream. The (unknown) number of clusters typically evolves over time, as new objects are observed.

reinforcement learning tasks deal with the control (a policy) of some system which has to be optimized. We do not assume the availability of a model of the system to be controlled.

In all these cases, we mostly assume that the process can be considered stationary for at least a certain amount of time, and slowly evolving.

We wish to have anytime algorithms: at any moment, a prediction may be required or an action may have to be selected, making full use, and hopefully the best use, of the experience already gathered by the learning agent.

The perception of the environment by the learning agent (using its sensors) is generally not the best representation for making a prediction or taking a decision (we deal with Partially Observable Markov Decision Problems). The perception therefore has to be mapped in some way to a better, more relevant state (or input) space.

Finally, an important issue of prediction concerns its evaluation: how wrong may we be when we make a prediction? For real systems to be controlled, this issue cannot simply be left unanswered.

To sum up, in SequeL, the main issues concern:

the learning of a model: we focus on models that map some
input space

the observation to state mapping,

the choice of the action to perform (in the case of sequential decision problem),

the performance guarantees,

the implementation of usable algorithms,

all that being understood in a *sequential* framework.

SequeL is primarily grounded on two domains:

the problem of decision under uncertainty,

statistical analysis and statistical learning, which provide the general concepts and tools to solve this problem.

To help the reader who is unfamiliar with these questions, we briefly present key ideas below.

The phrase “Decision under uncertainty” refers to the problem of taking decisions when we have full knowledge neither of the situation nor of the consequences of the decisions, and when the consequences of decisions are non-deterministic.

We introduce two specific sub-domains, namely Markov decision processes, which model sequential decision problems, and bandit problems.

Sequential decision processes are at the heart of the SequeL project; a detailed presentation of this problem may be found in Puterman's book.

A Markov Decision Process (MDP) is defined as the tuple

In the MDP (

The history of the process up to time

We move from an MD process to an MD problem by formulating the goal of the agent, that is what the sought policy

where

In order to maximize a given functional in a sequential framework, one usually applies Dynamic Programming (DP) , which introduces the optimal value function

We say that a policy *i.e.*, if

We say that a (deterministic stationary) policy

where

The goal of Reinforcement Learning (RL), as well as that of dynamic programming, is to design an optimal policy (or a good approximation of it).

The well-known Dynamic Programming equation (also called the Bellman equation) provides a relation between the optimal value function at a state

The benefit of introducing this concept of optimal value function relies on the property that, from the optimal value function
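The link between an optimal value function and a greedy policy can be illustrated on a toy problem. The sketch below is only an illustration under assumptions that are not stated in the text above: a finite MDP, a discounted criterion with factor `gamma`, and a known model given as arrays `P` and `R` (hypothetical names; the learning setting discussed here does not assume such a model is available). It repeatedly applies the Bellman optimality operator (value iteration) and then reads off a greedy policy.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration on a finite MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under a.
    R: array of shape (S, A), expected immediate reward for action a in state s.
    Returns an approximation of the optimal value function and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Toy 2-state example: action 0 stays, action 1 switches state;
# staying in state 1 is the only rewarded move.
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
R_toy = np.array([[0.0, 0.0],
                  [1.0, 0.0]])
V_toy, policy_toy = value_iteration(P_toy, R_toy, gamma=0.9)
```

In this toy case the greedy policy switches out of state 0 and then stays in state 1, which matches the closed-form values 9 and 10 obtained from the geometric series with `gamma = 0.9`.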

In short, most of the reinforcement learning methods developed so far are built on one (or both) of the following two approaches:

Bellman's dynamic programming approach, based on the introduction of the value function. It consists in learning a “good” approximation of the optimal value function, and then using it to derive a greedy policy w.r.t. this approximation. The hope (well justified in several cases) is that the performance **Approximate dynamic programming** addresses the problem of estimating performance bounds (*e.g.* the loss in performance

Pontryagin's maximum principle approach, based on sensitivity analysis of the performance measure w.r.t. some control parameters. This approach, also called **direct policy search** in the Reinforcement Learning community, aims at directly finding a good feedback control law in a parameterized policy space without trying to approximate the value function. The method consists in estimating the so-called **policy gradient**, *i.e.* the sensitivity of the performance measure (the value function) w.r.t. some parameters of the current policy. The idea is that an optimal control problem is replaced by a parametric optimization problem in the space of parameterized policies. As such, a policy gradient estimate can be plugged into a stochastic gradient method in order to search for a locally optimal parametric policy.
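The policy gradient idea can be sketched in its simplest form. The example below makes several assumptions that are not in the text: a one-state problem with fixed rewards per action, a softmax policy over parameters `theta`, a running-average baseline, and hypothetical function names. It uses the likelihood-ratio estimate, reward times the gradient of the log-policy, inside a stochastic gradient ascent loop.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()


def grad_log_policy(theta, action):
    """Gradient of log pi_theta(action) for a softmax policy: e_action - pi_theta."""
    g = -softmax(theta)
    g[action] += 1.0
    return g


def expected_return(theta, rewards):
    """Performance measure J(theta) = sum_a pi_theta(a) * rewards[a]."""
    return float(softmax(theta) @ rewards)


def policy_gradient_ascent(rewards, steps=2000, lr=0.1, seed=0):
    """Stochastic gradient ascent on J(theta) using sampled actions only."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(rewards))
    baseline = 0.0  # running average of rewards, used to reduce variance
    for t in range(1, steps + 1):
        action = rng.choice(len(rewards), p=softmax(theta))
        reward = rewards[action]
        # Likelihood-ratio (score function) estimate of the policy gradient.
        theta += lr * (reward - baseline) * grad_log_policy(theta, action)
        baseline += (reward - baseline) / t
    return theta


rewards = np.array([0.2, 0.5, 0.9])
theta_final = policy_gradient_ascent(rewards)
```

Averaged over the actions drawn from the policy, the estimate `reward * grad_log_policy(theta, action)` equals the exact gradient of `expected_return`, which is what makes the stochastic gradient method applicable. The search is local, as the text notes.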

Finally, many extensions of Markov decision processes exist, among which Partially Observable MDPs (POMDPs) cover the case where the current state does not contain all the information required to decide with certainty on the best action.

Bandit problems illustrate the fundamental difficulty of decision making in the face of uncertainty: a decision maker must choose between exploiting what seems to be the best choice (“exploit”) and testing some alternative (“explore”), hoping to discover a choice that beats the current best choice.

The classical example of a bandit problem is deciding what treatment to give each patient in a clinical trial when the effectiveness of the treatments is initially unknown and the patients arrive sequentially. These bandit problems became popular with the seminal paper, after which they found applications in diverse fields, such as control, economics, statistics, or learning theory.

Formally, a K-armed bandit problem (*i.e.*, when the arm giving the highest expected reward is pulled all the time.

The name “bandit” comes from imagining a gambler playing with K slot machines. The gambler can pull the arm of any of the machines, which produces a random payoff as a result: when arm k is pulled, the random payoff is drawn from the distribution associated with k. Since the payoff distributions are initially unknown, the gambler must use exploratory actions to learn the utility of the individual arms. However, exploration has to be carefully controlled since excessive exploration may lead to unnecessary losses. Hence, to play well, the gambler must carefully balance exploration and exploitation. Auer *et al.* introduced the algorithm UCB (Upper Confidence Bounds) that follows what is now called the “optimism in the face of uncertainty” principle. Their algorithm works by computing upper confidence bounds for all the arms and then choosing the arm with the highest such bound. They proved that the expected regret of their algorithm increases at most at a logarithmic rate with the number of trials, and that the algorithm achieves the smallest possible regret up to some sub-logarithmic factor (for the considered family of distributions).
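The optimistic index rule described above can be sketched as follows. This is a minimal illustration, assuming Bernoulli arms with rewards in [0, 1] and the commonly used UCB1 index (empirical mean plus `sqrt(2 ln t / n)`); the function name, the arm means and the horizon are hypothetical choices made for the example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run a UCB1-style strategy on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms    # number of pulls of each arm
    totals = [0.0] * n_arms  # sum of rewards observed for each arm

    def index(i, t):
        # Empirical mean plus an exploration bonus that shrinks with the pull count.
        return totals[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: pull every arm once
        else:
            arm = max(range(n_arms), key=lambda i: index(i, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


pull_counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Arms that have been pulled rarely keep a large bonus and are therefore revisited from time to time, while the number of pulls of clearly suboptimal arms grows only slowly with the horizon.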

Many of the problems of machine learning can be seen as extensions of classical problems of mathematical statistics to their (extremely) non-parametric and model-free cases. Other machine learning problems are founded on such statistical problems. Statistical problems of sequential learning are mainly those that are concerned with the analysis of time series. These problems are as follows.

Given a series of observations

Alternatively, rather than making some assumptions on the data, one can change the goal: the predicted probabilities should be asymptotically as good as those given by the best reference predictor from a certain pre-defined set.

Another dimension of complexity in this problem concerns the nature of observations

Given a series of observations of

The problem of hypothesis testing can also be studied in its general formulations: given two (abstract) hypotheses

A stochastic process is generating the data. At some point, the process distribution changes. In the “offline” situation, the statistician observes the resulting sequence of outcomes and has to estimate the point or the points at which the change(s) occurred. In the online setting, the goal is to detect the change as quickly as possible.

These are the classical problems in mathematical statistics, and probably among the last remaining statistical problems not adequately addressed by machine learning methods. The reason for the latter is perhaps that the problem is rather challenging. Thus, most methods available so far are parametric methods concerning piecewise constant distributions, and the change in distribution is associated with a change in the mean. However, many applications, including DNA analysis, the analysis of (user) behaviour data, etc., fail to comply with this kind of assumption. Thus, our goal here is to provide completely non-parametric methods allowing for any kind of change in the time-series distribution.
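To make the parametric baseline mentioned above concrete, here is a minimal sketch of an offline estimator for a single change in the mean. It illustrates the restrictive piecewise-constant assumption only; it is not one of the non-parametric methods targeted here, and the function name is hypothetical.

```python
import numpy as np


def estimate_mean_change_point(x):
    """Least-squares estimate of a single change in the mean.

    Returns the index k (1 <= k < len(x)) such that fitting one constant
    to x[:k] and another to x[k:] gives the smallest squared error.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = 1, np.inf
    for k in range(1, n):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


series = [0.0] * 5 + [3.0] * 7
change_index = estimate_mean_change_point(series)
```

Such an estimator is blind to changes that leave the mean untouched (for instance a change in variance or in the dependence structure), which is the limitation the paragraph above points out.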

The problem of clustering, while being a classical problem of mathematical statistics, belongs to the realm of unsupervised learning. For time series, this problem can be formulated as follows: given several samples

The online version of the problem allows for the number of observed time series to grow with time, in general, in an arbitrary manner.

Semi-supervised learning (SSL) is a field of machine learning that studies learning from both labeled and unlabeled examples. This learning paradigm is extremely useful for solving real-world problems, where data are often abundant but the resources to label them are limited.

Furthermore, *online* SSL is suitable for adaptive machine learning systems. In the classification case, learning is viewed as a repeated game against a potentially adversarial nature. At each step

The challenge of the game is that we only exceptionally observe the true label

Recommendation systems have been a major field of applications of our research for a few years now. Recommendation systems should be understood in a broad sense, as systems that aim at providing personalized responses/items to users, based on their characteristics, and the environment in which the interaction happens.

In that broad sense, we have collaborated with companies on computational advertising and recommendation systems. These collaborations have involved research studies on the following issues:

cold-start problem,

time varying environment,

ability to deal with large amounts of users and items,

ability to design algorithms that respond within a reasonable amount of time, usually below 1 millisecond.

We have also competed in challenges, winning some of them

A company has been awarded an innovation award in 2015, thanks to the research work done in collaboration with SequeL (*cf.* sec. ).

In these works, we develop an original

We also started a new work aiming to introduce deep learning in recommender systems. An engineer (Florian Strub) was recruited to work on this topic and presented some results at the NIPS'2015 workshop on “Machine Learning for (e-)Commerce”. Moreover we released some code to handle sparse data with the Torch7 framework and GPUs https://

A Spoken Dialogue System (SDS) is a system enabling people to interact with machines through speech. In contrast with command-and-control systems or question-answering systems that react to a single utterance, SDS build a real interaction over time and try to achieve complex tasks (like hotel booking, appointment scheduling, etc.) by gathering pieces of information through several turns of dialogue. To do so, besides the required speech and language processing modules (*e.g.* speech recognition and synthesis, language understanding and generation), there is a need for a dialogue management module that decides what to say in any situation so as to achieve the goal in the most natural and efficient way, recovering from speech processing errors in a seamless manner.

The dialogue management module is thus taking sequences of decisions to achieve a long-term goal in an unknown, noisy and hard to model environment (since it includes human users). For this reason, we work on machine learning techniques such as reinforcement and imitation learning to optimize this specific sequential decision making under uncertainty problem.

In addition to bringing novel and efficient solutions to this problem, we are interested in the new challenges brought to our research in machine learning by this type of application. Indeed, having the human in the learning loop typically requires dealing with non-stationarity, data-efficiency, safety as well as cooperation and imitation.

We collaborate with companies such as Orange Labs on this topic and several projects are ongoing (ANR MaRDi, CHIST-ERA IGLU). We will also be participating in an H2020 project on human-robot interaction starting in 2016 (BabyRobot). We organised a workshop at ICML this year: Machine Learning for Interactive Systems (MLIS). Olivier Pietquin was invited as a panelist at the NIPS Workshop on spoken language understanding and dialogue.

Reinforcement learning leads to the design of systems that adapt their behavior to their environment, hence adaptive systems. We have worked on various applications of this idea, beyond the two main application domains mentioned above (recommendation systems and spoken dialogue systems). Let us briefly mention: educational tutoring systems; adaptive heating systems in buildings; players that adapt their strength to that of their human opponent; bioreactors.

Since the goal of our research is to design systems that learn to act in an optimal way in their environment, prediction is a major issue. Hence, we carry out research on this particular task, without always being in direct connection with learning a policy.

We have done some research on predicting web-server load in a non-stationary environment. We also work on predicting bugs in software code.

organization of the 32nd International Conference on Machine Learning (ICML), in Lille, from Jul 6th to Jul 11th, 2015.

ICML is the leading international conference in Machine Learning. This is the first time in its history that ICML has been hosted in France. This edition was the largest ever, with 1690 registrants (the previous record was 1400 in Beijing, in 2014).

as an outcome of a contract with this start-up, Nuukik has been awarded “best data analysis” during the “connected commerce night” - http://

V. Gabillon and B. Piot both received an AFIA award for their respective PhD, defended in 2014. They were both ranked second in this competition.

Olivier Pietquin, Fellow of the “Institut Universitaire de France”.

A. Lazaric and M. Valko received best reviewer awards at ICML 2015.

This is a black-box function optimization toolkit that finds the global optimum of a function given a finite budget of noisy evaluations. The algorithm does not require the knowledge of the function's smoothness. It works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a precise sense.

*Nonparametric multiple change point estimation in highly dependent time series*

Given a heterogeneous time-series sample, the objective is to find points in time, called change points, where the probability distribution generating the data has changed. The data are assumed to have been generated by arbitrary unknown stationary ergodic distributions. No modelling, independence or mixing assumptions are made. A novel, computationally efficient, nonparametric method is proposed, and is shown to be asymptotically consistent in this general framework. The theoretical results are complemented with experimental evaluations.

*Explore no more: Improved high-probability regret bounds for non-stochastic bandits*

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a large deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(

*First-order regret bounds for combinatorial semi-bandits*

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as O(

*Random-Walk Perturbations for Online Combinatorial Optimization*

We study online combinatorial optimization problems where a learner is interested in minimizing its cumulative regret in the presence of switching costs. To solve such problems, we propose a version of the follow-the-perturbed-leader algorithm in which the cumulative losses are perturbed by independent symmetric random walks. In the general setting, our forecaster is shown to enjoy near-optimal guarantees on both quantities of interest, making it the best known efficient algorithm for the studied problem. In the special case of prediction with expert advice, we show that the forecaster achieves an expected regret of the optimal order O(

*Qualitative Multi-Armed Bandits: A Quantile-Based Approach*

We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.

*Predicting the outcomes of every process for which an asymptotically accurate stationary predictor exists is impossible*

The problem of prediction consists in forecasting the conditional distribution of the next outcome given the past. Assume that the source generating the data is such that there is a stationary predictor whose error converges to zero (in a certain sense). The question is whether there is a universal predictor for all such sources, that is, a predictor whose error goes to zero if any of the sources that have this property is chosen to generate the data. This question is answered in the negative, contrasting a number of previously established positive results concerning related but smaller sets of processes.

*Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning*

We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of

*Maximum Entropy Semi-Supervised Inverse Reinforcement Learning*

A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and, unlike its predecessors, it resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which, in addition to the expert's trajectories, a number of unsupervised trajectories is available. We introduce MESSI, a novel algorithm that combines MaxEnt-IRL with principles coming from semi-supervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results in highway driving and grid-world problems indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.

*Direct Policy Iteration with Demonstrations*

We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to perform a seamless integration of interaction and expert data, thus obtaining an algorithm which can benefit from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the performance across iterations, providing insights on how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with the state-of-the-art algorithms.

*Approximate Modified Policy Iteration and its Application to the Game of Tetris*

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms in the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin, and competes with the current state-of-the-art methods while using fewer samples.

*Simple regret for infinitely many armed bandits*

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter

*Black-box optimization of noisy functions with unknown smoothness*

We study the problem of black-box optimization of a function

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of actions to observe represents some (geographical) area. The importance of this setting is that in these applications, it is actually cheaper to observe the average reward of a group of actions rather than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost gain over them. As a by-product of our analysis, we establish an Ω(√(dT)) lower bound on the cumulative regret of spectral bandits for a class of graphs with effective dimension d.

*Truthful Learning Mechanisms for Multi–Slot Sponsored Search Auctions with Externalities*

Sponsored Search Auctions (SSAs) constitute one of the most successful applications of microeconomic mechanisms. In mechanism design, auctions are usually designed to incentivize advertisers to bid their truthful valuations and, at the same time, to guarantee both the advertisers and the auctioneer a non–negative utility. Nonetheless, in sponsored search auctions, the Click–Through–Rates (CTRs) of the advertisers are often unknown to the auctioneer and thus standard truthful mechanisms cannot be directly applied and must be paired with an effective learning algorithm for the estimation of the CTRs. This introduces the critical problem of designing a learning mechanism able to estimate the CTRs at the same time as implementing a truthful mechanism with a revenue loss as small as possible compared to the mechanism that can exploit the true CTRs. Previous work showed that, when dominant–strategy truthfulness is adopted, in single–slot auctions the problem can be solved using suitable exploration–exploitation mechanisms able to achieve a cumulative regret (on the auctioneer's revenue) of order

*A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits*

We study the K-armed dueling bandit problem, which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(√(K ln(K) T)) for this algorithm and a general lower bound of order Ω(√(KT)). Finally, we provide experimental results using real data from information retrieval applications.
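For readers unfamiliar with EXP3, the base algorithm that REX3 extends, here is a minimal sketch. It is plain EXP3 with absolute rewards in [0, 1], not REX3 and not the relative (dueling) feedback studied in the paper; the function name, the reward callback and the value of the mixing parameter `gamma` are hypothetical choices for the example.

```python
import math
import random


def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """EXP3 sketch: exponential weights with importance-weighted reward estimates.

    reward_fn(t, arm) must return a reward in [0, 1]; only the reward of the
    pulled arm is ever observed. Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Dividing by the pull probability keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so that the largest weight is 1, which avoids overflow.
        largest = max(weights)
        weights = [w / largest for w in weights]
        pulls[arm] += 1
    return pulls


exp3_pulls = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, horizon=2000)
```

The rescaling step does not change the sampling distribution, since only weight ratios matter.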

*Simultaneous Optimistic Optimization on the Noiseless BBOB Testbed*

We experiment with the SOO (Simultaneous Optimistic Optimization) global optimizer on the BBOB testbed. We report results for both the unconstrained-budget setting and the expensive setting, as well as a comparison with the DiRect algorithm, to which SOO is most closely related. Overall, SOO is shown to perform rather poorly in the highest dimensions while exhibiting interesting performance for the most difficult functions, which is to be attributed to its global nature and to the fact that its design was guided by the goal of obtaining theoretically provable performance. The greedy exploration-exploitation sampling strategy underlying the SOO design is also shown to be a viable alternative for the expensive setting, which leaves room for further improvements in this direction.

*Bandits and Recommender Systems*

This paper addresses the on-line recommendation problem facing new users and new items; we assume that no information is available about either the users or the items. The only source of information is a set of ratings given by users to some items. By on-line, we mean that the set of users, the set of items, and the set of ratings evolve over time and that, at any moment, the recommendation system has to select items to recommend based on the currently available information, that is, basically the sequence of past events. We also mean that each user comes with her preferences, which may evolve over short and longer time scales, so we have to continuously update their preferences. When the set of ratings is the only available source of information, the traditional approach is matrix factorization. In a decision making under uncertainty setting, actions should be selected to balance exploration with exploitation; this is best modeled as a bandit problem. Matrix factors provide a latent representation of users and items. These representations may then be used as contextual information by the bandit algorithm to select items. This last point is exactly the originality of this paper: the combination of matrix factorization and bandit algorithms to solve the on-line recommendation problem. Our work is driven by considering the recommendation problem as a feedback controlled loop. This leads to interactions between the representation learning and the recommendation policy.

*Collaborative Filtering as a Multi-Armed Bandit*

Recommender Systems (RS) aim at suggesting to users one or several items in which they might have interest. Following the feedback they receive from the user, these systems have to adapt their model in order to improve future recommendations. The repetition of these steps defines the RS as a sequential process. This sequential aspect raises an exploration-exploitation dilemma, which is surprisingly rarely taken into account for RS without contextual information. In this paper we present an explore-exploit collaborative filtering RS, based on Matrix Factorization and Bandit algorithms. Using experiments on artificial and real datasets, we show the importance and practicability of using sequential approaches to perform recommendation. We also study the impact of the model update on both the quality and the computation time of the recommendation procedure.

*AUC Optimisation and Collaborative Filtering*

In recommendation systems, one is interested in the ranking of the predicted items as opposed to other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC) as it is widely used and has a strong theoretical underpinning. In practical recommendation, only items at the top of the ranked list are presented to the users. With this in mind, we propose a class of objective functions over matrix factorisations which primarily represent a smooth surrogate for the real AUC, and in a special case we show how to prioritise the top of the list. The objectives are differentiable and optimised through a carefully designed stochastic gradient-descent-based algorithm which scales linearly with the size of the data. In the special case of square loss we show how to improve computational complexity by leveraging previously computed measures. To understand theoretically the underlying matrix factorisation approaches we study both the consistency of the loss functions with respect to AUC, and generalisation using Rademacher theory. The resulting generalisation analysis gives strong motivation for the optimisation under study. Finally, we provide computation results as to the efficacy of the proposed method using synthetic and real data.
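As a reminder of what the AUC measures for a ranking, the sketch below computes it as the fraction of (relevant, irrelevant) item pairs that are ordered correctly, and also shows one possible smooth surrogate. The sigmoid surrogate and the `scale` parameter are assumptions chosen for illustration, not the objective functions of the paper, and the function names are hypothetical.

```python
import math


def split_scores(scores, labels):
    """Separate predicted scores of relevant (label 1) and irrelevant (label 0) items."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    return pos, neg


def auc(scores, labels):
    """Fraction of (relevant, irrelevant) pairs ranked correctly; ties count 1/2."""
    pos, neg = split_scores(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smooth_auc(scores, labels, scale=1.0):
    """Differentiable surrogate: the 0/1 pair indicator is replaced by a sigmoid."""
    pos, neg = split_scores(scores, labels)
    total = sum(1.0 / (1.0 + math.exp(-scale * (p - n))) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)  # 3 of 4 pairs are ordered correctly
```

The exact AUC is piecewise constant in the scores, which is why a smooth surrogate is needed before gradient-based optimisation can be applied.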

*Collaborative Filtering with Localised Ranking*

In recommendation systems, one is interested in the ranking of the predicted items as opposed to other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC) as it is widely used and has a strong theoretical underpinning. In practical recommendation, only items at the top of the ranked list are presented to the users. With this in mind, we propose a class of objective functions which primarily represent a smooth surrogate for the real AUC, and in a special case we show how to prioritise the top of the list. This loss is differentiable and is optimised through a carefully designed stochastic gradient-descent-based algorithm which scales linearly with the size of the data. We mitigate sample bias present in the data by sampling observations according to a certain power-law based distribution. In addition, we provide computational results on the efficacy of the proposed method using synthetic and real data.

*Collaborative Filtering with Stacked Denoising AutoEncoders and Sparse Inputs*

Neural networks have not been widely studied in Collaborative Filtering. For instance, no paper using neural networks was published during the Netflix Prize apart from Salakhutdinov et al.'s work on Restricted Boltzmann Machines (RBM) [14]. While deep learning has had tremendous success in image and speech recognition, sparse inputs have received less attention and remain a challenging problem for neural networks. Nonetheless, sparse inputs are critical for collaborative filtering. In this paper, we introduce a neural network architecture which computes a non-linear matrix factorization from sparse rating inputs. We show experimentally on the MovieLens and Jester datasets that our method performs as well as the best collaborative filtering algorithms. We provide an implementation of the algorithm as a reusable plugin for Torch [4], a popular neural network framework.

*The Replacement Bootstrap for Dependent Data*

Applications that deal with time-series data often require evaluating complex statistics for which each time series is essentially one data point. When only a few time series are available, bootstrap methods are used to generate additional samples that can be used to evaluate empirically the statistic of interest. In this work a novel bootstrap method is proposed, which is shown to have some asymptotic consistency guarantees under the only assumption that the time series are stationary and ergodic. This contrasts with previously available results, which impose mixing or finite-memory assumptions on the data. An empirical evaluation on simulated and real data, using a practically relevant and complex extrema statistic, is provided.

*Inverse Reinforcement Learning in Relational Domains*

In this work, we introduce the first approach to the Inverse Reinforcement Learning (IRL) problem in relational domains. IRL has been used to recover a more compact representation of the expert policy, leading to better generalization performance across different contexts. On the other hand, relational learning allows representing problems with a varying number of objects (potentially infinite), and thus provides more generalizable representations of problems and skills. We show how these different formalisms allow one to create a new IRL algorithm for relational domains that can recover with great efficiency rewards from expert data that have strong generalization and transfer properties. We evaluate our algorithm in representative tasks and study the impact of diverse experimental conditions such as the number of demonstrations, knowledge about the dynamics, transfer among varying dimensions of a problem, and changing dynamics.

*Imitation Learning Applied to Embodied Conversational Agents*

Embodied Conversational Agents (ECAs) are emerging as a key component to allow humans to interact with machines. Applications are numerous and ECAs can reduce the aversion to interacting with a machine by providing user-friendly interfaces. Yet, ECAs are still unable to produce social signals appropriately during their interaction with humans, which tends to make the interaction less instinctive. In particular, very little attention has been paid to the use of laughter in human-avatar interactions despite the crucial role played by laughter in human-human interaction. In this paper, methods for predicting when and how to laugh during an interaction for an ECA are proposed. Different Imitation Learning (also known as Apprenticeship Learning) algorithms are used for this purpose and a regularized classification algorithm is shown to produce good behavior on real data.

Active learning is the problem of interactively constructing the training set used in classification in order to reduce its size. Ideally, it would successively add the instance-label pair that decreases the classification error most. However, the effect of adding a pair is not known in advance. It can still be estimated with the pairs already in the training set. The online minimization of the classification error involves a trade-off between exploration and exploitation. This is a common problem in machine learning for which multi-armed bandit theory, using the approach of Optimism in the Face of Uncertainty, has proven very efficient in recent years. This paper introduces three algorithms for the active learning problem in classification using Optimism in the Face of Uncertainty. Experiments conducted on built-in problems and real-world datasets demonstrate that they compare favorably to state-of-the-art methods.
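As a reminder of what Optimism in the Face of Uncertainty means in its simplest bandit form, here is a minimal sketch of the classical UCB1 strategy on Bernoulli arms. It is not one of the active learning algorithms introduced in the paper: the function name `ucb1`, the simulated arm means and the fixed random seed are all assumptions made for this illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return the pull counts.

    arm_means: success probability of each (hypothetical) arm.
    horizon:   total number of pulls.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # cumulated reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Optimism: pick the arm with the largest upper confidence bound,
            # i.e. empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000))
```

The exploration bonus makes rarely tried options look attractive until enough evidence has been gathered, which is the same mechanism the abstract refers to for choosing which instance to label next.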

*Bayesian Credible Intervals for Online and Active Learning of Classification Trees*

Classification trees have been extensively studied for decades. In the online learning scenario, a whole class of algorithms for decision trees has been introduced, called incremental decision trees. In the case where subtrees may not be discarded, an incremental decision tree can be seen as a sequential decision process, which consists in deciding whether or not to extend the existing tree. This problem involves a trade-off between exploration and exploitation, which is addressed in recent work with the use of Hoeffding's bounds. This paper proposes to use Bayesian Credible Intervals instead, in order to get the most out of the knowledge of the shape of the output's distribution. It also studies the case of Active Learning in such a tree following the Optimism in the Face of Uncertainty paradigm. Two novel algorithms are introduced for the online and active learning problems. Evaluations on real-world datasets show that these algorithms compare favorably to the state of the art.
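For readers unfamiliar with the Hoeffding-bound baseline mentioned above, the sketch below shows the usual form of the test: after n observations of a quantity with range R, the empirical mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta, and a leaf is extended only when the best candidate split beats the runner-up by more than epsilon. The function names and the numerical values are hypothetical, and the sketch does not reproduce the Bayesian credible-interval criterion proposed in the paper.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Half-width of the Hoeffding confidence interval.

    value_range: range R of the observed quantity (e.g. 1.0 for a gain in [0, 1]).
    delta:       allowed probability of error.
    n:           number of observations seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Extend the tree only if the best split is reliably better than the second."""
    return (best_gain - second_gain) > hoeffding_epsilon(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical split scores observed at a leaf after 100 examples.
    print(hoeffding_epsilon(1.0, 0.05, 100))
    print(should_split(0.30, 0.10, 1.0, 0.05, 100))
    print(should_split(0.30, 0.25, 1.0, 0.05, 100))
```

Because the bound only uses the range of the quantity, it ignores the shape of its distribution, which is the limitation that motivates using credible intervals instead.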

*Optimism in Active Learning with Gaussian Processes*

In the context of Active Learning for classification, the classification error depends on the joint distribution of samples and their labels, which is initially unknown. The minimization of this error requires estimating this distribution. Online estimation of this distribution involves a trade-off between exploration and exploitation. This is a common problem in machine learning for which multi-armed bandit theory, building upon Optimism in the Face of Uncertainty, has proven very efficient in recent years. We introduce two novel algorithms that use Optimism in the Face of Uncertainty along with Gaussian Processes for the Active Learning problem. The evaluation conducted on real-world datasets shows that these new algorithms compare favorably to state-of-the-art methods.

*Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games*

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is 2

*Non-negative Spectral Learning for Linear Sequential Systems*

The method of moments (MoM) has recently become an appealing alternative to standard iterative approaches like Expectation Maximization (EM) to learn latent variable models. In addition, MoM-based algorithms come with global convergence guarantees in the form of finite sample bounds. However, given enough computation time, by using restarts and heuristics to avoid local optima, iterative approaches often achieve better performance. We believe that this performance gap is in part due to the fact that MoM-based algorithms can output negative probabilities. By constraining the search space, we propose a non-negative spectral algorithm (NNSpectral) that avoids computing negative probabilities by design. NNSpectral is compared to other MoM-based algorithms and EM on synthetic problems of the PAutomaC challenge. Not only does NNSpectral outperform other MoM-based algorithms, it also achieves very competitive results in comparison to EM.

*Learning of scanning strategies for electronic support using predictive state representations*

In Electronic Support, a receiver must monitor a wide frequency spectrum in which threatening emitters operate. A common approach is to use sensors with high sensitivity but a narrow bandwidth. To maintain surveillance over the whole spectrum, the sensor has to sweep between frequency bands, which requires a scanning strategy. Search strategies are usually designed prior to the mission using an approximate knowledge of illumination patterns. This often results in open-loop policies that cannot take advantage of previous observations. As pointed out in past research, these strategies lack robustness to the prior. We propose a new closed-loop search strategy that learns a stochastic model of each radar using predictive state representations. The learning algorithm benefits from the recent advances in spectral learning and rank minimization using nuclear norm penalization.

*Spectral learning with proper probabilities for finite state automaton*

Probabilistic Finite Automata (PFA), Probabilistic Finite State Transducers (PFST) and Hidden Markov Models (HMM) are widely used in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) systems and Part Of Speech (POS) tagging for language modeling. Traditionally, unsupervised learning of these latent variable models is done by Expectation-Maximization (EM)-like algorithms, such as the Baum-Welch algorithm. In a recent alternative line of work, learning algorithms based on spectral properties of some low-order moment matrices or tensors were proposed. In comparison to EM, they are orders of magnitude faster and come with theoretical convergence guarantees. However, the returned models are not guaranteed to compute proper distributions. They often return negative values that do not sum to one, which limits their applicability and prevents them from serving as an initialization to EM-like algorithms. In this paper, we propose a new spectral algorithm able to learn a large range of models constrained to return proper distributions. We assess its performance on synthetic problems from the PAutomaC challenge and real datasets extracted from Wikipedia. Experiments show that it outperforms previous spectral approaches as well as the Baum-Welch algorithm with random restarts, in addition to serving as an efficient initialization step to EM-like algorithms.

*Operator-valued Kernels for Learning from Functional Response Data*

In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: a data point is represented by a set of functions, and the label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the estimation of function-valued functions. In this setting, the representer theorem is restated, a set of rigorously defined infinite-dimensional operator-valued kernels that can be valuably applied when the data are functions is described, and a learning algorithm for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments.

*An Experimental Protocol for Analyzing the Accuracy of Software Error Impact Analysis*

In software engineering, error impact analysis consists in predicting the software elements (e.g. modules, classes, methods) potentially impacted by a change. Impact analysis is required to optimize the testing effort. In this paper we present a new protocol to analyze the accuracy of impact analysis. This protocol uses mutation testing to simulate changes that introduce errors. To this end, we introduce a variant of call graphs that we name the “use graph” of a software, which may be computed efficiently. We apply this protocol to two open-source projects and correctly predict the impact of 30

*A Learning Algorithm for Change Impact Prediction: Experimentation on 7 Java Applications*

Change impact analysis consists in predicting the impact of a code change in a software application. In this paper, we take a learning perspective on change impact analysis and consider the problem formulated as follows. The artifacts that are considered are methods of object-oriented software; the change under study is a change in the code of the method; the impact is the set of test methods that fail because of the change that has been performed. We propose an algorithm, called LCIP, that learns from past impacts to predict future impacts. To evaluate our system, we consider 7 Java software applications totaling 214,000+ lines of code. We simulate 17574 changes and their actual impact through code mutations, as done in mutation testing. We find that LCIP can predict the impact with a precision of 69

*Human-Machine Dialogue as a Stochastic Game*

In this paper, an original framework to model human-machine spoken dialogues is proposed to deal with co-adaptation between users and Spoken Dialogue Systems in non-cooperative tasks. The conversation is modeled as a Stochastic Game: both the user and the system have their own preferences but have to come up with an agreement to solve a non-cooperative task. They are jointly trained so the Dialogue Manager learns the optimal strategy against the best possible user. Results obtained by simulation show that non-trivial strategies are learned and that this framework is suitable for dialogue modeling.

Jérémie Mary got a contract with Nuukik on the use of seasonality to improve recommender systems for e-commerce. This work won the “Best data analysis” prize at “La nuit du commerce connecté” - http://

Romain Warlop obtained a CIFRE grant with the start-up Fifty-Five and started his PhD in July under the supervision of Alessandro Lazaric, Jérémie Mary and Philippe Preux. The PhD is on the use of tensor and bandit techniques for recommender systems, with a special focus on the cold start problem and the non-stationarity of the environment.

Nicolas Carrara obtained a CIFRE grant with Orange Labs and started his PhD in October under the supervision of Olivier Pietquin. The PhD topic is transfer learning for fast adaptation of spoken dialogue systems.

*Title*: Sniper, Guerrilla, Shark, Razor et les autres

*Type*: PICTANOVO

*Coordinator*: Association P.A.S. (Emmanuelle Grangier)

*Duration*: 2015

*Abstract*:

*“Sniper, Guerrilla, Shark et les autres”* is an interactive physical setting as well as a choreographic performance for four dancers/performers and two types of robots behaving as a swarm (some of them flying, others being on the floor). The context is high frequency trading, from which emerges a world where human performers and non-humanoid robots live together. Their behaviours depend on the same basic rules working at a non-temporal scale and a macro-temporal scale of share price fluctuation.

*Title*: Extraction and Transfer of Knowledge in Reinforcement Learning

*Type*: National Research Agency (ANR-9011)

*Coordinator*: Inria Lille (A. Lazaric)

*Duration*: 2014-2018

*Abstract*:
ExTra-Learn is directly motivated by the evidence that one of the key features that allows humans to accomplish complicated tasks is their ability to build knowledge from past experience and transfer it while learning new tasks. We believe that integrating transfer of learning in machine learning algorithms will dramatically improve their learning performance and enable them to solve complex tasks. We identify in the reinforcement learning (RL) framework the most suitable candidate for this integration. RL formalizes the problem of learning an optimal control policy from the experience directly collected from an unknown environment. Nonetheless, practical limitations of current algorithms encouraged research to focus on how to integrate prior knowledge into the learning process. Although this improves the performance of RL algorithms, it dramatically reduces their autonomy. In this project we pursue a paradigm shift from designing RL algorithms incorporating prior knowledge, to methods able to incrementally discover, construct, and transfer “prior” knowledge in a fully automatic way. More specifically, three main elements of RL algorithms would significantly benefit from transfer of knowledge.
*(i)* For every new task, RL algorithms need to explore the environment for a long time, which results in slow learning in large environments. Transfer learning would enable RL algorithms to dramatically reduce the exploration of each new task by exploiting its resemblance with tasks solved in the past.
*(ii)* RL algorithms evaluate the quality of a policy by computing its state-value function. Whenever the number of states is too large, approximation is needed. Since approximation may cause instability, designing suitable approximation schemes is particularly critical. While this is currently done by a domain expert, we propose to perform this step automatically by constructing features that incrementally adapt to the tasks encountered over time. This would significantly reduce human supervision and increase the accuracy and stability of RL algorithms across different tasks.
*(iii)* In order to deal with complex environments, hierarchical RL solutions have been proposed, where state representations and policies are organized over a hierarchy of subtasks. This requires a careful definition of the hierarchy, which, if not properly constructed, may lead to very poor learning performance. The ambitious goal of transfer learning is to automatically construct a hierarchy of skills, which can be effectively reused over a wide range of similar tasks.

*Activity Report*: Research in ExTra-Learn focused on how to effectively transfer knowledge from an external expert, as in apprenticeship learning. This is an important step towards automatic transfer because it digs into the problem of how the knowledge of an expert can be integrated into the learning process. This investigation led to the publication of two papers at IJCAI'15. In 2015 a number of activities have also started. Ronan Fruit has been recruited for a PhD that started in December. The main focus of the PhD will be transfer in multi-armed bandits, in particular in non-stationary systems where the task can change multiple times. Pierre-Victor Chaumier will start a long internship on transfer in RL with a focus on applications to Atari games. Romain Warlop started in July a CIFRE PhD (co-supervised by A. Lazaric, J. Mary, and Ph. Preux) with a focus on how to use transfer learning in recommendation systems. We expect these activities to significantly advance the research in the project during 2016.

*Acronym*: KEHATH

*Title*: Advanced Quality Methods for Post-Edition of Machine Translation

*Type*: ANR

*Coordinator*: Lingua & Machina

*Duration*: 2014-2017

*Other partners*: Univ. Lille 1, Laboratoire d'Informatique de Grenoble (LIG)

*Abstract*: The translation community has seen a major change over the last five years. Thanks to progress in the training of statistical machine translation engines on corpora of existing translations, machine translation has become good enough that it is now advantageous for translators to post-edit machine outputs rather than translate from scratch. However, current enhancements of machine translation (MT) systems from human post-edition (PE) are rather basic: the post-edited output is added to the training corpus and the translation model and language model are re-trained, with no clear view of how much has been improved and how much is left to be improved. Moreover, the final PE result is the only feedback used: available technologies do not take advantage of logged sequences of post-edition actions, which inform on the cognitive processes of the post-editor.
The KEHATH project intends to address these issues in two ways. Firstly, we will optimise advanced machine learning techniques in the MT+PE loop. Our goal is to boost the impact of PE, that is, reach the same performance with less PE or better performance with the same amount of PE. In other words, we want to improve machine translation learning curves. For this purpose, active learning and reinforcement learning techniques will be proposed and evaluated. Along with this, we will have to face challenges such as MT systems heterogeneity (statistical and/or rule-based), and ML scalability so as to improve domain-specific MT.
Secondly, since quality prediction (QP) on MT outputs is crucial for translation project managers, we will implement and evaluate in real-world conditions several confidence estimation and error detection techniques previously developed at a laboratory scale. A shared concern will be to work on continuous domain-specific data flows to improve both MT and the performance of indicators for quality prediction.
The overall goal of the KEHATH project is straightforward: gain additional machine translation performance as fast as possible in each and every new industrial translation project, so that post-edition time and cost are drastically reduced. Basic research is the best way to reach this goal, for an industrial impact that is powerful and immediate.

*Acronym*: MaRDi

*Title*: Man-Robot Dialogue

*Type*: ANR

*Coordinator*: Univ. Lille 1 (Olivier Pietquin)

*Duration*: 2012-2016

*Other partners*: Laboratoire d'Informatique d'Avignon (LIA), CNRS - LAAS (Toulouse), Acapela group (Toulouse)

*Abstract*: In the MaRDi project, we study the interaction between humans and machines as a situated problem in which human users and machines share the same environment. In particular, we investigate how the physical environment of robots interacting with humans can be used to improve the performance of spoken interaction, which is known to be imperfect and sensitive to noise. To achieve this objective, we study three main problems. First, how to interactively build a multimodal representation of the current dialogue context from perception and proprioception signals. Second, how to automatically learn a strategy of interaction using methods such as reinforcement learning. Third, how to provide expressive feedback to users about how confident the machine is in its behaviour and to reflect its current state (including its physical state).

Inria Bordeaux - Sud-Ouest

B. Piot and O. Pietquin worked with T. Munzer and M. Lopes on Inverse Reinforcement Learning in Relational Domains. It led to a publication at IJCAI 2015.

CentraleSupélec

B. Piot and O. Pietquin worked with M. Geist on Inverse Reinforcement Learning in Relational Domains and on Dialogue Management. It led to a conference publication at IJCAI 2015 and a workshop publication at MLIS 2015.

Inria Nancy - Grand Est

J. Perolat, B. Piot and O. Pietquin worked with Bruno Scherrer on Stochastic Games. It led to a conference publication at ICML 2015.

CMLA - ENS Cachan.

Julien Audiffren *Collaborator*

M. Valko, A. Lazaric, and M. Ghavamzadeh work with Julien on Semi-Supervised Apprenticeship Learning. We finalized and published a max-entropy algorithm that outperforms the approach without unlabeled data.

LTCI, Institut Télécom-ParisTech, France.

Charanpal Dhanjal, Stefan Clemençon *Collaborator*

Romaric Gaudel has collaborated with Charanpal and Stefan since 2010 on topics related to *Matrix Factorization*. In the past we applied our work to sequential recommendation and to sequential clustering. This year, the collaboration has led to a publication at the AAAI'15 conference.

Program: CHIST-ERA

Project acronym: IGLU

Project title: Interactive Grounding of Language Generation

Duration: 10/2015 - 9/2018

Coordinator: Jean-Rouat (Univ. Sherbrooke)

Other partners: Univ. Lille, CRIStAL (France) - Inria, Flowers (France) - UMONS, Numédiart (Belgium) - KTH, TMH (Sweden) - Universidad de Zaragoza, I3A (Spain)

Abstract: Language is an ability that develops in young children through joint interaction with their caretakers and their physical environment. At this level, human language understanding could be described as interpreting and expressing semantic concepts (e.g. objects, actions and relations) through what can be perceived (or inferred) from the current context in the environment. Previous work in the field of artificial intelligence has failed to address the acquisition of such perceptually-grounded knowledge in virtual agents (avatars), mainly because of the lack of physical embodiment (ability to interact physically) and of dialogue and communication skills (ability to interact verbally). We believe that robotic agents are more appropriate for this task, and that interaction is such an important aspect of human language learning and understanding that pragmatic knowledge (identifying or conveying intention) must be present to complement semantic knowledge. Through a developmental approach where knowledge grows in complexity while driven by multimodal experience and language interaction with a human, we propose an agent that will incorporate models of dialogues, human emotions and intentions as part of its decision-making process. This will lead to anticipation and reaction based not only on its internal state (own goal and intention, perception of the environment), but also on the perceived state and intention of the human interactant. This will be possible through the development of advanced machine learning methods (combining developmental, deep and reinforcement learning) to handle large-scale multimodal inputs, in addition to leveraging state-of-the-art technological components involved in a language-based dialog system available within the consortium. Evaluations of learned skills and knowledge will be performed using an integrated architecture in a culinary use-case, and novel databases enabling research in grounded human language understanding will be released.

At the end of 2015, SequeL started an Inria Associate team with CWI, Amsterdam. This project is called “Universal algorithms for sequential forecasting and bandit problems” and is led by Daniil Ryabko on the SequeL side, and by Peter Grunwald on the CWI side.

Title: Educational Bandits

International Partner (Institution - Laboratory - Researcher):

Carnegie Mellon University (United States) - Department of Computer Science, Theory of computation lab - Emma Brunskill

Inria investigators: A. Lazaric, M. Valko

Start year: 2015

See also: https://

Education can transform an individual's capacity and the opportunities available to them. The proposed collaboration will build on and develop novel machine learning approaches towards enhancing (human) learning. Massive open online courses (MOOCs) are enabling many more people to access education, but mostly operate using status quo teaching methods. Even more important than access is the opportunity for online software to radically improve the efficiency, engagement and effectiveness of education. Existing intelligent tutoring systems (ITSs) have had some promising successes, but mostly rely on learning sciences research to construct hand-built strategies for automated teaching. Online systems make it possible to actively collect substantial amounts of data about how people learn, and offer a huge opportunity to substantially accelerate progress in improving education. An essential aspect of teaching is providing the right learning experience for the student, but it is often unknown a priori exactly how this should be achieved. This challenge can often be cast as an instance of decision-making under uncertainty. In particular, prior work by Brunskill and colleagues demonstrated that reinforcement learning (RL) and multi-armed bandits (MAB) can be very effective approaches to solve the problem of automated teaching. The proposed collaboration is thus intended to explore the potential interactions between the fields of online education and RL and MAB. On the one hand, we will define novel RL and MAB settings and problems in online education. On the other hand, we will investigate how solutions developed in RL and MAB could be integrated in ITSs and MOOCs and improve their effectiveness.

Montanuniversität Leoben (MUL), Austria, is an international partner of SequeL. The work in 2015 has been mostly on representation learning in reinforcement learning. The partnership involves Ronald Ortner and Peter Auer on the MUL side.

University of California Irvine (USA)

Anima Anandkumar *Collaborator*

A. Lazaric collaborates with A. Anandkumar on the use of spectral methods for reinforcement learning.

Politecnico di Milano (Italy)

Nicola Gatti *Collaborator*

A. Lazaric finalized a work with N. Gatti on the application of MAB on sponsored search auctions and mechanism design.

Universität Potsdam (Germany)

Alexandra Carpentier *Collaborator*

M. Valko collaborates with A. Carpentier on scaling bandits to large dimensions and structures.

Adobe Research, California

Branislav Kveton *Collaborator*

M. Valko collaborates with B. Kveton on sequential learning for the recommendation of entertainment content that features diversity.

Boston University, USA

Venkatesh Saligrama *Collaborator*

M. Valko and R. Munos collaborated with V. Saligrama and M. Hanawal on cost-effective spectral sensing, useful in radars.

Ryabko Daniil

Date: Jan 2014 - Jan 2015

Institution: CMM (Chile)

Participation in the organization of the 32nd International Conference on Machine Learning:

Philippe Preux, local organization chair of ICML

Jérémie Mary, conference webmaster

Romaric Gaudel, local volunteer chair

Co-organization of ICML workshops:

12th European Workshop on Reinforcement Learning (EWRL)

4th Workshop on Machine Learning for Interactive Systems

Jérémie Mary was co-organizer of the workshop “Offline and Online Evaluation of Web-based Services” at WWW'15, with Lihong Li (MSR) as main organizer.

International Joint Conference on Artificial Intelligence (IJCAI 2015)

International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015)

Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2015)

International Conference on Machine Learning (ICML 2015)

Annual Conference on Neural Information Processing Systems (NIPS 2015)

French-speaking conferences:

French Conference on Planning, Decision-making, and Learning in Control Systems (JFPDA 2015)

Extraction et Gestion des Connaissances (EGC 2015)

XXIIè rencontres de la société francophone de classification

International Conference on Pattern Recognition Applications and Methods (ICPRAM 2015)

Algorithmic Learning Theory (ALT 2015)

AAAI Conference on Artificial Intelligence (AAAI 2015)

Conference on Learning Theory (COLT 2015)

European Workshop on Reinforcement Learning (EWRL 2015)

Annual Conference on Neural Information Processing Systems (NIPS 2015)

International Conference on Artificial Intelligence and Statistics (AISTATS 2015)

European Conference on Machine Learning (ECML 2015)

International Conference on Machine Learning (ICML 2015)

International Joint Conferences on Artificial Intelligence (IJCAI 2015)

Reinforcement Learning and Decision Making (RLDM 2015)

International Conference on Uncertainty in Artificial Intelligence (UAI 2015)

International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015)

International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015)

French-speaking conferences:

French Conference on Planning, Decision-making, and Learning in Control Systems (JFPDA 2015)

Conférence francophone sur l'Apprentissage Automatique (CAP 2015)

Neurocomputing

IEEE Signal Processing Letters

IEEE Transactions on Information Theory

IEEE Transactions on Neural Networks and Learning Systems

Scandinavian Journal of Statistics

Speech Communication

Journal of Machine Learning Research

Artificial Intelligence Journal

Machine Learning Journal

Journal of Artificial Intelligence Research

Gergely Neu, invited talk at the “Data, Learning, and Inference” (DALI) workshop on Learning Theory, Spain, April 2015.

Gergely Neu, invited talk at the “Learning Faster from Easy Data” NIPS workshop, Montreal, December 2015.

Olivier Pietquin, NIPS Workshop on Spoken Language Understanding (SLU NIPS 2015).

Michal Valko, invited talk at “LIX, École Polytechnique”, April 2015.

A. Lazaric, *Open Questions in Transfer in RL*, “Machine Learning with Interdependent and Non-identically Distributed Data”, Dagstuhl, Germany, April 2015.

A. Lazaric, *Exploiting Easy Data in Online Optimization*, “Modal Seminar Series”, Inria Lille, May 2015.

A. Lazaric, *Policy Search in Reinforcement Learning*, Criteo, Paris, June 2015.

A. Lazaric, *Transfer in Multi-Armed Bandit*, Aston University, Birmingham, July 2015.

A. Lazaric, *Transfer in Reinforcement Learning*, “Promotion et Developpement de l'Intelligence Artificielle”, Paris, October 2015.

A. Lazaric, *The Hidden World of Bandits*, “Workshop on Sequential Learning and Applications”, Toulouse, November 2015.

J. Mary, invited talk at Euratechnologies on *recent advances in machine learning and deep learning for sequential data*, Lille, November 2015.

J. Mary, Invited talk at Recommender Days organized by CRITEO http://

Agence Nationale pour la Recherche (ANR)

Fonds National pour la Recherche Scientifique (FNRS), Belgium

Olivier Pietquin and Philippe Preux are experts for the H2020 European Program.

*M. Valko* is an elected member of the evaluation committee and participates in the hiring, promotion, and evaluation juries of Inria, notably

Hiring committee for junior researchers at Inria Nancy (2015)

Selection committee for Inria award for scientific excellence (2015)

Selection committee for CR promotions (2015)

Jérémie Mary is an expert for the Research Council of Norway.

*A. Lazaric* is a member of the committee for research evaluation (CER) at Inria Lille.

*A. Lazaric* was a member of the hiring committee for junior researchers at Inria Lille (2015).

Philippe Preux is:

head of the DatInG (Data Intelligence Group) thematic group at CRIStAL that gathers 4 research groups, totaling more than 70 people,

member of the scientific committee of CRIStAL,

member of the Bureau du Comité des Projets at Inria Lille.

Romaric Gaudel is:

board member of CRIStAL

manager of the proml mailing list, which gathers French-speaking researchers from the machine learning community.

Olivier Pietquin is:

board member of CRIStAL

board member of the IEEA faculty at Univ. Lille 1

member of the computer science department of the Ecole Doctorale SPI

in charge of research and innovation for the computer science department of Univ. Lille 1

Licence: R. Gaudel, 2015/2016 Spring: R programming for statistics and quantitative sociology, 28h eqTD, L1, Université Lille 3, France

Licence: R. Gaudel, 2015/2016 Fall: preparation for the C2i level 1 certificate, 24h eqTD, L1-3, Université Lille 3, France

Licence: R. Gaudel, 2015/2016 Fall: collaborative and remote work in a digital world, 13h eqTD, L1-3 (distance learning), Université Lille 3, France

Master: M. Valko, 2014/2015 Spring: Graphs in Machine Learning, 27h eqTD, M2, ENS Cachan

Master: M. Valko, 2015/2016 Fall: Graphs in Machine Learning, 27h eqTD, M2, ENS Cachan

Master: A. Lazaric, Reinforcement Learning, 25h eqTD, M2, ENS Cachan, France

Master: A. Lazaric, Reinforcement Learning, 25h eqTD, M2, Ecole Centrale Lille, France

Summer school: A. Lazaric, Reinforcement Learning, 8h eqTD, Toulouse, France

Master: J. Mary, 2015/2016 Fall: Machine learning with R, 20h eqTD, M2, Ecole Centrale de Lille.

**E-learning**

SPOC: R. Gaudel, Marc Tommasi and Alain Preux, digital culture S3, 8 weeks, Moodle, Université Lille 3, Licence (L1), initial training, all students (> 3,000).

Ph. Preux:

Modeling and simulation of the dynamics of behavior, Master 1 in Psychology & Master in Cognitive Science, Université de Lille 3

Formal neural networks, Master 1 in Cognitive Science, Université de Lille 3

Supervised Learning, Licence 3 MIASHS, Université de Lille 3

Advanced Data Mining, Master 2 MIASHS, Université de Lille 3

Unsupervised learning, Master 1 MIASHS, Université de Lille 3

C. Dimitrakakis:

Web Fundamentals, Licence MIASHS, Université de Lille 3

Supervised Learning, Master MIASHS, Université de Lille 3

B. Piot:

Networks, Master SID, Université de Lille 3

Databases, Licence MIASHS, Université de Lille 3

Excel, Licence MIASHS, Université de Lille 3

Databases, Master SIAD, Université de Lille 1

Networks, Master SIAD, Université de Lille 1

UML, Université de Lille 1

O. Pietquin:

Machine learning, Master Informatique, Université Lille 1

Machine learning and decision making, Master Informatique, Université Lille 1

Bayesian signal processing, Engineering degree, Université de Mons (Belgium)

Supervision of PhD:

HDR: Jérémie Mary, Université de Lille 3, defended Nov 2015

PhD: Amir Sani, Université de Lille 1, defended May 2015, Munos, Lazaric

PhD: Marta Soare, Université de Lille 1, defended Dec 2015, Munos, Lazaric

PhD in progress: Marc Abeille, since Sept. 2014, Munos, Lazaric

PhD in progress: Merwan Barlier, since Oct. 2014, Pietquin

PhD in progress: Alexandre Bérard, since Oct. 2014, Pietquin

PhD in progress: Daniele Calandriello, since Oct. 2014, Preux, Lazaric, Valko

PhD in progress: Nicolas Carrara, since Oct. 2015, Pietquin

PhD in progress: Ronan Fruit, since Dec. 2015, Ryabko, Lazaric

PhD in progress: Pratik Gajane, since Oct. 2014, Preux

PhD in progress: Hadrien Glaude, since Feb. 2014, Pietquin

PhD in progress: Jean-Bastien Grill, since Oct. 2014, Munos, Valko

PhD in progress: Frédéric Guillou, since Oct. 2013, Preux, Mary, Gaudel

PhD in progress: Tomáš Kocák, since Oct. 2013, Munos, Valko

PhD in progress: Vincenzo Musco, since Nov. 2013, Preux, Monperrus

PhD in progress: Julien Perolat, since Oct. 2014, Pietquin

PhD in progress: Florian Strub, since Jan. 2016, Pietquin, Mary

PhD in progress: Romain Warlop, since Sep. 2015, Preux, Mary, Lazaric

Management of diplomas:

Ph. Preux is the head of the master in computer science “machine learning and data science”, Université de Lille 3.

J. Mary is the head of the “Web analyst” track in master MIASHS, Université de Lille 3.

head of the MoCAD master at Université Lille 1.

Ph. Preux has been a member of the PhD juries of:

Manel Tagorti, Université de Lorraine,

Yacine Nair Benrekia, Université de Nantes,

El Mehdi Rochd, Université de Marseille.

Ph. Preux has been a member of the HdR jury of:

Jérémie Mary, Université de Lille 3.

A. Lazaric has been a member of the PhD juries of:

Rodrigue Talla Kuate, Aston University, Birmingham, UK.

Kamyar Azzizade (PhD qualification), University of California Irvine, USA.

O. Pietquin has been a member of the PhD juries of:

Nicolas Galichet, Université Paris-Saclay,

Alaedine Mihoub, Université Grenoble-Alpes,

Emmanuel Ferreira, Université d'Avignon et des Pays du Vaucluse.

Ph. Preux participated in a radio program on machine learning.

Ph. Preux co-authored two papers on the “Le Monde” binaire blog.

Inria interview with N. Vayatis and M. Valko about teaching machine learning at ENS, July 2015.

Rue89 interviewed M. Valko about machine learning at Inria, June 2015.

Intel advertised face recognition software that included the work of M. Valko, February 2015.

M. Valko volunteered to teach mathematics at “Association de la Clé”, which helps students from underprivileged backgrounds.

Jérémie Mary was interviewed by “Ça m'intéresse” for a special issue on artificial intelligence.