SequeL is a joint project with the LIFL (UMR 8022 of the CNRS, the University of Lille 1, and the University of Lille 3) and the LAGIS (UMR 8021 of the École Centrale de Lille and the University of Lille 1).

SequeL means ``Sequential Learning''. As such, SequeL focuses on the task of learning in artificial systems (either hardware or software) that gather information over time. Such systems are named *(learning) agents* in what follows.

For the purpose of model building, the agent needs to gather the information collected so far into some compact representation and combine it with newly available data.

The acquired data may result from an observation process of an agent in interaction with its environment (the data thus represent a perception). This is the case when the agent makes decisions (in order to fulfill a certain goal) that impact the environment, and thus the observation process itself.

Hence, in SequeL, the term **sequential** refers to two aspects:

the **sequential acquisition of data**, from which a model is learned (supervised and unsupervised learning),

the **sequential decision-making task**, based on the learned model (reinforcement learning).

We exemplify these various problems:

Supervised learning tasks deal with the prediction of some response given a certain set of observations of input variables and responses. New sample points keep being observed.

Clustering tasks deal with grouping objects that arrive as a stream; the (unknown) number of clusters typically evolves over time, as new objects are observed.

Control tasks deal with learning a controller (a policy) for some system that has to be optimized (see ). We do not assume that a model of the system to be controlled is available.

In all these cases, we assume that the process can be considered stationary for at least a certain amount of time, and that it evolves slowly.

We wish to have anytime algorithms, that is, at any moment, a prediction may be required or an action may be selected, making full use, and hopefully the best use, of the experience already gathered by the learning agent.

The perception of the environment by the learning agent (using its sensors) is generally not the best one from which to make a prediction or take a decision (we deal with Partially Observable Markov Decision Problems). So, the perception has to be mapped in some way to a better, relevant state (or input) space.

Finally, an important issue regarding prediction is its evaluation: how wrong may we be when we make a prediction? For real systems to be controlled, this issue cannot simply be left unanswered.

To sum up, in SequeL, the main issues are:

the learning of a model: we focus on models that map some input space to ,

the observation to state mapping,

the choice of the action to perform (in the case of sequential decision problems),

the bounding of the performance,

the implementation of usable algorithms,

all that being understood in a *sequential* framework.

Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge. This capacity to learn from experience, analytical observation, and other means, results in a system that can continuously self-improve and thereby offer increased efficiency and effectiveness. (source: http://www.aaai.org/AITopics/html/machine.html)

An approach to machine intelligence which is based on statistical modeling of data. With a statistical model in hand, one applies probability theory and decision theory to get an algorithm. This is opposed to using training data merely to select among different algorithms or using heuristics/"common sense" to design an algorithm. (source: http://www.cs.wisc.edu/~hzhang/glossary.html)

Generally speaking, a kernel function is a function that maps a pair of points to a real value. Typically, this value is a measure of similarity between the two points. Under a few assumptions, the kernel function implicitly defines a dot product in some function space. This very nice formal property, as well as a number of others, has ensured a strong appeal for these methods over the last 10 years in the field of function approximation. Many classical algorithms have been ``kernelized'', that is, restated in a much more general way than their original formulation. Kernels also implicitly induce a representation of the data in a certain ``suitable'' space, where the problem to solve (classification, regression, ...) is expected to be simpler (non-linearity turns into linearity).
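As a concrete illustration, here is a minimal sketch of the classical Gaussian (RBF) kernel; the prototype data and the nearest-prototype rule below are purely illustrative, not part of the methods described in this report:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: implicitly a dot product in an
    infinite-dimensional feature space."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

# A kernel turns a linear algorithm into a non-linear one by replacing
# dot products <x, y> with k(x, y); e.g. a kernelized nearest-prototype rule:
def most_similar(query, prototypes, k=rbf_kernel):
    sims = [k(query, p) for p in prototypes]
    return int(np.argmax(sims))

protos = [[0.0, 0.0], [5.0, 5.0]]
print(most_similar([4.5, 4.8], protos))  # prints 1: the prototype at [5, 5]
```

Any algorithm expressed only through dot products between samples can be rewritten this way, which is the key to the ``kernelization'' mentioned above.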

The fundamental tools used in SequeL come from the field of statistical learning . We briefly present the most important ones for us to date, namely kernel-based non-parametric function approximation, sequential Monte-Carlo methods, and non-parametric Bayesian models.

In SequeL, the model to be learned is a real-valued function defined over a multi-dimensional space.

Many methods have been proposed for this purpose. We are looking for ones suited to the problems we wish to solve. In reinforcement learning, the value function may have areas where the gradient is large; these are areas where approximation is difficult, yet they are also the areas where the accuracy of the approximation should be maximal in order to obtain a good policy (and where, otherwise, a bad choice of action may have catastrophic consequences).

For the moment, we consider non-parametric methods, since they do not make any assumption about the function to learn. Locally weighted regression has yielded efficient methods to learn a policy in reinforcement learning, as well as good performance in regression settings. The kernelized version gives us wide latitude in handling sample points and combining them to obtain the approximation. To keep computation times practical, a sparse representation is sought.
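A minimal sketch of locally weighted regression, under the assumption of a one-dimensional input and a Gaussian weighting kernel (the sine data is only illustrative):

```python
import numpy as np

def locally_weighted_regression(x_query, X, y, tau=0.5):
    """Predict y at x_query by a weighted least-squares linear fit,
    where each sample is weighted by a Gaussian kernel of bandwidth
    tau centred at the query point."""
    X = np.asarray(X, float).reshape(-1, 1)
    y = np.asarray(y, float)
    # design matrix with an intercept column
    A = np.hstack([np.ones_like(X), X])
    w = np.exp(-((X[:, 0] - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # solve the weighted normal equations A^T W A beta = A^T W y
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0] + beta[1] * x_query

# illustrative data: noisy-free samples of the sine function
X = np.linspace(0, 2 * np.pi, 50)
y = np.sin(X)
print(locally_weighted_regression(np.pi / 2, X, y, tau=0.3))
```

The prediction at the query point uses all samples, but each contributes according to its kernel weight; a sparse variant keeps only the samples with non-negligible weight.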

We currently devote much effort to LARS-like approximators , which we have fitted into the reinforcement learning framework , , .

Sequential Monte-Carlo (or particle filtering, see ) methods are currently used for various purposes in SequeL:

the estimation of the state of the agent given its current observation as well as its history;

the estimation of parameters of a model.
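The first use can be sketched with a bootstrap particle filter; the linear-Gaussian toy model below is an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=500):
    """Bootstrap particle filter for the hypothetical toy model
        x_t = 0.9 * x_{t-1} + v_t,   v_t ~ N(0, 1)
        y_t = x_t + w_t,             w_t ~ N(0, 1)
    returning the sequence of posterior-mean state estimates."""
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # propagate the particles through the transition equation
        particles = 0.9 * particles + rng.normal(0.0, 1.0, n_particles)
        # weight each particle by the observation likelihood
        w = np.exp(-0.5 * (y - particles) ** 2)
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))
        # multinomial resampling to avoid weight degeneracy
        particles = rng.choice(particles, size=n_particles, p=w)
    return estimates

# simulate a short trajectory from the same model and filter it
true_x, ys = 0.0, []
for _ in range(20):
    true_x = 0.9 * true_x + rng.normal()
    ys.append(true_x + rng.normal())
est = particle_filter(ys)
print(len(est))  # prints 20: one state estimate per observation
```

In the partially observable settings mentioned above, the same mechanism maintains a belief over the agent's state given its observation history.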

Numerous problems in signal processing may be solved efficiently by way of a Bayesian approach. The use of Monte-Carlo methods lets us handle non-linear, as well as non-Gaussian, problems. In their standard form, they require the formulation of probability densities in parametric form. For instance, it is common to use a Gaussian likelihood, because it is convenient.

However, in some applications such as Bayesian filtering or blind deconvolution, the choice of a parametric form for the density of the noise is often arbitrary. If this choice is wrong, it may have dramatic consequences on the estimation.

To overcome this shortcoming, non-parametric methods provide another approach to this problem. In particular, mixtures of Dirichlet processes provide a very powerful formalism.

Mixtures of Dirichlet processes are an extension of finite mixture models. Given a mixture density f(x|θ) and a precision parameter α, a Dirichlet process defines a random discrete distribution whose atoms U_{k} are distributed along a *base distribution* G_{0}, and whose weights follow a certain *stick-breaking* law with parameter α.

A mixture of Dirichlet processes is fully parameterized by the mixture density, as well as by the parameters of the Dirichlet process, that is, the base distribution and the precision parameter.

The class of densities that may be written as a mixture of Dirichlet processes is very wide, so that they fit a very large range of applications.
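The stick-breaking construction mentioned above can be sketched as follows; the truncation level and the Gaussian base distribution are illustrative choices, not part of the formalism itself:

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, n_atoms=100):
    """Weights of a (truncated) Dirichlet process by stick breaking:
    pi_k = beta_k * prod_{j<k} (1 - beta_j), with beta_k ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

def draw_dp_sample(alpha, base_sampler, n_atoms=100):
    """One draw G from DP(alpha, G0): atoms U_k ~ G0, paired with
    stick-breaking weights (truncated to n_atoms atoms)."""
    weights = stick_breaking(alpha, n_atoms)
    atoms = base_sampler(n_atoms)
    return atoms, weights

# illustrative base distribution G0 = N(0, 1)
atoms, w = draw_dp_sample(2.0, lambda n: rng.normal(0.0, 1.0, n))
print(round(float(w.sum()), 2))  # prints 1.0: truncation leaves only a tiny remainder
```

A draw from the mixture of Dirichlet processes is then obtained by picking an atom U_k with probability pi_k and sampling from the mixture density centred on it.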

Given a set of observations, the estimation of the parameters of a mixture of Dirichlet processes is performed by way of a *Markov chain Monte Carlo (MCMC)* algorithm.

SequeL aims at solving problems of prediction, as well as problems of optimal control. As such, the application domains are very numerous. Furthermore, we consider that tackling real applications is necessary, as feedback for our fundamental research, as well as a guide towards technological bottlenecks and their resolution. The applications are being studied with industrial partners (such as function prediction), as well as with academic partners (such as the control of depollution systems).

The research concerns the control of waste-water treatment by an anaerobic digestion process. Organic waste-water is treated in a biological reactor by means of an anaerobic digestion process, which produces bio-gas (methane) that can be used to power electricity generators. The digestion of the organic material is done by an appropriate mixing of (mainly) two different species of bacteria, each doing a different job at a different step in the digestion process. Maintaining suitable conditions in the digester is essential for the viability of the bacterial population. The problem of designing a good controller is crucial. A mandatory condition is to stabilize the system and maintain the bacterial populations alive. Under these constraints, an additional objective is to optimize the production of bio-gas.

From the scientific standpoint, this problem is formalized as a partially observable Markov decision problem (POMDP), where the state dynamics is not perfectly known: several models of the dynamics are available, but most of their parameters are unknown. Besides, the observation process is very poor. We aim at designing an adaptive policy that is both safe and optimizes the methane production.

This work is being carried out in collaboration with the Naskeo Environment spin-off (Paris), the INRA laboratory in Narbonne (France), and the COMORE team at INRIA Sophia-Antipolis. The work is just beginning.

This work concerns the automatic transcription of music, as well as the detection of sound events in an urban environment.

To estimate audio signals, we have used sequential sinusoidal-plus-noise models. A particle filtering algorithm has been developed to estimate the parameters. An emphasis has been put on the reduction of computation times (see ).

We have worked on the speaker segmentation problem on speech signals, using a kernel method originally developed by F. Desobry . We obtained almost state-of-the-art results with this method (see , ). Further experimental assessments are under way.

In collaboration with France Telecom, we have also worked on the automatic transcription of speech by using kernel methods (Stéphane Rossignol's post-doc fellowship).

Generally speaking, machine learning aims at predicting a function (rather than a single value). A function formalizes the behavior of a ``system'', for instance the behavior of customers in a shop (either a physical shop, or a virtual shop on the web).

Auchan is a major international group which operates more than 150 hypermarkets worldwide. One of its crucial issues is the ability to predict many variables, such as the number of customers reaching the cashiers at a given time of day, the number of loaves of bread to bake, etc. Each day, this is a functional prediction problem, which may be seen as non-stationary because customer habits evolve.

SequeL members and Auchan are currently investigating this problem by extensively using past events. State-of-the-art inference methods have been implemented with excellent performance (not described here for confidentiality reasons). To date, we have improved the prediction accuracy by about 25%. Due to these very promising results, a new research contract is currently under discussion between Auchan and SequeL, to extend the use of sequential learning approaches in commercial industries.

We have worked in collaboration with Prof. Carl Haas of the University of Waterloo (Canada) on a problem arising in civil engineering: how to automatically localize building materials on a construction site. This is a real problem, because a lot of time (hence money) is lost looking for materials that have often been moved away. The proposed solution is to equip each piece with an RFID tag, and each person working on the construction site with an RFID receiver, a GPS for localization, and a transmitter. We then learn sequentially the position of the pieces, using the detection information sent automatically by the transmitter to a central processor when workers walk near these pieces and detect them. RFID systems and localization systems such as GPS make it possible to treat such a problem in the more general context of localizing randomly distributed communication nodes. When the nodes are moving, the problem is even more complicated. Our work shows how the Transferable Belief Model can be used to learn the position of the communication nodes and to detect potential movements. This study also addresses the associated computational issues.

This work has also been applied for land vehicle localization. The vehicle is equipped with three sensors, including a GPS sensor.

This work has been carried out in collaboration with François Caron. See , .

For the moment, we have not yet developed software from which general users could benefit. We plan to start the development of a software platform in the coming year. In its first stage, the goal will be to integrate various programs we have designed separately.

The interested reader should check out our website for any new information regarding software development http://www.grappa.univ-lille3.fr/SequeL.

We have worked on several aspects of reinforcement learning and optimal control, including the use of function approximation to represent the value function or the policy. We have worked in collaboration mainly with Csaba Szepesvári (University of Alberta, Canada), András Antos (Hungarian Academy of Sciences), Jean-Yves Audibert (CERTIS, Ecole des Ponts et Chaussées), Guillaume Deffuant and Sophie Martin (Cemagref, Clermont-Ferrand), Hasnaa Zidani (ENSTA), Olivier Bokanowski (Paris VII), Olivier Teytaud (LRI, Orsay). This work can be summarized as follows:

**Establishing links between statistical learning and reinforcement learning**. Performance bounds on the policies deduced by approximate dynamic programming methods (such as approximate value iteration and approximate policy iteration) when using sampling devices are established in terms of the capacity (using VC dimension, covering numbers) of the function space considered in the approximations. See , , .

**Analysis of dynamic programming using L_{p}-norms**. This work extends the usual analysis in L_{∞}-norm to L_{p}-norms.

**Adaptive variance reduction method for Monte-Carlo estimation in Markov chains**. This has been applied to the fast estimation of value functions in Markov decision processes and of their gradient (with respect to control parameters). See .

**Policy gradient estimation in continuous time**. This method makes it possible to search directly for a locally optimal controller in a class of parameterized policies, in the case of continuous-time state dynamics. See . An application of this method to a control problem in finance is described in .

**Analysis of the exploration-exploitation tradeoff using variance estimates**. We investigate the multi-armed bandit framework using new deviation inequalities that take into account the variance estimate. This results in a substantial sharpening of the regret bounds. See .

**Numerical approximation of viability problems**. We use several ideas from the ultra-bee schemes used for transport equations with discontinuous solutions, combined with the dynamic programming approach and function approximation, to approximate the viability kernel of a viability problem. See , .

As in many machine learning situations, a key component of a reinforcement learning algorithm lies in the approximation of a real-valued function, generally over a high-dimensional space. Even though various approaches have been studied over the last 15 years within the reinforcement learning framework, a scalable, workable solution is still awaited. We have considered the use of non-parametric function approximators, and in particular the LARS algorithm, introduced in statistics in and later kernelized . We have adapted the algorithm to the RL problem setting, and actually completely rethought and re-interpreted it, to approximate the value function, providing the so-called ``equi-gradient descent'' algorithm introduced in , . This work is ongoing . We have recently proposed a unified view of many algorithms (TD, residual TD, iLSTD, LSTD, LSPE, and our equi-gradient TD algorithm), showing that they all fit into a single, simple formalism . Experiments demonstrate the interest of this new approach.
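The algorithms listed above all build on the temporal-difference error; a minimal TD(0) sketch with a linear function approximator, on a hypothetical two-state chain (not one of the problems studied in this report), illustrates the common core:

```python
import numpy as np

def td0_linear(n_episodes=2000, alpha=0.05, gamma=0.9):
    """TD(0) with a linear approximator V(s) = theta . phi(s), on a
    hypothetical chain: state 0 -> state 1 -> terminal, with reward 1
    on the final transition and reward 0 otherwise."""
    phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
    theta = np.zeros(2)
    for _ in range(n_episodes):
        # transition 0 -> 1, reward 0: TD error delta = r + gamma*V(s') - V(s)
        delta = 0.0 + gamma * (theta @ phi[1]) - theta @ phi[0]
        theta += alpha * delta * phi[0]
        # transition 1 -> terminal, reward 1 (terminal value is 0)
        delta = 1.0 + gamma * 0.0 - theta @ phi[1]
        theta += alpha * delta * phi[1]
    return theta

theta = td0_linear()
print(np.round(theta, 2))  # prints [0.9 1.]: V(1) -> 1, V(0) -> gamma * V(1)
```

The unified view mentioned above essentially varies how this TD error is accumulated and how the parameter update is solved (stochastic step, least squares, equi-gradient path, ...).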

The research on the game of Go conducted in 2006 followed the Master's thesis of Guillaume Chaslot. Guillaume introduced us to Monte-Carlo techniques applied to move selection. After he left, having defended his Master's in 2005, a new Go-playing program (Crazy Stone) was developed by Rémi Coulom. This led to the design of original and efficient methods for combining Monte-Carlo evaluation with tree search. Crazy Stone won the 9x9 Go tournament at the 11th Computer Olympiad, and a paper describing its algorithm was presented at the ``International Conference on Computers and Games'' , .

We have also investigated the exploration vs. exploitation trade-off more theoretically by way of multi-armed bandit models. This has already been discussed in the theoretical foundations section. See .

With E. Jackson (PhD student) and A. Doucet (Professor at the University of British Columbia), we have investigated Bayesian functional clustering. Given observations obtained by sampling different functions at random locations (different from one function to another), we have developed an algorithm that clusters these functions into coherent groups. For instance, each observation may be a signal (in one or more dimensions), such that the sampling instants differ from one signal to another. We have modeled the underlying signals using Gaussian processes, and the clustering itself is performed using a Dirichlet process mixture. This study is currently under submission . The target applications concern the clustering of messenger RNA expression data, as well as sampling data originating from geostatistics.

We consider the problem of sequentially estimating the state x_{t} of a system. In the Bayesian framework, one writes down a transition equation and an observation equation. This may be written as:

x_{t} = f(x_{t-1}, v_{t})
y_{t} = g(x_{t}, w_{t})

where v_{t} and w_{t} are noises of given distributions. However, it may be very difficult to specify beforehand a parametric representation for p(v_{t}) and p(w_{t}). We have considered the case where the two densities are given by a mixture of Dirichlet processes. This lets us estimate them at the same time as we estimate the state x_{t}. With regard to the state of the art, our contribution concerns the modeling, as well as the Monte-Carlo approach that has been used for the estimation .

Certain electrical devices include a set of electric cables which, under certain circumstances, may give rise to electrical arcs (a typical example is the command circuitry of an airplane). This is basically a problem of detecting change-points in a multichannel environment.

We have designed a Bayesian algorithm to detect these change-points, which models the signal on each cable individually by an autoregressive model, and assumes correlation between the change-point instants in the different cables. See , .

Our main contribution takes place within the framework of the PhD work of François Caron, who defended his PhD on November 10th. In this work, we have been interested in adding uncertainty to hidden Markov models. A multisensor system is considered, where each sensor may switch between several operating states. A new jump model has been developed, based on an original modeling of the transition probabilities of the sensors' reliability using Dirichlet processes. An algorithm based on a particle filter with an efficient importance law has also been developed. This algorithm has been applied with success to the positioning of a land vehicle equipped with three sensors, one of them being a GPS. It is shown that the proposed algorithm allows the detection and rejection of GPS data corrupted by multipath, thereby improving the estimation process.

In such systems and applications, it is generally assumed that the state noise and the measurement noise are Gaussian and stationary, so as to be able to use Kalman filtering. In real applications, this is often not the case. A possible solution developed in this work consists in estimating the probability density function (pdf) using Dirichlet process mixture modeling. First, the case of linear models has been addressed, and MCMC and particle filter algorithms have been developed. Then, the case of the estimation of pdfs in non-linear models has been addressed. For that purpose, time-varying Dirichlet processes have been defined for the online estimation of time-varying pdfs.

This work has been carried out in collaboration with Manuel Davy. It has been published in , , , , , .

As an INRIA team, SequeL has not yet signed any contract through INRIA. However, various works in 2006 have been done under contract, such as with France Telecom, Auchan, and Neutrik Test Instruments (NTI, Liechtenstein).

As an INRIA team, we have had serious contacts with a few industrial partners. These contacts are currently under development. To date, these contacts concern EADS Space-Transportation, ATOS, Naskeo-Environment, and Auchan.

This project is headed by Prof. S. Canu at INSA-Rouen. It deals with the study of kernel methods for signal processing. Thomas Bréhard has been recruited on a post-doc fellowship under this contract.

This ANR deals with adaptive Monte-Carlo methods, in collaboration with teams from the École des Ponts (B. Lapeyre, B. Jourdain), the Université de Dauphine (C. Robert, A. Guillin), the École Nationale Supérieure des Télécommunications (E. Moulines, O. Cappé), and the Centre de Mathématiques Appliquées (R. Douc). This is a 3-year project (2006-2008).

This is the last year of this action (2004-2006). The project, named DYNAPP for ``Dynamics of Learning'', aims at modeling the learning of behavior of an animal, at a functional level, using concepts from system dynamics. We use reinforcement learning as a model and as a way to simulate this learning. This work is being done with a team of experimental psychologists (U. Lille 3).

Rémi Munos is co-supervisor of Yuval Tassa's PhD at the ICNC, Hebrew University of Jerusalem, Israel.

Along with Csaba Szepesvári (U. Alberta), we have organized a workshop on Kernel Machines and Reinforcement Learning at the International Conference on Machine Learning, in Pittsburgh, in June 2006 ( http://www.grappa.univ-lille3.fr/ ppreux/krl/). The workshop gathered 30 people during the day.

Manuel Davy is an associate editor for the IEEE Transactions on Signal Processing. He has also reviewed papers for IEEE Transactions on Signal Processing, IEEE Signal Processing Letters, Speech Communication, Signal Processing, and IEEE Transactions on Circuits and Systems I.

Along with A. Klapuri (Tampere University of Technology, Finland), Manuel Davy has coordinated a book on ``signal processing methods for music transcription''.

Rémi Munos is a member of the editorial boards of the Journal of Machine Learning Research, the Artificial Intelligence Journal, and the Revue d'Intelligence Artificielle. He has been a member of the program committees of the 2006 Neural Information Processing Systems, International Conference on Machine Learning, and Conférence Francophone sur l'Apprentissage Automatique conferences.

Along with Csaba Szepesvári (U. Alberta), Philippe Preux, Rémi Munos and Manuel Davy co-organized the **workshop on Kernel Machines and Reinforcement Learning** at the International Conference on Machine Learning, 2006. See: http://www.icml2006.org/icml2006/workshops.html

Rémi Munos is co-chair of the **IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning**, 2007. See: http://liu.ece.uic.edu/ dliu/ADPRL07/

Philippe Preux is a member of the program committee of the
**IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning**, 2007.

Rémi Coulom has been a reviewer for the **Computers and Games** 2006 conference.

Philippe Preux, Rémi Coulom, together with Samuel Delepoulle, have prepared a special issue of the ``Revue d'Intelligence Artificielle'' on Markov Decision Processes (to appear in 2007).

Philippe Preux has served as workshop co-chairman of EGC 2006 (the French-speaking yearly conference on data mining).

We list the courses related to the research activities in SequeL that took place in 2006.

Rémi Munos teaches a class in reinforcement learning in the M2 ``Mathematics-Vision-Learning'' (MVA) at the ENS-Cachan; he also teaches a cognitive science class in M1 at the EHESS (Paris).

Philippe Preux teaches a class in reinforcement learning in the M2 of computer science at the University of Lille.

Otherwise, each of the 5 professors and assistant professors of the SequeL team teaches 192 hours per year, mostly at the master level. Taught classes include machine learning, data mining, and signal processing.