MAIA is a joint project of INRIA, CNRS, INPL, Henri Poincaré University and Nancy 2 University, hosted within the LORIA laboratory (UMR 7503). For more details, we invite the reader to
consult the team web site at
http://

MAIA
*artificial intelligence*: our goal is to model, design and simulate computer-based entities (agents) that are able to sense their environment, interpret it, and act on it autonomously.
We mainly work on two research themes: 1) stochastic models and 2) self-organization.

Jörg Hoffmann succeeded in the INRIA DR2 competitive selection and joined the MAIA team in November 2009. He obtained a PhD from the University of Freiburg, Germany, in 2002, with a thesis that
won the ECCAI Dissertation Award (the yearly award for the best European PhD thesis in AI). He subsequently worked in postdoctoral positions at Max Planck Institute for Computer Science,
Saarbrücken, Germany; at Cornell University, Ithaca, USA; at the University of Innsbruck, Austria; and at SAP Research, Karlsruhe, Germany. Jörg obtained his Habilitation from the University of
Innsbruck in April 2009, and joined INRIA as a Directeur de Recherche in October 2009. His research is centered around the design and analysis of algorithms for addressing hard search problems
in AI and related areas, notably Automatic Planning, Model Checking, and Semantic Technologies. Jörg has published more than 80 papers in international journals, conferences, and workshops,
receiving best paper awards from ICAPS (the International Conference on Automated Planning and Scheduling) in 2004 and 2007, and from JAIR (the Journal of Artificial Intelligence Research) in
2005. He is a regular PC and SPC member of international AI conferences such as AAAI, ECAI, and IJCAI. He serves on the Editorial Board of JAIR, and is a Conference Chair of ICAPS 2010. Further
information is available on his homepage at
http://

MAIA research covers two research themes: 1) stochastic models and 2) self-organization. This section presents the scientific foundations of these themes.

We develop algorithms for stochastic models applied to machine learning and decision-making. On the one hand, we consider standard stochastic models (Markov chains, Hidden Markov Models, Bayesian networks) and study the computational problems that arise, such as inference of hidden variables and parameter learning. On the other hand, we consider the parameterized versions of these models (the parameter can be seen as a control/decision of an agent); in these models (Markov decision processes, partially observable Markov decision processes, decentralized Markov decision processes, stochastic games), we consider the problems of a) planning and b) reinforcement learning (estimating the parameters *and* planning), for one agent and for many agents. For all these problems, our aim is to develop efficient algorithmic solutions and to apply them to complex problems.

In the following, we concentrate our presentation on parameterized stochastic models, known as (partially observable) Markov decision processes, as they trivially generalize the non-parameterized models (Markov chain, Hidden Markov Models). We also outline how these models can be extended to multi-agent settings.

An agent is anything that can be viewed as sensing its environment through sensors and acting upon that environment through actuators. This view makes Markov decision processes (**MDPs**) a good candidate for modeling agents, and is probably why MDPs have received considerable attention in recent years from the artificial intelligence (AI) community. They have been adopted as a general framework for planning under uncertainty and reinforcement learning.

Formally, a Markov decision process is a four-tuple (S, A, P, r), where:

S is the state space,

A is the action space,

P is the state-transition probability function that models the dynamics of the system. P(s, a, s') is the probability of transitioning from s to s' given that action a is chosen.

r is the reward function. r(s, a, s') stands for the reward obtained by taking action a in state s and transitioning to state s'.

With this framework, we can model the interaction between an agent and an environment. The environment can be considered as a Markov decision process controlled by an agent. When, in a given state s, an action a is chosen by the agent, the probability for the system to reach state s' is given by P(s, a, s'). After each transition, the environment generates a numerical reward r(s, a, s'). The behaviour of the agent can be represented by a mapping π : S → A between states and actions. Such a mapping is called a policy.
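To make the formalism concrete, here is a minimal sketch in Python of an MDP and a policy. The two-state dynamics and rewards are hypothetical toy values chosen for illustration, not taken from this report.

```python
import random

# Toy two-state, two-action MDP (hypothetical numbers, for illustration only).
# P[s][a] maps each successor state s2 to P(s, a, s2);
# r[s][a][s2] is the reward r(s, a, s2) of the corresponding transition.
S = [0, 1]
A = ["stay", "move"]
P = {
    0: {"stay": {0: 0.9, 1: 0.1}, "move": {0: 0.2, 1: 0.8}},
    1: {"stay": {1: 1.0}, "move": {0: 0.7, 1: 0.3}},
}
r = {
    0: {"stay": {0: 0.0, 1: 1.0}, "move": {0: 0.0, 1: 1.0}},
    1: {"stay": {1: 0.0}, "move": {0: 0.0, 1: 0.0}},
}

# A policy is simply a mapping from states to actions.
policy = {0: "move", 1: "stay"}

def step(s, a):
    """Sample s2 ~ P(s, a, .) and return (s2, r(s, a, s2))."""
    successors = list(P[s][a])
    probs = [P[s][a][s2] for s2 in successors]
    s2 = random.choices(successors, weights=probs)[0]
    return s2, r[s][a][s2]

# Simulate the agent-environment interaction for a few steps.
s, total = 0, 0.0
for _ in range(10):
    s, reward = step(s, policy[s])
    total += reward
```

The dictionaries mirror the four-tuple (S, A, P, r); any concrete application replaces these toy tables with its own model.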

In such a framework, we consider the following problems:

Given explicit knowledge of the problem (that is, P and r), find an optimal behaviour, *i.e.*, the policy π which maximizes a given performance criterion for the agent. There are three popular performance criteria for evaluating a policy:

expected reward to target,

discounted cumulative reward,

the average expected reward per stage.
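For the discounted-cumulative-reward criterion, an optimal policy can be computed by value iteration when P and r are known. Below is a sketch on a hypothetical two-state MDP (toy numbers; rewards simplified to depend on (s, a) only):

```python
# Value iteration under the discounted-cumulative-reward criterion,
# on a hypothetical two-state MDP (illustrative numbers only).
P = {
    0: {"stay": {0: 0.9, 1: 0.1}, "move": {0: 0.2, 1: 0.8}},
    1: {"stay": {1: 1.0}, "move": {0: 0.7, 1: 0.3}},
}
r = {0: {"stay": 0.0, "move": 0.0},
     1: {"stay": 1.0, "move": 0.0}}
gamma = 0.9  # discount factor

def backup(V, s, a):
    """One-step Bellman backup: r(s, a) + gamma * E[V(s2)]."""
    return r[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())

V = {s: 0.0 for s in P}
for _ in range(200):  # iterate the Bellman optimality operator to a fixed point
    V = {s: max(backup(V, s, a) for a in P[s]) for s in P}

# The policy greedy with respect to the (near-)optimal V is optimal.
policy = {s: max(P[s], key=lambda a: backup(V, s, a)) for s in P}
# Here the agent moves toward the rewarding state 1 and then stays:
# policy == {0: "move", 1: "stay"}
```

The same backup, with a suitable averaging, underlies the other two criteria as well.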

Given the ability to interact with the environment (that is, samples of P and r obtained by simulation or real-world interaction), find an optimal behaviour. This amounts to learning what to do in each state of the environment through a trial-and-error process; such a problem is usually called *reinforcement learning*. It is, as stated by Sutton and Barto, an approach for understanding and automating goal-directed learning and decision-making that is quite different from supervised learning. Indeed, it is in most cases impossible to obtain examples of good behaviour for all situations in which an agent has to act. The trade-off between exploration and exploitation is one of the major issues to address.
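The trial-and-error process and the exploration/exploitation trade-off can be sketched with tabular Q-learning and an ε-greedy rule. The simulator below is a hypothetical two-state environment, used only to show the shape of the algorithm:

```python
import random

random.seed(0)  # for reproducibility of this illustration

def env_step(s, a):
    """Hidden dynamics, known only to the simulator: action 1 tends to
    reach state 1, and arriving in state 1 yields a reward of 1."""
    if a == 1:
        s2 = 1 if random.random() < 0.8 else 0
    else:
        s2 = s
    return s2, (1.0 if s2 == 1 else 0.0)

states, actions = (0, 1), (0, 1)
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

s = 0
for _ in range(5000):
    # epsilon-greedy: explore with small probability, otherwise exploit
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda a: Q[(s, a)])
    s2, reward = env_step(s, a)
    # temporal-difference update toward the sampled Bellman target
    Q[(s, a)] += alpha * (reward + gamma * max(Q[(s2, b)] for b in actions)
                          - Q[(s, a)])
    s = s2

# The greedy policy extracted from Q moves toward state 1 and stays there.
```

Note that the agent never sees P or r explicitly; it only receives sampled transitions, which is exactly the setting described above.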

Furthermore, a general problem, relevant to both of the previous ones, consists in finding good representations of the environment so that an agent can achieve the above objectives.

In a more general setting, an agent may not perceive the state in which it stands. The information that an agent can acquire about the environment is generally restricted to *observations*, which give only partial information about the state of the system. These observations can be obtained, for example, using sensors that return some estimate of the state of the environment. The decision process thus has hidden state, and the problem of finding an optimal policy is no longer Markovian. A model that describes such a hidden-state and observation structure is the **POMDP** (partially observable MDP). Formally, a POMDP is a tuple (S, A, P, r, Ω, O) where

S, A, P and r are defined as in an MDP.

Ω is a finite set of observations.

O is a table of observation probabilities. O(s, a, s', o) is the probability of observing o when transitioning from s to s' on taking action a in s. Here s, s' ∈ S, a ∈ A, o ∈ Ω.
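Since the state is hidden, a POMDP agent typically maintains a belief state b, a probability distribution over S, and updates it by Bayes' rule after each action and observation. A minimal sketch, with hypothetical toy numbers:

```python
# Belief-state update for a POMDP (toy model; all numbers are hypothetical).
# Two hidden states; a single action "a"; observations "cold"/"hot".
S = [0, 1]
P = {0: {"a": {0: 0.7, 1: 0.3}},
     1: {"a": {0: 0.3, 1: 0.7}}}
# O[(s, a, s2, o)] = probability of observing o when moving from s to s2
# under action a; here the observation depends on s2 only.
O = {(s, "a", s2, o): (0.9 if (s2 == 1) == (o == "hot") else 0.1)
     for s in S for s2 in S for o in ("cold", "hot")}

def belief_update(b, a, o):
    """b2(s2) ∝ sum_s b(s) P(s, a, s2) O(s, a, s2, o), then normalize."""
    b2 = {s2: sum(b[s] * P[s][a].get(s2, 0.0) * O[(s, a, s2, o)] for s in S)
          for s2 in S}
    z = sum(b2.values())
    return {s2: v / z for s2, v in b2.items()}

b = {0: 0.5, 1: 0.5}             # initially uncertain
b = belief_update(b, "a", "hot")  # observing "hot" shifts mass to state 1
# b == {0: 0.1, 1: 0.9} (up to floating-point rounding)
```

The belief state is a sufficient statistic for the observation history, so a POMDP can be recast as an MDP over belief states.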

Hidden Markov Models are a particular case of POMDPs in which there are no actions and no rewards. Based on this mathematical framework, several learning algorithms can be used for diagnosis and prognosis tasks. Given a proper description of the *state* of a system, it is possible to model it as a Markov chain. The dynamics of the system is modeled as *transition probabilities* between states. The information that an external observer of the system can acquire about it can be modeled using *observations*, which give only partial information on the state of the system. The problem of *diagnosis* is then to find the most likely state given a sequence of observations. *Prognosis* is akin to predicting the future state of the system given a sequence of observations and is thus strongly linked to diagnosis in the case of Hidden Markov Models. Given a proper corpus of diagnosis examples, AI algorithms enable the automated learning of an appropriate Hidden Markov Model that can be used for both diagnosis and prognosis. Rabiner gives an excellent introduction to HMMs and describes the most frequently used algorithms.
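To illustrate diagnosis as inference of the most likely state sequence, here is a minimal Viterbi sketch on a hypothetical two-state fault model (the states and numbers are invented for illustration; Rabiner's tutorial covers the algorithm in full):

```python
import math

# Most-likely state sequence (diagnosis) in a toy two-state HMM, computed
# with the Viterbi algorithm. The fault model is hypothetical.
states = ["healthy", "faulty"]
start = {"healthy": 0.8, "faulty": 0.2}
trans = {"healthy": {"healthy": 0.9, "faulty": 0.1},
         "faulty":  {"healthy": 0.2, "faulty": 0.8}}
emit = {"healthy": {"ok": 0.9, "alarm": 0.1},
        "faulty":  {"ok": 0.3, "alarm": 0.7}}

def viterbi(obs):
    """Return the most likely state sequence for an observation sequence."""
    # delta[s]: log-probability of the best path ending in state s
    delta = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p] + math.log(trans[p][s]))
            new_delta[s] = (delta[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][o]))
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    return paths[max(states, key=lambda s: delta[s])]

# Two alarms in a row are best explained by a switch to the faulty state:
# viterbi(["ok", "alarm", "alarm"]) -> ["healthy", "faulty", "faulty"]
```

Prognosis then amounts to propagating the diagnosed state distribution forward through the transition matrix.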

While substantial progress has been made in planning and control of single agents, a similar formal treatment of multi-agent systems is still missing. Some preliminary work has been reported, but it generally avoids the central issue in multi-agent systems: agents typically have different information and different knowledge about the overall system and they cannot share all this information all the time. To address the problem of coordination and control of collaborative multi-agent systems, we are conducting both analytical and experimental research aimed at understanding the computational complexity of the problem and at developing effective algorithms for solving it. The main objectives of the project are:

To develop a formal foundation for analysis, algorithm development, and evaluation of different approaches to the control of collaborative multi-agent systems that explicitly captures the notion of communication cost.

To identify the complexity of the planning and control problem under various constraints on information observability and communication costs.

To gain a better understanding of what makes decentralized planning and control a hard problem and how to simplify it without compromising the efficiency of the model.

To develop new general-purpose algorithms for solving different classes of the decentralized planning and control problem.

To demonstrate the applicability of new techniques to realistic applications and develop evaluation metrics suitable for decentralized planning and control.

In formalizing coordination, we take an approach based on distributed optimization, in part because we feel that this is the richest such framework: it handles coordination problems in which there are multiple and concurrent goals of varying worth, hard and soft deadlines for goal achievement, and alternative ways of achieving goals that offer a trade-off between the quality of the solution and the resources required. Equally important is the fact that this decision-theoretic approach allows us to model explicitly the effects of environmental uncertainty, incomplete and uncertain information, and action-outcome uncertainty. Coping with these uncertainties is one of the key challenges in designing sophisticated coordination protocols. Finally, a decision-theoretic framework is the most natural one for quantifying the performance of coordination protocols from a statistical perspective.

As far as stochastic planning is concerned, models based on Markov decision processes have been used increasingly by the AI research community since the mid-1990s. In association with the *ARC INRIA LIRE* and with P. Chassaing of the OMEGA project, our research group has contributed to the development of this field of research, notably by co-organizing workshops at the AAAI, IJCAI and ECAI conferences. We also maintain active collaborations with S. Zilberstein (on two NSF-INRIA projects) and with NASA (on a project entitled “Self-directed cooperative planetary rovers”), in association with S. Zilberstein and V. Lesser of the University of Massachusetts, E. Hansen of Mississippi State University, R. Washington (now at Google) and A.-I. Mouaddib of GREYC, Caen.

We have been using the strengths of the basic theoretical properties of the two major approaches to learning and planning that we follow to design exact algorithms able to deal with practical problems of high complexity. Instances of these algorithms include the JLO algorithm for Bayesian networks, and the Q-learning, TD(λ) and Witness algorithms for problems based on the Markov decision process formalism. While the majority of this work has been done in the United States, the French research community is catching up quickly by developing this domain further on its own. MAIA has been directly involved in making substantial contributions to this development, notably through our active participation in the (informally formed) group of French researchers working on MDPs. Thus, today there is a growing number of research labs in France with teams working on MDPs: to name a few, Toulouse-based labs such as IRIT, CERT, INRA and LAAS, the GREYC at Caen, INRIA Lille Nord Europe, and labs in Paris.

Most of our current work is focused on finding approximate algorithms. Besides applying these algorithms to a multi-agent system (MAS) framework, we have also been focusing on reducing the complexity of implementing them by making use of the meta-knowledge available in the system being modeled. Thus, in implementing the algorithms, we exploit the temporal, spatial and structural dynamics or functions of the given problem, which reduces the time needed to find approximate solutions. Moreover, we are seeking ways to combine these two forms of learning rigorously, and then to use them for applications involving planning or learning for agents situated in an environment.

One of the research themes of the MAIA project is that of collective intelligence. Collective intelligence concerns the design of reactive multi-agent systems to collectively solve a problem. Such reactive systems are made up of simple-behavior agents under decentralized control which, despite their individual simplicity, are able to collectively solve problems whose complexity is beyond the scope of individuals: the “intelligence” of the system can thus be viewed as a collective property.

One of the difficulties in the design of reactive multi-agent systems is to specify interactions between agents, and between agents and their environment, that are simple yet enable the society to fulfill its requirements with reasonable efficiency. This difficulty is proportional to the distance between the simplicity of the individuals and the complexity of the collective property.

We are interested in the design of such systems by the transposition of natural self-organized systems.

Reactive multi-agent systems are characterized by decentralized control (no agent has knowledge of the whole system) and by simple agents that have limited (possibly no) representation of themselves, of the others, and of the environment. Agent behaviors are based upon stimulus-response rules; decision-making relies on limited information about the environment and on limited internal states, and involves no explicit deliberation.

Thus the collective complexity that is observed comes out of the individual simplicity and is the consequence of successive actions and interactions of agents through the environment. Such systems involve two levels of description: one for individual behavior (with no reference to the global phenomena) and one to express collective phenomena.

The design problem can be summarized as the two following questions:

Considering a desired global property or behavior, how can individual behaviors and system dynamics be built so as to obtain it?

Considering a set of individual behaviors and a system dynamics, how can the global property be predicted (or guaranteed)?

Such a methodology is still missing and we contribute to this goal. We organize our research in three parts:

understanding collective intelligence by studying examples of such (natural) systems,

transposing principles found in example systems to solve problems, and

providing a framework to help analyze and formalize such systems.

The first part is to model existing self-organized phenomena and thus have a better understanding of the underlying mechanisms. For instance, social phenomena in biology provide many examples in which a collection of simple, situated entities (such as ants) can collectively exhibit complex properties which can be interpreted as a collective response to an environmental problem. We have worked with biologists and provided several models of self organized activities in case of spiders and rats.

Once individual models and system dynamics are established, the second part consists in transposing them in order to solve a given problem. The transposition consists in encoding the problem so that it can serve as input to the swarm mechanism; adapting the swarm mechanism to the specificities of the problem, and improving it if necessary for efficiency; and then interpreting the collective result of the swarm mechanism as a solution to the problem.

The third part aims at providing a framework to face the following issues:

Is it possible to describe such mechanisms in order to easily adapt and reuse them for several different instances of the problem (
*generic or formal description*)?

If such a generic description of a system is available, is it possible to assess the behaviour of the system in order to derive properties that will be conserved in its instantiations (*analysis and assessment of the system*)?

Among the two principal approaches to the study of multi-agent systems (MAS), we have chosen the line of “collective” systems which emphasizes the notions of interactions and organization. This choice is reflected in the numerous collaborations that we have undertaken with researchers of this field as well as in the kinds of research groups we associate and work with:

the AgentLink community in Europe, especially the members interested in self-organization, and

the research group “Colline” (under the aegis of GDR I3 and the AFIA) since 1997.

The approach that we have adopted for the design of multi-agent systems is based on the notion of self-organization, and it notably also includes the study of emerging properties. Although the research community working in this specific sub-domain is even smaller, it is growing steadily, especially through the work being done at IREMIA (at the University of Réunion), at IRIT (Toulouse), at LIRIS (Lyon), at LIRMM (Montpellier), and in certain other laboratories in the USA (D. Van Parunak and R. Brooks, for example) and in Europe: F. Zambonelli (University of Modena, Italy), P. Marrow (British Telecom ICT Research Centre, UK), G. Di Marzo Serugendo (University of Geneva, Switzerland), etc.

Some of these researchers have taken inspiration from biological models to study emerging properties. This current work is principally inspired by ant-colony models (such as at LIP6 and LIRMM in France, or at IRIDIA in Brussels, Belgium). We consider our use of models such as spider colonies or groups of rats an original contribution to this field, as they had never been used before. It must be mentioned that this field has been considerably influenced by the work of J.-L. Deneubourg of CENOLI (Brussels), which concerns self-organization phenomena in such colonies and the mechanisms of interaction by pheromones in ant colonies.

In order to carry out its basic research program, the MAIA team has developed, and continues to develop, strong know-how in sequential and distributed decision making. In particular, mathematical tools such as Markov decision processes, hidden Markov models and Bayesian networks are appropriate and are used by the team in the development of real applications such as:

monitoring the hydration state of patients suffering from kidney disease.

Through “Dialhemo” (see Sec. ), the MAIA team helps physicians monitor patients using stochastic models.

elderly fall prevention.

The PréDICA project (see Sec. ) illustrates the use of particle filtering to detect loss of autonomy for elderly people.

coordination of intelligent vehicles and swarms of UAVs (flying drones) (see Sec. ).

Bayabox is a toolbox for developing Bayesian network applications in Java. It supports algorithms for exact inference and parameter learning in directed graphical models with discrete or continuous Gaussian variables. Bayabox is used in the Transplantelic project (see Sec. ).

*Availability*: Not distributed.

*Contributors*: Cherif Smaili, Cédric Rose and François Charpillet.

*Contact*: francois.charpillet@loria.fr

The objective of the Dialhemo project is to develop a remote-surveillance and telediagnosis system adapted to renal-insufficiency patients treated by hemodialysis. The main objective is to ensure that people treated either at home or in self-dialysis centers have the same level of safety as in hospital. A first software system, developed in cooperation with Diatelic SA, Gambro and ALTIR, is currently being tested at several sites. About 150 patients currently benefit from this first system.

*Availability*: distributed by Diatelic SA

*Contributors*: Cédric Rose, François Charpillet

*Contact*: francois.charpillet@loria.fr

FiatLux is a cellular automata simulator that allows the user to experiment with various models and to perturb them. These perturbations can be of two types. On the one hand, perturbations of dynamics change the type of updating, for example from a deterministic parallel updating to an asynchronous random updating. On the other hand, the user may perturb the topology of the grid by removing links between cells randomly.

FiatLux may be run in an interactive mode with a Graphical User Interface or in a batch mode for longer experiments. The interactive mode is suited to small universes, whereas the batch mode may be used for experiments involving several thousands of cells. The software uses two external libraries, for random-number generation and for real-time observation of variables; it also provides output procedures that write to Gnuplot, TeX and HTML formats.

In 2009, the software evolved with the development of a module for simulating multi-agent systems (two models have been coded so far: the multi-Turmite model and DLA).
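The difference between the two updating modes that FiatLux lets the user compare can be sketched on a toy one-dimensional rule (a hypothetical local majority rule, chosen only for illustration; it is not one of the FiatLux models):

```python
import random

# One-dimensional binary cellular automaton on a ring, with deterministic
# parallel updating versus asynchronous random updating.
def rule(left, me, right):
    """Toy local majority rule (illustrative only)."""
    return 1 if left + me + right >= 2 else 0

def sync_step(cells):
    """Deterministic parallel updating: every cell fires at once."""
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]

def async_step(cells, rate=0.5):
    """Asynchronous random updating: each cell fires with probability
    `rate`, the others keep their state (a perturbation of the dynamics)."""
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            if random.random() < rate else cells[i]
            for i in range(n)]

cells = [random.randint(0, 1) for _ in range(64)]
for _ in range(100):
    cells = async_step(cells)
```

Comparing the trajectories of `sync_step` and `async_step` on the same initial configuration is precisely the kind of perturbation experiment the simulator supports.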

*Availability*: Download it at
http://

*Contributors*: Nazim Fatès

*Contact*: Nazim.Fates@loria.fr

FPG is a probabilistic planner that addresses its problem as a reinforcement learning one. Its principle is to optimise a controller's parameters (such as those of a neural network whose input is a state and whose output is a decision) using a domain simulator. It relies on the libPG library (see below).

Although the first version was meant to deal with concurrent temporal probabilistic planning, current development effort is focusing on FPG-ipc, the version used for the international planning competition.

*Availability*:
http://

*Contributors*: Olivier Buffet, Douglas Aberdeen, Joerg Hoffmann

*Contact*: olivier.buffet@loria.fr

LibPG is a high-speed, modular C++ implementation of many popular RL algorithms for MDPs and POMDPs, including: Baxter and Bartlett's GPOMDP/OLPOMDP, Jan Peters' Natural Actor-Critic, various PSR algorithms from Satinder Singh's papers, online PSRs from McCracken and Bowling, HMM estimation of hidden state from observations, and finite-history methods.

It requires the uBlas components of the Boost library. Having Lapack and Atlas will also open up more features.

*Availability*:
http://

*Contributors*: Olivier Buffet, Douglas Aberdeen

*Contact*: olivier.buffet@loria.fr

In 2009 we were contacted by the Canadian defence department regarding the presentation and use of the simulator and the pheromone-based algorithms.

*Availability*:
http://

*Contributors*: Olivier Simonin, Arnaud Glad, François Charpillet

*Contact*: Olivier.Simonin@loria.fr

In the CRISTAL project (from the Pôle de compétitivité Véhicule du Futur), we explore new models for the safe platooning of autonomous vehicles (see Sec. ). Studying platooning models and identifying global complex behaviors (e.g. non-linear behaviors such as oscillations and the amplification of perturbations) can be addressed through the individual-based simulation approach. As no such tool existed for platooning, we proposed an original real-time multi-agent simulator mixing the event-based approach and the influence-reaction model, the latter ensuring the simulation of simultaneous actions by several autonomous entities. Finally, we developed 1D and 2D graphical viewers, as well as connections to 3D world viewers.

*Contributors*: Arnaud Glad, Olivier Simonin, François Charpillet, Alexis Scheuer

*Contact*: Olivier.Simonin@loria.fr, Arnaud.Glad@loria.fr

`wifibotlib` is a low/mid-level library for controlling WifiBot robots. This library allows interaction with the Wifibot either at the hardware level (setting motor speeds, reading sensors) or at a slightly more abstract level (advance, turn, stop). This software is available on the INRIA gforge at
http://

*Availability*: Download at
http://

*Contributors*: Nicolas Beaufort, Jérôme Béchu, Alain Dutech, Julien Le Guen, François Rispal, Olivier Rochel

*Contact*: Alain.Dutech@loria.fr

Through various internships and master's projects, MAIA has greatly contributed to a low/mid-level toolbox for interacting with a fleet of KheperaIII robots. Currently, the development of this toolbox benefits greatly from the ADT ROMEA.

*Contributors*: Nicolas Beaufort, Alain Dutech, Romain Mauffray, Olivier Rochel, Olivier Simonin

*Contact*: Olivier.Rochel@loria.fr

`mdptetris` is a project that gathers our C software related to our work on the game of Tetris. It contains a highly optimized Tetris game simulator and several modules for computing good controllers: exact dynamic programming for a small version of the game, approximate dynamic programming, cross-entropy optimization, CMA-ES (Covariance Matrix Adaptation Evolution Strategy), and UCT (Upper Confidence Tree).

*Availability*: Download at
http://

*Contributors*: Bruno Scherrer, Christophe Thiery, Amine Boumaza, Olivier Teytaud (from TAO, INRIA)

*Contacts*: Bruno.Scherrer@loria.fr, Christophe.Thiery@loria.fr

ISeeML is an Integrated Smooth, Efficient and Easy-to-use Motion Library. It offers a simple way to compute continuous-curvature paths for car-like robots, using line segments, circular arcs and pieces of clothoids: simple C++ constructors are provided for these paths, as well as for more classical paths (*i.e.* Dubins' paths).

ISeeML will soon be distributed under the GPL license: it is currently being registered with the APP (French Agency for Program Protection).

*Availability*: Presented at
http://

*Contributors*: Alexis Scheuer

*Contacts*: Alexis.Scheuer@loria.fr

The keyword for our recent work on stochastic models is “distributed”. In terms of decentralized control, we have developed exact and approximate methods for the Decentralized Partially Observable Markov Decision Processes framework (DEC-POMDP) and investigated the use of game theory inspired concepts for learning to coordinate. We have also unveiled strong links between optimal and harmonic control and discussed some implications of these links for the distributed computation of optimal trajectories.

There is a wide range of application domains in which decision-making must be performed by a number of distributed agents that try to achieve a common goal. This includes information-gathering agents, distributed sensing, coordination of multiple distributed robots, decentralized control of a power grid, autonomous space exploration systems, network traffic routing, decentralized supply chains, as well as the operation of complex human organizations. These domains require the development of a strategy for each decision maker assuming that decision makers will have limited ability to communicate when they execute their strategies, and therefore will have different knowledge about the global situation.

Our research team is focusing on the development of a decision-theoretic framework for such collaborative multi-agent systems. The overall goal is to develop sophisticated coordination strategies that stand on a formal footing. This enables us to better understand the strengths and limitations of existing heuristic approaches to coordination and, more importantly, to develop new approaches based on these more formal underpinnings. One important result is that the theory of Markov Decision Processes proves particularly powerful in this context. In particular, we are extending the MDP framework to problems of decentralized control.

By relying on concepts from Decision Theory and Game Theory, we have proposed algorithms for decentralized stochastic models. These new results relate to both planning and learning. This work has been supported in part by the INRIA associated team with UMass, involving S. Zilberstein.

Mahuna Akplogan participated during his internship.

The DEC-POMDP model, proposed by S. Zilberstein in 2000, was one of the first models to formally describe distributed decision problems, but previous work has shown that building optimal policies for the agents in this context is intractable in practice (NEXP complexity). Our work is based on the observation that the interactions among agents which can structure the problem are not explicitly represented. We assume that this may be one of the reasons why solving DEC-POMDPs is difficult, and that representing interactions can open new perspectives in collective reinforcement learning.

Guided by these hypotheses, we have in the past proposed 1. an original formalism, the Interac-DEC-POMDP, in which interactions are explicitly represented so that agents can reason about the use of interactions and their relationships with others, and 2. a new general-purpose decentralized learning algorithm, based on a heuristic distribution of rewards among agents during interactions, to build their policies. However, it was difficult to compare Interac-DEC-POMDPs with DEC-POMDPs due to the particular structure of Interac-DEC-POMDPs.

We are currently pursuing a similar approach through the concept of social actions, as a way to represent actions and interactions in a uniform manner that is closer to the DEC-POMDP formalism, while allowing the agents to learn and to reason about their interactions with the other agents of the system.

Raghav Aras is a former PhD student of MAIA and now an external collaborator.

Studying decentralized reinforcement learning, so as to allow multi-agent systems to learn to coordinate, from the point of view of Game Theory led us to formulate a new approach for solving Dec-POMDPs. This new formulation can also be applied to POMDPs.

More specifically, we address the problem of finding exact solutions for finite-horizon decentralized decision processes for n agents, where n is greater than two. Our new approach is based on two ideas:

we represent each agent's policy in the sequence-form and not in the tree-form, thereby obtaining a very compact representation of the set of joint-policies.

using this compact representation, we solve this problem as an instance of combinatorial optimization for which we formulate a mixed integer linear program (MILP).

Our new algorithm has been experimentally validated on several classical problems often used in the Dec-POMDP community.

The impact of our new approach remains to be evaluated. While our algorithm is quicker than other exact algorithms, the improvement is not very large. A valid question is whether our new approach can inspire new algorithms, either for infinite-horizon problems or for finding approximate solutions to finite-horizon problems. This work was concretized this year by a major publication:

Planning in the Dec-POMDP framework has been shown to be very difficult: it is NEXP-complete for a finite horizon. The exact dynamic programming construction of policy trees is generally exponential in the planning horizon and the number of agents, and doubly exponential in the number of observations.

Until recently, only problems with a very small horizon could be solved. Recent approximate point-based, memory-bounded approaches have been able to find plans for larger horizons: MBDP (Memory Bounded Dynamic Programming), IMBDP (Improved MBDP), MBDP-OC (MBDP with Observation Compression), and PBIP (Point Based Incremental Pruning). They bound the number of policy trees to a fixed value, maxTrees. maxTrees points are generated as approximations of the prior probability distribution over states, and for each point the best policy tree is kept for each agent.

We are investigating whether probabilistic heuristic information (currently in the form of a prior probability distribution over beliefs) can be generated and used to compute solutions of better quality. Based on this idea, this year we have proposed a new approach for point-based memory-bounded dynamic programming planning using this information, which, we expect, will lead to solutions of better quality: the problem of choosing the policy trees is formulated as a combinatorial optimisation problem whose objective is to maximise the expectation (given the heuristic distribution) of the sum of subsequent rewards.

In practice, the results depend heavily on the heuristics chosen. Overall, the approach finds very good solutions, often much better than those of MBDP. The computation time is lower than that of MBDP, often by one or two orders of magnitude.

In the future, we expect to adapt the approach in order to scale better with the number of observations: this would enable us to compare with algorithms such as IMBDP, MBDP-OC and PBIP on problems with a higher number of observations. A longer term goal is to see whether the approach may be used to scale better with the number of agents.

Anne Boyer, Armelle Brun and Ahmad Hamad (Kiwi team, Loria) are external collaborators.

Classical approaches in recommender systems need ratings (*i.e.*, utilities) to represent user preferences and deduce the unknown preferences of other users. In this work, we focused on an original way to represent preferences, in the form of preference relations. This approach can be viewed as a qualitative representation of preferences, as opposed to the usual quantitative representation. The only information known is whether a user prefers one item over another; "how much" the item is preferred is not known. This approach has the advantage of not requiring users to rate items on a small scale of integer values, which may be a difficult task, and the resulting ratings may depend strongly on the user, his mood, the preceding items he has rated, etc. We have proposed to adapt classical measures that exploit utilities, such as the similarity measure between users, so that they exploit preference relations instead. First experiments have been conducted on a well-known user data set that represents user utilities, which we transformed into preference relations. First results have shown that this approach performs comparably to the classical approach. This work has been accepted for publication in the proceedings of the French conference RFIA 2010.

A large class of sequential decision making problems — namely active sensing — is concerned with acting so as to maximize the acquired information. Such problems can be cast as a special form of Partially Observable Markov Decision Processes where 1) the reward signal is linked to the information gathered and 2) there may be no actions with effects on the state of the system. These problems imply reasoning about belief states, and therefore involve dealing with large (if not continuous) state spaces.

Preliminary experiments have been conducted on a “hide and seek” problem where a predator wants to locate a prey in a complex environment. Using a probabilistic occupancy map, we have investigated two interesting heuristic approaches:

to keep on moving towards the point of highest occupancy probability; and

to search for the sequence of moves that is likely to maximize the acquired information (under some simplifying assumptions).
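The second heuristic can be sketched under the simplifying assumption that sensing a cell perfectly reveals whether the prey is there (names and the one-step greedy restriction are illustrative, not our actual implementation):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def info_gain(occ, cell):
    """Expected entropy reduction from sensing `cell` of a probabilistic
    occupancy map `occ` (probabilities over prey positions, summing to 1).
    Assumes a perfect detector: either the prey is found (entropy drops
    to 0) or the cell is ruled out and the map is renormalized."""
    p = occ[cell]
    h_now = entropy(occ)
    rest = occ.copy()
    rest[cell] = 0.0
    h_not_found = entropy(rest / rest.sum()) if rest.sum() > 0 else 0.0
    return h_now - (1.0 - p) * h_not_found
```

A greedy predator would then move towards the reachable cell maximizing `info_gain`; on a map concentrated on one cell, that cell also yields the largest gain, so the two heuristics coincide in that case.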

More recently, an in-depth study of existing work led us to a typology of active sensing problems and approaches. Our objective is to better understand this problem class and to possibly identify some specific structures that could be exploited.

This research will be further developed in the context of the COMAC project (Section ) concerned with the low cost identification of defaults in aeronautics parts made of composite materials by selecting the observations to perform (which sensor, where, at which resolution), with a possible extension to multiple collaborative active sensing agents.

We have deepened our understanding and analysis of the λ-Policy Iteration algorithm (or Temporal-Difference-Based Policy Iteration), which generalizes Value Iteration and Policy Iteration by introducing a parameter λ ∈ (0, 1) that allows one to vary continuously from one algorithm to the other. In , we have proposed a modified version of this algorithm, analogous to the well-known modified version of Policy Iteration, and we have proved that it converges to the optimal solution. Using analytical and empirical arguments, we have underlined the fact that values of λ smaller than 1 are not interesting when computations are made exactly. We expect this parameter to be useful in an approximate setting, which is currently under investigation.
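The interpolation between the two classical algorithms can be sketched on a finite MDP with the textbook λ-PI recursion (this is a minimal illustration, not the modified algorithm of the paper):

```python
import numpy as np

def lambda_policy_iteration(P, R, gamma, lam, iters=200):
    """λ-Policy Iteration sketch: v <- v + (I - λγP_π)^{-1}(T_π v - v),
    with π greedy with respect to v. λ=0 recovers Value Iteration,
    λ=1 recovers Policy Iteration.

    P: (A, S, S) transition matrices; R: (A, S) rewards.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * (P @ v)            # (A, S) action values
        pi = q.argmax(axis=0)              # greedy policy
        P_pi = P[pi, np.arange(S), :]      # (S, S) rows of chosen actions
        T_pi_v = q[pi, np.arange(S)]       # Bellman backup under pi
        v = v + np.linalg.solve(np.eye(S) - lam * gamma * P_pi, T_pi_v - v)
    return v, pi
```

On any small MDP, all values of λ converge to the same optimal value function, only at different speeds, which is the behaviour the analysis above formalizes.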

The game of Tetris is a very large (and therefore challenging) optimal control problem. In , , we consider the problem of designing a controller for this game. We use the cross-entropy method to tune a rating-based one-piece controller based on several sets of features, among which some original ones. This approach leads to a controller that outperforms the previously known results. On the original game of Tetris, we show that with probability 0.95 it achieves at least lines per game on average. On the simplified version of Tetris considered by most research works, it achieves lines per game on average.
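The weight-tuning loop can be sketched as a noisy cross-entropy method over an abstract score function (the actual Tetris simulator, feature sets and noise schedule are not reproduced here; all names are illustrative):

```python
import numpy as np

def cross_entropy_method(score, dim, n_samples=100, n_elite=10,
                         n_iters=50, seed=0):
    """Noisy cross-entropy method (sketch): repeatedly sample controller
    weight vectors from a Gaussian, keep the elite fraction, and refit
    the Gaussian to it. The decreasing noise term added to the standard
    deviation prevents premature convergence. `score(w)` is the
    (possibly noisy) performance of weight vector w, e.g. the average
    number of lines cleared by the rating-based controller."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for t in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, dim))
        scores = np.array([score(w) for w in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1.0 / (t + 2)   # decreasing noise
    return mu
```

On a simple quadratic score the method recovers the optimum; in the Tetris setting each `score` evaluation is itself an average over several games, which is why the method's tolerance to noisy evaluations matters.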

L. Cucu-Grosjean (TRIO team, LORIA) is an external collaborator.

Many embedded systems (e.g. in cars or planes) have to handle repetitive tasks (with different periods) using several processors. However, existing work focuses on distributing jobs over the processors under the assumption that their execution times are fixed, which requires considering worst-case execution times.

Up to now, we have considered the problem of scheduling jobs over processors in this worst-case deterministic scenario. We have formalized this problem as a constraint satisfaction problem (CSP) and studied various exact and heuristic resolution algorithms .

Our objective is to turn to the uncertain scenario where probability distributions over task durations are known. This will require modelling the problem as an MDP and looking for the most appropriate resolution techniques.
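In the worst-case deterministic setting, the flavour of the CSP can be sketched with a toy backtracking search (a stand-in for the exact algorithms we studied; the capacity constraint and names are illustrative, the real problem also involves periods and deadlines):

```python
def schedule(jobs, n_procs, capacity):
    """Backtracking sketch of the CSP: assign jobs (given as worst-case
    execution times) to processors without exceeding each processor's
    capacity. Returns a job-to-processor assignment, or None if the
    instance is infeasible."""
    loads = [0] * n_procs
    assignment = [None] * len(jobs)

    def backtrack(i):
        if i == len(jobs):
            return True
        for p in range(n_procs):
            if loads[p] + jobs[i] <= capacity:
                loads[p] += jobs[i]          # tentative assignment
                assignment[i] = p
                if backtrack(i + 1):
                    return True
                loads[p] -= jobs[i]          # undo and try next processor
                assignment[i] = None
        return False

    return assignment if backtrack(0) else None
```

In the stochastic extension we target, the fixed `jobs[i]` durations become random variables, which is what motivates the move to an MDP formulation.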

Nicolas Beaufort and Jérôme Bechu contributed to this work during their internships.

Applying Reinforcement Learning on a robot is a difficult task because of the following limitations:

Learning must deal with continuous state and action spaces.

Learning must be able to take advantage of very few experiences, as the cost of obtaining new experiences can be high, especially where time is concerned.

Nevertheless, taking inspiration from the work of W. Smart , we investigated the notion of *efficient* reinforcement learning. We designed a simple artificial experiment in which a WifiBot has to detect and move to a given target before stopping in front of it. Provided the robot can detect and identify the target with its camera, showing the robot a path to the target 10 or 20 times is enough for it to *learn* to do it by itself. This was achieved by combining Peng's eligibility traces with a locally weighted approximation of the Q-value of the MDP underlying the behavior of the robot.
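The eligibility-trace mechanics can be sketched in tabular form (our actual implementation replaces the table by locally weighted regression; the names and the accumulating-trace choice here are illustrative):

```python
import numpy as np

def q_lambda_update(Q, E, s, a, r, s2, alpha, gamma, lam):
    """One Peng-style Q(λ) backup (tabular sketch): the TD error is
    propagated to all recently visited state-action pairs through the
    eligibility trace E, which is not cut on exploratory actions."""
    delta = r + gamma * Q[s2].max() - Q[s, a]   # TD error
    E[s, a] += 1.0                              # accumulating trace
    Q += alpha * delta * E                      # backup along the trace
    E *= gamma * lam                            # decay all traces
    return Q, E
```

The trace is what lets 10 or 20 demonstrations suffice: a single rewarded transition updates the whole recent trajectory at once instead of one state-action pair.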

Using the newest KheperaIII robot, we also worked on an indirect reinforcement learning algorithm. The goal of the robot is to learn a model of its environment, using an approximation of the transition probabilities associated with its various actions, with only the smallest amount of external guidance by a human operator. The model is learned on a continuous state space, and we have reused the locally weighted approximation algorithm previously developed. One of the strong points of our approach is a tight coupling between the behavior induced by reinforcement learning and more basic behaviors (like obstacle avoidance) in a kind of subsumption architecture, which allows the agent to navigate efficiently to its goal despite very crude and uncertain perceptions.

A side effect of this work is the creation of two low/mid-level libraries for controlling WifiBot (TM) and KheperaIII robots. These libraries allow interaction with the robots either at the hardware level (setting motor speeds, reading sensors) or at a slightly more abstract level (advance, turn, stop). This software is available on the INRIA gforge at
http://

In order to obtain an autonomous navigation system for intelligent transportation systems, vehicle embedded systems have to complete several tasks: localization, obstacle detection, trajectory planning and tracking, lane departure detection, etc. In our work, we study approaches based on stochastic methods for multi-sensor data fusion, taking into account in particular the quality and integrity of the fusion results. Since the system is safety-critical, we pay special attention to estimating the confidence that can be placed in the correctness of the estimates supplied by the whole system. We consider that managing multiple hypotheses can be a useful strategy to treat ambiguous situations induced by sensor uncertainty or failure. The multi-sensor fusion and multi-modal estimation are realized using a hybrid Bayesian network (HBN). Multi-modal estimation is a way to manage multiple hypotheses for the localisation task, in order to account for the possible imprecision or failure of a sensor or information source [cite:SMAILI:2008:INRIA-00339350:1]. Several problems have to be tackled in order to develop fault-tolerant data fusion approaches for safe vehicle localisation, such as the convergence robustness and divergence detection of multi-sensor fusion methods in the presence of sensor measurement errors.
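The interplay between fusion and divergence detection can be illustrated on the simplest possible case, two scalar Gaussian estimates (this is a deliberately minimal sketch, far simpler than the hybrid Bayesian network actually used; the gating threshold is a hypothetical value):

```python
def fuse(mu1, var1, mu2, var2, gate=9.0):
    """Inverse-variance fusion of two scalar Gaussian estimates (sketch).
    A chi-square-like gate on the normalized innovation flags mutually
    inconsistent sensors instead of blindly fusing them; returning None
    here stands for keeping the hypotheses separate."""
    innovation = (mu1 - mu2) ** 2 / (var1 + var2)
    if innovation > gate:
        return None                        # divergence detected
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    return mu, 1.0 / (w1 + w2)             # fused mean and variance
```

The multi-hypothesis strategy described above generalizes the `None` branch: instead of discarding a divergent estimate, separate localisation hypotheses are maintained until further evidence arbitrates between them.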

Our work also studies the optimal way to use new information sources, such as a geographical 3D model managed in real time by a 3D Geographical Information System (3DGIS), to improve an autonomous navigation system. In the last two years, we have developed approaches for mono- and multi-vehicle localisation, map matching and obstacle detection. Experimental results with real data are used to validate the developed approaches, and demonstrators are under development. This work is performed in the context of the FD2S project of the GIS-3SGS and the CRISTAL project , .

Classical automated planning differs from Markov Decision Processes in that 1) transitions are deterministic and 2) the system is modelled in a structured — hence compact — manner using state variables. This section presents work related to both classical (deterministic) planning and probabilistic planning, where problems involve both a structured representation and uncertainty in the system's dynamics.

Douglas Aberdeen (Google Zürich) is an external collaborator.

FPG addresses probabilistic planning. A key issue in such planning is to exploit the problem's structure to make large instances solvable. Most approaches are based on algorithms which explore — at least partially — the state space, so their complexity is usually linked to the number of states. Our approach — joint work with Douglas Aberdeen — is very different in that it explores a space of parameterized controllers. By choosing factored controllers (one sub-controller per action) with state variables as inputs, we strongly reduce the complexity of the problem.

In practice, the *Factored Policy-Gradient* (FPG) planner uses a policy-gradient reinforcement learning algorithm (coupled to a simulator) to optimize a controller based on a linear network (or a multi-layer perceptron). Although suboptimal, FPG proved to be very efficient, winning the probabilistic track of the 2006 international planning competition. One of its strengths lies in generalization: an action known to be good in certain states will be preferred in similar states. This novel approach is presented in full detail in .
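The policy-gradient core can be sketched as a REINFORCE-style update of a softmax controller over state-variable features (a minimal stand-in for FPG's actual algorithm, which uses online policy-gradient estimates from a simulator; names are illustrative):

```python
import numpy as np

def fpg_gradient_step(theta, episodes, alpha=0.01):
    """One REINFORCE-style update for a factored softmax controller
    (sketch): `theta` holds one weight vector per action, applied to the
    state-variable feature vector, echoing FPG's sub-controller-per-action
    design. `episodes` is a list of (features, action, return) triples."""
    for phi, a, G in episodes:
        logits = theta @ phi
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad = -np.outer(probs, phi)   # d log pi(a|phi) / d theta, all rows
        grad[a] += phi                 # extra term for the chosen action
        theta += alpha * G * grad      # ascend the return-weighted gradient
    return theta
```

Because the controller scores actions through shared state-variable features, reinforcing an action in one state raises its probability in all states with similar features — the generalization effect noted above.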

Recent work includes comparing FPG with a probabilistic planner (named FQL) based on a Q-learning algorithm. Although both algorithms use similar function approximators, FQL fails to provide good policies. Current research looks at using more appropriate policy-search algorithms based on population or actor-critic algorithms.

We are also developing a new method for reward shaping, where the problem's reward function is modified so as to encourage progress towards goal states. The reward shaping is non-intrusive in that it does not change the optimal solution to the problem; it is essential to success in problems with large search spaces, where reaching the goal by a pure random walk is very unlikely. The shaping is based on progress estimates which are derived from the structure of the problem. In a pre-process, our technique automatically detects landmarks — variable values that every successful path must at some point traverse — as well as pairwise constraints on the order in which that will happen. The progress estimator is based on reasoning about how many landmarks still need to be achieved. A paper is in preparation for ECAI 2010.
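A landmark-counting shaping of this kind can be sketched in the standard potential-based form, which is what guarantees the non-intrusiveness (the potential below ignores the ordering constraints; names are illustrative):

```python
def potential(landmarks, achieved):
    """Progress estimate (sketch): minus the number of landmarks still to
    be achieved. Ordering constraints are ignored in this simplification."""
    return -len(landmarks - achieved)

def shape(r, phi_s, phi_s2, gamma=0.99):
    """Potential-based shaping term: r' = r + gamma*Phi(s') - Phi(s).
    Shaping of this form is known not to change the optimal policy, so
    progress towards the goal is rewarded without biasing the solution."""
    return r + gamma * phi_s2 - phi_s
```

Each transition that achieves a new landmark receives a positive shaped reward, giving the search a gradient towards the goal even when the original reward is only obtained at the end.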

Ingo Weber (University of New South Wales, Australia) and Frank Michael Kraft (SAP, Germany) are external collaborators.

The behavior of certain software artefacts can be naturally expressed, at an appropriate level of abstraction, in terms of their effect on state-variable values. At SAP, a set of 2700 system transactions, which underlie the execution of business processes, have been modeled in this way (the uncertainty lies in the fact that many transactions may have different outcomes depending on details that are abstracted away at the level of the model). Our work leverages this model by providing a formalization in a planning language, and an adaptation of an existing planning tool. The resulting technology fully automatically composes useful business process fragments, requiring as input only the "goal", i.e., a specification of which variables should assume which values. This corresponds well to the background and language of the targeted group of end users (managers); in a prototype developed at SAP, the goal specification is given using simple drop-down menus. The technical description of the planning aspects of the work is published in an ICAPS'09 workshop ; a full paper is in preparation for AAAI 2010.

Hector Palacios (Universidad Simon Bolivar, Caracas, Venezuela) is an external collaborator.

An interesting question for several types of cellular automata is which behaviors lead to a stable system state where no more changes can be made. This corresponds to the problem of planning from an initial system state to a stable state. Adding uncertainty about what the initial state is, the task for the planner is to find a general strategy that leads to a stable state from many (in the extreme case, from all possible) start states. We have formulated this problem in a planning language, and are investigating under which conditions, and to what degree of generality, existing planning tools can solve it. A paper is in preparation for ICAPS 2010.

Carmel Domshlak (Technion Haifa, Israel) and Ashish Sabharwal (Cornell University, USA) are external collaborators.

Planning as SAT is one of the most effective known approaches for finding plans with an optimality guarantee. The bottleneck lies in the optimality proof, which entails proving that no shorter plan exists. Similar disproval tasks have been addressed very successfully in Verification by considering abstractions (over-approximations) of the system at hand. We applied this methodology to Classical Planning, and found that, somewhat surprisingly, hardly any empirical benefit can be gained. Towards explaining this, we have conducted a theoretical analysis, revealing that, in many of the considered SAT encodings of planning, abstraction cannot improve the best-case behavior of resolution. This finding may be relevant as well for other areas (like Verification) where both abstraction and SAT solving have been successful. The results are presented in .

There are many situations which require us to deal with strongly interacting, massively parallel and decentralized systems. This is what brought us to work in the field of self-organized systems. These systems are described by various formal models such as reactive multi-agent systems or cellular automata. The work of the team mixes both theoretical and experimental approaches and seeks to provide applications in the field of image processing, localization and tracking, and bio-inspired problem solving.

L. Ciarletta (Madynes team) is an external collaborator for this action.

In distributed, dynamic networks and applications, such as Peer-to-Peer (P2P) networks or Mobile Ad hoc NETworks (MANETs), users' behaviour has a strong influence on the quality of service (QoS), and the QoS in turn influences users' behaviour. In the worst case, these mutual influences can lead the system to crash. We propose a novel approach to model the relationships between users and QoS.

We propose to use models and simulators from both fields (computer networks and human behaviour) and to make them interact. This raises some coordination issues (synchronization, compatibility and coherence).

We first implemented this proposal by adapting an existing simulator (PeerfactSim); this is called a strong-coupling approach. We undertook experiments to study the influence of the users' cooperation rate and of the data pollution rate on the functioning of the network . These first experiments showed us the limits of such a strong, centralized approach.

We therefore propose to use the Agent and Artefact paradigm in order to deal with the coordination issues and to make the interaction of the existing simulators and models decentralized and as simple as possible. We have developed a decentralized coordination framework called AA4MM.

The aim of this framework is to make heterogeneous simulators interact in such a way that coordination and integration issues are transparent for the people involved in the simulation process. When someone wants to include an existing simulator within the AA4MM framework, only a few changes are needed.

Moreover, the framework is based upon a decentralised coordination model we proposed, which has been formalised (in Event-B) in collaboration with Joris Rehm.

The source code and a JMS implementation have been developed in collaboration with Virginie Galtier and L. Ciarletta.

This framework is currently used to study the impact of user mobility on the performance of MANETs.

In a reactive multi-agent system (MAS), the link between the collective behaviour and the behaviours of the individuals who make up the system is difficult to establish. We support the concept of driving the behaviour of a MAS by a control approach. To obtain this control, we act on the MAS using information about its global behaviour.

The originality of our proposal lies in the global-level description of the MAS's dynamics. Thus the different behaviours of the MAS are expressed, in our proposal, at the same description level as that of the target behaviour. We developed a clustering measure to identify the current behaviour in a MAS representing pedestrians.

We studied this approach and its control performance, that is, its capacity to reach a target behaviour even if the MAS is initialised in a stable but undesired behaviour. It remains efficient when the MAS undergoes a perturbation. The use of luring agents as control actions has also been studied. The proposal provides good control performance on a case-study MAS and achieves a target behaviour more frequently than the other tested approaches. The details of the experiments are provided in the PhD thesis .

This work , , is an attempt to formalize swarm intelligence from the perspective of complex-systems science. Its purpose is to design a generic model of situated reactive multi-agent systems capable of explaining collective behaviors resulting from self-organization mechanisms, such as those observed in natural systems like bird flocking or ant foraging.

The model we propose integrates decisional mechanisms inspired by coupled map lattices (CML), introduced in 1986 by the physicist Kunihiko Kaneko for the study of chaotic space-time phenomena. Roughly speaking, CML can be seen as cellular automata in continuous space in which the transitions are controlled by chaotic nonlinear functions such as the logistic function (the logistic map is a polynomial mapping, often cited as an archetypal example of how complex chaotic behaviour can arise from very simple nonlinear dynamical equations).

This source of inspiration has several advantages: the mathematical framework is well suited to modelling dynamical systems such as those we want to study, and interesting mathematical results are available.

Cellular automata can be seen as the environment part of a multi-agent system. Formally, they are discrete dynamical systems, and they are widely used to model natural systems. Classically they are run with perfect synchrony; *i.e.*, the local rule is applied to each cell at each time step. A possible modification of the updating scheme consists in applying the rule with a fixed probability, called the synchrony rate.
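The α-asynchronous updating scheme can be sketched on a one-dimensional automaton (names are illustrative; our experiments use two-dimensional automata such as Greenberg-Hastings):

```python
import numpy as np

def async_step(grid, rule, alpha, rng):
    """One α-asynchronous update of a 1-D cellular automaton (sketch):
    each cell applies the local `rule` with probability `alpha` (the
    synchrony rate), reading the old configuration. alpha=1 recovers the
    classical synchronous update; alpha=0 leaves the grid unchanged."""
    new = grid.copy()
    updated = rng.random(grid.size) < alpha
    for i in np.flatnonzero(updated):
        left = grid[(i - 1) % grid.size]     # periodic boundary
        right = grid[(i + 1) % grid.size]
        new[i] = rule(left, grid[i], right)
    return new
```

Sweeping `alpha` from 1 down to 0 in Monte-Carlo runs is exactly the kind of experiment in which the phase transitions discussed below are observed.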

It has been shown previously that the updating method of a cellular automaton can produce a discontinuity in its behaviour. We investigated the nature of this change of behaviour using Monte-Carlo numerical simulations. For a stochastic version of the Greenberg-Hastings CA, we showed that the phenomenon is a phase transition whose critical exponents are in good agreement with the predicted values of directed percolation . We wrote a short survey gathering the references relevant to cellular automata and critical phenomena . We also contributed to a collective book dedicated to the Game of Life, edited by A. Adamatzky. Our chapter, entitled "Does *Life* resist desynchronisation?", examines the behaviour of the well-known cellular automaton under asynchronous updating .

These phase transitions were also observed in the context of bio-inspired computing. We examined how to model cellular societies such as the *Dictyostelium discoideum* amoebae. We proposed a simple model of their behaviour, which allows a great number of agents to gather at the same location without any need for centralised control .

It is a well-known problem that there exists no agreement in the scientific community on how multi-agent systems should be defined formally. The practice so far has been either to use an *ad hoc* formalism to describe a model, or to present a model informally and to analyse the simulations obtained with a particular simulation platform. As a result, two major drawbacks appear: (a) reproducing the experiments on another platform is difficult, if not impossible, since one needs to have all the implicit parameters of a simulation at hand (e.g., the order in which the agents are updated); (b) it is not clear whether the behaviour observed is due to the rules defining the agents and the environment, or to the "simulation scheme" that governs the interaction between the components of a multi-agent system.

To tackle this problem, we focused our efforts on two directions:

We examine the relationship between cellular automata and multi-agent systems. A first method to automatically translate a reactive multi-agent model into a cellular automaton was proposed; see the ICAART publication .

We propose a framework for describing multi-agent systems as discrete dynamical systems. As a starting point, we have restricted this study to very simple agents, where the cells of the environment are binary and the agents' actions are restricted to turning left/right, moving forward and changing the state of the cells on which they are located (submitted).

Swarm intelligence emerges from the interactions of a large number of simple agents. In this context, we are interested in algorithms relying on the marking of the environment, which is used as a common memory (a well-known example is the pheromone dropped by ants to perform indirect communication). Such an approach is now called digital pheromones, as marks are values that can be read and written by agents in the cells of a discrete environment. In this framework we address the following challenge: understanding and designing self-organized systems, deploying these models physically, and enabling their interaction with real robots.

We proposed in 2007 a reactive-agent algorithm, called EVAP, to deal with multi-agent patrolling, based on the marking and evaporation of a digital pheromone (cf. publications ICTAI'07, JFSMA'07). During the simulations carried out to measure the performance of EVAP, we found that the system can self-organize towards an optimal behavior. In particular, we observed that agents tend to follow stable cycles corresponding to a Hamiltonian covering of the environment. We then established the mathematical proof that the system can stabilize only in cycles, one per agent, all of the same length (cf. publication at ECAI'2008). Moreover, we recently introduced new heuristics in the agent behavior that dramatically improve the convergence time, and we proved that under some hypotheses the system always converges to stable cycles (these results have been published at the SASO and JFPDA conferences). This work is in line with the PhD thesis of Arnaud Glad (since December 2007), which concerns the understanding and optimization of the self-organization resulting from such algorithms.
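The evaporation-based patrolling rule can be sketched as follows (a simplified EVAP-like step on a toroidal grid; the deposit/evaporation constants and names are illustrative):

```python
import numpy as np

def evap_step(pheromone, positions, deposit=1.0, rho=0.1):
    """One step of an EVAP-like patrolling sketch: the pheromone field
    evaporates everywhere, then each agent marks its current cell and
    moves to the 4-neighbour carrying the least pheromone, i.e. the
    least recently visited cell (grid treated as a torus)."""
    pheromone *= (1.0 - rho)                    # evaporation
    h, w = pheromone.shape
    for k, (i, j) in enumerate(positions):
        pheromone[i, j] += deposit              # mark the current cell
        neigh = [((i - 1) % h, j), ((i + 1) % h, j),
                 (i, (j - 1) % w), (i, (j + 1) % w)]
        positions[k] = min(neigh, key=lambda c: pheromone[c])
    return pheromone, positions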

In the context of path planning, we recast the classical Artificial Potential Field (APF) computation proposed by Barraquand & Latombe as an asynchronous, collective construction by reactive agents. We proved that this model builds an optimal APF while dealing with the collective foraging problem (search and transport of resources by a set of autonomous agents/robots). In 2008, we extended the simulations and measures by introducing dynamic environments (moving obstacles). We have shown that our approach is more efficient in static environments than the classical ant algorithm, and that it needs to be extended with a behavioral heuristic to compete with it in dynamic environments. An article is under submission to the TAAS international journal. This work was done in collaboration with Eric Thierry from LIP, ENS Lyon.
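The agent-level relaxation rule underlying such a collective construction can be sketched as follows (a minimal local update; the actual agent behaviors, movement and foraging logic are not reproduced, and names are illustrative):

```python
import numpy as np

def relax_cell(field, obstacles, i, j):
    """Local APF update (sketch of the agent-level rule): a free cell
    takes 1 + the minimum value of its in-bounds 4-neighbours. Repeated
    asynchronously wherever agents pass, this converges to the optimal
    distance-to-goal potential of the Barraquand & Latombe wavefront.
    `field` starts at infinity everywhere except 0 at the goal."""
    if obstacles[i, j] or field[i, j] == 0:     # obstacle or goal cell
        return field
    h, w = field.shape
    neigh = [field[x, y]
             for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
             if 0 <= x < h and 0 <= y < w]
    field[i, j] = min(field[i, j], 1 + min(neigh))
    return field
```

The key property is that the update is purely local and order-independent: no matter in which order agents trigger it, the field can only decrease towards the same optimal fixed point, which is what makes the asynchronous collective construction correct.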

The Maia team acquired, at the end of 2008, six Khepera III mobile robots in order to study and validate several multi-robot models. In particular, we aim at studying reactive coordination and swarm models (as presented below). In 2009, we obtained ten more Khepera III robots through the CPER MIS Action. We thus now have a first swarm robotic system, which will allow us to focus our work on multi-agent models. At the same time, we explored different ways to tackle the challenge of implementing environment-based models with real robots. This requires allowing robots to read and write information in the environment, for instance digital pheromones.

*Reactive coordination with Potential Fields*. We investigate the multi-robot navigation problem based on artificial potential fields (APF) and the emission of real signals. We first considered the definition and solving of the "robot-pushing" task. It extends the box-pushing task by replacing the box with an immobilized robot, which can send signals so as to be pushed in a desired direction by a swarm of reactive robots (Mechatronics Journal). During the Master internship of Kaouther Bouzouita, we analysed and proposed generic mathematical models of the potential fields and signals used in the collective control. This allowed us to optimize infra-red settings and to validate the control with Khepera III robots (publication in preparation).

*Multi-robot Deployment and Mapping*. As introduced in Sec. , we started, through the ANR Cartomatic project, the study of multi-robot deployment and mapping. We aim at exploring multi-agent models for the fast exploration of multi-room spaces, and at improving the accuracy of the localization of robots and objects. This work is in line with the PhD thesis of Antoine Bautin, started in November 2009.

*Human-inspired heuristics for robots*. This work concerns a new collaboration with E. Zibetti from Paris 8 University and the CHART Laboratory (Cognition Humaine et Artificielle), and with N. Sabouret and A. Beynier from Paris 6. It concerns the identification of the heuristics humans use when faced with spatial and cooperation problems, and their translation to mobile robots. The approach and first experimental results are introduced in . We also plan to submit in January 2010 a young-researcher ANR project, headed by E. Zibetti.

We aim at defining and studying bio-inspired robots able to interact through their environment by reading, writing or modifying it (e.g. laying pheromone trails, following a signal, etc.). The challenge is to translate theoretical and/or simulated models to real systems in order to define new robotic abilities. We started two original approaches, one relying on the design of an interactive table (funded by the INRIA ADT ROMEA ), the other consisting in paving the ground with intelligent tiles.

*Interactive Table*: For this purpose we developed a first original environment based on an interactive table. *It will allow robots to read and write information on their environment without requiring/communicating their location* (see the ROMEA project for the design of the table, Section ). Robots evolve on the table (2m x 2m) and mark their presence through infra-red emissions. Such a support should allow the study of continuous and discrete active environment models, and also makes it possible to consider a human in the loop.

*Intelligent Tiles*: In this second approach we consider large real (indoor) environments, and study how to deploy discrete multi-agent models. To this end we propose to pave the floor with "communicating" and autonomous *tiles*. Each tile is designed to communicate with its neighbours, and to allow an agent it supports to read and write information. As a consequence, tiles can be exploited to extend agents' perceptions and communications, and to physically implement bio-inspired algorithms. A first tile model has been defined and evaluated using a simulator. In particular, we recast the Satisfaction-Altruism model (Simonin & Ferber 2001) by splitting its behavior between a tile behavior and a simplified agent behavior. These first results were presented in a paper published at the ICAART'09 international conference . In 2009, during the Master internship of Romain Mauffray, we developed a tile emulator and performed experiments with real mobile robots (Khepera III), which validated the interest and efficiency of the approach.

We consider decentralised control methods to operate autonomous vehicles at close spacings to form a platoon. We study models inspired by the flocking approach, where each vehicle computes its control from its local perceptions. Such decentralised approaches provide robust and scalable solutions.

However, collision avoidance needs to be studied so that it can be guaranteed. Stability is also an open problem for platoons of more than three vehicles: do oscillations appear when the motion of some vehicles is disrupted?

This action is related to the ANR project TACOS (see Sec. ), which started in January 2007. It is a collaboration with the DEDALE project of the LORIA lab. We are interested in formally specifying and studying situated multi-agent systems. This is an open problem, which is particularly interesting when designing critical decentralised systems. Our approach relies on the formal specification – in the B language – of the influence-reaction model, a generic formulation of multi-agent systems proposed by Ferber & Muller in 1996.

This specification can be instantiated to prove properties of specific multi-agent systems. As an example, we considered the platooning task: we proved that the Daviet and Parent constant-coefficient controller ensures collision avoidance with a simple longitudinal platooning model (perceptions and actions are synchronous and error-free). This work is presented in an INRIA research report and in an article under submission to an international journal.

We studied a more realistic model of the platooning task, introducing a delay between perceptions and actions, and noise/errors in perceptions and actions. Within this model, the Daviet and Parent controllers show their limitations. We thus proposed a high-level controller, which transforms any controller into a safe controller, i.e. one avoiding collisions, and we proved this property. This work was published at the 2009 IEEE International Conference on Robotics and Automation (ICRA'09) . An extended version of this article, with detailed proofs, is available as an INRIA research report and will be submitted to an international journal.
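The idea of wrapping an arbitrary controller in a safety layer can be sketched as follows (a toy kinematic check, not the proven controller of the paper; the margin and the worst-case stopping-distance test are illustrative assumptions):

```python
def safe_control(u, gap, speed, max_brake, dt, margin=0.5):
    """Sketch of a safety wrapper: any longitudinal command `u` is
    overridden by full braking whenever the current inter-vehicle gap
    could not absorb the worst-case stopping distance v^2/(2b) plus the
    distance travelled during one control period."""
    stopping = speed * speed / (2.0 * max_brake)
    if gap - speed * dt - stopping < margin:
        return -max_brake                    # emergency braking
    return u                                 # nominal command is safe
```

The appeal of this architecture, which our proved controller shares in spirit, is that the inner controller can be tuned freely for comfort or tracking quality while the outer layer alone carries the collision-avoidance guarantee.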

The work presented in the two previous paragraphs focuses on longitudinal control: all the vehicles move along a fixed path. When vehicles move in a two-dimensional space, a lateral controller is needed to steer them. While lateral and longitudinal controls can be considered separately, the longitudinal control should be computed after the lateral control: while turning, a larger inter-vehicle distance is needed to avoid collisions. A lateral control based only on individual sensors will be studied, and the current longitudinal control adapted to curvilinear distances. First results were obtained through the cooperation with the team of Pr. D. Matko from Ljubljana University (see PHC Egide 06-08), where an approach based on reference-path control was explored; see the ICINCO'08 publication. In 2009, we started studying the non-stop crossing of two orthogonal decentralised platoons, through the internship of Sebastien Alouze (Mines de Nancy, introduction-to-research internship).

We continue to develop telemedicine solutions for End-Stage Renal Disease patients. Transplantelic is a telemedicine project which aims at improving the follow-up of patients with a kidney graft. A new system is being developed and a clinical trial within a three-year project is scheduled. Transplantelic started at the beginning of 2006 and is funded by both Région Lorraine and the ARH. We have developed a new expert system using Bayabox (see Sec. ) for the surveillance of patients with a kidney graft.

In kidney failure treatment by hemodialysis, the blood is purified outside the body. The vascular access that allows the extracorporeal blood circulation is usually a vein of the arm that has been enlarged by the surgical creation of a fistula connecting the vein to an artery. Since the number of veins in the arm is limited, the prevention of complications such as stenosis or thrombosis of the vascular access is a key issue in hemodialysis treatment.

One of the parameters recorded by some dialysis machines is the ionic dialysance, which is based on a conductivity measure. The ionic dialysance is an indicator of the flow of filtered wastes. Previous work has shown that following up the dialysance and blood pressures can help to detect a potential risk with the vascular access early.

Gambro is an international company that develops dialysis machines. The Diatelic company was founded in 2002 as an INRIA start-up developing telemedicine solutions for kidney failure treatments. Gambro, Diatelic and MAIA have collaborated through a CIFRE agreement to develop an automated classifier of dialysis sessions for estimating the risk related to the vascular access.

The main difficulty of the analysis is the large variability of the measures and the need to detect tendencies. The system developed is based on supervised learning of a dynamic signal classifier formalized as a dynamic Bayesian network. A preprocessing of the signals allows each patient to be his or her own reference. Two separate labelled datasets were provided by a medical expert from Gambro for developing and testing the system.
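
The per-patient preprocessing step can be illustrated as follows. This is a hypothetical sketch: the window size and the z-score form are our own assumptions for illustration, not the actual preprocessing of the system.

```python
# Sketch of per-patient normalisation: each signal is rescaled against
# that patient's own baseline statistics, so the classifier sees
# deviations from the patient's baseline rather than absolute values.
# Window size and z-score form are illustrative assumptions.

from statistics import mean, stdev

def normalise_sessions(values, baseline_window=5):
    """z-score each session value against the patient's first sessions."""
    baseline = values[:baseline_window]
    mu, sigma = mean(baseline), stdev(baseline)
    sigma = sigma or 1.0                      # guard degenerate baselines
    return [(v - mu) / sigma for v in values]

# Two patients with very different absolute dialysance levels but the
# same relative downward trend produce comparable normalised signals:
p1 = normalise_sessions([200, 202, 198, 201, 199, 180, 170])
p2 = normalise_sessions([120, 121, 119, 122, 118, 108, 102])
print([round(x, 1) for x in p1[-2:]])
print([round(x, 1) for x in p2[-2:]])
```

Normalising this way makes downward tendencies comparable across patients despite the large inter-patient variability of the raw measures.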

The evaluation of the results was done by performing a double-blind analysis of real data, which resulted in an 85% agreement rate. The system was validated by the medical expert, who estimated that the concordance of the automated classification with his own classification was good enough for the system to be included in a Gambro software product planned for 2010.

The CUGN (Communauté Urbaine du Grand Nancy) possesses a huge amount of data collected from its public transportation network. These data are used to monitor and regulate in real time the traffic of buses and tramways.

In this collaboration, we studied the possibility of integrating these data into a multi-agent simulator in order to replay offline the functioning of a part of the network and to propose indicators that help analyse the functioning of the system. A first part of the work was dedicated to the extraction of relevant data from the raw data and to their integration into the simulator. This year, within a student project, we studied different viewpoints to provide an accurate and meaningful understanding of the transportation system.

This project relies on results and questions arising from the SMAART project (2006-08). During this project we adapted the EVAP algorithm to patrolling with UAVs, while providing a generic digital-pheromone-based patrolling simulator (see ). Concerning authority sharing, we proposed original interfaces to manipulate groups of UAVs. Experiments with operators have shown that they succeed in improving the whole system when dealing with the patrolling task.

The aim of the SUSIE project is thus twofold: (i) studying and improving the parameters of the EVAP algorithm through the SMAART simulator, and (ii) defining new ways to manipulate pheromone fields in order to improve authority sharing.
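
The spirit of evaporation-based patrolling can be conveyed by a much-simplified sketch: an agent marks its cell with pheromone, all cells evaporate at each step, and the agent greedily moves to its least-marked (i.e. least recently visited) neighbour. Grid size, evaporation rate and tie-breaking are assumptions of this sketch, not the actual EVAP parameters.

```python
# Simplified, illustrative evaporation-based patrolling in the spirit of
# EVAP: drop pheromone on the current cell, evaporate everywhere, move to
# the least-marked neighbour. Parameters are illustrative assumptions.

import random

def patrol(width=5, height=5, steps=400, rho=0.95, seed=0):
    rng = random.Random(seed)
    grid = [[0.0] * width for _ in range(height)]
    x, y = 0, 0
    visited = set()
    for _ in range(steps):
        grid[y][x] = 1.0                                 # mark current cell
        visited.add((x, y))
        grid = [[rho * q for q in row] for row in grid]  # evaporation
        neigh = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < width and 0 <= y + dy < height]
        lowest = min(grid[ny][nx] for nx, ny in neigh)
        x, y = rng.choice([c for c in neigh if grid[c[1]][c[0]] == lowest])
    return visited

cells = patrol()
print(len(cells))  # with enough steps the whole 5x5 grid gets patrolled
```

Since the pheromone value of a cell encodes its recency of visit, the greedy descent continually drives the agent toward the stalest regions, which is what yields coverage without any central coordination.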

Abdallah Dib,

The PréDICA program is related to the theme of falls in the elderly, covering aspects related to both prediction and detection. The program is a continuation of the exploratory project PARAChute, financed by the RNTS in 2003, which included only the aspects related to fall prediction:

Definition of the characteristic parameters of a static balance “signature” using the stabilogram analysis produced by a personal scale.

Analysis of typical gait using a camera without images.

Partners: Lohr Transitec, GEA, VULog, UTBM, MaIA, TRIO, DEDALE, IMARA, Lasmea.

This project is one of the major projects of the Alsace Franche-Comté competitiveness cluster on automotive systems. The Cristal project is led by Lohr Industry, which designed the tram chosen by the city of Clermont-Ferrand (Translohr). Lohr is convinced that this kind of system, designed for transporting large numbers of people, can be complemented by an individual transportation system made up of small electric vehicles for short downtown trips. Key issues for such a system are platooning (convoys of automatic vehicles) and certification. This project has been funded by the FCE (1 M euros). During the year 2008, we studied the Daviet & Parent approach for platooning, showing the limits of the model when the number of vehicles grows. For this purpose we developed a multi-agent simulator allowing the study of the models' parameters and robustness (via simulation of noise on sensors and actuators). We then proposed a high-level controller, which transforms any controller into a safe one, i.e. one that avoids collisions, and we proved this property. We presented these results in the Cristal project's annual reports and published an article at the ICRA'09 conference .

TACOS is an ANR-SETIN project started in January 2007 and managed by the DEDALE-Loria team (Pr. J. Souquieres). The other partners are LACL LAS (Paris 12), LAMIH ROI-SID (U. Valenciennes) and LIFC TFC (U. Franche-Comté).

This project proposes a component-based approach for the specification of trustworthy systems, covering the path from requirement expression to formal specification by using or adapting existing tools. The application domain is transportation, focusing on distributed and embedded systems that have functional and non-functional properties relating to time and availability constraints. MAIA is involved in the definition of the case study, which consists of a platoon of autonomous vehicles. In order to study such systems, we defined in collaboration with DEDALE a generic B expression of the Influence/Reaction model proposed by Ferber & Muller in 1996. The I/R model makes it possible to clearly represent the dynamics of situated multi-agent systems. Our proposition extends the approach with a formal writing and the ability to prove some properties using B provers. We illustrated this framework by studying the bio-inspired platooning model proposed by MAIA. This work is detailed in internal report INRIA-00173876 and has been submitted to an international journal. We are now working on improving the framework, involving simulation and the study of the system's properties.
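
The Influence/Reaction cycle itself is easy to convey in executable form: agents never modify the environment directly; they emit influences, and the environment computes the reaction that combines them. The following Python rendering (the data structures and the combination rule are our own simplified assumptions, not the B specification) illustrates one such cycle on a one-dimensional track:

```python
# Illustrative rendering of the Influence/Reaction cycle (Ferber &
# Muller, 1996): agents emit influences; the environment computes the
# reaction combining them. Structures here are simplified assumptions.

from dataclasses import dataclass

@dataclass
class State:
    positions: dict          # agent name -> position on a 1-D track

def influence(name, state):
    """Each agent emits an intended move (here: +1 step), not a new state."""
    return (name, +1)

def reaction(state, influences):
    """The environment combines all influences into the next state."""
    new_positions = dict(state.positions)
    for name, delta in influences:
        new_positions[name] = new_positions[name] + delta
    return State(new_positions)

state = State({"a": 0, "b": 10})
for _ in range(3):                     # three Influence/Reaction cycles
    infl = [influence(n, state) for n in state.positions]
    state = reaction(state, infl)
print(state.positions)
```

Separating influence from reaction is what makes simultaneous actions and their conflicts explicit, which is precisely the property the B specification exploits when proving collision avoidance.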

This project proposes to address the localisation of platoons of vehicles as a data fusion problem. This year we proposed a multi-sensor fusion method for a high-performance navigation system, designed to support real-time navigational features. At the heart of the proposed method is a Bayesian network which fuses data from several positioning sensors. The data fusion approach provides high-quality, high-integrity data to both the navigation system and the control, and is consequently safety critical. Managing multiple hypotheses is a useful strategy to handle ambiguous situations induced by sensor uncertainty or failure. The multi-sensor fusion and multi-modal estimation are realised using a hybrid Bayesian network (HBN). Multi-modal estimation is the way we propose to manage multiple hypotheses for the localisation task, in order to take into account the imprecision or failure of a sensor or an information source. Experimental results, using data from Anti-lock Braking System (ABS) sensors, a Differential Global Positioning System (DGPS) receiver and an accurate digital roadmap, illustrate the performance of the proposed approach, especially in ambiguous situations.
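
The core fusion step can be illustrated in a much simplified Gaussian setting (inverse-variance weighting of two one-dimensional estimates) rather than the hybrid Bayesian network itself; the sensor variances below are made-up values for illustration only.

```python
# Much-simplified illustration of the fusion step: precision-weighted
# combination of two noisy 1-D position estimates, as a Gaussian stand-in
# for the hybrid Bayesian network. Variances are illustrative assumptions.

def fuse(mu1, var1, mu2, var2):
    """Precision-weighted fusion of two Gaussian estimates."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    return mu, 1.0 / (w1 + w2)

# A DGPS-like sensor (accurate) fused with an odometry-like one (vague):
mu, var = fuse(100.0, 1.0, 104.0, 9.0)
print(round(mu, 2), round(var, 2))

# If one sensor fails (its variance blows up), the fused estimate falls
# back on the other one - the multi-hypothesis idea in its simplest form.
mu, var = fuse(100.0, 1.0, 104.0, 1e9)
print(round(mu, 2))
```

The fused variance is always smaller than either input variance, and a failed sensor is gracefully discounted, which is the behaviour the multi-hypothesis machinery generalises to discrete failure modes.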

Rooted at the crossroads of neuroscience and computational neuroscience, this project aims at increasing our understanding of the brain. By studying, modeling and simulating the behavior of
*orienting the gaze* from different points of view (neuroscience, behavioral psychology and artificial intelligence), we plan to increase our understanding of the spatial and temporal
dynamics of cortical maps. MAIA is more particularly involved in using reinforcement learning in a high-level simulation to validate some functional concepts, and then in trying to formulate
biologically plausible low-level mechanisms that could support the reinforcement learning paradigm.
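
A minimal tabular Q-learning loop of the kind used in such high-level simulations can be sketched as follows. The toy "gaze" task (shift attention left or right along a line until a target position is reached) is an illustrative assumption of ours, not the actual cortical-map model.

```python
# Minimal tabular Q-learning on a toy gaze-shifting task: states are
# positions on a line, actions shift left/right, reward on reaching the
# target. Task and hyper-parameters are illustrative assumptions.

import random

def q_learn(n_positions=5, target=4, episodes=800, alpha=0.5, gamma=0.9,
            epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_positions) for a in (-1, +1)}
    for _ in range(episodes):
        s = rng.randrange(n_positions - 1)        # exploring starts
        for _ in range(20):
            if rng.random() < epsilon:            # epsilon-greedy action
                a = rng.choice((-1, +1))
            else:
                a = max((-1, +1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_positions - 1)
            r = 1.0 if s2 == target else 0.0
            best_next = max(q[(s2, b)] for b in (-1, +1))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
            if s == target:                       # episode ends at target
                break
    # greedy policy at every non-target position
    return [max((-1, +1), key=lambda act: q[(s, act)])
            for s in range(n_positions - 1)]

print(q_learn())  # the learned policy always shifts toward the target
```

The functional question addressed in the project is then how such a value-updating loop could be realised by biologically plausible low-level mechanisms.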

**Partners:** Loria-INRIA, Nancy (Maia and Cortex); UMR Mouvement et Perception, Marseille; Institut de Neurosciences Cognitives de la Méditerranée (INCM)-CNRS, Marseille;
Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), Lyon.

The project gathers researchers from the MAIA team (Nazim Fatès, Nikolaos Vlassopoulos), the CORTEX team (Bernard Girau) and the ALCHEMY team (Hugues Berry). It began in January 2008 and runs until the end of December 2009. The context of our collaboration is the definition of innovative schemes of decentralised and massively distributed computing. We aim at contributing to this at three levels:

At the modeling level, we think that biology provides us with complex and efficient models of such massively distributed behaviours. We start our study by addressing the decentralised gathering problem with the help of an original model of aggregation based on the behaviour of social amoebae.

At the simulation level, our research mainly relies on achieving large-scale simulations and on obtaining large statistical samples. Mastering these simulations is a major scientific issue, especially considering the imposed constraints: distributed computations, parsimonious computing time and memory requirements. Furthermore, it raises further problems, such as: how to handle asynchronism, randomness and statistical analysis?

At the hardware level, the challenge is to constantly confront our models with the actual constraints of a true practice of distributed computing. The main idea is to consider the hardware as a kind of sanity check. Hence, we intend to implement and validate our distributed models on massively parallel computing devices. In return, we expect that the analysis of the scientific issues raised by these implementations will influence the definition of the models themselves.

The work related to this issue was presented at the “journées ARC” in Bordeaux, see:

Laurent Bougrain (CORTEX team, LORIA) is an external collaborator.

The COMAC project: *contrôle optimisé multi-techniques des aérostructures composites* (optimised multi-technique control of composite aeronautic parts).

In collaboration with Laurent Bougrain, one of our objectives is to propose a software toolbox for computer-aided diagnosis in this context. The current project is a system relying on expert knowledge in the form of a database of labelled images.

In the MAIA team, our research effort will focus more precisely on information gathering problems involving active sensing, i.e. an intelligent system that has to select which observations to perform (which sensor, where, at which resolution). Mauricio Araya has recently started a PhD on this precise topic of Active Sensing (Section ).
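
The kind of decision faced in active sensing can be illustrated on a toy example: among several hypothetical sensors with different error rates, pick the one whose observation is expected to reduce uncertainty (entropy) about a binary state the most. The sensors and their error rates below are made-up parameters, not part of any actual system.

```python
# Illustrative active-sensing decision: choose the sensor whose noisy
# binary observation minimises the expected posterior entropy of a
# binary hidden state. Sensor error rates are made-up parameters.

from math import log2

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def expected_posterior_entropy(prior, error_rate):
    """Expected entropy after observing a symmetric noisy binary sensor."""
    e = 0.0
    for obs_given_true in (1 - error_rate, error_rate):
        # P(obs) and posterior P(state=1 | obs) by Bayes' rule
        p_obs = prior * obs_given_true + (1 - prior) * (1 - obs_given_true)
        post = prior * obs_given_true / p_obs
        e += p_obs * entropy(post)
    return e

prior = 0.5
sensors = {"coarse": 0.4, "medium": 0.25, "fine": 0.1}   # error rates
best = min(sensors, key=lambda s: expected_posterior_entropy(prior, sensors[s]))
print(best)
```

In realistic settings the choice additionally trades information gain against sensing cost and resolution, which is what turns the problem into a sequential decision problem of the kind studied in the PhD mentioned above.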

The DOPEC project is a DGA PEA (upstream studies project) on the optimization of the use of sensor systems. In collaboration with EADS (project leader) and the LAAS, we work on autonomous sequential decision making problems. We are more particularly interested, on the one hand, in multi-agent problems and, on the other hand, in taking uncertainties into account.


Whereas numerous collaborations have been undertaken between computer scientists and ethologists to understand and model biological collective phenomena, few similar works have been conducted in sociology. The main objective of this STI2 PEPS (Projets Exploratoires Pluridisciplinaires), proposed by Pascal Roggéro (Université Toulouse 1) and Christophe Sibertin-Blanc (IRIT), is to gather the multi-agent community and researchers from sociology (sociologists, specialists of organization theory) to collaborate on the understanding and modeling of social organizations.

As an exploratory project, it is very prospective. It focuses on presenting organization theories and multi-agent models that could be suited to the multi-agent modeling of social systems. This project constitutes a first step in analysing the feasibility of a research programme devoted to social organization modeling.

This project is funded by the "Institut Rhône Alpin des Systèmes Complexes". It involves teams from the LIG (Laboratoire d'Informatique de Grenoble) and the LIESP (Laboratoire d'Informatique pour l'Entreprise et les Systèmes de Production), and is associated with the CUGN.

The aim of the project is the governance of transportation systems from the enactive perspective. This year, we focused on the specification of a participative simulator which aims at providing means for a regulator to interact with a simulated transportation system.

This project is funded by the “Agence Nationale de la Recherche”. It was selected for the robotic challenge “Carotte” (“CArtographie par ROboT d'un TErritoire”), in the “Contenus et Interactions” program. The consortium is composed of researchers from the LISA laboratory (P. Lucidarme is the coordinator of the project), the MAIA team and Wany Robotics. This project concerns the mapping of an indoor, structured but unknown environment, and the localisation of objects, with one or several robots. We aim at studying multi-robot or swarm algorithms to achieve this challenge, while showing the robustness and the accuracy of the mapping when using cooperation between several autonomous robots. Antoine Bautin has recently started a PhD on the topic of multi-robot deployment and mapping (Section ).

ROMEA, for “RObots Mobiles et Environnements Actifs”, is a project proposed by the MAIA team and funded by INRIA NGE through an ADT (“Action de Développement Technologique”). The project deals with the development and study of intelligent and collective behaviors with Khepera III mobile robots. In particular, we aim at developing a new technological support based on an interactive table to allow robots to read and write information in the environment (e.g. digital pheromones), see Section . Nicolas Beaufort was recently hired as an INRIA IJD engineer to develop the required functions on the interactive table and the robots.

MAIA is a member of AgentLink, the European Commission's IST-funded Coordination Action for Agent-Based Computing. The MAIA team is responsible for two chapters of
*Self-Organising Software - From Natural to Artificial Adaptation*.

The InTraDE
http://

Olivier Buffet was a reviewer for the conferences JFPDA-09, RFIA-10 and SAS-09, and for the journals AIJ (Artificial Intelligence Journal), JAIR (Journal of Artificial Intelligence Research), RIA (Revue d'Intelligence Artificielle) and SMC-C (IEEE Transactions on Systems, Man and Cybernetics - C).

Alain Dutech has been a reviewer for JFPDA-2009, for two special issues of RIA (JFPDA and Agents' Rights) and for the following journals: RIA; JAAMAS.

Bruno Scherrer was a reviewer for ICML, IJCAI, RFIA and JFPDA.

Alexis Scheuer was a reviewer for AIAA JGCD (American Institute of Aeronautics and Astronautics' Journal of Guidance, Control and Dynamics), for IEEE TRO (Transactions on Robotics) and for Elsevier's international journal SIMPAT (Simulation Modelling Practice and Theory).

Olivier Simonin was a reviewer for TAAS (ACM Transactions on Autonomous and Adaptive Systems), Mechatronics (International Journal), for the conferences JFSMA'09, JFSMA'10, HUMOUS'10, FIRA ICER09 & ICSR09, and for the workshops MMAS'09 (AAMAS workshop), AT2AI-7 (2010), SIM'09, SIM'10 and ACM SAC'09.

Vincent Thomas was a reviewer for AAMAS09 (Autonomous Agents and Multi Agents Systems), ISA-IADIS 2009 (Intelligent Systems and Agents) and JFPDA-2009.

MAIA is a leading force in the *PDMIA* group (Processus Décisionnels de Markov et Intelligence Artificielle) and played a great part in the annual meeting of the group. This year, the group's annual meeting was held in Paris as the fourth edition of the JFPDA conference (JFPDA'09), where people from the *planning* community exchanged with people from *reinforcement learning*.

Vincent Chevrier is a member of:

the editorial board of Interstices

the advisory board of JFSMA (Journées Francophones sur les Systèmes Multi-Agents)

the program committees of AAMAS 09 and AAMAS 10 (the International Conference on Autonomous Agents and Multiagent Systems); EUMAS 09 (European Workshop on Multi-Agent Systems); ICAART 09 (the International Conference on Agents and Artificial Intelligence); MCPC 2009 (the International Conference on Mobile Communications and Pervasive Computing); and SSS 09 (track on Self-Organizing Systems at the International Symposium on Stabilization, Safety, and Security of Distributed Systems).

Vincent Chevrier is the moderator of the mailing list of the French spoken community on multi-agent systems.

François Charpillet was a member of the following conference committees:

Cinquièmes Journées Francophones Modèles Formels de l'Interaction, 2009,

JFSMA'09 (Journées Francophones sur les Systèmes Multi-Agents),

SFTAG 2009 (congrès de la Société Française des Technologies pour l'Autonomie et de Gérontechnologie),

MSDM 2009 - 2010 (Multi-agent Sequential Decision Making in Uncertain Domains),

ECAI 2010 (European Conference on Artificial Intelligence).

François Charpillet is a member of the editorial committee of the “Revue d'Intelligence Artificielle”.

Christine Bourjot, Vincent Chevrier and Olivier Simonin are members of the working group “Colline” (AFIA, GDR I3).

Christine Bourjot is a member of the scientific council of CogniEst (“Réseau Grand Est des Sciences Cognitives”).

Christine Bourjot is a member of the administration council of ARCo (Association for Cognitive Research).

Joerg Hoffmann is Conference Co-Chair of the “20th International Conference on Automated Planning and Scheduling” (ICAPS 2010).

Joerg Hoffmann is an Associate Editor of the “Journal of Artificial Intelligence Research” (JAIR).

Joerg Hoffmann is Area Chair for Planning of “AI Communications”.

Joerg Hoffmann was a member of the following conference program committees:

Senior Program Committee of the 19th International Conference on Automated Planning and Scheduling (ICAPS 2009)

Senior Program Committee of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009)

Program Committee of the 6th International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming for Combinatorial Optimization (CPAIOR 2009)

Olivier Simonin was a member of the following conference Program Committee:

JFSMA'09 (Journées Francophones sur les Systèmes Multi-Agents).

HUMOUS'2010 (Conference on Humans Operating Unmanned Systems).

ICSR 09 Inter. Conference on Social Robotics (FIRA RoboWorld Congress).

ICER 09 Inter. Conference on Entertainment Robotics (FIRA RoboWorld Congress).

SIM 09 track on Advances in Computer Simulation at ACM Symp. on Applied Computing (SAC 09).

AT2AI-7, Seventh International Symposium "From Agent Theory to Agent Implementation" co-located with the 20th European Meeting on Cybernetics and Systems Research (EMCSR'2010), 2010.

Olivier Buffet is a member of the editorial board of the “Revue d'Intelligence Artificielle” (RIA).

Vincent Thomas was a member of IADIS 2009 program committee.

François Charpillet was a member of the following PhD committees:

(as a reviewer) Jilles Dibangoye, Contributions à la résolution des processus décisionnels de Markov centralisés et décentralisés: Algorithmes et Théorie, December 2009, Université Laval, Quebec, Canada.

(as a reviewer) Jean Michel Contet, Modèle multi-agents réactifs pour la navigation multi-véhicules: Spécification formelle et Vérification, UTBM, December 4, 2009.

(as a reviewer) Marc Ricordeau, Espace de Généralisation pour l'Apprentissage par Renforcement, Université de Montpellier, July 16, 2009.

(as a reviewer) Frédéric Maris, Planification SAT et Planification temporellement expressive: les systèmes TSP et TLP-GP, September 18, 2009, Université de Toulouse.

(as a committee member) Nadir Karam, Agrégation de données décentralisées pour la localisation multi-véhicules, Université de Clermont-Ferrand, December 15, 2009.

François Charpillet was a member of the following HDR committees:

(as a reviewer) Nicolas Bredeche, Contributions to evolutionary design of embodied agents: from autonomous artificial creatures to self-organizing, self-adaptive swarms of embodied agents, Univ. Paris-Sud XI, INRIA, CNRS, December 7, 2009.

(as a reviewer) Régis Sabbadin, Modèles et Algorithmes pour la Décision Séquentielle dans l'Incertain, Université de Toulouse, February 5, 2009.

Vincent Chevrier was a member of the following PhD Committees:

(as a reviewer) Maya Ruppert, Université Claude Bernard Lyon 1, September 2009.

Joerg Hoffmann was a member of the PhD committee of Hector Palacios (Universitat Pompeu Fabra, Barcelona, Spain), December 2009.

François Charpillet is a member of the “Specialist Committees” in INPL Nancy and UTBM Belfort.

Olivier Simonin is a member of the “Specialist Committees” in UHP.

Bruno Scherrer is a member of the “Specialist Committees” in Lille 3.

Olivier Buffet is a member of a “Specialist Committee” in UTBM.

François Charpillet is a member of the AERES evaluation committee for the LIRMM.

Olivier Simonin is a member of the LORIA Direction team, in charge of the scientific theme “Perception, Action, Cognition” (until 31 dec. 2009).

Olivier Simonin is a member of the “comipers-chercheur” 2010, the INRIA Nancy Grand Est examination committee for scientific employees (doctoral and post-doctoral positions).

Olivier Simonin is a member of the “operation committee” of the MIS Project (Modélisation, Interaction, Simulation) of CPER MISN INRIA & Region Lorraine.

Vincent Chevrier is a member of the “comipers”, the INRIA Lorraine LORIA examination committee for scientific employees.

Olivier Buffet is a member of the “CDT”, the INRIA Nancy Grand-Est committee for technological development.

The modelling of rats' collective behaviour by Vincent Thomas, Christine Bourjot and Vincent Chevrier was featured in Philippe Thomine's movie “Faits comme des rats”. This movie, produced by Videoscop - University Nancy 2, presents the biological experiments conducted by Didier Desor regarding the specialization of groups of rats, together with the modelling of this situation carried out in the MAIA team.

For the “Fête de la science 2009”, an official screening of the movie was organized: people were invited by François Laurent (president of Nancy-Université) and Karl Tombre to discuss the movie with Didier Desor and Vincent Thomas after the screening.

LORIA - November 2008 - The MAIA team presented its work on decentralised platooning models and simulation tools during the two days of the Fête de la Science.

In the last few years, a group of French-speaking researchers has written an introductory book on MDPs in Artificial Intelligence. It covers not only the principles of MDPs (including reinforcement learning and POMDPs) but also popular extensions (approximate algorithms, multi-agent approaches...) and selected applications. After the publication of the French version of this book in June 2008, an English version has been prepared and should be published in 2010 . MAIA team members were involved as authors in five chapters , , , , .