MAIA is a joint project of INRIA, CNRS, INPL, Henri Poincaré University and Nancy 2 University, hosted by the LORIA laboratory (UMR 7503). For more details, we invite the reader to consult the team web site at http://maia.loria.fr/.

MAIA works in *artificial intelligence*: our goal is to model, design and simulate computer-based entities (agents) that are able to sense their environment, interpret it, and act on it autonomously.
We mainly work on two research themes: 1) stochastic models and 2) self-organization.

This section presents the scientific foundations of these two themes.

We develop algorithms for stochastic models applied to machine learning and decision-making. On the one hand, we consider standard stochastic models (Markov chains, Hidden Markov Models, Bayesian networks) and study the computational problems that arise, such as inference of hidden variables and parameter learning. On the other hand, we consider the parameterized versions of these models (the parameter can be seen as a control/decision of an agent); in these models (Markov decision processes, partially observable Markov decision processes, decentralized Markov decision processes, stochastic games), we consider the problems of a) planning and b) reinforcement learning (estimating the parameters *and* planning) for one agent and for many agents. For all these problems, our aim is to develop algorithmic solutions that are efficient, and to apply them to complex problems.

In the following, we concentrate our presentation on parameterized stochastic models, known as (partially observable) Markov decision processes, as they trivially generalize the non-parameterized models (Markov chains, Hidden Markov Models). We also outline how these models can be extended to multi-agent settings.

An agent is anything that can be viewed as sensing its environment through sensors and acting upon that environment through actuators. This view makes Markov decision processes (**MDPs**) a good candidate for modeling agents, and is probably why MDPs have received considerable attention in recent years from the artificial intelligence (AI) community. They have been adopted as a general framework for planning under uncertainty and reinforcement learning.

Formally, a Markov decision process is a four-tuple (S, A, P, r), where:

S is the state space,

A is the action space,

P is the state-transition probability function that models the dynamics of the system: P(s'|s, a) is the probability of transitioning from s to s' given that action a is chosen,

r is the reward function: r(s, a, s') stands for the reward obtained from taking action a in state s and transitioning to state s'.

With this framework, we can model the interaction between an agent and an environment. The environment can be considered as a Markov decision process which is controlled by an agent. When, in a given state s, an action a is chosen by the agent, the probability for the system to move to state s' is given by P(s'|s, a). After each transition, the environment generates a numerical reward r(s, a, s'). The behaviour of the agent can be represented by a mapping π : S → A between states and actions. Such a mapping is called a policy.

In such a framework, we consider the following problems:

Given explicit knowledge of the problem (that is, of P and r), find an optimal behaviour, *i.e.*, the policy π* which maximizes a given performance criterion for the agent. There are three popular performance criteria for evaluating a policy:

the expected reward to target,

the discounted cumulative reward,

the average expected reward per stage.
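As an illustration of planning under the discounted cumulative reward criterion, the following sketch runs standard value iteration; the two-state MDP is a toy example invented here, not one of the team's applications.

```python
# Value iteration on a toy MDP under the discounted cumulative reward
# criterion. States, actions, transitions and rewards below are invented
# purely for illustration.

def value_iteration(S, A, P, r, gamma=0.9, eps=1e-6):
    """P[s][a] is a list of (s2, prob) pairs; r[s][a][s2] is the reward."""
    V = {s: 0.0 for s in S}
    while True:
        V_new = {}
        for s in S:
            # Bellman optimality backup: best expected one-step return
            V_new[s] = max(
                sum(p * (r[s][a][s2] + gamma * V[s2]) for s2, p in P[s][a])
                for a in A
            )
        if max(abs(V_new[s] - V[s]) for s in S) < eps:
            return V_new
        V = V_new

S = ["s0", "s1"]
A = ["stay", "go"]
P = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
r = {
    "s0": {"stay": {"s0": 0.0}, "go": {"s1": 1.0, "s0": 0.0}},
    "s1": {"stay": {"s1": 0.5}, "go": {"s0": 0.0}},
}
V = value_iteration(S, A, P, r)
```

An optimal policy is then obtained greedily, by picking in each state the action whose one-step backup attains the maximum.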

Given the ability to interact with the environment (that is, samples of P and r obtained by simulation or real-world interaction), find an optimal behaviour. This amounts to learning what to do in each state of the environment by a trial-and-error process; such a problem is usually called *reinforcement learning*. It is, as stated by Sutton and Barto, an approach for understanding and automating goal-directed learning and decision-making that is quite different from supervised learning. Indeed, it is in most cases impossible to get examples of good behaviours for all situations in which an agent has to act. The trade-off between exploration and exploitation is one of the major issues to address.
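This trial-and-error process can be made concrete with a minimal sketch of tabular Q-learning (one of the algorithms mentioned later in this report). The five-cell corridor environment, the constants and the ε-greedy exploration rule are all illustrative assumptions, not the team's code.

```python
import random

# Hypothetical toy environment: a 5-cell corridor; reaching cell 4 pays 1.
def step(s, a):                 # a: 0 = left, 1 = right
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: the exploration/exploitation trade-off
            a = random.choice((0, 1)) if random.random() < epsilon \
                else max((0, 1), key=lambda a: Q[(s, a)])
            s2, rew, done = step(s, a)
            target = rew + (0.0 if done else gamma * max(Q[(s2, 0)], Q[(s2, 1)]))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

random.seed(0)
Q = q_learning()
policy = [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(4)]
```

After training, the greedy policy moves right in every cell, even though the agent was never told the transition or reward model.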

Furthermore, a general problem, which is useful for the two previous problems, consists in finding good representations of the environment so that an agent can achieve the above objectives.

In a more general setting, an agent may not perceive the state in which it stands. The information that an agent can acquire about the environment is generally restricted to *observations*, which give only partial information about the state of the system. These observations can be obtained, for example, from sensors that return some estimate of the state of the environment. Thus, the decision process has hidden state, and the problem of finding an optimal policy is no longer Markovian. A model that describes such a hidden-state and observation structure is the **POMDP** (partially observable MDP). Formally, a POMDP is a tuple (S, A, P, r, Ω, O) where:

S, A, P and r are defined as in an MDP.

Ω is a finite set of observations.

O is a table of observation probabilities: O(s', o | s, a) is the probability of transitioning from s to s' on taking action a in s while observing o. Here s, s' ∈ S, a ∈ A, o ∈ Ω.
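A short sketch of how an agent copes with hidden state: it maintains a belief (a probability distribution over S) and updates it by Bayes' rule after each action/observation pair. The two-state "tiger"-style model below is invented for illustration and uses the joint transition-observation table O defined above.

```python
# Bayesian belief update for a POMDP. O[s][a][(s2, o)] is the joint
# probability, as in the definition above, of reaching s2 and observing o
# when taking action a in state s. The model is a made-up example.

def belief_update(b, a, o, S, O):
    b2 = {s2: sum(b[s] * O[s][a].get((s2, o), 0.0) for s in S) for s2 in S}
    z = sum(b2.values())                 # Pr(o | b, a), the normalizer
    return {s2: (p / z if z > 0 else 0.0) for s2, p in b2.items()}

S = ["left", "right"]                    # hidden position of a hazard
# "listen" keeps the state unchanged and yields a noisy (85%-accurate) cue
O = {
    "left":  {"listen": {("left", "hear-left"): 0.85,
                         ("left", "hear-right"): 0.15}},
    "right": {"listen": {("right", "hear-right"): 0.85,
                         ("right", "hear-left"): 0.15}},
}
b = {"left": 0.5, "right": 0.5}
b = belief_update(b, "listen", "hear-left", S, O)
```

Starting from a uniform belief, one "hear-left" observation shifts the belief to 0.85 on the left state; the belief itself is a sufficient statistic, which is what makes planning in belief space possible.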

Hidden Markov Models are a particular case of POMDPs in which there are no actions and no rewards. Based on this mathematical framework, several learning algorithms can be used for diagnosis and prognosis tasks. Given a proper description of the *state* of a system, it is possible to model it as a Markov chain. The dynamics of the system is modeled as *transition probabilities* between states. The information that an external observer of the system can acquire about it can be modeled using *observations*, which give only partial information on the state of the system. The problem of *diagnosis* is then to find the most likely state given a sequence of observations. *Prognosis* amounts to predicting the future state of the system given a sequence of observations and is thus strongly linked to diagnosis in the case of Hidden Markov Models. Given a proper corpus of diagnosis examples, AI algorithms enable the automated learning of an appropriate Hidden Markov Model that can be used for both diagnosis and prognosis. Rabiner gives an excellent introduction to HMMs and describes the most frequently used algorithms.

While substantial progress has been made in planning and control of single agents, a similar formal treatment of multi-agent systems is still missing. Some preliminary work has been reported, but it generally avoids the central issue in multi-agent systems: agents typically have different information and different knowledge about the overall system and they cannot share all this information all the time. To address the problem of coordination and control of collaborative multi-agent systems, we are conducting both analytical and experimental research aimed at understanding the computational complexity of the problem and at developing effective algorithms for solving it. The main objectives of the project are:

To develop a formal foundation for analysis, algorithm development, and evaluation of different approaches to the control of collaborative multi-agent systems that explicitly captures the notion of communication cost.

To identify the complexity of the planning and control problem under various constraints on information observability and communication costs.

To gain a better understanding of what makes decentralized planning and control a hard problem and how to simplify it without compromising the efficiency of the model.

To develop new general-purpose algorithms for solving different classes of the decentralized planning and control problem.

To demonstrate the applicability of new techniques to realistic applications and develop evaluation metrics suitable for decentralized planning and control.

In formalizing coordination, we take an approach based on distributed optimization, in part because we feel that this is the richest of such frameworks: it handles coordination problems in which there are multiple and concurrent goals of varying worth, hard and soft deadlines for goal achievement, and alternative ways of achieving goals that offer a trade-off between the quality of the solution and the resources required. Equally important is the fact that this decision-theoretic approach allows us to model explicitly the effects of environmental uncertainty, incomplete and uncertain information, and action-outcome uncertainty. Coping with these uncertainties is one of the key challenges in designing sophisticated coordination protocols. Finally, a decision-theoretic framework is the most natural one for quantifying the performance of coordination protocols from a statistical perspective.

As far as stochastic planning is concerned, models based on Markov decision processes have been increasingly used by the AI research community since the mid-1990s. In association with the *ARC INRIA LIRE* and with P. Chassaing of the OMEGA project, our research group has contributed to the development of this field, notably by co-organizing workshops at the AAAI, IJCAI and ECAI conferences. We also maintain active collaborations with S. Zilberstein (on two NSF-INRIA projects) and with NASA (on a project entitled ``Self-directed cooperative planetary rovers'') in association with S. Zilberstein and V. Lesser of the University of Massachusetts, E. Hansen of Mississippi State University, R. Washington, now at Google, and A.-I. Mouaddib of CRIL, Lens.

We have been using the strengths of the basic theoretical properties of the two major approaches for learning and planning that we follow to design exact algorithms able to deal with practical problems of high complexity. Instances of these algorithms include the JLO algorithm for Bayesian networks, and the Q-learning, TD(λ) and Witness algorithms for problems based on the Markov decision process formalism. While the majority of this work has been done in the United States, the French research community is catching up quickly by developing this domain further on its own. MAIA has been directly involved in making substantial contributions to this development, notably through our active participation in the (informally formed) group of French researchers working on MDPs. Today there is a growing number of research labs in France with teams working on MDPs: to name a few, Toulouse-based labs such as IRIT, CERT, INRA and LAAS, the GREYC at Caen, and certain Paris-based researchers such as R. Munos (Polytechnique) and O. Sigaud (Paris VI).

Most of the current work is focused on finding approximate algorithms. Besides applying these algorithms to a multi-agent system (MAS) framework, we have also been focusing on reducing the complexity of implementing them by making use of the meta-knowledge available in the system being modeled. Thus, when implementing the algorithms, we exploit the temporal, spatial and structural dynamics or functions of the given problem, which saves time in finding approximate solutions. Moreover, we are seeking ways to rigorously combine these two forms of learning, and then to use them for applications involving planning or learning for agents situated in an environment.

One of the research themes of the MAIA project is collective intelligence, which concerns the design of reactive multi-agent systems to collectively solve a problem. Reactive systems are made up of simple-behavior agents with decentralized control which, despite their individual simplicity, are able to collectively solve problems whose complexity is beyond the scope of individuals: the ``intelligence'' of the system can be seen as a collective property.

One of the difficulties in the design of reactive multi-agent systems is to specify interactions between agents, and between agents and their environment, that are simple enough yet enable the society to fulfill its requirements with reasonable efficiency. This difficulty grows with the distance between the simplicity of the individuals and the complexity of the collective property.

We are interested in the design of such systems by the transposition of natural self-organized systems.

Reactive multi-agent systems are characterized by decentralized control (no agent has knowledge of the whole system) and by simple agents that have limited (possibly no) representations of themselves, of the others, and of the environment. Agent behaviors are based upon stimulus-response rules; decision-making relies on limited information about the environment and on limited internal states, without explicit deliberation.

Thus the collective complexity that is observed comes out of the individual simplicity and is the consequence of successive actions and interactions of agents through the environment. Such systems involve two levels of description: one for individual behavior (with no reference to the global phenomena) and one to express collective phenomena.

The design problem can be summarized as the two following questions:

Given a desired global property or behavior, how can individual behaviors and a system dynamics be built in order to obtain it?

Given a set of individual behaviors and a system dynamics, how can the global property be predicted (or guaranteed)?

Such a methodology is still missing and we contribute to this goal. We organize our research in three parts:

understanding collective intelligence by studying examples of such (natural) systems,

transposing principles found in example systems to solve problems, and

providing a framework to help analyze and formalize such systems.

The first part is to model existing self-organized phenomena and thus gain a better understanding of the underlying mechanisms. For instance, social phenomena in biology provide many examples in which a collection of simple, situated entities (such as ants) can collectively exhibit complex properties which can be interpreted as a collective response to an environmental problem. We have worked with biologists and provided several models of self-organized activities in spiders and rats.

Once individual models and system dynamics are established, the second part consists in transposing them to solve a given problem. The transposition consists in encoding the problem so that it can serve as input to the swarm mechanism; adapting the swarm mechanism to the specificities of the problem and, if necessary, improving its efficiency; and then interpreting the collective result of the swarm mechanism as a solution to the problem.

The third part aims at providing a framework to face the following issues:

Is it possible to describe such mechanisms in a way that makes them easy to adapt and reuse for several different instances of the problem (*generic or formal description*)?

If such a generic description of a system is available, is it possible to assess the behaviour of the system in order to derive properties that will be preserved in its instantiations (*analysis and assessment of the system*)?

Among the two principal approaches to the study of multi-agent systems (MAS), we have chosen the line of ``collective'' systems which emphasizes the notions of interactions and organization. This choice is reflected in the numerous collaborations that we have undertaken with researchers of this field as well as in the kinds of research groups we associate and work with:

the AgentLink community in Europe, especially the members interested in self-organization, and

the research group ``Colline'' (under the aegis of GDR I3 and the AFIA) since 1997.

The approach that we have adopted for the design of multi-agent systems is based on the notion of self-organization, and it notably includes the study of emerging properties. Although the research community working in this specific sub-domain is smaller still, it is growing steadily, especially through the work being done at IREMIA (University of Réunion), IRIT (Toulouse), LIRIS (Lyon) and LIRMM (Montpellier), in certain laboratories in the USA (D. Van Parunak and R. Brooks, for example), and in Europe: F. Zambonelli (University of Modena, Italy), P. Marrow (British Telecom ICT Research Centre, UK), G. Di Marzo Serugendo (University of Geneva, Switzerland), etc.

Some of these researchers have taken inspiration from biological models to study emerging properties. This current work is principally inspired by ant-colony models (such as at LIP6 and LIRMM in France, or at IRIDIA in Brussels, Belgium). We consider our use of models of spider colonies and groups of rats, which had never been exploited before, to be an original contribution to this field. It must be mentioned that this field has been considerably influenced by the work of J.-L. Deneubourg of CENOLI (Brussels) on self-organization phenomena in such colonies and on the mechanisms of pheromone-based interaction in ant colonies.

In order to carry out its basic research program, the MAIA team has developed, and is still developing, strong know-how in sequential and distributed decision-making. In particular, mathematical tools such as Markov decision processes, hidden Markov models and Bayesian networks are appropriate and are used by the team in the development of real applications such as:

monitoring the hydration state of patients suffering from kidney disease.

Through ``Dialhemo'' (see Sec. ), the MAIA team helps physicians to monitor patients by using stochastic models.

elderly fall prevention.

The PréDICA project (see Sec. ) illustrates the use of particle filtering to detect loss of autonomy in elderly people.

coordination of intelligent vehicles.

Either flying drones (see Sec. ) or electric cars (see Sec. ).

collaborative filtering.

Strong industrial interest, as shown by the various collaborations with ``Crédit Agricole'' (see Sec. ), Technoscope (see Sec. ), e-veille (see Sec. ) and ESA (see Sec. ).

analysis of medical signals.

With Cardiabase (see Sec. ), continuous stochastic models make it possible to classify ECG recordings.

Bayabox is a toolbox for developing Bayesian network applications in Java. It supports algorithms for exact inference and parameter learning in directed graphical models with discrete or continuous Gaussian variables. Bayabox is used in the Transplantelic project (see Sec. ).

*Availability*: Not distributed.

*Contributors*: Cherif Smaili, Cédric Rose and François Charpillet.

*Contact*: francois.charpillet@loria.fr

The Dialhemo project aims to develop a remote-surveillance and telediagnosis system for renal-insufficiency patients treated by hemodialysis. The main objective is to ensure that people who are treated either at home or in self-dialysis centers enjoy the same level of security as in hospital. A first software system, developed in cooperation with Diatelic SA, Gambro and ALTIR, is currently being tested at several sites. About 150 patients currently benefit from this first system.

*Availability*: distributed by Diatelic SA

*Contributors*: Cedric Rose, François Charpillet

*Contact*: francois.charpillet@loria.fr

FiatLux is a cellular automata simulator that allows the user to experiment with various models and to perturb them. These perturbations can be of two types. On the one hand, perturbations of the dynamics change the type of updating, for example from deterministic parallel updating to asynchronous random updating. On the other hand, the user may perturb the topology of the grid by randomly removing links between cells.

FiatLux may be run in an interactive mode with a Graphical User Interface, or in a batch mode for longer experiments. The interactive mode is suited to small universes, whereas the batch mode may be used for experiments involving several thousands of cells. The software uses two external libraries, for random-number generation and for the real-time observation of variables; it is also fitted with output procedures that write in Gnuplot, TeX and HTML formats. The software is currently evolving towards the simulation of models of multi-agent systems.
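The perturbation of dynamics mentioned above can be sketched in a few lines. This is an illustrative re-implementation, not FiatLux code: the same elementary cellular automaton rule is stepped once with deterministic parallel updating and once with alpha-asynchronous updating, where each cell fires with probability alpha per step.

```python
import random

# One step of an elementary cellular automaton on a ring. With alpha = 1.0
# this is the classic deterministic parallel update; with alpha < 1.0 each
# cell is updated with probability alpha (asynchronous random updating).

def step(cells, rule, alpha=1.0, rng=random):
    n = len(cells)
    nxt = list(cells)                     # unselected cells keep their state
    for i in range(n):
        if rng.random() <= alpha:
            left, me, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
            nxt[i] = (rule >> (left * 4 + me * 2 + right)) & 1
    return nxt

random.seed(1)
cells = [0] * 7 + [1] + [0] * 7
sync = step(cells, rule=184)              # ECA 184, the "traffic" rule
async_ = step(cells, rule=184, alpha=0.5) # about half the cells updated
```

Under parallel updating, rule 184 simply shifts the lone particle one cell to the right; under asynchronous updating the outcome is random, which is precisely the kind of qualitative change in behavior that the simulator lets the user explore.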

*Availability*: Download it at
http://nazim.fates.free.fr/Logiciel.htm

*Contributors*: Nazim Fatès

*Contact*: Nazim.Fates@loria.fr

The key word for our recent work on stochastic models is ``distributed''. In terms of decentralized control, we have developed exact and approximate methods for the Decentralized Partially Observable Markov Decision Process framework (DEC-POMDP) and investigated the use of concepts inspired by game theory for learning to coordinate. We have also unveiled strong links between optimal and harmonic control, and discussed some implications of this link for the distributed computation of optimal trajectories. Several problems related to distributed knowledge have also been examined, mainly through a combination of collaborative filtering and behavior modeling.

There is a wide range of application domains in which decision-making must be performed by a number of distributed agents that try to achieve a common goal. This includes information-gathering agents, distributed sensing, coordination of multiple distributed robots, decentralized control of a power grid, autonomous space exploration systems, network traffic routing, decentralized supply chains, as well as the operation of complex human organizations. These domains require the development of a strategy for each decision maker assuming that decision makers will have limited ability to communicate when they execute their strategies, and therefore will have different knowledge about the global situation.

Our research team is focusing on the development of a decision-theoretic framework for such collaborative multi-agent systems. The overall goal is to develop sophisticated coordination strategies that stand on a formal footing. This enables us to better understand the strengths and limitations of existing heuristic approaches to coordination and, more importantly, to develop new approaches based on these more formal underpinnings. One important result is that the theory of Markov Decision Processes proves particularly powerful in this context. In particular, we are extending the MDP framework to problems of decentralized control.

By relying on concepts from Decision Theory and Game Theory, we have proposed algorithms for decentralized stochastic models. These new results concern both planning and learning. This work is supported in part by the INRIA associate team with UMass, with S. Zilberstein.

Approaches to solving DEC-POMDPs fall into two categories. If the underlying model of the system is known in advance, the optimal solution can be planned prior to execution in a centralized way. We introduce two new planning algorithms.

The first one is a point-based multi-agent dynamic programming approach, which constitutes a synthesis of classical multi-agent dynamic programming and point-based dynamic programming for single-agent POMDPs. Our approach is hence able to concentrate the computational effort on the relevant regions of the policy space.

The second approach is an entirely new way of applying heuristic search techniques, such as A*, to decentralized decision problems. We introduce multi-agent A* (MAA*), the first heuristic search algorithm for solving DEC-POMDPs, both over finite and infinite horizons.

If the underlying model is not known, an optimal policy can be obtained by a trial-and-error approach based on reinforcement learning methods. We analyze the additional constraints in multi-agent learning vs. planning, before introducing a new multi-agent reinforcement learning algorithm based on mutual notifications of changes in the value function.

It is nevertheless interesting to take advantage of reinforcement learning techniques to build egoistic agents and to adapt these agents to produce collective behavior. To do so, we proposed a new formalism, the Interac-DEC-POMDP, a generalization of the DEC-POMDP in which direct interactions among agents are explicitly defined. On the basis of this formalism, we proposed a new decentralized learning algorithm in which agents learn to act and to interact with others. This learning algorithm is based on distributed reinforcement learning techniques that take advantage of the formalization of direct interaction, and it uses heuristics for distributing the rewards among agents during these interactions. These heuristics distribute the collective task among the agents and lead to altruistic behavior among interacting agents. The algorithm manages to produce adaptive collective behaviour, as shown by the experiments we undertook, and opens new perspectives concerning the automatic computation of interactions and collective reinforcement learning techniques.

Decentralized Reinforcement Learning, so as to allow Multi-Agent Systems to learn to coordinate, can be studied in the general framework of Game Theory.

Whereas the notion of Nash equilibrium is widely used to derive learning algorithms, one major difficulty to overcome is the computation of all the Nash equilibria of a game when the set of possible strategies is large. In , we proposed a formulation of a general-sum bimatrix game as a bipartite directed graph with the objective of establishing a correspondence between the set of relevant structures of the graph (in particular elementary cycles) and the set of Nash equilibria of the game. We showed that finding the set of elementary cycles of the graph permits the computation of the set of equilibria. For games whose graphs have a sparse adjacency matrix, this serves as a good heuristic for computing the set of equilibria. The heuristic also allows the discarding of sections of the support space that do not lead to any equilibrium, thus serving as a useful preprocessing step for algorithms that compute the equilibria through support enumeration.
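For intuition about the object being computed, the sketch below brute-forces the *pure* Nash equilibria of a small general-sum bimatrix game. The cited work addresses the harder problem of mixed equilibria via the cycle structure of a graph; this toy enumeration is only a baseline, and the payoff matrices are the textbook prisoner's dilemma.

```python
# Brute-force enumeration of the pure Nash equilibria of a general-sum
# bimatrix game: (i, j) is an equilibrium when neither player can gain by
# deviating unilaterally.

def pure_nash(A, B):
    """A[i][j], B[i][j]: payoffs of the row/column player for (i, j)."""
    eqs = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            row_best = all(A[i][j] >= A[k][j] for k in range(len(A)))
            col_best = all(B[i][j] >= B[i][l] for l in range(len(A[0])))
            if row_best and col_best:
                eqs.append((i, j))
    return eqs

# Prisoner's dilemma payoffs: strategy 0 = cooperate, 1 = defect
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
eqs = pure_nash(A, B)
```

The unique equilibrium is mutual defection (1, 1), even though (0, 0) would give both players a higher payoff: exactly the kind of globally suboptimal outcome discussed below.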

How can autonomous *independent* agents learn strategies that achieve Pareto efficiency? This problem is motivated by the fact that looking for a Nash equilibrium can sometimes lead to suboptimal behaviors and is often not compatible with the fact that agents have limited knowledge of the problem or of the other agents.

In the more specific case of imperfect monitoring (as in Partially Observable Stochastic Games), learning should focus on sequential equilibria, as relying on the Nash equilibrium of the normal form of the game can be misleading.

We consider the problem of learning strategy selection in games. The theoretical solution to this problem is a distribution over strategies that corresponds to a Nash equilibrium of the game. When the payoff function of the game is not known to the participants, such a distribution must be approximated directly through repeated play. Full knowledge of the payoff function, on the other hand, restricts agents to being strictly rational. In this classical approach, agents are bound to a Nash equilibrium, even when a globally better solution is obtainable. In , we presented an algorithm that allows agents to capitalize on their very lack of information about the payoff structure in order to be Pareto efficient. The principle we propose is that agents resort to manipulating their own payoffs, during the course of learning, to find a ``game'' that gives them a higher payoff than when no manipulation occurs. In essence, the payoffs are considered as an extension of the strategy set. At all times, agents remain rational vis-à-vis the information available. In self-play, the algorithm affords a globally efficient payoff (if one exists).

Previous work was done in collaboration with O. Buffet.

B. Girau (CORTEX Team, LORIA) is an external collaborator.

Artificial intelligence (AI) is a domain that deals with intelligent behavior, learning, and adaptation in machines. AI is currently made of many sub-disciplines that focus on specific problems: planning, multi-agent systems, neural networks, vision and so on. The following work stems from the belief that AI is fundamentally a control problem, and that the optimal control framework may play an increasingly central role in AI. As we describe in more detail below, we have shown this year that the optimal control framework allows a deeper understanding of some distributed AI approaches (like ant algorithms and neural networks) and of a planning technique known as harmonic control.

It has been known for decades that optimal control can be solved through parallel and distributed algorithms. Such a fact allows one to understand some AI distributed approaches such as ant algorithms and neural networks:

In , we built a simple ant model that solves a discrete foraging problem. We described simulations and provided a complete convergence analysis: we showed that the ant population computes the solution of some optimal control problem and converges in a well-defined sense. We discussed the rate of convergence with respect to the number of ants: we gave experimental and theoretical arguments suggesting that this convergence rate is superlinear in the number of agents. Such strong analytical results are rare in the multi-agent literature.

Optimal control and harmonic control have traditionally been considered as unrelated alternatives for trajectory planning and control. We have shown that they are in fact deeply related. We have provided formal evidence, in the continuous domain and in a standard discretization, that harmonic control is the limit case of an optimal control problem in which the noise level tends to infinity. In other words, optimal control subsumes harmonic control. This gives more insight into what harmonic control is, and we believe that this might be of interest to the many practitioners of harmonic control in the robotics community. In , we have begun to work on the implementation of harmonic control on an embedded massively parallel hardware architecture: we solve the navigation problem of computing trajectories along a harmonic potential using an FPGA implementation. This architecture includes the iterated estimation of the harmonic function. The goals and obstacles of the navigation problem may be changed during the computation. The trajectory decision is also performed on-chip, by means of local computations of the preferred direction at each point of the discretized environment. The proposed architecture uses a massively distributed grid of identical nodes that interact with each other within mutually dependent serial streams of data to perform pipelined iterative updates of the local harmonic-function values until global convergence. In the future we will investigate how its more general form, optimal control, may be implemented on similar architectures.
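The computation performed on-chip can be sketched in software. This is an illustrative re-implementation, not the FPGA design: repeated local averaging relaxes each free cell toward the mean of its four neighbours (a discrete harmonic potential, with the goal clamped low and obstacles high), and a trajectory is then obtained by following the locally preferred direction at each point. The grid is invented for the example.

```python
# Relaxation of a harmonic potential on a small grid, then steepest-descent
# navigation along it. Assumes enough iterations for convergence, so the
# potential has no spurious local minimum in free space.

def harmonic(grid, goal, iters=2000):
    """grid[y][x]: 1 = obstacle, 0 = free. Returns the relaxed potential."""
    h, w = len(grid), len(grid[0])
    u = [[1.0] * w for _ in range(h)]     # obstacles and borders stay high
    u[goal[0]][goal[1]] = 0.0             # the goal is clamped low
    for _ in range(iters):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if grid[y][x] == 0 and (y, x) != goal:
                    u[y][x] = (u[y-1][x] + u[y+1][x]
                               + u[y][x-1] + u[y][x+1]) / 4
    return u

def descend(u, start, goal, grid):
    path, pos = [start], start
    while pos != goal:                    # local decision: lowest free neighbour
        y, x = pos
        pos = min(((y+dy, x+dx) for dy, dx in ((-1,0), (1,0), (0,-1), (0,1))
                   if grid[y+dy][x+dx] == 0),
                  key=lambda p: u[p[0]][p[1]])
        path.append(pos)
    return path

grid = [[1]*5, [1,0,0,0,1], [1,0,1,0,1], [1,0,0,0,1], [1]*5]
u = harmonic(grid, goal=(3, 3))
path = descend(u, start=(1, 1), goal=(3, 3), grid=grid)
```

Only local neighbour-to-neighbour exchanges are needed, which is what makes the massively parallel hardware mapping natural.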

J.-C. Lamirel and R. Kassab (CORTEX team, LORIA) and A. Brun (PAROLE team, LORIA) are external collaborators for this action.

The amount of data on the Internet increases exponentially, and it is becoming more and more difficult to extract the most relevant information within a short time. Recommender systems support decision processes by identifying pertinent and reliable items among a huge collection of available resources. Interest in such systems has dramatically increased due to the demand for personalization technologies by, for example, e-commerce applications or information-retrieval services. Approaches to recommendation problems can be classified as content-based, knowledge-based, demographic, utility-based, collaborative, etc. Our work mainly relies on collaborative filtering, even though we also investigate other approaches, such as combining it with content-based analysis. In this case, recommender systems help users to find interesting items by modeling their preferences and by comparing them with those of users having the same tastes. Nevertheless, there are many aspects to consider when implementing such a recommender system, such as scaling problems, privacy, data sparsity, quality of prediction, security and trust.

The term ``collaborative filtering'' denotes techniques that use the known tastes of a group of users to predict the unknown preferences of a new user. The aim of such algorithms is to predict the interest of a resource for a given user, given his past consultations. In practical terms, this amounts to matching the active user with a set of people having the same tastes, based on his past actions and his known preferences. This starts from the principle that people who liked the same items share the same topics of interest. It is thus possible to predict the relevancy of data for the active user by taking advantage of the experiences of a similar population. Current collaborative filtering processes are mostly centralized. The scientific problems we address have consisted in finding a way to distribute the computation, in order to cater for several tens of thousands of people, and to preserve the anonymity of users (personal data remain on the client side). Because of the industrial context (a collaboration with a company called ASTRA, specialized in satellite website broadcasting), we consider the situation where the set of users is relatively stable, whereas the set of items may vary considerably from one execution to another. We proposed a new generic model based on a client/server user-based collaborative filtering algorithm and a behavior modeling process . Our solution is particularly designed to address the issues of data sparsity, privacy and scalability .
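The prediction step of user-based collaborative filtering can be illustrated by the classic memory-based scheme below (a Python sketch of a standard Pearson-weighted prediction; the function names and data layout are illustrative, and it does not reproduce our client/server model):

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item):
    """Resnick-style prediction: the mean rating of the active user plus
    the similarity-weighted deviation of the neighbours on `item`."""
    mean_a = sum(active.values()) / len(active)
    num = den = 0.0
    for ratings in others:
        if item not in ratings:
            continue
        w = pearson(active, ratings)
        mean_o = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_o)
        den += abs(w)
    return mean_a if den == 0 else mean_a + num / den
```

Distributing this computation amounts to deciding which side (client or server) holds the ratings, computes the similarities and aggregates the deviations, which is exactly where privacy and scalability constraints enter.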

We also transposed our algorithm to P2P architectures; within this framework, we propose to adapt the computation of predictions to the density of the active user's neighborhood, and we propose an algorithm to protect collaborative filtering against malicious attacks.

A. Brun (PAROLE Team, LORIA) is an external collaborator.

For several months, we have been investigating how to build a statistical grammar of usage, based on an analogy with statistical language modeling. Statistical language models have proved their efficiency in domains such as automatic speech recognition, optical character recognition and natural language processing. We explore the use of the best-known statistical language models (n-grams, triggers, etc.) to improve the quality of predictions by introducing the notion of sequentiality into resource consultations .
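To make the idea concrete, a bigram (n-gram with n = 2) predictor over consultation sequences can be sketched as follows (an illustrative Python sketch; the session data, function names and the restriction to bigrams are our assumptions, not the deployed model):

```python
from collections import Counter, defaultdict

def train_bigram(sessions):
    """Count bigrams over users' consultation sequences.

    sessions: list of ordered resource sequences, one per session.
    Returns a map from a resource to a Counter of its successors.
    """
    follow = defaultdict(Counter)
    for seq in sessions:
        for prev, cur in zip(seq, seq[1:]):
            follow[prev][cur] += 1
    return follow

def predict_next(follow, current):
    """Most frequent resource consulted right after `current`,
    or None if `current` was never seen in a non-final position."""
    if current not in follow:
        return None
    return follow[current].most_common(1)[0][0]
```

Triggers and longer n-grams generalize this by conditioning the prediction on more of the consultation history rather than on the last resource alone.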

There are many situations which require us to deal with strongly interacting, massively parallel and decentralized systems. This is what brought us to work in the field of self-organized systems. These systems are described by various formal models such as reactive multi-agent systems or cellular automata. The work of the team mixes both theoretical and experimental approaches and seeks to provide applications in the field of image processing, localization and tracking, and bio-inspired problem solving.

C. Bernon (IRIT, Univ. Paul Sabatier TOULOUSE), V. Hilaire (SET, UT-Belfort Montbéliard) and P. Marrow (British Telecom ICT Research Centre UK) are external collaborators for this action.

A lot of work is devoted to formalizing and devising architectures for agents' cooperative behaviour, to coordinating the behaviour of individual agents within groups, and to designing agent societies using social laws. However, providing agents with the ability to automatically devise societies, so as to form coherent emergent groups that coordinate their behaviour via social laws, is highly challenging. Such systems are called self-organized, and we are beginning to understand some of the ways in which they can be devised. Inside the Technical Forum Group on ``Self-organization'' (see Sec. ), we proposed several criteria for analyzing self-organized systems, used them to categorize several examples of multi-agent systems in which self-organization, based on different mechanisms, is used to solve complex problems, and used several of these criteria to compare the self-organization mechanisms of different applications.

F. Gechter (SeT, UT-Belfort-Montbéliard) is an external collaborator for this action.

J.-P. Mano (IRIT, Univ. Paul Sabatier, Toulouse), G. Lopardo (Agents Reasearch Lab, Univ.of Girona, Italy) and P. Glize (IRIT, Univ. Paul Sabatier, Toulouse) are external collaborators for this action.

Self-organization is a growing interdisciplinary field of research about a phenomenon that can be observed in the universe, in nature and in social contexts. Research on self-organization tries to describe and explain forms, complex patterns and behaviors that arise from a collection of entities without an external organizer. As researchers in artificial systems, our aim is not to mimic self-organizing phenomena arising in nature, but to understand and control the underlying mechanisms allowing the desired emergence of forms, complex patterns and behaviors. Rather than attempting to eliminate such self-organization in artificial systems, we think that it might be deliberately harnessed in order to reach desirable global properties. In a published paper we analyzed three forms of self-organization: stigmergy, reinforcement mechanisms and cooperation. The amplification phenomena found in stigmergic or reinforcement processes are different forms of positive feedback that play a major role in building group activity or social organization. Cooperation is a functional form of self-organization because of its ability to guide local behaviors so as to obtain a relevant collective one. For each form of self-organization, we present a case study showing how we transposed it to artificial systems, and then analyze the strengths and weaknesses of such an approach.

Ant algorithms are one of the main programming paradigms in Swarm Intelligence. They are built on stochastic decision functions, which can also be found in other types of bio-inspired algorithms with the same mathematical form, as in the modeling of the behavior of social insects for example. However, though this modeling leads to high-performance algorithms, some phenomena, like symmetry breaking, are still not well understood or modeled at the ant level, especially in the double bridge experiment. We propose an original analysis of the problem with a reactive multi-agent system based on logistic nonlinear decision maps ; it is designed according to the influence-reaction scheme. Our proposition is an entirely novel approach to the mathematical foundations of ant algorithms: contrary to the current stochastic approaches, we show that an alternative deterministic model exists, which has its origin in deterministic chaos theory. The rewriting of the decision functions leads to a new way of understanding and visualizing the convergence behavior of ant algorithms.
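For illustration, the sketch below combines the classical stochastic choice function of the double-bridge experiment with a logistic map used as a deterministic source of decisions (our own toy rendering of the idea; the parameters k, h, the deposit scheme and the coupling are illustrative assumptions, not the influence-reaction model of the paper):

```python
def p_choose_a(pher_a, pher_b, k=20.0, h=2.0):
    """Classical Deneubourg-style choice function for the
    double-bridge experiment: the probability of taking branch A
    given the pheromone levels on branches A and B."""
    wa = (k + pher_a) ** h
    wb = (k + pher_b) ** h
    return wa / (wa + wb)

def logistic(x, mu=4.0):
    """Chaotic logistic map, used here as a deterministic
    stand-in for a random number generator."""
    return mu * x * (1.0 - x)

def run_bridge(n_ants=500, deposit=1.0, x0=0.3):
    """Each ant takes branch A iff the current chaotic value falls
    below the choice probability; its pheromone deposit then
    amplifies the chosen branch (positive feedback)."""
    pa = pb = 0.0
    x = x0
    for _ in range(n_ants):
        x = logistic(x)
        if x < p_choose_a(pa, pb):
            pa += deposit
        else:
            pb += deposit
    return pa, pb
```

The point of the deterministic variant is that symmetry breaking no longer needs a random draw: the sensitivity of the chaotic map plays the role of noise, which is the intuition behind the rewriting of the decision functions.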

Cellular automata can be seen as the environment part of a multi-agent system. Formally, they are discrete dynamical systems and they are widely used to model natural systems. Classically
they are run with perfect synchrony;
*i.e.*, the local rule is applied to each cell at each time step. A possible modification of the updating scheme consists in applying the rule with a fixed probability, called the
synchrony rate. It has been shown in a previous work that varying the synchrony rate continuously could produce a discontinuity in the behaviour of the cellular automaton. In
we investigated the nature of this change of behaviour using intensive numerical simulations. We applied a two-step protocol to show that the phenomenon is a phase transition whose critical exponents are in good agreement with the predicted values of directed percolation.

Our research is currently evolving towards the study of this phase transition phenomenon in the context of bio-inspired computing. Indeed, we wish to examine to which extent the phase
transition phenomenon could be applied for describing the brutal change of behaviour in cellular societies such as the
*Dictyostelium discoideum* amoebae.

One of the main approaches to Swarm Intelligence is to mark the environment in order to use it as a common memory (pheromone deposits by ants are a well-known example of such indirect communication). Inspired by such biological solutions, we develop environment-based algorithms to deal with situated and distributed problems. We explore the use of digital marks, local signals expressing artificial potential fields (see Sec. ) and sensor networks. This work focuses on three main problems: coordination of mobile robots/drones (see Sec. ), exploration and optimal path planning in grid worlds, and collective foraging (search and transport of resources by a set of autonomous agents/robots).

The proposed models rely on reactive agents,
*i.e.*, agents with neither individual memory nor direct communication abilities. They are only able to write/read marks in their environment or to broadcast signals. Concerning exploration and
foraging tasks in unknown environments, we proposed an optimal solution based on digital marks, which is more efficient than pheromone-based approaches (results submitted to the ACM TAAS journal).
This work is done in collaboration with E. Thierry from LIP, ENS Lyon.

In 2006, we began a collaboration with the ``Veille technologique'' team of the company ``Crédit Agricole SA''. The aim of the collaboration is to investigate the benefits of collaborative filtering and social navigation in the framework of an intranet platform. First, we implemented our client/server user-based collaborative algorithm within the information system of the company, thereby realizing a proof of concept in an industrial context . We are now exploring new algorithms relying on the tags left by users during the navigation process.

A. Brun (PAROLE team, LORIA) is an external collaborator for this action.

Technoscope is a press agency specialized in scientific domains. The objective of the collaboration, which began in 2006, is to improve the delivery of information to the main clients of the company (i.e., international, national or regional media). When a document is produced by a journalist and put on the Technoscope website, the system we are designing has to determine to whom it should be pushed. The approach relies both on collaborative filtering and on statistical language models, in order to integrate into the model the notion of sequentiality of consultations.

Anne Boyer and Sylvain Castagnos decided to create a company in order to exploit, in industrial contexts, the various algorithms based on collaborative filtering they have designed. This project, called
*e-veille*, was a 2006 laureate of the national contest ``creation of innovative companies'' in the category ``Emergence''. This contest is organized by the French ``ministère de la
Recherche'' with the help of OSEO ANVAR. We plan to create the company at the beginning of 2007; for the time being, we are hosted by the ``incubateur lorrain''.

CardiaBase's core business is the central reading of cardiac data (ECGs and Holters), in order to assess the cardiac consequences (side effects) of new drugs. The evolution of medical knowledge and the release of tighter regulatory guidelines demand more and more intensive controls of cardiac data to guarantee drug safety, notably:

assessing the ECG evolution of patients when under a new drug,

quickly receiving alerts in case of severe ECG abnormalities,

storing all the ECGs of a patient and tracking (comparing) any changes over time.

CardiaBase provides interpretations based on reading methods performed by trained cardiologists. In order to improve the process, we have developed a new approach to automatic ECG segmentation based on hierarchical continuous-density hidden Markov models . We applied a wavelet transform to the signals in order to highlight the discontinuities in the modeled ECGs. A training base of standard 12-lead ECGs segmented by cardiologists was used to evaluate the performance of our method. We used a Bayesian HMM clustering algorithm to partition the training base, and we improved the method by using a multi-model approach. The developed software is on the verge of being integrated into the CardiaBase software.

We continue to develop telemedicine solutions for chronic end-stage renal disease patients. Transplantelic is a telemedicine project that aims at improving the follow-up of patients with a kidney graft. A new system is being developed and a clinical trial within a three-year project is scheduled. Transplantelic started at the beginning of 2006 and is funded both by the Région Lorraine and the ARH. This year we developed a new expert system using Baiabox (see Sec. ) for the surveillance of kidney graft patients.

The main objective of this project is to create a coordination mechanism for Unmanned Aerial Vehicles for surveillance. The focus is on designing individual behaviors for each vehicle so that self-organization capabilities emerge at the level of the group.

This year we obtained results on multi-agent patrolling, using either Markov decision processes or ant algorithms, and wrote a survey of the state of the art.

The PréDICA program addresses the theme of falls in the elderly, covering aspects of both prediction and detection. The program is a continuation of the exploratory project PARAChute, financed by the RNTS in 2003, which addressed only fall prediction:

Definition of the characteristic parameters of a static balance ``signature'' using the stabilogram analysis produced by a personal scale.

Analysis of typical gait using a camera without images.

Given that the end result of an exploratory project is to demonstrate the feasibility of a method and/or a technology, the PARAChute project attained its aims quite well:

Realization of a prototype to analyse static balance.

Development of an algorithm to track movement during gait.

The results arising from this exploratory project enable the developed technologies to move into a pre-competitive stage, and allow a multi-center evaluation to be incorporated in order to validate the parameters extracted from the balance and gait signatures. Such a transformation into a pre-competitive project is the ``raison d'être'' of a completed exploratory project.

The underlying rationale for the PARAChute research project remains the same as when the initial project was proposed. In addition, the heat wave of summer 2003 in France provided further evidence, were any needed, of the severity of the problem, in particular the need for innovative approaches in terms of non-intrusive observation of the elderly in their daily environment.

Intelligent Autonomous Vehicles currently hold the attention of many researchers because they can bring solutions to many applications related to the transport of passengers in urban environments. An example of such a vehicle is the Cycab. The Mobivip project addresses several goals in the domain of mobility services.

Outdoor positioning and navigation systems often rely on a road map database and GPS. However, GPS suffers from satellite masks occurring in urban environments, under bridges, in tunnels or in forests. GPS thus appears as an intermittently available positioning system that needs to be backed up by other localization sensors. In order to obtain accurate positioning and improve the position tracking of the Cycab, the MAIA team proposes to augment the road map database with a 3D model of the Cycab's environment, geo-localized (matched) on the digital road map. The accuracy of DGPS geo-localization of the Cycab on the map database is not sufficient for autonomous navigation. The idea is to improve the metric localization provided by such a system to centimetric accuracy by using a 3D model which has centimetric geo-accuracy. The sensors and information sources used for this task are GPS, an inertial measurement unit, stereo-vision, a laser range sensor, and a road map database managed by a Geographical Information System. The position tracking approach under study in the MAIA project is based on the use of Particle Filters and Extended Kalman Filters (El Najjar 2004, Gustafsson 2002) for multi-sensor fusion, and on belief theory and Hidden Markov Models for the Road Reduction Filter .

Recent links between image analysis and image synthesis show that many problems in image analysis can be addressed as optimization problems. These new trends have led the scientific community to introduce bio-inspired optimization techniques (ant algorithms, artificial evolution, swarm intelligence, social spiders, ...) as problem-solving methods in image analysis. The fly algorithm is a stereo 3-D reconstruction algorithm based on an evolutionary strategy that explores 3-D space searching for the set of points that best describe the scene. Objects in the scene are represented by a set of points (the flies) that are subject to evolution. The best-fit flies, those whose positions reflect the true positions of obstacles in the scene, are selected and survive across generations. The algorithm has been implemented on a CyCab, an electric vehicle designed by INRIA, and applied to obstacle detection. When an obstacle is detected by the fly algorithm within a minimum safety distance from the vehicle, an alarm is raised.
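The evolutionary loop can be sketched as follows (a toy Python sketch in the spirit of the fly algorithm; here fitness is an arbitrary lower-is-better function supplied by the caller, whereas the real algorithm scores a fly by its stereo-image consistency, and all names and parameters are illustrative):

```python
import random

def evolve_flies(fitness, n_flies=100, n_gens=50, sigma=0.5, rng=None):
    """Evolve a population of 3-D points (the flies) by truncation
    selection and Gaussian mutation; the best-fit half survives each
    generation and seeds mutated copies for the next one."""
    rng = rng or random.Random(0)
    flies = [tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
             for _ in range(n_flies)]
    for _ in range(n_gens):
        flies.sort(key=fitness)           # lower fitness is better
        survivors = flies[: n_flies // 2]
        # Refill the population with mutated copies of survivors.
        flies = survivors + [
            tuple(x + rng.gauss(0.0, sigma) for x in rng.choice(survivors))
            for _ in range(n_flies - len(survivors))
        ]
    return min(flies, key=fitness)
```

In the obstacle-detection setting, the surviving cloud of flies concentrates on object surfaces, so proximity of flies to the vehicle is what triggers the alarm.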

A partnership with the company SES ASTRA and the Centre Henri Tudor of Luxembourg, via the ESA Sat'nSurf project, allows us to use their database and to carry out life-sized tests with real users in order to verify the efficiency of the proposed solutions in the field of client/server distributed collaborative filtering. The SES ASTRA company has finalized an advertisement-sponsored system supplying users with free high-bandwidth access to hundreds of web sites. This project aims at highlighting the benefits of collaborative filtering by including such a module in the architecture of their product. Our problem has been to scale this up to hundreds of thousands of people, while preserving the anonymity of users (personal data remain on the client side). Thus, we use an existing clustering method, which we improved so that it is distributed between the client and server sides . Nevertheless, in the absence of numerical votes, for marketing reasons, we have chosen an innovative combination of this decentralized collaborative filtering method with a user profiling technique. We were also subject to constraints such as a short response time on the client side, in order to be compliant with the ASTRA architecture.

MAIA is a member of AgentLink, the European Commission's IST-funded Coordination Action for Agent-Based Computing.

Over the past years, a fruitful research collaboration has been established between MAIA and the RBR group at the University of Massachusetts, directed by S. Zilberstein. The collaboration was conceived at a meeting that took place in 1995 at the International Joint Conference on Artificial Intelligence in Montreal. During this meeting, we identified a high degree of overlap between our interests, research projects, and solution techniques. These common interests relate to the development of planning and monitoring techniques for autonomous systems that can operate in real-time and can cope with uncertainty and limited computational resources. At that time, the U.S. team investigated a solution technique based on ``anytime algorithms'' and our team investigated the ``progressive processing'' model.

Since then, we have worked together on both of these models and exploited the synergy to improve their applicability and effectiveness. This year this collaboration has been funded by INRIA as an associated team. This association of the two research teams has focused on the development of a decision-theoretic framework for planning and control of collaborative multi-agent systems by formalizing the problem as decentralized control of a Markov process. The overall goal is to develop sophisticated coordination strategies that stand on a formal footing. This enables us to better understand the strengths and limitations of existing heuristic approaches to coordination and, more importantly, to develop new approaches based on these more formal underpinnings. There is a wide range of application domains in which decision-making must be performed by a number of distributed agents that are trying to achieve a common goal. This includes information-gathering agents, distributed sensing, coordination of multiple distributed robots, decentralized control of a power grid, autonomous space exploration systems, as well as the operation of complex human organizations. These domains require the development of a strategy for each decision maker assuming that decision makers will have limited ability to communicate when they execute their strategies, and therefore will have different knowledge about the global situation.

This project is funded by STIC-Asie Program for a duration of two years. The partners are from Vietnam (IFI, centre MICA, CARGIS), China (LIAMA), Cambodia (ITC) and France (IRD, LRI-Paris Sud, MAIA-LORIA, IGN). The first meeting took place at Hanoi in November.

The project is in the context of a developing country and under its economical constraints. It aims at developing the technology supporting district-level decision-making in case of disasters, using:

Teams of simple, cheap, communicating, ground and aerial self-organized robots dedicated to the gathering of information, with a variety of sensors, on damaged sites.

A data fusion system, to which the robots transmit their perceptions, which is specialized and trained for extracting relevant semantic information from them.

A 3D GIS that supports a simulation of a district, used by local decision-makers to monitor the progress of the robots, the extent of the damages, and assign new targets to the robots.

Shlomo Zilberstein from the University of Massachusetts at Amherst (USA) came for a week in November.

Martin Allen from the University of Massachusetts at Amherst (USA) came for six weeks as an INRIA intern.

MAIA is a leading force in the
*PDMIA* group (Processus Décisionnels de Markov et Intelligence Artificielle) and played a great part in the annual meeting of the group. This year, the group's annual meeting turned into
a full-scale conference in Toulouse (JFPDA'06), where people from the
*planning* community exchanged with people from
*reinforcement learning*.

Vincent Chevrier is a member of:

the editorial board of Interstices,

the advisory board of EUMAS, the European Workshop on Multi-Agent Systems,

the program committee of MA4CS'06, Multi-Agents for modeling Complex Systems, Oxford, September 2006.

Vincent Chevrier is the general chair of JFSMA06, the French-speaking conference on multi-agent systems.

Vincent Chevrier gave a talk at the ``Institut Henri Poincaré'' (Paris) as an ``avant-première'' of a special issue of ``Pour La Science'' on computer modeling of real phenomena.

Christine Bourjot and Vincent Chevrier are members of the working group ``Colline'' (AFIA, GDR I3).

Vincent Chevrier was a reviewer on the following PhD committees:

Atmane Hamel, ``Conception participative et coopérative de simulations multi-agents : application à la filière avicole'', 14 March 2006, Université Paris-Dauphine,

Jean-Pierre Mano, ``Etude de l'émergence fonctionnelle au sein d'un réseau de neuro-agents coopératifs'', May 2006, Université Toulouse III,

Frédéric Armetta, ``Proposition d'une approche auto-organisationnelle pour le partage de ressources critiques'', 8 December 2006, Université Lyon 1,

Gireg Desmeulles, ``Réification des interactions pour l'expérience in virtuo de systèmes biologiques multi-modèles'', 11 December 2006, Université Bretagne Occidentale.

François Charpillet was a member of the following conference committees:

The Ninth International Symposium on Artificial Intelligence and Mathematics, January 2006,

``13èmes Rencontres INRIA : Industrie en association avec l'Inserm. Les Sciences et Technologies de l'Information et de la Communication au service de la Médecine'', Rocquencourt, 24 January 2006,

The Twenty-First National Conference on Artificial Intelligence (AAAI-06), July 16-20, 2006, Boston, Massachusetts,

The Fourth European Workshop on Multi-Agent Systems, 2006,

JFSMA06, the French-speaking conference on multi-agent systems.

François Charpillet was a member of the following PhD committees:

Hussein Atoui, ``Conception de systèmes Intelligents pour la télémédecine Citoyenne'', INSA de Lyon,

Aurélie Beynier, ``Une contribution à la résolution des processus décisionnels de Markov Décentralisés avec contraintes temporelles'', Université de Caen Basse-Normandie,

Jérome Chapelle, ``Une architecture multi-agents pour un apprentissage guidé par les émotions'', Université de Montpellier,

Bassam Baki, ``Planification et ordonnancement probabilistes sous contraintes temporelles'', Doctorat de l'Université de Caen,

Pierre Emmanuel Dumont, ``Tolérance active aux fautes des systèmes d'instrumentation. Application à un véhicule électrique'', Doctorat de l'Université des Sciences et Technologies de Lille,

Hassan Amoud, ``Détection d'une évolution du risque de chute chez les personnes âgées'', Thèse de l'Université de Technologie de Troyes.

François Charpillet was a member of the following HDR committee: Maroua Bouzid, ``Temps, Agent et vers une extension à dimension spatiale'', HDR, Université de Caen Basse-Normandie.

François Charpillet is a member of the evaluation committees of the ANR programs PSIROB and TECSAN.

François Charpillet is a member of the Specialist Committee (commission de spécialistes) at Paris XI.

Anne Boyer was a member of the PhD committee of Salah El Falou (Caen University, December 2006, ``Programmation répartie, optimisation par agent mobile'').

Anne Boyer is ``chargée de mission auprès du Président de l'Université Nancy 2'' about ``new technologies of information and communication''.

Anne Boyer is a member of the editorial committee of the journal ``TSI''.

Anne Boyer is a member of the ``Specialist Committees'' (commissions de spécialistes) of Nancy 2 and Strasbourg.

Christine Bourjot is a member of the scientific council of CogniEst, the ``Réseau Grand Est des Sciences Cognitives''.

Christine Bourjot is a member of the administration council of ARCo, the Association for Cognitive Research.

Christine Bourjot is a member of the program committee of the ARCo'06 colloquium, Bordeaux, December 2006.

Olivier Simonin was a member of the following journal committees: the special volume of the journal ``RIA'' on ``Complex environments models for Multi-Agent Systems'', and the LNCS post-proceedings of the Third International Workshop on Environments for Multi-Agent Systems.

Olivier Simonin was a member of the PhD committee of Jérome Chapelle, ``Une architecture multi-agents pour un apprentissage guidé par les émotions'', Université de Montpellier II.

Olivier Simonin is a member of the ``Specialist Committees'' (commissions de spécialistes) of Nancy 1 (UHP) and Belfort (UTBM).

This work concerns the dissemination of research results to non-specialists. We wrote an article as part of a special track of
*Pour la Science* about computer models for the simulation of real phenomena
. It focuses on collective weaving in a social spider species.

In the context of a special issue of the wide-audience journal
*Sciences et Avenir - Hors-Série*, we were asked to provide a contribution on the theme of the ``History of the Universe'' (March/April 2006). We wrote a short article explaining how the
limits of computation observed with the cellular automata model could or could not serve as an analogy for understanding the limits of our knowledge of the surrounding universe.