The development of interconnection networks has led to the emergence of new types of computing platforms. These platforms are characterized by heterogeneity of both processing and communication resources, geographical dispersion, and instability in terms of the number and performance of participating resources. These characteristics restrict the nature of the applications that can perform well on these platforms. Due to middleware and application deployment times, applications must be long-running and involve large amounts of data; also, only loosely-coupled applications may currently be executed on unstable platforms.

The new algorithmic challenges associated with these platforms have been approached from two different directions. On the one hand, the parallel algorithms community has largely concentrated on the problems associated with heterogeneity and large amounts of data. On the other hand, the distributed systems community has focused on scalability and fault-tolerance issues. The success of file sharing applications demonstrates the capacity of the resulting algorithms to manage huge volumes of data and users on large unstable platforms. Algorithms developed within this context are completely distributed and based on peer-to-peer (P2P for short) communication.

The goal of our project is to establish a link between
these two directions, by gathering researchers from the
distributed algorithms, data structures, parallel
algorithms and randomized algorithms communities. More precisely, the
objective of our project is to extend the range of applications
that can be executed on large scale distributed platforms.
Indeed, whereas protocols designed for P2P file exchange are
actually distributed, computationally intensive applications
executed on large scale platforms (BOINC

Projects must meet three basic technological requirements, to ensure benefits from grid computing:

Projects should have a need for millions of CPU hours of computation to proceed. However, humanitarian projects with smaller CPU hour requirements are able to apply.

The computer software algorithms required to accomplish the computations should be such that they can be subdivided into many smaller independent computations.

If very large amounts of data are required, there should also be a way to partition the data into sufficiently small units corresponding to the computations.

Given these constraints, applications using large data
sets should be such that they can be arbitrarily split into
small pieces of data (such as Seti@home

These constraints are related to both security and algorithmic issues. Security is of course an important issue, since executing non-certified code on non-certified data on a large scale, open, distributed platform is clearly unacceptable. Nevertheless, we believe that external techniques, such as sandboxing and certification of data and code through hashcode mechanisms, should be used to solve these problems. Therefore, the focus of our project is on algorithmic issues and, in what follows, we assume a cooperative environment of well-intentioned users, in which security and cooperation can be enforced by external mechanisms. Our goal is to demonstrate that gains in performance and the extension of the application field justify these extra costs but that, just as operating systems do for multi-user environments, security and cooperation issues should not affect the design of efficient algorithms nor reduce the application field.

Firstly, we aim at building strong foundations both for distributed algorithms (graph exploration, black-hole search, ...) and for distributed data structures (routing, efficient query, compact labeling, ...), in order to understand how to explore large scale networks in the context of failures and how to disseminate data so as to answer specific queries quickly. Secondly, we aim at building simple (based on local estimations without centralized knowledge), realistic models to represent resource performance accurately and to build a realistic view of the topology of the network (based on network coordinates, geometric spanners, δ-hyperbolic spaces). Then, we aim at proving that these models are tractable by providing low complexity distributed and randomized approximation algorithms for a set of basic scheduling problems (independent tasks scheduling, broadcasting, data dissemination, ...) and associated overlay networks. Finally, our goal is to prove the validity of our approach through software dedicated to several applications (molecular dynamics simulations, continuous integration) as well as more general tools related to the model we propose (AlNEM for automatic topology discovery, SimGrid for simulations at large scale).

We will concentrate on the design of new services for computationally intensive applications, consisting of mostly independent tasks sharing data, with application to distributed storage, molecular dynamics and distributed continuous integration, that will be described in more detail in Section .

Most of the research (including ours) currently carried out on these topics relies on a centralized knowledge of the whole execution platform (topology and performance), whereas recent evolutions in computer networks technology yield a tremendous change in the scale of these networks. The solutions designed for scheduling and managing compact data structures must be adapted to these systems, characterized by a high dynamism of their entities (participants can join and leave at will), a potential instability of the large scale networks (on which concurrent applications are running), and an increasing probability of failure.

P2P systems have achieved stability and fault-tolerance, as witnessed by their wide and intensive usage, by changing the view of the networks: all communication occurs on a logical network (fixed even though resources change over time), thus abstracting the actual performance of the underlying physical network. Nevertheless, disconnecting physical and logical networks leads to low performance and a waste of resources. Moreover, due to their original use (file exchange), those systems are well suited to exact search using Distributed Hash Tables (DHTs) and are based on fixed regular virtual topologies (hypercubes, De Bruijn graphs, ...). In the context of the applications we consider, more complex queries will be required (finding the set of edges used for content distribution, finding a set of replicas covering the whole database) and, in order to reach efficiency, unstructured virtual topologies must be considered.

In this context, the main scientific challenges of our project are:

**Models:**

At a low level, to understand the underlying physical topology and to obtain models that are both realistic and instantiable. This requires expertise in graph theory (all the members of the project) and platform modelling (Olivier Beaumont, Nicolas Bonichon, Lionel Eyraud and Ralf Klasing). The obtained results will be used to focus the algorithms designed in Sections and .

At a higher level, to derive models of the dynamism of targeted platforms, both in terms of participating resources and resource performances (Olivier Beaumont, Philippe Duchon). Our goal is to derive suitable tools to analyze and prove algorithm performances in dynamic conditions rather than to propose stochastic modeling of evolutions (Section ).

**Overlays and distributed algorithms:**

To understand how to augment the logical topology in order to achieve the good properties of P2P systems. This requires knowledge of P2P systems and small-world networks (Olivier Beaumont, Nicolas Bonichon, Philippe Duchon, Nicolas Hanusse, Cyril Gavoille). The obtained results will be used for developing the algorithms designed in Sections and .

To build overlays dedicated to specific applications and services that achieve good performance (Olivier Beaumont, Nicolas Bonichon, Philippe Duchon, Lionel Eyraud, Ralf Klasing). The set of applications and services we target will be described in more detail in Section and .

To understand how to dynamically adapt scheduling algorithms (in particular collective communication schemes) to changes in network performance and topology, using randomized algorithms (Olivier Beaumont, Nicolas Bonichon, Nicolas Hanusse, Philippe Duchon, Ralf Klasing) (Section ).

To study the computational power of mobile agent systems under various assumptions on a few classical distributed computing problems (exploration, mapping problem, exploration of the network in spite of harmful hosts). The goal is to enlarge the knowledge on the foundations of mobile agent computing. This will be done by developing new efficient algorithms for mobile agent systems and by proving impossibility results. This will also allow us to compare the different models (David Ilcinkas, Ralf Klasing, Evangelos Bampas) (Section ).

**Compact and distributed data structures:**

To understand how to dynamically adapt compact data structures to changes in network performance and topology (Nicolas Hanusse, Cyril Gavoille) (Section ).

To design sophisticated labeling schemes in order to answer complex predicates using local labels only (Nicolas Hanusse, Cyril Gavoille) (Section ).

We will detail in Section how the various areas of expertise in the team will be employed for the considered applications.

We therefore tackle several problems related to two priorities that INRIA identified in its strategic plan (2008-2012): "Modeling, Simulation and Optimization of Complex Dynamic Systems" and "Information, Computation and Communication Everywhere".

The recent evolutions in computer networks technology,
as well as their diversification, yield a tremendous change
in the use of these networks: applications and systems can
now be designed at a much larger scale than before. This
growth in scale concerns the amount of data, the
number of computers, the number of users, and the
geographical diversity of these users. This race towards
*large scale* computing has two major implications.
First, new opportunities are offered to applications,
in particular as far as scientific computing, databases,
and file sharing are concerned. Second, a large number of
parallel or distributed algorithms developed for average
size systems cannot be run on large scale systems without a
significant degradation of their performance. In fact, one
must probably relax the constraints that the system should
satisfy in order to run at a larger scale. In particular
the coherence protocols designed for the distributed
applications are too demanding in terms of both message and
time complexity, and must therefore be adapted for running
at a larger scale. Moreover, most distributed systems
deployed nowadays are characterized by a high dynamism of
their entities (participants can join and leave at will), a
potential instability of the large scale networks (on which
concurrent applications are running), and an increasing
individual probability of failure. Therefore, as the size
of the system increases, it becomes necessary that it
adapts automatically to the changes of its components,
requiring self-organization of the system to deal with the
arrival and departure of participants, data, or
resources.

As a consequence, it becomes crucial to be able to understand and model the behavior of large scale systems in order to exploit these infrastructures efficiently, in particular by designing dedicated algorithms that handle large numbers of users and/or large amounts of data.

In the case of parallel computation solutions, some strategies have been developed in order to cope with the intrinsic difficulty induced by resource heterogeneity. It has been proved that changing the metric (from makespan minimization to throughput maximization) simplifies most scheduling problems, both for collective communications and parallel processing. This restricts the use of target platforms to simple and regular applications, but due to the time needed to develop and deploy applications on large scale distributed platforms, the risk of failures, and the intrinsic dynamism of resources, it is in any case unrealistic to consider tightly coupled applications involving many synchronizations. Nevertheless, (1) it is unclear how the current models can be adapted to large scale systems, and (2) the current methodology requires the use of (at least partially) centralized subroutines that cannot be run on large scale systems. In particular, these subroutines assume the ability to gather all the information regarding the network at a single node (topology, resource performance, etc.). This assumption is unrealistic in a general purpose large size platform, in which the nodes are unstable, and whose resource characteristics can vary abruptly over time. Moreover, the proposed solutions for small to average size, stable, and dedicated environments do not satisfy the minimal requirements for self-organization and fault-tolerance, two properties that are indispensable in a large scale context. Therefore, there is a strong need to design efficient and decentralized algorithms. This requires, in particular, defining new metrics adapted to large scale dynamic platforms in order to analyze the performance of the proposed algorithms.

As already noted, P2P file sharing applications have been successfully deployed on large scale dynamic platforms. Nevertheless, since our goal is the design of efficient algorithms in terms of actual performance and resource consumption, we need to concentrate on specific P2P environments. Indeed, P2P protocols are mostly designed for file sharing applications, and are not optimized for scientific applications, nor are they adapted to sophisticated database applications. This is mainly due to the original goal of designing file sharing applications, where anonymity is crucial, only exact queries are used, and all large file communications are made at the IP level.

Unfortunately, the context strongly differs for the applications we consider in our project, and some of the constraints appear to be in contradiction with performance and resource consumption optimization. For instance, in these systems, due to anonymity, the number of neighboring nodes in the overlay network (i.e. the number of IP addresses known to each peer) is kept relatively low, much lower than what the memory constraints on the nodes actually impose. Such a constraint induces longer routes between peers, and is therefore in contradiction with performance. In those systems, with the main exception of the LAND overlay, the overlay network (induced by the connections of each peer) is kept as far as possible separate from the underlying physical network. This property is essential in order to cope with malicious attacks, i.e. to ensure that even if a geographic site is attacked and disconnected from the rest of the network, the overall network will remain connected. Again, since actual communications occur between peers connected in the overlay network, communications between two close nodes (in the physical network) may well involve many wide area messages, and therefore such a constraint is in contradiction with performance optimization. Fortunately, in the case of file sharing applications, only queries are transmitted using the overlay network, and the communication of large files is made at IP level. On the other hand, in the case of more complex communication schemes, such as broadcast or multicast, the communication of large files is done using the overlay network, due to the lack of support, at IP level, for those complex operations. In this case, in order to achieve good results, it is crucial that virtual and physical topologies be as close as possible.

Our aim is to target large scale platforms. From parallel processing, we keep the idea that resource heterogeneity dramatically complicates scheduling problems, which forces us to restrict ourselves to simple applications. The dynamism of both the topology and the performance reinforces this constraint. We will also adopt the throughput maximization objective, though it needs to be adapted to more dynamic platforms and resources.

From previous work on P2P systems, we keep the idea that there is no centralized large server and that all participating nodes play a symmetric role (according to their performance in terms of memory, processing power, incoming and outgoing bandwidths, etc.), which imposes the design of self-adapting protocols, where any kind of central control should be avoided as much as possible.

Since dynamism constitutes the main difficulty in the design of algorithms on large scale dynamic platforms, we will consider several levels of dynamism:

**Stable:** In order to establish the complexity
induced by heterogeneity alone, before any dynamism is
introduced, we will first consider fully
heterogeneous (in terms of both processing and
communication resources) but fully stable platforms
(where both topology and performance are constant over
time).

**Semi-stable:** In order to establish the complexity
induced by fault-tolerance, we will then consider fully
heterogeneous platforms where resource performance
varies over time, but topology is fixed.

**Unstable:** Finally, we will target systems facing
the arrival and departure of participants, data or
resources.

The article "Could any graph be turned into a small world?" written by Philippe Duchon, Nicolas Hanusse, Emmanuelle Lebhar and Nicolas Schabanel has been ranked as a "Top cited article 2005-2010" of Theoretical Computer Science from Elsevier. This article deals with the small-world phenomenon and its algorithmic perspectives: how can a graph of large diameter be turned into a small world by adding a small number of shortcuts?

The members of CEPAGE have been involved in the following program committees: IPDPS 2011 (Vice-Chair, Algorithm Track), EuroPar 2011 (Local Chair, P2P Track), STACS 2011, PODC 2010, SIROCCO 2011, FOMC 2011, ADHOC-NOW 2011, IWOCA 2011, IC3 2011, RENPAR 2011, ISCIS 2011, STACS 2010, IPDPS 2010, SSS 2010, HIPC 2010, PASCO 2010, MARAMI 2010, MajesTIC 2010, DYNAS 2010, DIALM-POMC 2010, ALGOSENSORS 2010, IWOCA 2010, ADHOC-NOW 2010.

The members of CEPAGE were strongly involved in the organization of ICALP'10 in Bordeaux.

The members of CEPAGE accepted many collective duties: Vice Chairman of the project committee at INRIA Bordeaux Sud-Ouest (OB), correspondent for International Affairs for INRIA Bordeaux (OB) and LaBRI (RK), head of the Algorithms and Combinatorics team at LaBRI (OB then RK), Vice Chairman for Doctoral Studies (NH), and head of the Distributed Algorithm group at LaBRI (NH).

Modeling the platform dynamics in a satisfying manner, in order to design and analyze efficient algorithms, is a major challenge. In a semi-stable platform, the performance of individual nodes (be they computing or communication resources) will fluctuate; in a fully dynamic platform, which is our ultimate target, the set of available nodes will also change over time, and algorithms must take these changes into account if they are to be efficient.

There are basically two ways one can model such evolution:
one can use a
*stochastic process*, or some kind of
*adversary model*.

In a stochastic model, the platform evolution is governed by some specific probability distribution. One obvious advantage of such a model is that it can be simulated and, in many well-studied cases, analyzed in detail. The two main disadvantages are that it can be hard to determine how much of the resulting algorithm performance comes from the specifics of the evolution process, and that it is difficult to estimate how realistic a given model is, since none of the current project participants are metrology experts.

In an adversary model, it is assumed that these unpredictable changes are under the control of an adversary whose goal is to interfere with the algorithm's efficiency. Major assumptions on the system's behavior can be included in the form of restrictions on what this adversary can do (such as being required to maintain a given level of connectivity). Such models are typically more general than stochastic models, in that many stochastic models can be seen as a probabilistic specialization of a nondeterministic model (at least for bounded time intervals, and up to negligible probabilities of adopting "forbidden" behaviors).

Since we aim at proving guaranteed performance for our algorithms, we want to concentrate on suitably restricted adversary models. The main challenge in this direction is thus to describe sets of restricted behaviors that both capture realistic situations and make it possible to prove such guarantees.

On the other hand, in order to establish complexity and approximation results, we also need to rely on a precise theoretical model of the targeted platforms.

At a lower level, several models have been proposed to describe interference between several simultaneous communications. In the 1-port model, a node cannot simultaneously send to (and/or receive from) more than one node. Most of the “steady state” scheduling results have been obtained using this model. On the other hand, some authors propose to model incoming and outgoing communication from a node using fictitious incoming and outgoing links, whose bandwidths are fixed. The main advantage of this model, although it might be slightly less accurate, is that it does not require strong synchronization and that many scheduling problems can be expressed as multi-commodity flow problems, for which decentralized efficient algorithms are known. Another important issue is to model the bandwidth actually allocated to each communication when several communications compete for the same long-distance link.
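The 1-port constraint described above can be stated as a simple check on a set of simultaneous transfers. The following is a minimal sketch, assuming the variant in which a node may take part in at most one send and at most one receive at a time; the function name `valid_one_port` and the representation of transfers as `(sender, receiver)` pairs are illustrative choices, not part of the project's software.

```python
from collections import Counter


def valid_one_port(transfers):
    """Return True if the simultaneous transfers respect the 1-port model.

    `transfers` is a list of (sender, receiver) pairs that are all active
    at the same time. Assumption of this sketch: a node may be involved in
    at most one send and at most one receive at any given time.
    """
    sends = Counter(sender for sender, _ in transfers)
    receives = Counter(receiver for _, receiver in transfers)
    return (all(count <= 1 for count in sends.values())
            and all(count <= 1 for count in receives.values()))
```

For example, `valid_one_port([("A", "B"), ("B", "C")])` holds, whereas two transfers leaving the same node, or two transfers entering the same node, violate the constraint.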

At a higher level, proving good approximation ratios on general graphs may be too difficult, and it has been observed that actual platforms often exhibit a simple structure. For instance, many real life networks satisfy small-world properties, and it has been proved that greedy routing protocols on small-world networks achieve good performance. It is therefore of interest to prove that logical platforms (given by the interactions between hosts) and physical platforms (given by the network links) exhibit some structure, in order to derive efficient algorithms.

In order to analyze the performance of the proposed
algorithms, we first need to define a metric adapted to the
targeted platform. In particular, since resource performance
and topology may change over time, the metric should also be
defined from the optimal performance of the platform at any
time step. For instance, if throughput maximization is
concerned, the objective is to provide for the proposed
algorithm an approximation ratio with respect to
the optimal throughput achievable over time, or at least with respect to
$\min_{t \in \mathrm{SimulationTime}} \mathrm{OptThroughput}(t)$.
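As an illustration of the weaker of these two reference values, the sketch below compares the throughput achieved by an algorithm with the smallest optimal throughput observed over a simulation. It assumes that the optimal throughput has already been computed off-line at each time step and stored in a list; the names `worst_case_optimum` and `ratio_to_worst_case` are hypothetical.

```python
def worst_case_optimum(opt_trace):
    """Smallest optimal throughput over all time steps of the simulation.

    `opt_trace[t]` is assumed to hold the optimal throughput of the
    platform at time step t, computed off-line from recorded traces.
    """
    return min(opt_trace)


def ratio_to_worst_case(achieved_throughput, opt_trace):
    """Ratio between the achieved throughput and the worst-case optimum."""
    return achieved_throughput / worst_case_optimum(opt_trace)
```

With an optimal trace of `[10.0, 8.0, 12.0]` and an achieved throughput of `6.0`, the reference value is `8.0` and the ratio is `0.75`.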

For instance, Awerbuch and Leighton developed a very nice distributed algorithm for computing multi-flows. Their algorithm consists in associating queues and potentials to each commodity at each node for all incoming or outgoing edges. These regular queues store the flow that has not yet reached its destination. Using a very simple and very natural framework, flow goes from high potential areas (the sources) to low potential areas (the sinks). This algorithm is fully decentralized since nodes make their decisions depending on their state (the size of their queues), the state of their neighbors (the size of their queues), and the capacity of neighboring links.

The remarkable property about this algorithm is that if,
at any time step, the network is able to ship
$(1 + \varepsilon) d_i$ flow units for each commodity $i$,
then the algorithm will ship at least
$d_i$ units of flow at steady state. The proof of this
property is based on the overall potential of all the queues
in the network, which remains bounded over time.
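To make the local nature of such decisions concrete, here is a simplified single-commodity sketch in the spirit of local queue balancing. It is not the exact algorithm of Awerbuch and Leighton: the rule of moving half of the queue difference across each edge, the even rebalancing inside each node, and all names (`balance_queues`, `edges`, `demand`) are assumptions made for illustration.

```python
def balance_queues(edges, source, sink, demand, steps):
    """Simulate local queue balancing for one commodity.

    edges  : dict mapping an undirected edge (u, v) to its capacity per step
    demand : units of flow injected at `source` at every step
    Returns (delivered, queues) after `steps` rounds.
    """
    queues = {}      # one queue per (node, incident edge)
    incident = {}    # node -> list of incident edges
    for edge in edges:
        for node in edge:
            queues[(node, edge)] = 0.0
            incident.setdefault(node, []).append(edge)

    delivered = 0.0
    for _ in range(steps):
        # 1. New flow enters at the source, split evenly over its edge queues.
        for edge in incident[source]:
            queues[(source, edge)] += demand / len(incident[source])

        # 2. Each edge moves flow from its fuller end to its emptier end,
        #    using only the two queue sizes and its own capacity.
        for (u, v), capacity in edges.items():
            gap = queues[(u, (u, v))] - queues[(v, (u, v))]
            amount = min(capacity, abs(gap) / 2)
            high, low = (u, v) if gap > 0 else (v, u)
            queues[(high, (u, v))] -= amount
            queues[(low, (u, v))] += amount

        # 3. Flow that reached the sink leaves the network.
        for edge in incident[sink]:
            delivered += queues[(sink, edge)]
            queues[(sink, edge)] = 0.0

        # 4. Every other node evens out its own queues (purely local step).
        for node, node_edges in incident.items():
            if node != sink:
                mean = sum(queues[(node, e)] for e in node_edges) / len(node_edges)
                for e in node_edges:
                    queues[(node, e)] = mean
    return delivered, queues
```

On a path `s - a - t` with capacity 2 on both edges and a demand of 1 unit per step, the queues of this sketch settle at bounded sizes and the amount delivered per step approaches the demand, which mirrors the bounded-potential argument mentioned above.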

It is worth noting that this algorithm is quasi-optimal
for the metric we defined above, since the overall
throughput can be made arbitrarily close to
$\min_{t \in \mathrm{SimulationTime}} \mathrm{OptThroughput}(t)$.

In this context, the approximation result is given under
an adversary model, where the adversary can change both the
topology and the performance of communication resources
between any two steps, provided that the network is able to
ship
$(1 + \varepsilon) d_i$.

Most scheduling problems are NP-complete, and inapproximability results exist in on-line settings, especially when resources are heterogeneous. Therefore, we need to rely on simplified communication models (see next section) to prove theoretical results. In this context, resource augmentation techniques are very useful. They consist in identifying a weak parameter (a parameter whose value can be slightly increased without breaking any strong modeling constraint) and then comparing the solution produced by a polynomial time algorithm (with this relaxed constraint) with the optimal solution of the NP-complete problem (without resource augmentation). This technique is both pertinent in a difficult setting and useful in practice.
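One common way to write down such a guarantee is sketched below for a maximization objective such as throughput. The notation is introduced here for illustration only: $I$ is an instance, $c$ the value of the weak parameter, $\varepsilon > 0$ the augmentation, $\mathrm{ALG}$ the value obtained by the polynomial time algorithm, $\mathrm{OPT}$ the optimal value, and $\rho \ge 1$ the resulting ratio.

```latex
\[
  \mathrm{ALG}\bigl(I,\,(1+\varepsilon)\,c\bigr)
  \;\ge\;
  \frac{1}{\rho}\,\mathrm{OPT}(I,\,c)
  \qquad \text{for every instance } I .
\]
```

The algorithm is thus compared with an optimum that is not allowed to use the extra resource, which is what makes the guarantee meaningful even when the non-augmented problem is hard to approximate.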

In the context of large scale dynamic platforms, it is unrealistic to determine precisely the actual topology and the contention of the underlying network at application level. Indeed, existing tools such as Alnem are very much based on quasi-exhaustive determination of interferences, and it takes several days to determine the actual topology of a platform made up of a few tens of nodes. Given the dynamism of the platforms we target, we need to rely on less sophisticated models, whose parameters can be evaluated at runtime.

Therefore, we propose to model each node
by an incoming and an outgoing bandwidth and to neglect
interference that appears at the heart of the network
(Internet), in order to concentrate on local constraints.
We are currently implementing a script based on Iperf.
When node $P_0$ must send data at rate
$x_i^{out}$ to node
$P_i$ and receive data at rate
$y_j^{in}$ from node
$P_j$, the goal is to achieve the prescribed bitrates,
provided that all capacity constraints are satisfied at
each node. Our aim is to implement using Java RMI a
protocol able both to evaluate the parameters of our model
(incoming and outgoing bandwidths) and to ensure a
prescribed sharing of communication resources.
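The local capacity constraints of this model can be checked independently at each node. The sketch below is a minimal illustration, assuming that the prescribed rates are given as dictionaries keyed by peer name; the function name `feasible_rates` and its parameters are hypothetical and unrelated to the Java RMI protocol mentioned above.

```python
def feasible_rates(out_rates, in_rates, out_bandwidth, in_bandwidth):
    """Check the capacity constraints of one node in the bandwidth model.

    out_rates     : dict peer -> prescribed sending rate towards that peer
    in_rates      : dict peer -> prescribed receiving rate from that peer
    out_bandwidth : outgoing bandwidth of the node
    in_bandwidth  : incoming bandwidth of the node
    The prescribed rates are feasible at this node when the total outgoing
    rate and the total incoming rate both fit within the node's bandwidths.
    """
    return (sum(out_rates.values()) <= out_bandwidth
            and sum(in_rates.values()) <= in_bandwidth)
```

A full allocation is feasible under this model when the check succeeds at every node, since interference inside the network is neglected.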

Under this communication model, it is possible to obtain pathological results. For instance, if we consider a master-slave setting (corresponding to the distribution of independent tasks on a Volunteer Computing platform such as BOINC), the number of slaves connected to the master may be unbounded. In fact, simultaneously opening a large number of TCP connections may lead to a bad sharing of communication resources. Therefore, we propose to add a bound on the number of connections that can be handled simultaneously by a given node. Estimating this bound is an important issue in obtaining realistic communication models.
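The effect of such a bound can be illustrated on the master-slave setting. The sketch below assumes that throughput is limited only by communication (the computing speed of the slaves is ignored), that each slave can receive data at most at its incoming bandwidth, and that the master may serve at most `max_connections` slaves at once; the function name and parameters are hypothetical.

```python
def master_throughput(master_out_bandwidth, slave_in_bandwidths, max_connections):
    """Best data rate a master can sustain with a bounded number of connections.

    Under the assumptions of this sketch, the master should connect to the
    slaves with the largest incoming bandwidths, and the resulting rate is
    capped by its own outgoing bandwidth.
    """
    best_slaves = sorted(slave_in_bandwidths, reverse=True)[:max_connections]
    return min(master_out_bandwidth, sum(best_slaves))
```

With an outgoing bandwidth of 10 and twelve slaves of incoming bandwidth 1, the unbounded model lets the master saturate its link using ten connections, whereas a bound of 4 connections limits the throughput to 4.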

Once low level modeling has been obtained, it is crucial to be able to test the proposed algorithms. To do this, we will first rely on simulation rather than direct experimentation. Indeed, in order to be able to compare heuristics, it is necessary to execute those heuristics on the same platform. In particular, all changes in the topology or in the resource performance should occur at the same time during the execution of the different heuristics. In order to be able to replicate the same scenario several times, we need to rely on simulations. Moreover, the metric we have tentatively defined for providing approximation results in the case of dynamic platforms requires computing the optimal solution at each time step, which can be done off-line if all traces for the different resources are stored. Using simulation rather than experiments can be justified if the simulator itself has been proved valid. Moreover, the modeling of communications, processing and their interactions may be much more complex in the simulator than in the model used to provide a theoretical approximation ratio, as is the case in SimGrid. In particular, sophisticated TCP models for bandwidth sharing have been implemented in SimGrid.

At a higher level, the derivation of realistic models
for large scale platforms is out of the scope of our
project. Therefore, in order to obtain traces and models,
we will collaborate with MESCAL, GANG and ASAP projects. We
already worked on these topics with the members of GANG in
the ACI Pair-A-Pair (ACI Pair-A-Pair finished in 2006, but
ANR Aladdin Programme Blanc acts as a follow-up, with the
members of GANG and Cepage projects). On the other hand, we
also need to rely on an efficient simulator in order to
test our algorithms. We have not yet chosen the discrete
event simulator we will use for simulations. One attractive
possibility would be to adapt SimGrid, developed in the
MESCAL project, to large scale dynamic environments.
Indeed, a parallel version of SimGrid, based on activations,
is currently under development in the framework of the
USS-Simgrid ANR Arpege project (with the MESCAL, ALGORILLE and
ASAP teams). This version will be able to deal with
platforms containing more than
$10^5$ resources. SimGrid has been developed by
Henri Casanova (U.C. San Diego) and Arnaud Legrand during
his PhD (under the co-supervision of O. Beaumont).

Finally, we propose several applications that will be described in detail in Section . These applications cover a large set of fields (molecular dynamics, distributed storage, continuous integration, distributed databases, ...). All these applications will be developed and tested with an academic or industrial partner. In all these collaborations, our goal is to prove that the services that we propose in Section can be integrated as steering tools in already developed software. Our goal is to assess the practical interest of the services we develop and then to integrate and to distribute them as a library for large scale computing.

In order to test our algorithms, we propose to implement these services using Java RMI. The main advantages of Java RMI in our context are the ease of use and the portability. Multithreading is also a crucial feature in order to schedule concurrent communications and it does not interfere with ad-hoc routing protocols developed in the project.

A prototype has already been developed in the project as a steering tool for molecular dynamics simulations (see Section ). All the applications will first be tested on small scale platforms (using desktop workstations in the laboratory). Then, in order to test their scalability, we propose to implement them either on the GRID 5000 platform or the partner's platform.

The optimization schemes for content distribution
processes or for handling standard queries require a good
knowledge of the physical topology or performance (latencies,
throughput, ...) of the network. Assuming that some rough
estimate of the physical topology is given, former
theoretical results described in Section
show how to pre-process the
network so that local computations are performed efficiently.
Due to the dynamism of large distributed platforms, some
requirements on the coding of local data structures and the
updating mechanism are needed. This last process is done
using the maintenance of light virtual networks, so-called
*overlay networks* (see Section
). In our approach, we focus
on:

*Compression.*

The emergence of huge distributed networks does not allow the topology of the network to be totally known to each node without any compression scheme. There are at least two reasons for this:

In order to guarantee that local
computations are done efficiently, that is avoiding
external memory requests, it may be of interest that
the coding of the underlying topology can be stored
within
*fast memory* space.

The dynamism of the network implies many basic message communications to update the knowledge of each node. The smaller the message size is, the better the performance.

The compression of any topology description should not lead to an extra cost for standard requests: distance between nodes, adjacency tests, ... Roughly speaking, a decoding process should not be necessary.

*Routing tables.*

Routing queries and broadcasting information on large scale platforms are tasks involving many basic message communications. The maximum performance objective imposes that basic messages are routed along paths of cost as low as possible. On the other hand, local routing decisions must be fast and the algorithms and data structures involved must support a certain amount of dynamism in the platform.

*Local computations.*

Although the size of the data structures is less constrained than in P2P systems (for security reasons), even in our collaborative framework it is unrealistic for each node to manage a complete view of the platform with the full resource characteristics. Thus, a node has to manage data structures concerning only a fraction of the whole system. In fact, a partial view of the network will be sufficient for many tasks, for instance to compute the distance between two nodes (distance labeling).

*Overlay and small world networks.*

The processes we consider can be highly dynamic. The preprocessing usually assumed takes polynomial time. Hence, when a new process arrives, it must be dealt with in an *on-line* fashion, i.e., we do not want to re-compute everything, and the (partial) re-computation has to be simple.

In order to meet these requirements, *overlay networks* are normally implemented. These are light virtual networks, i.e., they are sparse and a local change of the physical network will only lead to a small change of the corresponding virtual network. As a result, small address books are sufficient at each node.

A specific class of overlay networks are *small-world* networks. These are efficient overlay networks for (greedy) routing tasks assuming that distance requests can be performed easily.

*Mobile Agent Computing.*

Mobile Agent Computing has been proposed as a powerful paradigm to study distributed systems. Our purpose is to study the computational power of the mobile agent systems under various assumptions. Indeed, many models exist but little is known about their computational power. One major parameter describing a mobile agent model is the ability of the agents to interact.

The most natural mobile agent computing problem is the exploration or mapping problem, in which one or several mobile agents have to explore or map their environment. The rendezvous problem consists in having two agents meet at some unspecified node of the network. Two other fundamental problems deal with security, which is often the main concern of real mobile agent systems. The first one consists in exploring the network in spite of harmful hosts that destroy incoming agents. An additional goal in this context is to locate the harmful host(s) to prevent further agent losses. We already mentioned the second problem related to security, in which the agents must capture an intruder.

The goal is to enlarge the knowledge on the foundations of mobile agent computing. This will be done by developing new efficient algorithms for mobile agent systems and by proving impossibility results. This will also allow us to compare the different models.

Of course, the main difficulty is to adapt the maintenance of local data structures to the dynamism of the network.

As mentioned in Section
, solutions
provided by the parallel algorithm community are dedicated to
stable platforms whose resource performances can be gathered
at a single node that is responsible for computing the
optimal solution. On the other hand, P2P systems are fully
distributed but the set of available queries in these systems
is much too poor for computationally intensive applications.
Therefore, actual solutions for large scale distributed
platforms such as BOINC

Requests and Task scheduling on large scale platforms;

New services for processing on large scale platforms.

Another interesting scheduling problem is the case of applications sharing (large) files stored in replicated distributed databases. We deal here with a particular instance of the scheduling problem mentioned in Section . This instance involves applications that require the manipulation of large files, which are initially distributed across the platform.

It may well be the case that some files are replicated. In the target application, all tasks depend upon the whole set of files. The target platform is composed of many distant nodes, with different computing capabilities, which are linked through an overlay network (to be built). Each node is associated with a (local) data repository. Initially, the files are stored in one or several of these repositories. We assume that a file may be duplicated, and thus simultaneously stored on several data repositories, thereby potentially speeding up the next request to access it. There may be restrictions on the possibility of duplicating the files (typically, each repository is not large enough to hold a copy of all the files). The techniques developed in Section will be used to dynamically maintain efficient data structures for handling files.

Our aim is to design a prototype for both maintaining data structures and distributing files and tasks over the network.

This framework occurs for instance in the case of Monte-Carlo applications where the parameters of new simulations depend on the average behavior of the simulations previously performed. The general principle is the following: several simulations (independent tasks) are launched simultaneously with different initial parameters, and then the average behavior of these simulations is computed. Then other simulations are performed with new parameters computed from the average behavior. These parameters are tuned to ensure a much faster convergence of the method. Running such an application on a semi-stable platform is a particular instance of the scheduling problem mentioned in Section .

We will focus on a particular algorithm picked from Molecular Dynamics: calculation of the Potential of Mean Force (PMF) using the technique of Adaptive Bias Force (ABF). This work is done via a collaboration with Juan Elezgaray, IECB, Bordeaux. Here is a quick presentation of this context. Estimating the time needed for a molecule to go through a cellular membrane is an important issue in biology and medicine. Typically, the diffusion time is far too long to be computed with atomistic molecular simulations (the average time to be simulated is of the order of 1 s and the integration step cannot be chosen larger than $10^{-15}$ s, due to the nature of physical interactions). Classical parallel approaches, based on domain decomposition methods, lead to very poor results due to the number of barriers. Another method to estimate this time is by calculating the PMF of the system, which is in this context the average force the molecule is subject to at a given position within or around the membrane. Recently, Darve et al. presented a new method, called ABF, to compute the PMF. The idea is to run a small number of simulations to estimate the PMF, and then add to the system a force that cancels the estimated PMF. With this new force, new simulations are performed starting from different configurations (distributed over the computing platform) of the system computed during the previous simulations, and so on. Iterating this process, the algorithm converges quite quickly to a good estimation of the PMF with a uniform sampling along the axis of diffusion. This application has been implemented and integrated into the well-known molecular dynamics software NAMD.
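The orders of magnitude quoted above can be checked with a short calculation. The sketch below assumes that the integration step is expressed in seconds, and the throughput figure is purely hypothetical (it is not a NAMD measurement); it only illustrates why a single sequential trajectory is out of reach.

```python
# Back-of-the-envelope count of integration steps, using the figures quoted above.
simulated_time_s = 1.0   # average time to be simulated (order of 1 s)
max_step_s = 1e-15       # largest admissible integration step (assumed in seconds)

min_steps = simulated_time_s / max_step_s
print(f"at least {min_steps:.0e} sequential integration steps")

# Hypothetical throughput, for illustration only (not a measured value).
assumed_steps_per_second = 1e3
seconds_per_year = 3600 * 24 * 365
years = min_steps / assumed_steps_per_second / seconds_per_year
print(f"about {years:.0f} years at {assumed_steps_per_second:.0e} steps/s")
```

Because the steps of one trajectory are inherently sequential, adding processors to a single run does not remove this bottleneck, which is what motivates running many short, independent simulations as ABF does.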

Our aim is to propose a distributed implementation of the ABF method using NAMD. It is worth noting that NAMD is designed to run on high-end parallel platforms or clusters, but not to run efficiently on unstable and distributed platforms. The different problems to be solved in order to design this application are the following:

Since we need to start a simulation from a valid configuration (which can represent several Mbytes) with a particular position of the molecule in the membrane, and these configurations are spread among participating nodes, we need to be able to find and download such a configuration. Therefore, the first task is to find an overlay such that those requests can be handled efficiently. This requires expertise in overlay networks, compact data structures and graph theory. Olivier Beaumont, Nicolas Bonichon, Philippe Duchon, Nicolas Hanusse, Cyril Gavoille and Ralf Klasing will work on this part.

In our context, each participating node may offer some space for storing some configurations, some bandwidth and some computing power to run simulations. The question arising here is how to distribute the simulations to nodes such that the computing power of all nodes is fully used. Since nodes may join and leave the network at any time, redistributions of configurations and tasks between nodes will also be necessary (but all tasks only contribute to updating the PMF, so that some tasks may fail without changing the overall result). The techniques designed for content distribution will be used to spread and redistribute the set of configurations over the set of participating nodes. This requires expertise in task scheduling and distributed storage. Olivier Beaumont, Nicolas Bonichon, Philippe Duchon and Lionel Eyraud-Dubois will work on this part.

A prototype of a steering tool for NAMD has been developed in the project; it may be used to validate our approach and has been tested on GRID'5000 with up to 200 processors. This prototype supports the dynamicity of the platform: contributing processors can join and leave. The management of the configurations' locations is now performed using a distributed hash table. This was done by integrating the library Bamboo in the prototype. We still have to solve numerical instability.

Continuous Integration is a development method in which developers commit their work in a version control system (such as CVS or Subversion) very frequently (typically several times per day) and the project is automatically rebuilt. One of the advantages of this technique is that merge problems are detected and corrected early.

The build process not only generates the binaries, it also runs automated tests, generates documentation, checks the code coverage of tests and analyzes code style...

The whole process can take several hours for large projects. Therefore, the efficiency of this development method relies on the speed of the feedback. There is a real need to speed up the build process, and thus to distribute it. This is one of the goals of the continuous integration server
xooctory

In order to obtain an efficient distribution of the build, the build process can be decomposed into nearly independent sub processes, executed on different nodes. Nevertheless, to be completed, a sub process must be run on a node that holds the appropriate version of the tools (compiler, code auditing software, ...), the appropriate version of the libraries, and the appropriate version of source code. Of course, if the target node does not have all these items, it can download them from another node, but these communications may be more expensive than the execution of the sub processes.

This raises several challenging problems:

Build a distributed data structure that can efficiently provide

one of the nodes that stores a certain set $S$ of files.

one of the nodes that stores a maximum subset $S'$ of a set $S$ of files.

one of the nodes that can quickly obtain a certain set $S$ of files (i.e. a node that can efficiently download the files of $S$ that it does not already hold).

Design distribution strategies of the build that take advantage of the processing and communication capabilities of the nodes.
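The three lookups listed above can be made concrete with a centralized stand-in. The sketch below is only an illustration under stated assumptions: the `holdings`, `size` and `bandwidth` tables, the node and file names, and the cost model (missing bytes divided by the node's bandwidth) are all hypothetical, and a real solution would have to answer these queries in a distributed way.

```python
from typing import Dict, Optional, Set

# Hypothetical centralized view: node name -> files (tools, libraries, sources) it holds.
Holdings = Dict[str, Set[str]]

def node_with_all(holdings: Holdings, wanted: Set[str]) -> Optional[str]:
    """Return some node storing every file of `wanted`, or None if there is none."""
    for node in sorted(holdings):
        if wanted <= holdings[node]:
            return node
    return None

def node_with_max_subset(holdings: Holdings, wanted: Set[str]) -> str:
    """Return a node storing a largest subset S' of `wanted`."""
    return max(sorted(holdings), key=lambda n: len(wanted & holdings[n]))

def cheapest_node(holdings: Holdings, wanted: Set[str],
                  size: Dict[str, float], bandwidth: Dict[str, float]) -> str:
    """Return a node minimising the (assumed) time to download its missing files."""
    def download_time(node: str) -> float:
        missing = wanted - holdings[node]
        return sum(size[f] for f in missing) / bandwidth[node]
    return min(sorted(holdings), key=download_time)

holdings = {"n1": {"gcc", "libfoo"}, "n2": {"src"}, "n3": {"gcc"}}
size = {"gcc": 100.0, "libfoo": 10.0, "src": 1.0}
bandwidth = {"n1": 1.0, "n2": 1000.0, "n3": 10.0}
wanted = {"gcc", "libfoo", "src"}

print(node_with_all(holdings, wanted))
print(node_with_max_subset(holdings, wanted))
print(cheapest_node(holdings, wanted, size, bandwidth))
```

In this toy instance the node holding the largest subset is not the one that can complete the set fastest, which is why the third query is listed separately from the second.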

We are collaborating with Xavier Hanin and Jayasoft in order to solve distribution problems in the context of distributed continuous integration. Our goal is to incorporate some of the services developed in Cepage to obtain a large scale distributed version of the continuous integration server xooctory.

Ludovic Courtès (INRIA SED Engineer) was assigned to the project from July 2009 to December 2010 in the framework of the ICPAGE ADT. He worked mostly with O. Beaumont and N. Bonichon, and also with Xavier Hanin, engineer at 4SH and main contributor of Xooctory. Our goal was to validate scheduling algorithms for task graphs that have been proposed in the team in the context of continuous integration. Continuous integration is a software development practice that consists in compiling and testing a piece of software and its dependencies in order to detect errors as soon as possible. In the framework of ICPAGE, our goal is to map these compilation and test tasks on clusters, grids and in general heterogeneous platforms.

First, we concentrated on the integration in the Xooctory
software (
http://

The software Hubble (
http://

We recently focused on two problems:

Data cube queries represent an important class of On-Line Analytical Processing (OLAP) queries in decision support systems. They consist in a pre-computation of the different group-bys of a database (aggregation for every combination of GROUP BY attributes), which is a very time-consuming task. For instance, databases of some megabytes may lead to the construction of a datacube requiring terabytes of memory; parallel computation has been proposed, but only for a static and well-identified platform. This application is typically an interesting example for which distributed computation and storage can be useful in a heterogeneous and dynamic setting. We just started a collaboration with Sofian Maabout (Assistant Professor in Bordeaux) and Noel Novelli (Assistant Professor at Marseille University), who is a specialist of datacube computation. Our goal is to rely on the set of services defined in Section to compute and maintain huge datacubes. For the moment, we developed:

a centralized tool that summarizes a whole datacube up to dimension 20 and that outperforms the usual data cube reduction schemes.

a view selection algorithm whose output is the set of materialized views to store in order to give some guarantee on the request time for a given set of targeted views.

a parallel implementation of a Maximal Frequent Itemsets algorithm in C++ and OpenMP.
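To make the cost of a full data cube concrete, the following minimal sketch computes every group-by of a tiny table by brute force. The table, its column names and the choice of SUM as the aggregate are hypothetical; this is not one of the tools listed above, only an illustration of why $d$ dimensions lead to $2^d$ group-bys.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(rows, dimensions, measure):
    """Compute SUM(measure) for every subset of `dimensions` (2^d group-bys)."""
    cube = {}
    for r in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, r):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                totals[key] += row[measure]
            cube[group_by] = dict(totals)
    return cube

# Hypothetical three-dimensional fact table.
rows = [
    {"city": "Bordeaux", "year": 2009, "product": "A", "sales": 3},
    {"city": "Bordeaux", "year": 2010, "product": "B", "sales": 5},
    {"city": "Marseille", "year": 2010, "product": "A", "sales": 2},
]
cube = data_cube(rows, ("city", "year", "product"), "sales")
print(len(cube), "group-bys")
print(cube[("city",)])
```

Each additional dimension doubles the number of group-bys, which is what makes selecting only a subset of views to materialize attractive.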

In query optimization, the input of many algorithms is
the size of some views. Ioannis Atsonios, Olivier Beaumont,
Nicolas Hanusse and Yusik Kim proposed a new estimator in
order to get a quick and reliable estimation of view size.
Our tool (
http://

In the framework of the Alcatel-Lucent Bell collaboration, we are developing a simulator of routing algorithms. This development is performed by the engineer F. Majorczyk at the Bordeaux site, in collaboration with the INRIA Sophia-Antipolis site.

The main objective is to give a complete experimental study of the Compact Routing Scheme given recently by Abraham, Gavoille, Malkhi, Nisan and Thorup in 2008. This algorithm guarantees, for every weighted $n$-node network, routing tables of size
while the stretch factor is at most 3, i.e., the length of the routes induced by the scheme is never more than three times the optimal length (the distance). The bounds on the stretch and on the memory are both optimal. Moreover, the scheme is “Name-Independent”, that is, the routing decision at the source router is made on the basis of the original name of the destination node. No information can be implicitly encoded in the node names, like coordinates in a grid network. This extra feature is important in practice since in many contexts, nodes cannot be renamed according to some global state of the network, in particular whenever the network is growing and dynamic.

This study, if it succeeds, would be the first to report experiments on a name-independent routing scheme. Our simulator implements several graph generators, and the target algorithm currently works on 3000 nodes. We plan to extend the experiments up to 10,000 nodes, and in parallel to give a message-efficient distributed algorithm and implementation of this algorithm.

In 2010, we added our compact routing scheme to the routing simulator DRMSim, co-developed with Mascotte, another INRIA project located at Nice. We are able to run simulations on AS networks of 16 000 nodes coming from CAIDA.

The problem of discovering the topology of a platform based on application-level measurements has been an important concern for researchers of the team in the past years. The ALNeM prototype was developed with this goal in mind, and was designed for precise models in which the network is described by an arbitrary graph.

The work of Young Won as a postdoc in the ANR USS-SimGrid project went in the same direction, but took place in the context of less complex models, which consist in embedding the network in geometric structures to obtain more compact representations. This resulted in a small software tool for computing and comparing the accuracy of different types of network embeddings.

Once computed by our software, these embeddings can be used as input in SimGrid, in order to provide insights about the behavior of an application on this platform. This requires coding the network models corresponding to the different types of embeddings inside SimGrid. This ongoing work will greatly increase the capacity of SimGrid to handle large platforms, by using a much simpler description, at a cost in accuracy.

$\Theta_k$-graphs are geometric graphs that appear in the context of graph navigation. The shortest-path metric of these graphs is known to approximate the Euclidean complete graph up to a factor depending on the cone number $k$ and the dimension of the space.
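As an illustration of the cone-based construction, here is a minimal sketch assuming the graphs described above are the usual theta-graphs in the plane: around each point the plane is cut into $k$ equal cones, and in each cone the point is linked to the neighbour whose projection on the cone bisector is closest. The cone orientation (cone 0 starting at angle 0), the random point set and the brute-force stretch computation are choices made for this example only.

```python
import math
import random

def theta_graph(points, k):
    """Edges of a theta-graph with k cones per point (cone 0 starts at angle 0)."""
    cone_angle = 2 * math.pi / k
    edges = set()
    for i, (px, py) in enumerate(points):
        nearest = {}  # cone index -> (projection on bisector, neighbour index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            angle = math.atan2(dy, dx) % (2 * math.pi)
            cone = min(int(angle // cone_angle), k - 1)
            bisector = (cone + 0.5) * cone_angle
            proj = dx * math.cos(bisector) + dy * math.sin(bisector)
            if cone not in nearest or proj < nearest[cone][0]:
                nearest[cone] = (proj, j)
        for _, j in nearest.values():
            edges.add((min(i, j), max(i, j)))
    return edges

def stretch(points, edges):
    """Largest ratio of graph distance to Euclidean distance (Floyd-Warshall)."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for i, j in edges:
        dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return max(dist[i][j] / math.dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(40)]
edges = theta_graph(points, 6)
print(len(edges), "edges, stretch", round(stretch(points, edges), 3))
```

Each point selects at most $k$ neighbours, so the graph has at most $kn$ edges while still approximating all pairwise Euclidean distances.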

TD-Delaunay graphs, a.k.a. triangular-distance Delaunay triangulations introduced by Chew, have been shown to be plane 2-spanners of the 2D Euclidean complete graph, i.e., the distance in the TD-Delaunay graph between any two points is no more than twice the distance in the plane.

Orthogonal surfaces are geometric objects defined from independent sets of points of the Euclidean space. Orthogonal surfaces are well studied in combinatorics (orders, integer programming) and in algebra. From orthogonal surfaces, geometric graphs, called geodesic embeddings, can be built.

Using these new bridges between these three fields, we establish:

Every $\Theta_6$-graph is the union of two spanning TD-Delaunay graphs. In particular, $\Theta_6$-graphs are 2-spanners of the Euclidean graph. It was not known that $\Theta_6$-graphs are $t$-spanners for some constant $t$, and $\Theta_7$-graphs were only known to be $t$-spanners for
.

Every plane triangulation is TD-Delaunay realizable, i.e., every combinatorial plane graph for which all its interior faces are triangles is the TD-Delaunay graph of some point set in the plane. Such realizability property does not hold for classical Delaunay triangulations.

In collaboration with Jean-François Marckert, we also studied the asymptotic behavior of these spanners.

In collaboration with Ljubomir Perković, we have also worked on the question of bounded degree planar spanners: what is the minimum $\delta$ such that there exists a planar spanner of degree at most $\delta$ for any point set? We have proposed an algorithm that computes a 6-spanner of degree at most 6. The best previously known bound on the maximum degree of a planar spanner was 14, with a stretch factor of 3.53. The construction is based on combinatorial equivalences shown in .

There are several techniques to manage sub-linear size routing tables (in the number of nodes of the platform) while guaranteeing almost shortest paths (cf. for a survey of routing techniques).

Some techniques provide routes of length at most $1 + \epsilon$ times the length of the shortest one (which is the definition of a stretch factor of $1 + \epsilon$) while maintaining a poly-logarithmic number of entries per routing table. However, these techniques are not universal in the sense that they apply only to some classes of underlying topologies. Universal schemes exist. Typically they achieve -entry local routing tables for a stretch factor of 3 in the worst case. Some experiments have shown that such methods, although universal, work very well in practice, on average, on realistic scale-free or existing topologies.

While the fundamental question is to determine the best stretch-space trade-off for universal schemes, the challenge for platform routing would be to design specific schemes supporting reasonable dynamic changes in the topology or in the metric, at least for a limited class of relevant topologies. In this direction, network topologies have been constructed (in polynomial time) for which nodes can be labeled once such that, however the link weights vary in time, shortest path routing tables with compactness $k$ can be designed, i.e., for each routing table the set of destinations using the same first outgoing edge can be grouped in at most $k$ ranges of consecutive labels.
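The notion of compactness can be illustrated on a single routing table. The sketch below groups the destinations assigned to each outgoing edge into maximal ranges of consecutive labels and reports the largest number of ranges needed by any edge. It assumes integer labels and linear (non-cyclic) ranges, and the example table is hypothetical.

```python
from collections import defaultdict

def ranges_per_edge(first_edge):
    """first_edge maps a destination label (int) to the first outgoing edge.
    Return, for each edge, its maximal ranges (lo, hi) of consecutive labels."""
    by_edge = defaultdict(list)
    for dest in sorted(first_edge):
        runs = by_edge[first_edge[dest]]
        if runs and runs[-1][1] == dest - 1:
            runs[-1] = (runs[-1][0], dest)   # extend the current range
        else:
            runs.append((dest, dest))        # start a new range
    return dict(by_edge)

def compactness(first_edge):
    """Largest number of ranges needed by any outgoing edge of this table."""
    return max(len(runs) for runs in ranges_per_edge(first_edge).values())

# Hypothetical routing table of node 0 in a 9-node network with edges a, b, c.
table = {1: "a", 2: "a", 3: "b", 4: "b", 5: "b", 6: "a", 7: "c", 8: "c"}
print(ranges_per_edge(table))
print(compactness(table))
```

A table of compactness $k$ with $d$ outgoing edges can be stored with at most $2kd$ labels instead of one entry per destination, which is where the memory saving comes from.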

Another aspect of the problem would be to model a realistic typical platform topology. A natural characteristic for this is low dimensionality: Euclidean or near-Euclidean networks of low dimension, low growth dimension, or more generally, low doubling dimension.

In 2007, we improved compact routing schemes for planar networks, and more generally for networks excluding a fixed minor. This latter family of networks includes (but is not restricted to) networks embeddable on surfaces of bounded genus and networks of bounded treewidth. The stretch factor of our scheme is constant and the size of each routing table is only polylogarithmic (independently of the degree of the nodes), and the scheme does not require renaming (or a new addressing) of the nodes: it is name-independent. More importantly, the scheme can be constructed efficiently in polynomial time, and the complexities do not hide large constants as we may encounter in Graph Minor Theory. This construction has been achieved by the design of a new sparse cover for planar graphs, solving a problem open since STOC '93.

In 2007, we also gave an invited lecture on compact routing schemes at a workshop on Peer-to-Peer, Routing in Complex Graphs, and Network Coding in Thomson Labs in Paris.

In 2008, we proposed a minimum stretch compact name-independent routing scheme. This scheme is the basis of the Compact Routing Simulator we are developing in the Alcatel-Lucent Bell project.

In order to optimize applications the platform topology itself must be discovered, and thus represented in memory with some data structures. The size of the representation is an important parameter, for instance, in order to optimize the throughput during the exploration phase of the platform.

Classical data structures for representing a graph (matrix or list) can be significantly improved when the targeted graph falls in some specific class or obeys some properties: the graph has bounded genus (embeddable on a surface of fixed genus), bounded tree-width (or is $c$-decomposable), or is embeddable with bounded page number. Typically, planar topologies with $n$ nodes (thus embeddable on the plane with no edge crossings) can be efficiently coded in linear time with at most $5n + o(n)$ bits supporting adjacency queries in constant time. This improves on the classical adjacency list by a non-negligible $\log n$ factor on the size (the size is about $6n\log n$ bits for an edge list), and also on the query time.
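The gap between the two sizes can be checked numerically. The sketch below keeps only the leading term of the $5n + o(n)$ bound and uses the fact that a planar graph has at most $3n - 6$ edges, each stored as two identifiers of $\log_2 n$ bits, which gives the $6n\log n$ estimate; the function names and the values of $n$ are arbitrary choices for this example.

```python
import math

def compact_bits(n):
    """Leading term of the 5n + o(n) bound (lower-order term ignored)."""
    return 5 * n

def edge_list_bits(n):
    """About 6 n log2(n) bits: at most 3n edges, two log2(n)-bit identifiers each."""
    return 6 * n * math.log2(n)

ratios = {}
for n in (10**3, 10**6):
    ratios[n] = edge_list_bits(n) / compact_bits(n)
    print(n, round(ratios[n], 1))
```

The ratio is $1.2\log_2 n$, so the saving grows with the size of the platform.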

In 2008, we gave a compact encoding scheme for graphs of pagenumber $k$.

The basic routing scheme and the overlay networks must also allow us to route queries other than the routing driven by applications. Typically, divide-and-conquer parallel algorithms require computing many nearest common ancestor (NCA) queries in some tree decomposition. In a large scale platform, if the current tree structure is fully or partially distributed, then the physical location of the NCA in the platform must be optimized. More precisely, the NCA computation must be performed from distributed pieces of information, and then addressed via the routing overlay network (cf. for distributed NCA algorithms).

Recently, a theory of localized data structures has been developed (initiated by ; see for a survey). One associates with each node a label such that some given function (or predicate) of the nodes can be computed from two or more labels alone. These labels are usually joined to the addresses or inserted into a global database index.
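A classical textbook instance of such a labeling is the interval scheme for ancestry in a rooted tree; it is shown here only to illustrate the principle and is not one of the schemes cited in this report. Each node receives the pair (pre-order index, largest pre-order index in its subtree), and ancestry is decided from the two labels alone. The tree and the node names are hypothetical.

```python
def ancestry_labels(children, root):
    """Label each node with (pre-order index, largest pre-order index in its subtree)."""
    labels, entry = {}, {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            # All descendants have been numbered; counter - 1 is the last one.
            labels[node] = (entry[node], counter - 1)
            continue
        entry[node] = counter
        counter += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels

def is_ancestor(label_u, label_v):
    """True iff u is an ancestor of v (or u == v), using only the two labels."""
    return label_u[0] <= label_v[0] <= label_u[1]

tree = {"r": ["a", "b"], "a": ["c", "d"]}
labels = ancestry_labels(tree, "r")
print(labels)
print(is_ancestor(labels["a"], labels["d"]), is_ancestor(labels["b"], labels["c"]))
```

Each label uses two integers smaller than $n$, that is about $2\log n$ bits, and no global index has to be consulted to answer a query.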

In relation with the project, queries involving the flow computation between any source-sink pair of a capacitated network are of great interest. Dynamic labeling schemes are also available for tree models, and need further work for their adaptation to more general topologies.

Finally, localized data structures have applications to platforms implementing large database XML file types. Roughly speaking pieces of a large XML file are distributed along some platform, and some queries (typically some SELECT ... FROM extractions) involve many tree ancestor queries , the XML file structure being a tree. In this framework, distributed label-based data structures avoid the storing of a huge classical index database.

In 2007, we proved that it is possible to assign to each node of an $n$-node planar network a label of $2\log n + O(\log\log n)$ bits so that adjacency between two nodes can be retrieved from their labels. Classical representations of planar graphs in the distributed setting were based on the Three Schnyder Trees decomposition, leading to $3\log n + O(\log^{*} n)$-bit labels (FOCS '01). An intriguing question is whether a $c\log n$-bit representation exists for planar graphs with $c < 2$.

For trees, we can solve $k$-ancestry and distance-$k$ queries with shorter labels. Previous solutions achieve $\log n + O(k^{2}\log\log n)$-bit labels [Alstrup-Bille-Rauhe 2005], whereas we have proved that $\log n + O(k\log\log n)$-bit labels suffice. For interval graphs, we have given an optimal distance labeling scheme, and we proposed a localized and compact data structure for comparability graphs.

We also analyzed the locality of the construction of sparse spanners, and we proposed an efficient first-order model checking using short labels.

Finally, we have started a collaboration with Andrew Twigg (Thomson - Labs) and Bruno Courcelle (LaBRI) about connectivity in semi-dynamic planar networks (see preliminary results here and here). In this model, one must precompute some localized data structure (given as a label associated with each node) for a planar graph $G$, so that connectivity between any two nodes in $G \setminus X$, where $X$ is any subset of nodes or edges, can be determined from the labels of the two nodes and the labels of the nodes (or end-points of edges) of $X$. This field looks promising since it captures a kind of dynamicity of the network, and we hope to generalize this model and our results.

We also proposed a new algorithm that allows the administrator or user of a DBMS to choose which part of the data cube to optimize. This problem is called in the literature *the views selection problem*. The goal consists in choosing the best part of the whole data cube to precompute. Our contribution is to consider that the main constraint is the time to answer individual queries, whereas the memory constraint is the one usually considered.

The next step consists in turning our approach into a parallel and distributed algorithm. We are currently experimenting with a parallel algorithm with a theoretical guarantee of performance. More precisely, given a constant $f$, the query time is at most $f$ times the optimal query time (defined as the query time when the result has already been computed).

It turns out that our solution can be adapted to the problem of quickly finding the maximal frequent itemsets within a transaction table. A transaction consists of a list of items. For a given frequency, we aim at computing the maximal itemsets that are frequent in a list of transactions. To our knowledge, there is no parallel algorithm with a guarantee of performance that computes the maximal frequent itemsets. Our solution for the view selection algorithm should be tested on real instances.
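The definition of a maximal frequent itemset can be made concrete with a brute-force sequential sketch. It enumerates every candidate itemset, so it is exponential in the number of items and is in no way the parallel algorithm discussed above; the transactions and the support threshold (an absolute count) are hypothetical.

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Return the frequent itemsets that have no frequent proper superset.
    An itemset is frequent if at least `min_support` transactions contain it."""
    items = sorted(set().union(*transactions))
    frequent = []
    for r in range(1, len(items) + 1):
        for candidate in combinations(items, r):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
    return [s for s in frequent if not any(s < other for other in frequent)]

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
result = maximal_frequent_itemsets(transactions, min_support=2)
print(sorted(sorted(s) for s in result))
```

Here the singletons a, b and c are frequent but not maximal, because each is contained in a frequent pair.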

We study the amount of knowledge about a communication
network that must be given to its nodes in order to
efficiently disseminate information. Our approach is
*quantitative*: we investigate the minimum total
number of bits of information (minimum size of advice)
that has to be available to nodes to solve the given
problem, regardless of the type of information
provided.

An overlay network is a virtual network whose nodes correspond either to processors or to resources of the network. Virtual links may depend on the application; for instance, different overlay networks can be designed for routing and broadcasting.

These overlay networks should support insertion and deletion of users/resources, and thus they inherently have a high dynamism.

We should distinguish *structured* and *unstructured* overlay networks:

In the first case, one aims at designing a network in which queries can be answered efficiently: greedy routing should work well (without backtracking), and the spreading of a piece of information should take a very short time and few messages. The natural topologies of these networks are graphs of small diameter and bounded degree (De Bruijn graphs for instance). However, dynamic maintenance of a precise structure is difficult and any perturbation of the topology gives no guarantee for the desired tasks.

In the case of unstructured networks, there is no strict topology control. For the information retrieval task, the only attempt to bound the total number of messages consists of optimizing a flooding by taking into account statistics stored at each peer: number of requests that found an item traversing a given link, ...

In both approaches, the physical topology is not involved. To our knowledge, there exists only one attempt in this direction: the work of Abraham and Malkhi, which deals with the design of routing tables for stable platforms.

We are interested in designing overlay topologies that take into account the physical topology.

Another direction is promising. If we relax the condition of designing an overlay network with a precise topology and only require some topological properties, we might construct very efficient overlay networks. Two directions can be considered: *random graphs* and *small-world* networks.

Random graphs are promising for broadcast and have been proposed for the update of replicated databases in order to minimize the total number of messages and the time complexity. The underlying topology is the complete graph but the communication graph (pairs of nodes that effectively interact) is much sparser. At each pulse of its local clock, each node tries to send or receive any new piece of information. The advantage of this approach is fault-tolerance. However, this epidemic spreading leads to a waste of messages since any node can receive the same update many times. We are interested in fixing this drawback and we think that it should be possible.
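The waste of messages mentioned above can be observed on a small simulation. The sketch below assumes a synchronous push model in which, at each round, every informed node sends the update to one node chosen uniformly at random (possibly itself or an already informed node); the function name, the seed and the network size are choices made for this example.

```python
import random

def push_gossip(n, seed=0):
    """Synchronous push gossip from node 0 on the complete graph with n nodes.
    Returns (rounds, messages sent, messages that reached an informed node)."""
    rng = random.Random(seed)
    informed = {0}
    rounds = messages = redundant = 0
    while len(informed) < n:
        rounds += 1
        newly = set()
        for _sender in informed:
            target = rng.randrange(n)   # assumption: uniform choice over all nodes
            messages += 1
            if target in informed or target in newly:
                redundant += 1          # the update was already known or on its way
            else:
                newly.add(target)
        informed |= newly
    return rounds, messages, redundant

rounds, messages, redundant = push_gossip(200, seed=1)
print(rounds, messages, redundant)
```

Exactly $n - 1$ messages are useful, so every other message counted here is the overhead that a better protocol would try to reduce while keeping the fault-tolerance of the random choice.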

For several queries, recent solutions use small-world networks. This approach is inspired by experiments in social sciences. It suggests that adding a few (non uniform) random and uncoordinated virtual long links to every node drastically shrinks the diameter of the network. Moreover, paths with a small number of hops can be found.

Solutions based on network augmentation (i.e. by
adding virtual links to a base network) have proved to be
very promising for large scale networks. This technique
is referred to as turning a network into a small-world
network, also called the
*small-worldization* process. Indeed, it makes it possible to
transform many arbitrary networks into networks in which
search operations can be performed in a greedy fashion
and very quickly (typically in time poly-logarithmic in
the size of the network). This property implies that some
information, like the distance between nodes, can be easily
(or locally) accessed. More formally, a network is
f-navigable if greedy routing can be used to get
routing paths of O(f) hops. Recently, many authors
have aimed at finding networks that can be turned into
log^{O(1)}-navigable networks.
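Greedy routing in an augmented network can be illustrated on the simplest base network, a cycle. The sketch below assumes one long-range link per node, drawn with probability proportional to the inverse of the ring distance (the harmonic distribution commonly used for this base graph); the helper names `augment_ring` and `greedy_route` are hypothetical and the code is only an illustration of the definition of navigability, not one of the constructions discussed here.

```python
import random

def ring_distance(u, v, n):
    """Distance between u and v on the n-cycle."""
    d = abs(u - v)
    return min(d, n - d)

def augment_ring(n, seed=0):
    """Give every node of the n-cycle one long-range link, chosen with
    probability proportional to 1/distance (harmonic distribution)."""
    rng = random.Random(seed)
    offsets = list(range(1, n))
    weights = [1.0 / ring_distance(0, k, n) for k in offsets]
    return [(u + rng.choices(offsets, weights)[0]) % n for u in range(n)]

def greedy_route(src, dst, long_link, n):
    """Forward to the neighbour (ring or long-range) closest to dst.

    Only local knowledge is used: the current node, its three links and
    the position of the target. Returns the number of hops.
    """
    hops, cur = 0, src
    while cur != dst:
        candidates = [(cur - 1) % n, (cur + 1) % n, long_link[cur]]
        cur = min(candidates, key=lambda v: ring_distance(v, dst, n))
        hops += 1
    return hops

if __name__ == "__main__":
    n = 1024
    links = augment_ring(n, seed=1)
    print(greedy_route(0, n // 2, links, n), "hops instead of", n // 2)
```

Since a ring neighbour always decreases the distance to the target by one, greedy routing terminates after at most `ring_distance(src, dst, n)` hops; the long-range links can only shorten the route.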

Our goal is to study more precisely the algorithmic performance of these new small-world networks (w.r.t. time, memory, pertinence, fault-tolerance, self-stabilization, etc.) and to propose new networks of this kind, i.e. to construct the augmentation of the base network as well as to design the corresponding navigation algorithm. Like classical algorithms for routing and navigation (which are essentially based on greedy algorithms), the proposed solutions have to take into account that no entity has global knowledge of the network. A first result in this direction is promising: we proposed an economical distributed algorithm to turn a bounded growth network into a small-world network. Moreover, the practical challenge will be to adapt such constructions to dynamic networks, at least under the models that are identified as relevant.

Can the
*small-worldization* process be supported in dynamic
platforms? Up to now, the literature on small-world
networks only deals with the routing task. We are
convinced that small-world topologies are also relevant
for other tasks: quick broadcast, search in the presence of
faulty nodes, etc. In general, we think that maintaining
a small-world topology can be much more realistic than
maintaining a rigidly structured overlay network and much
more efficient for several tasks in unstructured overlay
networks.

In 2007, we made two contributions dealing with
overlay networks: (1) we gave a formal
description of an algorithm turning any network into an
n^{1/3}-navigable network. This article is particularly
interesting since it is the first one that considers an arbitrary
input network in the small-worldization process; (2) we proved
that local knowledge is not enough to search quickly for a target
node in scale-free networks. Recent studies showed that
many real networks are scale-free: the distribution of
node degrees follows a power law of the form k^{-β}
with β in [2, 3], that is, the number of nodes of degree
k is proportional to k^{-β}. More precisely, we formally
proved that in usual scale-free models, it takes
Ω(n^{1/2}) steps to reach the target.

In 2008, we gave a small-stretch, polylogarithmic network navigability scheme using compact metrics.

A mobile agent is an autonomous entity able to move in its environment. We are interested here in mobile agents moving in geometric terrains, or in networks that are either virtual (computer networks for instance) or real (cave or pipe network, etc.). Such a mobile agent may be a software agent able to execute itself on its host computer, or a physical robot.

The results published in 2010 concern both continuous environments (geometric terrains) and discrete environments (graphs/networks). Below is a summary of the obtained results.

In an effort to understand the algorithmic limitations of computing by a swarm of weak robots, we consider a team of anonymous oblivious robots, endowed with visibility sensors (but otherwise unable to communicate), and operating in Look-Compute-Move cycles performed asynchronously for each robot. The goal of these robots is to collectively explore (visit all the nodes of) the whole graph/network and enter a quiescent state.

In this paper, we study the (already non-trivial) case when the graph is an arbitrary tree. We determine asymptotically optimal bounds on the minimal size of a team of robots able to explore trees of a given size. Hiding some technical details, our main result is that Θ(log n / log log n) robots are sometimes necessary and always sufficient, where n is the size of the tree.

We propose a universal strategy (i.e. working in every network), combining phases where the agent follows the advice and phases where the agent randomly selects an incident edge. We prove that this strategy is efficient for two classes of regular graphs with extremal values of expansion, namely, for rings and for random regular graphs (an important class of expanders).

A mobile robot represented by a
point moving in the plane has to explore an unknown
terrain with obstacles. Both the terrain and the
obstacles are modeled as arbitrary polygons. We
consider two scenarios:
*unlimited vision*, when the robot situated at a
point p of the terrain explores (sees) all points
q of the terrain for which the segment
pq belongs to the terrain, and
*limited vision*, when we require additionally
that the distance between p and
q be at most 1. All points of the terrain
(except obstacles) have to be explored and the
performance of an exploration algorithm is measured
by the length of the trajectory of the robot.

For unlimited vision, we show an exploration
algorithm with optimal complexity
O(P + D√k), where
P is the total perimeter of the terrain
(including perimeters of obstacles),
D is the diameter of the convex hull of the
terrain, and
k is the number of obstacles. For limited
vision, we show exploration algorithms with optimal
complexity
O(P + A + √(Ak)), where
A is the area of the terrain (excluding
obstacles). Our latter algorithms work for arbitrary
terrains, under the condition that one of the
parameters A or k is known.

Two mobile agents (robots) have to meet in an a priori unknown bounded terrain modeled as a polygon, possibly with polygonal obstacles. Robots are modeled as points, and each of them is equipped with a compass. Compasses of robots may be incoherent. Robots construct their routes, but the actual walk of each robot is decided by the adversary that may, e.g., speed up or slow down the robot (asynchronous setting).

Two anonymous mobile agents (robots) moving in an asynchronous manner have to meet in an infinite grid of dimension δ > 0. Since the problem is infeasible in such a general setting, we assume in the paper that the grid is embedded in a δ-dimensional Cartesian space and that each agent knows the coordinates of its own initial position (but not those of the other agent).

We design an algorithm permitting two agents (at
distance at most d) to meet after traversing a trajectory of
length O(d^δ polylog d), where δ is the dimension of the
grid. The algorithm is almost optimal, since the
Ω(d^δ) lower bound is straightforward. Furthermore,
our technique can be extended to other models, in
particular to asynchronous rendezvous in the
continuous plane of two agents having a (possibly
different) positive radius of visibility.

One of the recently considered models of robot-based computing makes use of identical, memoryless mobile units placed in nodes of an anonymous graph. The robots operate in Look-Compute-Move cycles; in one cycle, a robot takes a snapshot of the current configuration (Look), takes a decision whether to stay idle or to move to one of the nodes adjacent to its current position (Compute), and in the latter case makes an instantaneous move to this neighbor (Move). Cycles are performed asynchronously for each robot.

The proposed symmetry-preserving approach, which is complementary to symmetry-breaking techniques found in related work, appears to be new and may have further applications in robot-based computing.

Within the wider context of the project, we have published two book chapters, on data gathering and on energy consumption in wireless networks, respectively. We have also considered the problem of approximating variants of the traveling salesman problem with precedence constraints.

Even if the application field for large scale platforms is currently too narrow, targeted platforms are clearly not suited to tightly coupled codes, and we need to concentrate on simple scheduling problems in the context of large scale distributed unstable platforms. Indeed, most scheduling problems are already NP-complete, with bad approximation ratios, in the case of static homogeneous platforms when communication costs are not taken into account.

Recently, many algorithms have been derived, under several communication models, for master-slave tasking and Divisible Load Scheduling (DLS).

In this case, we aim at executing a large bag of independent, same-size tasks. First we assume that there is a single master, that initially holds all the (data needed for all) tasks. The problem is to determine an architecture for the execution. Which processors should the master enroll in the computation? How many tasks should be sent to each participating processor? In turn, each processor involved in the execution must decide which fraction of the tasks must be computed locally, and which fraction should be sent to which neighbor (these neighbors must be determined too).

Parallelizing the computation by spreading the
execution across many processors may well be limited by
the induced communication volume. Rather than aiming at
makespan minimization, a more relevant objective is the
optimization of the throughput in steady-state mode.
There are three main reasons for focusing on the
steady-state operation. First is
*simplicity*, as the steady-state scheduling is in
fact a relaxation of the makespan minimization problem in
which the initialization and clean-up phases are ignored.
One only needs to determine, for each participating
resource, which fraction of time is spent computing for
which application, and which fraction of time is spent
communicating with which neighbor; the actual schedule
then arises naturally from these quantities.
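As an illustration of how simple the steady-state formulation can be, consider a single master and a set of workers under a one-port model. The sketch below assumes that sending one task to worker i occupies the master for c_i time units, that worker i processes one task in w_i time units, and that communication and computation overlap; under these assumptions, maximizing the number of tasks processed per time unit is a fractional knapsack that is solved by serving workers in order of increasing c_i. The function name and the input format are ours.

```python
from fractions import Fraction

def steady_state_throughput(workers):
    """Compute steady-state task rates for a master with a one-port model.

    workers: list of (c, w) pairs, where c is the time the master needs to
    send one task to the worker and w is the time the worker needs to
    process one task.
    Returns (total throughput, list of per-worker rates), in tasks per
    time unit, as exact fractions.
    """
    # Serve the workers that are cheapest to feed first.
    order = sorted(range(len(workers)), key=lambda i: workers[i][0])
    budget = Fraction(1)            # fraction of time the master may spend sending
    rates = [Fraction(0)] * len(workers)
    for i in order:
        c, w = Fraction(workers[i][0]), Fraction(workers[i][1])
        # A worker is limited by its own speed and by the remaining send time.
        rate = min(1 / w, budget / c)
        rates[i] = rate
        budget -= rate * c
        if budget == 0:
            break
    return sum(rates), rates

if __name__ == "__main__":
    total, rates = steady_state_throughput([(1, 3), (2, 2), (1, 4)])
    print(total, rates)
```

The rates give, for each resource, the fraction of time spent communicating (rate times c_i for the master) and computing (rate times w_i for the worker); a periodic schedule can then be built from these quantities.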

We have considered task scheduling for parallel multifrontal methods, which corresponds to mapping a set of tasks whose dependencies are described by a tree. We have also proposed several distributed scheduling algorithms for the case where several applications are to be simultaneously mapped onto a heterogeneous platform.

Another important and still open issue for Divisible Load Scheduling deals with return communication. Under the classical model, it is assumed that the communication time of the results between the slaves and the master node can be neglected, which strongly limits the application field. In particular, the complexity of the problem with return messages is still open. This question has been studied in cooperation with Abhay Ghatpande, from Waseda University. In particular, we have proposed two heuristics for scheduling return messages with different computational costs. We have also solved several special cases (when return messages are very small, or communication times are very small with respect to processing times), and we have provided the first known approximation algorithm for DLS with return messages.

In this context, we have participated in the writing of two book chapters, one about different possible models of communications and one about steady-state scheduling.

We have revisited several classical scheduling problems (broadcasting, independent tasks scheduling) under more realistic communication models, whose parameters can be instantiated at runtime. We have proved that the use of resource augmentation techniques makes it possible to derive quasi-optimal algorithms even if the underlying scheduling problems are strongly NP-complete.

In many distributed applications on large distributed
systems, nodes may offer some local resources and request
some remote resources. For instance, in a distributed
storage environment, nodes may offer some space to store
remote files and request some space to duplicate remotely
some of their files. In the context of broadcasting,
the offer may be seen as the outgoing bandwidth and the request
as the incoming bandwidth. In the context of load
balancing, overloaded nodes may request to get rid of
some tasks whereas underloaded nodes may offer to process
them. In this context, we propose a distributed
algorithm, called
*dating service*, which is meant to randomly match
demands and supplies of some resource of many nodes into
couples. In a given round it produces a matching between
demands and supplies which is of linear size (compared to
the optimal one), even if available resources of
individual nodes are very heterogeneous, and is chosen
uniformly at random from all matchings of this size.
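The text above does not detail how a round proceeds. As a purely illustrative sketch, one plausible way to obtain such a random matching is a rendezvous scheme: every unit of supply and every unit of demand is forwarded to a uniformly random node, and each node pairs the offers and requests it has received. This mechanism, the function name `dating_round` and the data layout are assumptions made for the example, and the sketch makes no claim about the size or uniformity guarantees stated above.

```python
import random
from collections import defaultdict

def dating_round(supply, demand, seed=0):
    """Simulate one round of a rendezvous-style matching (assumed mechanism).

    supply[v] / demand[v]: number of resource units node v offers / requests.
    Every unit is sent to a uniformly random node; each node then pairs the
    offers and requests it received.
    Returns a list of (supplier, requester) pairs.
    """
    rng = random.Random(seed)
    n = len(supply)
    offers = defaultdict(list)      # rendezvous node -> suppliers received
    requests = defaultdict(list)    # rendezvous node -> requesters received
    for v in range(n):
        for _ in range(supply[v]):
            offers[rng.randrange(n)].append(v)
        for _ in range(demand[v]):
            requests[rng.randrange(n)].append(v)
    matches = []
    for node in range(n):
        got_offers, got_requests = offers[node], requests[node]
        rng.shuffle(got_offers)
        rng.shuffle(got_requests)
        # zip stops at the shorter list: surplus units stay unmatched this round.
        matches.extend(zip(got_offers, got_requests))
    return matches

if __name__ == "__main__":
    supply = [3, 0, 1, 2] * 25
    demand = [1, 2, 0, 1] * 25
    pairs = dating_round(supply, demand, seed=1)
    print(len(pairs), "couples out of at most", min(sum(supply), sum(demand)))
```

In this simplified version a node may be paired with itself; a practical implementation would filter such couples or retry them in the next round.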

We believe that this basic operation can be of great interest in many practical applications and could be used as a building block for writing efficient software on large distributed unstable platforms. We plan to demonstrate its practical efficiency for content distribution, management of large databases and distributed storage applications described in Section .

We also have ongoing work on using this dating service for the maintenance of a randomized overlay network against arbitrary arrivals and departures of nodes, and are trying to remove the requirement for the algorithm to work in a succession of rounds.

In this context, we would like to propose a distributed algorithm to dynamically build clusters of nodes able to process large tasks. These sets of nodes should satisfy constraints on the overall available memory, on its processing power together with constraints on the maximal latency between nodes and the minimal bandwidth between two participating nodes.

We believe that such a distributed service would make it possible to consider a much larger application field. We plan to demonstrate first its practical efficiency for the application of molecular dynamics (based on NAMD) described in more detail in Section .

We present a modeling of this
problem, called
*bin-covering problem with distance constraint*, and
we propose a distributed approximation algorithm in the
case where the elements are in a space of dimension 1.
We also describe a generic
two-phase algorithm, based on resource augmentation,
whose approximation ratio is 1/3. We also propose a
distributed version of this algorithm when the metric
space is Q^{D} (for a small value of D) and the
L_∞ norm is used to define distances. This algorithm
takes O((4^{D}) log^{2} n) rounds and
O((4^{D}) n log n) messages both in expectation
and with high probability, where
n is the total number of hosts. The analysis of the
algorithm involves a detailed analysis of the
*skip graph* data structure, for which we obtained
new results (extending older results on
*skip lists*).
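To make the one-dimensional problem concrete, the sketch below builds clusters on a line with a simple centralized sweep: hosts are sorted by position, and a cluster is closed as soon as a group of consecutive hosts within the allowed diameter reaches the required total capacity. This is only a greedy heuristic written for illustration, not the distributed approximation algorithm mentioned above; the names `greedy_clusters`, `capacity` and `max_diameter` are ours.

```python
def greedy_clusters(hosts, capacity, max_diameter):
    """Greedy sweep for bin covering with a distance constraint on a line.

    hosts: list of (position, weight) pairs.
    A cluster is valid if its total weight is at least `capacity` and the
    distance between its extreme hosts is at most `max_diameter`.
    Returns a list of disjoint valid clusters (lists of hosts).
    """
    clusters = []
    window = []      # candidate cluster: consecutive hosts in sorted order
    total = 0
    for pos, weight in sorted(hosts):
        window.append((pos, weight))
        total += weight
        # Drop the leftmost hosts until the diameter constraint holds again.
        while pos - window[0][0] > max_diameter:
            total -= window.pop(0)[1]
        if total >= capacity:
            clusters.append(window)
            window, total = [], 0
    return clusters

if __name__ == "__main__":
    hosts = [(0, 2), (1, 2), (2, 2), (10, 3), (11, 1), (20, 5)]
    for cluster in greedy_clusters(hosts, capacity=4, max_diameter=2):
        print(cluster)
```

On this example the sweep returns three clusters; the host at position 2 is left unused because no other host lies within distance 2 of it once the first cluster has been closed.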

Recently, we have proposed several algorithms for both Bin Packing and Bin Covering in the context of specific embeddings of latencies in metric spaces (such as those generated by Sequoia or Vivaldi) and we have compared the performance of those embeddings in the context of a specific application.

In many applications on large scale distributed platforms, the application data files are distributed across the platform, and the volatility in the availability of resources makes it impossible to rely on a centralized system to locate data.

In this context, complex queries, such as finding a node holding a given set of files, or holding a file whose index is close to a given value, or a set of (close) nodes covering a given set of files, should be treated in a distributed manner. Queries built for P2P systems are far too limited to handle such requests.

We plan to demonstrate the usefulness and efficiency of such requests on the molecular dynamics application and on the continuous integration application described in Section . Again, we strongly believe that these operations can be considered as useful building blocks for most large scale distributed applications that cannot be executed in a client-server model, and that providing a library with such mechanisms would be of great interest.

A sound approach is to structure overlay networks in such a way that they reflect the structure of the application. Peers represent objects of the application, so that neighbours in the peer-to-peer network are objects having similar characteristics from the application's point of view. Such structured peer-to-peer overlay networks provide a natural support for range and complex queries. We have proposed to use complex structures such as a Voronoï tessellation, where each peer is associated with a cell in the space. Moreover, since the cost of computing and maintaining these structures is usually extremely high for dimensions larger than 2, we have proposed to weaken the Voronoï structure to deal with higher dimensional spaces.

We are currently adapting the techniques proposed in these papers to the molecular dynamics application in collaboration with Juan Elezgaray from IECB.

Many large-scale applications, especially those requiring high throughput communications (such as peer-to-peer video on demand or file sharing), can benefit from the prediction of available bandwidth between two participating nodes. However, measuring available bandwidth incurs a large overhead, so it is infeasible to perform the measurement between all pairs of nodes. We have proposed to predict available bandwidth by using the last-mile model, in which each participating node is characterized by its upload and download capacity. We have also designed a distributed procedure for computing the appropriate values. A study based on a dataset obtained from PlanetLab shows that this model gives more precise predictions with fewer measurements than the existing solutions. This work was submitted to CC-Grid 2011, and a research report is being written.
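In the last-mile model, the bandwidth available from one node to another is predicted as the minimum of the sender's upload capacity and the receiver's download capacity. The sketch below shows this prediction together with a very simple centralized estimator that sets each capacity to the largest bandwidth observed through it; this estimator and the function names are assumptions made for the example and do not reproduce the distributed procedure mentioned above.

```python
def fit_capacities(measurements, nodes):
    """Estimate last-mile capacities from a few measurements.

    measurements: iterable of (src, dst, bandwidth) triples.
    Every measured bandwidth is a lower bound on the sender's upload and on
    the receiver's download capacity, so the largest observed value is used
    as the estimate for each capacity.
    """
    upload = {v: 0.0 for v in nodes}
    download = {v: 0.0 for v in nodes}
    for src, dst, bw in measurements:
        upload[src] = max(upload[src], bw)
        download[dst] = max(download[dst], bw)
    return upload, download

def predict_bandwidth(upload, download, src, dst):
    """Last-mile prediction: the slower of the two access links is the bottleneck."""
    return min(upload[src], download[dst])

if __name__ == "__main__":
    measured = [("a", "b", 10.0), ("a", "c", 4.0), ("b", "c", 4.0), ("c", "a", 6.0)]
    up, down = fit_capacities(measured, nodes=["a", "b", "c"])
    # The pair (b, a) was never measured, yet a prediction is available.
    print(predict_bandwidth(up, down, "b", "a"))
```

The model needs only two values per node, so predictions for all pairs can be derived from a number of measurements that is far smaller than the number of pairs.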

The title of this study is “Dynamic Compact Routing Scheme”. The aim of this project is to develop new routing schemes achieving better performance than the current BGP protocol. The problems faced by the inter-domain routing protocol of the Internet are numerous:

The underlying network is dynamic: many observations of bad configurations show the instability of BGP;

BGP does not scale well: the convergence time toward a legal configuration is too long, and the size of routing tables is proportional to the number of nodes of the network (the network size is multiplied by 1.25 each year);

The impact of the policies is so important that many packets can oscillate between two Autonomous Systems.

In this collaboration, we mainly focus on the scalability
properties that a new routing protocol should guarantee. The
main measures are the size of the local routing tables, and
the time (or message complexity) to update or to generate
such tables. The main challenge is the design of schemes
achieving space per router that is sub-linear in n,
the number of AS routers.
The target networks are AS-like networks with more than
100,000 nodes. This project, in collaboration with the MASCOTTE
INRIA project in Nice Sophia-Antipolis, makes use of
simulation tools developed at both sites.

The project has been used as a building block for a wider
project called EULER, Experimental UpdateLess Evolutive
Routing (see
http://

Alcatel-Lucent Bell, Antwerpen, Belgium

3 projects from INRIA: CEPAGE, GANG and MASCOTTE, France

Interdisciplinary Institute for Broadband Technology (IBBT), Belgium

Laboratoire d'Informatique de Paris 6 (LIP6), Université Pierre et Marie Curie (UPMC), France

Department of Mathematical Engineering (INMA), Université Catholique de Louvain, Belgium

RACTI, Research Academic Computer Technology Institute, University of Patras, Greece

CAT, Catalan Consortium: Universitat Politècnica de Catalunya, Barcelona and University of Girona, Spain

Cyril Banino (Yahoo!, Trondheim, Norway) completed his Master's degree at the University of Bordeaux in 2002 under the supervision of Olivier Beaumont and his PhD in Trondheim (N.T.N.U.). During his PhD, he worked with Olivier Beaumont on decentralized algorithms for independent tasks scheduling. This collaboration has resulted in several research visits (for a total of 5 weeks since 2003) and several joint papers (IEEE TPDS, Europar'06, IPDPS'03). He has recently been appointed at Yahoo! (Trondheim), and we have started an informal collaboration with Yahoo! Research, which led to a joint publication. We now plan to establish a formal collaboration on document storage in large distributed databases, request scheduling and independent tasks distribution across large distributed platforms.

We started an informal collaboration with Xavier Hanin (4SH) who has developed Xooctory and who has initiated the project Ivy which is now a project of the Apache Software Foundation. This collaboration is supported by INRIA who delegated Ludovic Courtès (INRIA SED Engineer) to work in Cepage for one year from July 2009 on a distributed version of Xooctory.

We are implementing and testing in Xooctory different scheduling algorithms that distribute the build process. We now plan to analyse the graph of dependencies of tasks in the build process in order to propose dedicated scheduling algorithms.

This project, in coordination with the INRIA project RUNTIME, aims at designing models for communication times on heterogeneous platforms of two types: large-scale platforms for volunteer computing, and high performance NUMA machines. The goal is to reach a compromise between precision and algorithmic tractability.

The scientific objectives of ALADDIN are to solve what are identified as the most challenging problems in the theory of interaction networks. The ALADDIN project is thus an opportunity to create a full continuum from fundamental research to applications in coordination with both INRIA projects CEPAGE and GANG.

The objective of the ADT INRIA Aladdin

The objective of USS SimGrid is to create a simulation framework that will answer (i) the need for simulation scalability arising in the HPC community; (ii) the need for simulation accuracy arising in distributed computing. The Cepage team will be involved in the development of tools to provide realistic model instantiations.

The project involves the following INRIA and CNRS teams: AlGorille, ASAP, Cepage, Graal, MESCAL, SysCom, CC IN2P3.

The goal of this ANR is the study of identifying codes in evolving graphs. Ralf Klasing is the overall leader of the project.

International Joint Project, 2011-2013, entitled “SEarch, RENdezvous and Explore (SERENE)”, on foundations of mobile agent computing, in collaboration with the Department of Computer Science, University of Liverpool. Funded by the Royal Society, U.K. Principal investigator on the UK side: Leszek Gasieniec. Ralf Klasing is the principal investigator on the French side.

The goal of ComplexHPC is to coordinate European groups
working on the use of heterogeneous and hierarchical
systems for HPC as well as the development of collaborative
activities among the involved research groups, to tackle
the problem at every level (from cores to large-scale
environments) and to provide new integrated solutions for
large-scale computing for future platforms (see
http://

Prosenjit Bose, Carleton University Canada, 18/10-22/10/2010

Jurek Czyzowicz, Université du Québec en Outaouais, February 16-March 5 and June 25-July 9, 2010

Thomas Sauerwald, MPI Saarbrücken, Germany, 01/11-15/11/2010

Andrew Collins, University of Liverpool, UK, 21/09-20/12/2010

Davide Bilo, University of L'Aquila, Italy, 30/08-15/09/2010

Colin Cooper, King's College London, UK, 04-11/06/2010

Tomasz Radzik, King's College London, UK, 07-11/06/2010

Hirotaka Ono, Kyushu University, Japan, 07-11/06/2010

Adrian Kosowski, Gdansk University of Technology, Poland, 17-21/05/2010

Przemyslaw Uznanski, Wroclaw University, Poland, 17-21/05/2010

Adrian Kosowski, Gdansk University of Technology, Poland, 16-25/03/2010

Université Paris 6 (LIP 6), January 29-February 8, 2010 (D. Ilcinkas)

Olivier Beaumont is an Associate
Editor for
*IEEE Transactions on Parallel and Distributed
Systems*

Ralf Klasing is a member of the
Editorial Board of
*Theoretical Computer Science*,
*Discrete Applied Mathematics*,
*Wireless Networks*,
*Networks*,
*Journal of Interconnection Networks*,
*Parallel Processing Letters*,
*Algorithmic Operations Research*,
*Fundamenta Informaticae*, and
*Computing and Informatics*.

Cyril Gavoille is a member of the Steering Committee (as treasurer) of the PODC '10 conference.

Vice-Chair (Algorithm Track), IPDPS 2011, IEEE International Parallel and Distributed Processing Symposium, Anchorage, USA, 2011 (O. Beaumont)

Local-Chair (P2P Track), EuroPar 2011, 17th International European Conference on Parallel and Distributed Computing, Bordeaux, France, 2011 (O. Beaumont)

STACS 2011, 28th Symposium on Theoretical Aspects of Computer Science, Dortmund, Germany, March 10-12, 2011 (N. Hanusse)

SIROCCO 2011, 18th International Colloquium on Structural Information and Communication Complexity, June 26-29, 2011, Gdansk, Poland (R. Klasing)

FOMC 2011, 7th ACM SIGACT/SIGMOBILE International Workshop on Foundations of Mobile Computing (formerly known as DIALM-POMC), June 9, 2011, San Jose, California, USA (R. Klasing)

ADHOC-NOW 2011, 10th International Conference on Ad Hoc Networks and Wireless, July 18-20, 2011, Paderborn, Germany (R. Klasing)

IWOCA 2011, 22nd International Workshop on Combinatorial Algorithms, June 20-22, 2011, University of Victoria, Canada (R. Klasing)

IC3 2011, 4th International Conference of Contemporary Computing, New Delhi, Aug 8-10, 2011 (O. Beaumont)

RENPAR 2011, Rencontres francophones du Parallélisme (RenPar'20), Saint Malo, May 2011 (O. Beaumont)

ISCIS 2011, 26th International Symposium on Computer and Information Sciences 26-28 September 2011, London, UK, 2011 (O. Beaumont)

PODC 2010, Twenty-Ninth Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing Zurich, Switzerland, July 25-28, 2010 (O. Beaumont)

SSS 2010, 12th International Symposium on Stabilization, Safety, and Security of Distributed Systems, New York City, USA September 20-22, 2010 (O. Beaumont)

HIPC 2010, International Conference on High Performance Computing, December 19-22, Goa, India (O. Beaumont)

PASCO 2010, International Workshop on Parallel symbolic Computation, July 21st - July 23rd, 2010, Grenoble, France (O. Beaumont)

IPDPS 2010 PhD Forum, IEEE International Parallel and Distributed Processing Symposium, Atlanta, USA, 2010 (O. Beaumont, L. Eyraud-Dubois)

MARAMI 2010, 1ère conférence sur les Modèles et l'Analyse des Réseaux : Approches Mathématiques et Informatique, October 11-12, 2010 - Toulouse, France (N. Hanusse and D. Ilcinkas)

MajecSTIC 2010, MAnifestation des JEunes Chercheurs en Sciences et Technologies de l'Information et de la Communication, October 13-15, 2010, Bordeaux

DYNAS 2010 (co-chair), 2nd International Workshop on DYnamic Networks: Algorithms and Security, July 5, 2010 - Bordeaux, France (D. Ilcinkas)

DIALM-POMC 2010, 6th ACM SIGACT/SIGMOBILE International Workshop on Foundations of Mobile Computing, September 16, 2010 - Cambridge, Massachusetts, USA (D. Ilcinkas)

STACS 2010, 27th International Symposium on Theoretical Aspects of Computer Science, March 4-6, 2010, Nancy, France (R. Klasing)

ALGOSENSORS 2010, 6th International Workshop on Algorithmic Aspects of Wireless Sensor Networks, July 2010, Bordeaux, France (R. Klasing)

IWOCA 2010, 21st International Workshop on Combinatorial Algorithms, July 26-28, 2010, London, United Kingdom (R. Klasing)

ADHOC-NOW 2010, 9th International Conference on Ad Hoc Networks and Wireless, August 20-22, 2010, Edmonton, Canada (R. Klasing)

Nicolas Bonichon, Lionel Eyraud-Dubois, Cyril Gavoille, and Ralf Klasing were involved in the organization of ICALP 2010, held in July 2010 in Bordeaux. (C. Gavoille was the Conference Co-Chair, R. Klasing was the Workshops Chair, N. Bonichon and L. Eyraud-Dubois were Publicity Chairs)

Ralf Klasing (co-)organized the 1st Bordeaux Graph Workshop (BGW 2010).

Nicolas Hanusse is responsible for the working group on "Distributed Algorithms" at the LaBRI.

Ralf Klasing is the Head of the "Combinatorics and Algorithms" team of the LaBRI.

Ralf Klasing is responsible for the International Relations of the LaBRI.

Ralf Klasing is in charge of the seminar “Distributed Algorithms” at the LaBRI.

Ralf Klasing was a member of the recruitment committee MCF 481 (Université d'Evry).

Ralf Klasing acted as a referee for an NSERC Discovery Grant, the ANR Jeunes Chercheurs, and for an application for tenure and promotion.

Olivier Beaumont is the local correspondent for INRIA International Relations at INRIA Bordeaux Sud-Ouest.

Olivier Beaumont leads the local funding committee at INRIA Bordeaux Sud-Ouest.

Olivier Beaumont was the Head of the "Combinatorics and Algorithms" team of the LaBRI (up to November 2010).

Olivier Beaumont was appointed as external member of the Ph.D. committee of Brett Becker (University of Dublin, Ireland)

Olivier Beaumont was external Ph.D. reviewer (rapporteur) and member of the Ph.D. committee of Sékou Diakite (Université de Franche-Comté, Besançon, France)

Olivier Beaumont was president of the Ph.D. committee of Jeremie Albert (LaBRI, University of Bordeaux)

Philippe Duchon was external Ph.D. reviewer (rapporteur) and member of the Ph.D. committee of Nicolas Le Scouarnec (INRIA Project Team ASAP, IRISA, Rennes, France)

Ralf Klasing was reviewer and discussion leader for the licentiate degree of Georgios Georgiadis (Chalmers University of Technology, Sweden, May 2010)

02/03.2010: Optimal Exploration of Terrains with Obstacles. Séminaire "Algorithmique et combinatoire" du LIAFA.

18.08.2010: Network exploration by oblivious robots. Research Meeting and School on Distributed Computing by Mobile Robots (MAC 2010).

16/09/2010:
*Bounded Multi Port Model: Motivations, Feasibility
& Application to Broadcast*. New Challenges in
Scheduling Theory, Frejus (L. Eyraud-Dubois)

The members of CEPAGE are heavily involved in teaching activities at undergraduate and graduate level (Licence 1, 2 and 3, Master 1 and 2, ENSEIRB engineering school). The teaching is carried out by members of the University as part of their teaching duties, and by CNRS members (at Master 2 level) as extra work. It represents more than 500 hours per year.

At Master 2 level, here is a list of courses taught over the last two years:

Communication and Routing (last year of engineering school ENSEIRB, 2010) O. Beaumont, L. Eyraud, N. Hanusse, R. Klasing, A. Kosowski (16h)

P2P Networks and Social Graphs (2nd year of engineering school ENSEIRB, 2009) O. Beaumont (16h)

Ralf Klasing

Communication Algorithms in Networks (2nd year MASTER "Algorithms and Formal Methods" - 2010)