

Section: Overall Objectives

General Objectives

The development of interconnection networks has led to the emergence of new types of computing platforms. These platforms are characterized by heterogeneity of both processing and communication resources, geographical dispersion, and instability in terms of the number and performance of participating resources. These characteristics restrict the nature of the applications that can perform well on these platforms. Due to middleware and application deployment times, applications must be long-running and involve large amounts of data; also, only loosely-coupled applications may currently be executed on unstable platforms.

The new algorithmic challenges associated with these platforms have been approached from two different directions. On the one hand, the parallel algorithms community has largely concentrated on the problems associated with heterogeneity and large amounts of data. On the other hand, the distributed systems community has focused on scalability and fault-tolerance issues. The success of file sharing applications demonstrates the capacity of the resulting algorithms to manage huge volumes of data and users on large unstable platforms. Algorithms developed within this context are completely distributed and based on peer-to-peer (P2P for short) communication.

The goal of our project is to establish a link between these two directions, by gathering researchers from the distributed algorithms and data structures, parallel algorithms, and randomized algorithms communities. Indeed, the change in scale for distributed applications raises several new questions.

The first set of questions is related to distributed computations, where the Internet is the underlying network. Since the topology of the underlying network is unknown, the use of logical networks (overlays) is required. In turn, the choice of the overlay has an impact on the complexity of the algorithms. In this context, only the performance of the whole chain is meaningful, which requires collecting raw data and then proposing network models, together with algorithms based on these models, such that the resulting algorithms perform well on the raw data. This also requires studying the influence of the topology of the overlay network (the underlying graph) on the complexity of fundamental questions, such as graph exploration or black-hole search.
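As a concrete illustration, the most basic exploration primitive can be sketched as a depth-first traversal by a single agent that discovers a node's neighborhood only upon arrival, and backtracks over already-traversed edges. The following minimal Python sketch is illustrative only (the toy overlay graph is a hypothetical example, not one of the platforms studied in the project):

```python
def explore(graph, start):
    """Depth-first exploration of an unknown graph by a single agent.

    The agent learns a node's neighborhood only upon arrival; it returns
    the set of discovered nodes and the walk it performed (each edge is
    traversed at most twice, once forward and once while backtracking).
    """
    visited = set()
    walk = [start]

    def dfs(u):
        visited.add(u)
        for v in graph[u]:          # neighbors revealed on arrival at u
            if v not in visited:
                walk.append(v)      # move to the unexplored neighbor
                dfs(v)
                walk.append(u)      # backtrack over the same edge
    dfs(start)
    return visited, walk

# Toy overlay (hypothetical): a 4-cycle 0-1-2-3-0
g = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
nodes, walk = explore(g, 0)
```

The sketch deliberately ignores failures; the research questions above concern precisely what happens when nodes may be hostile (black holes) or when the agent has limited memory.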

The second set of questions is related to distributed data structures. In general, the question concerns the trade-off between the size of the data structure stored on each node and the time to answer a request (e.g., estimating the bandwidth between two nodes, or computing the closest common ancestor of two nodes in a tree) or to perform a task (e.g., routing a message in a network based on the information stored at the router nodes).
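This trade-off is well illustrated by ancestry queries in trees: a classical interval labeling assigns each node a pair of DFS numbers (O(log n) bits per node), after which ancestry between any two nodes is decided from the two labels alone. The sketch below is illustrative (the example tree is hypothetical):

```python
def label_tree(tree, root):
    """Assign each node an interval (pre, post) from a DFS traversal.

    These O(log n)-bit labels decide ancestry between any two nodes
    from the labels alone, without consulting the tree itself.
    """
    labels = {}
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        pre = counter
        counter += 1
        for v in tree[u]:
            if v != parent:
                dfs(v, u)
        labels[u] = (pre, counter)   # subtree of u occupies [pre, counter)

    dfs(root, None)
    return labels

def is_ancestor(labels, u, v):
    """u is an ancestor of v (or u == v) iff v's interval nests in u's."""
    pu, qu = labels[u]
    pv, qv = labels[v]
    return pu <= pv and qv <= qu

# Hypothetical example: root 0 with children 1 and 2; node 3 below 1
tree = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
labels = label_tree(tree, 0)
```

Richer predicates (distance, closest common ancestor) require more elaborate labeling schemes of the kind studied in the project, but the principle, answering queries from local labels only, is the same.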

In order to study these questions, our research plan is based on the following goals. First, we aim at building strong foundations both for distributed algorithms (graph exploration, black-hole search,...) and for distributed data structures (routing, efficient query, compact labeling...), so as to understand how to explore large-scale networks in the presence of failures and how to disseminate data so as to answer specific queries quickly. Second, we aim at building simple (based on local estimations, without centralized knowledge) yet realistic models to accurately represent resource performance and to build a realistic view of the topology of the network (based on network coordinates, geometric spanners, δ-hyperbolic spaces). Third, we aim at proving that these models are tractable by providing low-complexity distributed and randomized approximation algorithms for a set of basic scheduling problems (independent tasks scheduling, broadcasting, data dissemination,...) and the associated overlay networks. Finally, our goal is to prove the validity of our approach through software dedicated to several applications (molecular dynamics simulations, continuous integration), as well as more general tools related to the models we propose (AlNEM for automatic topology discovery, SimGRID for simulations at large scale) and collections of datasets (Hubble for the continuous integration DAGs, Bedibe for latency and bandwidth measurements).
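To give an idea of what a model "based on local estimations without centralized knowledge" looks like, the following sketch shows one coordinate update in the spirit of Vivaldi-style network coordinates: each node adjusts its position so that embedded Euclidean distances track measured latencies. The step size, the 2-D embedding, and the synthetic RTT value are all illustrative assumptions:

```python
def vivaldi_update(coord, remote_coord, rtt, step=0.05):
    """One coordinate update in the spirit of Vivaldi: move our point so
    that its Euclidean distance to the remote point tracks measured RTT."""
    dx = [a - b for a, b in zip(coord, remote_coord)]
    dist = sum(d * d for d in dx) ** 0.5 or 1e-9   # avoid division by zero
    error = rtt - dist                              # signed estimation error
    unit = [d / dist for d in dx]
    return [c + step * error * u for c, u in zip(coord, unit)]

# Two nodes with a measured RTT of 10, initially embedded 5 apart
a, b, rtt = [0.0, 0.0], [5.0, 0.0], 10.0
for _ in range(300):
    a = vivaldi_update(a, b, rtt)
    b = vivaldi_update(b, a, rtt)
estimate = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Each node updates using only its own measurements, which is exactly the kind of local, decentralized estimation the models above are meant to capture and validate against raw data.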

We will concentrate on the design of new services for computationally intensive applications consisting of mostly independent tasks sharing data, with applications to distributed storage, molecular dynamics, and distributed continuous integration; these will be described in more detail in Section 5.

Most of the research (including ours) currently carried out on these topics relies on centralized knowledge of the whole execution platform (topology and performance), whereas recent evolutions in computer network technology yield a tremendous change in the scale of these networks. The solutions designed for scheduling and for managing compact data structures must be adapted to these systems, which are characterized by the high dynamism of their entities (participants can join and leave at will), the potential instability of large-scale networks (on which concurrent applications are running), and an increasing probability of failure.

P2P systems have achieved stability and fault-tolerance, as witnessed by their wide and intensive usage, by changing the view of the network: all communication occurs on a logical network (which remains fixed even though resources change over time), thus abstracting away the actual performance of the underlying physical network. Nevertheless, disconnecting the physical and logical networks leads to low performance and a waste of resources. Moreover, due to their original use (file exchange), those systems are particularly well suited to exact search using Distributed Hash Tables (DHTs) and are based on fixed regular virtual topologies (hypercubes, De Bruijn graphs...). In the context of the applications we consider, more complex queries and services will be required (finding the set of edges used for content distribution, finding a set of replicas covering the whole database) and, in order to reach efficiency, unstructured virtual topologies must be considered.
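The exact-search capability of DHTs rests on consistent hashing: keys and nodes are hashed onto the same identifier ring, and each key is owned by the first node clockwise from its hash, so a departure reassigns only the keys that node owned. The following minimal sketch is illustrative (the node names and the 32-bit ring are assumptions, and real DHTs add routing tables on top of this):

```python
import hashlib
from bisect import bisect_right

def ring_hash(key):
    """Hash a string onto a 32-bit identifier ring (illustrative choice)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << 32)

class HashRing:
    """Minimal consistent-hashing ring: a key is owned by the first node
    clockwise from its hash, so a departure moves only that node's keys."""
    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.ids = [i for i, _ in self.ring]

    def lookup(self, key):
        pos = bisect_right(self.ids, ring_hash(key)) % len(self.ring)
        return self.ring[pos][1]

# Hypothetical nodes; which one stores a given key?
ring = HashRing(["n1", "n2", "n3"])
owner = ring.lookup("doc-42")
```

Such a structure answers exact-match queries only; the complex queries listed above (edge sets for content distribution, covering sets of replicas) are precisely what this mechanism does not provide, motivating the unstructured topologies we consider.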

In this context, the main scientific challenges of our project are:

  • Models:

    • At a low level, to understand the underlying physical topology and to obtain models that are both realistic and instantiable. This requires expertise in graph theory (all the members of the project) and platform modelling (Olivier Beaumont, Nicolas Bonichon, Lionel Eyraud-Dubois and Ralf Klasing). The obtained results will be used to guide the design of the algorithms in Sections 6.1 and 6.2.

    • At a higher level, to derive models of the dynamism of targeted platforms, both in terms of participating resources and of resource performance (Olivier Beaumont, Philippe Duchon). Our goal is to derive suitable tools to analyze and prove algorithm performance under dynamic conditions, rather than to propose stochastic modeling of evolutions (see Section 2.2).

  • Overlays and distributed algorithms:

    • To understand how to augment the logical topology in order to achieve the desirable properties of P2P systems. This requires expertise in P2P systems and small-world networks (Olivier Beaumont, Nicolas Bonichon, Philippe Duchon, Nicolas Hanusse, Cyril Gavoille). The obtained results will be used for developing the algorithms designed in Sections 6.2 and 6.3.

    • To build overlays dedicated to specific applications and services that achieve good performance (Olivier Beaumont, Nicolas Bonichon, Philippe Duchon, Lionel Eyraud-Dubois, Ralf Klasing, Adrian Kosowski). The set of applications and services we target will be described in more detail in Section 4.1.

    • To understand how to dynamically adapt scheduling algorithms (in particular collective communication schemes) to changes in network performance and topology, using randomized algorithms (Olivier Beaumont, Nicolas Bonichon, Nicolas Hanusse, Philippe Duchon, Adrian Kosowski, Ralf Klasing) (see Section  6.2 ).

    • To study the computational power of mobile agent systems, under various assumptions, on a few classical distributed computing problems (exploration, the mapping problem, exploration of the network in spite of harmful hosts). The goal is to enlarge the knowledge on the foundations of mobile agent computing. This will be done by developing new efficient algorithms for mobile agent systems and by proving impossibility results. This will also allow us to compare the different models (David Ilcinkas, Ralf Klasing, Adrian Kosowski, Evangelos Bampas) (see Section 6.2).

  • Compact and distributed data structures:

    • To understand how to dynamically adapt compact data structures to changes in network performance and topology (Nicolas Hanusse, Cyril Gavoille) (Section 6.3).

    • To design sophisticated labeling schemes in order to answer complex predicates using local labels only (Nicolas Hanusse, Cyril Gavoille) (Section 6.3).

We will detail in Section 5 how the various areas of expertise in the team will be employed for the considered applications.

We therefore tackle several problems related to two priorities that INRIA identified in its strategic plan (2008-2012): "Modeling, Simulation and Optimization of Complex Dynamic Systems" and "Information, Computation and Communication Everywhere".