MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes

Grand-Large: Calcul parallèle et distribué à grande échelle (large scale parallel and distributed computing)

Franck Cappello, Research Director at INRIA-Futurs

Scientific areas: Franck Cappello (Middleware Design, Implementation and Test); Joffroy Beauquier (Verification); Serge Petiton (Large Scale Numerical Computing)

Administrative assistant: Gina Grisvard (INRIA-Futurs)

Members: Joffroy Beauquier (Professor at Paris-Sud University); Franck Cappello (Research Director at INRIA-Futurs); Gilles Fedak (Junior Researcher at INRIA-Futurs); Thomas Hérault (Assistant Professor at Paris-Sud University); Serge Petiton (Professor at University of Science and Technology of Lille); Brigitte Rozoy (Professor at Paris-Sud University); Sébastien Tixeuil (Assistant Professor at Paris-Sud University); Lamine Aouad (Teaching Assistant at Lille 1 University); Samir Djilali (Teaching Assistant at Paris XII University); Pierre Lemarinier (Teaching Assistant at Paris-Sud University); Derrick Kondo (INRIA Post-Doctoral Fellow); Bertier Marin (Teaching Assistant at Paris-Sud University); Philippe Gauron (Teaching Assistant at Paris-Sud University); Aurélien Bouteiller (MESR Grant, LRI); Matthieu Cargnelli (EADS Industrial Grant, CIFRE); Laurent Choy (INRIA and Région Nord, LIFL); Toussaint Guglielmi (MESR Grant, LIFL); William Hoarau (MESR Grant, LRI); Benoit Hudzia (Franco-Irish Grant, LIFL); David Ilcinkas (MESR Grant, LRI); Nicolas Nisse (MESR Grant, LRI); Oleg Lodygensky (LAL Engineer, Laboratoire de l'Accélérateur Linéaire); Benjamin Quettier (MESR Grant, LRI); Baohua Wei (Industrial Chinese Grant, LRI); Pierre Fraigniaud (Research Director at CNRS); Rosaz Laurent (Assistant Professor at Paris-Sud University); Julien Leduc (INRIA Expert Engineer); Phillipe Marty (INRIA Expert Engineer); Vincent Neri (CNRS Study Engineer); Eric Rodriguez (INRIA Associate Engineer)

Overall Objectives

Grand-Large General Objectives

Grand-Large is a Grid research project investigating the issues raised by computing on Large Scale Distributed Systems (LSDS), where participants execute different applications on shared resources belonging to other participants, possibly geographically and administratively independent. More specifically, we consider large scale parallel and distributed computing on P2P, Global Computing and Desktop Grid systems. Our research focuses on the design, proof and experimental evaluation of middleware and low level programming environments. Fundamentally, we address the impact of large scale on LSDS, gathering several methodological tools: theoretical models, simulators, emulators and real size systems.

The project aims:

to study, experimentally and formally, the fundamental mechanisms of LSDS for high performance computing;

to design, implement, validate and test real software, middleware and platforms;

to define, evaluate and experiment approaches for programming applications on these platforms.

Compared to other European and French projects, we gather skills in large scale systems (large scale scheduling, volatility tolerance, heterogeneity, inter-administration-domain security, etc.) acquired with the XtremWeb project (LRI, Cluster and Grid team); in the formal design and validation of algorithms and protocols for distributed systems (LRI, Parallelism team); and in the programming, evaluation, analysis and definition of programming languages and environments for parallel architectures and distributed systems (LIFL, methodologies and parallel algorithms).

This project pursues short and long term research aimed at scientific and industrial impact. Research topics include:

the design of a middleware enlarging the application domain of Desktop Grid;

a resource discovery engine for large scale systems with volatile participants;

large scale storage on volatile nodes;

simulation of large scale scheduling;

fault tolerant MPI for large scale systems;

algorithms for large scale fault tolerance;

protocol verification;

algorithms, programming and evaluation of scientific applications on desktop Grids;

tools and languages for large scale computing.

This research should find applications in the domains of LSDS, Grids and large clusters.

In the longer term, we investigate the conditions for the convergence of Global Computing, P2P and Grid systems (how Grid Services can be used in Desktop Grids) and experimental tools for improving the methodology associated with research in LSDS. For example, we are responsible for the Grid eXplorer project funded by the French ministry of research, and we are deeply involved in the Grid5000 project.

Scientific Foundations

Large Scale Distributed Systems (LSDS)

What fundamentally distinguishes pioneer Global Computing systems such as SETI@home, Distributed.net and other early systems dedicated to RSA key cracking from former work on distributed systems is their large scale. The notion of Large Scale is linked to a set of features that must be taken into account if the system is to scale to a very high number of nodes. One example is node volatility: an unpredictable number of nodes may leave the system at any time. Some studies even assume that nodes may leave without any prior notice and reconnect in the same way. This feature raises many novel issues: under such assumptions, the system may be considered fully asynchronous (it is impossible to bound message transit times, and thus impossible to detect some process failures), and, as is well known, consensus cannot be achieved deterministically in such a system as soon as a single process may crash. Another example is the complete lack of control over nodes and networks. We can decide neither when nor how a node contributes to the system. This means that we have to deal with the in-place infrastructure in terms of performance, heterogeneity and dynamicity, but also with the fact that any node may intermittently inject Byzantine faults. These features set up a new research context in distributed systems. The Grand-Large project aims at investigating, theoretically as well as experimentally, the fundamental mechanisms of LSDS, especially for high performance computing applications.
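The detection problem mentioned above can be made concrete with the classic timeout-based heartbeat detector. The sketch below is a generic illustration (class and method names are hypothetical): without a bound on message delays, the detector can only suspect, and a node whose heartbeat is merely late is reported exactly like a crashed one.

```python
class HeartbeatDetector:
    """Timeout-based failure detector: it can only suspect nodes."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node id -> time of the last heartbeat received

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspects(self, now):
        # Silence longer than `timeout` is all the detector can observe; a
        # crashed node and a live node behind a slow link look identical.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


det = HeartbeatDetector(timeout=5.0)
for node in ("a", "b", "c"):
    det.heartbeat(node, now=0.0)
det.heartbeat("a", now=8.0)      # "a" keeps sending heartbeats
# "b" has crashed; "c" is alive but its heartbeat is still in transit
print(det.suspects(now=9.0))     # ['b', 'c']
```

Raising the timeout reduces false suspicions but delays the detection of real crashes; without a bound on message transit times, no finite value is safe.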

Computing on Large Scale Global Computing systems

Currently, the largest LSDS are used for computing (SETI@home, Folding@home, Decrypthon, etc.), file exchange (Napster, Kazaa, eDonkey, Gnutella, etc.), networking experiments (PlanetLab, Porivo) and communication such as instant messaging and voice over IP (Jabber, Skype). In the High Performance Computing domain, LSDS emerged while the community was considering clustering and hierarchical designs as good performance-cost trade-offs.

As a class of Grid systems, LSDS essentially extend the notion of computing beyond the frontier of administration domains. The very first paper discussing this type of system presented the Worm programs and several key ideas that are currently investigated in autonomous computing (self-replication, migration, distributed coordination, etc.). LSDS inherit the principle of aggregating inexpensive, often already in place, resources from past research in cycle stealing/resource sharing. Due to its high attractiveness, cycle stealing has been studied in many research projects like Condor, Glunix and Mosix, to cite a few. A first approach to crossing administration domains was proposed by Web Computing projects such as Jet, Charlotte, Javelin, Bayanihan, SuperWeb, ParaWeb and PopCorn. These projects emerged with Java, taking advantage of the virtual machine properties: high portability across heterogeneous hardware and OS, wide deployment of the virtual machine in Web browsers, and a strong security model associated with bytecode execution. The performance and functionality limitations of these projects are among the fundamental motivations of the recent generation of Global Computing systems like COSM, BOINC and XtremWeb.

The high performance potential of LSDS platforms has also raised significant interest in industry. Companies like Entropia, United Devices, Platform, Grid systems and DataSynapse propose LSDS middleware often known as Desktop Grid or PC Grid systems. Performance-demanding users are also interested in these platforms, given their cost-performance ratio, which is even lower than that of clusters. Thus, several Desktop Grid platforms are used daily in production in large companies in the domains of pharmacology, petroleum, aerospace, etc.

LSDS share with Grids a common objective: to extend the size and accessibility of a computing infrastructure beyond the limit of a single administration domain. The similarities and differences between Grid and Global Computing systems have been presented in previous work. Two important distinguishing parameters are the user community (professional or not) and the resource ownership (who owns the resources and who uses them). From the system architecture perspective, we consider two main differences: the system scale and the lack of control over the participating resources. These two aspects have many consequences, at least on the architecture of system components, the deployment methods, programming models, security (trust) and, more generally, on the theoretical properties achievable by the system.

Building a Large Scale Distributed System for Computing

This set of studies uses the XtremWeb project as the basis for research, development and experimentation. This LSDS middleware is already operational. The set gathers four studies aimed at improving the mechanisms and enlarging the functionalities of LSDS dedicated to computing. The first study considers the architecture of the resource discovery engine, which, in principle, is close to an indexing system. The second study concerns the storage and movement of data between the participants of an LSDS. The third study addresses scheduling in LSDS in the context of multiple users and applications. Finally, the last study seeks to improve the performance and reduce the resource cost of the MPICH-V fault tolerant MPI for desktop grids.

The resource discovery engine

A multi-user/multi-application LSDS for computing would in principle be very close to a P2P file sharing system such as Napster, Gnutella and Kazaa, except that the shared resource is CPU time instead of files. Scale and lack of control are common features of the two kinds of systems. Thus, it is likely that similar solutions will be adopted for their fundamental mechanisms, such as lower level communication protocols, resource publishing, resource discovery and distributed coordination. As an example, recent P2P projects have proposed distributed indexing systems like CAN, CHORD, PASTRY and TAPESTRY that could be used for resource discovery in an LSDS dedicated to computing.

The resource discovery engine is composed of a publishing system and a discovery engine, which allow a client of the system to discover the participating nodes offering some desired services. Currently, there are as many resource discovery architectures as there are LSDS and P2P systems. The architecture of a resource discovery engine is derived from some expected features such as speed of search, speed of reconfiguration, volatility tolerance, anonymity, limited use of the network, and matching between the topologies of the underlying network and the virtual overlay network. The currently proposed architectures are not well motivated and seem to be derived from arbitrary choices.
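To illustrate the kind of fully distributed index cited above, the sketch below assumes CHORD-style consistent hashing: service names and node names are hashed onto one identifier ring, and the first node clockwise from a service's identifier stores the list of providers of that service. It is a hypothetical, single-process simplification (class and method names are invented here) that omits finger tables, multi-hop routing and index migration when nodes join or leave, which are precisely the parts that volatility makes hard.

```python
import bisect
import hashlib


def ring_id(key, bits=32):
    """Map a string onto the identifier ring (SHA-1, reduced to `bits` bits)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ConsistentHashDirectory:
    """Publishing system + discovery engine over a consistent-hashing ring."""

    def __init__(self, nodes):
        self.ring = sorted((ring_id(n), n) for n in nodes)
        self.store = {n: {} for n in nodes}   # index fragment held by each node

    def responsible(self, key):
        # The successor of hash(key) on the ring holds the index entry;
        # the modulo wraps around past the largest identifier.
        ids = [i for i, _ in self.ring]
        pos = bisect.bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[pos][1]

    def publish(self, service, provider):
        node = self.responsible(service)
        self.store[node].setdefault(service, set()).add(provider)

    def discover(self, service):
        node = self.responsible(service)
        return sorted(self.store[node].get(service, set()))


directory = ConsistentHashDirectory([f"peer{i}" for i in range(8)])
directory.publish("blas-3.0", "peer2")   # two peers advertise the same service
directory.publish("blas-3.0", "peer5")
print(directory.discover("blas-3.0"))    # ['peer2', 'peer5']
```

Publishing and discovery touch a single node per service, which is what makes such indexes attractive at scale; the price is that the entry disappears if that node leaves, unless it is replicated.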

This study has two objectives: a) to compare some existing resource discovery architectures (centralized, hierarchical, fully distributed) with relevant metrics; and b) possibly to propose a new protocol improving some of these parameters. The comparison will consider the theoretical aspects of the resource discovery engines as well as their actual performance when exposed to real experimental conditions.

Data storage and movement

Application data movement and storage are major issues in LSDS, since a large class of computing applications requires access to large data sets as input parameters, intermediary results or output results.

Several architectures exist for communicating application parameters and results between the client node and the computing nodes. XtremWeb uses an indirect transfer through the task scheduler, which is implemented by a middle tier between client and computing nodes. When a client submits a task, it includes the application parameters in the task request message. When a computing node completes a task, it transfers the results to the middle tier. The client can then collect the task results from the middle tier. BOINC follows a different architecture, using a data server as an intermediary node between the client and the computing nodes; all data transfers still pass through a middle tier (the data server). DataSynapse allows direct communication between the client and computing nodes. This architecture is close to that of P2P file sharing systems: the client uploads the parameters to the selected computing nodes, which return the task results using the same channel. Ultimately, the system should be able to select the appropriate transfer approach according to performance and fault tolerance requirements. We will use real deployments of XtremWeb to compare the merits of these approaches.
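The indirect path described for XtremWeb can be sketched as a single queue object through which every parameter and result flows. The class and method names are hypothetical and the counter only tracks payload bytes; the point of the sketch is that the middle tier relays each byte twice, once in and once out, which a direct client-to-worker transfer avoids.

```python
from collections import deque


class MiddleTier:
    """Indirect transfer: all parameters and results go through this tier."""

    def __init__(self):
        self.pending = deque()   # submitted tasks, parameters included
        self.results = {}        # task id -> result, kept until collected
        self.relayed_bytes = 0   # payload traffic handled by the middle tier

    def submit(self, task_id, params: bytes):
        self.relayed_bytes += len(params)
        self.pending.append((task_id, params))

    def fetch(self):
        # A computing node pulls a task; parameters travel tier -> worker.
        task_id, params = self.pending.popleft()
        self.relayed_bytes += len(params)
        return task_id, params

    def put_result(self, task_id, result: bytes):
        self.relayed_bytes += len(result)
        self.results[task_id] = result

    def collect(self, task_id):
        result = self.results.pop(task_id)
        self.relayed_bytes += len(result)
        return result


tier = MiddleTier()
tier.submit("t1", b"x" * 100)            # client -> tier: 100 bytes of parameters
task_id, params = tier.fetch()           # tier -> worker: the same 100 bytes
tier.put_result(task_id, params[:10])    # worker -> tier: 10 bytes of results
result = tier.collect("t1")              # tier -> client: the same 10 bytes
print(tier.relayed_bytes)                # 220
```

Keeping results in the tier lets the client disconnect and collect later, which is one reason an indirect design can suit volatile participants despite the doubled traffic.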

Currently, no LSDS dedicated to computing allows the persistent storage of data in the participating nodes. Several LSDS dedicated to data storage are emerging, such as OceanStore and Ocean. Storing large data sets on volatile nodes requires replication techniques. In CAN and Freenet, documents are stored in a single piece. In OceanStore, FastTrack and eDonkey, participants store segments of documents. This allows segments to be replicated and several segments of a document to be transferred simultaneously. In the CGP2P project, a storage system called US has been proposed. It relies on the notion of blocks (well known from hard disk drives). Redundancy techniques complement these mechanisms and provide RAID-like properties for fault tolerance. We will evaluate the different proposed approaches and how replication, affinity, caching and persistence influence the performance of computationally demanding applications.
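As a minimal illustration of block-based storage with RAID-like redundancy, the sketch below splits a data set into fixed-size blocks and adds one XOR parity block (the RAID-5 principle), each block being meant for a different volatile node. This is an assumption made for illustration, not the actual scheme of the US storage system: a single parity block tolerates the departure of only one node per stripe, and the original length must be kept to strip the padding. Function names are hypothetical.

```python
from functools import reduce


def split_blocks(data: bytes, block_size: int):
    """Cut data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\0" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data, block_size):
    """Data blocks plus one parity block, each to be stored on a different node."""
    blocks = split_blocks(data, block_size)
    return blocks + [xor_blocks(blocks)]


def recover(stored, lost_index):
    """Rebuild the block held by a departed node from all the survivors."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


data = b"input matrix block for a desktop grid job"
stored = encode(data, 8)
print(recover(stored, 2) == stored[2])   # True
```

Compared with full replication, one parity block costs a single extra block per stripe instead of a full copy, but rebuilding a block requires contacting every surviving node of the stripe.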

Scheduling in large scale systems

Scheduling is one of the fundamental mechanisms of the system. Several studies have been conducted in the context of Grids, mostly considering bag-of-tasks, parameter sweep or workflow applications. Recently, some research has considered scheduling and migrating MPI applications on Grids. Other related research concerns scheduling for cycle-stealing environments. Some of these studies consider not only the dynamic CPU workload but also network occupation and performance as a basis for scheduling decisions. They often refer to NWS, which is a fundamental component for discovering the dynamic parameters of a Grid. There is very little research in the context of LSDS, and there is no practical way to measure the workload dynamics of each component of the system (NWS is not scalable). There are several strategies for dealing with large scale systems: introducing hierarchy and/or giving more autonomy to the nodes of the distributed system. The purpose of this research is to evaluate the benefit of these two strategies in the context of LSDS where nodes are volatile. In particular, we are studying algorithms for fully distributed and asynchronous scheduling, where nodes take scheduling decisions based only on local parameters and on information coming from their direct neighbors in the system topology. In order to understand the phenomena related to full distribution, asynchrony and volatility, we are building a simulation framework called V-Grid. This framework, based on the Swarm multi-agent simulator, allows describing an algorithm, simulating its execution by thousands of nodes, and dynamically visualizing the evolution of parameters, the distribution of tasks among the nodes in a 2D representation and the dynamics of the system with a 3D representation. We believe that visualization and experimentation are a necessary first step before any formalization, since we first need to understand the fundamental characteristics of these systems before being able to model them.
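One simple family of fully distributed schedulers is diffusion-based load balancing, in which each node looks only at its direct neighbors. The sketch below is an illustrative member of that family, not one of the algorithms studied with V-Grid, and the function name is hypothetical; for simplicity all nodes act in synchronous rounds on a snapshot of the loads, whereas the setting described above is asynchronous and volatile.

```python
def diffusion_step(load, neighbors):
    """One round of neighbor-only load balancing.

    Each node compares its queue length with its direct neighbors (as seen at
    the start of the round) and sends one task to the least loaded neighbor
    when the gap exceeds one task.
    """
    new_load = list(load)
    for node, nbrs in enumerate(neighbors):
        target = min(nbrs, key=lambda n: load[n])
        if load[node] - load[target] > 1:
            new_load[node] -= 1
            new_load[target] += 1
    return new_load


n = 8
neighbors = [((i - 1) % n, (i + 1) % n) for i in range(n)]   # ring topology
load = [16] + [0] * (n - 1)                                # all tasks on node 0
for _ in range(300):
    load = diffusion_step(load, neighbors)
print(load, sum(load))
```

Each transfer moves a task from a node to a neighbor with at least two fewer tasks, so the sum of squared loads strictly decreases whenever a task moves; the loop therefore settles in a state where no node exceeds its least loaded neighbor by more than one task. That local criterion does not guarantee global balance, which is the kind of large scale effect a simulator helps reveal.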

Extension of MPICH-V

MPICH-V is a research effort combining theoretical studies, experimental evaluations and pragmatic implementations, aiming to provide an MPI implementation based on MPICH and featuring multiple fault tolerant protocols.

There is a long history of research in fault tolerance for distributed systems. We can distinguish the automatic/transparent approach from the manual/user-controlled approach. The first approach relies either on coordinated checkpointing (global snapshot) or on uncoordinated checkpointing associated with message logging. A well known algorithm for coordinated checkpointing has been proposed by Chandy and Lamport. This algorithm requires restarting all processes even if only one process crashes, so it is believed not to scale well. Several strategies have been proposed for message logging: optimistic, pessimistic and causal. Several optimizations have been studied for the three strategies. The general context of our study is high performance computing on large platforms, and one of the most used programming environments for such platforms is MPI.
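The core of pessimistic message logging can be sketched in a few lines. This sketch assumes the sender-based variant: payloads stay in the sender's memory, and each receiver records the reception event (source and sequence number) in stable storage before delivering the message, so that a restarted process, assumed deterministic between receptions, can be fed the same messages in the same order. Class, method and variable names are hypothetical; a shared dictionary stands in for reliable remote storage, and checkpoints, log garbage collection and MPI integration are omitted.

```python
class Process:
    """Sender-based pessimistic message logging (illustrative sketch)."""

    def __init__(self, rank, event_log):
        self.rank = rank
        self.event_log = event_log   # stands in for reliable remote storage
        self.sent_log = {}           # (dest rank, seq) -> payload, kept by the sender
        self.next_seq = {}           # dest rank -> next sequence number
        self.delivered = []          # order in which the application saw messages

    def send(self, dest, payload):
        seq = self.next_seq.get(dest.rank, 0)
        self.next_seq[dest.rank] = seq + 1
        self.sent_log[(dest.rank, seq)] = payload   # sender-based payload logging
        dest.receive(self.rank, seq, payload)

    def receive(self, src, seq, payload):
        # Pessimistic: the reception event (source, sequence number, and its
        # position in the list) is made stable BEFORE delivery to the application.
        self.event_log.setdefault(self.rank, []).append((src, seq))
        self.delivered.append(payload)


def replay(rank, event_log, processes):
    """Messages a restarted process must receive, in the originally logged order."""
    return [processes[src].sent_log[(rank, seq)] for src, seq in event_log.get(rank, [])]


event_log = {}
procs = [Process(r, event_log) for r in range(3)]
procs[1].send(procs[0], "a")
procs[2].send(procs[0], "b")
procs[1].send(procs[0], "c")
# Process 0 crashes and restarts from its initial state (no checkpoint here):
print(replay(0, event_log, procs) == procs[0].delivered)   # True
```

Only the failed process restarts: the others keep running and merely resend logged payloads, which is the key difference from coordinated checkpointing, where every process rolls back.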

Within the MPICH-V project, we have developed and published three original fault tolerant protocols for MPI: MPICH-V1, MPICH-V2 and MPICH-V/CL. The first two protocols rely on uncoordinated checkpointing associated with either remote pessimistic message logging (V1) or sender-based pessimistic message logging (V2). We have demonstrated that MPICH-V2 outperforms MPICH-V1. MPICH-V/CL implements a coordinated checkpoint strategy (Chandy-Lamport), removing the need for message logging. MPICH-V2 and V/CL are competing protocols for large clusters. We have compared them using a new parameter for evaluating the merits of fault tolerant protocols: the impact of the fault frequency on performance. We have demonstrated that the stress on the checkpoint server is the fundamental source of the performance differences between the two techniques. Under the considered experimental conditions, message logging becomes more relevant than coordinated checkpointing when the fault frequency reaches one fault every 4 hours, for a cluster of 100 nodes sharing a single checkpoint server, with a data set of 1 GB on each node and a 100 Mb/s network.
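A back-of-the-envelope calculation gives an order of magnitude for the checkpoint server load in the configuration above. It assumes decimal units (1 GB = 10^9 bytes), that the single server's 100 Mb/s link is the only bottleneck, and no compression or protocol overhead; it is not the measurement reported in the study. Transferring one full checkpoint of all 100 nodes then takes

```latex
T_{\text{ckpt}}
  = \frac{100 \times 10^{9}\,\text{B} \times 8\,\text{bit/B}}{100 \times 10^{6}\,\text{bit/s}}
  = \frac{8 \times 10^{11}\,\text{bit}}{10^{8}\,\text{bit/s}}
  = 8000\,\text{s} \approx 2.2\,\text{h}
```

Under these assumptions, a single coordinated checkpoint occupies the server for more than two hours, the same order of magnitude as the 4-hour fault interval quoted above, which is consistent with the checkpoint server being the main source of the performance differences.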

The next step in our research is to investigate a protocol dedicated to hierarchical desktop Grids (HDG); it would also apply to Grids. In such a context, several MPI executions take place on different clusters, possibly using heterogeneous networks. An automatic fault tolerant MPI for HDG or Grids should tolerate faults inside clusters as well as the crash or disconnection of a full cluster. We are currently considering a hierarchical fault tolerant protocol combined with a specific runtime allowing the migration of full MPI executions across clusters independently of their high performance network hardware.

The performance and volatility tolerance of MPICH-V make it attractive for:

large clusters;

clusters made from collection of nodes in a LAN environment (Desktop Grid);

Grid deployments harnessing several clusters;

and campus- or industry-wide desktop Grids with volatile nodes (i.e. all infrastructures featuring synchronous networks or controllable area networks).

Volatility and Reliability Processing

In a global computing application, users voluntarily lend their machines during the periods when they do not use them. When they want to use their machines again, the machines must be given back immediately: there is no time to save the state of the computation. Because a computer may never become available again, checkpoints must be organized. When the owner takes control of his machine, it must be possible to continue the computation on another computer from a checkpoint as close as possible to the interrupted state. The problems arising from this way of managing computations are numerous and difficult. They fall into two categories: synchronization problems and repartition problems.

Synchronization problems (example). Suppose that the machine that is supposed to continue the computation is fixed and has a recent checkpoint. It would be easy to consider this local checkpoint as a component of a global checkpoint and to simply rerun the computation. However, both the scale of the system and the frequency of disconnections make the use of a global checkpoint totally unrealistic. The checkpoints therefore have to be local, which raises the problem of synchronizing the recovery machine with the rest of the application.

Repartition problems (example). It is also unrealistic to wait for the computer to become available again before rerunning the interrupted application. One therefore has to design a virtual machine organization, where a single virtual machine is implemented by several real ones. With too few real machines per virtual machine, one can produce starvation; with too many, efficiency is not optimal. The right solution most likely lies in a dynamic organization.
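The starvation/efficiency trade-off can be quantified under a deliberately simple assumption: each real machine leaves independently with probability p during a checkpoint interval, and a virtual machine survives as long as at least one of its k real machines remains. The loss probability is then p^k, while only a fraction 1/k of the aggregated computing power does useful work. This static replication model is a simplification, not the dynamic organization called for above, and the function names are hypothetical.

```python
def loss_probability(p_leave, replicas):
    """P(virtual machine lost) when all `replicas` real machines leave independently."""
    return p_leave ** replicas


def smallest_replication(p_leave, target_loss):
    """Fewest real machines per virtual machine keeping the loss at or below target."""
    if not 0 <= p_leave < 1 or target_loss <= 0:
        raise ValueError("need 0 <= p_leave < 1 and target_loss > 0")
    k = 1
    while loss_probability(p_leave, k) > target_loss:
        k += 1
    return k


for k in range(1, 5):
    # useful fraction 1/k: only one replica's work is kept
    print(k, round(loss_probability(0.3, k), 4), round(1 / k, 2))
print(smallest_replication(0.3, 0.01))   # 4
```

With machines leaving 30% of the time, reaching a 1% loss probability already costs four real machines per virtual one (25% efficiency), which illustrates why a fixed replication degree is a poor fit and a dynamic organization is preferable.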

These types of problems are not new. They have been studied in depth, and many algorithmic solutions and implementations are available. What is new here, and what makes these older solutions unusable, is scalability. Any solution involving centralization is impossible to use in practice, and previous work validated on former networks cannot be reused.

Reliability Processing

We deliberately presented the volatility problem in a separate section because of its specificity with respect to both the type and the frequency of failures. In general, however, like any distributed system, a global computing system has to withstand a large set of failures, from crash failures to Byzantine failures related to incorrect software or even malicious actions (unfortunately, this hypothesis has to be considered, as shown by the Decrypthon project and by the use of erroneous clients in the SETI@home project), with transient failures such as message loss or duplication in between. Moreover, failures related to accidental or malicious memory corruption have to be considered because they are directly related to the very nature of the Internet. Traditionally, two approaches (masking and non-masking) have been used to deal with reliability problems. A masking solution hides the failures from the user, while a non-masking one may let the user notice that failures occur. Here again, there is a large literature on the subject, including several surveys. Masking techniques, generally based upon consensus, are not scalable because they systematically use generalized broadcasting. The self-stabilizing approach (a non-masking solution) is well adapted, specifically in its time-adaptive version, for three main reasons:

Low overhead when stabilized. Once the system is stabilized, the overhead of maintaining correctness is low because it only involves communication between neighbors.

Good adaptivity to the reliability level. Except for a system that is continuously under attack, self-stabilization provides very satisfying solutions. The fact that correctness is not necessarily satisfied during the stabilization phase is acceptable for many kinds of applications.

Lack of global administration of the system. A peer-to-peer system does not admit a centralized administrator that would be recognized by all components. Human intervention is thus not feasible, and the system has to recover by itself from the failures of one or several components, which is precisely the defining feature of self-stabilizing systems.
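Dijkstra's K-state token ring, the founding example of self-stabilization, illustrates the three properties listed above; it is a textbook algorithm, not one of the project's protocols. Each machine reads only its left neighbor; started from an arbitrary state (for instance after memory corruption), it converges to exactly one privilege (token) whatever order privileged machines fire in, provided K >= n. The sketch simulates a central daemon that fires one privileged machine at a time; function names are hypothetical.

```python
import random


def privileged(S):
    """Indices of machines holding a privilege (token) in state S."""
    n = len(S)
    holders = [0] if S[0] == S[-1] else []
    holders += [i for i in range(1, n) if S[i] != S[i - 1]]
    return holders


def move(S, i, K):
    """Fire machine i, which must be privileged."""
    if i == 0:
        S[0] = (S[0] + 1) % K   # the distinguished machine increments its counter
    else:
        S[i] = S[i - 1]         # the others copy their left neighbor


def stabilize(S, K, rng, max_steps=10_000):
    """Fire arbitrary privileged machines until a single privilege remains."""
    for step in range(max_steps):
        holders = privileged(S)   # never empty: at least one privilege always exists
        if len(holders) == 1:
            return step
        move(S, rng.choice(holders), K)
    raise RuntimeError("did not stabilize")


rng = random.Random(1)
n, K = 6, 7
state = [rng.randrange(K) for _ in range(n)]   # arbitrary, possibly corrupted state
steps = stabilize(state, K, rng)
print(steps, privileged(state))
```

Once a single token exists, every move passes it to the next machine and the property is preserved (closure), so the maintenance cost is one neighbor read per step, with no global reset and no administrator.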

We propose:

To study the reliability problems arising from a global computing system, and to design self-stabilizing solutions, with a special care for the overhead.

For problems that can be solved despite a continuously unreliable environment (such as information retrieval in a network), to propose solutions that minimize the overhead in space and time resulting from failures when they involve few components of the system.

For the most critical modules, to study the possibility of using consensus-based methods.

To build an adequate model for dealing with the tradeoff between reliability and cost.

Verification of Protocols

Over the past few years, a number of distributed algorithms and protocols published in the best conferences or scholarly journals were later found to be incorrect. Some had been in use for several years, appearing to behave correctly. We do not claim to design and implement fault-free and vulnerability-free systems, but we want at least to limit their failures. This goal is pursued through the formal verification, at an abstract level, of the implemented solutions. Obviously, algorithms are not to be verified by hand (incorrect algorithms have been published with proofs), but rather with verification tools, such as the one we developed (MARELLA), or with proof assistants. We propose that a substantial effort be made towards the modeling and verification of probabilistic protocols, which in many cases offer efficient and low cost solutions. We also propose to design a model that includes the environment. Indeed, the computations of a distributed system are non-deterministic due to the influence of numerous external factors, such as communication delays due to traffic overhead, or the location where failures occur. Proving a protocol independently of its environment is pointless, which is why the environment must be part of the model.
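To illustrate why tool-based verification beats hand proofs even on tiny protocols, the sketch below is a minimal explicit-state model checker: it explores every interleaving of a naive two-process mutual exclusion protocol breadth-first and returns a shortest trace reaching a state where both processes are in the critical section. It is a generic illustration, not MARELLA, and the protocol is a deliberately flawed textbook example; no environment model (delays, failures) is included. Function names are hypothetical.

```python
from collections import deque


def successors(state):
    """Interleaving semantics of a naive two-process flag protocol.

    pc 0: wait until the other flag is down; pc 1: raise own flag;
    pc 2: critical section; pc 3: lower own flag and loop back.
    """
    pcs, flags = state
    for i in (0, 1):
        pc, fl = list(pcs), list(flags)
        if pcs[i] == 0:
            if flags[1 - i]:
                continue          # process i is blocked while the other flag is up
            pc[i] = 1
        elif pcs[i] == 1:
            fl[i] = True
            pc[i] = 2
        elif pcs[i] == 2:
            pc[i] = 3
        else:
            fl[i] = False
            pc[i] = 0
        yield (tuple(pc), tuple(fl))


def find_violation(initial, bad):
    """Breadth-first search of all interleavings; shortest trace to a bad state."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if bad(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None                   # the property holds in every reachable state


trace = find_violation(((0, 0), (False, False)), lambda s: s[0] == (2, 2))
for s in trace:
    print(s)
```

The counterexample shows both processes passing the check before either raises its flag, an interleaving that is easy to overlook in a hand proof and found mechanically here.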

Parallel Programming on Peer-to-Peer Platforms (P5)

Scientific applications that have traditionally run on supercomputers may now run on a variety of geographically distributed heterogeneous resources. New grand challenge applications will have to be solved on large scale P2P systems. The Peer-to-Peer computing paradigm for large scale scientific and engineering applications is emerging as a new potential solution for end-user scientists and engineers. We have to experiment with and evaluate such programming in order to offer the end-user the greatest possible virtualisation of the underlying complexity.

Large Scale Computational Sciences and Engineering

Developing parallel and distributed scientific applications and managing resources in these environments is a new and complex undertaking. In scientific computation, the validity of calculations, the numerical stability, and the choice of methods and software depend on the properties of each peer and of its software and hardware environment, which are known only at run time and are nondeterministic. Research towards acceptable frameworks, methodologies, languages and tools allowing end-users to solve their applications accurately in this context is crucial for the future of this programming paradigm.

GRID scientific and engineering computing has existed for a decade. Over the last few years, problem sizes and the global complexity of applications have increased rapidly. The scientific simulation approach is now general in many scientific domains, in addition to theoretical and experimental aspects, often linked to more classic methods. Several applications could be computed on world-spread networks of heterogeneous computers using web-based Application Server Providers (ASP) dedicated to targeted scientific domains. New, very strategic domains, such as nanotechnologies, are at the forefront of these applications. Development in this very important domain, and leadership in many scientific domains, will depend in the near future on the ability to run very large scale simulations on adequate systems. P2P scientific programming is a potential solution, based on existing computers and networks. Current scientific applications on such systems only concern problems that are mainly data-independent, i.e. each peer does not communicate with the others. To come of age, P2P programming has to support parallel programs with more sophisticated dependencies between peers. This is the goal of our research.

Experimentations and Evaluations

We first have to experiment on large P2P platforms to obtain a realistic evaluation of the performance we can expect. We can also set some hypotheses on peers, networks and scheduling to obtain theoretical evaluations of the potential performance. We follow both tracks. We chose a classical linear algebra method well adapted to large granularity parallelism and asynchronous scheduling: the block Gauss-Jordan method for inverting very large dense matrices. We also chose the calculation of a matrix polynomial, which generates computation schemes similar to many iterative linear algebra methods well adapted to very large sparse matrices. Thus, we were able to theoretically evaluate the potential throughput with respect to several parameters such as the matrix size and the multicast network speed. Following these evaluations, we began experimenting with the same parallel methods on an XtremWeb P2P platform of a few dozen peers. We plan to continue these experiments on larger platforms to compare the results with the theoretical ones. We would then be able to extrapolate and obtain the potential performance of some scientific applications. Experiments and evaluations of several linear algebra methods for large matrices on P2P systems will continue throughout the Grand-Large project, in order to confront the different results with the reality of existing platforms. As a challenge, we would like to efficiently invert a dense matrix of order one million on a platform of several thousand peers.
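The block Gauss-Jordan method exposes its parallelism at each step k: once the pivot block is inverted and the pivot row of blocks is updated, the (q-1)^2 updates of the remaining blocks are independent and can be shipped to different peers. The sequential sketch below shows that structure in plain Python. It assumes the matrix order n is a multiple of the block size b and that no pivoting is needed (true, for example, for strictly diagonally dominant matrices). Function names are hypothetical, and distribution, communication and fault handling are not shown.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def gauss_jordan_inverse(M):
    """Scalar in-place Gauss-Jordan inversion without pivoting (used on small blocks)."""
    A = [row[:] for row in M]
    n = len(A)
    for k in range(n):
        p = 1.0 / A[k][k]
        for j in range(n):
            if j != k:
                A[k][j] *= p                           # scale the pivot row
        for i in range(n):
            if i != k:
                for j in range(n):
                    if j != k:
                        A[i][j] -= A[i][k] * A[k][j]   # eliminate
        for i in range(n):
            if i != k:
                A[i][k] = -A[i][k] * p                 # update the pivot column
        A[k][k] = p
    return A


def block_gauss_jordan_inverse(M, b):
    """Same sweep on a q x q grid of b x b blocks."""
    n = len(M)
    q = n // b
    B = [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]] for j in range(q)]
         for i in range(q)]
    for k in range(q):
        P = gauss_jordan_inverse(B[k][k])              # task: invert the pivot block
        for j in range(q):
            if j != k:
                B[k][j] = matmul(P, B[k][j])           # tasks: pivot row blocks
        for i in range(q):
            if i != k:
                for j in range(q):
                    if j != k:                         # (q-1)^2 independent tasks
                        B[i][j] = matsub(B[i][j], matmul(B[i][k], B[k][j]))
        for i in range(q):
            if i != k:
                B[i][k] = [[-x for x in row] for row in matmul(B[i][k], P)]
        B[k][k] = P
    return [[B[i // b][j // b][i % b][j % b] for j in range(n)] for i in range(n)]


A = [[10.0, 1.0, 2.0, 0.0],
     [1.0, 12.0, 0.0, 3.0],
     [2.0, 0.0, 9.0, 1.0],
     [0.0, 3.0, 1.0, 11.0]]   # strictly diagonally dominant
inv = block_gauss_jordan_inverse(A, b=2)
```

Larger blocks mean fewer, coarser tasks per step, which matches the large granularity that suits volatile peers and slow wide-area links.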

Beyond experiments and evaluations, we propose the basis of a methodology to efficiently program such platforms, allowing us to define languages, tools and interfaces for the end-user.

Languages, Tools and Interface

The underlying complexity of large scale P2P programming has to be largely virtualized for the end-user. We have to propose an interface between the end-user and the middleware that can either extract the end-user's expertise or propose an off-the-shelf general solution. Targeted applications concern very large scientific problems, which have to be developed using component technologies and up-to-date software technologies.

We may develop a component-based interface expressing the dependencies between the computing tasks that compose a parallel application. Then, instead of computing tasks, we manage components. We introduced the YML language, which allows us to express the dependencies between components, specified using XML. Nevertheless, many component criteria depend on peer characteristics and are known only at runtime. We therefore had to introduce different classes of components, depending on the level of abstraction they concern. One component catalogue has to be at the end-user level and another one at the middleware and peer level. A scheduler then has to assign a computing component to a peer with respect to the software offered by that peer, or decide to load new software onto the targeted peer.
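The two scheduler duties described above can be sketched on a plain Python dependency map standing in for a YML graph; the sketch does not use YML or its XML syntax. Components are grouped into waves that may run concurrently once their inputs are available, and each component is placed on a peer that already offers the software it needs, otherwise the software is loaded onto a peer. Component, software and peer names, as well as the first-peer fallback rule, are hypothetical.

```python
def schedule_waves(deps):
    """Group components into waves; each wave only depends on earlier waves."""
    remaining = {c: set(d) for c, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(ready)
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


def place(component, needs, peers):
    """Return (peer, software_loaded) for one component."""
    for peer in sorted(peers):
        if needs[component] in peers[peer]:
            return peer, False                 # the peer already offers the software
    peer = sorted(peers)[0]                    # hypothetical fallback: first peer
    peers[peer].add(needs[component])          # load the software onto it
    return peer, True


deps = {
    "load_matrix": set(),
    "invert_pivot": {"load_matrix"},
    "update_row": {"invert_pivot"},
    "update_col": {"invert_pivot"},
    "update_rest": {"update_row"},
}
print(schedule_waves(deps))
# [['load_matrix'], ['invert_pivot'], ['update_col', 'update_row'], ['update_rest']]

peers = {"peer1": {"blas"}, "peer2": {"lapack"}}
needs = {"invert_pivot": "lapack", "update_rest": "sparse-kit"}
print(place("invert_pivot", needs, peers))    # ('peer2', False)
print(place("update_rest", needs, peers))     # ('peer1', True)
```

A real placement policy would weigh the cost of loading software against waiting for a peer that already has it; the fallback rule here only marks where that decision is taken.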

The YML framework and language propose a solution for developing scientific applications for P2P platforms. An end-user can directly develop programs using this framework. Nevertheless, many end-users would prefer not to program at the level of components and dependency graphs. An interface therefore has to be proposed on top of the YML framework. This interface may be dedicated to a particular scientific domain, so as to focus on the end-user's vocabulary and P2P programming knowledge.

Based on the SPIN project, we plan to develop such an interface using the YML framework and language. The first targeted scientific domain will be very large linear algebra for dense or sparse matrices.

Methodology and Technologies for Large Scale Distributed Systems

Research in the context of LSDS involves understanding large scale phenomena from the theoretical point of view up to the experimental one under real life conditions. The general research context should also consider the fundamental technological trend toward a convergence between Grid and P2P systems.

Methodology

One key aspect of the impact of large scale on LSDS is the emergence of phenomena which are not coordinated, intended or expected. These phenomena result from the combination of static and dynamic features of each component of LSDS: nodes (hardware, OS, workload, volatility), network (topology, congestion, fault), applications (algorithm, parameters, errors), users (behavior, number, friendly/aggressive).

Grand-Large aims at gathering several complementary techniques to study the impact of large scale in LSDS: theoretical models, simulation, emulation and experimentation on real platforms. Grand-Large already works on fundamental aspects of LSDS as well as on the development of middleware platforms. We are also involved in the development and deployment of simulators, emulators and real platforms (testbeds).

We are currently developing a simulator of LSDS called V-Grid, aimed at discovering, understanding and managing implicit uncoordinated large scale phenomena. Several Grid simulators have been developed by other teams: SimGrid, GridSim and Bricks. All these simulators consider relatively small scale Grids; they have not been designed to scale to and simulate 10K to 100K nodes. Other simulators, such as Swarm, have been designed for large multi-agent systems, but many of them consider synchronous systems where the evolution of the system is driven by phases. V-Grid is built on Swarm and adds asynchrony to the simulator, node volatility and a set of specialized features for controlling and measuring the simulation of LSDS. To exemplify the need for such a simulator, we are first considering the fully distributed scheduling problem. Using V-Grid to compare several algorithms, we have already demonstrated the need for complementary visualization tools: tools showing the evolution of key system parameters, tools presenting the distributed system topology and the global trends of nodes and network in two dimensions, and tools presenting the dynamics of system component activity in three dimensions. Using this last representation, we have discovered unexpected large scale phenomena which would be very difficult to predict through a theoretical analysis of the simulated platform features and the scheduling algorithms.
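To make the notions of asynchrony and node volatility concrete, here is a minimal event-driven sketch in Python. It is not V-Grid (which is built on Swarm); the exponential up/down durations and the function name `simulate_volatility` are assumptions made for illustration. Events are popped from a priority queue in timestamp order, so each node changes state independently rather than in global synchronous phases.

```python
import heapq
import random


def simulate_volatility(num_nodes, horizon, mean_up, mean_down, seed=0):
    """Asynchronous (event-driven) simulation of volatile nodes.

    Each node alternates between 'up' and 'down' periods whose lengths are
    drawn from exponential distributions (a modelling assumption). Events are
    processed in timestamp order from a heap; there is no global phase.
    Returns the fraction of total node-time during which nodes were up.
    """
    rng = random.Random(seed)
    events = []  # heap of (time, node, new_state)
    for node in range(num_nodes):
        # every node starts up and schedules its first failure
        heapq.heappush(events, (rng.expovariate(1.0 / mean_up), node, "down"))
    state = {node: "up" for node in range(num_nodes)}
    last_change = {node: 0.0 for node in range(num_nodes)}
    up_time = 0.0
    while events:
        time, node, new_state = heapq.heappop(events)
        if time > horizon:
            break  # all remaining events lie beyond the horizon
        if state[node] == "up":
            up_time += time - last_change[node]
        state[node] = new_state
        last_change[node] = time
        # schedule the opposite transition for this node only
        mean = mean_down if new_state == "down" else mean_up
        next_state = "up" if new_state == "down" else "down"
        heapq.heappush(events, (time + rng.expovariate(1.0 / mean), node, next_state))
    # nodes still up at the horizon contribute their final interval
    for node in range(num_nodes):
        if state[node] == "up":
            up_time += horizon - last_change[node]
    return up_time / (num_nodes * horizon)
```

With equal mean up and down times, the measured availability should come out close to 0.5, for example with `simulate_volatility(100, 1000.0, 10.0, 10.0, seed=1)`.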

Emulation is another tool for experimenting with systems and networks, with a higher degree of realism. Compared to simulation, emulation can be used to study systems or networks 1 or 2 orders of magnitude smaller in terms of number of components. However, emulation runs the actual OS/middleware/applications on an actual platform. Compared to a real testbed, emulation conducts experiments on a fully controlled platform where all static and dynamic parameters can be controlled and managed precisely. Another advantage of emulation over a real testbed is the capacity to reproduce experimental conditions: several implementations/configurations of the system components can be compared fairly by evaluating them under the same static and dynamic conditions. Grand-Large is leading one of the largest emulator projects in Europe, called Grid eXplorer. This project uses a 1K CPU cluster as hardware platform and gathers 24 experiments involving 80 researchers from 13 different laboratories. Experiments concern developing the emulator itself and using the emulator to explore LSDS issues. ( http://www.lri.fr/~fci/GdX/).

Grand-Large members are also involved in the French Grid 5000 project, which intends to deploy an experimental Grid testbed for computer scientists. This testbed may feature up to 5000 CPUs gathering the resources of about 10 clusters geographically distributed over France. The clusters will be connected by a high speed network (Renater and/or other). Grand-Large is a leading team in Grid 5000, chairing the eGrid 5000 Specific Action of the CNRS, which is intended to prepare the deployment and installation of Grid 5000. eGrid 5000 gathers about 30 engineers, researchers and team directors who meet frequently to discuss the testbed security infrastructure, experiment setup, cluster coordination, experimental result storage, etc. ( http://www.lri.fr/~fci/AS1/).

Technological Trends

The development of LSDS has followed a trajectory parallel to that of Grid systems such as Globus and Unicore. Nevertheless, we can observe some elements of convergence between LSDS and Grids. The paper gives many details about the similarities and differences between P2P and Grid systems. From the technological perspective, the evolution of Globus to GT3 with the notion of Grid services is one reason for this convergence. The evolution of LSDS toward more generic and secure systems able to provide CPU, storage and communication sharing among participants is another element of this convergence, since the notion of controllable services is likely to emerge from this drive toward more generality and flexibility.

Nowadays, Grid Computing is considering the notion of services through OGSA and OGSI. A Grid service is an entity that must be self-describing, dynamically published, creatable and destructible, remotely invoked and manageable (including its lifetime). The standardization effort also includes the use of well defined Web Services standards (WSDL, SOAP, UDDI...). A typical LSDS platform, gathering client nodes that submit requests to a coordination service which schedules them on a set of participating nodes, can be implemented in terms of services: the coordination service publishes application services and schedules their instantiations on workers; the client service requests the execution of tasks (an application associated with its parameters) corresponding to published application services and collects the results from the coordination service; the worker service computes tasks and sends their results back to the coordination service. Note that the coordination service can itself rely on sub-services, such as a scheduler, a data server for parameters and results, and a service repository/factory, which may be implemented in a centralized or distributed way.

Thus we believe that LSDS could benefit from the standardization effort conducted in the Grid context by reusing the same concepts of services and by adopting the same standards (OGSA and OGSI). For example, the next version of XtremWeb will be implemented as a set of Grid services.
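The client/coordinator/worker decomposition described above can be sketched as three in-process Python classes. This is a hypothetical illustration, not XtremWeb's API: XtremWeb is written in Java and its services communicate over the network, and the class and method names used here (`publish`, `submit`, `request_work`, `report`) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Task:
    """A task associates a published application with its parameters."""
    task_id: int
    app_name: str
    params: tuple


class CoordinationService:
    """Publishes application services, queues tasks and stores results."""

    def __init__(self) -> None:
        self.applications: Dict[str, Callable] = {}
        self.pending: List[Task] = []
        self.results: Dict[int, object] = {}
        self._next_id = 0

    def publish(self, name: str, func: Callable) -> None:
        self.applications[name] = func

    def submit(self, app_name: str, params: tuple) -> int:
        if app_name not in self.applications:
            raise KeyError(f"application {app_name!r} is not published")
        task = Task(self._next_id, app_name, tuple(params))
        self._next_id += 1
        self.pending.append(task)
        return task.task_id

    def request_work(self) -> Optional[Tuple[Task, Callable]]:
        # Called by a worker: hand out the oldest pending task and its code.
        if not self.pending:
            return None
        task = self.pending.pop(0)
        return task, self.applications[task.app_name]

    def report(self, task_id: int, result: object) -> None:
        self.results[task_id] = result


class ClientService:
    """Submits tasks for published applications and collects results."""

    def __init__(self, coordinator: CoordinationService) -> None:
        self.coordinator = coordinator

    def run(self, app_name: str, *params) -> int:
        return self.coordinator.submit(app_name, params)

    def collect(self, task_id: int) -> Optional[object]:
        return self.coordinator.results.get(task_id)


class WorkerService:
    """Pulls tasks from the coordinator, computes them, reports results."""

    def __init__(self, coordinator: CoordinationService) -> None:
        self.coordinator = coordinator

    def step(self) -> bool:
        work = self.coordinator.request_work()
        if work is None:
            return False
        task, func = work
        self.coordinator.report(task.task_id, func(*task.params))
        return True
```

In this sketch workers pull work from the coordinator, which is a common choice on Desktop Grids because volunteer machines are often behind firewalls; whether that matches XtremWeb's exact design is not stated here.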

Application Domains

Building a Large Scale Distributed System for Computing

The main application domain of the Large Scale Distributed System developed in Grand-Large is high performance computing. The two main programming models associated with our platform (RPC and MPI) allow programming a large variety of distributed/parallel algorithms following computational paradigms like bag of tasks, parameter sweep, workflow, dataflow, master worker and recursive exploration with RPC, and SPMD with MPI. The RPC programming model can be used to execute concurrently different application codes, the same application code with different parameters, and library function codes. In all these cases, there is no need to change the code; it must only be compiled for the target execution environment. LSDS are particularly useful for users with large computational needs. They could typically be used in the Research and Development departments of the Pharmacology, Aerospace, Automotive, Electronics, Petroleum, Energy and Meteorology industries. LSDS can also be used for purposes other than CPU intensive applications: other resources of the connected PCs, such as their memory, disk space and networking capacities, can be used. A Large Scale Distributed System like XtremWeb can typically be used to harness and coordinate the usage of these resources. In that case, XtremWeb deploys on Workers services dedicated to providing and managing disk space and the network connection. The storage service can be used for large scale distributed fault tolerant storage and for distributed storage of very large files. The networking service can be used for server tests in real life conditions (workers deployed on the Internet are coordinated to stress a web server) and for networking infrastructure tests in real life conditions (workers of known characteristics are coordinated to stress the network infrastructure between them).
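As a rough illustration of the parameter sweep paradigm, in which the same unchanged code runs with different parameters, the sketch below uses a local thread pool as a stand-in for remote Workers. The function `simulate` and the parameter grid are hypothetical; in an LSDS each call would be a separate task shipped to a Worker.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(alpha: float, beta: float) -> float:
    """Stand-in for an unchanged application code (hypothetical)."""
    return alpha * beta


def parameter_sweep(func, grid: dict, max_workers: int = 4) -> dict:
    """Run `func` once per parameter combination, as independent tasks.

    `grid` maps each parameter name to the list of values to explore;
    the result maps each combination (a tuple of values) to its output.
    """
    names = list(grid)
    combos = list(product(*(grid[name] for name in names)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **dict(zip(names, combo))) for combo in combos]
        return {combo: future.result() for combo, future in zip(combos, futures)}
```

For example, `parameter_sweep(simulate, {"alpha": [1, 2], "beta": [10, 20]})` runs four independent tasks without any change to `simulate`.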

Security and Reliability of Network Control Protocols

The main application domain for self-stabilizing and secure algorithms is LSDS, where correct behaviour must be recovered within finite time. Typically, in an LSDS (such as a high performance computing system), a protocol is used to control the system, submit requests, retrieve results, and ensure that computation is carried out according to its specification. Yet, since the scale of the system is large, nodes are likely to fail while the application is executing. While the nodes that actually perform the computation can fail unpredictably, a self-stabilizing and secure control protocol ensures that a user submitting a request will obtain the corresponding result within (presumably small) finite time. Examples of LSDS where self-stabilizing and secure algorithms are used include global computing platforms and peer to peer file sharing systems. Another application domain is routing protocols, which are used to carry information between nodes that are not directly connected. Routing should be understood here in its most general sense, e.g. at the network level (Internet routing) or at the application level (on virtual topologies built on top of regular topologies in peer to peer systems). Since the topology (actual or virtual) evolves quickly over time, self-stabilization ensures that the routing protocol eventually provides accurate information. However, for the protocol to be useful, it must provide extra guarantees either on the stabilization time (to recover quickly from failures) or on the routing time of messages sent when many faults occur. Finally, additional applications can be found in distributed systems composed of many autonomous agents that can communicate only with a limited set of nodes (due to geographical or power consumption constraints), and whose environment evolves rapidly.
Examples of such systems are wireless sensor networks (which typically comprise 10,000+ nodes), mobile autonomous robots, etc. Centralized control is completely unrealistic for such networks because they are intrinsically distributed; still, strong coordination is required to make efficient use of resources (bandwidth, battery, etc.).
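A classic example of self-stabilization relevant to routing is computing hop distances to a root node (a BFS tree) from an arbitrary, possibly corrupted, initial state. The Python sketch below simulates synchronous rounds of the usual local rule. It is a textbook illustration rather than a Grand-Large protocol, and it assumes that initial values are integers in {0, ..., n} and that distances are capped at the number of nodes n, so corrupted values cannot grow without bound.

```python
def stabilize_bfs(adjacency, root, distances, max_rounds=1000):
    """Self-stabilizing BFS distances, simulated in synchronous rounds.

    adjacency: node -> list of neighbours (undirected graph)
    distances: arbitrary initial values in {0, ..., n}, e.g. after corruption
    Every node applies a rule that reads only its neighbours' values:
    the root sets 0, any other node sets 1 + the minimum over its
    neighbours, capped at n. Returns (final distances, rounds until no
    value changes).
    """
    n = len(adjacency)
    d = dict(distances)
    for rounds in range(max_rounds):
        new_d = {}
        for v, neighbours in adjacency.items():
            if v == root:
                new_d[v] = 0
            else:
                # nodes with no path to the root settle at the cap n
                new_d[v] = min(n, min((d[u] + 1 for u in neighbours), default=n))
        if new_d == d:
            return d, rounds
        d = new_d
    raise RuntimeError("no fixed point reached within max_rounds")
```

On the path 0-1-2-3 with root 0 and corrupted values `{0: 4, 1: 0, 2: 0, 3: 0}`, the rule converges to the correct distances `{0: 0, 1: 1, 2: 2, 3: 3}` without any global reset.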

End-User Tools for Computational Science and Engineering

Another Grand-Large application domain is Large Scale Programming for Computational Science and Engineering. Two main approaches are proposed. First, we have to experiment with and evaluate such programming. Second, we have to develop tools for end-users.

In addition to classical supercomputing and Grid computing based on virtual organizations, the large scale P2P approach offers new computing facilities to computational scientists and engineers. On one hand, there are many applications: some are classical, such as Computational Fluid Dynamics or Quantum Physics, while others are new and very strategic, such as Nanotechnologies, which will need a lot of computing power for long periods of time in the near future. On the other hand, a new large scale programming paradigm is emerging for existing computers; it can be made accessible to scientific and engineering end-users in all classical application domains, but also to new ones, such as some Non-Governmental Organisations. Initially, many applications will be based on large simulations rather than on classical implicit numerical methods, which are more difficult to adapt to such large problems and to this new programming paradigm. Nevertheless, we expect more complex implicit methods to be adapted to this kind of programming in the future. The potential number of peers and the planned evolution of network communications, especially multicast, could help solve some of the largest grand challenge scientific applications.

Simulations and large implicit methods will always need to compute linear algebra routines, which will be our first targeted numerical methods (we also note that the most powerful computing facilities worldwide are still ranked using a linear algebra benchmark [ http://www.top500.org]). We will first focus on divide-and-conquer and block-based matrix methods to solve dense problems, and on hybrid iterative methods to solve sparse matrix problems. As these methods are used in many applications, the results can be extrapolated to different scientific domains.
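To show why block-based methods suit a P2P setting, the following sketch splits a matrix-vector product y = Ax into independent block tasks, each of which could be sent to a different peer, and then accumulates the partial results. The function names and the square-matrix assumption are illustrative; this is not the decomposition used in the experiments.

```python
def split_blocks(n, block_size):
    """Index ranges [start, end) covering 0..n-1 in chunks of block_size."""
    return [(start, min(start + block_size, n)) for start in range(0, n, block_size)]


def block_task(A, x, rows, cols):
    """One independent task: the product of block A[rows, cols] with x[cols]."""
    r0, r1 = rows
    c0, c1 = cols
    return [sum(A[i][j] * x[j] for j in range(c0, c1)) for i in range(r0, r1)]


def block_matvec(A, x, block_size):
    """y = A x for a square matrix A, computed block by block."""
    n = len(A)
    blocks = split_blocks(n, block_size)
    y = [0.0] * n
    for rows in blocks:
        for cols in blocks:
            # Each (row block, column block) pair could run on a different
            # peer; here the tasks run sequentially and are summed locally.
            partial = block_task(A, x, rows, cols)
            for offset, value in enumerate(partial):
                y[rows[0] + offset] += value
    return y
```

Each task needs only one block of A and one slice of x, which is what makes data persistence on peers attractive: a peer that keeps its block can be reused across several products.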

Many smart tools have to be developed to help end-users program such environments, using up-to-date component technologies and languages. At the present stage of maturity of this programming paradigm for scientific applications, the main goal is to experiment on large platforms, to evaluate and extrapolate performance, and to propose tools for end-users, with respect to many parameters and under specific hypotheses concerning scheduling strategies and multicast speeds. We must always keep the end-user at the center of this scientific programming. We therefore have to propose a framework for programming P2P architectures that completely virtualizes the P2P middleware and the heterogeneous hardware. Our approach is based, on one hand, on component programming and coordination languages and, on the other hand, on the development of an ASP, which may be dedicated to a targeted scientific domain. The outcome should be a P2P scientific programming methodology based on experimentation and evaluation on an actual P2P development environment.

Software

XtremWeb

XtremWeb is an open source middleware that generalizes global computing platforms to a multi-user and multi-parallel programming context. XtremWeb relies on the notion of services to deploy a Desktop Grid based on a 3-tier architecture. This architecture gathers three main services: Clients, Coordinators and Workers. Clients submit requests to the Coordinator, which uses the Worker resources to execute the corresponding tasks. Currently, tasks concern computation, but we are also considering the integration of storage and communication capabilities. Coordinator sub-services provide resource discovery, service construction, service instantiation and a data repository for parameters and results. A major concern is fault tolerance: XtremWeb relies on passive replication and message logging to tolerate Client mobility, transient and definitive Coordinator crashes, and Worker volatility.
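One common way for a coordinator to tolerate Worker volatility is to track the last sign of life of each running task and put the task back in the queue when its Worker stays silent for too long. The report does not detail XtremWeb's mechanism beyond passive replication and message logging, so the timeout rule and the `TaskTracker` class below are assumptions made for illustration. Time is passed explicitly (`now`) to keep the sketch deterministic.

```python
class TaskTracker:
    """Coordinator-side bookkeeping that re-queues tasks of silent Workers.

    Hypothetical illustration: the heartbeat/timeout rule is an assumption,
    not XtremWeb's documented mechanism.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = []   # task ids waiting for a Worker
        self.running = {}   # task id -> (worker id, time of last heartbeat)
        self.done = {}      # task id -> result

    def submit(self, task_id):
        self.pending.append(task_id)

    def assign(self, worker_id, now):
        """Give the oldest pending task to `worker_id`, or None if idle."""
        if not self.pending:
            return None
        task_id = self.pending.pop(0)
        self.running[task_id] = (worker_id, now)
        return task_id

    def heartbeat(self, worker_id, now):
        """Refresh the liveness timestamp of every task held by `worker_id`."""
        for task_id, (owner, _) in list(self.running.items()):
            if owner == worker_id:
                self.running[task_id] = (owner, now)

    def complete(self, task_id, result):
        if task_id in self.done:
            return  # late duplicate from a Worker whose task was re-assigned
        self.running.pop(task_id, None)
        if task_id in self.pending:
            self.pending.remove(task_id)  # result arrived before re-execution
        self.done[task_id] = result

    def reap(self, now):
        """Re-queue tasks whose Worker has been silent for more than `timeout`."""
        expired = [t for t, (_, seen) in self.running.items()
                   if now - seen > self.timeout]
        for task_id in expired:
            del self.running[task_id]
            self.pending.append(task_id)
        return expired
```

Because a re-queued task may still be finished by the original Worker, the sketch keeps only the first result it receives and ignores later duplicates.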

The Client service provides a Java API which unifies the interactions between applications and the Coordinator. Three client applications are available: the Java API, which can be used in any Java application; a command line (shell-like) interface; and a web interface allowing users to easily submit requests, consult the status of their tasks and retrieve results. Security is a second major issue. Threats originate from the applications, the infrastructure, the data (parameters and results) and the participating nodes. Currently, XtremWeb provides user authentication, application sandboxing and communication encryption. We have developed deployment tools for harnessing individual PCs, PCs in University or Industry laboratories and PCs in clusters. XtremWeb provides an RPC interface for bag of tasks, parameter sweep, master worker and workflow applications. Combined with MPICH-V, XtremWeb allows the execution of unchanged MPI applications on Desktop Grids.

XtremWeb has been tested extensively, harnessing a thousand Workers and computing a million tasks. XtremWeb is deployed at several sites: University of Lille, University of Geneva, University of Tsukuba, University of Paris Sud and University of California San Diego. At Paris Sud, XtremWeb is the Grid engine of the University Desktop Grid, which gathers about 500 PCs. Two multi-parametric applications are to be used in production starting at the beginning of 2004: Aires, belonging to the HEP Auger project, and a protein conformation predictor using a molecular dynamics simulator.

The software, papers and presentations are available at http://www.xtremweb.net.

MPICH-V

Currently, MPICH-V offers 6 protocols: MPICH-V1, MPICH-V2, MPICH-V/CL, and 3 algorithms for MPICH-Vcausal. MPICH-V1 implements an original fault tolerant protocol specifically developed for Desktop Grids, relying on uncoordinated checkpointing and remote pessimistic message logging. It uses reliable nodes called Channel Memories to store all in-transit messages. MPICH-V2 is designed for homogeneous networks like clusters, where the number of reliable components assumed by MPICH-V1 is too high. It reduces the fault tolerance overhead and increases the tolerance to node volatility. This is achieved by a new protocol that splits message logging into message payload logging and event logging. These two elements are stored separately: the message payload on the sender node and the message events on a reliable event logger. The third protocol, MPICH-V/CL, is derived from the Chandy-Lamport global snapshot algorithm. It implements coordinated checkpointing without message logging. This protocol exhibits less overhead than MPICH-V2 for clusters with low fault frequencies. MPICH-Vcausal completes the set of message logging protocols by implementing causal logging. It requires less synchrony than the pessimistic logging protocols, allowing messages to influence the system before the sender can be sure that non-deterministic events are logged, at the cost of appending some information to every communication. The amount of appended information may grow over time, and different causal protocols, with different techniques for cutting it, have been studied within the MPICH-V project.
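The payload/event split used by MPICH-V2 can be illustrated with a toy model: senders keep copies of the payloads they send, a reliable event logger records the order in which each receiver delivered messages, and a restarted process rebuilds its delivery order by combining the two. This Python sketch rests on simplifying assumptions (logging is synchronous and always succeeds; there are no checkpoints, so replay starts from the beginning), the class names are hypothetical, and it is not MPICH-V code.

```python
from collections import defaultdict


class EventLogger:
    """Reliable component recording reception events (assumed never to fail)."""

    def __init__(self):
        # receiver rank -> ordered list of (sender rank, sender sequence number)
        self.events = defaultdict(list)

    def log(self, receiver, sender, send_seq):
        self.events[receiver].append((sender, send_seq))


class Process:
    def __init__(self, rank, logger):
        self.rank = rank
        self.logger = logger
        self.send_seq = 0
        self.payload_log = {}  # (dest rank, send_seq) -> payload, kept on the sender
        self.delivered = []

    def send(self, dest, payload):
        # Sender-based payload logging: keep a copy before transmitting.
        self.send_seq += 1
        self.payload_log[(dest.rank, self.send_seq)] = payload
        dest.receive(self, self.send_seq, payload)

    def receive(self, sender, send_seq, payload):
        # Pessimistic rule: the reception event is logged (and, in this
        # synchronous sketch, implicitly acknowledged) before delivery.
        self.logger.log(self.rank, sender.rank, send_seq)
        self.delivered.append(payload)


def replay(crashed_rank, logger, processes):
    """Rebuild the delivery order of a restarted process from both logs.

    `processes` is indexed by rank.
    """
    return [processes[sender].payload_log[(crashed_rank, seq)]
            for sender, seq in logger.events[crashed_rank]]
```

Only small (sender, sequence number) pairs go to the reliable logger, while the larger payloads stay on the senders, which is the intuition behind needing fewer reliable components than MPICH-V1.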

MPICH-V3 will target Grids. It will rely on a new protocol mixing causal message logging and pessimistic remote logging of message events. This hierarchical protocol will be able to tolerate faults inside Grid sites (inside clusters) and faults of whole sites (the complete crash of clusters).

Another effort focuses on the performance of MPICH-V on high-bandwidth networks. This requires zero-copy implementations and raises new problems with respect to the algorithms and their realization. The goal is to provide fault tolerance without sacrificing high performance.

In addition to fault tolerant properties, MPICH-V:

provides a full runtime environment detecting and re-launching MPI processes in case of faults;

works on high performance networks such as Myrinet, Infiniband, etc. (performance is, however, still reduced by a factor of two);

allows the migration of a full MPI execution from one cluster to another, even if they are using different high performance networks.

The software, papers and presentations are available at http://www.mpich-v.net/

YML

P2P platforms are highly complex, and an end-user cannot manage such complexity manually. YML is a software package that makes it possible to use large scale platforms such as computing grids and Peer-to-Peer systems. It offers a set of integrated tools to describe and execute applications on such architectures.

YML is based on a language created specifically for this project, which clearly separates computations from communications. This language is designed to make it possible to program applications independently of the middleware used, since each grid or Peer-to-Peer middleware relies on its own communication, interaction and remote execution mechanisms. YML currently supports two middlewares: the XtremWeb P2P platform and the OmniRPC framework.
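The separation between computations (components) and communications (dependencies) can be pictured as a dependency graph executed in waves of ready components. The Python sketch below uses the standard library `graphlib.TopologicalSorter` (Python 3.9+). It only mimics the idea: it is not YML syntax (YML components are described in XML), and `run_workflow` and its arguments are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(components, dependencies):
    """Execute components in dependency order, wave by wave.

    components: name -> callable receiving its predecessors' results,
                ordered by predecessor name
    dependencies: name -> set of predecessor names
    Returns (results, waves); all components in one wave are independent
    and could be dispatched in parallel to the chosen middleware.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results, waves = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        for name in ready:
            inputs = [results[p] for p in sorted(dependencies.get(name, ()))]
            results[name] = components[name](*inputs)
            sorter.done(name)
    return results, waves
```

The graph says nothing about how data moves between peers; that part is left to whichever middleware executes each wave, which mirrors the independence from the middleware described above.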

YML defines a component model in order to achieve independence from the underlying middleware. Components are organized in two levels of catalogues. The first catalogue lists the information that is independent of the middleware, and the second family of catalogues contains information specific to a given middleware. These are known respectively as the development and execution catalogues.

YML is mainly based on two programs: a compiler for the dedicated programming language and a real time scheduler. The former handles the development step and does not depend on the underlying middleware, whereas the latter exclusively handles the execution step. Both are managed by a third program, called the manager, which is in charge of client connections.

To illustrate our approach, we conducted first experiments with basic linear algebra routines on an XtremWeb P2P platform with a small number of peers. We evaluated performance and discussed the need for a new, accurate performance model for this new computing paradigm.

The YML project was launched at the ASCI CNRS lab in 2001 and is now developed in collaboration with the University of Versailles. YML is being integrated into SPIN to provide a GUI ASP. It was successfully demonstrated at SC'05 in Seattle, at the INRIA booth.

Scientific Programming on the InterNet (SPIN)

SPIN (Scientific Programming on the InterNet) is a scalable, integrated and interactive set of tools for scientific computation on distributed and heterogeneous environments. These tools create a collaborative environment allowing access to remote resources.

The goal of SPIN is to provide the following advantages: Platform independence, Flexible parameterization, Incremental capacity growth, Portability and interoperability, and Web integration. The need for a tool such as SPIN was recognized by Grid researchers working in scientific domains such as linear algebra. Since P2P is emerging as a new programming paradigm, end-users need such tools. The scientific community now really needs to be able to develop scientific applications by assembling basic components while hiding the architecture and the middleware. Another use of SPIN is to build an application from predefined components ("building blocks") that already exist in the system or are written by the developer. The SPIN user community can collaborate to make more and more predefined components available and shared via the Internet, in order to develop new, more specialized components or new applications combining existing and new components through the SPIN user interface.

SPIN was launched at the ASCI CNRS lab in 1998 and is now developed in collaboration with the PRiSM lab of the University of Versailles. SPIN is currently being adapted to incorporate YML (see above). Nevertheless, we are studying another solution based on the Linear Algebra KErnel (LAKE), developed by Nahid Emad's team at the University of Versailles, which would be an alternative to SPIN as a component oriented integration with YML.

V-Grid

This project is at an early stage; it started officially in September 2004. V-Grid is a virtualization software for large scale distributed system emulation. It allows folding a distributed system 100 or 1000 times larger than the experimental testbed. V-Grid virtualizes distributed system nodes on PC clusters, providing every virtual node with its own confined operating system and execution environment. Thus, compared to large scale distributed system simulators or emulators (like MicroGrid), V-Grid virtualizes and schedules a full software environment for every distributed system node. V-Grid research concerns emulation realism and performance. A first work concerns the definition and implementation of metrics and methodologies to compare the merits of distributed system virtualization tools. Since there is no previous work in this domain, it is important to define what to measure and how, in order to qualify a virtualization system with respect to realism and performance. We defined a set of metrics and methodologies to evaluate and compare virtualization tools for sequential systems. For example, a key parameter for realism is event timing: in the emulated environment, events should occur at times consistent with a real environment. An example of a key parameter for performance is linearity: the performance degradation of every virtual machine should evolve linearly with the number of virtual machines. We conducted a large set of experiments comparing several virtualization tools, including Vserver, VMware, User Mode Linux and Xen. The results demonstrate that none of them provides both enough realism and enough performance. As a consequence, we are currently studying approaches to cope with these limits.
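One possible way to quantify the linearity criterion is to fit a straight line to the measured degradation against the number of virtual machines and report the coefficient of determination R²; a value close to 1 indicates linear behaviour. This least-squares sketch is an assumption about how such a metric could be computed, not the methodology defined in the V-Grid work, and the sample measurements are hypothetical.

```python
def linearity(num_vms, degradation):
    """Least-squares line through (number of VMs, degradation) and its R^2.

    Returns (slope, intercept, r2). Needs at least two distinct VM counts.
    """
    n = len(num_vms)
    mx = sum(num_vms) / n
    my = sum(degradation) / n
    sxx = sum((x - mx) ** 2 for x in num_vms)
    if sxx == 0:
        raise ValueError("need at least two distinct numbers of VMs")
    sxy = sum((x - mx) * (y - my) for x, y in zip(num_vms, degradation))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(num_vms, degradation))
    ss_tot = sum((y - my) ** 2 for y in degradation)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2
```

A quadratic degradation such as `[1, 4, 9, 16]` for 1 to 4 VMs gives an R² noticeably below 1, which would flag the tool as non-linear under this criterion.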

FAult Injection Language (FAIL)

FAIL (FAult Injection Language) is a new project started in 2004. Its goal is to provide a controllable fault injection layer in existing distributed applications (for clusters and grids). A new language was designed to express fault patterns, and a preliminary implementation of a distributed fault injector based on this language was developed.

New Results

Large Scale Distributed Systems

*A survey of Grid research tools: simulators, emulators and real life platforms*

Grid infrastructures are becoming the largest and most complex distributed systems ever built. Because of their size and complexity, they raise many algorithmic challenges for security, fault tolerance, fair share and performance. When investigating a research issue, researchers use different methodologies and different tools. Most published Grid studies were conducted on real production infrastructures or simulators. There are other research tools, such as mathematical models, emulators and large scale experimental testbeds. We present a survey of existing tools and methodologies to investigate Grid research issues. We describe some mathematical models, the main generic simulators (Bricks, SimGrid, GridSim, GangSim and OptorSim), a couple of emulators (MicroGrid and Grid eXplorer) and a couple of experimental testbeds (DAS2 and Grid'5000). We briefly discuss their respective advantages and limitations and present the validation approach used by their authors.

*V-Meter: A Microbenchmark for Evaluating Virtualization Tools with a View to Large Scale Emulation Systems*

V-Grid is a large scale emulator for testing applications that need a large number of machines. To do this, we need many virtual machines (100) on each physical machine. We had to choose between 4 virtualization tools to build this emulator: Vserver, Xen, UML and VMware. We compare the performance of 3 of these systems (Vserver, UML and Xen) and show that none meets all the requirements of our emulator (scalability, speed, usability, etc.).

A Case for Efficient Execution of Data-Intense Applications with BitTorrent on Computational Desktop Grid

Data-centric applications are still a challenging issue for large scale distributed computing systems. The emergence of new protocols and software for collaborative content distribution over the Internet offers a new opportunity for efficient and fast delivery of high volumes of data. In this paper, we investigate BitTorrent as a protocol for data diffusion in the context of Computational Desktop Grids. We show that BitTorrent is efficient for large file transfers and scalable when the number of nodes increases, but suffers from a high overhead when transmitting small files. The paper also investigates two approaches to overcome these limitations. First, we propose a performance model to select the better of the FTP and BitTorrent protocols according to the size of the file to distribute and the number of receiver nodes. Next, we propose an enhancement of the BitTorrent protocol which provides more predictable communication patterns. We design a model for communication performance and evaluate BitTorrent-aware versions of the MinMin, MaxMin and Sufferage scheduling heuristics (BT-MinMin, BT-MaxMin and BT-Sufferage) against a synthetic parameter-sweep application.
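The idea of choosing between FTP and BitTorrent from the file size and the number of receivers can be illustrated with a deliberately simple cost model. The formulas and constants below (a single FTP server serving receivers one after another; a BitTorrent transfer with a fixed start-up overhead and a term growing slowly with log2 of the number of receivers) are hypothetical placeholders, not the model from the paper; only the selection logic is the point.

```python
import math


def ftp_time(size_mb, receivers, bandwidth_mb_s):
    """Hypothetical model: one server sends the whole file to each receiver in turn."""
    return receivers * size_mb / bandwidth_mb_s


def bittorrent_time(size_mb, receivers, bandwidth_mb_s, overhead_s=5.0, alpha=0.1):
    """Hypothetical model: fixed protocol overhead plus a slowly growing transfer time."""
    return overhead_s + (size_mb / bandwidth_mb_s) * (1 + alpha * math.log2(receivers))


def choose_protocol(size_mb, receivers, bandwidth_mb_s=10.0):
    """Pick the protocol with the smaller predicted distribution time (receivers >= 1)."""
    ftp = ftp_time(size_mb, receivers, bandwidth_mb_s)
    bt = bittorrent_time(size_mb, receivers, bandwidth_mb_s)
    return "FTP" if ftp <= bt else "BitTorrent"
```

Under these placeholder constants, a 1 MB file sent to 2 nodes favours FTP (the fixed BitTorrent overhead dominates), while a 1000 MB file sent to 100 nodes favours BitTorrent; real constants would have to be fitted from measurements.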

Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI

Fault tolerance in MPI has become a major issue in the HPC community. Several approaches are envisioned, from user or programmer controlled fault tolerance to fully automatic fault detection and handling. For the latter approach, several protocols have been proposed in the literature. In a recent paper, we demonstrated that uncoordinated checkpointing tolerates higher fault frequencies than coordinated checkpointing. Moreover, causal message logging protocols have been shown to be the most efficient message logging technique. These protocols piggyback non-deterministic events onto computation messages. Many such causal protocols exist, and their merits are usually evaluated with four metrics: a) piggyback computation cost, b) piggyback size, c) application performance and d) fault recovery performance. In this paper, we investigate the benefit of using stable storage for logging message events in causal message logging protocols. To evaluate the advantage of this technique, we implemented three protocols: 1) a classical causal message logging protocol proposed in Manetho, 2) a state-of-the-art protocol known as LogOn, and 3) a protocol with light computation cost called Vcausal. We demonstrate a major impact of this stable storage on the three protocols, for all four criteria, on micro-benchmarks as well as on the NAS benchmark.

Hybrid Preemptive Scheduling of MPI Applications on the Grids

Time sharing of cluster resources in a Grid is a major issue in cluster and Grid integration. The classical Grid architecture involves a higher level scheduler which submits non-overlapping jobs to the independent batch schedulers of each cluster of the Grid. The sequentiality induced by this approach does not fit the expected number of users and the job heterogeneity of Grids. Time sharing techniques address this issue by allowing the simultaneous execution of many applications on the same resources.

Co-scheduling and gang scheduling are the two best known techniques for time sharing cluster resources. Co-scheduling relies on the operating system of each node to schedule the processes of every application. Gang scheduling ensures that the same application is scheduled on all nodes simultaneously. Previous work has shown that co-scheduling techniques outperform gang scheduling when physical memory is not exhausted. In this paper, we introduce a new hybrid sharing technique providing checkpoint-based explicit memory management. It consists of co-scheduling parallel applications within a set, until the memory capacity of the node is reached, and using gang scheduling related techniques to switch from one set to another. We experimentally compare the merits of the three solutions (Co, Gang and Hybrid Scheduling) in the context of out-of-core computing, which is likely to occur in the Grid context, where many users share the same resources. Additionally, we address the problem of heterogeneous applications by comparing hybrid scheduling to an optimized version relying on paired-scheduling. The experiments show that the hybrid solution is as efficient as the co-scheduling technique when physical memory is not exhausted, can benefit from the paired-scheduling optimization technique when applications are heterogeneous, and is more efficient than both gang scheduling and co-scheduling when physical memory is exhausted.
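The hybrid technique can be pictured in two steps: group applications into sets whose combined memory fits on a node (co-scheduled inside a set), then rotate between sets gang-style. The sketch below uses first-fit decreasing packing, which is an assumption since the paper's grouping heuristic is not described here; the names `build_sets` and `hybrid_schedule` are hypothetical, and the checkpoint/restore performed at each set switch is only indicated by a comment.

```python
def build_sets(apps, memory_capacity):
    """Group applications into co-scheduled sets that fit in node memory.

    apps: list of (name, memory_per_node) pairs.
    Uses first-fit decreasing packing (an assumed heuristic).
    """
    sets, loads = [], []
    for name, mem in sorted(apps, key=lambda a: a[1], reverse=True):
        if mem > memory_capacity:
            raise ValueError(f"{name} alone exceeds node memory")
        for i, load in enumerate(loads):
            if load + mem <= memory_capacity:
                sets[i].append(name)
                loads[i] += mem
                break
        else:  # no existing set has room: open a new one
            sets.append([name])
            loads.append(mem)
    return sets


def hybrid_schedule(sets, quanta):
    """Gang-style rotation between sets; inside a set the OS co-schedules.

    Switching sets would checkpoint the outgoing set to disk and restore
    the incoming one (explicit memory management), not modelled here.
    """
    return [sets[q % len(sets)] for q in range(quanta)]
```

With a node memory of 1000 units and applications A (600), B (500), C (300) and D (200), the packing yields the sets [A, C] and [B, D], which then alternate time slice by time slice.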

MPICH-V Project: a Multiprotocol Automatic Fault Tolerant MPI

High performance computing platforms like Clusters, Grids and Desktop Grids are becoming larger and subject to more frequent failures. MPI is one of the most used message passing libraries in HPC applications. These two trends raise the need for fault tolerant MPI. The MPICH-V project focuses on designing, implementing and comparing several automatic fault tolerance protocols for MPI applications. We present an extensive related work section highlighting the originality of our approach and the proposed protocols. We then present four fault tolerant protocols implemented in a new generic framework for fault tolerant protocol comparison, covering a large spectrum of known approaches, from coordinated checkpointing to uncoordinated checkpointing associated with causal message logging. We measure the performance of these protocols on a micro-benchmark and compare them on the NAS benchmark, using an original fault tolerance test. Finally, we outline the lessons learned from this in-depth comparison of fault tolerant protocols for MPI applications.

Large Scale Peer to Peer Performance Evaluations

Peer to Peer Large Scale Linear Algebra, programming and experimentations

We discuss the deployment of large scale numerical algorithms on a Grid. We minimize communication needs by using persistent storage of data, and we introduce out-of-core programming for the task farming paradigm. We discuss the performance of the bisection method for computing the eigenvalues of a real symmetric tridiagonal matrix, and of a block-based matrix-vector product. As experimental middleware, we use the XtremWeb system on two geographic sites: the University of Lille 1 and Paris-XI University at Orsay.
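The bisection method parallelizes naturally as task farming because each eigenvalue can be isolated independently. A minimal sketch, assuming the matrix is given by its diagonal `a` and off-diagonal `b`, counts the eigenvalues below a shift with a Sturm sequence (the signs of the LDL^T pivots of T - xI) and bisects inside the Gershgorin interval. The function names and tolerance are illustrative, not those of the XtremWeb implementation.

```python
import math
from typing import List, Tuple


def sturm_count(a: List[float], b: List[float], x: float) -> int:
    """Number of eigenvalues of the symmetric tridiagonal matrix with diagonal
    `a` and off-diagonal `b` that lie below x, counted as the number of
    negative pivots of the LDL^T factorization of (T - x I)."""
    count = 0
    d = 1.0
    for i in range(len(a)):
        off = b[i - 1] * b[i - 1] if i > 0 else 0.0
        d = a[i] - x - off / d
        if d == 0.0:
            d = 1e-300  # tiny pivot guard; only matters when x hits an eigenvalue
        if d < 0.0:
            count += 1
    return count


def gershgorin_bounds(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Interval guaranteed to contain every eigenvalue."""
    n = len(a)
    lo, hi = math.inf, -math.inf
    for i in range(n):
        r = (abs(b[i - 1]) if i > 0 else 0.0) + (abs(b[i]) if i < n - 1 else 0.0)
        lo = min(lo, a[i] - r)
        hi = max(hi, a[i] + r)
    return lo, hi


def kth_eigenvalue(a: List[float], b: List[float], k: int, tol: float = 1e-12) -> float:
    """k-th smallest eigenvalue (1-based) by bisection on the Sturm count."""
    lo, hi = gershgorin_bounds(a, b)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(a, b, mid) >= k:
            hi = mid  # at least k eigenvalues below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

On a desktop grid, each call `kth_eigenvalue(a, b, k)` (or each sub-interval of the Gershgorin interval) could be one independent task, and only `a` and `b` need to be sent to the workers.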

Matrix Peer-to-Peer Computing With Very Large Heterogeneous Platforms

After a short overview of global computing, also known as peer-to-peer computing, we study the deployment of linear algebra problems on such distributed environments. Some applications are very easy to adapt by means of parametric parallelism. We propose several techniques, such as data persistence and out-of-core programming, which aim to decrease communications and to cope with the limited amount of memory on peers. The experiments use an XtremWeb platform deployed on two geographic sites: Lille, France, and Tsukuba, Japan.
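Data persistence can be illustrated with a block matrix-vector product in which each block product A_ij x_j is an independent task. In this sketch the `Worker` class is a hypothetical stand-in for a peer that keeps the blocks it has already received, so that repeated products (as in an iterative method) only ship vector slices; actual out-of-core storage on the peer's disk is not modelled.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Key = Tuple[int, int]


def split_blocks(A: Matrix, bs: int) -> Dict[Key, Matrix]:
    """Cut a square matrix into bs x bs blocks indexed by (block row, block column)."""
    n = len(A)
    return {(i // bs, j // bs): [row[j:j + bs] for row in A[i:i + bs]]
            for i in range(0, n, bs) for j in range(0, n, bs)}


class Worker:
    """Hypothetical peer that keeps every block it has received (data persistence),
    so later iterations only need to ship the vector slice."""

    def __init__(self) -> None:
        self.store: Dict[Key, Matrix] = {}
        self.blocks_received = 0

    def task(self, key: Key, fetch: Callable[[], Matrix], x_j: Vector) -> Vector:
        if key not in self.store:  # first time: the block must be transferred
            self.store[key] = fetch()
            self.blocks_received += 1
        return [sum(a * v for a, v in zip(row, x_j)) for row in self.store[key]]


def block_matvec(blocks: Dict[Key, Matrix], workers: List[Worker],
                 x: Vector, bs: int) -> Vector:
    """y = A x computed as independent block tasks y_i += A_ij x_j."""
    nb = (len(x) + bs - 1) // bs
    y = [0.0] * len(x)
    for bi, bj in sorted(blocks):
        worker = workers[(bi * nb + bj) % len(workers)]  # static placement
        part = worker.task((bi, bj), lambda k=(bi, bj): blocks[k],
                           x[bj * bs:(bj + 1) * bs])
        for r, v in enumerate(part):
            y[bi * bs + r] += v
    return y
```

As long as the placement of blocks on workers stays the same, no block is transferred again after the first product.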

Large Scale Linear System Global Computing

We present a parallel implementation of GMRES, a typical method for solving large sparse linear systems, using the lightweight Grid system XtremWeb. We discuss the performance of this implementation deployed on two XtremWeb networks: a local network of 128 non-dedicated PCs at Polytech-Lille (University of Lille I, France), and a remote network of 3 clusters (91 CPUs) at the HPCS laboratory in Tsukuba, Japan. We also run the tests on an IBM SP4 supercomputer and in a LAN MPI computing environment, LAM-MPI. We present the advantages and drawbacks of our implementations on the three computing systems.

GMRES Method on Lightweight GRID System

We have implemented GMRES, one of the key methods for solving large nonsymmetric linear systems. We discuss the performance of this algorithm deployed on two XtremWeb networks: a local network of 128 non-dedicated PCs at Polytech-Lille (University of Lille I), and a remote network of 3 clusters (91 CPUs) at the High Performance Computing Center of the University of Tsukuba. We compare this performance with that of an MPI implementation of GMRES on the same platform.
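For reference, the core of restarted GMRES(m) fits in a few dozen lines: an Arnoldi process with modified Gram-Schmidt builds a Krylov basis, Givens rotations keep the small Hessenberg least-squares problem triangular, and the solution is updated at each restart. This is a sequential textbook sketch operating on a user-supplied `matvec` function and assuming a nonsingular matrix; it is not the XtremWeb or MPI implementation discussed above. In a distributed version, the matrix-vector and dot products are the operations that get parallelized.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(p * q for p, q in zip(u, v))


def norm(u: Vec) -> float:
    return math.sqrt(dot(u, u))


def gmres(matvec: Callable[[Vec], Vec], b: Vec, m: int = 20,
          tol: float = 1e-10, max_restarts: int = 100) -> Vec:
    """Restarted GMRES(m) with modified Gram-Schmidt Arnoldi and Givens
    rotations; assumes a nonsingular matrix and starts from x0 = 0."""
    x = [0.0] * len(b)
    bnorm = norm(b) or 1.0
    for _ in range(max_restarts):
        r = [bi - axi for bi, axi in zip(b, matvec(x))]
        beta = norm(r)
        if beta / bnorm < tol:
            break
        V = [[ri / beta for ri in r]]          # Krylov basis
        H = [[0.0] * m for _ in range(m + 1)]  # Hessenberg, rotated to triangular
        cs, sn = [0.0] * m, [0.0] * m          # Givens rotations
        g = [beta] + [0.0] * m                 # rotated right-hand side
        k = 0
        for j in range(m):
            w = matvec(V[j])
            for i in range(j + 1):             # modified Gram-Schmidt
                H[i][j] = dot(w, V[i])
                w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
            h_next = norm(w)
            H[j + 1][j] = h_next
            for i in range(j):                 # apply previous rotations to column j
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            denom = math.hypot(H[j][j], H[j + 1][j])
            cs[j], sn[j] = H[j][j] / denom, H[j + 1][j] / denom
            H[j][j], H[j + 1][j] = denom, 0.0
            g[j + 1] = -sn[j] * g[j]           # |g[j+1]| is the current residual norm
            g[j] = cs[j] * g[j]
            k = j + 1
            if abs(g[j + 1]) / bnorm < tol or h_next == 0.0:
                break
            V.append([wk / h_next for wk in w])
        y = [0.0] * k                          # back substitution on triangular H
        for i in range(k - 1, -1, -1):
            s = sum(H[i][l] * y[l] for l in range(i + 1, k))
            y[i] = (g[i] - s) / H[i][i]
        for i in range(k):
            x = [xi + y[i] * vi for xi, vi in zip(x, V[i])]
    return x
```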

Toward global and grid computing for large scale linear algebra problems

In this paper, we gather the resources of global and grid computing platforms in order to solve a linear algebra problem. We adapt the bisection algorithm to the global computing platform XtremWeb and to the RPC programming platform OmniRPC. Both systems are deployed on two geographic sites: the Polytech'Lille engineering school, France, and the HPCS laboratory in Tsukuba, Japan. Combining two different software systems and two geographic sites allows us to run and analyse a wide range of tests.

Cluster and Grid Matrix Computation with Persistent Storage and Out-of-core Programming

We present a performance evaluation of a large-scale numerical application on a cluster and on a global Grid/Cluster platform. The computational resources are a cluster of clusters (34 nodes, 84 processors) and a local area network Grid (128 nodes), distributed over two geographic sites: the University of Tsukuba (Japan) and the University of Lille I (France). We use XtremWeb as the experimental Grid middleware. We compare a classical MPI version with global Grid/Cluster versions. We also present and test techniques based on out-of-core programming and efficient data placement. We discuss the performance of a block-based Gauss-Jordan method for large matrix inversion.
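A block-based Gauss-Jordan inversion can be sketched on a q x q grid of blocks: at step k the pivot block is inverted, the pivot block row is scaled, and every other block is updated with A_ij <- A_ij - A_ik A_kj, these updates being independent tasks. This minimal pure-Python version assumes that every pivot block is invertible (there is no block pivoting) and uses hypothetical helper names; it is not the code evaluated in the paper.

```python
from typing import List

Mat = List[List[float]]


def mat_mul(A: Mat, B: Mat) -> Mat:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_sub(A: Mat, B: Mat) -> Mat:
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def small_inverse(A: Mat) -> Mat:
    """Scalar Gauss-Jordan with partial pivoting, used on one pivot block."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ZeroDivisionError("singular pivot block")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def block_gauss_jordan_inverse(A: Mat, bs: int) -> Mat:
    """In-place block Gauss-Jordan inversion on a q x q grid of bs x bs blocks.
    No block pivoting: every pivot block must be invertible (e.g. for a
    diagonally dominant A). Within one step k, the block updates are independent."""
    n = len(A)
    if n % bs:
        raise ValueError("matrix order must be a multiple of the block size")
    q = n // bs
    B = [[[row[j * bs:(j + 1) * bs] for row in A[i * bs:(i + 1) * bs]]
          for j in range(q)] for i in range(q)]
    for k in range(q):
        P = small_inverse(B[k][k])                    # inverse of the pivot block
        for j in range(q):
            if j != k:
                B[k][j] = mat_mul(P, B[k][j])         # scale the pivot block row
        for i in range(q):
            if i != k:
                for j in range(q):
                    if j != k:                        # A_ij -= A_ik A_kj
                        B[i][j] = mat_sub(B[i][j], mat_mul(B[i][k], B[k][j]))
                B[i][k] = [[-v for v in row] for row in mat_mul(B[i][k], P)]
        B[k][k] = P
    return [[v for j in range(q) for v in B[i][j][r]]
            for i in range(q) for r in range(bs)]
```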

Towards a scheduling policy for hybrid methods on computational grids

We propose a cost model for running particular component-based applications on a computational Grid. This cost is evaluated by a metascheduler and negotiated with the user by a broker. We consider a specific class of applications: hybrid methods, whose components have to be launched simultaneously.

A Hybrid GMRES-LS-Arnoldi method to accelerate the parallel solution of linear systems

We present a parallel hybrid asynchronous method for solving large sparse linear systems on a large parallel machine. This method combines a parallel GMRES(m) algorithm with the Least Squares method, which needs some eigenvalues obtained from a parallel Arnoldi algorithm. All of the algorithms run simultaneously on different processors of an IBM SP3 or IBM SP4 computer. This implementation of the hybrid method takes advantage of the available parallelism and accelerates convergence by considerably decreasing the number of iterations.

Multiple Explicitly Restarted Arnoldi Method for Solving Large Eigenproblems

We propose a new approach for computing a few eigenpairs of large sparse non-Hermitian matrices. This method, called Multiple Explicitly Restarted Arnoldi (MERAM), is well suited to environments that combine different parallel programming paradigms. It is based on multiple uses of the Explicitly Restarted Arnoldi method (ERAM) and improves its convergence.

This technique is implemented and tested on a distributed environment consisting of two interconnected parallel machines. Compared with ERAM, MERAM shows improved convergence; in some cases, the improvement is more than twofold. We also implemented MERAM on a cluster of workstations. In our experiments, MERAM converges better than the Explicitly Restarted Block Arnoldi method and, for some matrices, more quickly than the PARPACK package, which implements the Implicitly Restarted Arnoldi method.

Volatility and Reliability Processing

Fault-Injection and Dependability Benchmarking for Grid Computing Middleware

In a network consisting of several thousand computers, the occurrence of faults is unavoidable. Being able to test the behavior of a distributed program in an environment where faults (such as the crash of a process) can be controlled is essential for deploying reliable programs.

We developed FAIL-FCI (for Fault Injection Language and FAIL Cluster Implementation, respectively), a software tool that makes it easy to elaborate complex fault scenarios while relieving the user from writing low-level code. In particular, we show that not only are we able to fault-load existing distributed applications (as in most current papers that address fault-tolerance issues), but we are also able to inject qualitative faults, i.e. special faults at very specific moments in the program code of the application under test. Finally, although this was not the primary purpose of the tool, we are also able to inject specific workload patterns in order to stress-test the application under test. The whole process is driven by a simple unified description language that is totally independent from the language of the application, so no code changes or recompilation are needed on the application side. We also investigated injecting software faults into distributed Java applications. Our scheme extends the FAIL-FCI software and does not require any modification of the source code of the application under test, while retaining the ability to write high-level fault scenarios. As a proof of concept, we use our tool to test FreePastry, an existing Java implementation of a Distributed Hash Table (DHT), against node failures.
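The idea of a qualitative fault, a crash triggered at a precise point of the application's execution without touching its code, can be illustrated with a small simulation. The sketch below is hypothetical and does not use the FAIL language or FCI: application processes are Python generators that merely report the program points they reach, and a scenario made of `CrashRule` entries decides when to halt a process.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Event = Tuple[str, int]


def application(steps: int) -> Iterator[Event]:
    """Stand-in for unmodified application code: it only reaches program
    points (events) and knows nothing about fault injection."""
    for i in range(steps):
        yield ("compute", i)
        if i % 2 == 1:
            yield ("checkpoint", i)


@dataclass
class CrashRule:
    node: str
    event: str
    occurrence: int  # crash on the n-th time `node` reaches `event`
    seen: int = 0


def run_scenario(apps: Dict[str, Iterator[Event]], rules: List[CrashRule]) -> List[tuple]:
    """Run processes round-robin and apply the crash rules as events occur."""
    trace: List[tuple] = []
    live = dict(apps)
    while live:
        for name in list(live):
            try:
                event, step = next(live[name])
            except StopIteration:
                del live[name]
                trace.append((name, "exit"))
                continue
            trace.append((name, event, step))
            for rule in rules:
                if rule.node == name and rule.event == event:
                    rule.seen += 1
                    if rule.seen == rule.occurrence:
                        live.pop(name).close()  # halt the process at this exact point
                        trace.append((name, "CRASHED"))
                        break
    return trace
```

With the rule `CrashRule("p1", "checkpoint", 2)`, process `p1` is stopped right after its second checkpoint, while `p0` runs to completion.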

In the context of the CoreGrid Network of Excellence, we presented an overview of the state of the art, followed by a presentation of the FAIL-FCI system from INRIA, which provides a tool for fault injection in large distributed systems. We then presented DBGS, a Dependability Benchmark for Grid Services, along with some experimental results.

Self-stabilization

We generalized the classic dining philosophers problem to allow critical section entry conflicts between non-neighboring processes, and described a deterministic self-stabilizing solution to the new problem. We extended our solution to handle a similarly generalized drinking philosophers problem. As another extension, we described a variant with finite failure locality, which allows our algorithm to tolerate process crashes.

We presented a generic distributed algorithm for solving silent tasks, such as shortest path computation, depth-first search tree construction and best reliable transmitters, in directed networks where communication may be only unidirectional. Our solution is written for the asynchronous message passing communication model and tolerates multiple kinds of failures (transient and intermittent). First, our algorithm is self-stabilizing: starting from an arbitrary global state caused by a transient fault, it recovers correct behavior in finite time. Second, it tolerates fair message loss, finite message duplication and arbitrary message reordering, during both the stabilizing phase and the stabilized phase. This second property is especially interesting since, in unidirectional networks, there exists no self-stabilizing reliable data-link protocol. A formal proof establishes its correctness for the considered problem and subsumes previous proofs for solutions in the simpler reliable shared memory communication model.
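The simpler shared-memory setting that the proof subsumes can be sketched with the classic self-stabilizing shortest-path rule: the root holds 0, and every other node repeatedly sets its distance to the minimum over its in-neighbours of d(u) + w(u, v), capped by a bound `cap` assumed to be larger than any true distance. This illustration is not the message-passing algorithm itself (it does not model message loss, duplication or reordering); it only shows convergence from an arbitrary, corrupted initial state under a randomly scheduled central daemon.

```python
import random
from typing import Dict, List, Tuple


def stabilize_shortest_paths(n: int, edges: List[Tuple[int, int, int]], root: int,
                             init: List[int], cap: int, seed: int = 0,
                             max_moves: int = 1_000_000) -> List[int]:
    """Shared-memory sketch: node v reads only its in-neighbours (edges are
    directed u -> v with positive integer weight w). The root's target is 0;
    any other node's target is min(cap, min over in-edges of d[u] + w).
    A randomly chosen enabled node moves at each step, starting from the
    arbitrary (possibly corrupted) distances in `init`."""
    rng = random.Random(seed)
    incoming: Dict[int, List[Tuple[int, int]]] = {v: [] for v in range(n)}
    for u, v, w in edges:
        incoming[v].append((u, w))
    d = list(init)

    def target(v: int) -> int:
        if v == root:
            return 0
        return min([cap] + [d[u] + w for u, w in incoming[v]])

    for _ in range(max_moves):
        enabled = [v for v in range(n) if d[v] != target(v)]
        if not enabled:
            return d  # silent: no node will ever change its value again
        v = rng.choice(enabled)
        d[v] = target(v)
    raise RuntimeError("no convergence within max_moves")
```

In the test scenario, node 4 is unreachable from the root, so its value settles at `cap`; in general, the cap is what stops corrupted values on unreachable cycles from counting to infinity.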

We reported the first self-stabilizing Border Gateway Protocol (BGP). BGP is the standard inter-domain routing protocol of the Internet. Self-stabilization is a technique for tolerating arbitrary transient faults. Routing instability in the Internet can be caused by errors in configuring the routing data structures or the routing policies, transient physical and data link problems, software bugs, and memory corruption. This instability can increase network latency, slow down the convergence of the routing data structures, and even cause network partitioning. Most previous studies concentrated on routing policies to achieve the convergence of BGP, while oscillations due to transient faults were ignored. Self-stabilizing BGP aims to solve the routing instability problem when this instability results from transient failures: it detects and automatically recovers from this type of fault. Our protocol is combined with an existing protocol to make it resilient to policy conflicts as well.

Byzantine Tolerance

We presented Byzantine-robust solutions to the topology discovery problem. Our programs allow each process to learn the complete topology of the network (up to the neighborhoods of the faulty nodes). The program tolerates up to a fixed number of faults. The network topology is arbitrary. The processes know neither the diameter nor the size of the network. The execution model is asynchronous. The processes do not use cryptographic primitives such as digital signatures.

Self-stabilizing protocols can tolerate any type and any number of transient faults. However, in general, self-stabilizing protocols provide no guarantee about their behavior in the presence of permanent faults. We propose a self-stabilizing link-coloring protocol resilient to (permanent) Byzantine faults in arbitrary networks. The protocol assumes a central daemon and uses $2\Delta - 1$ colors, where $\Delta$ is the maximum degree of the network. It guarantees that any link $(u, v)$ between nonfaulty processes $u$ and $v$ is assigned a color within $2\Delta + 2$ rounds and that its color remains unchanged thereafter. Our protocol is Byzantine insensitive in the sense that the subsystem of correct processes keeps operating properly in spite of unbounded Byzantine faults.
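The $2\Delta - 1$ bound has a short justification: a link $(u, v)$ shares an endpoint with at most $(\Delta - 1) + (\Delta - 1) = 2\Delta - 2$ other links, so at least one of $2\Delta - 1$ colors is always free for it. The sketch below uses this fact in a plain self-stabilizing link coloring under a central daemon. It is only an illustration under that assumption: it does not include the Byzantine containment mechanism of the protocol described above, and the function name is hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

Link = FrozenSet[int]


def stabilize_edge_coloring(edges: List[Tuple[int, int]], init: List[int],
                            seed: int = 0) -> Dict[Link, int]:
    """Central daemon: at each step one conflicting link (color outside the
    palette, or equal to an adjacent link's color) takes the smallest color
    unused by its adjacent links. Assumes a simple graph without self-loops."""
    links = [frozenset(e) for e in edges]
    degree: Dict[int, int] = {}
    for link in links:
        for node in link:
            degree[node] = degree.get(node, 0) + 1
    delta = max(degree.values())
    palette = range(2 * delta - 1)
    color = dict(zip(links, init))
    adjacent = {e: [f for f in links if f != e and e & f] for e in links}
    rng = random.Random(seed)

    def conflicting(e: Link) -> bool:
        return color[e] not in palette or any(color[f] == color[e] for f in adjacent[e])

    while True:
        enabled = [e for e in links if conflicting(e)]
        if not enabled:
            return color
        e = rng.choice(enabled)
        used = {color[f] for f in adjacent[e]}
        # at most 2*delta - 2 adjacent links, so a free color always exists
        color[e] = min(c for c in palette if c not in used)
```

Each move makes the chosen link conflict-free without creating a conflict elsewhere (its new color differs from every adjacent link's color), so the number of conflicting links strictly decreases and the loop terminates.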

Sensor Networks

In large scale multihop wireless networks, flat architectures are not scalable. To overcome this major drawback, clustering is introduced to support self-organization and to enable hierarchical routing. In multihop wireless networks, robustness is a major issue due to the dynamicity of such networks. Several algorithms have been designed for the clustering process, but to the best of our knowledge, very few studies check the robustness of their clustering protocols. We show that a clustering algorithm that seems to present good robustness properties is self-stabilizing. We propose several enhancements to reduce the stabilization time and to improve stability. The use of a Directed Acyclic Graph ensures that the self-stabilizing properties always hold regardless of the underlying topology. These extra criteria are tested by simulation.

We presented a complexity analysis for a family of self-stabilizing vertex coloring algorithms in the context of sensor networks. First, we derived theoretical results on the stabilization time when the system is synchronous. Then, we provided simulations for various schedulers and topologies. We considered both the uniform case (where all nodes are indistinguishable and execute the same code) and the non-uniform case (where nodes use a globally unique identifier). Overall, our results show that the actual stabilization time is much smaller than the upper bound provided by previous studies. Similarly, the height of the induced DAG is much lower than the previously announced linear dependency on the size of the color domain. Finally, it appears that symmetry-breaking tricks traditionally used to expedite stabilization are in fact harmful in networks that are not tightly synchronized.
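As an illustration of the non-uniform case, here is one simple member of this family under a synchronous scheduler; it is a hypothetical sketch, not necessarily one of the algorithms analysed. Identifiers orient each conflict from the smaller to the larger identifier (the induced DAG), only the larger endpoint of a conflicting edge moves, and each moving node takes the smallest color unused by its neighbours, so a node that has moved holds a color in $\{0, \dots, \Delta\}$.

```python
from typing import Dict, List, Tuple


def synchronous_id_coloring(adj: Dict[int, List[int]],
                            init: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Non-uniform (identifier-based) variant under a synchronous scheduler.
    A node is enabled when a neighbour with a smaller identifier has the same
    color; in each round every enabled node simultaneously takes the smallest
    color not used by any neighbour (read from the previous round).
    Assumes a symmetric adjacency map."""
    color = dict(init)
    rounds = 0
    while True:
        enabled = [v for v in adj
                   if any(u < v and color[u] == color[v] for u in adj[v])]
        if not enabled:
            return color, rounds
        new_colors = {}
        for v in enabled:
            used = {color[u] for u in adj[v]}
            new_colors[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
        color.update(new_colors)  # all moves of the round are applied at once
        rounds += 1
```

By induction on identifiers, a node is stable at most one round after all its smaller-identifier neighbours are, so this sketch stabilizes within n - 1 synchronous rounds; on a 5-node cycle that starts with every node colored 0, it takes 3 rounds.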

Space lower bounds for graph exploration

We consider the task of exploring graphs with anonymous nodes by a team of non-cooperative robots modeled as finite automata. These robots have no a priori knowledge of the topology of the graph or of its size. Each edge has to be traversed by at least one robot. We first show that, for any set of $q$ non-cooperative $K$-state robots, there exists a graph of size $O(qK)$ that no robot of this set can explore. This improves the $O(K^{O(q)})$ bound by Rollik (1980). Our main result is an application of this improvement. It concerns exploration with stop, in which one robot has to explore the graph and stop after completing the exploration. For this task, the robot is provided with a pebble that it can use to mark nodes. We prove that exploration with stop requires $\Omega(\log n)$ bits of memory for the family of graphs with at most $n$ nodes. On the other hand, we prove that there exists an exploration-with-stop algorithm using a robot with $O(D \log \Delta)$ bits of memory to explore all graphs of diameter at most $D$ and degree at most $\Delta$.

Peer-to-peer systems design


Combining the use of clustering and scale-free nature of user exchanges into a simple and efficient P2P system

It was recently observed that user interests in P2P systems exhibit clustering properties that can be used to reduce the traffic of flooding-based search strategies. Another observation is that user exchanges exhibit scale-free properties that can be used for the design of routing-based search strategies. In these papers, we show that combining these two properties enables the design of an efficient and simple fully decentralized search strategy. This strategy is simple because it does not require maintaining any structured overlay network topology connecting the peers. It is efficient because simulations on real-world traces show that the expected number of lookup steps is logarithmic in the size of the network.

Rechercher parmi ses pairs ou quand le hasard ne fait pas si bien les choses (Searching among one's peers, or when chance does not do things so well), tutorial

This tutorial focuses on data search in large-scale distributed systems. We present peer-to-peer systems and the search algorithms they use. These systems share several properties with interaction networks, which are studied in many disciplines. We show that these properties are linked to the application, and then show how to use them to design efficient peer-to-peer systems.

D2B: a de Bruijn Based Content-Addressable Network

D2B is a peer-to-peer system based on a Distributed Hash Table (DHT). DHTs make it possible to design large-scale distributed systems for which properties such as degree and diameter can be proved. D2B uses the de Bruijn topology to route in a number of steps logarithmic in the number of users of the peer-to-peer system. The degree of D2B is constant on average and logarithmic with high probability.
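Logarithmic routing in a de Bruijn topology comes from bit shifting. In the static binary de Bruijn graph on $2^d$ nodes, node $x$ links to $2x \bmod 2^d$ and $2x + 1 \bmod 2^d$, so any destination is reached by shifting in its bits, in at most $d = \log_2 n$ hops with out-degree 2. The sketch below shows only this static case, under that simplifying assumption; D2B itself adapts the topology to a dynamic set of peers, which is not modelled here, and the function name is hypothetical.

```python
from typing import List


def de_bruijn_route(src: int, dst: int, d: int) -> List[int]:
    """Route in the binary de Bruijn graph on 2**d nodes, where node x links to
    (2x) mod 2**d and (2x + 1) mod 2**d. Reuse the longest suffix of src that
    is a prefix of dst, then shift in the remaining bits of dst, most
    significant first: at most d hops."""
    mask = (1 << d) - 1
    k = d
    while (src & ((1 << k) - 1)) != (dst >> (d - k)):
        k -= 1  # terminates: k = 0 always matches (0 == 0)
    path = [src]
    cur = src
    for i in range(d - k - 1, -1, -1):
        cur = ((cur << 1) | ((dst >> i) & 1)) & mask
        path.append(cur)
    return path
```

For example, with $d = 3$, routing from 6 (binary 110) to 3 (binary 011) reuses the shared bit 0 and takes two hops: 6, 5, 3.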

Nationwide Experimental Platforms (Testbeds)

Grid'5000

The Grid is envisioned to become a major infrastructure providing seamless and transparent access to computing, storage, communication and service facilities for Internet users. After a first experimentation phase, generally with a small number of resources, new projects have been launched with the objective of building large scale Grids combining hundreds of computers around the world for thousands of users. The EGEE project is one example in Europe.

However, Grids are very complex objects because they are fundamentally distributed systems gathering complex and potentially volatile nodes, featuring a deep software stack and connected by possibly asynchronous (best effort) networks. These systems are so complex that it is not known whether their behaviour can be modeled accurately enough to predict their properties (performance, fault tolerance, security, QoS) without conducting actual experiments. Thus, observing real Grids, experimenting under real conditions, isolating phenomena and understanding behaviours are certainly important steps towards accurate models. In that perspective, experimental testbeds are fundamental methodological tools, allowing experimentation and observation of large scale phenomena in Grids and their applications. These aspects have been surveyed in previous work.

The Grid 5000 project aims at building and developing a nationwide, highly reconfigurable experimental testbed allowing a large variety of experiments on all the layers of the software stack between the users and the hardware. Grid 5000 seeks to ease and support experiments and to provide rigorous control and measurement mechanisms.

In its current state, this instrument for Grid researchers gathers the resources of 8 computing centres (Grid 5000 sites) connected by RENATER (the French national network for research and education), offering thousands of CPUs to its users. Each site hosts a PC cluster providing from 256 to 1000 nodes (CPUs). The Grid 5000 control and provisioning environment allows users to configure and install a full software stack on the nodes of each Grid 5000 cluster. This gives users the unique capability to set up the exact software environment required for their experiment: they may specify the OS, network protocols, middleware, runtimes, application and, more generally, all components of the software stack. In addition to these configuration capabilities, Grid 5000 will offer a set of tools for controlling the experimental conditions during the execution of the experiment. Basically, users will be able to start and stop any Grid 5000 node on demand.

Grid 5000 is a multi-institution project, funded by the French Ministry of Research, INRIA, CNRS, universities and several regional councils. The project is directed by a Steering Committee (SC) involving the director of the ACI Grid, Thierry Priol, the director of the ACI Grid scientific committee, Brigitte Plateau, the director of RENATER, Dany Vandromme, and all the leaders of Grid 5000 sites. The project is implemented by a team of engineers belonging to the Technical Committee (TC). More than a hundred researchers (permanent staff and Ph.D. students) will use this instrument, which involves about 50 engineers (10 full-time).

Within INRIA, Grid'5000 is a collaborative effort of several INRIA projects (in alphabetical order): Apache, Caiman, Grand-Large, Oasis, Paris, Remap, Reso, Runtime, Scalaplix. Several Research Units are thus involved (in alphabetical order): Futurs, Rennes, Rhone Alpes and Sophia. An overview of the Grid5000 project has been published.

Grand-Large's role in Grid 5000 is, first, to direct the project: providing a vision, chairing the Steering Committee, preparing roadmaps and decisions to be discussed by the SC, preparing the SC meetings, etc. Second, Grand-Large chairs the Technical Committee, which is in charge of implementing the decisions of the SC and of informing the SC to help the decision process. As a central member of both the SC and the TC, Grand-Large plays a major role in Grid 5000.

One Expert Engineer funded by INRIA is associated with this project at Orsay for its day-to-day configuration and maintenance. Several engineers from IDRIS and LRI participated in the original design of Grid 5000 and assist the Expert Engineer.

Grid eXplorer

Large scale distributed systems such as P2P systems, sensor networks and Desktop Grids exhibit complex behaviours that are difficult to understand, because they fundamentally gather a large set of volatile nodes connected by an asynchronous network. Most well-known fault tolerance techniques from distributed systems do not work at this scale, because their complexity is too high, because they cannot tolerate faults during some stabilisation phase, or because the size of the system changes too quickly and too strongly. As with the Grid, large scale distributed systems are complex to model and require prior experiments and observations.

To respond to this challenge, the Grid eXplorer project aims at providing a large scale distributed system emulator. The first part of the project consists of building the emulator by gathering hardware and software components and developing the missing software. In terms of hardware, the project seeks to install a 1000-CPU cluster using a non-blocking Ethernet network, as well as a non-blocking high speed network for a subset of the cluster. On the software side, the emulation environment will allow users to configure all layers of the software stack for every experiment. In addition to this feature, shared with Grid 5000, Grid eXplorer will provide network emulators, virtualization mechanisms and fault injectors. The second part of the project is to address a variety of large scale distributed system issues by experimenting with actual applications, distributed systems, OSes and network protocols, and by testing new ones.

Grid eXplorer complements Grid 5000 by providing an experimental environment in which the user can control the experimental network conditions.

This project is supported by several funding sources: the Ministry of Research through the ACI Masses de données (Data Grid Explorer), INRIA, CNRS and the Île-de-France regional council. The total budget of this project is about 2 million euros.

Within INRIA, Grid eXplorer is a collaborative effort of several INRIA projects (in alphabetical order): Apache, Grand-Large, Regal and Reso. Several Research Units are thus involved (in alphabetical order): Futurs, Rhone Alpes and Rocquencourt.

Grand-Large co-initiated this project, leads the ACI Masse de données project, and managed the initial procurement as well as the hosting of the cluster, currently installed at the IDRIS laboratory in Orsay. One engineer, funded by the ACI Masse de données, is associated with this project at Orsay for its day-to-day configuration and maintenance. Several engineers from IDRIS and LRI participated in the original design of Grid eXplorer and assist this engineer.

Other Grants and Activities

Regional, National and International Actions

ACI Data Mass Grid eXplorer, 3 years, head: F. Cappello, chair: Serge Petiton

Specific Action of CNRS enabling Grid5000, 1 year, F. Cappello

Global Computing: Augernome XtremWeb, Multi-Disciplinary Project (University of Paris XI), 4 years, sub-project chair: Franck Cappello

ACI GRID CGP2P: Global Peer to Peer Computing, 3 years, head: F. Cappello

ACI GRID 2, head: Jean Louis Pazat, sub-topic chair: F. Cappello, Serge Petiton

ACI DataGraal, head: Pierre Sens, sub-topic chair: F. Cappello

Specific Action of CNRS "Analyse Structurelle and Algorithmique des Reseaux Dynamiques" (DYNAMO), 1 year, head: P. Fraigniaud, Serge Petiton

INRIA Associated Team "F-J Grid" with University of Tsukuba, 1 year, head: Franck Cappello

ACI "GRID'5000", 3 years, head: Franck Cappello.

CIFRE EADS, 3 years, (still in discussion), head: Franck Cappello.

INRIA funding, MPI-V, collaboration with UTK, LANL and ANL, head: Franck Cappello

Sakura program with University of Tsukuba, 1 year, head: Gilles Fedak

Regional Council "Grid eXplorer", 1 year, co-chair: Franck Cappello

Mobicoop (cooperative mobile agents for information retrieval in unreliable networks), CNRS JemSTIC action, 2 years, head: S. Tixeuil.

STAR (stabilization of networks based on Internet technology), CNRS JemSTIC action, 2 years, head: S. Tixeuil.

ACI Sécurité FRAGILE, 3 years, head: S. Tixeuil, P. Fraigniaud

ACI Sécurité SR2I, 3 years, subproject chair: S. Tixeuil

P2P Project of ACI "Masse de Données": P. Fraigniaud

ANR Jeunes Chercheurs (young researchers program) XtremLab: G. Fedak

European CoreGrid Network of Excellence

Industrial Contacts

GIE EADS, thesis funding (CIFRE) for Matthieu Cargnelli, from November 2004, 3 years. Title: Grid Services for semantics.

Dissemination

Services to the Scientific Community

Book/Journal Editing

Ted Herman and Sébastien Tixeuil, editors. Self-stabilizing Systems, volume 3764 of Lecture Notes in Computer Science, Barcelona, Spain, October 2005. Springer Verlag.

P. Fraigniaud, "Distributed Computing", LNCS 3724, 2005.

Serge Petiton, "Informatique parallèle et répartie" (Parallel and Distributed Computing), Hermès

Conference Organization

Franck Cappello, HPDC'2006, "High Performance Distributed Computing", Paris, June 19-23, 2006

Franck Cappello, GP2PC'2005, "Global and Peer to Peer Computing", in association with CCGRID'2005, Cardiff, 9 May 2005.

Gilles Fedak, GP2PC'2006, "Global and Peer to Peer Computing", in association with CCGRID'2006, Singapore, 16-19 May 2006.

Serge Petiton, IMACS'2005, "World Congress Scientific Computation, Applied Mathematics and Simulation" Paris, France July 11 - 15, 2005

Editorial Committee membership

Franck Cappello, Journal of Grid Computing, Springer Netherlands

Franck Cappello, Journal of Grid and Utility Computing, Inderscience

Franck Cappello, Scientific Programming Journal Special Issue on Grids and Worldwide Computing, IOS Press, Winter 2005

Franck Cappello, "Technique et Science Informatiques", 2001-2005

Sébastien Tixeuil, "Technique et Science Informatiques", 2005-

P. Fraigniaud, Theory of Computing Systems (TOCS), Springer,

P. Fraigniaud, Journal of Interconnection Networks (JOIN), World Scientific,

Steering Committee membership

Franck Cappello, IEEE/ACM HPDC

Franck Cappello, IEEE/ACM CCGRID

P. Fraigniaud, International Symposium on Theoretical Aspects of Computer Science (STACS).

P. Fraigniaud, ACM Symposium on Parallelism in Algorithms and Architectures (SPAA).

P. Fraigniaud, International symposium on Distributed Computing (DISC).

Program Committee membership

Franck Cappello, HCW 2005 – 14th Heterogeneous Computing Workshop, Denver, Colorado, USA, April 4, 2005

Franck Cappello, RenPar'2005, 16th Rencontres Francophones du Parallélisme, Le Croisic, France, 5-8 April 2005

Franck Cappello, IMACS'2005, 17th IMACS World Congress Scientific Computation, Applied Mathematics and Simulation, Paris, France July 11 - 15, 2005

Franck Cappello, 1st WSEAS International Symposium on GRID COMPUTING, Corfu Island, Greece, August 17-19, 2005.

Sébastien Tixeuil, SSS 2005 – 7th Symposium on Self-stabilizing Systems, Barcelona, Spain, October 27-28, 2005

Franck Cappello, Grid'2005 – 6th IEEE/ACM International Workshop on Grid Computing, November 14, 2005, Seattle, Washington, USA

Franck Cappello, CDUR'2005 – Journées Francophones sur la Cohérence des Données en Univers Réparti, November 2 2005, Paris, France

Franck Cappello, HIPC'2005 – 12th Annual IEEE International Conference on High Performance Computing, Goa, India, December 18-21, 2005.

Joffroy Beauquier, OPODIS'2005, Pisa, Italy, December 12–14, 2005.

Franck Cappello, IPDPS'2006 – 20th Annual IEEE International Parallel and Distributed Processing Symposium, Rhodes, Greece, April 25-29, 2006.

Franck Cappello, HCW 2006 – 15th Heterogeneous Computing Workshop, Rhodes Island, Greece, April 25-29, 2006

Franck Cappello, VECPAR'2006 – 7th International Meeting High Performance Computing for Computational Science, Rio de Janeiro, Brazil, July 10-12, 2006

Sébastien Tixeuil, ICDCS'2006 – 26th IEEE International Conference on Distributed Computing Systems, Lisboa, Portugal, July 4-7, 2006

Franck Cappello, HotP2P'06, Hot Topics in P2P Systems, Greece – 2006

Sébastien Tixeuil, Algotel 2006 – 2006

Sébastien Tixeuil, DISC 2006, Stockholm Sweden, September 18-20 – 2006.

Franck Cappello, GP2PC'06, Singapore, April 2006

Franck Cappello, ECG'2006, European Grid Conference, June 7-8, 2006.

Franck Cappello, ICPP'2006, the 2006 International Conference on Parallel Processing, Columbus, Ohio, USA, August 14-18, 2006.

Franck Cappello, Grid'2006, Barcelona, Spain, September 2006

P. Fraigniaud, 4th International Workshop on Efficient and Experimental Algorithms (WEA), Santorini Island, Greece, May 10-13, 2005. http://ru1.cti.gr/wea05/

P. Fraigniaud, 31st International Workshop on Graph-Theoretic Concepts in Computer Science (WG), Metz, France, June 23-25, 2005. http://lita.sciences.univ-metz.fr/~wg2005/

P. Fraigniaud, 19th International Symposium on Distributed Computing (DISC), (Program Chair), Cracow, Poland, September 26-29, 2005. http://www.mimuw.edu.pl/~disc2005/

P. Fraigniaud, 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rhodes Island, Greece, 25-29 April, 2006. http://www.ipdps.org/

P. Fraigniaud, 26th International Conference on Distributed Computing Systems (ICDCS), Lisboa, Portugal, July 4-7, 2006. http://icdcs2006.di.fc.ul.pt/

P. Fraigniaud, 12th European Conference on Parallel Computing (Euro-Par), Dresden, Germany, Aug. 29 - Sept. 1, 2006. http://www.europar2006.de/

P. Fraigniaud, 24th IASTED Conference on Parallel and Distributed Computing and Networks (PDCN), Innsbruck, Austria, February 14-16, 2006. http://www.iasted.org/conferences/2006/Innsbruck/pdcn.htm

P. Fraigniaud, 13th Colloquium on Structural Information and Communication Complexity (SIROCCO), Chester, UK, July 3-5, 2006. http://sirocco06.csc.liv.ac.uk/

Derrick Kondo, GP2PC'2006, "Global and Peer to Peer Computing", in association with CCGRID'2006, Singapore, 16-19 May 2006.

Serge Petiton, GP2PC'2006, "Global and Peer to Peer Computing", in association with CCGRID'2006, Singapore, 16-19 May 2006.

Serge Petiton, IMACS 2005

Serge Petiton, HeteroPar'2005

School and Workshop organization

Sébastien Tixeuil, Ecotel 2006 (winter school on telecommunications), program committee co-chair

Sébastien Tixeuil, Second Coregrid Workshop on Grid and P2P System Architecture, Paris, 16-17 January 2006.

Session Chairing

Franck Cappello, Session 2: ENABLING SYSTEMS, HCW 2005–14th Heterogeneous Computing Workshop, Denver, Colorado, USA, April 4, 2005

Franck Cappello, Session 14: Scheduling for Heterogeneous, Internet, and Grid Computing, IPDPS 2005, Denver, Colorado, USA, April 6, 2005

Sébastien Tixeuil, Session 1: Scheduling and MAC layer, WWAN 2005, Columbus, Ohio, USA, June 10, 2005

Sébastien Tixeuil, Session 3: Self-organization and Routing, WWAN 2005, Columbus, Ohio, USA, June 10, 2005

Sébastien Tixeuil, Session 1, SSS 2005, Barcelona, Spain, October 26-27, 2005

Serge Petiton IMACS'2005

Serge Petiton International Conference on Large Scale Scientific Computation, Bulgaria, 2005.

Participation in Workshops, Seminars and Miscellaneous Invitations

Invited International Conference

Franck Cappello, "Grid projects in France and Europe", Colloquium on "25 years of collaboration between Instituto de Informatica de l'UFRGS and France", Porto Alegre, November 2005.

Sébastien Tixeuil, "Self-stabilization with Byzantine containment", Colloquium on Dynamic Systems, Rennes, September 19th, 2005.

Franck Cappello, "Grid'5000", Workshop Grid@large, in conjunction with Euro-Par 2005, Lisbon, August 2005.

Franck Cappello, "Dependability in Grids", Workshop of the IFIP WG 10.4 on Dependable Computing and Fault Tolerance, Yokohama, July 2005.

Franck Cappello, "Grid research tools and Grid'5000", workshop "P2P: concept, outils et applications", Geneva, May 2005.

Franck Cappello, "Dependability in Grids", panel "Dependability Challenges and Education Perspectives", Fifth European Dependable Computing Conference, Budapest, April 2005.

Franck Cappello, "Desktop Grid, Global Computing and P2P Distributed Systems", workshop on Advanced Grid Technologies, Systems & Services, Session: Grid Foundations for Business & Industry, IST Call 5, Brussels, February 2005.

Invited National Conference

Franck Cappello, "P2P ...", opening talk of the JRES conference, Marseille, December 2005.

Franck Cappello, "Grid'5000", Une Grille BioInformatique en France, Expériences et Perspectives, IBCP, Lyon, 16 June 2005.

Franck Cappello, "Grid'5000", Journée thématique Grilles et Clusters, Strasbourg, 7 June 2005.

Franck Cappello, "Grid'5000", IN2P3 Computing Centre, Grenoble, 27 May 2005.

Gilles Fedak, "Le projet XtremWeb" (The XtremWeb project), SMAI 2005, Congress of the Société de Mathématiques Appliquées et Industrielles, Evian, 25-28 June 2005

Schools, Workshops

Franck Cappello, "P2P et Desktop Grids", JRES'2005, 6th Journées Réseaux, Marseille, 5-9 December 2005.

Franck Cappello, "Grid'5000", Workshop LCG-France: 1ère rencontre IN2P3/STIC dans le cadre de la grille de calcul du LHC, Grenoble, 25 February 2005.

Franck Cappello, "Recherche en Grille dans les STIC", Workshop LCG-France: 1ère rencontre IN2P3/STIC dans le cadre de la grille de calcul du LHC, Grenoble, 25 February 2005.

Franck Cappello, "Grid'5000 : une plate-forme de grille expérimentale d'échelle nationale", Journée de veille scientifique et technologique sur les grilles, IRISA, Campus de Beaulieu, Rennes, 4 February 2005. Audio (http://www.irisa.fr/videos/irisatech/lesgrilles/cappello/st-son01.rm), video (http://www.irisa.fr/videos/irisatech/lesgrilles/cappello/st-video01.rm), PDF (http://www.irisa.fr/videos/irisatech/lesgrilles/cappello/F-Cappello.pdf).

Franck Cappello, "Grid'5000: une plate-forme de grille expérimentale d'échelle nationale", Grappes et grilles d'ordinateurs: état de l'art, INRIA Rhône-Alpes, Montbonnot, 3 February 2005. PDF (http://rev.inrialpes.fr/intechslides/2005-02-03/desprez.pdf)

Franck Cappello, "Une introduction aux Grilles", Iliatech, Journée de veille Scientifique et Technologique, INRIA Rocquencourt, Tuesday 18 January 2005

Franck Cappello, "Présentation du projet national Grid'5000", Iliatech, Journée de veille Scientifique et Technologique, INRIA Rocquencourt, Tuesday 18 January 2005

P. Fraigniaud, "Routing and Lookup in Peer-to-Peer Systems", 3rd Complex Systems Summer School, Valparaíso, Chile, 10-21 January 2005.

P. Fraigniaud, "Navigation dans les réseaux sociaux", Ecole thématique sur les Grands Réseaux d'Interactions, Paris, 25-29 April 2005.

P. Fraigniaud, "Greedy routing in tree-decomposed graphs", Workshop of the COST Action 295 DYNAMO on Dynamic Communication Networks: Foundations and Algorithms, Cracow, Poland, 29-30 September 2005.

P. Fraigniaud, "Greedy routing in tree-decomposed graphs", Workshop on Graph Classes, Width Parameters and Optimization, Prague, October 17 - 19, 2005.

P. Fraigniaud, "Routage glouton dans les décompositions arborescentes", 7èmes Journées Graphes et Algorithmes, Bordeaux, 3-4 November 2005.

P. Fraigniaud, "Le graphe de de Bruijn, ou le Vilain Petit Canard deviendra-t-il Cygne ?", Colloquium in honour of Jean-Claude Bermond, Sophia-Antipolis, 8-9 December 2005.

P. Fraigniaud, "Graph exploration and graph searching", Discrete Mathematics Summer School, Valparaíso, Chile, 9-13 January 2006.

P. Fraigniaud, "Aspects fondamentaux des réseaux décentralisés", GRID and P2P Spring School, Crans-Montana, Switzerland, 6-10 March 2006.

Gilles Fedak, "Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent", NSF/INRIA Workshop: Scheduling for Large-Scale Distributed Platforms, La Jolla, California, November 12-14, 2005

Seminars

Franck Cappello, "Grid'5000", Meeting of the ACI "Masse de Données" Art3D project, Le Louvre, Paris, 23 March 2005.

Sébastien Tixeuil, "Beyond Self-stabilization", Meeting of the ACI "Sécurité et Informatique" SR2I project, Alcatel Marcoussis, 7 March 2005

Sébastien Tixeuil, "On Self-stabilization and Sensor Networks", Kent State University invited seminar, USA, 6 April 2005

Franck Cappello, "When Parallel Computing takes risks", Department of computer science, Parallel Programming Laboratory, University of Illinois at Urbana Champaign, 13 April 2005

Franck Cappello, "Grid'5000 status", MatsuLab, TITECH, Japan, September 2005

Sébastien Tixeuil, "Self-stabilization with Byzantine Containment", LAMI invited seminar, Université d'Evry, France, 17 November 2005

Sébastien Tixeuil, "Introduction to Self-stabilization", Universidade Federal da Bahia (UFBA), Brazil, 16 December 2005

Sébastien Tixeuil, "Self-stabilization and Sensor Networks", Universidade Federal da Bahia (UFBA), Brazil, 19 December 2005

Philippe Gauron, "Exploiter les lois de puissance et les petits mondes pour le pair-à-pair", journées TAROT, Université d'Évry, France, 18 March 2005

Gilles Fedak, "Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent", NSF/INRIA Workshop Scheduling for Large-Scale Distributed Platforms, La Jolla, California, November 13, 2005

Serge Petiton, "Peer to Peer Linear Algebra Computing", University College Dublin, Ireland, 18 February 2005

Serge Petiton, "Matrix Global and Grid Computing", AIST, Tsukuba, Japan, 22 April 2005

Serge Petiton, "A Survey on Peer to Peer Parallel Scientific Global Computing", NEC, Tokyo, Japan, 27 April 2005

Serge Petiton, "Vers le calcul scientifique global pair à pair", Journées calcul du CEA (CEA computing days), La Baule, 7 June 2005

Serge Petiton, "Large Scale Matrix Global Computing", Institut de Physique, Tehran, Iran, 23 July 2005

Serge Petiton, "Very Large Global Computing on Heterogeneous Platforms", Google, Seattle, USA, 18 November 2005
