Section: Application Domains

Providing Environments for Experiments

Participants : Sébastien Badia, Tomasz Buchert, Pierre-Nicolas Clauss, Sylvain Contassot-Vivier, El Mehdi Fekari, Jens Gustedt, Emmanuel Jeanvoine, Lucas Nussbaum, Martin Quinson, Tinaherinantenaina Rakotoarivelo, Cristian Rosa, Luc Sarzyniec, Christophe Thiéry, Stéphane Vialle.

Simulating Distributed Applications

We are major contributors to the SimGrid framework (see 5.4 for the software description, and 6.2.1 for this year's new results), a collaboration with the University of Hawaii at Manoa and INRIA Grenoble-Rhône-Alpes, France. SimGrid enables the simulation of distributed applications in large-scale settings for the specific purpose of evaluating and assessing algorithms. Simulations not only yield repeatable results (which is nearly impossible when running applications on real experimental facilities) but also make it possible to explore wide ranges of platform and application scenarios. SimGrid implements realistic fluid network models that result in very fast yet precise simulations. It is one of the main simulation tools used in the Grid Computing community.
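To give an intuition of what a fluid (flow-level) network model computes, the sketch below shares link bandwidth among competing flows with max-min fairness, using progressive filling. It is a generic illustration under our own assumptions, not SimGrid's actual model or API: the function name `max_min_fair_share`, the link names and the capacities are all hypothetical.

```python
def max_min_fair_share(link_capacity, flow_routes):
    """Return {flow: rate} under max-min fairness (progressive filling).

    link_capacity: {link: capacity}; flow_routes: {flow: [links crossed]}.
    Hypothetical helper for illustration; every link named in a route
    must appear in link_capacity.
    """
    remaining = dict(link_capacity)
    active = {flow: set(route) for flow, route in flow_routes.items()}
    rates = {}
    while active:
        # Equal share that each link can still offer to the active flows crossing it.
        shares = {}
        for link, cap in remaining.items():
            n = sum(1 for route in active.values() if link in route)
            if n:
                shares[link] = cap / n
        if not shares:
            raise ValueError("every flow must cross at least one link")
        # The most constrained link fixes the rate of all flows that cross it.
        bottleneck = min(shares, key=shares.get)
        share = shares[bottleneck]
        for flow in [f for f, route in active.items() if bottleneck in route]:
            rates[flow] = share
            for link in active[flow]:
                remaining[link] -= share
            del active[flow]
    return rates


# Toy platform: flow B crosses both links, A and C cross one link each.
capacity = {"L1": 10.0, "L2": 4.0}
routes = {"A": ["L1"], "B": ["L1", "L2"], "C": ["L2"]}
rates = max_min_fair_share(capacity, routes)

# In a fluid model, a transfer time follows directly from the allocated rate.
transfer_time = 16.0 / rates["A"]
```

Here L2 is the bottleneck, so B and C each get 2.0, and A receives the 8.0 left on L1. Because rates are obtained by solving such a sharing problem instead of simulating individual packets, this kind of model can be evaluated quickly, which is the speed/precision trade-off mentioned above.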

Formally Assessing Distributed Algorithms

In joint research with Stephan Merz of the Veridis team of INRIA Nancy and LORIA, we are interested in the verification (essentially via model checking) of distributed and peer-to-peer algorithms. Whereas model checking is now routinely used for concurrent and embedded systems, existing algorithms and tools can rarely be applied effectively to the verification of asynchronous distributed algorithms and systems.

We are working on integrating these methods into the SimGrid tool to make them more accessible to non-experts. The expected benefit of such an integration is that programmers can complement simulation runs with exhaustive state space exploration in order to detect defects that would be hard to reproduce by testing. Indeed, a simulation platform provides a controlled execution environment that mediates interactions between processes, and between processes and the environment, and thus provides the basic functionality for implementing a model checker. The principal challenge is the state explosion problem, as a naive approach that systematically generates all possible process interleavings is infeasible beyond the most trivial programs. Moreover, it is impractical to store the set of global system states that have already been visited: the programs under analysis are arbitrary C programs with full access to the heap, making even a hashed representation of system states very difficult and costly to implement.
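The naive exploration described above can be sketched on a toy model. The code below enumerates every interleaving of two processes that each read a shared counter and then write it back incremented, and re-executes each schedule from the initial state instead of storing visited states. It is a minimal illustration under our own assumptions, not the model checker integrated into the simulator: all names (`interleavings`, `run`, `explore`, the `read`/`write` steps) are hypothetical.

```python
def interleavings(lengths):
    """Yield every schedule (tuple of process ids) that preserves each
    process's own step order; lengths[i] is the step count of process i."""
    def extend(prefix, done):
        if all(d == n for d, n in zip(done, lengths)):
            yield tuple(prefix)
            return
        for pid, (d, n) in enumerate(zip(done, lengths)):
            if d < n:
                next_done = done[:pid] + (d + 1,) + done[pid + 1:]
                yield from extend(prefix + [pid], next_done)
    yield from extend([], tuple(0 for _ in lengths))


def run(schedule, processes, make_state):
    """Replay one schedule from a fresh initial state (no state is stored)."""
    state = make_state()
    next_step = [0] * len(processes)
    for pid in schedule:
        processes[pid][next_step[pid]](state, pid)
        next_step[pid] += 1
    return state


def explore(processes, make_state, invariant):
    """Check the invariant on the final state of every interleaving."""
    schedules = list(interleavings([len(p) for p in processes]))
    violations = [s for s in schedules
                  if not invariant(run(s, processes, make_state))]
    return len(schedules), violations


# Toy processes: a non-atomic increment of the shared counter "x".
def read(state, pid):
    state["local"][pid] = state["x"]

def write(state, pid):
    state["x"] = state["local"][pid] + 1

processes = [[read, write], [read, write]]
total, violations = explore(
    processes,
    make_state=lambda: {"x": 0, "local": {}},
    invariant=lambda state: state["x"] == 2,
)
```

Of the 6 possible schedules, 4 lose an update and leave `x` at 1, a defect that a single test run could easily miss. The cost is visible even here: with k processes of n steps each, the number of schedules is (kn)!/(n!)^k, which is why a naive enumeration does not scale.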

Grid'5000

Grid'5000 is a scientific testbed for experimenting with a large variety of types of distributed systems, such as High Performance Computing, Clouds, P2P or Grids. It provides a unique combination of features to its users:

  • deployment of a user-provided operating system on bare hardware, with the Kadeploy tool developed in our team

  • access to various technologies (CPUs, high performance networks, etc.) at a large scale

  • dedicated network backbone, with monitoring and isolation features

  • programmable API, for scripted experiments.

Grid'5000 currently comprises 11 sites, including one in Nancy, managed by our team, and two geographically close to Nancy, in Reims and Luxembourg. The Nancy site is one of the most important, both in number of nodes and cores and in its contribution to the technical team.

With this combination of features, Grid'5000 is a world-leading testbed for research in its field, and plays a central role in our work on experimentation methodologies.

Emulation

Experimental testbeds such as Grid'5000 provide a stable environment, which is important for reproducible experiments. However, the experimental conditions provided by the testbed sometimes do not match the conditions required by an experiment in terms of computing power, network bandwidth, latency, topology, etc.

We are working on a software tool called Distem (see 5.2), based on another tool that we developed previously, Wrekavoc (see 5.3). The goal of Distem is to emulate a heterogeneous environment consisting of nodes with different compute and memory capacities and varying network bandwidth and latency. On such an emulated environment, it is possible to execute a real, unmodified application.

Distem runs on homogeneous Linux clusters and achieves this emulation by degrading the CPU and network performance of each node of the platform in a controlled way, thereby introducing the desired heterogeneity.
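The principle of emulation by degradation can be illustrated as follows: given the characteristics of a physical node and those of the node to emulate, one derives how much CPU to withhold, which bandwidth cap to apply and how much latency to add. This is only a sketch of the idea under our own assumptions; it does not use or reproduce Distem's interface, and the `NodeSpec` type, the `degradation_plan` function and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of a node (physical or emulated)."""
    cpu_mhz: float
    bandwidth_mbps: float
    latency_ms: float


def degradation_plan(physical, target):
    """Compute the settings needed to make `physical` behave like `target`.

    Emulation by degradation can only slow a node down, so a target that is
    faster than the physical node in any dimension is rejected.
    """
    if (target.cpu_mhz > physical.cpu_mhz
            or target.bandwidth_mbps > physical.bandwidth_mbps
            or target.latency_ms < physical.latency_ms):
        raise ValueError("target exceeds the capabilities of the physical node")
    return {
        # Fraction of the physical CPU left available to the application.
        "cpu_fraction": target.cpu_mhz / physical.cpu_mhz,
        # Bandwidth cap to enforce on the node's network interface.
        "bandwidth_limit_mbps": target.bandwidth_mbps,
        # Artificial delay to add on top of the native latency.
        "extra_latency_ms": target.latency_ms - physical.latency_ms,
    }


# One homogeneous cluster node emulating a slower, poorly connected machine.
cluster_node = NodeSpec(cpu_mhz=2000.0, bandwidth_mbps=1000.0, latency_ms=0.5)
slow_node = NodeSpec(cpu_mhz=1000.0, bandwidth_mbps=100.0, latency_ms=20.5)
plan = degradation_plan(cluster_node, slow_node)
```

Applying a different plan to each node of a homogeneous cluster yields a heterogeneous platform on which the unmodified application can then be run.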

InterCell

InterCell aims at setting up a cluster (256 PCs) for interactive fine-grained computation. It is funded by the Lorraine Region (CPER 2007) and managed at the Metz campus of SUPÉLEC.

The purpose is to ease the design of fine-grained parallel programs by providing interactive tools for the visualization and management of the execution (debugging, step-by-step execution, etc.). The parallelization effort is not visible to the user, since InterCell relies on the dedicated parXXL framework (see 5.1). Among the applications being tested is the interactive simulation of PDEs in physics, based on the Escapade project (see [29]).

Experimental platform of GPU clusters

We participate in the scientific exploitation of two experimental 16-node GPU clusters installed at the SUPÉLEC Metz site. One cluster is already equipped with GPUs of the "Fermi" architecture, and the second should be upgraded at the beginning of 2012. This platform supports experiments in scientific programming on GPUs ("GPGPU") and, thanks to specific monitoring hardware, the tracking of computing and energy performance. The development environments available on these GPU clusters are mainly the gcc suite and its OpenMP library, OpenMPI, and the CUDA environment with NVIDIA's nvcc compiler.