Section: New Results
Autonomous Management of Virtualized Infrastructures
Participants : Amine Belhaj, Roberto-Gioacchino Cascella, Stefania Costache, Florian Dudouet, Eugen Feller, Jérôme Gallard, Rémy Garrigue, Piyush Harsh, Yvon Jégou, Chokchai Leangsuksun, Pierre Lemarinier, Christine Morin, Nikos Parlavantzas, Thierry Priol, Yann Radenac, Pierre Riteau.
Cloud Federations
Participants : Roberto-Gioacchino Cascella, Florian Dudouet, Piyush Harsh, Yvon Jégou, Christine Morin, Thierry Priol, Pierre Riteau.
Virtual Execution Platforms in Cloud Federations
In the context of the Contrail European project, we have defined the overall architecture of the Contrail software stack for cloud computing on top of cloud federations [51] . We focused on the design and implementation of a first basic prototype of the Virtual Execution Platform (VEP) component [52] . VEP is in charge of provisioning hardware resources from cloud providers and of deploying and running distributed applications submitted by users under the control of a negotiated Service Level Agreement (SLA) [16] . Within the VEP software, the REST interface, OVF parsing, SSL security and authorization modules are under active development and at various levels of integration. A first demo version of VEP running on top of the OpenNebula IaaS cloud was successfully demonstrated at the first annual project review.
Efficient virtual cluster migration
We continued our work on Shrinker, a system providing efficient live migration of virtual clusters over wide area networks. We improved its design so that deduplication is coordinated on the source site of the migration. Deduplication is now performed only within an individual virtual cluster, in order to limit security risks and to avoid degrading the performance of other users' virtual machines. We carried out a comprehensive performance evaluation of Shrinker. An article presenting the design, implementation, and performance of Shrinker was published in [28] .
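The core idea of deduplicating a virtual cluster migration can be sketched in a few lines of Python: the source hashes every memory page and sends a page's contents only the first time that content appears in the cluster, sending just the hash for later copies; the destination resolves hashes back into pages. This is a simplified single-process illustration, assuming SHA-1 content hashes and an in-order message stream. The functions `deduplicate_cluster` and `rebuild_cluster` and the message format are hypothetical and do not describe Shrinker's actual protocol, which coordinates deduplication across the hosts of the source site.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical message format, one per memory page:
# (vm_name, page_index, digest, payload). payload is None when a page with
# the same digest was already sent for this virtual cluster.
Message = Tuple[str, int, str, Optional[bytes]]


def deduplicate_cluster(cluster: Dict[str, List[bytes]]) -> List[Message]:
    """Source side: deduplicate pages across the VMs of ONE virtual cluster.

    The set of known digests is local to this cluster, so pages are never
    compared with those of other users' clusters.
    """
    sent = set()  # digests whose contents the destination already holds
    messages: List[Message] = []
    for vm_name, pages in cluster.items():
        for index, page in enumerate(pages):
            digest = hashlib.sha1(page).hexdigest()
            if digest in sent:
                messages.append((vm_name, index, digest, None))  # hash only
            else:
                sent.add(digest)
                messages.append((vm_name, index, digest, page))  # full page
    return messages


def rebuild_cluster(messages: List[Message]) -> Dict[str, List[bytes]]:
    """Destination side: turn digest references back into page contents.

    Relies on the in-order stream: a page's data always precedes any
    reference to its digest.
    """
    store: Dict[str, bytes] = {}
    cluster: Dict[str, List[bytes]] = {}
    for vm_name, index, digest, payload in messages:
        if payload is not None:
            store[digest] = payload
        pages = cluster.setdefault(vm_name, [])
        if index != len(pages):
            raise ValueError("pages of a VM must arrive in order")
        pages.append(store[digest])
    return cluster
```

For example, with two VMs that share one identical page, only three of the four pages carry data:

```python
cluster = {"vm1": [b"A" * 4096, b"B" * 4096], "vm2": [b"A" * 4096, b"C" * 4096]}
msgs = deduplicate_cluster(cluster)
print(sum(1 for m in msgs if m[3] is not None))  # 3
```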
Elastic Map/Reduce over Cloud Federations
We worked on the development of Resilin, a system for easily creating execution platforms over distributed cloud resources to run MapReduce computations. Resilin implements the Amazon Elastic MapReduce web service API and uses resources from private and community clouds. Resilin takes care of provisioning, configuring and managing cloud-based Hadoop execution platforms, potentially using multiple clouds. An initial implementation of Resilin was presented at the CCA '11 workshop [36] . Further development was carried out during Ancuta Iordache's master's internship, and the results of this work were published as a research report [45] .
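Provisioning a Hadoop platform over several clouds requires deciding how many virtual machines to request from each one. The report does not describe Resilin's provisioning policy, so the Python sketch below is purely illustrative: it assumes a hypothetical greedy policy that fills clouds in preference order (for example a private cloud first, then community clouds) up to their free capacity. The function `split_across_clouds` and the cloud names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_across_clouds(requested: int, clouds: List[Tuple[str, int]]) -> Dict[str, int]:
    """Greedily place `requested` Hadoop VMs on clouds in preference order.

    `clouds` is a list of (cloud_name, free_vm_slots) pairs. Raises
    ValueError when the combined free capacity is too small.
    """
    plan: Dict[str, int] = {}
    remaining = requested
    for name, free in clouds:
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining > 0:
        raise ValueError(f"not enough capacity: {remaining} VMs left unplaced")
    return plan
```

For instance, `split_across_clouds(10, [("private", 4), ("community-a", 3), ("community-b", 8)])` returns `{"private": 4, "community-a": 3, "community-b": 3}`. A real system would also weigh data locality and inter-cloud bandwidth, which this sketch ignores.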
Sky Computing Experiments
We continued our collaboration with the University of Florida on sky computing experiments, which led to the publication of a book chapter [38] .
Infrastructure as a Service Clouds
Participants : Stefania Costache, Eugen Feller, Yvon Jégou, Christine Morin, Nikos Parlavantzas, Pierre Riteau.
Large scale Energy aware self-healing IaaS
Our research in 2011 was twofold. First, we implemented a prototype of Snooze, our previously proposed scalable, fault-tolerant and energy-aware virtual machine (VM) management framework, and evaluated it on the Grid'5000 testbed [41] . We focused on implementing the self-healing mechanisms and protocols, and on integrating into Snooze the system-level mechanisms (e.g., for automatically switching cluster nodes on and off) that support energy-aware resource management algorithms. Our experimental results show that the fault-tolerance features of the framework do not impact application performance. Moreover, distributed VM management incurs negligible overhead, and the system remains highly scalable as the amount of resources increases. Second, we developed a nature-inspired VM placement algorithm [24] based on the Ant Colony Optimization (ACO) meta-heuristic and evaluated it through simulations.
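To illustrate how an ACO meta-heuristic applies to VM placement, the Python sketch below treats consolidation as one-dimensional bin packing: each "ant" fills hosts one at a time, choosing the next VM among those that still fit with a probability weighted by a pheromone trail on the (VM, host) pair and by the host utilization the placement would produce; trails then evaporate and the best solution found so far is reinforced. This is a simplified single-resource sketch, not the algorithm of [24]; the function `aco_placement`, the heuristic and the parameter values (`alpha`, `beta`, `rho`, numbers of ants and cycles) are illustrative assumptions.

```python
import random
from typing import List, Sequence


def aco_placement(demands: Sequence[float], capacity: float = 1.0,
                  n_ants: int = 10, n_cycles: int = 30, alpha: float = 1.0,
                  beta: float = 2.0, rho: float = 0.1, seed: int = 0) -> List[int]:
    """Map each VM (by index) to a host index, trying to use few hosts.

    Each ant fills hosts one after another. On the current host it picks a
    VM among those that still fit, with probability proportional to
    pheromone[vm][host]**alpha * utilization_after_placement**beta.
    When no remaining VM fits, the ant opens the next host.
    """
    if any(not 0 < d <= capacity for d in demands):
        raise ValueError("every VM demand must be in (0, capacity]")
    rng = random.Random(seed)
    n = len(demands)
    # Pheromone per (VM, host) pair; n hosts are always enough.
    tau = [[1.0] * n for _ in range(n)]
    best, best_hosts = list(range(n)), n  # start from one VM per host

    for _ in range(n_cycles):
        for _ in range(n_ants):
            assignment = [-1] * n
            unplaced = set(range(n))
            host, load = 0, 0.0
            while unplaced:
                fitting = [v for v in sorted(unplaced) if load + demands[v] <= capacity]
                if not fitting:
                    host, load = host + 1, 0.0  # current host is full: open a new one
                    continue
                weights = [tau[v][host] ** alpha * ((load + demands[v]) / capacity) ** beta
                           for v in fitting]
                vm = rng.choices(fitting, weights=weights)[0]
                assignment[vm] = host
                load += demands[vm]
                unplaced.remove(vm)
            if host + 1 < best_hosts:
                best, best_hosts = assignment, host + 1
        # Evaporate all trails, then reinforce the best solution found so far.
        for v in range(n):
            for h in range(n):
                tau[v][h] *= 1.0 - rho
            tau[v][best[v]] += 1.0 / best_hosts
    return best
```

With demands expressed as percentages of a host's CPU, `aco_placement([50, 50, 30, 70, 20, 80], capacity=100)` returns a list giving the host index of each VM; three hosts is the optimum for this input. Rewarding high utilization in the heuristic steers ants toward tightly packed hosts, which is what lets idle nodes be switched off.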
Resource Management in Private Clouds
We focused on the design of a resource management system for private clouds that supports different application SLAs while maximizing the resource utilization of the infrastructure. Because users must also be given incentives to truthfully express the value they place on the performance of their applications, we investigated existing economic models for resource allocation. As a result, we proposed a novel resource management architecture based on a virtual economy. In this system, independent agents monitor application performance and dynamically provision virtual machines from the infrastructure within the user's budget constraints. Virtual machines are provisioned through a proportional-share auction, which allows fine-grained resource sharing with low algorithmic complexity. This work was done as part of a collaboration with EDF R&D and was published at the VHPC 2011 workshop [22] . We have also implemented a first prototype of this proposal: in collaboration with Vydia Rajagopalan (Master student at VU Amsterdam), we implemented the proportional-share auction scheduler and integrated it with the OpenNebula Virtual Infrastructure Manager. We then extended this work with the design of agents that execute scientific applications (MPI and Bag-of-Tasks applications) under deadline and budget constraints. Experimental evaluations are currently being performed on Grid'5000.
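In a proportional-share auction, each agent places a bid and receives a fraction of the resource equal to its bid divided by the sum of all bids, i.e. share_i = C * b_i / sum_j b_j for capacity C. The allocation takes a single pass over the bids, which is why the mechanism is cheap to run. The minimal Python sketch below assumes a single divisible resource (e.g., the CPU capacity of a host) and bids expressed in the virtual currency; `proportional_share` and `unit_price` are hypothetical names, not the API of the scheduler integrated with OpenNebula.

```python
from typing import Dict


def proportional_share(bids: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Give each bidder capacity * bid / sum(bids); one pass, O(n)."""
    total = sum(bids.values())
    if total <= 0:
        # Nobody bid anything: nothing is allocated.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


def unit_price(bids: Dict[str, float], capacity: float) -> float:
    """Implicit price of one resource unit: total bids over capacity."""
    return sum(bids.values()) / capacity
```

For example, with 8 CPU units and bids of 1 and 3, `proportional_share({"alice": 1.0, "bob": 3.0}, capacity=8)` yields `{"alice": 2.0, "bob": 6.0}` at a unit price of 0.5. An agent whose application falls behind its deadline can raise its bid, within its budget, to obtain a larger share in the next round.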
Resilience
We initiated a collaboration with Box Leangsuksun's group on the high availability of cloud infrastructures. We carried out a preliminary study of how the HA-OSCAR environment, developed at Louisiana Tech University, could be used to ensure the high availability of critical services in Nimbus IaaS clouds [30] .
XtreemOS Grid Distributed Operating System
Participants : Amine Belhaj, Jérôme Gallard, Rémy Garrigue, Yvon Jégou, Christine Morin, Yann Radenac.
Facilitating Experiments with XtreemOS Grid System
The XtreemOS Grid system, developed in the framework of the XtreemOS European project, is now evolving as open source software within a community driven by INRIA in the framework of the XtreemOS Easy ADT. We have provided first-level support to users and maintained the XtreemOS website, wiki and mailing lists. We have updated the XtreemOS documentation to reflect the evolution of the system. We facilitated access to XtreemOS in three ways: maintaining an open public testbed running XtreemOS, providing ready-to-use XtreemOS virtual machines, and developing tools to automatically deploy the XtreemOS Grid system on the Grid'5000 large-scale experimentation platform. In 2011, we finalized version 3.0 of the XtreemOS system and ported it to the openSUSE 11.4 Linux distribution. We performed a number of tests to validate the installation, configuration and execution of this openSUSE-based version. An incremental integration process has been set up to facilitate the integration of patches and bug fixes. We have run a number of experiments with the Mandriva-based version of XtreemOS 3.0: MPI programs, the Salomé numerical simulation platform, and bio-informatics applications. In the framework of the COOP project funded by ANR, Yann Radenac contributed to the XtreemOS code by fixing bugs, cleaning up the source code to improve maintainability, and adding a few minor features.
Resource Management for Dynamic Applications
In the framework of the COOP project funded by ANR, we compared the features offered by CooRM, the resource manager for dynamic applications developed by Christian Perez and Cristian Klein at ENS Lyon, with those provided by the XtreemOS Grid system. A plan has been drawn up to adapt CooRM to the XtreemOS system and to extend the XtreemOS API with a CooRM-like interface [53] . We worked on the definition of a variant of CooRM that can operate within the XtreemOS Grid operating system.