Section: Overall Objectives

Research Directions

The Myriads project-team aims at dependable execution of applications, particularly, but not exclusively, those relying on Service-Oriented Architectures, and at managing resources in virtualized infrastructures so as to guarantee SLA terms to resource users and efficient resource management (energy efficiency, business efficiency, etc.) to resource suppliers.

Our research activities are organized along three main directions (structuring the remainder of this section): (i) autonomous management of virtualized infrastructures, (ii) dynamic adaptation of service-based applications, and (iii) investigation of an unconventional, chemically inspired programming model for autonomous service computing.

Autonomous Management of Virtualized Infrastructures

With virtualized infrastructures (clouds), computing and storage become a utility. With Infrastructure-as-a-Service (IaaS), cloud providers offer plain resources such as x86 virtual machines (VMs), IP networking and unstructured storage. These virtual machines may already be configured to support typical computation frameworks such as bag of tasks or MapReduce, integrating autonomous elasticity management. By combining a private cloud with external resources from commercial or partner cloud providers, companies will rely on a federation of clouds as their computing infrastructure. A federation of clouds allows them to quickly add temporary resources when needed to handle peak loads. Similarly, it allows scientific institutions to bundle their resources for joint projects. We envision a peer-to-peer model in which a given company or institution is both a cloud provider during periods when its IT infrastructure is not used at its maximal capacity and a cloud customer during periods of peak activity. Moreover, it is likely that in the future huge data centres will reach their limits in terms of size due to energy consumption considerations, leading to a new landscape with a wide diversity of clouds (from small to large clouds, from clouds based on data centres to clouds based on highly dynamic distributed resources). We can thus anticipate the emergence of highly dynamic federations of virtualized infrastructures made up of different clouds. We intend to design and implement system services and mechanisms for autonomous resource management in federations of virtualized infrastructures.

SLA-driven PaaS over Cloud Federations

Platform as a Service (PaaS) promises to ease building and deploying applications, shielding developers from the complexity of underlying federated clouds. To fulfill its promise, PaaS should facilitate specifying and enforcing the QoS objectives of applications (e.g., performance objectives). These objectives are typically formalized in Service Level Agreements (SLAs) governing the interactions between the PaaS and hosted applications. The SLAs should be enforced automatically, which is essential for accommodating the dynamism of application requirements and of the capabilities of the underlying environment. Current PaaS offerings, such as Google App Engine and Microsoft Azure, include some form of SLA support, but this support is typically ad-hoc, limited to specific software stacks and to specific QoS properties.

Our main goal is to integrate flexible QoS support in PaaS over cloud federations. Specifically, we will develop an autonomous management solution for ensuring application SLAs while meeting PaaS-provider objectives, notably minimizing costs. The solution will include policies for autonomously providing a wide range of QoS guarantees to applications, focusing mainly on scalability, performance, and dependability guarantees. These policies will handle dynamic variations in workloads, application requirements, resource costs and availabilities by taking advantage of the on-demand elasticity and cloud-bursting capabilities of the federated infrastructure. The solution will enable performing in a uniform and efficient way diverse management activities, such as customizing middleware components and migrating VMs across clouds; these activities will build on the virtualized infrastructure management mechanisms, described in the following paragraphs.

Several research challenges arise in this context. One challenge is translating from SLAs specifying application-related properties (e.g., fault tolerance) to federation-level SLAs specifying properties of virtualized resources (e.g., the number and type of VMs). This translation needs to be configurable and compliant with PaaS objectives. Another challenge is supporting the necessary decision-making techniques. Investigated techniques will range from policy-based techniques to control-theoretic and utility-based optimization techniques, as well as combined approaches. Designing the appropriate management structure also presents a significant challenge. The structure must scale to the size of cloud-based systems and be itself dependable and resilient to failures. Finally, the management solution must support openness in order to accommodate multiple objectives and policies and to allow the integration of different sensors, actuators, and external management solutions.
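
The SLA translation described above can be sketched as a simple mapping from application-level terms to a federation-level resource request. This is only an illustrative sketch, not the project's actual mechanism: the field names, the per-VM capacity figure, and the sizing rule are all invented for the example.

```python
import math

# Hypothetical application-level SLA: all field names are assumptions.
APP_SLA = {
    "max_latency_ms": 200,       # performance objective
    "fault_tolerance": True,     # must survive one VM failure
    "expected_req_per_s": 900,
}

REQS_PER_VM = 300  # assumed capacity of one VM of the chosen type

def translate_sla(app_sla):
    """Map application-level SLA terms to a federation-level VM request."""
    vms = math.ceil(app_sla["expected_req_per_s"] / REQS_PER_VM)
    if app_sla["fault_tolerance"]:
        vms += 1  # one spare VM so the service survives a single failure
    return {"vm_type": "m.small",          # illustrative VM type
            "vm_count": vms,
            "spread_across_clouds": app_sla["fault_tolerance"]}

request = translate_sla(APP_SLA)
assert request["vm_count"] == 4  # ceil(900/300) = 3, plus one spare
```

In a real translation layer, the sizing rule itself would be configurable by the PaaS provider, which is precisely what makes the translation compliant with PaaS objectives.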

Virtual Data Centers

Cloud computing allows organizations and enterprises to rapidly adapt the available computational resources to their needs. Small or medium enterprises can avoid managing their own data center and rent computational as well as storage capacity from cloud providers (outsourcing model). Large organizations already managing their own data centers can size them for the base load and rent extra capacity from cloud providers to support peak loads (cloud bursting model). In both forms, organization members can expect a uniform working environment provided by their organization: services, storage, etc. This environment should be as close as possible to the environment provided by the organization's own data centers in order to provide transparent cloud bursting. A uniform environment is also necessary when applications running on external clouds are migrated back to the organization's resources once these become free after a peak load. Supporting organizations requires providing organization administrators with means to manage and monitor the activity of their members on the cloud: authorization to access services, resource usage and quotas.

To support whole organizations, we will develop the concept of Elastic Virtual Data Center (VDC). A Virtual Data Center is defined by a set of services deployed by the organization on the cloud or on the organization's own resources and connected by a virtual network. The virtual machines supporting user applications deployed on a VDC are connected to the VDC virtual network and provide access to the organization's services. VDCs are elastic, as virtual compute resources are created when users start new applications and released when these applications terminate. The concept of Virtual Data Center requires some form of Virtual Organization (VO) framework to manage user credentials and roles and to control access to services and resources. The concept of SLA must also be adapted to the VDC context: SLAs are negotiated by the organization administrators with resource providers and then exploited by the organization members (the organization receives the bill for resource usage). An organization may wish to restrict the capability to exploit some forms of cloud resources to a limited group of members. It should be possible to define such policies through access rights on SLAs based on users' credentials in the VO.
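
The last point, restricting who may exploit a negotiated SLA based on VO roles, can be sketched as a simple policy check. The role names, SLA names, and policy structure below are purely hypothetical illustrations, not part of any actual VO framework.

```python
# Hypothetical per-SLA policies: which VO roles may exploit each SLA.
SLA_POLICIES = {
    "gold-compute-sla":  {"allowed_roles": {"admin", "hpc-user"}},
    "cheap-storage-sla": {"allowed_roles": {"admin", "member"}},
}

def may_use_sla(user_roles, sla_name):
    """A user may exploit an SLA if at least one of their VO roles
    is listed in that SLA's access policy."""
    policy = SLA_POLICIES.get(sla_name)
    return policy is not None and bool(policy["allowed_roles"] & set(user_roles))

assert may_use_sla({"member"}, "cheap-storage-sla")        # ordinary member: ok
assert not may_use_sla({"member"}, "gold-compute-sla")     # restricted resource
```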

Virtualized Infrastructure Management

In the future, service-based and computational applications will most likely be executed on top of distributed virtualized computing infrastructures built over physical resources provided by one or several data centers operated by different cloud providers. We are interested in designing and implementing system mechanisms and services for multi-cloud environments (e.g. cloud federations).

At the IaaS level, one of the challenges is to efficiently manage physical resources from the cloud provider's viewpoint while enforcing the SLA terms negotiated with cloud customers. We will propose efficient resource management algorithms and mechanisms. In particular, energy conservation in data centers is an important aspect to take into account in resource management.
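
One classical heuristic behind energy-aware resource management, used here only as an illustration and not as the project's specific algorithm, is to consolidate VMs onto as few hosts as possible so that idle hosts can be suspended. A first-fit-decreasing sketch, with invented integer capacity units:

```python
def consolidate(vm_loads, host_capacity):
    """First-fit-decreasing placement: pack VM loads onto as few hosts
    as possible, so the remaining hosts can be powered down.
    Loads and capacity are in the same (arbitrary) integer units."""
    hosts = []  # each entry = remaining capacity of one powered-on host
    for load in sorted(vm_loads, reverse=True):
        for i, free in enumerate(hosts):
            if free >= load:
                hosts[i] = free - load  # place VM on an already-on host
                break
        else:
            hosts.append(host_capacity - load)  # power on a new host
    return len(hosts)

# Six VMs with these CPU demands fit on 2 hosts of capacity 100:
assert consolidate([50, 40, 30, 30, 20, 20], 100) == 2
```

Real placement decisions must of course also honor negotiated SLA terms (availability, co-location constraints), which is exactly the tension this research direction addresses.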

In the context of virtualized infrastructures, we call a virtual execution platform (VEP) a collection of VMs executing a given distributed application. We plan to develop mechanisms for managing the whole life-cycle of VEPs from their deployment to their termination in a multi-cloud context. One of the key issues is ensuring interoperability. Different IaaS clouds may provide different interfaces and run heterogeneous hypervisors (Xen, VMware, KVM or even Linux containers). We will develop generic system level mechanisms conforming to cloud standards (e.g. DMTF OVF & CIMI, OGF OCCI, SNIA CDMI...) to deal with heterogeneous IaaS clouds and also to attempt to limit the vendor lock-in that is prevalent today. When deploying a VEP, we need to take into account the SLA terms negotiated between the cloud provider and customer. For instance, resource reservation mechanisms will be studied in order to provide guarantees in terms of resource availability. Moreover, we will develop the monitoring and measurement mechanisms needed to assess relevant SLA terms and detect any SLA violation. We also plan to develop efficient mechanisms to support VEP horizontal and vertical elasticity in the framework of cloud federations.
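
The horizontal elasticity mentioned above can be illustrated with a minimal scaling rule driven by SLA monitoring. This is a hedged sketch only: the metric, thresholds, and one-VM-at-a-time policy are assumptions, not the mechanisms the team will actually develop.

```python
def elasticity_decision(observed_latency_ms, sla_max_latency_ms, current_vms):
    """Return the new VM count for a VEP: scale out on SLA violation,
    scale in when there is ample headroom, otherwise keep the size."""
    if observed_latency_ms > sla_max_latency_ms:
        return current_vms + 1                            # violation: add a VM
    if observed_latency_ms < 0.5 * sla_max_latency_ms and current_vms > 1:
        return current_vms - 1                            # headroom: release a VM
    return current_vms

assert elasticity_decision(250, 200, 3) == 4   # SLA violated -> scale out
assert elasticity_decision(80, 200, 3) == 2    # well under SLA -> scale in
assert elasticity_decision(150, 200, 3) == 3   # within bounds -> unchanged
```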

We envision that in the future Internet, a VEP or part of a VEP may migrate from one IaaS cloud to another. While VM migration has been extensively studied within a single data center, providing efficient VM migration mechanisms in a WAN environment is still challenging [52], [47]. In a multi-cloud context, it is essential to provide mechanisms allowing secure and efficient communication between VMs belonging to the same VEP, and between these VMs and their user, even in the presence of VM migration.

Heterogeneous Cloud Infrastructure Management

Today’s cloud platforms are missing out on the revolution in new hardware and network technologies for realising vastly richer computational, communication, and storage resources. Technologies such as Field-Programmable Gate Arrays (FPGAs), General-Purpose Graphics Processing Units (GPGPUs), programmable network routers, and solid-state disks promise increased performance, reduced energy consumption, and lower cost profiles. However, their heterogeneity and complexity make integrating them into the standard Platform as a Service (PaaS) framework a fundamental challenge.

Our main challenge in this context is to automate the choice of resources that should be given to each application. To execute an application, a cloud user submits an SLO document specifying non-functional requirements for this execution, such as the maximum execution latency or the maximum monetary cost. The goal of the platform developed in the HARNESS European project (see Section 8.3.1.6) is to deploy applications over well-chosen sets of resources such that the SLO is respected. This is realised as follows: (i) building a performance model of each application; (ii) choosing the implementation and the set of cloud resources that best satisfy the SLO; (iii) deploying the application over these resources; (iv) scheduling access to these resources.
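
Step (ii) above can be sketched as a cost-minimizing selection under an SLO constraint. The candidate configurations, their prices, and their model-predicted latencies below are invented for illustration; they stand in for the output of step (i)'s performance model.

```python
# Hypothetical candidate resource sets with model-predicted performance.
CANDIDATES = [
    {"name": "4-vm-cluster", "cost_per_h": 2.0, "predicted_latency_ms": 180},
    {"name": "2-vm+gpu",     "cost_per_h": 3.5, "predicted_latency_ms": 90},
    {"name": "1-vm",         "cost_per_h": 0.8, "predicted_latency_ms": 400},
]

def choose(slo_max_latency_ms):
    """Cheapest candidate whose predicted latency meets the SLO,
    or None if no candidate is feasible."""
    feasible = [c for c in CANDIDATES
                if c["predicted_latency_ms"] <= slo_max_latency_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["cost_per_h"])["name"]

assert choose(200) == "4-vm-cluster"  # 1-vm misses the SLO; cluster beats GPU on cost
assert choose(100) == "2-vm+gpu"      # only the GPU configuration is fast enough
```

The interesting research problem is that, with heterogeneous resources such as FPGAs and GPGPUs, the performance model feeding this selection is itself hard to build.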

Multilevel Dynamic Adaptation of Service-based Applications

In the Future Internet, most applications will be built by composing independent software elements: services. A Service-Oriented Architecture (SOA) should be able to work in large-scale and open environments where services are not always available and may even appear and disappear at any time.

Applications built as compositions of services need to ensure some Quality of Service (QoS) despite the volatility of services, to make clever use of new services, and to satisfy changing end-user needs.

There is therefore a need for dynamic adaptation of applications and services in order to modify their structure and behaviour.

The task of making software adaptable is very difficult at many different levels:

  • At the business level, processes may need to be reorganized when some services cannot meet their Service Level Agreement (SLA).

  • At the service composition level, applications may have to change their configuration dynamically in order to take into account new needs from the business level or new constraints from the service and infrastructure levels. At this level, most applications are distributed and there is a strong need for coordinated adaptation.

  • At the infrastructure level, the state of resources (networks, processors, memory, ...) has to be taken into account by service execution engines in order to make clever use of these resources, for instance by accounting for resource availability and energy consumption. At this level there is a strong requirement for cooperation with the underlying operating system.

Moreover, the adaptations at these different levels need to be coordinated. In the Myriads project-team we address mainly the infrastructure and service composition layers.

Our main challenge is thus to build generic and concrete frameworks for self-adaptation of services and service-based applications at runtime. The basic steps of an adaptation framework are Monitoring, Analysis/Decision, Planning and Execution, following the MAPE model proposed in [58]. We intend to improve this basic framework by using models at runtime to validate adaptation strategies and by establishing a close cooperation with the underlying operating system.

We will pay special attention to each step of the MAPE model. For Monitoring, we will design high-level composite events; for the Decision phase, we work on different means to support decision policies, such as rule-based engines and utility-function-based engines, and we will also work on the use of an autonomic control loop for learning algorithms; for Planning, we investigate on-the-fly planning of adaptation actions, allowing the parallelization and distribution of actions. Finally, for the Execution step, our research activities aim to design and implement dynamic adaptation mechanisms allowing a service to self-adapt according to the required QoS and the underlying resource management system.
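
The four MAPE steps can be sketched as a minimal control-loop skeleton. This is a bare illustration of the loop structure from the cited model, not the team's framework: the sensor, the threshold rule, and the "scale_out" action are stand-ins for real monitoring probes, decision engines, and actuators.

```python
class MapeLoop:
    """Skeleton of a Monitor-Analyze-Plan-Execute autonomic loop."""

    def __init__(self, sensor, actuator, threshold):
        self.sensor, self.actuator, self.threshold = sensor, actuator, threshold

    def monitor(self):
        return self.sensor()                        # M: collect a metric

    def analyze(self, value):
        return value > self.threshold               # A: detect a QoS problem

    def plan(self, violated):
        return ["scale_out"] if violated else []    # P: choose adaptation actions

    def execute(self, actions):
        for action in actions:                      # E: apply actions via actuators
            self.actuator(action)

    def step(self):
        self.execute(self.plan(self.analyze(self.monitor())))

applied = []
loop = MapeLoop(sensor=lambda: 0.95, actuator=applied.append, threshold=0.8)
loop.step()
assert applied == ["scale_out"]  # load 0.95 exceeds the 0.8 threshold
```

The research directions listed above refine each method of this skeleton: composite events in monitor(), rule- or utility-based engines in analyze(), on-the-fly distributed planning in plan(), and QoS-aware mechanisms in execute().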

We then intend to extend this model to take into account proactive adaptation, to ensure certain properties during adaptation, and to monitor and adapt the adaptation process itself.

An important research direction is the coordination of adaptation at different levels. We will mainly consider the cooperation between the application level and the underlying operating system in order to ensure efficient and consistent adaptation decisions. This work is closely related to the activity on autonomous management of virtualized infrastructures.

We are also investigating the chemical approach as an alternative to such frameworks for providing autonomic properties to applications.

A Chemical Approach for Autonomous Service Computing

While the very nature of the Internet is the result of a decentralized vision of the digital world, the Internet of Services today tends to be supported by highly centralized platforms and software (data centers, application infrastructures such as Google's or Amazon's, etc.). These architectures suffer from technical problems such as lack of fault tolerance, but also raise societal and environmental issues, such as privacy or energy consumption. Our key challenge is to promote a decentralized vision of service infrastructures, clearly separating the expression (description, specification) of the platform from its implementation.

Chemical Expression of Interactions

As programming service infrastructures (from the user's point of view) mainly means expressing the coordination of services, we need an expressive, high-level language that abstracts low-level implementation details away from the user while being able to model in a simple way the nature of service infrastructures.

Existing standardized languages do not provide this level of abstraction (they mix the expression of service coordination with implementation details). Within the chemical paradigm, a program is seen as a solution in which molecules (data) float and react together to produce new data according to rules (programs). Such a paradigm, implicitly parallel and distributed, appears to be a good candidate to express high-level behaviors. The language naturally focuses on the coordination of distributed autonomous entities. Thus, our first objective is to extend the semantics of chemical programs in order to model not only a distributed execution of a service coordination, but also the interactions between the different molecules within the Internet of Services (users, companies, services, advertisements, requests, ...). Finally, we also investigate expressing quality of service in a chemical context.
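
The chemical paradigm can be illustrated with a classic toy program in the Gamma style: the "solution" is a multiset of molecules, and a single reaction rule rewrites it until no pair can react. Here the rule replaces any two numbers by their maximum, so the program computes the maximum of the multiset; this sequential simulation is only a sketch of the semantics, not a distributed implementation.

```python
import random

def chemical_max(solution):
    """Simulate the chemical program whose sole rule is:
        x, y  ->  max(x, y)
    Reactions proceed nondeterministically (molecules 'meet' at random),
    yet the result is deterministic: the maximum of the multiset."""
    solution = list(solution)
    while len(solution) > 1:
        # Pick two molecules at random, as in a well-stirred solution.
        x = solution.pop(random.randrange(len(solution)))
        y = solution.pop(random.randrange(len(solution)))
        solution.append(max(x, y))  # apply the reaction rule
    return solution[0]             # inert solution: no reaction possible

assert chemical_max([3, 1, 4, 1, 5, 9, 2, 6]) == 9
```

The implicit parallelism is visible here: disjoint pairs of molecules could react simultaneously, which is what makes the paradigm attractive for coordinating distributed autonomous entities.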

Distributed Implementation of the Chemical Paradigm

At present, no distributed implementation of the chemical paradigm exists. Our second objective is to develop the concepts and techniques required for such an implementation. Molecules will be distributed over the underlying platform and need to meet in order to react. To achieve this, we will consider several research tracks. A first track is algorithmic solutions for information dissemination and retrieval over decentralized (peer-to-peer) networks, allowing nodes to exchange molecules according to probabilistic rules. A second track is the development of a shared virtual space gathering the molecules, similar to the series of works conducted around the Distributed Shared Memory (DSM) approach, which simulates a global virtual shared memory on top of a distributed-memory platform. In both tracks, we will finally consider fault tolerance, as we cannot afford to lose (too many) molecules required by some reactions of the program when the nodes storing them are unreliable. For example, one of the techniques envisioned for fault tolerance is replication. Replication must be handled with care: replicating molecules should ensure that reactions can complete, while avoiding triggering too many reactions (several replicas of the same molecule could each trigger a reaction, generating more reactions than specified by the program).