

Section: New Results

Modelling cloud storage performance

Participants: Daniel Higuero, Louis-Claude Canon, Alexandru Costan, Gabriel Antoniu.

The objective of this research direction is to provide comprehensive performance models for storage systems. These models capture how the system components interact under different usage patterns (e.g., the number of reads and writes). Their purpose is to determine the costs a given workload incurs in terms of storage space and efficiency.
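To make the notion of workload-dependent costs concrete, the following minimal Python sketch computes storage space and network traffic for a workload described by its numbers of reads and writes. It assumes, hypothetically, fixed-size chunks and full replication of every written chunk. The `Workload`, `StorageConfig`, `storage_space_mb` and `network_traffic_mb` names are illustrative only; they are not part of BlobSeer or of the models described here.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    reads: int       # number of read operations
    writes: int      # number of write operations
    chunk_mb: float  # size of each chunk read or written, in MB (assumed fixed)


@dataclass
class StorageConfig:
    replication: int  # copies kept for each written chunk (hypothetical parameter)


def storage_space_mb(w: Workload, c: StorageConfig) -> float:
    """Space consumed, assuming every written chunk is stored `replication` times."""
    return w.writes * w.chunk_mb * c.replication


def network_traffic_mb(w: Workload, c: StorageConfig) -> float:
    """Traffic, assuming a read fetches one copy and a write pushes every replica."""
    return w.reads * w.chunk_mb + w.writes * w.chunk_mb * c.replication


w = Workload(reads=100, writes=20, chunk_mb=64.0)
c = StorageConfig(replication=3)
print(storage_space_mb(w, c), network_traffic_mb(w, c))
```

Under these assumptions, 100 reads and 20 writes of 64 MB chunks with 3 replicas occupy 3840 MB and generate 10240 MB of traffic. Raising the replication factor increases both costs, which is the kind of trade-off such a model is meant to expose.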

One application of such a model is to dynamically adjust the parameters of the storage system, as required by an autonomic approach. To this end, it is necessary to identify which characteristics the storage system must have to meet a given level of requirements. Progress was made on this aspect during the 3-month visit of Daniel Higuero (University Carlos III, Madrid). A preliminary performance model currently predicts the available bandwidth when multiple transfers occur concurrently. This model serves as the basis for a dimensioning strategy formulated as a linear program.
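The bandwidth model and the linear program are not detailed here. As a hedged illustration only, the sketch below assumes a common flow-level model in which concurrent transfers share network-interface capacity according to max-min fairness (computed by progressive filling). It then replaces the linear program with a plain brute-force search for the smallest number of storage servers that gives every client a target bandwidth. The function names, the round-robin client-to-server mapping and the capacities are all hypothetical.

```python
def max_min_share(capacities, routes):
    """Max-min fair rates computed by progressive filling.

    capacities: dict mapping a resource name (e.g. a NIC) to its capacity in MB/s.
    routes: list giving, for each transfer, the set of resources it crosses.
    Returns the list of per-transfer rates in MB/s.
    """
    remaining = dict(capacities)
    rates = [0.0] * len(routes)
    active = set(range(len(routes)))
    while active:
        # Count the still-growing transfers on each resource.
        load = {r: 0 for r in remaining}
        for i in active:
            for r in routes[i]:
                load[r] += 1
        # The bottleneck is the resource offering the smallest equal share.
        share, bottleneck = min(
            (remaining[r] / n, r) for r, n in load.items() if n > 0
        )
        # Raise every active transfer by that share and consume capacity.
        for i in active:
            rates[i] += share
        for r, n in load.items():
            remaining[r] -= share * n
        # Freeze every transfer that crosses a saturated resource.
        saturated = {r for r, cap in remaining.items() if cap <= 1e-9}
        saturated.add(bottleneck)
        active = {i for i in active if not (routes[i] & saturated)}
    return rates


def min_servers(n_clients, client_bw, server_bw, target, max_servers=64):
    """Smallest server count such that every client gets >= target MB/s.

    Hypothetical setup: client i reads from server i mod n, and each transfer
    is limited only by the client NIC and the server NIC.
    """
    for n in range(1, max_servers + 1):
        caps = {f"c{i}": client_bw for i in range(n_clients)}
        caps.update({f"s{j}": server_bw for j in range(n)})
        routes = [{f"c{i}", f"s{i % n}"} for i in range(n_clients)]
        if min(max_min_share(caps, routes)) >= target:
            return n
    return None  # target not reachable within max_servers


print(min_servers(8, 100.0, 200.0, 100.0))
```

With 8 clients limited to 100 MB/s and servers limited to 200 MB/s, this toy model needs 4 servers before every client reaches 100 MB/s. The linear scan is only practical for a single integer parameter; it is used here to keep the sketch self-contained.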

This approach has been complemented with an offline analysis of several traces of the BlobSeer storage system used as a backend for MapReduce applications. Mining these traces automatically allowed us to detect the trade-offs that influence a BlobSeer deployment: application execution time vs. the number of machines used by the storage system, and communication costs vs. space usage. The final goal is to tune BlobSeer for specific applications; the proposed strategy is currently being evaluated.
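One simple way to expose such trade-offs, shown here only as an illustration and not necessarily the mining technique used in this work, is to keep the Pareto-optimal configurations: the runs for which no other run is at least as fast while using no more machines. The `Run` records below are made-up values, not BlobSeer measurements.

```python
from typing import NamedTuple


class Run(NamedTuple):
    machines: int  # machines used by the storage system
    time_s: float  # application execution time, in seconds


def pareto_front(runs):
    """Return the runs not dominated by any other run, sorted by machine count.

    Run o dominates run r if o uses no more machines, is no slower,
    and differs from r (so it is strictly better in at least one metric).
    """
    front = []
    for r in runs:
        dominated = any(
            o.machines <= r.machines and o.time_s <= r.time_s and o != r
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front)


# Hypothetical runs, e.g. extracted from execution traces.
runs = [Run(4, 900.0), Run(8, 520.0), Run(8, 600.0), Run(16, 400.0), Run(32, 410.0)]
print(pareto_front(runs))
```

In this toy data the 8-machine run taking 600 s is discarded because another 8-machine run finished in 520 s, and the 32-machine run is dominated by the 16-machine one. The remaining runs are the candidate deployments between which a tuning strategy would have to choose.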

Future work will focus on refining the proposed model. Several parameters significantly impact the performance of storage systems, such as the redundancy mechanism, the data placement strategy, and disk-related effects. As a first step, we will design experiments to assess the quality of finer-grained models. Ultimately, we aim to capture the I/O variability of storage systems, in particular in the cloud.
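A first, coarse way to quantify I/O variability is the coefficient of variation (standard deviation divided by mean) of repeated throughput measurements. The sketch below is an assumption about how such a metric could be computed, not a description of the planned methodology; the `io_variability` name and the sample values are invented for illustration.

```python
import statistics


def io_variability(samples_mb_s):
    """Return (mean, sample standard deviation, coefficient of variation)
    of throughput samples in MB/s. Requires at least two samples."""
    mean = statistics.mean(samples_mb_s)
    stdev = statistics.stdev(samples_mb_s)  # sample (n - 1) standard deviation
    return mean, stdev, stdev / mean


# Hypothetical throughput samples from repeated runs of the same transfer.
mean, stdev, cv = io_variability([92.0, 110.0, 85.0, 121.0, 97.0])
print(f"mean={mean:.1f} MB/s stdev={stdev:.1f} MB/s cv={cv:.2f}")
```

A single scalar such as this hides the shape of the distribution; a finer model would likely need percentiles or a fitted distribution per configuration.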

This work will also enable new collaborations. We plan to work on the models mentioned above with the Mescal INRIA team, in the context of a collaboration between the MapReduce ANR project and the Songs ANR project. Moreover, within the MapReduce project, we expect to work on a performance model for designing the decision algorithms required by the component-based MapReduce framework developed by the GRAAL/Avalon INRIA team. Finally, the scheduling algorithms designed by the GRAAL/Avalon team could benefit from a storage performance model.