

Section: Scientific Foundations

Our goals and methodology

Managing data at large scale is now paramount, and many application areas need to scale efficiently to huge data sizes: data mining applications [55], multimedia applications [46], database-oriented applications ([49], [67], [62]), bioinformatics applications, etc. In such contexts, one important goal is to provide mechanisms that transparently manage massive data blocks (e.g., of several terabytes) while providing efficient, fine-grain access to small parts of the data.

The overall goal of the KerData team is to contribute substantially to the research community's effort to address the above challenges. More specifically, to support the large-scale execution of the applications described above, KerData aims to design and implement distributed algorithms for scalable data storage and input/output management for efficient large-scale data processing. We target two main execution infrastructures: cloud platforms and post-Petascale HPC supercomputers. We also consider other kinds of infrastructures as secondary targets, e.g., hybrid platforms that extend enterprise desktop grids to cloud platforms.

Our approach relies on building prototypes and validating them experimentally at large scale on real testbeds and experimental platforms. In our current projects, our target platforms include the Grid'5000 testbed, the Amazon and Microsoft Azure commercial clouds, and public clouds based on open-source IaaS toolkits such as Nimbus and OpenNebula. In the HPC area, we have access to the Jaguar and Kraken supercomputers (ranked 3rd and 11th, respectively, in the Top 500 supercomputer list). Last but not least, our methodology includes large-scale validation of our solutions with real-life applications, such as the ones described in Section 4.1. To this end, we have started to build partnerships with the application communities that can potentially benefit from our contributions, and we will continue to do so in future collaborative projects.