Section: Overall Objectives

Objective: efficient support for scalable data-intensive computing

Our research activities focus on the data storage and processing needs of data-intensive applications that must handle:

  • Massive data BLOBs (Binary Large OBjects), on the order of terabytes, stored across a large number of nodes (thousands to tens of thousands), accessed under heavy concurrency by a large number of processes (thousands to tens of thousands at a time), with a relatively fine access grain, on the order of megabytes;

  • Very large sets (millions) of small objects potentially arriving in streams, stored and processed on geographically distributed infrastructures (e.g. multi-site clouds);

  • Very large sets of scientific data processed on extreme-scale supercomputers.

Examples of such applications are:

  • Massively parallel data analytics for Big Data applications (e.g., Map-Reduce-based data analysis as currently enabled by frameworks such as Hadoop, Spark or Flink);

  • Advanced cloud services for data storage and transfer for geographically distributed workflows requiring efficient data sharing within and across multiple datacenters;

  • Scalable solutions for I/O management and in situ visualization for data-intensive scientific simulations (e.g., atmospheric simulations or computational fluid dynamics) running on extreme-scale HPC systems.
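
The Map-Reduce paradigm underlying the first class of applications can be illustrated with a minimal word-count sketch. This is an assumption-laden toy in plain Python, not the Hadoop, Spark or Flink API: in a real framework the map, shuffle and reduce phases run distributed across many nodes, which is precisely where the storage and concurrency challenges listed above arise.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group intermediate values by key (done over the
    # network in a real distributed framework).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here, sum counts).
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))

docs = ["big data storage", "big data processing"]
print(word_count(docs))
# → {'big': 2, 'data': 2, 'storage': 1, 'processing': 1}
```

Each phase is a pure function over key-value pairs, which is what lets frameworks parallelize them transparently over thousands of concurrent processes.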