Section: Overall Objectives
Objective: efficient support for scalable data-intensive computing
Our research activities focus on the data storage and processing needs of data-intensive applications that need to handle:
- Massive data BLOBs (Binary Large OBjects), on the order of Terabytes, stored across a large number of nodes (thousands to tens of thousands), accessed under heavy concurrency by a large number of processes (thousands to tens of thousands at a time), with a relatively fine access grain, on the order of Megabytes;
- Very large sets (millions) of small objects, potentially arriving in streams, stored and processed on geographically distributed infrastructures (e.g., multi-site clouds);
- Very large sets of scientific data processed on extreme-scale supercomputers.
Examples of such applications are:
- Massively parallel data analytics for Big Data applications (e.g., MapReduce-based data analysis as currently enabled by frameworks such as Hadoop, Spark or Flink);
- Advanced cloud services for data storage and transfer supporting geographically distributed workflows that require efficient data sharing within and across multiple datacenters;
- Scalable solutions for I/O management and in situ visualization for data-intensive scientific simulations (e.g., atmospheric simulations, computational fluid dynamics) running on extreme-scale HPC systems.
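The MapReduce model behind the first class of applications can be illustrated with a minimal single-process sketch (word count), assuming in-memory lists stand in for the distributed storage and shuffle machinery that frameworks such as Hadoop, Spark or Flink actually provide:

```python
# Minimal single-process sketch of the MapReduce model (word count).
# In a real framework, map tasks, the shuffle, and reduce tasks run
# distributed across many nodes; here plain Python collections stand in.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate all values for one key (here, sum the counts)."""
    return (key, sum(values))


lines = ["big data big compute", "data intensive computing"]
pairs = (pair for line in lines for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
# {'big': 2, 'data': 2, 'compute': 1, 'intensive': 1, 'computing': 1}
```

The scalability challenge addressed above arises precisely because, at scale, the shuffle step becomes a massive concurrent data exchange across thousands of nodes rather than an in-memory dictionary.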