

Section: New Software and Platforms

Other software

JetStream

Title:

JetStream: Enabling High-Performance Event Streaming across Cloud Data-Centers.

Keywords:

Big Data, streaming, data transfer, multisite cloud.

Scientific Description.

JetStream is a middleware solution for batch-based, high-performance streaming across cloud data centers. It implements a set of context-aware strategies that optimize batch-based streaming and self-adapt to changing conditions.

Functional Description.

The system provides multi-route streaming across cloud data centers, aggregating bandwidth by exploiting network parallelism. It enables easy deployment across .NET frameworks and seamless binding with event processing engines such as StreamInsight. JetStream is currently used at Microsoft Research ATLE Munich for the management of the Azure cloud infrastructure.
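
As a rough illustration of bandwidth aggregation over multiple routes, the sketch below splits one batch of events across several routes in proportion to each route's estimated bandwidth. It is a minimal Python sketch under stated assumptions, not JetStream's implementation: the function `split_batch`, the route names and the bandwidth figures are all hypothetical.

```python
def split_batch(events, route_bandwidths):
    """Split a batch across routes in proportion to each route's bandwidth.

    events: list of events forming one batch.
    route_bandwidths: dict mapping a (hypothetical) route name to its
        estimated bandwidth, in any consistent unit.
    Returns a dict mapping each route name to its slice of the batch.
    """
    total = sum(route_bandwidths.values())
    if total <= 0:
        raise ValueError("at least one route must have positive bandwidth")

    routes = list(route_bandwidths)
    shares = {}
    start = 0
    cumulative = 0.0
    for i, route in enumerate(routes):
        cumulative += route_bandwidths[route]
        if i == len(routes) - 1:
            # The last route takes the remainder, so rounding drops no event.
            end = len(events)
        else:
            end = round(len(events) * cumulative / total)
        shares[route] = events[start:end]
        start = end
    return shares


# Hypothetical routes: one direct link and two relayed through other sites.
batch = list(range(10))
shares = split_batch(batch, {"direct": 60.0, "via_site_a": 30.0, "via_site_b": 10.0})
```

Splitting on cumulative shares rather than rounding each share separately keeps the slices contiguous and guarantees that every event is assigned to exactly one route.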

Participants:

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu.

Contact:

Alexandru Costan.

Omnisc'IO

Title:

Omnisc'IO: a Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction.

Keywords:

HPC, Input-Output, Prediction, Grammar.

Scientific Description.

Omnisc'IO is a library that aims to be integrated into I/O middleware.

Functional Description.

It traces I/O operations, models the stream of such operations using grammar-inference techniques, and predicts when new I/O operations will be performed, as well as where and how much data will be written.
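
The trace-then-predict idea can be sketched as follows. This is only an illustration: instead of grammar inference it uses a much simpler fixed-order context model, which records which operation followed each recent context and predicts the most frequent follower. The class `ContextPredictor` and the `(operation, size)` encoding of an I/O call are assumptions made for this example, not part of Omnisc'IO.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Predict the next I/O operation from the last `order` operations."""

    def __init__(self, order=2):
        self.order = order
        self.history = []
        # Maps a context (tuple of recent operations) to follower counts.
        self.followers = defaultdict(Counter)

    def observe(self, op):
        """Record one traced operation, e.g. ("write", 4096)."""
        context = tuple(self.history[-self.order:])
        if len(context) == self.order:
            self.followers[context][op] += 1
        self.history.append(op)

    def predict(self):
        """Return the most frequent follower of the current context,
        or None if this context has not been seen before."""
        context = tuple(self.history[-self.order:])
        seen = self.followers.get(context)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# A hypothetical periodic trace: two large writes followed by a small one.
trace = [("write", 4096), ("write", 4096), ("write", 512)] * 3
predictor = ContextPredictor(order=2)
for op in trace:
    predictor.observe(op)
next_op = predictor.predict()
```

A grammar-based model can capture nested and long-range repetition that a fixed-order context cannot; the sketch only shows the interface of observing a stream and querying a prediction.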

Participants:

Matthieu Dorier (ANL), Gabriel Antoniu, Shadi Ibrahim.

Contact:

Gabriel Antoniu.

OverFlow

Title:

OverFlow: Workflow Data Management as a Service for Multi-Site Applications.

Keywords:

Small data, workflow, multi-site cloud.

Scientific Description.

OverFlow is a uniform data management system for scientific workflows running across geographically distributed sites, aiming to reap economic benefits from this geo-diversity. The software is environment-aware, as it monitors and models the global cloud infrastructure, offering high and predictable performance for transfer cost and time, within and across sites.

Functional Description.

OverFlow proposes a set of pluggable services, grouped in a data-scientist cloud kit. They allow applications to monitor the underlying infrastructure, to exploit smart data compression, deduplication and geo-replication, to evaluate data management costs, to set a tradeoff between money and time, and to optimize the transfer strategy accordingly. Currently, OverFlow is used for data transfers by the Microsoft Research ATLE Munich team as well as for synthetic benchmarks at the Politehnica University of Bucharest.
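
One way to picture the money/time tradeoff is a weighted choice among candidate transfer strategies. The sketch below is a minimal Python illustration under stated assumptions, not OverFlow's cost model: the function `pick_strategy`, the strategy names and the time and cost estimates are hypothetical.

```python
def pick_strategy(candidates, time_weight):
    """Pick the transfer strategy with the best weighted time/cost score.

    candidates: dict mapping a strategy name to a pair
        (estimated_seconds, estimated_cost), both positive.
    time_weight: 0.0 means "minimize cost only", 1.0 means
        "minimize time only"; values in between trade one for the other.
    """
    if not 0.0 <= time_weight <= 1.0:
        raise ValueError("time_weight must be between 0 and 1")

    # Normalize by the worst candidate so time and cost are comparable.
    max_time = max(t for t, _ in candidates.values())
    max_cost = max(c for _, c in candidates.values())

    def score(item):
        seconds, cost = item[1]
        return (time_weight * seconds / max_time
                + (1.0 - time_weight) * cost / max_cost)

    return min(candidates.items(), key=score)[0]


# Hypothetical estimates: more nodes transfer faster but cost more.
estimates = {
    "single_node": (120.0, 0.10),
    "four_nodes": (40.0, 0.32),
    "eight_nodes": (30.0, 0.64),
}
fastest = pick_strategy(estimates, time_weight=1.0)
cheapest = pick_strategy(estimates, time_weight=0.0)
balanced = pick_strategy(estimates, time_weight=0.5)
```

With these made-up figures, a balanced weight favors the intermediate strategy, since doubling the nodes again would double the cost for only a small gain in time.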

Participants:

Paul Le Noac'h, Ovidiu-Cristian Marcu, Alexandru Costan and Gabriel Antoniu.

Contact:

Alexandru Costan.

iHadoop

Title:

iHadoop: A Hadoop Simulator Developed In Java on Top of SimGrid.

Keywords:

Simulation, Map-Reduce, Hadoop, SimGrid.

Scientific Description.

iHadoop is a Hadoop simulator developed in Java on top of SimGrid. It simulates the behavior of Hadoop and therefore accurately predicts the performance of Hadoop in normal scenarios and under failures. iHadoop has been extended to (1) simulate the execution and predict the performance of multiple Map-Reduce applications; and (2) simulate the execution of Map-Reduce applications under various data distributions and data skew models.
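
To give an intuition of why data skew matters in such simulations, the toy model below estimates the duration of a reduce phase when key frequencies follow a Zipf-like distribution. It is a minimal Python sketch, unrelated to iHadoop's code and to the SimGrid API: the functions `zipf_weights` and `reduce_makespan`, the modulo partitioning and all parameter values are assumptions chosen for illustration.

```python
def zipf_weights(n_keys, exponent):
    """Return normalized key frequencies proportional to 1 / rank**exponent.

    exponent = 0 gives a uniform distribution; larger values give more skew.
    """
    raw = [1.0 / (rank ** exponent) for rank in range(1, n_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def reduce_makespan(n_keys, n_reducers, total_records, exponent, secs_per_record):
    """Estimate the reduce-phase duration as the time of the busiest reducer.

    Keys are assigned to reducers by index modulo n_reducers (a stand-in
    for hash partitioning), and every record costs the same time to process.
    """
    loads = [0.0] * n_reducers
    for key, weight in enumerate(zipf_weights(n_keys, exponent)):
        loads[key % n_reducers] += weight * total_records
    return max(loads) * secs_per_record


# Same job, with uniform keys and then with skewed keys.
uniform = reduce_makespan(100, 4, 1000, exponent=0.0, secs_per_record=1.0)
skewed = reduce_makespan(100, 4, 1000, exponent=1.0, secs_per_record=1.0)
```

With uniform keys each of the four reducers processes about a quarter of the records, whereas under skew the reducer holding the most frequent key becomes the bottleneck and the phase takes longer.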

Functional Description.

iHadoop is an internal software prototype, which was initially developed to validate our idea regarding the behavior of Hadoop under failures. iHadoop has been preliminarily evaluated within our group and has shown very high accuracy in predicting the execution time of Map-Reduce applications. We intend to integrate iHadoop within the SimGrid distribution and make it available to the SimGrid community.

Participants:

Shadi Ibrahim and Tien-Dat Phan.

Contact:

Shadi Ibrahim.