

Section: New Results

HAP: Building Pipelines with Heterogeneous Data and Hive

The growing number of available datasets makes it possible to build large, complex applications that aggregate results from several sources. These emerging use cases require new systems in which combining heterogeneous sources is both possible and efficient. To tackle these challenges, we built a system [17] that offers a simple, high-level set of primitives, called HAP, for describing processing chains. These descriptions are then compiled into optimized SQL queries that are executed on Hive.
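The idea of compiling a processing chain into a single SQL query can be illustrated with a minimal sketch. The HAP primitives themselves are not detailed in this section, so every name below (`Source`, `Filter`, `Aggregate`, `compile_chain`, the `events` table and its columns) is a hypothetical stand-in rather than the actual HAP interface, and the sketch performs no optimization: it only nests one subquery per step.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical primitives, for illustration only; these are NOT the HAP primitives.
@dataclass
class Source:
    table: str  # name of a table assumed to exist in Hive


@dataclass
class Filter:
    condition: str  # SQL boolean expression applied to the previous step


@dataclass
class Aggregate:
    key: str         # grouping column
    expression: str  # aggregate expression, e.g. "COUNT(*) AS n"


Step = Union[Filter, Aggregate]


def compile_chain(source: Source, steps: List[Step]) -> str:
    """Compile a linear chain of steps into one nested SQL query string."""
    query = f"SELECT * FROM {source.table}"
    for i, step in enumerate(steps):
        alias = f"t{i}"  # subqueries in a FROM clause need an alias
        if isinstance(step, Filter):
            query = f"SELECT * FROM ({query}) {alias} WHERE {step.condition}"
        elif isinstance(step, Aggregate):
            query = (
                f"SELECT {step.key}, {step.expression} "
                f"FROM ({query}) {alias} GROUP BY {step.key}"
            )
        else:
            raise TypeError(f"unsupported step: {step!r}")
    return query


# Hypothetical chain: keep one year of events, then count them per country.
chain = [Filter("year = 2015"), Aggregate("country", "COUNT(*) AS n")]
sql = compile_chain(Source("events"), chain)
print(sql)
```

A real compiler of this kind would presumably rewrite the chain before emitting SQL (for example, by merging the filter into the innermost query); the sketch leaves that step out and only shows the description-to-query translation.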