Section: New Software and Platforms
FP-Hadoop
Fast Parallel Hadoop
Keywords: Hadoop - Data parallelism
Functional Description: FP-Hadoop makes the reduce side of Hadoop MapReduce more parallel and deals efficiently with reduce-side data skew. It introduces a new phase, called intermediate reduce (IR), in which blocks of intermediate values, constructed dynamically, are processed in parallel by intermediate reduce workers. Our experiments with FP-Hadoop on synthetic and real benchmarks have shown excellent performance gains over native Hadoop, e.g. gains of more than 10 times in reduce time and 5 times in total execution time.
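The idea of the intermediate reduce phase can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not FP-Hadoop's actual API: the function names (make_blocks, intermediate_reduce, final_reduce, run_job), the fixed block size, and the thread pool standing in for IR workers are all hypothetical, and the reduce function is assumed to be a sum, so partial results can be combined safely.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_blocks(values, block_size):
    """Split one key's intermediate values into blocks of at most block_size."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def intermediate_reduce(block):
    """Hypothetical IR step: reduce a single block to a partial result."""
    key, values = block
    return key, sum(values)


def final_reduce(partials):
    """Combine the partial results of all blocks into one value per key."""
    totals = defaultdict(int)
    for key, partial in partials:
        totals[key] += partial
    return dict(totals)


def run_job(intermediate, block_size=1000, workers=4):
    """Process blocks in parallel, regardless of which key they belong to."""
    blocks = [
        (key, chunk)
        for key, values in intermediate.items()
        for chunk in make_blocks(values, block_size)
    ]
    # The thread pool only stands in for a set of IR workers (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(intermediate_reduce, blocks))
    return final_reduce(partials)


# Skewed input: one key holds almost all of the intermediate values.
intermediate = {"the": [1] * 10000, "hadoop": [1] * 3}
result = run_job(intermediate)
```

In this toy setting the values of the heavy key are spread over ten blocks that different workers can process, instead of being handled by a single reducer. The sketch relies on the reduce function being decomposable into partial and final steps.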
Participants: Reza Akbarinia, Miguel Liroz-Gistau and Patrick Valduriez
Publication: FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data