Section: New Software and Platforms

BigGraphs

Keywords: Graph algorithmics - Distributed computing - Java - Graph processing

Functional Description: BigGraphs aims to provide a distributed platform for processing very large graphs. A typical test data set is a sample of the Twitter graph: 240 GB on disk, 398M vertices, 23G edges, an average degree of 58, and a maximum degree of 24,635,412.

We started the project in 2014 by evaluating existing middleware (GraphX/Spark and Giraph/Hadoop). After testing several useful algorithms written according to the BSP (Bulk Synchronous Parallel) model, we decided to develop our own platform.

This platform is based on the existing BIGGRPH library, and we are now in a phase focused on code quality and improvement. In particular, we have designed strong test suites and fixed several non-trivial bugs. We have also solved scalability problems, notably in the communication layer, where billions of messages are exchanged between BSP steps, and we have implemented dedicated data structures for BSP and support for distributed debugging. This work is accompanied by implementations of algorithms such as BFS and strongly connected components, which are run on the NEF cluster.
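
To illustrate what a BFS written according to the BSP model looks like, here is a minimal single-process sketch. It is not taken from BigGraphs or BIGGRPH: the class name BspBfsSketch, the adjacency-array graph representation, and the inbox/outbox lists are assumptions made for this example only. Each loop iteration plays the role of one superstep: vertices that received a message record their distance, send messages to their neighbours, and those messages only become visible in the next superstep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of BSP-style BFS; not the BigGraphs API.
final class BspBfsSketch {

    /**
     * Computes BFS distances from source on a directed graph given as
     * adjacency arrays. Unreachable vertices keep the distance -1.
     */
    static int[] bfs(int[][] adjacency, int source) {
        int n = adjacency.length;
        int[] distance = new int[n];
        Arrays.fill(distance, -1);

        // Messages delivered at the start of the current superstep.
        // A message is simply the id of the vertex that must wake up.
        List<Integer> inbox = new ArrayList<>();
        inbox.add(source);

        for (int superstep = 0; !inbox.isEmpty(); superstep++) {
            List<Integer> outbox = new ArrayList<>();

            // Compute phase: every vertex that received a message runs once.
            for (int v : inbox) {
                if (distance[v] != -1) {
                    continue; // already visited, drop the duplicate message
                }
                distance[v] = superstep;
                for (int w : adjacency[v]) {
                    if (distance[w] == -1) {
                        outbox.add(w); // send a message to an unvisited neighbour
                    }
                }
            }

            // Barrier and communication phase: messages sent during this
            // superstep are only readable during the next one.
            inbox = outbox;
        }
        return distance;
    }
}
```

On a real distributed platform the outbox would be partitioned and shipped over the network to the machines owning the target vertices, which is where the message volume between supersteps becomes the scalability concern.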

In 2017 we developed a multi-threaded, shared-memory parallel version of the Bulk Synchronous Parallel framework. This new version uses advanced synchronization mechanisms and strategies to minimize contention among multiple threads working on the same graph. On the NEF cluster (Inria Sophia Antipolis), this parallel version exhibits speed-ups of up to 6.5 using 8 nodes (16 cores each) when computing a BFS on the 23G-edge Twitter graph sample.
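
The synchronization mechanisms actually used in the platform are not detailed here. As one possible illustration of the general idea, the sketch below runs a level-synchronous BFS with several threads inside a single JVM. Everything in it is an assumption made for the example: the class name ParallelBspBfsSketch, the use of compare-and-set on an AtomicIntegerArray to claim vertices, and per-task local buffers so that threads do not append to a shared frontier. The call to invokeAll acts as the barrier between supersteps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical illustration of a shared-memory parallel BFS; not the BigGraphs API.
final class ParallelBspBfsSketch {

    /** Returns BFS distances from source; unreachable vertices get -1. */
    static int[] bfs(int[][] adjacency, int source, int threads) {
        int n = adjacency.length;
        AtomicIntegerArray distance = new AtomicIntegerArray(n);
        for (int i = 0; i < n; i++) {
            distance.set(i, -1);
        }
        distance.set(source, 0);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> frontier = new ArrayList<>();
            frontier.add(source);

            for (int level = 0; !frontier.isEmpty(); level++) {
                final List<Integer> current = frontier;
                final int next = level + 1;

                // Split the frontier into contiguous chunks, one task per chunk.
                int chunk = (current.size() + threads - 1) / threads;
                List<Callable<List<Integer>>> tasks = new ArrayList<>();
                for (int start = 0; start < current.size(); start += chunk) {
                    final int from = start;
                    final int to = Math.min(start + chunk, current.size());
                    tasks.add(() -> {
                        // Task-local buffer: no contention on a shared list.
                        List<Integer> local = new ArrayList<>();
                        for (int i = from; i < to; i++) {
                            for (int w : adjacency[current.get(i)]) {
                                // Exactly one thread wins the claim on vertex w.
                                if (distance.compareAndSet(w, -1, next)) {
                                    local.add(w);
                                }
                            }
                        }
                        return local;
                    });
                }

                // invokeAll returns once every task has finished: the barrier.
                List<Integer> merged = new ArrayList<>();
                for (Future<List<Integer>> f : pool.invokeAll(tasks)) {
                    merged.addAll(f.get());
                }
                frontier = merged;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("BFS interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("BFS task failed", e);
        } finally {
            pool.shutdown();
        }

        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = distance.get(i);
        }
        return result;
    }
}
```

Calling ParallelBspBfsSketch.bfs(adjacency, 0, 8) computes the same distances as a sequential BFS. The sketch says nothing about the reported speed-ups, which depend on the graph, the hardware, and the platform's own data structures.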