Section: New Software and Platforms
BigGrph
-
The objective of “biggrph” is to provide a distributed platform for processing very large graphs. A typical data set used for testing is a sample of the Twitter graph: 240 GB on disk, 398M vertices, 23G edges, an average degree of 58 and a maximum degree of 24,635,412.
We started the project in 2014 with an evaluation of existing middleware (GraphX / Spark and Giraph / Hadoop). After testing some useful algorithms written according to the Bulk Synchronous Parallel (BSP) model, we decided to develop our own platform.
The development of the “biggrph” platform has now reached the stage where we focus on code quality and improvement.
In particular, we have designed strong test suites and fixed some nontrivial bugs. We have also solved scalability problems, notably in the communication layer, where billions of messages are exchanged between BSP steps. Moreover, we have implemented specific data structures for BSP and support for distributed debugging. This comes along with the implementation of algorithms such as BFS and strongly connected components, which are run on the NEF cluster (a facility maintained at Inria Sophia Antipolis).
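To illustrate what a BFS written according to the BSP model looks like, here is a minimal single-process sketch in Python. It is not the “biggrph” API: the function name bsp_bfs, the dictionary-based adjacency representation and the inbox/outbox variables are assumptions made for this example only. Each iteration of the loop stands for one BSP step: vertices that received messages compute locally, send messages to their neighbours, and those messages only become visible after the barrier.

```python
from collections import defaultdict


def bsp_bfs(adjacency, source):
    """Simulate a BSP-style BFS in one process (illustrative sketch only).

    adjacency: dict mapping each vertex to a list of its out-neighbours.
    Returns (distance, supersteps), where distance maps every reachable
    vertex to its hop count from source.
    """
    distance = {}
    # Messages to be delivered at the start of the first superstep.
    inbox = {source: [0]}
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        # Compute phase: each vertex that received messages acts using
        # only its own state and its own incoming messages.
        for vertex, messages in inbox.items():
            if vertex in distance:
                continue  # already reached in an earlier superstep
            distance[vertex] = min(messages)
            # Communication phase: propose a distance to every neighbour.
            for neighbour in adjacency.get(vertex, ()):
                outbox[neighbour].append(distance[vertex] + 1)
        # Barrier: messages sent during this superstep are delivered
        # only at the start of the next one.
        inbox = outbox
        supersteps += 1
    return distance, supersteps


if __name__ == "__main__":
    # Hypothetical toy graph; vertex 4 is not reachable from vertex 0.
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
    print(bsp_bfs(graph, 0))
```

In a distributed setting the vertices would be partitioned across machines, and the outbox of each step would travel over the network. On graphs with billions of edges this message volume is what puts pressure on the communication layer.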
-
This project is joint work of the three EPs Coati, Diana and Scale, and is supported by an ADT grant.