Section: New Software and Platforms
BigGraphs
Functional Description
The objective of BigGraphs is to provide a distributed platform for processing very large graphs. A typical data set for testing purposes is a sample of the Twitter graph: 240 GB on disk, 398M vertices, 23G edges, an average degree of 58 and a maximum degree of 24,635,412.
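As a quick sanity check of these figures, the short sketch below recomputes the average degree from the rounded vertex and edge counts, and also derives the on-disk cost per edge and the ratio between the maximum and the average degree. It assumes decimal units (M = 10^6, G = 10^9, GB = 10^9 bytes) and that the average degree is the number of edges divided by the number of vertices; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the data set figures quoted above.
// Assumptions: decimal units, average degree = edges / vertices.
final class TwitterSampleStats {
    // Rounded figures taken from the text.
    static final double VERTICES = 398e6;      // 398M vertices
    static final double EDGES = 23e9;          // 23G edges
    static final double DISK_BYTES = 240e9;    // 240 GB on disk
    static final double MAX_DEGREE = 24_635_412;

    static double averageDegree() {
        return EDGES / VERTICES;
    }

    static double bytesPerEdge() {
        return DISK_BYTES / EDGES;
    }

    // How many times larger the biggest vertex is than the average one.
    static double degreeSkew() {
        return MAX_DEGREE / averageDegree();
    }

    public static void main(String[] args) {
        System.out.println("average degree: " + averageDegree());
        System.out.println("bytes per edge: " + bytesPerEdge());
        System.out.println("max/average degree: " + degreeSkew());
    }
}
```

The rounded counts give an average degree of about 57.8, consistent with the reported value of 58, and roughly 10 bytes of storage per edge. The maximum degree is several hundred thousand times the average, which illustrates how skewed the degree distribution of such a graph is.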
We started the project in 2014 with an evaluation of existing middleware (GraphX/Spark and Giraph/Hadoop). After testing some useful algorithms written according to the BSP (Bulk Synchronous Parallel) model, we decided to develop our own platform.
This platform is based on the existing BigGrph library, and we are now in the phase where we focus on code quality and improvement. In particular, we have designed strong test suites and fixed some nontrivial bugs. We have also solved scalability problems, in particular in the communication layer, where billions of messages are exchanged between BSP steps. We have also implemented specific data structures for BSP and support for distributed debugging. This comes along with the implementation of algorithms such as BFS and strongly connected components, which are run on the NEF cluster.
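To illustrate what a BSP formulation of BFS looks like, here is a minimal single-process sketch. It is not the BigGrph API: the class name, the adjacency-array representation and the inbox/outbox lists are assumptions made for the example. Each loop iteration plays the role of one superstep: vertices that received a message compute locally, send messages along their outgoing edges, and those messages only become visible after the barrier at the end of the superstep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-process illustration of BFS written in the BSP style.
// Hypothetical names; a real platform would partition vertices across nodes.
final class BspBfsSketch {

    /** Returns the BFS distance of every vertex from source; -1 means unreachable. */
    static int[] bfs(int[][] adjacency, int source) {
        int[] distance = new int[adjacency.length];
        Arrays.fill(distance, -1);

        // Messages to be processed in the current superstep (one entry per message).
        List<Integer> inbox = new ArrayList<>();
        inbox.add(source);

        for (int superstep = 0; !inbox.isEmpty(); superstep++) {
            List<Integer> outbox = new ArrayList<>();

            // Compute phase: a vertex reached for the first time records its
            // distance and sends one message along each outgoing edge.
            for (int v : inbox) {
                if (distance[v] != -1) {
                    continue; // already visited in an earlier (or this) superstep
                }
                distance[v] = superstep;
                for (int neighbor : adjacency[v]) {
                    outbox.add(neighbor);
                }
            }

            // Barrier: messages sent during this superstep are delivered in the next one.
            inbox = outbox;
        }
        return distance;
    }

    public static void main(String[] args) {
        // Directed graph: 0->1, 0->2, 1->3, 2->3, 4->0 (vertex 4 is unreachable from 0).
        int[][] graph = { {1, 2}, {3}, {3}, {}, {0} };
        System.out.println(Arrays.toString(bfs(graph, 0))); // prints [0, 1, 1, 2, -1]
    }
}
```

In this formulation a sender cannot know whether the receiver has already been visited, so one message is sent for every outgoing edge of every reached vertex. On a graph with tens of billions of edges, this is why the volume of messages exchanged between supersteps becomes a scalability concern for the communication layer.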
- Participants: Luc Hogie, Nicolas Chleq, David Coudert, Michel Syska.
- Partner: This project is joint work of the three EPIs Coati, Diana and Scale, and is supported by an ADT grant.
Additional software
The following software tools bring basic services to the platform (they are not dedicated to BigGrph). Participants: Luc Hogie, Nicolas Chleq.
- Jac-a-boo is a framework aiming at facilitating the deployment and the bootstrapping of distributed Java applications over Share-Nothing Clusters (SNCs). The primary motivation for developing Jac-a-boo is to have an efficient and comprehensive deployment infrastructure for the BigGrph distributed graph library. http://www.i3s.unice.fr/~hogie/jacaboo
- ldjo (Live Distributed Java Objects) is a framework for the development and the deployment of distributed data structures in Java. Alongside the data aspect of distributed data structures, ldjo comes with mechanisms for processing them in a distributed/parallel way. In particular, it provides implementations of Map/Reduce and Bulk Synchronous Parallel (BSP). http://www.i3s.unice.fr/~hogie/ldjo
- Octojus provides an object-oriented RPC (Remote Procedure Call) implementation in Java. At a higher abstraction level, Octojus provides a framework for the development of systolic algorithms, a batch scheduler, as well as an implementation of Map/Reduce. The latter is used in the BigGrph graph computing platform. http://www.i3s.unice.fr/~hogie/octojus
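For readers unfamiliar with Map/Reduce, the sketch below shows the pattern on a small graph task: computing the out-degree of each vertex from an edge list. It runs in a single JVM using the standard Java streams API and does not use or reproduce the ldjo or Octojus interfaces; the class name, method name and edge representation are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-process illustration of the Map/Reduce pattern on an edge list.
// Hypothetical names; a distributed framework would run the phases on several nodes.
final class MapReduceDegreeSketch {

    /** Each edge is an int[] {source, target}; the result maps a source vertex to its out-degree. */
    static Map<Integer, Long> outDegrees(List<int[]> edges) {
        return edges.stream()
                // Map phase: emit the source vertex of each edge as the key.
                .map(edge -> edge[0])
                // Shuffle and reduce phases: group identical keys and count them.
                .collect(Collectors.groupingBy(src -> src, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<int[]> edges = List.of(
                new int[] {0, 1},
                new int[] {0, 2},
                new int[] {1, 2});
        System.out.println(outDegrees(edges)); // prints {0=2, 1=1}
    }
}
```

Vertices without outgoing edges (vertex 2 here) produce no key and are therefore absent from the result. In a distributed setting the grouping step corresponds to the shuffle, where all values sharing a key are routed to the same reducer.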