Section: New Results
Dynamic broadcasts in StarPU/NewMadeleine
We worked on improving broadcast performance in the StarPU runtime system when used with NewMadeleine. Although StarPU supports MPI, its distributed and asynchronous task-scheduling model makes it impossible to use optimized MPI routines such as MPI_Bcast. Indeed, these routines require all nodes participating in the collective to be synchronized and to know each other, which makes them unusable in practice for StarPU.
We proposed a dynamic broadcast algorithm [42] that runs without synchronization among participants and in which only the root node needs to know the other nodes. Recipients do not even need to know whether a message will arrive as a plain send/receive or through a dynamic broadcast, which allows for seamless integration into StarPU. We implemented the algorithm in our NewMadeleine communication library, leveraging its event-based paradigm and background progression of communications. Preliminary experiments using the Cholesky factorization from the Chameleon library show a noticeable performance improvement.
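To make the idea concrete, the following Python simulation sketches one way a broadcast could proceed when only the root knows the recipients. It is an illustration under stated assumptions, not the algorithm of [42] nor NewMadeleine code: we assume that each message carries the list of nodes its receiver must still serve, and that this list is split in a binomial-tree fashion. The names split_for_forwarding and dynamic_broadcast are hypothetical, and the network is replaced by an in-memory queue.

```python
from collections import deque


def split_for_forwarding(recipients):
    """Return (child, delegated) pairs for a sender holding `recipients`.

    Assumed scheme: the sender repeatedly hands the upper half of its
    remaining list to the first node of that half, binomial-tree style.
    """
    plan = []
    remaining = list(recipients)
    while remaining:
        mid = len(remaining) // 2
        child, delegated = remaining[mid], remaining[mid + 1:]
        plan.append((child, delegated))
        remaining = remaining[:mid]
    return plan


def dynamic_broadcast(root, recipients, payload):
    """Simulate the broadcast; only `root` is given the full recipient list."""
    network = deque()   # in-flight messages: (src, dst, data, forward_to)
    delivered = {}      # dst -> payload received
    sends = []          # (src, dst) pairs, in delivery order

    # The root emits ordinary point-to-point messages, each one tagged
    # with the sub-list of nodes the receiver becomes responsible for.
    for child, delegated in split_for_forwarding(recipients):
        network.append((root, child, payload, delegated))

    while network:
        src, dst, data, forward_to = network.popleft()
        sends.append((src, dst))
        # The receiver handles this like any incoming message...
        delivered[dst] = data
        # ...and forwards it only if the message carries a non-empty list.
        for child, delegated in split_for_forwarding(forward_to):
            network.append((dst, child, data, delegated))

    return delivered, sends


if __name__ == "__main__":
    delivered, sends = dynamic_broadcast(0, [1, 2, 3, 4, 5, 6, 7], "tile")
    print(sorted(delivered))
    print(sends)
```

In this sketch a recipient never posts a collective call: it receives a regular message and learns from the attached list whether it has anything to forward. With seven recipients, the root sends three messages instead of seven, and the remaining four are sent by intermediate nodes.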