Section: New Results
Heterogeneous Multi-CPU Multi-GPU Parallel Branch-and-Bound Tree Search
Participants: Trong-Tuan Vu, Bilel Derbel, Nouredine Melab
In this work, we push forward the design of parallel and distributed optimization algorithms running on heterogeneous systems consisting of multiple CPUs coupled with multiple GPUs. We consider parallel Branch-and-Bound (B&B), viewed as a generic algorithm that searches a tree of candidate solutions built dynamically at runtime. Given that several distributed CPUs and GPUs, possibly coming from different clusters connected through a network, can be used to parallelize the tree search, we give new insights into how to fully benefit from such a heterogeneous environment. More precisely, we describe a two-level, generic and fully distributed parallel approach that takes into account the characteristics of each processing unit (PU). At the first level, we use data streaming to allow parallelism between hosts and devices: tree nodes are evaluated on the GPU while the host CPU performs the pruning, selection and decomposition operations in parallel. At the second level, our approach incorporates an adaptive dynamic load balancing scheme based on distributed work stealing, so that work flows efficiently from overloaded PUs to idle ones at runtime. We deployed our approach on a distributed system of up to 20 GPUs and 128 CPUs coming from three clusters. We experimented with different scales and configurations of PUs, using the well-known FlowShop combinatorial optimization problem as a case study. First, on a single GPU, we improve on the running time of previous GPU-based B&B implementations by at least a factor of two. More importantly, independently of the scale or power of the CPUs and GPUs, our approach provides a substantial speed-up that is nearly optimal compared to the ideal performance one could expect in theory.
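To make the four B&B operations named above concrete, the following is a minimal sequential sketch in Python for the permutation FlowShop problem. It is an illustration under stated assumptions, not the implementation described in this work: it runs on a single CPU, the lower bound is a simple machine-load bound chosen for brevity, and all names (`advance`, `lower_bound`, `branch_and_bound`, `batch_size`) are hypothetical. The batched evaluation step only marks where a host could hand a set of tree nodes to a GPU while it continues with pruning, selection and decomposition.

```python
from itertools import permutations

def advance(front, job_times):
    """Machine completion times after appending one job to a partial schedule."""
    new_front, prev = [], 0
    for machine_free, t in zip(front, job_times):
        # A job starts on a machine once both the machine and the job are free.
        prev = max(prev, machine_free) + t
        new_front.append(prev)
    return new_front

def makespan(perm, p):
    """Completion time of the last job on the last machine."""
    front = [0] * len(p[0])
    for j in perm:
        front = advance(front, p[j])
    return front[-1]

def lower_bound(front, remaining, p):
    """Simple bound (an assumption of this sketch): every machine must still
    process all unscheduled jobs after its current completion time."""
    return max(front[k] + sum(p[j][k] for j in remaining)
               for k in range(len(front)))

def branch_and_bound(p, batch_size=4):
    n, m = len(p), len(p[0])
    best_cost, best_perm = float("inf"), None
    pool = [((), [0] * m)]  # pending nodes: (job prefix, machine completion times)
    while pool:
        # Selection: take a batch of pending nodes, most recent first.
        batch = [pool.pop() for _ in range(min(batch_size, len(pool)))]
        # Decomposition: extend each selected prefix by one unscheduled job.
        children = [(prefix + (j,), advance(front, p[j]))
                    for prefix, front in batch
                    for j in range(n) if j not in prefix]
        # Evaluation: bound the whole batch at once. In a host/device design,
        # this is the step that would be offloaded to the GPU.
        bounds = [lower_bound(front, [j for j in range(n) if j not in prefix], p)
                  for prefix, front in children]
        # Pruning: discard nodes that cannot improve on the best known solution.
        for (prefix, front), b in zip(children, bounds):
            if b >= best_cost:
                continue
            if len(prefix) == n:
                # At a leaf no jobs remain, so the bound equals the makespan.
                best_cost, best_perm = b, prefix
            else:
                pool.append((prefix, front))
    return best_cost, best_perm

def brute_force(p):
    """Exhaustive reference, usable only for sanity checks on tiny instances."""
    return min(makespan(q, p) for q in permutations(range(len(p))))
```

On a small instance, where `p[j][k]` is the processing time of job `j` on machine `k`, the result of `branch_and_bound(p)` can be checked against `brute_force(p)`. Larger batches expose more nodes to evaluate at once, at the price of decomposing nodes that a tighter incumbent might have pruned.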