Section: New Results
System support for multicore machines
Participants: Vivien Quéma, Renaud Lachaize, Fabien Gaud, Baptiste Lepers, Sylvain Genevès, Fabien Mottet.
Multicore machines with Non-Uniform Memory Access (NUMA) are becoming commodity platforms, yet efficiently exploiting their resources remains an open research problem. Most existing work focuses on increasing locality between computations and memory or I/O resources. This is achieved by preferentially allocating data items on local memory nodes, by moving computations close to I/O devices, or by migrating already allocated memory pages close to the applications that use them most. All of these works assume that every processor has equal memory performance. This assumption does not always hold. In 2011, we studied the performance of a 16-core NUMA machine exhibiting irregular connectivity between processors: some processors are directly connected to all others and access every memory node with low latency, while others have a lower degree of connectivity, need more hops to reach some memory nodes, and consequently access memory with higher latency.
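The effect of such irregular connectivity can be illustrated with a toy model. The topology, hop counts, and latency figures below are hypothetical (they are not measurements from the machine studied above); the point is only that a processor's worst-case memory latency grows with its distance, in hops, from the farthest memory node:

```python
# Illustrative model of an asymmetric NUMA interconnect. The 4-node
# topology and all latency numbers are made up for the sketch.

HOPS = {  # HOPS[a][b] = interconnect hops from node a to node b
    0: {0: 0, 1: 1, 2: 1, 3: 1},   # node 0: directly linked to all others
    1: {0: 1, 1: 0, 2: 1, 3: 2},   # node 1: needs two hops to reach node 3
    2: {0: 1, 1: 1, 2: 0, 3: 1},
    3: {0: 1, 1: 2, 2: 1, 3: 0},
}

LOCAL_LATENCY_NS = 100   # latency to the local memory node (illustrative)
HOP_PENALTY_NS = 50      # extra latency added per interconnect hop (illustrative)

def access_latency(cpu_node, mem_node):
    """Latency for a processor on cpu_node to access memory on mem_node."""
    return LOCAL_LATENCY_NS + HOP_PENALTY_NS * HOPS[cpu_node][mem_node]

def worst_case_latency(cpu_node):
    """Worst latency this processor sees across all memory nodes."""
    return max(access_latency(cpu_node, m) for m in HOPS)
```

In this model, node 0 plays the role of a "well-interconnected" processor (worst-case latency 150 ns), while node 1 is "weakly-interconnected" (worst-case latency 200 ns because of the two-hop path to node 3).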
Current operating systems are not aware of such performance characteristics. We have shown that the completion time of applications taken from the PARSEC benchmark suite can vary by up to 15% depending on the processor they are scheduled on. We have thus proposed a new OS scheduler that takes this asymmetry into account in order to make efficient placement decisions. This scheduler relies on a new metric, called MAPI (number of main Memory Accesses Per retired Instruction), to predict the impact of processor interconnect asymmetry on application performance. We have empirically evaluated the relevance of this metric on applications taken from the PARSEC benchmark suite, and shown that it helps estimate the performance gap between running an application on a "well-interconnected" processor and on a "weakly-interconnected" one. Guided by this metric, the proposed scheduler always performed within 3% of the best possible scheduling decision. This work is currently under submission.
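The intuition behind a MAPI-driven placement can be sketched as follows. This is a simplified illustration, not the scheduler under submission: the greedy pairing policy, the application names, and all numeric values are assumptions made for the example. Both counters that MAPI divides (main memory accesses and retired instructions) are typically read from hardware performance counters:

```python
# Simplified sketch of MAPI-guided placement (illustrative only; the
# actual scheduler described in the text is more elaborate).

def mapi(mem_accesses, retired_instructions):
    """MAPI = main memory accesses per retired instruction."""
    return mem_accesses / retired_instructions

def assign(apps, processors):
    """Greedy placement sketch.

    apps:       {app_name: measured MAPI}
    processors: {proc_name: extra memory latency (ns) vs. the
                 best-connected processor}

    Sort applications by MAPI (most memory-intensive first) and
    processors by extra latency (best-connected first), then pair them,
    so that memory-bound applications land on well-interconnected
    processors, where asymmetry hurts them least.
    """
    apps_by_mapi = sorted(apps, key=apps.get, reverse=True)
    procs_by_latency = sorted(processors, key=processors.get)
    return dict(zip(apps_by_mapi, procs_by_latency))

# Hypothetical inputs: a memory-intensive and a CPU-bound application.
placement = assign(
    {"streamcluster": 0.010, "swaptions": 0.0001},
    {"P0": 0, "P3": 50},
)
# placement == {"streamcluster": "P0", "swaptions": "P3"}
```

The design rationale this sketch tries to capture is the one stated above: an application with a high MAPI pays the interconnect penalty on a large fraction of its instructions, so it benefits most from a well-interconnected processor, whereas a low-MAPI application is largely insensitive to where it runs.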