Section: New Results

Binary parallelization

Our work on parallelizing binary programs continued in 2011, with several new results. The general principle is to analyze the binary code and extract a model of the most compute-intensive loops. The model has to include everything related to memory accesses, as well as some of the computations done in registers. Once a suitable model is extracted, it can be used to derive a new schedule for each targeted loop, optimizing various criteria: this is where polyhedral techniques are used, providing algorithms to optimize locality, parallelism, or both at the same time. After a new schedule is computed, the transformed code is generated by a polyhedron-scanning algorithm. Our approach relies on an intermediate representation that emphasizes memory accesses, hiding (i.e., outlining) all low-level details and retaining only what is needed by the parallelization component: we use raw C constructs, and macros to denote outlined code. Starting from the executable program, a first phase raises the code into our intermediate representation. The second phase applies a stock parallelizing component, producing a transformed C program. The last phase lowers this intermediate representation into a new binary executable. The system then uses a run-time monitoring component (generated automatically along with the parallel version) that redirects execution to the transformed loops whenever appropriate.

This year's activity on this topic in our team started with finalizing and presenting a paper at the IMPACT workshop [20], held during CGO'2011 in Chamonix (France). This workshop focuses on tools and techniques based on the polyhedral model. It was interesting to hear the various reactions and remarks of the researchers attending our presentation: the general position is that our work opens new perspectives on the use of the polyhedral model, and avoids the need for a complete polyhedral tool-chain. Another major aspect is that our three-phase strategy clearly separates the polyhedral part of the whole process, in essence providing the basis for a polyhedral programming language slightly more general than what had been considered before.

The work on this topic has continued in two directions. The first was to effectively abstract the parallelizing phase. This has been demonstrated by using two distinct parallelizers: the first is PLUTO, a polyhedral locality optimizer and parallelizer, and the second is CETUS, a “simple” parallelizer. The second direction was motivated by the complexity of typical “real-world” executable programs. It has consisted in developing new dependence analysis and parallelization techniques that handle more general classes of programs. This work is currently submitted for publication.

These research results will be presented at the forthcoming HiPEAC conference, to be held in Paris in January 2012. A full-length paper has also been accepted for publication in ACM Transactions on Architecture and Code Optimization, to appear in 2012.