## Section: New Results

### Clock Removal in X10

Participants: Paul Feautrier, Eric Violard [Inria/Camus], Alain Ketterlin [Inria/Camus].

In light of the previous work on the determinism of X10, a natural question is: are the parallel programming directives of X10 redundant? The answer is yes, at least for static control programs, i.e., programs in which the set of operations and their execution order do not depend on the input data. The basic idea is that the synchronization that occurs when several activities execute an `advance` is similar to the synchronization at the end of a `finish`. If one can count advances, one can construct a front by gathering all operations with the same advance count. Each front is executed inside one `finish`, and fronts are executed sequentially in order of increasing count.

For polyhedral programs, advance counting can be done at compile time. If the counts are affine functions, the restructuring can be done by classical polyhedral code generators such as CLooG, and no overhead is incurred. For polynomial counts, one overall enclosing loop must be added, but the resulting program can usually be optimized by simple loop transformations, e.g., pushing guards into the enclosing loop bounds. For arbitrary programs, the counts have to be computed dynamically; this is possible only if the program has static control.
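The front construction can be illustrated with a small simulation. This is only a sketch in Python, not the X10 transformation itself. It assumes that all activities are registered on a single clock and start together, that each activity is given as a flat list of operations, and that a hypothetical `ADVANCE` marker stands in for X10's `advance`. The names `build_fronts` and `run_fronts` are likewise illustrative. A thread pool whose `with` block waits for all submitted tasks plays the role of a `finish`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical marker standing in for an X10 clock advance.
ADVANCE = object()


def build_fronts(activities):
    """Group operations by the number of advances that precede them.

    `activities` is a list of activities; each activity is a list whose
    items are either callables (operations) or the ADVANCE marker.
    Returns the fronts as a list of lists, ordered by increasing count.
    """
    fronts = defaultdict(list)
    for ops in activities:
        count = 0  # advances executed so far by this activity
        for op in ops:
            if op is ADVANCE:
                count += 1
            else:
                fronts[count].append(op)
    return [fronts[c] for c in sorted(fronts)]


def run_fronts(fronts):
    """Run the fronts one after another; operations in a front run in parallel."""
    for front in fronts:
        # Leaving the `with` block waits for every task, like the end of a finish.
        with ThreadPoolExecutor() as pool:
            # Consuming the map re-raises any exception from an operation.
            list(pool.map(lambda op: op(), front))


if __name__ == "__main__":
    log = []

    def op(name):
        return lambda: log.append(name)

    activities = [
        [op("a0"), ADVANCE, op("a1")],
        [op("b0"), ADVANCE, op("b1"), ADVANCE, op("b2")],
    ]
    run_fronts(build_fronts(activities))
    print(log)  # operations with count 0 come first, then count 1, then count 2
```

Here the counts are computed by walking each activity, which corresponds to the dynamic case. For a polyhedral program the same grouping would be derived at compile time from the count expressed as a function of the loop indices.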

This result does not contradict the previous undecidability proof (Section 6.13),
as the translation of a polyhedral program is usually not polyhedral.
Application of the method to a
set of simple kernels has shown significant speedups. The interpretation of
this result is that, at least in the present state of the X10 runtime, the
implementation of the `async` primitive is more mature than the
implementation of clocks. A paper on this topic has been accepted at CC'14
(Compiler Construction Conference) [7].