

Section: New Results

Validity Conditions for Transformations of Non-Affine Programs

Participants : Alain Ketterlin, Philippe Clauss.

This project is a collaboration with the CORSE Inria team in Grenoble, with the participation of Fabrice Rastello.

Representing loop nests with the help of the polyhedral model has been a powerful and fruitful strategy to enable automatic optimization and parallelization. However, this model places strong requirements on the input program, and in many cases these requirements are hard to meet. Because they are based on linear programming, polyhedral techniques require every constraint to be affine in loop counters and parameters. While this is easily verified for loop bounds in a large majority of programs, the same constraint imposed on memory access functions is often too strong. There are several reasons for this. First, programmers often linearize multi-dimensional arrays, turning straightforward accesses like t[i][j] into t1[i*n+j], with the unfortunate effect of placing their program outside the scope of the polyhedral model. Second, optimization often happens late in the compilation process (or even during just-in-time compilation at run-time), where multi-dimensional array accesses have already been transformed by the compiler itself, for the needs of its earlier passes. Third, complex data storage strategies for certain classes of arrays, e.g., band or triangular matrices, may introduce non-linear access functions, and this non-linearity must be taken into account, e.g., for locality optimization. Fourth, some access functions are almost completely unspecified, as in the case of indirect accesses (t[s[i]]) or abstract mappings (t[f(i)]).

Our goal is to extend polyhedral analysis techniques to cover at least some of these cases, and see how far we can push the limits of the fundamental algorithms beyond pure linearity. We have started by considering the case of multi-dimensional array linearization, where the code doesn't provide access functions for all (original) dimensions, but rather a single access function, which is linear in loop counters but contains parametric coefficients. Here is an example illustrating our initial target, which is taken from the gemver program in the polybench suite:

  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      // Was: A[i][j] = A[i][j] + u1[i] * v1[j] + ...;
S1:   *(n*i+A+j) = *(n*i+A+j) + *(u1+i) * *(v1+j) + ...;
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      // Was: x[i] = x[i] + beta * A[j][i] * y[j];
S2:   *(x+i) = *(x+i) + beta * *(n*j+A+i) * *(y+j);
  // ...

The original forms of the statements appear in comments, but what finally reaches the compiler is much more convoluted: every array access appears as a pointer access whose effective address is a polynomial function mixing counters (i, j), array base addresses (A, u1, v1, x, y), and size parameters (n). In some other cases, the arrays have been “locally” linearized, i.e., the code still displays distinct arrays, but their inner dimensions have been linearized. In our example, statement S1 would appear as:

      // Was: A[i][j] = A[i][j] + u1[i] * v1[j] + ...;
S1:   A[n*i+j] = A[n*i+j] + u1[i] * v1[j] + ...;

This is an important special case in practice, and its particular structure helps a lot, for example, when data dependence analysis is needed.
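The relation between the two forms of S1 can be sketched in C as follows. This is only an illustration: the function names `s1_2d` and `s1_linear` are hypothetical, the terms elided by `...` in the statement are left out, and a row-major n-by-n layout is assumed. The subscript `n*i + j` of the second form multiplies the parameter `n` by the counter `i`, which is exactly what makes it non-affine.

```c
#include <assert.h>
#include <stddef.h>

/* Two-dimensional form of S1: A is an n-by-n row-major matrix.
   Only the u1[i] * v1[j] term is kept; the terms elided in the
   original statement are not reconstructed here. */
void s1_2d(size_t n, double A[n][n], const double *u1, const double *v1) {
    assert(A != NULL && u1 != NULL && v1 != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            A[i][j] = A[i][j] + u1[i] * v1[j];
}

/* "Locally" linearized form of S1: A is a flat buffer of n*n elements.
   The subscript n*i + j is linear in (i, j) but has the parametric
   coefficient n, so it is not affine in counters and parameters. */
void s1_linear(size_t n, double *A, const double *u1, const double *v1) {
    assert(A != NULL && u1 != NULL && v1 != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            A[n * i + j] = A[n * i + j] + u1[i] * v1[j];
}
```

Both functions touch the same memory cells in the same order when given the same storage, so any dependence present in one form is present in the other; only the shape of the access function differs.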

Extending current polyhedral techniques to deal with non-affine accesses is a formidable endeavor, requiring the adaptation of the many algorithms developed over decades for analysis, scheduling, and code generation. Instead, we have started by studying a specific task with immediate practical impact: given a non-affine loop nest and a specific desired transformation, under what conditions is this transformation valid? It is not unreasonable to expect the transformation to be provided by other means than pure analysis, for instance to be suggested by profiling data. In this case, the remaining problem is to test whether the given transformation is valid. This in turn requires testing the emptiness of a “problematic system”. For any given loop nest, this can be written as:

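\[
\exists (A, A'),\ \exists (v, v') \ \text{s.t.} \quad
\begin{cases}
v \in \mathcal{D}_A \wedge v' \in \mathcal{D}_{A'} & \text{(domain)} \\
v \prec_{lex} v' & \text{(original schedule)} \\
A(v) = A'(v') & \text{(same access location)} \\
T_A(v) \not\prec_{lex} T_{A'}(v') & \text{(transformed schedule)}
\end{cases}
\]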

where A and A' range over pairs of potentially conflicting accesses, v and v' are iteration vectors, 𝒟_A and 𝒟_A' are iteration domains, A(v) and A'(v') are access functions, and T_A and T_A' are schedules. The condition under which the transformation is valid is obtained by projecting this set onto the parameter dimensions, i.e., by eliminating all variables representing counters. The difficulty comes from the non-affine condition expressing the equality of access functions.
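For fixed parameter values, the system above can be checked by plain enumeration, which gives a concrete feel for what the symbolic projection has to decide. The sketch below does this for the fully linearized gemver statements S1 and S2, under several assumptions that are not part of the original work: array bases are integer offsets in an abstract flat address space, the `Params` record and all function names are hypothetical, the transformation is the interchange-and-fusion schedule discussed later in this section, and a trailing schedule component is added so that S1 precedes S2 inside the fused loop body. It is a brute-force illustration, not the projection procedure itself.

```c
#include <assert.h>
#include <stdbool.h>

/* One memory access: abstract integer address and read/write kind. */
typedef struct { long addr; bool is_write; } Access;

/* Hypothetical parameter record: size n and array base offsets. */
typedef struct { long n, A, u1, v1, x, y; } Params;

/* Accesses of statement s (1 or 2) at iteration (i, j). */
static void accesses(const Params *p, int s, long i, long j, Access acc[4]) {
    if (s == 1) {                       /* S1: A[n*i+j] += u1[i] * v1[j] */
        acc[0] = (Access){ p->A + p->n * i + j, true  };
        acc[1] = (Access){ p->A + p->n * i + j, false };
        acc[2] = (Access){ p->u1 + i, false };
        acc[3] = (Access){ p->v1 + j, false };
    } else {                            /* S2: x[i] += beta * A[n*j+i] * y[j] */
        acc[0] = (Access){ p->x + i, true  };
        acc[1] = (Access){ p->x + i, false };
        acc[2] = (Access){ p->A + p->n * j + i, false };
        acc[3] = (Access){ p->y + j, false };
    }
}

/* Strict lexicographic order on 3-component timestamps. */
static bool lex_lt(const long a[3], const long b[3]) {
    for (int k = 0; k < 3; k++)
        if (a[k] != b[k]) return a[k] < b[k];
    return false;
}

/* Original schedule: nest number first, then (i, j). */
static void original(int s, long i, long j, long t[3]) {
    t[0] = s - 1; t[1] = i; t[2] = j;
}

/* Transformed schedule: S1 keeps (i, j), S2 is interchanged to (j, i).
   The common leading 0 is dropped; the last component is the assumed
   textual position inside the fused body (S1 before S2). */
static void transformed(int s, long i, long j, long t[3]) {
    if (s == 1) { t[0] = i; t[1] = j; t[2] = 0; }
    else        { t[0] = j; t[1] = i; t[2] = 1; }
}

/* True if some pair of conflicting accesses is ordered by the original
   schedule but not by the transformed one, i.e., the system is non-empty. */
bool problematic_system_nonempty(const Params *p) {
    assert(p->n >= 0);
    for (int s = 1; s <= 2; s++)
    for (long i = 0; i < p->n; i++)
    for (long j = 0; j < p->n; j++)
    for (int s2 = 1; s2 <= 2; s2++)
    for (long i2 = 0; i2 < p->n; i2++)
    for (long j2 = 0; j2 < p->n; j2++) {
        long o[3], o2[3], t[3], t2[3];
        original(s, i, j, o);
        original(s2, i2, j2, o2);
        if (!lex_lt(o, o2)) continue;        /* not ordered originally */
        transformed(s, i, j, t);
        transformed(s2, i2, j2, t2);
        if (lex_lt(t, t2)) continue;         /* order is preserved */
        Access a[4], b[4];
        accesses(p, s, i, j, a);
        accesses(p, s2, i2, j2, b);
        for (int k = 0; k < 4; k++)
            for (int l = 0; l < 4; l++)
                if (a[k].addr == b[l].addr &&
                    (a[k].is_write || b[l].is_write))
                    return true;             /* same location, one write */
    }
    return false;
}
```

With n = 3 and A at offset 0, for instance, placing x at offset 1 makes the function report a conflict, whereas placing the five arrays far apart does not. The enumeration costs on the order of n^4 steps and only answers the question for one set of parameter values, which is why a symbolic condition on the parameters is needed in practice.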

Building on previous work, we have devised a projection procedure that eliminates all counters and leaves a (usually complex) condition on parameters. We have also developed several simplification strategies, applied both during elimination and on the final result, that together produce a test deciding whether the targeted transformation can be applied. For instance, on the fully linearized version of the previous example, when deciding whether the following transformation is legal:

\[
T_{\mathtt{S1}}(0, i, j) = (0, i, j) \qquad T_{\mathtt{S2}}(1, i, j) = (0, j, i)
\]

i.e., interchanging the second loop (around S2) and then applying fusion on both depth-2 loops, our elimination and simplification procedure produces the following run-time test:

  if ( ((y+n >= x+2) && (x+n >= y+2))
  || ((n >= 2) && (n*n+A >= x+1) && (x >= A+1))
  || ((n >= 2) && (u1+n >= x+1) && (x+n >= u1+2))
  || ((n >= 2) && (n+v1 >= x+1) && (x+n >= v1+1))
  || ((n*n+A >= y+1) && (y >= A+1) && (n >= 2)) || ...)
    // Transformation invalid: run the original version...
  else
    // Transformation valid: run the transformed version...

The reader may want to verify that this test actually corresponds to verifying that “arrays” do not overlap, but only as far as the given transformation requires it.
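As a partial check of this reading, take the first disjunct alone and treat x and y as the element addresses of two arrays of n elements each (an interpretation assumed here, following the example, and not stated in the test itself). Rearranging its two inequalities, and comparing with the usual overlap condition for two ranges of n elements, gives:

```latex
\begin{align*}
(y + n \ge x + 2) \wedge (x + n \ge y + 2)
  &\iff (x - y \le n - 2) \wedge (y - x \le n - 2) \\
  &\iff |x - y| \le n - 2, \\
[x,\, x + n - 1] \cap [y,\, y + n - 1] \ne \emptyset
  &\iff |x - y| \le n - 1.
\end{align*}
```

Under this interpretation, the disjunct flags every overlap between x and y except the boundary case |x - y| = n - 1, in which the two ranges share a single element. This is consistent with the remark that overlaps are rejected only as far as the given transformation requires.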

A systematic evaluation of our procedure on a benchmark suite has shown that the resulting tests are accurate and incur very little run-time overhead. The overall mechanism compares favorably with alternative techniques for dealing with non-affine access functions, which consist in statically reconstructing array dimensions [30]. This part of our work is ready for publication. However, to be fully competitive with alternative approaches, we need to find ways to complete the polyhedral compilation chain, with an effective scheduling algorithm before the validity test and a code generation algorithm after it.