## Section: New Results

### Parametric Tiling with Inter-Tile Data Reuse

Participants: Alain Darte, Alexandre Isoard.

Loop tiling is a loop transformation widely used to improve spatial and temporal data locality, increase computation granularity, and enable blocking algorithms, which are particularly useful when offloading kernels to platforms with small memories. When hardware caches are not available, data transfers must be software-managed; exploiting data reuse between tiles reduces these transfers by avoiding redundant external communications. The tile sizes are an important parameter of loop tiling, since they affect the size of the local memory required. However, for most analyses that involve several tiles, as is the case for inter-tile data reuse, the tile sizes induce non-linear constraints unless they are numerical constants. This complicates or prevents a parametric analysis. In this work, we showed that parametric tiling with inter-tile data reuse is nevertheless possible.
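To see where the non-linearity comes from, consider a one-dimensional loop tiled with a single tile size. The symbols below ($N$, $s$, $T$, $ii$) are illustrative assumptions and do not come from the original text.

```latex
% Illustrative 1-D case: N iterations, tile size s, tile index T, intra-tile index ii.
\[
  i = s\,T + ii, \qquad 0 \le ii < s, \qquad 0 \le i < N .
\]
% If s is a numerical constant (say s = 32), i = 32\,T + ii is affine in (T, ii).
% If s is a symbolic parameter, the term s\,T is a product of a parameter and a
% variable, so the constraint is quadratic rather than affine.
```

Any analysis that relates iterations of two different tiles has to manipulate such constraints, which is why parametric tile sizes fall outside plain Presburger arithmetic in the direct formulation.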

Our solution is the first parametric solution for generating the memory
transfers needed when a kernel is offloaded to a distant accelerator, tile by
tile after loop tiling, and when all intermediate results are stored locally on
the accelerator. For such computations, there is a complete decoupling between
loads and stores, and when a value has been defined in a previous tile, it has
to be loaded from the local memory and not from the distant memory, as the
latter is not yet up to date. In other words, inter-tile reuse is mandatory.
This also saves external communications. Our solution is parametric in the
sense that we derive the set of loads and stores from and to the distant memory
with the tile sizes as parameters. Although the direct formulation is
quadratic, we can still solve it in an affine way by developing techniques that
consider, in the analysis, all (unaligned) possible tiles obtained by
translation and not just those that belong to a tiling (partitioning) of the
iteration space. We were able to use a similar technique to also parameterize
the computations of local memory sizes, thanks to parametric lifetime analysis
and folding with modulos, even for pipeline schedules similar to double
buffering. Our method is currently implemented with the `iscc`
calculator of `ISL`, a library for the manipulation of integer sets
defined with Presburger arithmetic.
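The load and store sets described above can be illustrated on a toy kernel by brute-force enumeration. This is only a sketch under stated assumptions: it uses fixed numerical sizes and explicit Python sets, not the parametric, affine derivation of this work, and it does not use `iscc` or `ISL`. The function `tile_transfers` and the example kernel (`example_reads`, `example_writes`) are hypothetical names introduced for illustration. As in the text, every value loaded or produced is assumed to stay in local memory.

```python
def tile_transfers(n, s, reads, writes):
    """Enumerate per-tile loads and stores for a 1-D loop of n iterations
    executed tile by tile with tile size s.

    reads(i) and writes(i) return the sets of array cells that iteration i
    reads and writes. Local memory is assumed to keep everything it receives.
    """
    num_tiles = (n + s - 1) // s
    tiles = [range(t * s, min((t + 1) * s, n)) for t in range(num_tiles)]

    # Cells written by each tile, needed to decide which writes are final.
    tile_writes = [set().union(*(writes(i) for i in tile)) for tile in tiles]

    loads, stores = [], []
    local = set()  # cells already in local memory (loaded or produced)
    for t, tile in enumerate(tiles):
        load_t = set()
        for i in tile:
            for cell in reads(i):
                # Load from distant memory only if the cell is not yet local;
                # values defined by earlier tiles are reused locally.
                if cell not in local:
                    load_t.add(cell)
                    local.add(cell)
            local |= writes(i)
        loads.append(load_t)

        # Store only cells that no later tile overwrites.
        later = set().union(*tile_writes[t + 1:])
        stores.append(tile_writes[t] - later)
    return loads, stores


# Hypothetical kernel: A[i + 1] = A[i] + B[i] for 0 <= i < 8, tiles of size 4.
def example_reads(i):
    return {("A", i), ("B", i)}


def example_writes(i):
    return {("A", i + 1)}


loads, stores = tile_transfers(8, 4, example_reads, example_writes)
```

In this example the second tile reads `A[4]`, which was produced by the first tile, so it is reused from local memory and does not appear in `loads[1]`. The parametric approach described above derives the same kind of sets symbolically, with the tile size left as a parameter.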

The whole analysis can also handle approximations thanks to the introduction of the concept of pointwise functions, which are well suited to dealing with unaligned tiles. As this technique turns out to be fairly powerful, we believe that it can be used for other applications linked to the extension of the polyhedral model. Our future work will be to derive efficient approximation techniques, either because the program cannot be fully analyzed, or because approximations can speed up or simplify the results of the analysis without losing much in terms of memory transfers and/or memory sizes.

This work has been accepted for publication at IMPACT'14 [5].