<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Team:COMPSYS</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Research Program - Code Analysis, Code Transformations, Code Optimizations"/>
    <meta name="dc.title" content="Research Program - Code Analysis, Code Transformations, Code Optimizations"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2016-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="COMPSYS"/>
    <script type="text/javascript" src="https://raweb.inria.fr/rapportsactivite/RA2016/static/MathJax/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Team Compsys</a>
        </div>
        <span>
          <a href="uid1.html">Members</a>
        </span>
      </div>
      <div class="TdmEntry">Overall Objectives<ul><li><a href="./uid3.html">Introduction</a></li><li><a href="./uid5.html">General Presentation</a></li><li><a href="./uid10.html">Summary of Compsys I Achievements</a></li><li><a href="./uid14.html">Summary of Compsys II
Achievements</a></li><li><a href="./uid19.html">Summary of Compsys III
Achievements</a></li></ul></div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid23.html&#10;&#9;&#9;  ">Architecture and Compilation Trends</a></li><li class="tdmActPage"><a href="uid38.html&#10;&#9;&#9;  ">Code Analysis, Code Transformations, Code Optimizations</a></li><li><a href="uid41.html&#10;&#9;&#9;  ">Mathematical Tools</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid43.html&#10;&#9;&#9;  ">Compilers for Embedded Computing Systems</a></li><li><a href="uid44.html&#10;&#9;&#9;  ">Users of HPC Platforms and Scientific Computing</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid48.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid67.html&#10;&#9;&#9;  ">Lattifold</a></li><li><a href="uid70.html&#10;&#9;&#9;  ">PolyOrdo</a></li><li><a href="uid72.html&#10;&#9;&#9;  ">OpenOrdo</a></li><li><a href="uid74.html&#10;&#9;&#9;  ">ppcg-paramtiling</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid78.html&#10;&#9;&#9;  ">Handling Polynomials for Program
Analysis and Transformation</a></li><li><a href="uid79.html&#10;&#9;&#9;  ">Static Analysis of OpenStream Programs</a></li><li><a href="uid80.html&#10;&#9;&#9;  ">Liveness Analysis in Explicitly-Parallel
Programs</a></li><li><a href="uid81.html&#10;&#9;&#9;  ">Extended Lattice-Based Memory Allocation</a></li><li><a href="uid82.html&#10;&#9;&#9;  ">Stencil Accelerators</a></li><li><a href="uid83.html&#10;&#9;&#9;  ">Efficient Mapping of Irregular Memory Accesses on FPGA</a></li><li><a href="uid85.html&#10;&#9;&#9;  ">PolyApps</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid87.html&#10;&#9;&#9;  ">Bilateral Contracts with Industry</a></li><li><a href="uid88.html&#10;&#9;&#9;  ">Bilateral Grants with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid90.html&#10;&#9;&#9;  ">Regional Initiatives</a></li><li><a href="uid91.html&#10;&#9;&#9;  ">National Initiatives</a></li><li><a href="uid96.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid100.html&#10;&#9;&#9;  ">International Initiatives</a></li><li><a href="uid106.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid115.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid128.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li><li><a href="uid139.html&#10;&#9;&#9;  ">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentyear" href="bibliography.html">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">
	    
	    Inria
	  </a> | <a href="../index.html">
	    
	    Raweb 
	    2016</a> | <a href="http://www.inria.fr/en/teams/compsys">Presentation of the Team COMPSYS</a> | <a href="http://www.ens-lyon.fr/LIP/COMPSYS/index.html.en">COMPSYS Web Site
	  </a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="compsys.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="compsys.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-compsys-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../compsys/compsys.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-compsys-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid23.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid41.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: 
      Research Program</h2>
        <h3 class="titre3">Code Analysis, Code Transformations, Code Optimizations</h3>
        <p>Embedded systems, as we recalled earlier, generated new problems in
code analysis and optimization, both for optimizing embedded software
(compilation) and hardware (HLS). We now give more details on
some general challenges for program analysis, optimizations, and
transformations induced by this context, and on our methodology, in
particular our development and use of polyhedral optimizations and
their extensions.</p>
        <a name="uid39"/>
        <h4 class="titre4">Processes, Scheduling, Mapping, Communications, etc.</h4>
        <p>Before mapping an application to an architecture,
one has to decide which execution model is targeted and where to
intervene in the design flow. Then one has to solve scheduling,
placement, and memory management problems. These three aspects should
be handled as a whole, but the present state of the art dictates that they
be treated separately. One of our aims was to develop more
comprehensive solutions. The last task is code generation, both for
the processing elements and the processor/accelerator interfaces.</p>
        <p>There are basically two execution models for embedded systems: one is
the classical accelerator model, in which data is deposited in the
memory of the accelerator, which then does its job, and returns the
results. In the streaming model, computations are done on the fly, as
data items flow from an input channel to the output. Here, the data
are never stored in (addressable) memory. Other models are special
cases, or sometimes compositions of the basic models. For instance, a
systolic array follows the streaming model, and sometimes extends it
to higher dimensions. Software radio modems follow the streaming model
in the large, and the accelerator model in detail. The use of first-in
first-out queues (FIFO) in hardware design is an application of the
streaming model. Experience shows that designs based on the streaming
model are more efficient than those based on memory, for such
applications. One of the points to be investigated is whether this model is
general enough to handle arbitrary (regular) programs. The answer is
probably negative. One possible implementation of the streaming model
is as a network of communicating processes either as Kahn process
networks (FIFO based) or as our more recent model of communicating
regular processes (memory based, such as CRP mentioned hereafter). It
is an interesting fact that several researchers have investigated
the translation from process networks <a href="./bibliography.html#compsys-2016-bid0">[12]</a> and to process
networks <a href="./bibliography.html#compsys-2016-bid1">[20]</a>, <a href="./bibliography.html#compsys-2016-bid2">[21]</a>. Streaming languages such as
StreamIt and OpenStream are also interesting
solutions to explore.</p>
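        <p>As a rough illustration of the two execution models, the following Python sketch runs the same small kernel once in the accelerator style (input copied into addressable memory, then processed) and once in the streaming style (items consumed on the fly from a channel). The kernel, the function names, and the sample data are assumptions made for this illustration only; they do not come from any Compsys tool.</p>
<pre>
```python
from typing import Iterable, Iterator, List


def accelerator_model(data: List[int]) -> List[int]:
    """Accelerator model: the whole input is deposited in addressable
    memory, the kernel runs, and the complete result is returned."""
    local_memory = list(data)                # copy-in to accelerator memory
    result = [0] * len(local_memory)
    for i in range(len(local_memory)):
        previous = local_memory[i - 1] if i != 0 else 0
        result[i] = local_memory[i] + previous   # random access is allowed
    return result                            # copy-out


def streaming_model(channel: Iterable[int]) -> Iterator[int]:
    """Streaming model: items are consumed as they arrive; only a
    one-item register is kept, never the whole stream."""
    previous = 0
    for item in channel:
        yield item + previous
        previous = item


samples = [3, 1, 4, 1, 5]
batch = accelerator_model(samples)
streamed = list(streaming_model(iter(samples)))
print(batch, streamed)
```
</pre>
        <p>Both versions compute the same values; the difference lies in the storage they need, which is the point at which the two models diverge in hardware cost.</p>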
        <p>Kahn process networks (KPN) were introduced in 1974 as a notation
for representing parallel programs. Such a network is built from
processes that communicate via perfect FIFO channels. Because the
channel histories are deterministic, one can define a semantics and
talk meaningfully about the equivalence of two implementations. As a
bonus, the dataflow diagrams used by signal processing specialists can
be translated on-the-fly into process networks. The problem with KPNs
is that they rely on an asynchronous execution model, while VLIW
processors and FPGAs are synchronous or partially synchronous. Thus,
there is a need for a tool for synchronizing KPNs. This can be done by
computing a schedule that has to satisfy data dependences within each
process, a causality condition for each channel (a message cannot be
received before it is sent), and real-time constraints. However, there
is a difficulty in writing the channel constraints because one has to
count messages in order to establish the send/receive correspondence
and, in multi-dimensional loop nests, the counting functions may not
be affine. The same situation arises for the OpenStream language (see
Section <a title="Static Analysis of OpenStream Programs" href="./uid79.html">7.2</a>). Recent
developments on the theory of polynomials (see
Section <a title="Handling Polynomials for Program&#10;Analysis and Transformation" href="./uid78.html">7.1</a>) may offer a
solution to this problem. One can also define another model,
<i>communicating regular processes</i> (CRP), in which channels are
represented as write-once/read-many arrays. One can then dispense with
counting functions and prove that the determinacy property still
holds. As an added benefit, a communication system in which the
receive operation is not destructive is closer to the expectations of
system designers.</p>
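        <p>The counting issue and the CRP alternative can be sketched in a few lines of Python. The triangular loop nest, the schedules, and the class name <tt>WriteOnceChannel</tt> are hypothetical choices made for this illustration. The sketch shows that the rank of a message in a FIFO fed by a triangular nest is a quadratic (hence non-affine) function of the loop indices, that the causality condition is expressed on these ranks, and that a write-once/read-many array indexed by the iteration vector avoids counting altogether.</p>
<pre>
```python
def triangular_domain(n):
    """Producer iterations (i, j) with j in 0 .. i, in lexicographic order."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def message_rank(i, j):
    """Number of messages sent before iteration (i, j).
    The i * (i + 1) / 2 term is quadratic, hence not affine."""
    return i * (i + 1) // 2 + j


def causality_holds(send_time, recv_time, n_messages):
    """FIFO causality: message k is received strictly after it is sent."""
    return all(recv_time(k) > send_time(k) for k in range(n_messages))


class WriteOnceChannel:
    """CRP-style channel: a write-once/read-many array indexed by iteration."""

    def __init__(self):
        self._cells = {}

    def write(self, index, value):
        if index in self._cells:
            raise ValueError(f"cell {index} already written")
        self._cells[index] = value

    def read(self, index):
        return self._cells[index]   # non-destructive read


# The rank function enumerates the triangular domain in FIFO order.
ranks = [message_rank(i, j) for (i, j) in triangular_domain(4)]

# Producer sends message k at time k (assumed schedule).
good = causality_holds(lambda k: k, lambda k: k + 1, len(ranks))
bad = causality_holds(lambda k: k, lambda k: k // 2, len(ranks))

# With a write-once channel, the consumer addresses cells by (i, j) directly.
channel = WriteOnceChannel()
for (i, j) in triangular_domain(4):
    channel.write((i, j), 10 * i + j)
twice = (channel.read((2, 1)), channel.read((2, 1)))
print(ranks, good, bad, twice)
```
</pre>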
        <p>The main difficulty with this approach is that ordinary programs are usually
not constructed as process networks. One needs automatic or semi-automatic
tools for converting sequential programs into process networks. One
possibility is to start from array dataflow analysis <a href="./bibliography.html#compsys-2016-bid3">[15]</a> or
variants.
Another approach attempts to construct threads, i.e., pieces of sequential code
with the smallest possible interactions. In favorable cases, one may even find
outermost parallelism, i.e., threads with no interactions whatsoever. Tiling
mechanisms can also be used to define atomic processes that can be pipelined, as we initially proposed for FPGAs <a href="./bibliography.html#compsys-2016-bid4">[9]</a>.</p>
        <p>Whatever the chosen solution (FIFO or addressable memory) for communicating
between two accelerators or between the host processor and an accelerator, the
problems of optimizing communication between processes and of optimizing
buffers have to be addressed. Many local memory optimization problems have
already been solved theoretically. Some examples are loop fusion and loop
alignment for array contraction,
techniques for data allocation in scratch-pad memory, or techniques for folding
multi-dimensional arrays <a href="./bibliography.html#compsys-2016-bid5">[11]</a>. Nevertheless, the problem is
still largely open. Some questions are: how to schedule a loop sequence (or
even a process network) for minimal scratch-pad memory size? How is the problem
modified when one introduces unlimited and/or bounded parallelism (same
questions for analyzing explicitly-parallel programs)? How does one take into
account latency or throughput constraints, bandwidth constraints for input and
output channels, memory hierarchies? All loop transformations are useful in
this context, in particular loop tiling, and may be applied either as
source-to-source transformations (when used in front of HLS or C-level
compilers) or to directly generate VHDL or lower-level C dialects such as
OpenCL. One should keep in mind that theory will not be sufficient to solve
these problems. Experiments are required to check the relevance of the various
models (computation model, memory model, power consumption model) and to select
the most important factors according to the architecture. Besides,
optimizations do interact: for instance, reducing memory size and increasing
parallelism are often antagonistic. Experiments will be needed to find a global
compromise between local optimizations. In particular, the design of cost
models remains a fundamental challenge.</p>
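        <p>Two of the local memory optimizations mentioned above can be illustrated on a toy example. The Python sketch below is only an assumption-laden illustration (the kernels and function names are invented here): it first shows loop fusion enabling the contraction of an intermediate array to a scalar, then a modular folding of an intermediate array whose values are live for two consecutive iterations.</p>
<pre>
```python
def unfused(a):
    """Two loops communicate through a full-size intermediate array."""
    n = len(a)
    tmp = [0] * n
    for i in range(n):
        tmp[i] = 2 * a[i]
    out = [0] * n
    for i in range(n):
        out[i] = tmp[i] + 1
    return out


def fused_contracted(a):
    """After fusion, tmp[i] is produced and consumed in the same
    iteration, so the array contracts to a scalar."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        t = 2 * a[i]
        out[i] = t + 1
    return out


def folded(a):
    """out[i] = tmp[i] + tmp[i-1] with tmp[i] = 2 * a[i].
    Each value of tmp is live for two iterations, so tmp can be
    folded into a buffer of size 2 with the mapping i -> i mod 2."""
    n = len(a)
    buf = [0, 0]
    out = [0] * n
    for i in range(n):
        buf[i % 2] = 2 * a[i]
        previous = buf[(i - 1) % 2] if i != 0 else 0
        out[i] = buf[i % 2] + previous
    return out


data = [1, 2, 3]
print(unfused(data), fused_contracted(data), folded(data))
```
</pre>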
        <p>Finally, there remains the problem of code generation for accelerators. It is a
well-known fact that methods for program optimization and
parallelization do not generate a new program, but just deliver blueprints for
program generation, in the form, e.g., of schedules, placement functions, or
new array subscripting functions. A separate code generation phase must be
crafted with care, as an overly naive implementation may destroy the benefits
of high-level optimization. There are two possibilities here, as suggested
before: one may target another high-level synthesis or compilation tool, or one
may directly target VHDL or low-level code. Each approach has its advantages
and drawbacks. However, both situations require that the input program
respects some strong constraints on the code shape, array accesses, memory
accesses, communication protocols, etc. Furthermore, getting the compilers to do
what the user wants requires a lot of program tuning, i.e., program
rewriting or program annotations. What can be automated in this rewriting
process? What can be semi-automated?</p>
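        <p>To make the distinction between a schedule (a blueprint) and generated code concrete, here is a minimal Python sketch. The iteration domain, the dependence pattern, and the schedule <tt>theta(i, j) = i + j</tt> are assumptions chosen for this illustration. The sketch checks that a schedule respects all dependences, then performs a deliberately naive code generation step by grouping iterations into wavefronts that share the same date.</p>
<pre>
```python
from collections import defaultdict


def domain(n, m):
    """Rectangular iteration domain, in the original sequential order."""
    return [(i, j) for i in range(n) for j in range(m)]


def dependences(n, m):
    """(source, target) pairs: (i, j) needs (i-1, j) and (i, j-1)."""
    deps = []
    for (i, j) in domain(n, m):
        if i >= 1:
            deps.append(((i - 1, j), (i, j)))
        if j >= 1:
            deps.append(((i, j - 1), (i, j)))
    return deps


def theta(i, j):
    """Candidate schedule: the date of iteration (i, j)."""
    return i + j


def is_legal(schedule, deps):
    """A schedule is legal if every target runs strictly after its source."""
    return all(schedule(*tgt) > schedule(*src) for src, tgt in deps)


def generate_wavefronts(schedule, points):
    """Naive code generation: scan the domain and group iterations by
    date; iterations of one group may run in parallel."""
    fronts = defaultdict(list)
    for point in points:
        fronts[schedule(*point)].append(point)
    return [fronts[t] for t in sorted(fronts)]


deps = dependences(3, 3)
legal = is_legal(theta, deps)
illegal = is_legal(lambda i, j: j - i, deps)
fronts = generate_wavefronts(theta, domain(3, 3))
print(legal, illegal, fronts)
```
</pre>
        <p>A production code generator would emit loop nests scanning each wavefront rather than enumerating points, which is precisely where a naive implementation can lose the benefit of the schedule.</p>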
        <p>In other words, we still need to address scheduling, memory,
communication, and code generation issues, in the light of the
developments of new languages and architectures, pushing the limits of
such an automation of program analysis, program optimizations, and
code generation.</p>
        <a name="uid40"/>
        <h4 class="titre4">Beyond Static Control Programs</h4>
        <p>With the advent of parallelism in supercomputers, the bulk of research in code
transformation resulted in (semi-)automatic parallelization, with many
techniques (analysis, scheduling, code generation, etc.) based on the
description and manipulation of nested loops with polyhedra. Compsys has always
taken an active part in the development of these so-called “polyhedral
techniques”. Historically, these analyses were (wrongly) understood to be
limited to static control programs.</p>
        <p>Actually, the polyhedral model is neither a programming language nor an execution model,
but rather an intermediate representation.
As such, it can be generated from imperative sequential languages like
C or Fortran, streaming languages like CRP, or equational languages like Alpha.
While the structure of the model is the same in all three cases, it may enjoy
different properties, e.g., a schedule always exists in the
first case, but not necessarily in the other two. The significance of the
polyhedral model is that many questions relative to the analysis of a program
and the applicability of transformations can be answered precisely and
efficiently by applying well-known mathematical results to the model.</p>
        <p>For irregular programs, the basic idea is to construct a polyhedral
over-approximation, i.e., a program which has more operations, a
larger memory footprint, and more dependences than the original. One
can then parallelize the approximated program using polyhedral tools,
and then return to the original, either by introducing guards, or by
ensuring that approximations are harmless. This technique is the
standard way of dealing with approximated dependences. We already
started to study the impact of approximations in our kernel offloading
technique, for optimizing remote
communications <a href="./bibliography.html#compsys-2016-bid6">[10]</a>. It is clear however that
this extension method based on over-approximation will apply only to mildly
non-polyhedral programs. The restriction to arrays as the only data
structure is still present. Its advantage is that it will be able to
subsume in a coherent framework many disparate tricks: the extraction
of SCoPs, induction variable detection, the omission of non-affine
subscripts, or the conversion of control dependences into data
dependences. The link with the techniques developed in the PIPS
compiler (based on array region analysis) is strong and will have to
be explored.</p>
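        <p>The over-approximation scheme can be illustrated on a small irregular loop with a data-dependent guard and an indirect subscript. In the Python sketch below, everything is hypothetical: the loop body, the data, and in particular the assumed analysis result that <tt>idx[i]</tt> always has the same parity as <tt>i</tt>. Under that assumption, the approximated dependences only relate iterations of the same parity, so a reordering that is legal for the approximation (all even iterations, then all odd ones) is also legal for the original program once the guard is reintroduced.</p>
<pre>
```python
def exact_dependences(idx, cond):
    """Pairs of executed iterations that touch the same cell of a."""
    n = len(idx)
    return {(i, k) for i in range(n) for k in range(i + 1, n)
            if cond[i] and cond[k] and idx[i] == idx[k]}


def approximate_dependences(n):
    """Over-approximation: ignore the guard and assume (hypothetically)
    only that idx[i] has the parity of i, so that iterations of the
    same parity may conflict."""
    return {(i, k) for i in range(n) for k in range(i + 1, n)
            if i % 2 == k % 2}


def run(a, idx, cond, order):
    """Execute the loop body in the given order; the guard cond[i]
    restores the control of the original program."""
    a = list(a)
    for i in order:
        if cond[i]:
            a[idx[i]] = 2 * a[idx[i]] + i
    return a


idx = [0, 3, 2, 1, 0, 3]
cond = [True, True, False, True, True, True]
n = len(idx)

exact = exact_dependences(idx, cond)
approx = approximate_dependences(n)
contained = exact.issubset(approx)

sequential = list(range(n))
reordered = [i for i in range(n) if i % 2 == 0] + [i for i in range(n) if i % 2 == 1]
position = {i: p for p, i in enumerate(reordered)}
order_is_legal = all(position[k] > position[i] for (i, k) in approx)

original = run([1, 1, 1, 1], idx, cond, sequential)
transformed = run([1, 1, 1, 1], idx, cond, reordered)
print(contained, order_is_legal, original, transformed)
```
</pre>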
        <p>Such over-approximations can be found by means of abstract
interpretation, a general framework for developing static analyses of
real-life programs.
However, these analyses were designed mainly for verification purposes, so
precision took priority over scalability. Although many efforts
were made in designing specialized analyses (pointers, data
structures, arrays), these approaches still suffer from a lack of
experimental evidence concerning their applicability for code
optimization. Following our experience and work on termination
analysis (that connects the work on back-end CFG-like and front-end
polyhedral-like optimizations), and our work on range analysis of
numerical variables and on the memory footprint on real-world C
programs <a href="./bibliography.html#compsys-2016-bid7">[18]</a>, one of our objectives for the
future was to bridge the gap between abstract interpretation and
compilation, by designing cheaper analyses that scale well, mainly
based on compact representations derived from variants of static
single assignment (SSA), with a special focus on complex control, and
complex data structures (pointers, lists) that still suffer from
complexity issues in the area of optimization.</p>
        <p>Another possibility is to rely on
application specific knowledge to guide compiler decisions,
as it is impossible for a compiler alone to fully exploit such pieces
of information. A possible approach to better utilize such knowledge
is to put the programmers “in the loop”.
Expert parallel programmers often have a good idea about coarse-grain
parallelism and locality that they want to use for an application. On
the other hand, fine-grain parallelism (e.g., ILP, SIMD) is tedious
and specific to each underlying architecture, and is best left to the
compiler. Furthermore, approximations will have opportunities to be
refined using programmer knowledge. The key challenge is to create a
programming environment where compiler techniques and programmer
knowledge can be combined effectively. One of the difficulties is to
design a common language between the compiler and the programmer. The
first step towards this objective is to establish inter-disciplinary
collaborations with users, and take the time to analyze and optimize
their applications together.
</p>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid23.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid41.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
