<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:COMPSYS</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Research Program - Architecture and Compilation Trends"/>
    <meta name="dc.title" content="Research Program - Architecture and Compilation Trends"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2015-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="COMPSYS"/>
    <!-- Piwik -->
    <script type="text/javascript" src="/rapportsactivite/piwik.js"></script>
    <noscript><p><img src="//piwik.inria.fr/piwik.php?idsite=49" style="border:0;" alt="" /></p></noscript>
    <!-- End Piwik Code -->
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Compsys</a>
        </div>
        <span>
          <a href="uid1.html">Members</a>
        </span>
      </div>
      <div class="TdmEntry">Overall Objectives<ul><li><a href="./uid3.html">Introduction</a></li><li><a href="./uid6.html">General Presentation</a></li><li><a href="./uid11.html">Summary of Compsys I Achievements</a></li><li><a href="./uid17.html">Quick View of Compsys II
Achievements and Directions for Compsys III</a></li></ul></div>
      <div class="TdmEntry">Research Program<ul><li class="tdmActPage"><a href="uid21.html">Architecture and Compilation Trends</a></li><li><a href="uid36.html">Code Analysis, Code Transformations, Code Optimizations</a></li><li><a href="uid39.html">Mathematical Tools</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid41.html">Compilers for Embedded Computing Systems</a></li><li><a href="uid42.html">Users of HPC Platforms and Scientific Computing</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid46.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid57.html">Aspic</a></li><li><a href="uid61.html">DCC</a></li><li><a href="uid64.html">Lattifold</a></li><li><a href="uid67.html">OpenOrdo</a></li><li><a href="uid69.html">PoCo</a></li><li><a href="uid72.html">PolyOrdo</a></li><li><a href="uid74.html">PPCG-ParamTiling</a></li><li><a href="uid77.html">Termite</a></li><li><a href="uid81.html">Vaphor</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid86.html">Studying Optimal Spilling in the Light of SSA</a></li><li><a href="uid87.html">Symbolic Range of Pointers in C programs</a></li><li><a href="uid88.html">Analyzing C Programs with Arrays</a></li><li><a href="uid89.html">Termination of C Programs</a></li><li><a href="uid90.html">Data-aware Process Networks</a></li><li><a href="uid91.html">Mono-parametric Tiling</a></li><li><a href="uid92.html">Exact and Approximated Data-Reuse Optimizations for Tiling with Parametric Sizes</a></li><li><a href="uid93.html">Analysis of X10 Programs</a></li><li><a href="uid94.html">Revisiting Loop Transformations with X10 Clocks</a></li><li><a href="uid95.html">Static Analysis of OpenStream Programs</a></li><li><a href="uid96.html">Handling Polynomials for Program Analysis and Transformation</a></li><li><a href="uid97.html">Liveness Analysis in Explicitly-Parallel Programs</a></li><li><a href="uid98.html">Extended Lattice-Based Memory Allocation</a></li><li><a href="uid99.html">Stencil Accelerators</a></li><li><a href="uid101.html">PolyApps</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid103.html">ManycoreLabs Project with Kalray</a></li><li><a href="uid104.html">Technological Transfer: XtremLogic Start-Up</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid106.html">Regional Initiatives</a></li><li><a href="uid110.html">National Initiatives</a></li><li><a href="uid115.html">European Initiatives</a></li><li><a href="uid118.html">International Initiatives</a></li><li><a href="uid126.html">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid136.html">Promoting Scientific Activities</a></li><li><a href="uid149.html">Teaching - Supervision - Juries</a></li><li><a href="uid176.html">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentyear" href="bibliography.html">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">Inria</a> | <a href="../index.html">Raweb 2015</a> | <a href="http://www.inria.fr/en/teams/compsys">Presentation of the Project-Team COMPSYS</a> | <a href="http://www.ens-lyon.fr/LIP/COMPSYS/index.html.en">COMPSYS Web Site</a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="compsys.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="compsys.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-compsys-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../compsys/compsys.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-compsys-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF</td>
              <td>e-Pub</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid17.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid36.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: Research Program</h2>
        <h3 class="titre3">Architecture and Compilation Trends</h3>
        <p>The embedded system design community is facing two challenges:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid22"> </a>The complexity of embedded applications is increasing at a rapid rate.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid23"> </a>The needed increase in processing power is no longer obtained by
increases in the clock frequency, but by increased parallelism.</p>
          </li>
        </ul>
        <p>While, in the past, each type of embedded application was implemented in a
separate appliance, the present tendency is toward a universal hand-held
object, which must serve as a cell-phone, as a personal digital assistant, as a
game console, as a camera, as a Web access point, and much more. One may say
that embedded applications are of the same level of complexity as those running
on a PC, but they must use a more constrained platform in terms of processing
power, memory size, and energy consumption. Furthermore, most of them depend
on international standards (e.g., in the field of radio digital communication),
which are evolving rapidly. Lastly, since ease of use is at a premium for
portable devices, these applications must be integrated seamlessly to a degree
that is unheard of in standard computers.</p>
        <p>All of this dictates that modern embedded systems retain some form of
programmability. For increased designer productivity and reduced
time-to-market, programming must be done in some high-level language, with
appropriate tools for compilation, run-time support, and debugging. This does
not mean however that all embedded systems (or all of an embedded system) must
be processor based. Another solution is the use of field programmable gate
arrays (FPGA), which may be programmed at a much finer grain than a processor,
although the process of FPGA “programming” is less well understood than
software generation. Processors are better than application-specific circuits
at handling complicated control and unexpected events. On the other hand,
FPGAs may be tailored to just meet the needs of their application, resulting in
better energy and silicon area usage. It is expected that most embedded
systems will use a combination of general-purpose processors, specific
processors like DSPs, and FPGA accelerators (or even low-power GPUs).
Such a DSP+FPGA combination is already present in recent versions of the Intel
Atom processor.</p>
        <p>As a consequence, parallel programming, which has long been confined to the
high-performance community, must become commonplace rather than the
exception. In the same way that sequential programming moved from assembly code
to high-level languages at the price of a slight loss in performance, parallel
programming must move from low-level tools, like OpenMP or even MPI, to
higher-level programming environments. While fully-automatic parallelization
is a Holy Grail that will probably never be reached in our lifetimes, it will
remain as a component in a comprehensive environment, including general-purpose
parallel programming languages, domain-specific parallelizers, parallel
libraries and run-time systems, back-end compilation, and dynamic parallelization.
The landscape of embedded systems is indeed very diverse and many design flows
and code optimization techniques must be considered. For example, embedded
processors (micro-controllers, DSP, VLIW) require powerful back-end
optimizations that can take into account hardware specificities, such as
special instructions and particular organizations of registers and memories.
FPGA and hardware accelerators, to be used as small components in a larger
embedded platform, require “hardware compilation”, i.e., design flows and
code generation mechanisms to generate non-programmable circuits. For the
design of a complete system-on-chip platform, architecture models, simulators,
and debuggers are required. The same is true for multicores of any kind, GPGPUs
(“general-purpose” graphical processing units), and CGRAs (coarse-grain
reconfigurable architectures), which require specific methodologies and
optimizations, although all these techniques converge or have connections. In
other words, embedded systems need all the usual aspects of the process that
transforms some specification down to an executable, software or hardware. In
this wide range of topics, Compsys concentrates on the code optimization
aspects (and the associated analyses) in this transformation chain, restricting
itself to compilation (transforming a program into a program) for embedded processors
and programmable accelerators, and to high-level synthesis (transforming a
program into a circuit description) for FPGAs.</p>
        <p>Actually, it is no surprise that compilation and high-level synthesis
have been getting closer over the last 10 years. Now that high-level synthesis has
grown up sufficiently to be able to rely on place-and-route tools, or even to
synthesize C-like languages, standard techniques for back-end code generation
(register allocation, instruction selection, instruction scheduling, software
pipelining) are used in HLS tools. At the higher level, programming languages
for programmable parallel platforms share many aspects with high-level
specification languages for HLS, for example, the description and manipulations
of nested loops, or the model of computation/communication (e.g., Kahn process
networks and its many “streaming” variants). In all aspects, the frontier
between software and hardware is vanishing. For example, in terms of
architecture, customized processors (with processor extension as first proposed
by Tensilica) share features with both general-purpose processors and hardware
accelerators. FPGAs are both hardware and software as they are fed with
“programs” representing their hardware configurations.</p>
        <p>In other words, this convergence in code optimizations explains why Compsys
studies both program compilation and high-level synthesis, and at both
front-end and back-end levels, the first one acting more at the granularity of
memories, transfers, and multiple cores, the second one more at the granularity
of registers, system calls, and single core. Both levels must be considered as
they interact with each other. Front-end optimizations must be aware of what
back-end optimizations will do, as single-core performance remains the basis for
good parallel performance. Some front-end optimizations even act directly on
back-end features, for example register tiling considered as a source-level
transformation. Also, from a conceptual point of view, the polyhedral
techniques developed by Compsys are actually the symbolic front-end
counterpart, for structured loops, of back-end analysis and optimizations of
unstructured programs (through control-flow graphs), such as dependence
analysis, scheduling, lifetime analysis, register allocation, etc. A strength
of Compsys so far has been to juggle both aspects, one based more on graph theory
with SSA-type optimizations, the other on polyhedra representing loops, and
to exploit the correspondence between the two. This correspondence has yet to be
fully exploited, for instance for applying polyhedral techniques to more irregular programs.</p>
        <p>Besides, Compsys has a tradition of building free software tools for linear
programming and optimization in general, and will continue to do so, as needed for
our current research.</p>
        <a name="uid24"/>
        <h4 class="titre4">Compilation and Languages Issues in the Context of Embedded
Processors, “Embedded Systems”, and Programmable Accelerators</h4>
        <p>Compilation is an old activity, in particular back-end code optimizations. The development of embedded systems was one of the reasons for the revival of compilation activities as
a research topic.
Applications for embedded computing systems generate complex programs and need
more and more processing power. This evolution is driven, among other factors, by the
increasing impact of digital television, the first instances of UMTS
networks, and the increasing size of digital supports, like recordable DVD,
and even Internet applications. Furthermore, standards are evolving very
rapidly (see for instance the successive versions of MPEG). As a consequence,
the industry has focused on programmable structures, whose flexibility more
than compensates for their larger size and power consumption. The appliance
provider has a choice between hard-wired structures (Asic), special-purpose
processors (Asip), (quasi) general-purpose processors (DSP for multimedia
applications), and now hardware accelerators (dedicated platforms – such as
those developed by Thales or the CEA –, or more general-purpose accelerators
such as GPUs or even multicores, even if these are closer to small HPC
platforms than truly embedded systems). Our cooperation with STMicroelectronics, until
2012, focused on investigating the compilation for specialized processors, such
as the ST100 (DSP processor) and the ST200 (VLIW DSP processor)
family. Even for this restricted class of processors, the diversity is large,
and the potential for instruction-level parallelism (SIMD, MMX), the limited
number of registers, the small size of the memory, and the use of direct-mapped
instruction caches and of predication generate many open problems. Our goal was
to contribute to their understanding and their solutions.</p>
        <p>An important concept to cope with the diversity of platforms is the concept of
<i>virtualization</i>, which is key to more portability, more simplicity,
more reliability, and of course more security. This concept – implemented at
low level through binary translation and just-in-time (JIT)
compilation (<i>Aggressive compilation</i> consists in allowing more
time to implement more complete and costly solutions: the compiled program is
loaded in permanent memory (ROM, flash, etc.) and its compilation time is
less relevant than the execution time, size, and energy consumption of the
produced code, which can have a critical impact on the cost and quality of
the final product. Hence, the application is cross-compiled, i.e., compiled
on a powerful platform distinct from the target processor. <i>Just-in-time
compilation</i>, on the other hand, corresponds to compiling applets on demand
on the target processor. For compatibility and compactness, the source
languages are CIL or Java bytecode. The code can be uploaded or sold
separately on a flash memory. Compilation is performed at load time and even
dynamically during execution. The optimization heuristics, constrained by
time and limited resources, are far from being aggressive. They must be fast
but smart enough.) – consists in hiding the architecture-dependent features
as long as possible during the compilation process. It has been used for a
while for servers such as HotSpot, a bit more recently for workstations, and
now for embedded computing. The same needs drive the development of
intermediate languages such as OpenCL to, not necessarily hide, but at least
make more uniform, the different facets of the underlying architectures. The
challenge is then to design and compile high-productivity and high-performance
languages (For examples of such languages, see the keynotes event we
organized in 2013: <a href="http://labexcompilation.ens-lyon.fr/hpc-languages">http://labexcompilation.ens-lyon.fr/hpc-languages</a> .)
(coping with parallelism and heterogeneity) that can be ported to such
intermediate languages, or to architecture-dependent runtime systems. The
offloading of computation kernels, through source-to-source compilation,
targeting back-end C dialects, has the same goals: to automate application
porting to the variety of accelerators.</p>
        <p>For JIT compilation, the compactness of the information representation, and
thus its pertinence, is an important criterion for such late compilation
phases. Indeed, the intermediate representation (IR) is evolving not only from
a target-independent description to a target-dependent one, but also from a
situation where the compilation time is almost unlimited (cross-compilation) to
one where any type of resource is limited. This is one of the reasons why
static single assignment (SSA), a sparse compact representation of liveness
information, became popular in embedded compilation.
While time constraints are
common to all JIT compilers (not only for embedded computing), SSA also offers
a good ratio of pertinence to storage of information.
It also makes it possible to simplify algorithms, which is important for increasing
the reliability of the compiler.
In this context, our aim has been, in particular, to develop exact or heuristic
solutions to <i>combinatorial</i> problems that arise in compilation for VLIW
and DSP processors, and to integrate these methods into industrial compilers
for DSP processors (mainly ST100, ST200, Strong ARM). Such combinatorial
problems can be found in register allocation, opcode selection, code placement,
when removing the SSA multiplexer functions (known as <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>φ</mi></math></span> functions).
These optimizations are usually done
in the last phases of the compiler, using an assembly-level intermediate
representation.
As mentioned in Sections <a title="Summary of Compsys I Achievements" href="./uid11.html">
	2.3</a> 
and <a title="Quick View of Compsys II&#10;Achievements and Directions for Compsys III" href="./uid17.html">
	2.4</a> , we made a lot of progress
in this area in our past collaborations with STMicroelectronics (see also previous
activity reports). Through the Sceptre and Mediacom projects, we first
revisited, in the light of SSA, some code optimizations in an aggressive
context, to develop better strategies, without eliminating too quickly
solutions that may have been considered as too expensive in the past. Then
we exploited the new concepts introduced in the aggressive context to
design better algorithms in a JIT context, focusing on the speed of
algorithms and their memory footprint, without compromising too much on the
quality of the generated code.</p>
        <p>Our research directions are currently more focused on programmable accelerators,
such as GPU and multicores, but still considering <i>static</i> compilation
and without forgetting the link between high-level (in general at source-code level) and
low-level (i.e., at assembly-code level) optimizations. They concern program
analysis (of both sequential and parallel specifications), program
optimizations (for memory hierarchies, parallelism, streaming, etc.), and
also the link with applications and between compilers and users
(programmers). Polyhedral techniques play an important role in these
directions, even if control-flow-based techniques remain in the background and
may come back to the foreground at any time. This is also the case for
high-level synthesis, as exposed in the next section.</p>
        <a name="uid27"/>
        <h4 class="titre4">Context of High-Level Synthesis and FPGA Platforms</h4>
        <p>High-level synthesis has become a necessity, mainly because the exponential
increase in the number of gates per chip far outstrips the productivity of
human designers. Besides, applications that need hardware accelerators usually
belong to domains, like telecommunications and game platforms, where fast
turn-around and time-to-market minimization are paramount. When Compsys
started, we were convinced that our expertise in compilation and automatic
parallelization could contribute to the development of the needed tools.</p>
        <p>Today, synthesis tools for FPGAs or ASICs come in many shapes. At the lowest
level, there are proprietary Boolean, layout, and place-and-route tools, whose
input is a VHDL or Verilog specification at the structural or register-transfer
level (RTL). Direct use of these tools is difficult, for several reasons:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid28"> </a>A structural description is completely different from a usual
algorithmic language description, as it is written in terms of interconnected
basic operators. One may say that it has a spatial orientation, in place of
the familiar temporal orientation of algorithmic languages.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid29"> </a>The basic operators are extracted from a library, which poses problems of
selection, similar to the instruction selection problem in ordinary
compilation.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid30"> </a>Since there is no accepted standard for VHDL synthesis, each tool has its
own idiosyncrasies and reports its results in a different format. This makes
it difficult to build portable HLS tools.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid31"> </a>HLS tools have trouble handling loops. This is particularly true for
logic synthesis systems, where loops are systematically unrolled (or
considered as sequential) before synthesis. An efficient treatment of loops
needs the polyhedral model. This is where past results from the automatic
parallelization community are useful.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid32"> </a>More generally, a VHDL specification is too low level to allow the
designer to easily perform higher-level code optimizations, especially on
multi-dimensional loops and arrays, which are of paramount importance to
exploit parallelism, pipelining, and perform communication and memory
optimizations.</p>
          </li>
        </ul>
        <p>Some intermediate tools have been proposed that generate VHDL from a
specification in restricted C, both in academia (such as SPARK, Gaut, UGH,
CloogVHDL) and in industry (such as C2H, CatapultC, Pico-Express, Vivado HLS).
All these tools use only the most elementary form of parallelization,
equivalent to instruction-level parallelism in ordinary compilers, with some
limited form of block pipelining, and communication through FIFOs. Targeting
one of these tools for low-level code generation, while we concentrate on
exploiting loop parallelism, might be a more fruitful approach than directly
generating VHDL. However, it may be that the restrictions they impose
preclude efficient use of the underlying hardware.
Our first experiments with these HLS tools reveal two important issues.
First, they are, of course, limited to certain types of input programs so as
to make their design flows successful, even if, over the years, they have become
more and more mature. But it remains a painful and tricky task for the user to
transform the program so that it fits these constraints and to tune it to get
good results. Automatic or semi-automatic program transformations can help
the user achieve this task. Second, users, even expert users, have only a very
limited understanding of what back-end compilers do and why they do not lead
to the expected results. An effort must be made to analyze the different
design flows of HLS tools, to explain what to expect from them, and how to use
them to get a good quality of results. Our first goal is thus to develop
high-level techniques that, used in front of existing HLS tools, improve their
utilization. This should also give us directions on how to modify them or to
design new tools from scratch.</p>
        <p>More generally, we want to consider HLS as a more global parallelization
process. So far, no HLS tool is capable of generating designs with
communicating <i>parallel</i> accelerators, even if, in theory, at least for
the scheduling part, a tool such as Pico-Express could have such
capabilities. The reason is that it is, for example, very hard to
automatically design parallel memories and to decide the distribution of array
elements in memory banks to get the desired performance with parallel
accesses. Also, how should communicating processes be expressed at the language
level? How should constraints, pipeline behavior, communication media, etc., be expressed? To
better exploit parallelism, a first solution is to extend the source language
with parallel constructs, as in all derivations of the Kahn process networks
model, including communicating regular processes (CRP, see later). The other
solution is a form of automatic parallelization. However, classical methods,
which are mostly based on scheduling, need to be revisited, to pay more
attention to locality, process streaming, and low-level pipelining, which are
of paramount importance in hardware. Besides, classical methods mostly rely
on the runtime system to tailor the parallelism degree to the available
resources. Obviously, there is no runtime system in hardware. The real
challenge is thus to invent new scheduling algorithms that take resource,
locality, and pipelining into account, and then to infer the necessary
hardware from the schedule. This is probably possible only for programs that
fit into the polyhedral model, or in an incrementally-extended model.</p>
        <p>Our research activities on polyhedral code analysis and optimizations directly
target these HLS challenges. But they are not limited to the automatic
generation of hardware as can be seen from our different contributions on X10,
OpenStream, parametric tiling, etc. The same underlying concepts also arise
when optimizing codes for GPUs and multicores. In this context of polyhedral
analysis and optimizations, we will focus on three aspects:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid33"> </a>developing high-level transformations, especially for loops and
memory/communication optimizations, that can be used in front of HLS tools so
as to improve their use, as well as for hardware accelerators;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid34"> </a>developing concepts and techniques in a more global view of high-level
synthesis and high-level parallel programming, starting from specification
languages down to hardware implementation;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid35"> </a>developing more general code analysis so as to extract more information
from codes as well as to extend the programs that can be handled.</p>
          </li>
        </ul>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid17.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid36.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
