<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:PACAP</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Research Program - Research Objectives"/>
    <meta name="dc.title" content="Research Program - Research Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2018-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="PACAP"/>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Pacap</a>
        </div>
        <span>
          <a href="uid1.html">Team, Visitors, External Collaborators</a>
        </span>
      </div>
      <div class="TdmEntry">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid22.html&#10;&#9;&#9;  ">Motivation</a></li><li class="tdmActPage"><a href="uid26.html&#10;&#9;&#9;  ">Research Objectives</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid44.html&#10;&#9;&#9;  ">Any computer usage</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid46.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid49.html&#10;&#9;&#9;  ">ATMI</a></li><li><a href="uid53.html&#10;&#9;&#9;  ">HEPTANE</a></li><li><a href="uid58.html&#10;&#9;&#9;  ">tiptop</a></li><li><a href="uid62.html&#10;&#9;&#9;  ">PADRONE</a></li><li><a href="uid66.html&#10;&#9;&#9;  ">If-memo</a></li><li><a href="uid70.html&#10;&#9;&#9;  ">Simty</a></li><li><a href="uid74.html&#10;&#9;&#9;  ">Barra</a></li><li><a href="uid79.html&#10;&#9;&#9;  ">Memoization</a></li><li><a href="uid82.html&#10;&#9;&#9;  ">FiPlib</a></li><li><a href="uid85.html&#10;&#9;&#9;  ">sigmask</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid89.html&#10;&#9;&#9;  ">Compilation and Optimization</a></li><li><a href="uid103.html&#10;&#9;&#9;  ">Processor Architecture</a></li><li><a href="uid110.html&#10;&#9;&#9;  ">WCET estimation and optimization</a></li><li><a href="uid114.html&#10;&#9;&#9;  ">Security</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid118.html&#10;&#9;&#9;  ">Bilateral Grants with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid121.html&#10;&#9;&#9;  ">Regional Initiatives</a></li><li><a href="uid122.html&#10;&#9;&#9;  ">National Initiatives</a></li><li><a href="uid130.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid185.html&#10;&#9;&#9;  ">International Initiatives</a></li><li><a href="uid199.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid206.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid229.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li><li><a href="uid262.html&#10;&#9;&#9;  ">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentmajor" href="bibliography.html">Major publications</a>
          </li>
          <li>
            <a id="tdmbibentyear" href="bibliography.html#year">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">
	    
	    Inria
	  </a> | <a href="../index.html">
	    
	    Raweb 
	    2018</a> | <a href="http://www.inria.fr/en/teams/pacap">Presentation of the Project-Team PACAP</a> | <a href="https://team.inria.fr/pacap/en">PACAP Web Site
	  </a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="pacap.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="pacap.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-pacap-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../pacap/pacap.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-pacap-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid22.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid44.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: 
      Research Program</h2>
        <h3 class="titre3">Research Objectives</h3>
        <p>Processor micro-architecture and compilation have been at the core of
the research carried out by the members of the project-team for two
decades, with undeniable contributions. They continue to be the
foundation of PACAP.</p>
        <p>Heterogeneity and diversity of processor architectures now require new
techniques to guarantee that the hardware is satisfactorily exploited
by the software. One of our goals is to devise new static compilation
techniques (cf. Section <a title="Research Objectives" href="./uid26.html#uid28">3.2.1</a>), but also to build upon
iterative <a href="./bibliography.html#pacap-2018-bid1">[1]</a> and split <a href="./bibliography.html#pacap-2018-bid2">[2]</a>
compilation to continuously adapt software to its environment (Section
<a title="Research Objectives" href="./uid26.html#uid29">3.2.2</a>). Dynamic binary optimization will also play a key
role in delivering adapting software and increased performance.</p>
        <p>The end of Moore's law and of Dennard scaling  (According to
Dennard scaling, as transistors get smaller the power density
remains constant, and the consumed power remains proportional to the
area.) offers an exciting window of opportunity, where performance
improvements will no longer derive from additional transistor budget
or increased clock frequency, but rather come from breakthroughs in
micro-architecture (Section <a title="Research Objectives" href="./uid26.html#uid30">3.2.3</a>). Reconciling CPU and GPU
designs (Section <a title="Research Objectives" href="./uid26.html#uid34">3.2.4</a>) is one of our objectives.</p>
        <p>Heterogeneity and multicores are also major obstacles to determining
tight worst-case execution times of real-time systems (Section
<a title="Research Objectives" href="./uid26.html#uid35">3.2.5</a>), which we plan to tackle.</p>
        <p>Finally, we also describe how we plan to address transversal aspects
such as reliability (Section <a title="Research Objectives" href="./uid26.html#uid40">3.2.6</a>), power efficiency
(Section <a title="Research Objectives" href="./uid26.html#uid41">3.2.7</a>), and security (Section <a title="Research Objectives" href="./uid26.html#uid42">3.2.8</a>).</p>
        <a name="uid28"/>
        <h4 class="titre4">Static Compilation</h4>
        <p>Static compilation techniques continue to be relevant in addressing the
characteristics of emerging hardware technologies, such as
non-volatile memories, 3D-stacking, or novel communication
technologies. These techniques expose new characteristics to the
software layers. As an example, non-volatile memories typically have
asymmetric read-write latencies (writes are much longer than reads)
and different power consumption profiles. PACAP studies new
optimization opportunities and develops tailored compilation techniques
for upcoming compute nodes. New technologies may also be coupled
with traditional solutions to offer new trade-offs. We study how
programs can adequately exploit the specific features of the proposed
heterogeneous compute nodes.</p>
        <p>We propose to build upon iterative compilation <a href="./bibliography.html#pacap-2018-bid1">[1]</a>
to explore how applications perform on different configurations. Where
possible, we relate Pareto-optimal configurations to application
characteristics. The best configuration, however, may actually depend
on runtime information, such as input data, dynamic events, or
properties that are available only at runtime. Unfortunately a runtime
system has little time and means to determine the best
configuration. For these reasons, we also leverage split-compilation
<a href="./bibliography.html#pacap-2018-bid2">[2]</a>: the idea consists in pre-computing alternatives,
and embedding in the program enough information to assist and drive a
runtime system towards the best solution.</p>
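<p>As a minimal illustration of this idea (all names and the threshold below are hypothetical, not PACAP's actual implementation), the static compiler can emit several pre-compiled variants of a kernel together with metadata, and a thin runtime layer selects a variant using information only available at run time:</p>

```c
#include <stddef.h>

/* Split-compilation sketch: two statically compiled variants of the
 * same kernel, plus compiler-embedded metadata guiding the runtime. */

static long sum_simple(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

static long sum_unrolled(const long *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) { s0 += a[i]; s1 += a[i + 1]; }
    if (i < n) s0 += a[i];
    return s0 + s1;
}

/* Metadata embedded by the static compiler (illustrative threshold:
 * input size above which the unrolled variant is assumed to win). */
static const size_t UNROLL_THRESHOLD = 64;

static long sum_dispatch(const long *a, size_t n) {
    /* The runtime decision is trivial here; a real system would also
     * consider dynamic events, contention, available ISA features... */
    return (n >= UNROLL_THRESHOLD) ? sum_unrolled(a, n) : sum_simple(a, n);
}
```

<p>The point is that the expensive exploration (which variants exist, when each wins) is done offline, leaving the runtime only a cheap table lookup or comparison.</p>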
        <a name="uid29"/>
        <h4 class="titre4">Software Adaptation</h4>
        <p>More than ever, software needs to adapt to its environment. In most
cases, this environment remains unknown until runtime. This is
already the case when one deploys an application to a cloud, or an
“app” to mobile devices. The dilemma is the following: for maximum
portability, developers should target the most general device; but for
performance they would like to exploit the most recent and advanced
hardware features. JIT compilers can handle the situation to some
extent, but binary deployment requires dynamic binary rewriting. Our
work has shown how SIMD instructions can be upgraded from SSE to AVX transparently
<a href="./bibliography.html#pacap-2018-bid3">[3]</a>. Many more opportunities will appear with
diverse and heterogeneous processors, featuring various kinds of
accelerators.</p>
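<p>The flavor of such adaptation can be sketched at source level with one-time function dispatch. This is a deliberately simplified stand-in: the work cited above rewrites SSE binary code into AVX without any source access, and the feature flag below abstracts real CPUID-style detection:</p>

```c
#include <stddef.h>

typedef void (*axpy_fn)(float *, const float *, float, size_t);

/* Stand-in for the "SSE-era" code path deployed in the binary. */
static void axpy_baseline(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Stand-in for the upgraded "AVX" path: a compiler would vectorize
 * this with wider registers; the point is only that the same call
 * site can be retargeted at run time. */
static void axpy_wide(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* One-time selection based on detected hardware capability
 * (the flag abstracts actual feature detection). */
static axpy_fn select_axpy(int avx_available) {
    return avx_available ? axpy_wide : axpy_baseline;
}
```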
        <p>On shared hardware, the environment is also defined by other
applications competing for the same computational resources. It
becomes increasingly important to adapt to changing runtime
conditions, such as the contention of the cache memories, available
bandwidth, or hardware faults. Fortunately, optimizing at runtime is
also an opportunity, because this is the first time the program is
visible as a whole: executable and libraries (including library
versions). Optimizers may also rely on dynamic information, such as
actual input data, parameter values, etc. We have already developed a
software platform <a href="./bibliography.html#pacap-2018-bid4">[12]</a> to analyze and optimize
programs at runtime, and we started working on automatic dynamic
parallelization of sequential code, and dynamic specialization.</p>
        <p>We started addressing some of these challenges in ongoing projects
such as PSAIC, a Nano2017 collaborative research program with
STMicroelectronics, as well as within the Inria Project Lab
MULTICORE. The H2020 FET HPC project ANTAREX also addresses
these challenges from the energy perspective. We further leverage our
platform and initial results to address other adaptation
opportunities. Efficient software adaptation requires expertise from
all domains tackled by PACAP, and strong interaction between all team
members is expected.</p>
        <a name="uid30"/>
        <h4 class="titre4">Research directions in uniprocessor micro-architecture</h4>
        <p>Achieving high single-thread performance remains a major challenge
even in the multicore era (Amdahl's law). The members of the PACAP
project-team have been conducting research in uniprocessor
micro-architecture for about 20 years, covering major topics
including caches, the instruction front-end, branch prediction,
out-of-order core pipeline, and value prediction. In particular, in
recent years they have been recognized as world leaders in branch
prediction
<a href="./bibliography.html#pacap-2018-bid5">[19]</a><a href="./bibliography.html#pacap-2018-bid6">[9]</a> and in
cache prefetching <a href="./bibliography.html#pacap-2018-bid7">[7]</a> and they have
revived the forgotten concept of value prediction
<a href="./bibliography.html#pacap-2018-bid8">[11]</a><a href="./bibliography.html#pacap-2018-bid9">[10]</a>. This
research was supported by the ERC Advanced grant DAL (2011-2016) and
also by Intel. We pursue research on achieving ultimate unicore
performance. Below are several non-orthogonal directions that we have
identified for mid-term research:</p>
        <ol>
          <li>
            <p class="notaparagraph"><a name="uid31"> </a>management of the memory hierarchy (particularly the hardware
prefetching);</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid32"> </a>practical design of very wide issue execution cores;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid33"> </a>speculative execution.</p>
          </li>
        </ol>
        <p>
          <i>Memory design issues:</i>
        </p>
        <p class="notaparagraph">Performance of many applications is highly impacted by the memory
hierarchy behavior. The interactions between the different components
in the memory hierarchy and the out-of-order execution engine have
high impact on performance.</p>
        <p>The last <i>Data Prefetching Contest</i>, held in conjunction with ISCA 2015, has
illustrated that achieving high prefetching efficiency is still a
challenge for wide-issue superscalar processors, particularly those
featuring a very large instruction window. The large instruction
window enables an implicit data prefetcher. The interaction between
this implicit hardware prefetcher and the explicit hardware prefetcher
is still poorly understood, as illustrated by Pierre Michaud's BO
prefetcher (winner of DPC2) <a href="./bibliography.html#pacap-2018-bid7">[7]</a>. The first
research objective is to better understand how the implicit
prefetching enabled by the large instruction window interacts with the
L2 prefetcher and then to understand how explicit prefetching on the
L1 also interacts with the L2 prefetcher.</p>
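<p>Explicit software prefetching, whose interplay with the hardware prefetchers is precisely what is at stake here, can be expressed with the GCC/Clang builtin. The prefetch distance below is an arbitrary illustration; tuning it well is exactly the kind of interaction question raised above:</p>

```c
#include <stddef.h>

#define PF_DIST 16  /* prefetch 16 elements ahead (illustrative value) */

/* Streaming reduction with explicit software prefetch.  On targets
 * without prefetch support the builtin compiles to a no-op, so the
 * result is unchanged either way. */
static long sum_with_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&a[i + PF_DIST], 0 /* read */, 1 /* low temporal locality */);
        s += a[i];
    }
    return s;
}
```

<p>How such explicit L1 prefetches compose with an aggressive L2 hardware prefetcher, and with the implicit prefetching of a very large instruction window, is the open question.</p>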
        <p>The second research objective is related to the interaction of
prefetching and virtual/physical memory. On real hardware,
prefetching is stopped by page frontiers. The interaction between TLB
prefetching (and on which level) and cache prefetching must be
analyzed.</p>
        <p>The prefetcher is not the only actor in the hierarchy that must
be carefully controlled. Significant benefits can also be
achieved through careful management of memory access bandwidth,
particularly the management of spatial locality on memory accesses,
both for reads and writes. The exploitation of this locality
is traditionally handled in the memory controller. However, it could
be better handled if larger temporal granularity were available.
Finally, we also intend to continue to explore the promising avenue of
compressed caches. In particular we recently proposed the skewed
compressed cache <a href="./bibliography.html#pacap-2018-bid10">[13]</a>. It offers new
possibilities for efficient compression schemes.</p>
        <p>
          <i>Ultra wide-issue superscalar.</i>
        </p>
        <p class="notaparagraph">To effectively leverage memory level parallelism, one requires huge
out-of-order execution structures as well as very wide issue
superscalar processors. For the past two decades, implementing ever
wider issue superscalar processors has been challenging. The objective
of our research on the execution core is to explore (and revisit)
directions that allow the design of a very wide-issue (8-to-16 way)
out-of-order execution core while mastering its complexity (silicon
area, hardware logic complexity, power/energy consumption).</p>
        <p>The first direction that we are exploring is the use of clustered
architectures <a href="./bibliography.html#pacap-2018-bid11">[8]</a>. A symmetric clustered
organization benefits from a simpler bypass network, but induces
significant complexity in the issue queue. One remarkable finding of our
study <a href="./bibliography.html#pacap-2018-bid11">[8]</a> is that, when considering two
large clusters (e.g. 8-wide), steering large groups of consecutive
instructions (e.g. 64 <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>μ</mi></math></span>ops) to the same cluster is quite
efficient. This opens opportunities to limit the complexity of the
issue queues (monitoring fewer buses) and register files (fewer ports
and physical registers) in the clusters, since not all results have to
be forwarded to the other cluster.</p>
        <p>The second direction that we are exploring is associated with the
approach that we developed with Sembrant et
al. <a href="./bibliography.html#pacap-2018-bid12">[15]</a>. It reduces the number of
instructions waiting in the instruction queues for the applications
benefiting from very large instruction windows. Instructions are
dynamically classified as ready (independent from any long latency
instruction) or non-ready, and as urgent (part of a dependency chain
leading to a long latency instruction) or non-urgent. Non-ready
non-urgent instructions can be delayed until the long latency
instruction has been executed; this reduces the pressure on
the issue queue. This proposition opens the opportunity to consider
an asymmetric micro-architecture with a cluster dedicated to the
execution of urgent instructions and a second cluster executing the
non-urgent instructions. The micro-architecture of this second cluster
could be optimized to reduce complexity and power consumption (smaller
instruction queue, less aggressive scheduling, etc.).</p>
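<p>The classification itself can be sketched on a toy program representation. The single-source-operand model and the quadratic consumer scan below are purely illustrative simplifications, far from the actual micro-architectural mechanism of <a href="./bibliography.html#pacap-2018-bid12">[15]</a>:</p>

```c
/* Toy sketch of the ready/urgent classification idea. */

typedef struct {
    int src;           /* index of producer instruction, or -1 */
    int long_latency;  /* e.g. a load likely to miss in cache   */
} Insn;

/* ready[i]  = 1 if i does not (transitively) consume a long-latency result;
 * urgent[i] = 1 if i's result (transitively) feeds a long-latency insn.
 * Non-ready non-urgent instructions are the ones that can be delayed. */
static void classify(const Insn *code, int n, int *ready, int *urgent) {
    for (int i = 0; i < n; i++) {              /* forward pass: readiness */
        int s = code[i].src;
        ready[i] = (s < 0) || (!code[s].long_latency && ready[s]);
    }
    for (int i = n - 1; i >= 0; i--) {         /* backward pass: urgency  */
        urgent[i] = 0;
        for (int j = i + 1; j < n; j++)
            if (code[j].src == i && (code[j].long_latency || urgent[j]))
                urgent[i] = 1;
    }
}
```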
        <p>
          <i>Speculative execution.</i>
        </p>
        <p class="notaparagraph">Out-of-order (OoO) execution relies on speculative execution that
requires predictions of all sorts: branch, memory dependency, value...</p>
        <p>The PACAP members have been major actors of branch prediction
research for the last 20 years, and their proposals have influenced
the design of most of the hardware branch predictors in current
microprocessors. We will continue to steadily explore new branch
predictor designs, as for instance <a href="./bibliography.html#pacap-2018-bid13">[17]</a>.</p>
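<p>For readers unfamiliar with the area, the textbook baseline that modern predictors refine is the 2-bit saturating counter; the sketch below shows only this baseline, not the TAGE-style designs cited above:</p>

```c
/* 2-bit saturating-counter (bimodal) branch predictor entry.
 * States 0-1 predict not-taken, states 2-3 predict taken; the
 * hysteresis avoids flipping the prediction on a single outlier. */

typedef struct { unsigned char ctr; } Bimodal;

static int predict(const Bimodal *p) { return p->ctr >= 2; }

static void update(Bimodal *p, int taken) {
    if (taken  && p->ctr < 3) p->ctr++;
    if (!taken && p->ctr > 0) p->ctr--;
}
```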
        <p>In speculative execution, we have recently revisited value prediction
(VP), which was a hot research topic between 1996 and 2002. However, it
was until recently considered that value prediction would lead to a
huge increase in complexity and power consumption in every stage of
the pipeline. Fortunately, we have recently shown that complexity
usually introduced by value prediction in the OoO engine can be
overcome
<a href="./bibliography.html#pacap-2018-bid8">[11]</a><a href="./bibliography.html#pacap-2018-bid9">[10]</a><a href="./bibliography.html#pacap-2018-bid5">[19]</a><a href="./bibliography.html#pacap-2018-bid6">[9]</a>. First,
very high accuracy can be enforced at reasonable cost in coverage and
minimal complexity <a href="./bibliography.html#pacap-2018-bid8">[11]</a>. Thus, both
prediction validation and recovery by squashing can be done outside
the out-of-order engine, at commit time. Furthermore, we propose a
new pipeline organization, EOLE ({Early | Out-of-order | Late}
Execution), that leverages VP with validation at commit to execute
many instructions outside the OoO core, in-order
<a href="./bibliography.html#pacap-2018-bid9">[10]</a>. With EOLE, the issue-width in OoO
core can be reduced without sacrificing performance, thus benefiting
the performance of VP without a significant cost in silicon area
and/or energy. In the near future, we will explore new avenues
related to value prediction. These directions include register
equality prediction and compatibility of value prediction with weak
memory models in multiprocessors.</p>
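<p>The role of confidence in making VP practical can be illustrated with a last-value predictor gated by a saturating counter. This is a deliberately simplified stand-in for the actual predictors of <a href="./bibliography.html#pacap-2018-bid8">[11]</a>, and the thresholds are arbitrary; the point is that predicting only at high confidence keeps mispredictions, and thus squashes at commit, rare:</p>

```c
#include <stdint.h>

#define CONF_MAX 7   /* saturating confidence counter ceiling */
#define CONF_USE 7   /* predict only at full confidence (illustrative) */

typedef struct { uint64_t last; int conf; } VPEntry;

/* Returns 1 and sets *pred only if the entry is confident enough. */
static int vp_predict(const VPEntry *e, uint64_t *pred) {
    if (e->conf >= CONF_USE) { *pred = e->last; return 1; }
    return 0;
}

/* Training at commit: equal outcomes build confidence, a differing
 * outcome replaces the value and resets confidence to zero. */
static void vp_train(VPEntry *e, uint64_t actual) {
    if (actual == e->last) { if (e->conf < CONF_MAX) e->conf++; }
    else { e->last = actual; e->conf = 0; }
}
```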
        <a name="uid34"/>
        <h4 class="titre4">Towards heterogeneous single-ISA CPU-GPU architectures</h4>
        <p>Heterogeneous single-ISA architectures have been proposed in the
literature during the 2000s  <a href="./bibliography.html#pacap-2018-bid14">[44]</a> and are now widely used in
the industry (Arm big.LITTLE, NVIDIA 4+1...) as a way to improve
power-efficiency in mobile processors. These architectures include
multiple cores whose respective micro-architectures offer different
trade-offs between performance and energy efficiency, or between
latency and throughput, while offering the same interface to software.
Dynamic task migration policies leverage the heterogeneity of the
platform by using the most suitable core for each application, or even
each phase of processing. However, these works only tune cores by
changing their complexity. Energy-optimized cores are either identical
cores implemented in a low-power process technology, or simplified
in-order superscalar cores, which are far from state-of-the-art
throughput-oriented architectures such as GPUs.</p>
        <p>We investigate the convergence of CPU and GPU at both architecture and
compiler levels.</p>
        <p>
          <i>Architecture.</i>
        </p>
        <p class="notaparagraph">The architecture convergence between Single Instruction Multiple
Threads (SIMT) GPUs and multicore processors that we have been
pursuing <a href="./bibliography.html#pacap-2018-bid15">[5]</a> opens the way for heterogeneous
architectures including latency-optimized superscalar cores and
throughput-optimized GPU-style cores, which all share the same
instruction set. Using SIMT cores in place of superscalar cores will
enable the highest energy efficiency on regular sections of
applications. As with existing single-ISA heterogeneous architectures,
task migration will not necessitate any software rewrite and will
accelerate existing applications.</p>
        <p>
          <i>Compilers for emerging heterogeneous architectures.</i>
        </p>
        <p class="notaparagraph">Single-ISA CPU+GPU architectures will provide the necessary substrate
to enable efficient heterogeneous processing. However, it will also
introduce substantial challenges at the software and firmware
level. Task placement and migration will require advanced policies
that leverage both static information at compile time and dynamic
information at run-time. We are tackling the heterogeneous task
scheduling problem at the compiler level.</p>
        <a name="uid35"/>
        <h4 class="titre4">Real-time systems</h4>
        <p>Safety-critical systems (e.g. avionics, medical devices,
automotive...) have so far used simple unicore hardware systems as a
way to control their predictability, in order to meet timing
constraints. Still, many critical embedded systems have increasing
demand in computing power, and simple unicore processors are not
sufficient anymore. General-purpose multicore processors are not
suitable for safety-critical real-time systems, because they include
complex micro-architectural elements (cache hierarchies, branch,
stride and value predictors) meant to improve average-case
performance, and for which worst-case performance is difficult to
predict. The prerequisite for calculating tight WCET is a
deterministic hardware system that avoids dynamic, time-unpredictable
calculations at run-time.</p>
        <p>Even for multi and manycore systems designed with time-predictability
in mind (Kalray MPPA manycore
architecture  (<a href="http://www.kalrayinc.com">http://www.kalrayinc.com</a>), or the Recore
manycore hardware  (<a href="http://www.recoresystems.com/">http://www.recoresystems.com/</a>))
calculating WCETs is still challenging. The following two challenges
will be addressed in the mid-term:</p>
        <ol>
          <li>
            <p class="notaparagraph"><a name="uid38"> </a>definition of methods to estimate WCETs tightly on manycores,
that smartly analyze and/or control shared resources such as buses,
NoCs or caches;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid39"> </a>methods to improve the programmability of real-time applications
through automatic parallelization and optimizations from model-based
designs.</p>
          </li>
        </ol>
        <a name="uid40"/>
        <h4 class="titre4">Fault Tolerance</h4>
        <p>Technology trends suggest that, in tomorrow's computing world,
failures will become commonplace due to many factors, and the expected
probability of failure will increase with scaling. While well-known
approaches, such as error correcting codes, exist to recover from
failures and provide fault-free chips, the exponential growth of the
number of faults will make them unaffordable in the
future. Consequently, other approaches such as fine-grained disabling
and reconfiguration of hardware elements (e.g. individual functional
units or cache blocks) will become economically necessary. We are
going to enter a new era: functionally correct chips with variable
performance among chips and throughout their lifetime <a href="./bibliography.html#pacap-2018-bid0">[45]</a>.</p>
        <p>Transient and permanent faults may be detected by similar techniques,
but correcting them generally involves different approaches. We are
primarily interested in permanent faults, even though we do not
necessarily disregard transient faults (e.g. the TMR approach in the
next paragraph addresses both kinds of faults).</p>
        <p>
          <i>CPU.</i>
        </p>
        <p class="notaparagraph">Permanent faults can occur anywhere in the processor. The performance
implications of faulty cells vary depending on how the array is used
in a processor. Most micro-architectural work aiming at assessing
the performance implications of permanently faulty cells relies on
simulations with random fault maps. These studies are, therefore,
limited by the fault maps they use, which may not be representative
of the average performance or of its distribution across chips. They
also do not consider aging effects.</p>
        <p>Considering the memory hierarchy, we have already studied
<a href="./bibliography.html#pacap-2018-bid16">[4]</a> the impact of permanent faults on the average and
worst-case performance based on analytical models. We will extend
these models to cover other components and other designs, and to
analyze the interaction between faulty components.</p>
        <p>For identified critical hardware structures, such as the memory
hierarchy, we will propose protection mechanisms, for instance by using
larger cells, or even by selecting a different array organization to
mitigate the impact of faults.</p>
        <p>Another approach to deal with faults is to introduce redundancy at the
code level. We propose to consider static compilation techniques
focusing on existing hardware. As an example, we plan to leverage SIMD
extensions of current instruction sets to introduce redundancy in
scalar code at minimum cost. With these instructions, it will be
possible to protect the execution from both soft errors by using TMR
(triple modular redundancy) with voters in the code itself, and
permanent faults without the need for extra hardware support to
deconfigure faulty functional units.</p>
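<p>The voter at the heart of TMR is a bitwise majority function; in the envisioned scheme the three copies would live in SIMD lanes, but the logic is the same as in this scalar sketch:</p>

```c
#include <stdint.h>

/* Bitwise majority voter for triple modular redundancy: each result
 * is computed three times, and the standard (a&b)|(b&c)|(a&c)
 * formulation corrects any single corrupted copy, bit by bit. */
static uint32_t majority3(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (b & c) | (a & c);
}
```

<p>Because the vote is bitwise, it masks both a transient flip in one copy and a stuck lane holding an arbitrary wrong value.</p>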
        <p>
          <i>Reconfigurable Computing.</i>
        </p>
        <p class="notaparagraph">In collaboration with the <span class="smallcap">Cairn </span> project-team, we propose to construct
Coarse Grain Reconfigurable Architectures (CGRA) from a sea of basic
arithmetic and memory elements organized into clusters and connected
through a hierarchical interconnection network. These clusters of
basic arithmetic operators (e.g. 8-bit arithmetic and logic units)
could be seamlessly configured for various accuracies and data
types, adapting the consumed energy to application requirements by taking
advantage of approximate computations. We propose to add new kinds of
error detection (and sometimes correction) directly at the operator
level by taking advantage of the massive redundancy of the array. As
an example, errors can be tracked and detected in a complex sequence of
double floating-point operations by using a reduced-precision
version of the same processing.</p>
        <p>Such reconfigurable blocks will be driven by compilation techniques,
in charge of computing checkpoints, detecting faults, and replaying
computations when needed.</p>
        <p>Dynamic compilation techniques will help better exploit faulty
hardware, by allocating data and computations on correct resources.
In case of permanent faults, we will provide a mechanism to
reconfigure the hardware, for example by reducing the issue width of
VLIW processors implemented in CGRA. Dynamic code generation (JIT
compiler) will re-generate code for the new configuration,
guaranteeing portability and optimal exploitation of the hardware.</p>
        <a name="uid41"/>
        <h4 class="titre4">Power efficiency</h4>
        <p>PACAP addresses power-efficiency at several levels. First, we
design static and split compilation techniques to contribute to the
race for Exascale computing (the general goal is to reach
<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mn>10</mn><mn>18</mn></msup></math></span> FLOP/s at less than 20 MW).
Second, we focus on high-performance low-power embedded compute nodes.
Within the ANR project Continuum, in collaboration with architecture
and technology experts from LIRMM and the SME Cortus, we research new
static and dynamic compilation techniques that fully exploit emerging
memory and NoC technologies. Finally, in collaboration with the CAIRN
project-team, we investigate the synergy of reconfigurable
computing and dynamic code generation.</p>
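<p>The Exascale target quoted above fixes the required energy efficiency, which the following one-liner makes explicit: 10<sup>18</sup> FLOP/s within 20 MW amounts to 50 GFLOP/s per watt.</p>

```c
/* Energy-efficiency target implied by the Exascale goal. */
static double gflops_per_watt(double flops, double watts) {
    return flops / watts / 1e9;   /* 1e18 / 20e6 / 1e9 = 50.0 */
}
```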
        <p>
          <i>Green and heterogeneous high-performance computing.</i>
        </p>
        <p class="notaparagraph">Concerning HPC systems, our approach consists in mapping, managing
at runtime, and autotuning applications for green and heterogeneous
High-Performance Computing systems up to the Exascale level. One key
innovation of the proposed approach is a separation of concerns, in
which self-adaptivity and energy-efficiency strategies are specified
separately from the application functionality, promoted by the
definition of a Domain-Specific Language (DSL) inspired by
aspect-oriented programming concepts for heterogeneous systems. The
new DSL will be introduced to express adaptivity/energy/performance
strategies and to enforce application autotuning and resource and
power management at runtime. The goal is to support the parallelism,
scalability and adaptability of a dynamic workload by exploiting the
full capabilities (including energy management) of emerging
large-scale and extreme-scale systems, while reducing the Total Cost
of Ownership (TCO) for companies and public organizations.</p>
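        <p class="notaparagraph">The separation of concerns promoted by such a DSL can be illustrated with a small, hypothetical sketch in the style of aspect-oriented programming; every name below is invented for illustration and does not reflect the project's actual DSL:</p>

```python
# Hypothetical sketch of aspect-style separation of concerns: the
# application kernel is written once, and a self-adaptivity strategy
# is declared separately, then woven in by a decorator. All names
# here are illustrative inventions.

def adapt(strategy):
    # Weave a tuning strategy around a kernel, aspect-style.
    def wrap(kernel):
        def tuned(*args):
            knobs = strategy(*args)   # choose tuning knobs at runtime
            return kernel(*args), knobs
        return tuned
    return wrap

def choose_parallelism(n):
    # Strategy, kept apart from the kernel: scale threads with size.
    return {"threads": 8 if n > 1000 else 1}

@adapt(choose_parallelism)
def vector_sum(n):
    # The functional part knows nothing about energy or tuning.
    return sum(range(n))

result, knobs = vector_sum(10)
print(result, knobs)  # 45 {'threads': 1}
```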
        <p>
          <i>High-performance low-power embedded compute nodes.</i>
        </p>
        <p class="notaparagraph">We will address the design of next-generation energy-efficient,
high-performance embedded compute nodes, focusing simultaneously on
software, architecture, and emerging memory and communication
technologies in order to exploit their respective features
synergistically. The project is organized around three complementary
topics: 1) compilation techniques; 2) multicore architectures;
3) emerging memory and communication technologies. PACAP will focus
on the compilation aspects, taking as input the software-visible
characteristics of the proposed emerging technologies, and making the
best possible use of their new features (non-volatility, density,
endurance, low power).</p>
        <p>
          <i>Hardware Accelerated JIT Compilation.</i>
        </p>
        <p class="notaparagraph">Reconfigurable hardware offers the opportunity to limit power
consumption by dynamically adjusting the number of available resources
to the requirements of the running software. In particular, VLIW
processors can adjust the number of available issue lanes.
Unfortunately, changing the processor width often requires recompiling
the application, and VLIW processors depend heavily on the quality of
compilation, mainly because of the instruction-scheduling phase
performed by the compiler. Another challenge lies in the tight
constraints of embedded systems: the energy and execution-time
overhead of JIT compilation must be kept carefully under control.</p>
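        <p class="notaparagraph">One common way to reason about that overhead is a break-even analysis: JIT compilation only pays off once its one-time cost is amortized over enough executions of the compiled region. A minimal sketch, with made-up costs rather than measurements:</p>

```python
# Break-even point for JIT compilation (illustrative numbers only):
# compiling wins once  jit_cost + n * t_native  drops below  n * t_interp.
def break_even_runs(jit_cost, t_interp, t_native):
    # Smallest run count n at which compiling becomes cheaper than
    # repeatedly executing the unoptimized version.
    saving_per_run = t_interp - t_native
    if saving_per_run > 0:
        return -(-jit_cost // saving_per_run)  # ceiling division
    return None  # compiling never pays off

print(break_even_runs(jit_cost=900, t_interp=10, t_native=1))  # 100
```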
        <p>We have started exploring ways to reduce the cost of JIT compilation
targeting VLIW-based heterogeneous manycore systems. Our approach relies
on a hardware/software JIT compiler framework. While basic
optimizations and JIT management are performed in software, the
compilation back-end is implemented by means of specialized
hardware. This back-end involves both instruction scheduling and
register allocation, which are known to be the most time-consuming
stages of such a compiler.</p>
        <a name="uid42"/>
        <h4 class="titre4">Security</h4>
        <p>Security is a mandatory concern of any modern computing
system. Various threat models have led to a multitude of protection
solutions. Members of PACAP have already contributed in this area,
with the HAVEGE
<a href="./bibliography.html#pacap-2018-bid17">[48]</a> random number generator and code-obfuscation
techniques (the obfuscating just-in-time compiler <a href="./bibliography.html#pacap-2018-bid18">[43]</a>, and
thread-based control-flow mangling <a href="./bibliography.html#pacap-2018-bid19">[46]</a>).</p>
        <p>We partner with security experts who can provide intuition, know-how
and expertise, in particular in defining threat models, and assessing
the quality of the solutions. Our background in compilation and
architecture helps design more efficient and less expensive protection
mechanisms.</p>
        <p>We already have ongoing research directions related to security.
SECODE (Secure Codes to Thwart Cyber-physical Attacks) is a project
started in January 2016, in collaboration with security experts from
Télécom ParisTech, Paris 8, Université Catholique de Louvain
(Belgium), and Sabancı University (Turkey).
We also plan to partner with the Inria/CentraleSupelec CIDRE
project-team to design a tainting technique based on a just-in-time
compiler.</p>
        <p>
          <i>Compiler-based data protection.</i>
        </p>
        <p class="notaparagraph">We specify and design error-correction codes suitable for efficiently
protecting sensitive information in the context of the Internet of
Things (IoT) and connected objects. We partner with experts in
security and codes to prototype a platform that demonstrates resilient
software. PACAP's expertise is key to select and tune the protection
mechanisms developed within the project, and to propose safe, yet
cost-effective solutions from an implementation point of view.</p>
        <p>
          <i>JIT-based tainting.</i>
        </p>
        <p class="notaparagraph">Dynamic information flow control (DIFC, also known as <i>tainting</i>)
is used to detect intrusions and to identify vulnerabilities. It
consists in attaching metadata (called <i>taints</i> or <i>labels</i>)
to information containers, and in propagating the taints when
particular operations are applied to the containers: reads, writes,
etc. The goal is then to guarantee that confidential information is
never used to generate data sent to an untrusted container;
conversely, data produced by untrusted entities cannot be used to
update sensitive data.</p>
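        <p class="notaparagraph">The propagation rules described above can be sketched in a few lines; this is a toy model of DIFC with hypothetical container and label names, not the implementation of any actual monitor:</p>

```python
# Toy DIFC model: taints are sets of labels attached to containers,
# and every derived value inherits the union of its sources' taints.
class Container:
    def __init__(self, value, taints=()):
        self.value = value
        self.taints = set(taints)

def combine(dst_value, *sources):
    # Propagation rule: a value derived from several sources carries
    # the union of their labels.
    result = Container(dst_value)
    for src in sources:
        result.taints |= src.taints
    return result

def check_sink(container, allowed):
    # Policy check: confidential taints must never reach an
    # untrusted sink.
    return container.taints.issubset(allowed)

secret = Container("password", taints={"confidential"})
public = Container("banner")
message = combine(public.value + secret.value, public, secret)
print(check_sink(message, allowed=set()))  # False: leak detected
```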
        <p>The containers can be of various granularities: fine-grain
approaches can deal with single variables, while coarser-grain
approaches consider a file as a whole. The CIDRE project-team has
developed several DIFC monitors: kBlare is a coarse-grain monitor in
the Linux kernel, and JBlare is a fine-grain monitor for Java
applications. Fine-grain monitors provide better precision at the
cost of a significant execution-time overhead.</p>
        <p>Combining the expertise of CIDRE in DIFC with our expertise in JIT
compilation will help design hybrid approaches. An initial static
analysis of the program prior to installation or execution will feed
information to a dynamic analyzer that propagates taints during
just-in-time compilation.</p>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid22.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid44.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
