<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:PACAP</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Research Program - Research Objectives"/>
    <meta name="dc.title" content="Research Program - Research Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2016-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="PACAP"/>
    <script type="text/javascript" src="https://raweb.inria.fr/rapportsactivite/RA2016/static/MathJax/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Pacap</a>
        </div>
        <span>
          <a href="uid1.html">Members</a>
        </span>
      </div>
      <div class="TdmEntry">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid22.html">Motivation</a></li><li class="tdmActPage"><a href="uid26.html">Research Objectives</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid44.html">Any computer usage</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid46.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid51.html">ATMI</a></li><li><a href="uid55.html">Heptane</a></li><li><a href="uid60.html">Tiptop</a></li><li><a href="uid64.html">ATC</a></li><li><a href="uid68.html">Barra</a></li><li><a href="uid72.html">If-memo</a></li><li><a href="uid75.html">Padrone</a></li><li><a href="uid79.html">STiMuL</a></li><li><a href="uid83.html">TPCalc</a></li><li><a href="uid88.html">Parasuite</a></li><li><a href="uid89.html">Simty</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid91.html">Compiler, vectorization, interpretation</a></li><li><a href="uid102.html">Processor Architecture</a></li><li><a href="uid118.html">WCET estimation and optimization</a></li><li><a href="uid121.html">Fault Tolerance</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid124.html">Bilateral Contracts with Industry</a></li><li><a href="uid126.html">Bilateral Grants with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid130.html">National Initiatives</a></li><li><a href="uid137.html">European Initiatives</a></li><li><a href="uid206.html">International Initiatives</a></li><li><a href="uid222.html">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid226.html">Promoting Scientific Activities</a></li><li><a href="uid237.html">Teaching - Supervision - Juries</a></li><li><a href="uid274.html">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentmajor" href="bibliography.html">Major publications</a>
          </li>
          <li>
            <a id="tdmbibentyear" href="bibliography.html#year">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">
	    
	    Inria
	  </a> | <a href="../index.html">
	    
	    Raweb 
	    2016</a> | <a href="http://www.inria.fr/en/teams/pacap">Presentation of the Project-Team PACAP</a> | <a href="https://team.inria.fr/pacap/en">PACAP Web Site
	  </a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="pacap.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="pacap.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-pacap-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../pacap/pacap.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-pacap-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid22.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid44.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: 
      Research Program</h2>
        <h3 class="titre3">Research Objectives</h3>
        <p>Processor micro-architecture and compilation have been at the core of
the research carried out by the CAPS and ALF project teams for two
decades, with undeniable contributions. They will continue to be the
foundation of PACAP.</p>
        <p>Heterogeneity and diversity of processor architectures now require new
techniques to guarantee that the hardware is satisfactorily exploited
by the software. We will devise new static compilation techniques
(cf. Section <a title="Research Objectives" href="./uid26.html#uid28">3.2.1</a>), but also build upon iterative
<a href="./bibliography.html#pacap-2016-bid1">[1]</a> and split <a href="./bibliography.html#pacap-2016-bid2">[2]</a> compilation to continuously
adapt software to its environment (Section <a title="Research Objectives" href="./uid26.html#uid29">3.2.2</a>). Dynamic
binary optimization will also play a key role in delivering adaptive
software and performance.</p>
        <p>The end of Moore's law and of Dennard scaling (according to
Dennard scaling, as transistors get smaller the power density
remains constant, so the consumed power remains proportional to the
area) offers an exciting window of opportunity, where performance
improvements will no longer derive from additional transistor budget
or increased clock frequency, but rather come from breakthroughs in
microarchitecture (Section <a title="Research Objectives" href="./uid26.html#uid30">3.2.3</a>). We will also consider how
to reconcile CPU and GPU designs (Section <a title="Research Objectives" href="./uid26.html#uid34">3.2.4</a>).</p>
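        <p>The parenthetical remark on Dennard scaling can be made explicit with the classical constant-field scaling argument. In the sketch below, the scaling factor \(\kappa\) and the symbols \(C\) (capacitance), \(V\) (supply voltage), \(f\) (clock frequency) and \(A\) (area) are notation introduced here for illustration only. When linear dimensions and voltage shrink by \(1/\kappa\) while frequency grows by \(\kappa\):</p>
        <p>\[ P = C V^{2} f \;\propto\; \frac{1}{\kappa}\cdot\frac{1}{\kappa^{2}}\cdot\kappa = \frac{1}{\kappa^{2}}, \qquad A \propto \frac{1}{\kappa^{2}}, \qquad \frac{P}{A} \propto 1. \]</p>
        <p>Once the supply voltage can no longer be lowered in step with the feature size, the \(V^{2}\) factor stops shrinking and power density rises with each generation, which is why extra transistors or a higher clock frequency no longer translate directly into performance.</p>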
        <p>Heterogeneity and multicores are also major obstacles to determining
tight worst-case execution times of real-time systems (Section
<a title="Research Objectives" href="./uid26.html#uid35">3.2.5</a>), which we plan to tackle.</p>
        <p>Finally, we also describe how we plan to address transversal aspects
such as reliability (Section <a title="Research Objectives" href="./uid26.html#uid40">3.2.6</a>), power efficiency (Section
<a title="Research Objectives" href="./uid26.html#uid41">3.2.7</a>), and security (Section <a title="Research Objectives" href="./uid26.html#uid42">3.2.8</a>).</p>
        <a name="uid28"/>
        <h4 class="titre4">Static Compilation</h4>
        <p>Static compilation techniques will continue to be relevant to address
the characteristics of emerging hardware technologies, such as
non-volatile memories, 3D-stacking, or novel communication
technologies. These technologies expose new characteristics to the
software layers. As an example, non-volatile memories typically have
asymmetric read-write latencies (writes take much longer than reads) and
different power consumption profiles. PACAP will study the new
optimization opportunities and develop tailored compilation techniques
for the upcoming compute nodes. New technologies may also be coupled
with traditional solutions to offer new trade-offs. We will study how
programs can adequately exploit the specific features of the proposed
heterogeneous compute nodes.</p>
        <p>We propose to build upon iterative compilation <a href="./bibliography.html#pacap-2016-bid1">[1]</a> to
explore how applications perform on different configurations. When
possible, Pareto points will be related to application
characteristics. The best configuration, however, may actually depend
on runtime information, such as input data, dynamic events, or
properties that are available only at runtime. Unfortunately, a runtime
system has little time and few means to determine the best
configuration. For these reasons, we will also leverage
split compilation <a href="./bibliography.html#pacap-2016-bid2">[2]</a>: the idea consists in pre-computing
alternatives, and embedding in the program enough information to
assist and drive a runtime system towards the best solution.</p>
        <a name="uid29"/>
        <h4 class="titre4">Software Adaptation</h4>
        <p>More than ever, software will need to adapt to its environment. In
most cases, this environment will remain unknown until runtime. This
is already the case when one deploys an application to a cloud, or an
“app” to mobile devices. The dilemma is the following: for maximum
portability, developers should target the most general device; but for
performance they would like to exploit the most recent and advanced
hardware features. JIT compilers can handle the situation to some
extent, but binary deployment requires dynamic binary rewriting. Our
work has shown how SIMD instructions can be upgraded from SSE to
AVX <a href="./bibliography.html#pacap-2016-bid3">[3]</a>. Many more opportunities will appear with diverse and
heterogeneous processors, featuring various kinds of accelerators.</p>
        <p>On shared hardware, the environment is also defined by other
applications competing for the same computational resources. It will
become increasingly important to adapt to changing runtime conditions,
such as the contention of the cache memories, available bandwidth, or
hardware faults. Fortunately, optimizing at runtime is also an
opportunity, because this is the first time the program is visible as
a whole: executable and libraries (including library
versions). Optimizers may also rely on dynamic information, such as
actual input data, parameter values, etc. We have already developed a
software platform <a href="./bibliography.html#pacap-2016-bid4">[14]</a> to analyze and optimize programs at
runtime, and we started working on automatic dynamic parallelization
of sequential code, and dynamic specialization.</p>
        <p>We started addressing some of these challenges in ongoing projects
such as the Nano2017 PSAIC collaborative research program with
STMicroelectronics, as well as within the Inria Project Lab
MULTICORE. The H2020 FET HPC project ANTAREX, which is just starting, will also
address these challenges from the energy perspective. We will further
leverage our platform and initial results to address other adaptation
opportunities. Efficient software adaptation will require expertise
from all domains tackled by PACAP, and strong interaction between all
team members is expected.</p>
        <a name="uid30"/>
        <h4 class="titre4">Research directions in uniprocessor microarchitecture</h4>
        <p>Achieving high single-thread performance remains a major challenge
even in the multicore era (Amdahl's law). The members of the PACAP
project-team have been conducting research in uniprocessor
micro-architecture for about 20 years, covering major topics
including caches, instruction front-end, branch prediction,
out-of-order core pipeline, and value prediction.
In particular, in recent years they have been recognized as world
leaders in branch prediction
<a href="./bibliography.html#pacap-2016-bid5">[19]</a>, <a href="./bibliography.html#pacap-2016-bid6">[9]</a> and in cache
prefetching <a href="./bibliography.html#pacap-2016-bid7">[7]</a>, and they have revived the
forgotten concept of value prediction
<a href="./bibliography.html#pacap-2016-bid8">[12]</a>, <a href="./bibliography.html#pacap-2016-bid9">[11]</a>. This research was
supported by the ERC Advanced grant DAL (2011-2016) and also by Intel.
We intend to pursue research on achieving ultimate unicore
performance. Below are several non-orthogonal directions that we have
identified for mid-term research:</p>
        <ol>
          <li>
            <p class="notaparagraph"><a name="uid31"> </a>management of the memory hierarchy (particularly the hardware
prefetching);</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid32"> </a>practical design of very wide issue execution core;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid33"> </a>speculative execution.</p>
          </li>
        </ol>
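        <p>The reference to Amdahl's law above can be stated as a formula. As a reminder (with \(p\) denoting the fraction of execution time that can be parallelized and \(n\) the number of cores, both symbols being introduced here for illustration), the speedup over a single core is</p>
        <p>\[ S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}. \]</p>
        <p>For instance, with \(p = 0.9\) the speedup can never exceed 10, whatever the number of cores: the sequential 10% of the execution is bounded by single-thread performance, which is what the directions listed above aim to improve.</p>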
        <p>
          <i>Memory design issues:</i>
        </p>
        <p class="notaparagraph">Performance of many applications is highly impacted by the memory
hierarchy behavior. The interactions between the different components
in the memory hierarchy and the out-of-order execution engine have
high impact on performance.</p>
        <p>The last <i>Data Prefetching Contest</i>, held in conjunction with ISCA 2015,
illustrated that achieving high prefetching efficiency is still a
challenge for wide-issue superscalar processors, particularly those
featuring a very large instruction window. The large instruction
window enables implicit data prefetching. The interaction between
this implicit hardware prefetcher and the explicit hardware prefetcher
is still poorly understood, as illustrated by Pierre Michaud's BO
prefetcher (winner of DPC2) <a href="./bibliography.html#pacap-2016-bid7">[7]</a>. The first
objective of the research is to better understand how the implicit
prefetching enabled by the large instruction window interacts with the
L2 prefetcher and then to understand how explicit prefetching on the
L1 also interacts with the L2 prefetcher.</p>
        <p>The second objective of the research is related to the interaction of
prefetching and virtual/physical memory. On real hardware,
prefetching stops at page boundaries. The interaction between TLB
prefetching (and the level at which it occurs) and cache prefetching must be
analyzed.</p>
        <p>The prefetcher is not the only actor in the hierarchy that must
be carefully controlled. Significant benefit can also be
achieved through careful management of memory access bandwidth,
particularly the management of spatial locality on memory accesses,
both for reads and writes. The exploitation of this locality
is traditionally handled in the memory controller. However, it could
be better handled if a larger temporal granularity were available.
Finally, we also intend to continue to explore the promising avenue of
compressed caches. In particular, we recently proposed the skewed
compressed cache <a href="./bibliography.html#pacap-2016-bid10">[15]</a>, which offers new
possibilities for efficient compression schemes.</p>
        <p>
          <i>Ultra wide-issue superscalar.</i>
        </p>
        <p class="notaparagraph">To effectively leverage memory-level parallelism, one requires huge
out-of-order execution structures as well as a very wide-issue
superscalar processor. For the past two decades, implementing ever
wider-issue superscalar processors has been challenging. The objective
of our research on the execution core is to explore (and revisit)
directions to allow the design of a very wide-issue (8-to-16 way)
out-of-order execution core while mastering its complexity (silicon
area, hardware logic complexity, power/energy consumption).</p>
        <p>The first direction that we intend to explore is the use of a clustered
architecture, as in our recent work
<a href="./bibliography.html#pacap-2016-bid11">[8]</a>. A symmetric clustered organization
benefits from a simpler bypass network, but induces significant complexity in
the issue queue. One remarkable finding of our study
<a href="./bibliography.html#pacap-2016-bid11">[8]</a> is that, when considering two large clusters
(e.g. 8-wide), steering large groups of consecutive instructions
(e.g. 64 <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>μ</mi></math></span>ops) to the same cluster is quite efficient. This opens
opportunities to limit the complexity of the issue queues (monitoring
fewer buses) and register files (reducing the number of ports and the
number of physical registers) in the clusters, since not all results
have to be forwarded to the other cluster.</p>
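        <p>To illustrate why steering large groups of consecutive instructions limits inter-cluster traffic, here is a minimal Python sketch. It is not the mechanism of the cited study: the alternating assignment of groups to clusters, the function names and the dependency encoding are assumptions made for illustration. It counts how many producer-consumer dependencies cross clusters for a given group size.</p>
        <pre>GROUP_SIZE = 64    # consecutive micro-ops steered together (value quoted in the text)
NUM_CLUSTERS = 2   # two large clusters, as in the text


def steer(uop_index, group_size=GROUP_SIZE, num_clusters=NUM_CLUSTERS):
    """Return the cluster receiving the micro-op at this program-order index.

    Assumption: groups are assigned to clusters in alternation.
    """
    return (uop_index // group_size) % num_clusters


def cross_cluster_fraction(producers, group_size=GROUP_SIZE):
    """Fraction of dependencies whose producer sits in the other cluster.

    producers[i] is the index of the micro-op producing an operand of
    micro-op i, or None when micro-op i has no in-flight producer.
    """
    total = 0
    crossing = 0
    for consumer, producer in enumerate(producers):
        if producer is None:
            continue
        total += 1
        if steer(producer, group_size) != steer(consumer, group_size):
            crossing += 1
    return crossing / total if total else 0.0
</pre>
        <p>For a chain of 256 micro-ops in which each one depends on its predecessor (<code>[None] + list(range(255))</code>), only the 3 dependencies that straddle a group boundary cross clusters, whereas steering micro-op by micro-op (<code>group_size=1</code>) makes every dependency cross. Fewer crossings mean fewer results to forward, hence fewer buses to monitor and fewer register file ports.</p>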
        <p>The second direction that we intend to explore is associated with the
approach we developed with Sembrant et
al. <a href="./bibliography.html#pacap-2016-bid12">[16]</a>. It reduces the number of
instructions waiting in the instruction queues for the applications
benefiting from very large instruction windows. Instructions are
dynamically classified as ready (independent from any long latency
instruction) or non-ready, and as urgent (part of a dependency chain
leading to a long latency instruction) or non-urgent. Non-ready
non-urgent instructions can be delayed until the long latency
instruction has been executed; this reduces the pressure on
the issue queue. This proposal opens the opportunity to consider
an asymmetric microarchitecture with a cluster dedicated to the
execution of urgent instructions and a second cluster executing the
non-urgent instructions. The microarchitecture of this second cluster
could be optimized to reduce complexity and power consumption (smaller
instruction queue, less aggressive scheduling, etc.).</p>
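        <p>The ready/non-ready and urgent/non-urgent classification described above can be modelled in software as two passes over a dependency graph. The Python sketch below is only an illustrative model under stated assumptions: the hardware performs this classification dynamically, whereas here the set of long latency instructions is given as an input, a long latency instruction is itself counted as urgent, and all names are hypothetical.</p>
        <pre>def classify(deps, long_latency):
    """Classify each instruction of a program-order window.

    deps[i] lists the (smaller) indices of the instructions that
    instruction i reads from; long_latency is the set of indices of
    long latency instructions (e.g. loads missing in the cache).
    """
    n = len(deps)
    # Forward pass: an instruction is non-ready when it depends, directly
    # or transitively, on a long latency instruction.
    non_ready = [False] * n
    for i in range(n):
        non_ready[i] = any(s in long_latency or non_ready[s] for s in deps[i])
    # Backward pass: an instruction is urgent when a long latency
    # instruction needs its result, directly or transitively.
    urgent = [False] * n
    for i in reversed(range(n)):
        if i in long_latency:
            urgent[i] = True
        if urgent[i]:
            for s in deps[i]:
                urgent[s] = True
    return [("non-ready" if non_ready[i] else "ready",
             "urgent" if urgent[i] else "non-urgent") for i in range(n)]


def delayable(classes):
    """Indices of the non-ready, non-urgent instructions that can be delayed."""
    return [i for i, c in enumerate(classes) if c == ("non-ready", "non-urgent")]
</pre>
        <p>With <code>deps = [[], [0], [1], [], [2, 3]]</code> and <code>long_latency = {1}</code> (an address computation, a missing load, a consumer of the load, an independent instruction, and a final consumer), instructions 2 and 4 are the only ones that can be delayed, while instruction 0 is urgent because the load waits for it.</p>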
        <p>
          <i>Speculative execution.</i>
        </p>
        <p class="notaparagraph">Out-of-order (OoO) execution relies on speculative execution that
requires predictions of all sorts: branch, memory dependency, value...</p>
        <p>The PACAP members have been major actors in branch prediction
research for the last 20 years, and their proposals have influenced
the design of most of the hardware branch predictors in current
microprocessors. We will continue to steadily explore new branch
predictor designs, such as <a href="./bibliography.html#pacap-2016-bid13">[18]</a>.</p>
        <p>In speculative execution, we have recently revisited value prediction
(VP), which was a hot research topic between 1996 and 2002. However, until
recently it was considered that value prediction would lead to a
huge increase in complexity and power consumption in every stage of
the pipeline. Fortunately, we have recently shown that the complexity
usually introduced by value prediction in the OoO engine can be
overcome
<a href="./bibliography.html#pacap-2016-bid8">[12]</a>, <a href="./bibliography.html#pacap-2016-bid9">[11]</a>, <a href="./bibliography.html#pacap-2016-bid5">[19]</a>, <a href="./bibliography.html#pacap-2016-bid6">[9]</a>. First,
very high accuracy can be enforced at reasonable cost in coverage and
minimal complexity <a href="./bibliography.html#pacap-2016-bid8">[12]</a>. Thus, both prediction
validation and recovery by squashing can be done outside the
out-of-order engine, at commit time. Furthermore, we propose a new
pipeline organization, EOLE ({Early | Out-of-order | Late}
Execution), that leverages VP with validation at commit to execute
many instructions outside the OoO core, in-order
<a href="./bibliography.html#pacap-2016-bid9">[11]</a>. With EOLE, the issue width of the OoO core
can be reduced without sacrificing performance, thus obtaining the
performance benefit of VP without a significant cost in silicon area and/or
energy. In the near future, we will explore new avenues related to
value prediction. These directions include register equality
prediction and compatibility of value prediction with weak memory
models in multiprocessors.</p>
        <a name="uid34"/>
        <h4 class="titre4">Towards heterogeneous single-ISA CPU-GPU architectures</h4>
        <p>Heterogeneous single-ISA architectures have been proposed in the
literature during the 2000s <a href="./bibliography.html#pacap-2016-bid14">[56]</a> and are now widely used in
the industry (ARM big.LITTLE, NVIDIA 4+1...) as a way to improve
power-efficiency in mobile processors. These architectures include
multiple cores whose respective microarchitectures offer different
trade-offs between performance and energy efficiency, or between
latency and throughput, while offering the same interface to software.
Dynamic task migration policies leverage the heterogeneity of the
platform by using the most suitable core for each application, or even
each phase of processing. However, these works only tune cores by
changing their complexity. Energy-optimized cores are either identical
cores implemented in a low-power process technology, or simplified
in-order superscalar cores, which are far from state-of-the-art
throughput-oriented architectures such as GPUs.</p>
        <p>We propose to investigate the convergence of CPU and GPU at both
architecture and compilation levels.</p>
        <p>
          <i>Architecture.</i>
        </p>
        <p class="notaparagraph">The architecture convergence between Single Instruction Multiple
Threads (SIMT) GPUs and multicore processors that we have been
pursuing <a href="./bibliography.html#pacap-2016-bid15">[36]</a> opens the way for heterogeneous
architectures including latency-optimized superscalar cores and
throughput-optimized GPU-style cores, which all share the same
instruction set. Using SIMT cores in place of superscalar cores will
enable the highest energy efficiency on regular sections of
applications. As with existing single-ISA heterogeneous architectures,
task migration will not necessitate any software rewrite and will
accelerate existing applications.</p>
        <p>
          <i>Compilers for emerging heterogeneous architectures.</i>
        </p>
        <p class="notaparagraph">Single-ISA CPU+GPU architectures will provide the necessary substrate
to enable efficient heterogeneous processing. However, they will also
introduce substantial challenges at the software and firmware
level. Task placement and migration will require advanced policies
that leverage both static information at compile time and dynamic
information at run-time. We are tackling the heterogeneous task
scheduling problem at the compiler level. As a first step, we are
prototyping scheduling algorithms on existing multiple-ISA CPU+GPU
architectures such as the NVIDIA Tegra X1.</p>
        <a name="uid35"/>
        <h4 class="titre4">Real-time systems</h4>
        <p>Safety-critical systems (e.g. avionics, medical devices,
automotive...) have so far used simple unicore hardware systems as a
way to control their predictability, in order to meet timing
constraints. Still, many critical embedded systems have an increasing
demand for computing power, and simple unicore processors are no
longer sufficient. General-purpose multicore processors are not
suitable for safety-critical real-time systems, because they include
complex micro-architectural elements (cache hierarchies, branch,
stride and value predictors) meant to improve average-case
performance, and for which worst-case performance is difficult to
predict. The prerequisite for calculating tight WCET is a
deterministic hardware system that avoids dynamic, time-unpredictable
calculations at run-time.</p>
        <p>Even for multi- and manycore systems designed with time-predictability
in mind (such as the Kalray MPPA manycore
architecture (<a href="http://www.kalrayinc.com">http://www.kalrayinc.com</a>) or the Recore
manycore hardware (<a href="http://www.recoresystems.com/">http://www.recoresystems.com/</a>)),
calculating WCETs is still challenging. The following two challenges
will be addressed in the mid-term:</p>
        <ol>
          <li>
            <p class="notaparagraph"><a name="uid38"> </a>definition of methods to estimate WCETs tightly on manycores,
that smartly analyze and/or control shared resources such as
buses, NoCs or caches;</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid39"> </a>methods to improve the programmability of real-time applications
through automatic parallelization and optimizations from model-based
designs.</p>
          </li>
        </ol>
        <a name="uid40"/>
        <h4 class="titre4">Fault Tolerance</h4>
        <p>Technology trends suggest that, in tomorrow's computing world,
failures will become commonplace due to many factors, and the expected
probability of failure will increase with scaling. While well-known
approaches, such as error correcting codes, exist to recover from
failures and provide fault-free chips, the exponential growth of the
number of faults will make them unaffordable in the
future. Consequently, other approaches such as fine-grained disabling
and reconfiguration of hardware elements (e.g. individual functional
units or cache blocks) will become economically necessary. We are
going to enter a new era: functionally correct chips with variable
performance among chips and throughout their lifetime <a href="./bibliography.html#pacap-2016-bid0">[58]</a>.</p>
        <p>Transient and permanent faults may be detected by similar techniques,
but correcting them generally involves different approaches. We are
primarily interested in permanent faults, even though we do not
necessarily disregard transient faults (e.g. the TMR approach described
below addresses both kinds of faults).</p>
        <p>
          <i>CPU.</i>
        </p>
        <p class="notaparagraph">Permanent faults can occur anywhere in the processor. The performance
implications of faulty cells vary depending on how the array is used
in a processor. Most micro-architectural work aiming at assessing
the performance implications of permanently faulty cells relies on
simulations with random fault-maps. These studies are, therefore,
limited by the fault-maps they use, which may not be representative of
the average performance and of its distribution. They also do not consider
aging effects.</p>
        <p>Considering the memory hierarchy, we have already studied
<a href="./bibliography.html#pacap-2016-bid16">[5]</a> the impact of permanent faults on the average and
worst-case performance based on analytical models. We will extend
these models to cover other components and other designs, and to
analyze the interaction between faulty components.</p>
        <p>For identified critical hardware structures, such as the memory
hierarchy, we will propose protection mechanisms, for instance using
larger cells, or even selecting a different array organization to
mitigate the impact of faults.</p>
        <p>Another approach to deal with faults is to introduce redundancy at the
code level. We propose to consider static compilation techniques
focusing on existing hardware. As an example, we plan to leverage SIMD
extensions of current instruction sets to introduce redundancy in
scalar code at minimum cost. With these instructions, it will be
possible to protect the execution from both soft errors by using TMR
(triple modular redundancy) with voters in the code itself, and
permanent faults without the need of extra hardware support to
deconfigure faulty functional units.</p>
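        <p>The principle of TMR with voters placed in the code itself can be sketched as follows. This is a plain Python emulation written for illustration, not the SIMD-based scheme envisioned above: the three entries of a list stand in for three vector lanes, and the <code>fault</code> parameter is a hypothetical hook used only to inject an error in one copy.</p>
        <pre>def majority(a, b, c):
    """Return the value held by at least two of the three copies."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one copy is corrupted")


def tmr(operation, *args, fault=None):
    """Run operation three times and vote on the results.

    fault, when given, is a pair (lane, corrupt) that applies the
    function corrupt to the result of one lane, to emulate an error.
    """
    results = [operation(*args) for _ in range(3)]
    if fault is not None:
        lane, corrupt = fault
        results[lane] = corrupt(results[lane])
    return majority(*results)


def add(x, y):
    return x + y
</pre>
        <p>For example, <code>tmr(add, 2, 3, fault=(1, lambda v: v ^ 4))</code> still returns 5 although one copy has a flipped bit. If the same lane always disagrees, the voter also reveals a likely permanent fault in the corresponding unit, which can then be avoided in software.</p>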
        <p>
          <i>Reconfigurable Computing.</i>
        </p>
        <p class="notaparagraph">In collaboration with the <span class="smallcap">Cairn</span> project-team, we propose to construct
Coarse Grain Reconfigurable Architectures (CGRA) from a sea of basic
arithmetic and memory elements organized into clusters and connected
through a hierarchical interconnection network. These clusters of
basic arithmetic operators (e.g. 8-bit arithmetic and logic units)
could be seamlessly configured for various accuracies and data
types, to adapt the consumed energy to application requirements by taking
advantage of approximate computations. We propose to add new kinds of
error detection (and sometimes correction) directly at the operator
level by taking advantage of the massive redundancy of the array. As
an example, errors can be tracked and detected in a complex sequence of
double-precision floating-point operations by using a reduced-precision
version of the same processing.</p>
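        <p>The idea of checking a double-precision computation against a reduced-precision version of the same processing can be sketched in Python. This is an illustrative software model only: the dot product, the single-precision shadow computation and the tolerance <code>rel_tol</code> are assumptions chosen for the example, not parameters of the proposed architecture.</p>
        <pre>import math
import struct


def to_single(x):
    """Round a double-precision value to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_double(xs, ys):
    """Reference computation, in double precision."""
    return sum(x * y for x, y in zip(xs, ys))


def dot_single(xs, ys):
    """Shadow computation: same processing, rounded to single precision."""
    acc = 0.0
    for x, y in zip(xs, ys):
        product = to_single(to_single(x) * to_single(y))
        acc = to_single(acc + product)
    return acc


def looks_faulty(result, shadow, rel_tol=1e-4):
    """Flag a result that disagrees with its reduced-precision shadow."""
    return not math.isclose(result, shadow, rel_tol=rel_tol, abs_tol=rel_tol)
</pre>
        <p>A fault that doubles the double-precision result (as an exponent bit flip could) is flagged, whereas an error of the order of 2<sup>-40</sup> stays below the resolution of the shadow computation and goes undetected. This is the trade-off of the approach: the check is cheap, but it only catches errors larger than the precision of the reduced version.</p>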
        <p>Such reconfigurable blocks will be driven by compilation techniques,
in charge of computing checkpoints, detecting faults, and replaying
computations when needed.</p>
        <p>Dynamic compilation techniques will help better exploit faulty
hardware, by allocating data and computations on correct resources.
In case of permanent faults, we will provide a mechanism to
reconfigure the hardware, for example by reducing the issue width of
VLIW processors implemented in CGRA. Dynamic code generation (JIT
compiler) will re-generate code for the new configuration,
guaranteeing portability and optimal exploitation of the hardware.</p>
        <a name="uid41"/>
        <h4 class="titre4">Power efficiency</h4>
        <p>PACAP will address power efficiency at several levels. First, we will
design static and split compilation techniques to contribute to the
race for Exascale computing (the general goal is to reach
<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mn>10</mn><mn>18</mn></msup></math></span> FLOP/s at less than 20 MW).
Second, we will focus on high-performance low-power embedded compute
nodes.
We will research new static and dynamic compilation techniques that fully
exploit emerging memory and NoC technologies. Finally, in
collaboration with the CAIRN project-team, we will investigate the
synergy of reconfigurable computing and dynamic code generation.</p>
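        <p>As a back-of-the-envelope check of what the Exascale target quoted above implies (simple arithmetic on the two figures given in the text, not a result of the project):</p>
        <p>\[ \frac{10^{18}\ \text{FLOP/s}}{20 \times 10^{6}\ \text{W}} = 5 \times 10^{10}\ \text{FLOP/s per watt} = 50\ \text{GFLOP/s per watt}, \]</p>
        <p>that is, an average energy budget of at most \(20\ \text{pJ}\) per floating-point operation for the whole machine.</p>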
        <p>
          <i>Green and heterogeneous high-performance computing.</i>
        </p>
        <p class="notaparagraph">Concerning HPC systems, our approach consists in mapping, managing at runtime,
and autotuning applications for green and heterogeneous
High-Performance Computing systems up to the Exascale level. One key
innovation of the proposed approach consists of introducing a
separation of concerns (where self-adaptivity and energy-efficiency
strategies are specified separately from application functionalities)
promoted by the definition of a Domain Specific Language (DSL)
inspired by aspect-oriented programming concepts for heterogeneous
systems. The new DSL will be introduced to express
adaptivity/energy/performance strategies and to enforce application
autotuning and resource and power management at runtime. The goal is
to support the parallelism, scalability and adaptability of a dynamic
workload by exploiting the full system capabilities (including energy
management) for emerging large-scale and extreme-scale systems, while
reducing the Total Cost of Ownership (TCO) for companies and public
organizations.</p>
        <p>
          <i>High-performance low-power embedded compute nodes.</i>
        </p>
        <p class="notaparagraph">We will address the design of next-generation energy-efficient
high-performance embedded compute nodes. This work focuses simultaneously
on software, architecture, and emerging memory and communication
technologies, in order to exploit their respective
features in synergy. The approach of the project is organized around three
complementary topics: 1) compilation techniques; 2) multicore
architectures; 3) emerging memory and communication
technologies. PACAP will focus on the compilation aspects, taking as
input the software-visible characteristics of the proposed emerging
technology, and making the best possible use of the new features
(non-volatility, density, endurance, low power).</p>
        <p>
          <i>Hardware Accelerated JIT Compilation.</i>
        </p>
        <p class="notaparagraph">Reconfigurable hardware offers the opportunity to limit power
consumption by dynamically adjusting the number of available resources
to the requirements of the running software. In particular, VLIW
processors can adjust the number of available issue
lanes. Unfortunately, changing the processor width often requires
recompiling the application, and the performance of VLIW processors depends heavily
on the quality of the compilation, mainly because of the instruction
scheduling phase performed by the compiler. Another challenge lies in
the tight constraints of embedded systems: the energy and execution
time overhead due to the JIT compilation must be carefully kept under
control.</p>
        <p>We have started exploring ways to reduce the cost of JIT compilation
targeting VLIW-based heterogeneous manycore systems. Our approach relies
on a hardware/software JIT compiler framework. While basic
optimizations and JIT management are performed in software, the
compilation back-end is implemented by means of specialized
hardware. This back-end involves both instruction scheduling and
register allocation, which are known to be the most time-consuming
stages of such a compiler.</p>
        <a name="uid42"/>
        <h4 class="titre4">Security</h4>
        <p>Security is a mandatory concern of any modern computing
system. Various threat models have led to a multitude of protection
solutions. ALF has already contributed in this area, with the HAVEGE
<a href="./bibliography.html#pacap-2016-bid17">[62]</a> random number generator and with code obfuscation techniques
(the obfuscating just-in-time compiler <a href="./bibliography.html#pacap-2016-bid18">[55]</a>, or thread-based
control flow mangling <a href="./bibliography.html#pacap-2016-bid19">[60]</a>).</p>
        <p>We plan to partner with security experts who can provide intuition,
know-how and expertise, in particular in defining threat models, and
assessing the quality of the solutions. Our background in compilation
and architecture will help design more efficient and less expensive
protection mechanisms.</p>
        <p>We already have ongoing research directions related to security.
We also plan to partner with the Inria/CentraleSupelec CIDRE
project-team to design a tainting technique based on a just-in-time
compiler.</p>
        <p>
          <i>Compiler-based data protection.</i>
        </p>
        <p class="notaparagraph">We will specify and design error correction codes suitable for an
efficient protection of sensitive information in the context of
Internet of Things (IoT) and connected objects. We will partner with
experts in security and codes to prototype a platform that
demonstrates resilient software. PACAP's expertise will be key to
selecting and tuning the protection mechanisms developed within the
project, and to propose safe, yet cost-effective solutions from an
implementation point of view.</p>
        <p>
          <i>JIT-based tainting.</i>
        </p>
        <p class="notaparagraph">Dynamic information flow control (DIFC, also known as <i>tainting</i>)
is used to detect intrusions and to identify vulnerabilities. It
consists in attaching metadata (called <i>taints</i> or <i>labels</i>)
to information containers, and in propagating the taints when particular
operations are applied to the containers: reads, writes, etc. The goal
is then to guarantee that confidential information is never used to
generate data sent to an untrusted container; conversely, data
produced by untrusted entities cannot be used to update sensitive
data.</p>
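        <p>As an illustration, taint propagation and the confidentiality check can be
sketched in a few lines of Python. This is a minimal sketch under stated
assumptions, not the design of any existing monitor: the names
Container, combine, send and PolicyViolation are hypothetical, and only the
confidentiality direction (tainted data must not reach an untrusted sink) is
modelled.</p>
<pre>
```python
# Illustrative sketch only: Container, combine, send and PolicyViolation are
# hypothetical names, not the API of kBlare, JBlare or any other DIFC monitor.


class Container:
    """An information container (variable, file, ...) carrying a set of taints."""

    def __init__(self, value, taints=()):
        self.value = value
        self.taints = frozenset(taints)


def combine(op, *sources):
    """Apply op to the source values and propagate the union of their taints."""
    taints = frozenset().union(*(s.taints for s in sources))
    return Container(op(*(s.value for s in sources)), taints)


class PolicyViolation(Exception):
    """Raised when data carries a taint that the target sink does not allow."""


def send(data, sink_name, allowed_taints):
    """Deliver data to a sink only if every taint it carries is allowed there."""
    forbidden = data.taints - frozenset(allowed_taints)
    if forbidden:
        raise PolicyViolation(f"{sink_name}: forbidden taints {sorted(forbidden)}")
    return data.value


# A confidential value taints everything computed from it.
password = Container("hunter2", taints={"confidential"})
greeting = Container("hello ")
message = combine(lambda a, b: a + b, greeting, password)
```
</pre>
        <p>In this sketch, calling send on message with an empty set of allowed
taints raises PolicyViolation, because message inherits the taint of
password, whereas greeting carries no taint and can be sent to any
sink.</p>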
        <p>The containers can be of various granularities: fine-grain approaches
can deal with single variables, while coarser-grain approaches consider a
file as a whole. The CIDRE project-team has developed several DIFC
monitors. kBlare is a coarse-grain monitor in the Linux kernel. JBlare
is a fine-grain monitor for Java applications. Fine-grain monitors
provide a better precision at the cost of a significant overhead in
execution time.</p>
        <p>We propose to combine the expertise of CIDRE in DIFC with our
expertise in JIT compilation to design hybrid approaches. An initial
static analysis of the program prior to installation or execution will
feed information to a dynamic analyzer that propagates taints during
just-in-time compilation.</p>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid22.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid44.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
