<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:ROMA</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Research Program - Algorithms for probabilistic environments"/>
    <meta name="dc.title" content="Research Program - Algorithms for probabilistic environments"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2016-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="ROMA"/>
    <script type="text/javascript" src="https://raweb.inria.fr/rapportsactivite/RA2016/static/MathJax/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Roma</a>
        </div>
        <span>
          <a href="uid1.html">Members</a>
        </span>
      </div>
      <div class="TdmEntry">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li class="tdmActPage"><a href="uid12.html&#10;&#9;&#9;  ">Algorithms for probabilistic environments</a></li><li><a href="uid15.html&#10;&#9;&#9;  ">Platform-aware scheduling strategies</a></li><li><a href="uid18.html&#10;&#9;&#9;  ">High-performance computing and linear algebra</a></li><li><a href="uid25.html&#10;&#9;&#9;  ">Compilers, code optimization and high-level synthesis for FPGA</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid31.html&#10;&#9;&#9;  ">Applications of sparse direct solvers</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid33.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid38.html&#10;&#9;&#9;  ">MUMPS</a></li><li><a href="uid45.html&#10;&#9;&#9;  ">DCC</a></li><li><a href="uid48.html&#10;&#9;&#9;  ">PoCo</a></li><li><a href="uid51.html&#10;&#9;&#9;  ">Aspic</a></li><li><a href="uid55.html&#10;&#9;&#9;  ">
        Termite
      </a></li><li><a href="uid59.html&#10;&#9;&#9;  ">
        Vaphor
      </a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid64.html&#10;&#9;&#9;  ">A backward/forward recovery approach for the preconditioned conjugate gradient method </a></li><li><a href="uid65.html&#10;&#9;&#9;  ">High performance parallel algorithms for the Tucker decomposition of sparse tensors</a></li><li><a href="uid66.html&#10;&#9;&#9;  ">Preconditioning techniques based on the Birkhoff–von Neumann decomposition</a></li><li><a href="uid67.html&#10;&#9;&#9;  ">Parallel CP decomposition of sparse tensors using dimension trees</a></li><li><a href="uid68.html&#10;&#9;&#9;  ">Scheduling series-parallel task graphs to minimize peak memory</a></li><li><a href="uid69.html&#10;&#9;&#9;  ">Matrix symmetrization and sparse direct solvers</a></li><li><a href="uid70.html&#10;&#9;&#9;  ">Robust Memory-Aware Mapping for Parallel Multifrontal Factorizations</a></li><li><a href="uid71.html&#10;&#9;&#9;  ">Fast 3D frequency-domain full waveform inversion with a parallel Block Low-Rank multifrontal direct solver: application to OBC data from the North Sea</a></li><li><a href="uid72.html&#10;&#9;&#9;  ">Matching-Based Allocation Strategies for Improving Data Locality of Map Tasks in MapReduce</a></li><li><a href="uid73.html&#10;&#9;&#9;  ">Minimizing Rental Cost for Multiple Recipe Applications in the Cloud</a></li><li><a href="uid74.html&#10;&#9;&#9;  ">Malleable task-graph scheduling with a practical speed-up model</a></li><li><a href="uid75.html&#10;&#9;&#9;  ">Dynamic memory-aware task-tree scheduling</a></li><li><a href="uid76.html&#10;&#9;&#9;  ">Optimal resilience patterns to cope with fail-stop and silent errors</a></li><li><a href="uid77.html&#10;&#9;&#9;  ">Two-level checkpointing and partial verifications for linear task graphs</a></li><li><a href="uid78.html&#10;&#9;&#9;  ">Resilient application co-scheduling with processor redistribution</a></li><li><a href="uid79.html&#10;&#9;&#9;  ">A different re-execution speed can help</a></li><li><a 
href="uid80.html&#10;&#9;&#9;  ">Coping with recall and precision of soft error detectors</a></li><li><a href="uid81.html&#10;&#9;&#9;  ">Checkpointing strategies for scheduling computational workflows</a></li><li><a href="uid82.html&#10;&#9;&#9;  ">Assessing General-Purpose Algorithms to Cope with Fail-Stop and Silent Errors</a></li><li><a href="uid83.html&#10;&#9;&#9;  ">A failure detector for HPC platforms</a></li><li><a href="uid84.html&#10;&#9;&#9;  ">Optimal multistage algorithm for adjoint computation</a></li><li><a href="uid85.html&#10;&#9;&#9;  ">Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results</a></li><li><a href="uid86.html&#10;&#9;&#9;  ">When Amdahl Meets Young/Daly</a></li><li><a href="uid87.html&#10;&#9;&#9;  ">Computing the expected makespan of task graphs in the presence of silent errors</a></li><li><a href="uid88.html&#10;&#9;&#9;  ">Toward an Optimal Online
Checkpoint Solution under a Two-Level HPC Checkpoint Model</a></li><li><a href="uid89.html&#10;&#9;&#9;  ">Cell morphing: from array programs to array-free Horn clauses</a></li><li><a href="uid90.html&#10;&#9;&#9;  ">Symbolic Analyses of pointers</a></li><li><a href="uid91.html&#10;&#9;&#9;  ">High-Level Synthesis of Pipelined FSM from Loop Nests</a></li><li><a href="uid92.html&#10;&#9;&#9;  ">Estimation of Parallel Complexity with Rewriting Techniques</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid94.html&#10;&#9;&#9;  ">Bilateral Contracts with Industry</a></li><li><a href="uid103.html&#10;&#9;&#9;  ">Technological Transfer: XtremLogic Start-Up</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid105.html&#10;&#9;&#9;  ">Regional Initiatives</a></li><li><a href="uid107.html&#10;&#9;&#9;  ">National Initiatives</a></li><li><a href="uid110.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid125.html&#10;&#9;&#9;  ">International Initiatives</a></li><li><a href="uid147.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid157.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid172.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li><li><a href="uid208.html&#10;&#9;&#9;  ">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentyear" href="bibliography.html">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">
	    
	    Inria
	  </a> | <a href="../index.html">
	    
	    Raweb 
	    2016</a> | <a href="http://www.inria.fr/en/teams/roma">Presentation of the Project-Team ROMA</a> | <a href="http://www.ens-lyon.fr/LIP/ROMA/">ROMA Web Site
	  </a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="roma.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="roma.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-roma-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../roma/roma.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-roma-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid3.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid15.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: 
      Research Program</h2>
        <h3 class="titre3">Algorithms for probabilistic environments</h3>
        <p>This research theme has two main directions. In
the first one, we consider the efficient execution of
applications in a failure-prone environment. Here, probability
distributions describe the potential behavior of computing
platforms, namely when hardware components are subject to faults. In
the second direction, probability distributions
describe the characteristics and behavior of applications.</p>
        <a name="uid13"/>
        <h4 class="titre4">Application resilience</h4>
        <p>An application is resilient if it can successfully produce a correct
result in spite of potential faults in the underlying
system. Application resilience can involve a broad range of
techniques, including fault prediction, error detection, error
containment, error correction, checkpointing, replication, migration,
and recovery. Faults
are quite frequent in the most powerful existing supercomputers. The
Jaguar platform, which ranked third in the TOP500 list in November
2011 <a href="./bibliography.html#roma-2016-bid2">[59]</a>, had an average of 2.33 faults per day during
the period from August 2008 to February 2010 <a href="./bibliography.html#roma-2016-bid3">[83]</a>. The
mean time between faults of a platform is inversely proportional to its
number of components. Progress will certainly be made in the coming
years with respect to the reliability of individual components.
However, designing and building high-reliability hardware components
is far more expensive than using lower-reliability off-the-shelf
components. Furthermore, low-power components may not be available
with high reliability. Therefore, it is feared that progress in
reliability will fall far short of compensating for the steady projected increase in
the number of components in the largest supercomputers. Already,
application failures have a huge computational cost. In 2008, the DARPA white
paper on “System resilience at extreme
scale” <a href="./bibliography.html#roma-2016-bid4">[58]</a> stated that high-end systems wasted
20% of their computing capacity on application failure and recovery.</p>
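        <p>The inverse proportionality between the mean time between faults (MTBF) and the number of components can be illustrated with a minimal Python sketch. It assumes independent, identical components, which is a simplification; the function names and the component MTBF value are hypothetical and chosen only for illustration. Only the figure of 2.33 faults per day comes from the text above.</p>

```python
HOURS_PER_DAY = 24.0


def mtbf_from_daily_faults(faults_per_day: float) -> float:
    """Mean time between faults, in hours, for an average number of faults per day."""
    return HOURS_PER_DAY / faults_per_day


def platform_mtbf(component_mtbf: float, n_components: int) -> float:
    """Platform MTBF under the simplifying assumption of independent,
    identical components: it shrinks as 1 / n_components."""
    return component_mtbf / n_components


# Figure quoted in the text for Jaguar: 2.33 faults per day on average.
jaguar_mtbf_hours = mtbf_from_daily_faults(2.33)
print(f"Jaguar: one fault every {jaguar_mtbf_hours:.1f} hours on average")

# Hypothetical component MTBF (not from the text), used only to show the scaling.
component_mtbf_hours = 1.0e6
for n in (10_000, 100_000, 1_000_000):
    print(n, "components:", platform_mtbf(component_mtbf_hours, n), "hours")
```

        <p>Under these assumptions, multiplying the number of components by ten divides the platform MTBF by ten, so component reliability would have to improve at the same pace just to keep the platform MTBF constant.</p>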
        <p>In such a context, any application using a significant fraction of a
supercomputer and running for a significant amount of time will have
to use some fault-tolerance solution. It would indeed be unacceptable
for an application failure to destroy centuries of CPU time: some of the simulations
run on the Blue Waters platform consumed more than 2,700 years
of core computing time <a href="./bibliography.html#roma-2016-bid5">[54]</a> and lasted over 60
hours, and the most time-consuming simulations of the US Department of
Energy (DoE) run for weeks to months on the most powerful existing
platforms <a href="./bibliography.html#roma-2016-bid6">[57]</a>.</p>
        <p>Our research on resilience follows two different directions. On the
one hand, we design new resilience solutions, either generic
fault-tolerance solutions or algorithm-based solutions. On the other
hand, we model and theoretically analyze the performance of existing
and future solutions, in order to tune their usage and help determine
which solution to use in which context.</p>
        <a name="uid14"/>
        <h4 class="titre4">Scheduling strategies for applications with a
probabilistic behavior</h4>
        <p>Static scheduling algorithms are algorithms in which all decisions are
taken before the start of the application execution. By contrast,
in non-static (dynamic) algorithms, decisions may depend on events that happen
during the execution. Static scheduling algorithms are known to be
superior to dynamic and system-oriented approaches in stable
frameworks <a href="./bibliography.html#roma-2016-bid7">[65]</a>, <a href="./bibliography.html#roma-2016-bid8">[71]</a>, <a href="./bibliography.html#roma-2016-bid9">[72]</a>, <a href="./bibliography.html#roma-2016-bid10">[82]</a>, that is, when all
characteristics of platforms and applications are perfectly
known a priori and do not evolve during the application execution.
In practice, the prediction of application characteristics may be
approximate or completely infeasible. For instance, the amount of
computation and communication required to solve a given problem
in parallel may strongly depend on input data that are hard to
analyze (this is the case, for example, when solving linear systems
using full pivoting).</p>
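        <p>The distinction between static and non-static decisions can be sketched on a toy problem: independent tasks on identical processors. This is only an illustrative sketch, not a strategy taken from this report; the function names, the greedy rules and the task durations are all hypothetical. The static variant fixes the mapping from estimated durations before execution, whereas the dynamic variant assigns each task to whichever processor becomes free first at run time.</p>

```python
import heapq


def static_schedule(estimates, n_procs):
    """Static: choose the task-to-processor mapping before execution,
    using estimated durations only (greedy least-loaded rule)."""
    loads = [0.0] * n_procs
    mapping = []
    for est in estimates:
        p = loads.index(min(loads))  # least-loaded processor according to estimates
        mapping.append(p)
        loads[p] += est
    return mapping


def makespan_of_mapping(mapping, actual, n_procs):
    """Makespan obtained when a fixed mapping meets the actual durations."""
    loads = [0.0] * n_procs
    for p, duration in zip(mapping, actual):
        loads[p] += duration
    return max(loads)


def dynamic_makespan(actual, n_procs):
    """Dynamic: each task, in order, goes to the processor that becomes
    free first, so decisions depend on what happened during execution."""
    free_at = [0.0] * n_procs
    heapq.heapify(free_at)
    for duration in actual:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical durations: every task is estimated at 4 time units,
# but the actual durations depend on the input data.
estimates = [4.0, 4.0, 4.0, 4.0]
actual = [8.0, 1.0, 1.0, 4.0]

mapping = static_schedule(estimates, 2)
print("static mapping:", mapping)
print("static makespan:", makespan_of_mapping(mapping, actual, 2))
print("dynamic makespan:", dynamic_makespan(actual, 2))
```

        <p>In this toy instance the two variants give the same makespan when the estimates are exact, and the static mapping loses ground once the actual durations deviate from the estimates. It does not reproduce the richer settings in which static algorithms are known to be superior.</p>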
        <p>We plan to consider applications whose characteristics change dynamically and are subject to
uncertainties. To benefit
nonetheless from the power of static approaches, we plan to model
application uncertainties and variations through probabilistic models,
and to design scheduling strategies for these applications that are
either static, or partially static and partially dynamic.</p>
      </div>
      <!--FIN du corps du module-->
    </div>
  </body>
</html>
