<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:PERCEPTION</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.title" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2017-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="PERCEPTION"/>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Perception</a>
        </div>
        <span>
          <a href="uid1.html">Personnel</a>
        </span>
      </div>
      <div class="tdmActPage">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid6.html&#10;&#9;&#9;  ">Audio-Visual Scene Analysis</a></li><li><a href="uid7.html&#10;&#9;&#9;  ">Stereoscopic Vision</a></li><li><a href="uid8.html&#10;&#9;&#9;  ">Audio Signal Processing</a></li><li><a href="uid9.html&#10;&#9;&#9;  ">Visual Reconstruction With Multiple Color and Depth Cameras</a></li><li><a href="uid10.html&#10;&#9;&#9;  ">Registration, Tracking and Recognition of People and Actions</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid12.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid20.html&#10;&#9;&#9;  ">ECMPR</a></li><li><a href="uid24.html&#10;&#9;&#9;  ">Mixcam</a></li><li><a href="uid28.html&#10;&#9;&#9;  ">NaoLab</a></li><li><a href="uid32.html&#10;&#9;&#9;  ">Stereo matching and recognition library</a></li><li><a href="uid36.html&#10;&#9;&#9;  ">Platforms</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid41.html&#10;&#9;&#9;  ">Audio-Source Localization</a></li><li><a href="uid42.html&#10;&#9;&#9;  ">Audio-Source Separation</a></li><li><a href="uid43.html&#10;&#9;&#9;  ">Speech Dereverberation and Noise Reduction</a></li><li><a href="uid44.html&#10;&#9;&#9;  ">Acoustic-Articulatory Mapping</a></li><li><a href="uid45.html&#10;&#9;&#9;  ">Visual Tracking of Multiple Persons</a></li><li><a href="uid47.html&#10;&#9;&#9;  ">Audio-Visual Speaker Tracking and Diarization</a></li><li><a href="uid49.html&#10;&#9;&#9;  ">Head Pose Estimation and Tracking</a></li><li><a href="uid50.html&#10;&#9;&#9;  ">Tracking Eye Gaze and of Visual Focus of Attention</a></li><li><a href="uid52.html&#10;&#9;&#9;  ">Attention-Gated Conditional Random Fields</a></li><li><a href="uid53.html&#10;&#9;&#9;  ">Pooling Local Virality</a></li><li><a href="uid54.html&#10;&#9;&#9;  ">Registration of Multiple Point Sets</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid57.html&#10;&#9;&#9;  ">Bilateral Contracts with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid59.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid69.html&#10;&#9;&#9;  ">International Initiatives</a></li><li><a href="uid78.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid83.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid98.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentmajor" href="bibliography.html">Major publications</a>
          </li>
          <li>
            <a id="tdmbibentyear" href="bibliography.html#year">Publications of the year</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">Inria</a> | <a href="../index.html">Raweb 2017</a> | <a href="http://www.inria.fr/en/teams/perception">Presentation of the Project-Team PERCEPTION</a> | <a href="http://team.inria.fr/perception">PERCEPTION Web Site</a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="perception.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="perception.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-perception-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../perception/perception.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-perception-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--END of module body-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid6.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--START 2 of module body-->
        <h2>Section: Overall Objectives</h2>
        <h3 class="titre3">Overall Objectives</h3>
        <div align="center" style="margin-top:10px">
          <a name="uid4">
            <!--...-->
          </a>
          <table title="" class="objectContainer">
            <caption align="bottom"><strong>Figure 1. </strong>This figure illustrates the audio-visual multi-party human-robot interaction paradigm that the PERCEPTION team has developed in the recent past <a href="./bibliography.html#perception-2017-bid0">[20]</a>, <a href="./bibliography.html#perception-2017-bid1">[2]</a>, <a href="./bibliography.html#perception-2017-bid2">[10]</a>. There are inter-person as well as person-robot interactions that must be properly detected and analyzed over time. This includes multiple-person tracking <a href="./bibliography.html#perception-2017-bid3">[4]</a>, person detection and head-pose estimation <a href="./bibliography.html#perception-2017-bid4">[30]</a>, sound-source separation and localization <a href="./bibliography.html#perception-2017-bid5">[6]</a>, <a href="./bibliography.html#perception-2017-bid6">[1]</a>, <a href="./bibliography.html#perception-2017-bid7">[22]</a>, <a href="./bibliography.html#perception-2017-bid8">[23]</a>, <a href="./bibliography.html#perception-2017-bid9">[35]</a>, and speaker diarization <a href="./bibliography.html#perception-2017-bid10">[33]</a>. These developments have been supported by the European Union via the FP7 STREP project <i>“Embodied Audition for Robots”</i> (EARS) and the ERC advanced grant <i>“Vision and Hearing in Action”</i> (VHIA).</caption>
            <tr align="center">
              <td>
                <table>
                  <tr>
                    <td style="height:3px;" align="center">
                      <img style="width:384.2974pt" alt="IMG/scenevincent.png" src="IMG/scenevincent.png"/>
                    </td>
                  </tr>
                </table>
              </td>
            </tr>
          </table>
        </div>
        <p>Auditory and visual perception play a complementary role in human interaction. Perception enables people to communicate through verbal (speech and language) and non-verbal (facial expressions, visual gaze, head movements, hand and body gesturing) channels. These communication modalities overlap to a large degree, in particular in social contexts. Moreover, they disambiguate each other whenever one modality is weak, ambiguous, or corrupted by perturbations. Human-computer interaction (HCI) has attempted to address these issues, e.g., using smart &amp; portable devices. In HCI the user is in the loop for decision making: images and sounds are recorded purposively in order to optimize their quality with respect to the task at hand.</p>
        <p>However, the robustness of HCI based on speech recognition degrades significantly as soon as the microphones are located a few meters away from the user. Similarly, face detection and recognition work well only under a limited range of lighting conditions and when the cameras are properly oriented towards a person. Altogether, the HCI paradigm cannot easily be extended to less constrained interaction scenarios that involve several users and in which it is important to consider the <i>social context</i>.</p>
        <p>The PERCEPTION team investigates the fundamental role played by audio and visual perception in human-robot interaction (HRI). The main difference between HCI and HRI is that, while the former is user-controlled, the latter is robot-controlled, namely <i>it is implemented with intelligent robots that take decisions and act autonomously</i>. The mid-term objective of PERCEPTION is to develop computational models, methods, and applications for enabling non-verbal and verbal interactions between people, analyzing their intentions and their dialogue, extracting information, and synthesizing appropriate behaviors, e.g., the robot waves to a person, turns its head towards the dominant speaker, nods, gesticulates, asks questions, gives advice, waits for instructions, etc. The following topics are thoroughly addressed by the team members: audio-visual sound-source separation and localization in natural environments, for example to detect and track moving speakers, inference of temporal models of verbal and non-verbal activities (diarization), continuous recognition of particular gestures and words, context recognition, and multimodal dialogue.</p>
        <p class="notaparagraph">Video: <a href="https://team.inria.fr/perception/demos/nao-video/">https://team.inria.fr/perception/demos/nao-video/</a></p>
      </div>
      <!--END of module body-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid6.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
