<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:PERCEPTION</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.title" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2015-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="PERCEPTION"/>
    <!-- Piwik -->
    <script type="text/javascript" src="/rapportsactivite/piwik.js"></script>
    <noscript><p><img src="//piwik.inria.fr/piwik.php?idsite=49" style="border:0;" alt="" /></p></noscript>
    <!-- End Piwik Code -->
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Perception</a>
        </div>
        <span>
          <a href="uid1.html">Members</a>
        </span>
      </div>
      <div class="tdmActPage">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid6.html&#10;&#9;&#9;  ">Audio-Visual Scene Analysis</a></li><li><a href="uid7.html&#10;&#9;&#9;  ">Stereoscopic Vision</a></li><li><a href="uid8.html&#10;&#9;&#9;  ">Audio Signal Processing</a></li><li><a href="uid9.html&#10;&#9;&#9;  ">Visual Reconstruction With Multiple Color and Depth Cameras</a></li><li><a href="uid10.html&#10;&#9;&#9;  ">Registration, Tracking and Recognition of People and Actions</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid12.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid17.html&#10;&#9;&#9;  ">Associations of Audio Cues with 3D Locations Library</a></li><li><a href="uid22.html&#10;&#9;&#9;  ">Supervised Binaural Mapping Software</a></li><li><a href="uid27.html&#10;&#9;&#9;  ">Audiovisual Robotic Heads</a></li><li><a href="uid31.html&#10;&#9;&#9;  ">MIXCAM Platform</a></li><li><a href="uid36.html&#10;&#9;&#9;  ">NaoLAB</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid42.html&#10;&#9;&#9;  ">Supervised Audio-Source Localization</a></li><li><a href="uid43.html&#10;&#9;&#9;  ">Multichannel Audio-Source Separation</a></li><li><a href="uid44.html&#10;&#9;&#9;  ">Audio-Visual Speaker Tracking and Recognition</a></li><li><a href="uid46.html&#10;&#9;&#9;  ">Head Pose Estimation</a></li><li><a href="uid47.html&#10;&#9;&#9;  ">High-Resolution Scene Reconstruction</a></li><li><a href="uid49.html&#10;&#9;&#9;  ">Hyper-Spectral Image Analysis</a></li><li><a href="uid50.html&#10;&#9;&#9;  ">Gaussian Mixture Regression for Acoustic-Articulatory Inversion</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid52.html&#10;&#9;&#9;  ">Bilateral Contracts with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid54.html&#10;&#9;&#9;  ">National Initiatives</a></li><li><a href="uid62.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid95.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid102.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid111.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentmajor" href="bibliography.html">Major publications</a>
          </li>
          <li>
            <a id="tdmbibentyear" href="bibliography.html#year">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">Inria</a> | <a href="../index.html">Raweb 2015</a> | <a href="http://www.inria.fr/en/teams/perception">Presentation of the Project-Team PERCEPTION</a> | <a href="http://team.inria.fr/perception">PERCEPTION Web Site</a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="perception.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="perception.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-perception-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../perception/perception.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-perception-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF
</td>
              <td>e-Pub
</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid6.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: Overall Objectives</h2>
        <h3 class="titre3">Overall Objectives</h3>
        <div align="center" style="margin-top:10px">
          <a name="uid4">
            <!--...-->
          </a>
          <table title="" class="objectContainer">
            <caption align="bottom"><strong>Figure
	1. </strong>This figure illustrates the general principle of the latent-variable mixture models for audio-visual data analysis that the PERCEPTION team have developed
<a href="./bibliography.html#perception-2015-bid0">[11]</a> , <a href="./bibliography.html#perception-2015-bid1">[17]</a> . Audiovisual events (<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>𝐒</mi></math></span>), e.g., speaking
faces, are observed with two cameras and two microphones, hence two types of observations are available: 3D binocular
features (<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>𝐯</mi></math></span>) and 1D binaural features (<span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi></math></span>). By combining the <i>inverse</i> visual mapping with the <i>direct</i> auditory
mapping, <span class="math"><math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>𝒜</mi><mo>∘</mo><msup><mi>𝒱</mi><mrow><mo>-</mo><mn>1</mn></mrow></msup></mrow></math></span>, it is possible to project 3D visual features onto an 1D
auditory space, to represent visual and auditory data in the same space, and to properly cluster and classify them.</caption>
            <tr align="center">
              <td>
                <table>
                  <tr>
                    <td xmlns="" style="height:3px;" align="center">
                      <img xmlns="http://www.w3.org/1999/xhtml" style="width:384.2974pt" alt="IMG/MappingsSimpleR1.png" src="IMG/MappingsSimpleR1.png"/>
                    </td>
                  </tr>
                </table>
              </td>
            </tr>
          </table>
        </div>
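        <p>The mapping composition in the caption can be sketched numerically. Below is a minimal, hypothetical illustration in Python: the visual and auditory mappings are replaced by made-up linear maps (the actual models of [11], [17] are latent-variable mixtures), only to show how a 3D visual feature is projected onto the 1D auditory space.</p>

```python
import numpy as np

# Hypothetical linear stand-ins for the visual mapping V (source -> 3D
# binocular feature) and the auditory mapping A (source -> 1D binaural
# feature). The team's actual models are latent-variable mixtures; this
# sketch only illustrates the composition A o V^{-1}.
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 3))       # invertible visual mapping (3x3)
a = rng.standard_normal(3)            # auditory mapping A: R^3 -> R

def visual_to_auditory(v_feat):
    """Project a 3D visual feature into the 1D auditory space: A(V^{-1}(v))."""
    s = np.linalg.solve(V, v_feat)    # recover the latent source, V^{-1}(v)
    return float(a @ s)               # apply the auditory mapping

# One audio-visual event S observed through both modalities yields
# features that coincide once projected into the common auditory space.
s_true = np.array([0.5, -1.0, 2.0])
v_obs = V @ s_true                    # 3D binocular feature
a_obs = float(a @ s_true)             # 1D binaural feature
print(np.isclose(visual_to_auditory(v_obs), a_obs))  # True
```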
        <p>Auditory and visual perception play complementary roles in human interaction. Perception enables people to communicate through verbal (speech and language) and non-verbal (facial expressions, visual gaze, head movements, hand and body gesturing) channels. These communication modalities overlap to a large degree, in particular in social contexts. Moreover, the modalities disambiguate each other whenever one of them is weak, ambiguous, or corrupted by perturbations. Human-computer interaction (HCI) has attempted to address these issues, e.g., using smart and portable devices. In HCI the user is in the loop for decision making: images and sounds are recorded deliberately in order to optimize their quality with respect to the task at hand.</p>
        <p>However, the robustness of HCI based on speech recognition degrades significantly when the microphones are located a few meters away from the user. Similarly, face detection and recognition work well only under a limited range of lighting conditions and only if the cameras are properly oriented towards a person. Altogether, the HCI paradigm cannot be easily extended to less constrained interaction scenarios that involve several users and in which it is important to consider the <i>social context</i>.</p>
        <p>The PERCEPTION team investigates the fundamental role played by audio and visual perception in human-robot interaction (HRI). The main difference between HCI and HRI is that, while the former is user-controlled, the latter is robot-controlled, namely <i>it is implemented with intelligent robots that take decisions and act autonomously</i>. The mid-term objective of PERCEPTION is to develop computational models, methods, and applications that enable non-verbal and verbal interactions between people, analyze their intentions and their dialogue, extract information, and synthesize appropriate behaviors, e.g., the robot waves to a person, turns its head towards the dominant speaker, nods, gesticulates, asks questions, gives advice, waits for instructions, etc. The following topics are thoroughly addressed by the team members: audio-visual sound-source separation and localization in natural environments, for example to detect and track moving speakers; inference of temporal models of verbal and non-verbal activities (diarisation); continuous recognition of particular gestures and words; context recognition; and multimodal dialogue.</p>
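        <p>The clustering step mentioned in the figure caption (representing both modalities in a common space and then clustering the events) can be sketched with a toy example: synthetic 1D features from two hypothetical speakers are separated with a two-component Gaussian mixture fitted by EM. This uses plain NumPy and made-up data; it is illustrative only, not the team's actual pipeline.</p>

```python
import numpy as np

# Toy diarisation-style clustering: pooled 1D features (auditory-space
# representation) from two hypothetical speakers, clustered with a
# 2-component Gaussian mixture fitted by expectation-maximization.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 200),   # speaker 1 features
                    rng.normal(+2.0, 0.5, 200)])  # speaker 2 features

mu = np.array([-1.0, 1.0])    # initial component means
var = np.array([1.0, 1.0])    # initial component variances
w = np.array([0.5, 0.5])      # initial mixture weights
for _ in range(50):
    # E-step: responsibility of each component for each sample
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances
    n = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    w = n / len(x)

labels = r.argmax(axis=1)     # hard cluster assignment per sample
# The component means converge near the true speaker locations, -2 and +2.
```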
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid6.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
