<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team:MULTISPEECH</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.title" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2018-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="MULTISPEECH"/>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Multispeech</a>
        </div>
        <span>
          <a href="uid1.html">Team, Visitors, External Collaborators</a>
        </span>
      </div>
      <div class="tdmActPage">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid11.html">Explicit Modeling of Speech Production and Perception</a></li><li><a href="uid15.html">Statistical Modeling of Speech</a></li><li><a href="uid21.html">Uncertainty Estimation and Exploitation in Speech Processing</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid26.html">Introduction</a></li><li><a href="uid27.html">Computer Assisted Learning</a></li><li><a href="uid28.html">Aided Communication and Monitoring</a></li><li><a href="uid29.html">Annotation and Processing of Spoken Documents and Audio Archives</a></li><li><a href="uid30.html">Multimodal Computer Interactions</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid32.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid35.html">dnnsep</a></li><li><a href="uid38.html">Dynalips-Player</a></li><li><a href="uid42.html">KATS</a></li><li><a href="uid44.html">VisArtico</a></li><li><a href="uid50.html">Xarticulators</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid54.html">Explicit Modeling of Speech Production and Perception</a></li><li><a href="uid64.html">Statistical Modeling of Speech</a></li><li><a href="uid83.html">Uncertainty Estimation and Exploitation in Speech Processing</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid87.html">Bilateral Contracts with Industry</a></li><li><a href="uid109.html">Bilateral Grants with Industry</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid131.html">Regional Initiatives</a></li><li><a href="uid153.html">National Initiatives</a></li><li><a href="uid208.html">European Initiatives</a></li><li><a href="uid229.html">International Initiatives</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid245.html">Promoting Scientific Activities</a></li><li><a href="uid333.html">Teaching - Supervision - Juries</a></li><li><a href="uid409.html">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentmajor" href="bibliography.html">Major publications</a>
          </li>
          <li>
            <a id="tdmbibentyear" href="bibliography.html#year">Publications of the year</a>
          </li>
          <li>
            <a id="tdmbibentfoot" href="bibliography.html#References">References in notes</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">Inria</a> | <a href="../index.html">Raweb 2018</a> | <a href="http://www.inria.fr/en/teams/multispeech">Presentation of the Project-Team MULTISPEECH</a> | <a href="https://team.inria.fr/multispeech/">MULTISPEECH Web Site</a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="multispeech.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="multispeech.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-multispeech-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../multispeech/multispeech.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-multispeech-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td/>
              <td>PDF</td>
              <td>e-Pub</td>
            </tr>
          </table>
        </div>
      </div>
      <!--FIN du corps du module-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid11.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--DEBUT2 du corps du module-->
        <h2>Section: Overall Objectives</h2>
        <h3 class="titre3">Overall Objectives</h3>
        <p>The goal of the project is to model speech in order to facilitate oral communication.
The name MULTISPEECH reflects the three aspects that receive particular attention:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid4"> </a><b>Multisource aspects</b> - which means dealing with speech signals originating from several sources, such as a speaker plus noise, or overlapping speech signals from multiple speakers; sounds captured by several microphones are also considered.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid5"> </a><b>Multilingual aspects</b> - which means dealing with speech in a multilingual context, for example in computer-assisted language learning, where the pronunciation of words in a foreign language (i.e., non-native speech) is strongly influenced by the mother tongue.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid6"> </a><b>Multimodal aspects</b> - which means considering simultaneously the various modalities of speech signals, acoustic and visual, in particular for the expressive synthesis of audio-visual speech.</p>
          </li>
        </ul>
        <p class="notaparagraph">The project is organized around the following three scientific challenges:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid7"> </a><b>The explicit modeling of speech.</b>
Speech signals result from the movements of the articulators. Good knowledge of articulator positions in relation to the sounds produced is essential to improve, on the one hand, articulatory speech synthesis and, on the other hand, the relevance of the diagnosis and of the associated feedback in computer-assisted language learning. Production and perception processes are interrelated, so a better understanding of how humans perceive speech will lead to more relevant diagnoses in language learning and will point out critical parameters for expressive speech synthesis. In addition, since expressivity translates into both visual and acoustic effects that must be considered simultaneously, the multimodal components of expressivity, which appear in both the voice and the face, are addressed in order to produce expressive multimodal speech.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid8"> </a><b>The statistical modeling of speech.</b>
Statistical approaches are common in speech processing, and they achieve a level of performance that makes their use possible in real applications. However, speech recognition systems still have limited capabilities (for example, the vocabulary, even if large, is limited), and their performance drops significantly when dealing with degraded speech, such as noisy or reverberated signals, distant-microphone recordings, and spontaneous speech. Source separation and speech enhancement approaches are investigated as a way of making speech recognition systems more robust. Handling new proper names is an example of a critical aspect that is tackled, along with the use of statistical models for automatic speech-text alignment and for speech production.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid9"> </a><b>The estimation and the exploitation of uncertainty in speech processing.</b>
Speech signals are highly variable and often disturbed by noise or other spurious signals (such as music or undesired extra speech). In addition, the output of speech enhancement and source separation techniques is not exactly the original “clean” signal, and estimation errors have to be taken into account in further processing. Hence, one goal is to compute and handle the uncertainty of the reconstructed signal provided by source separation approaches.
Finally, MULTISPEECH also aims to estimate the reliability of phonetic segment boundaries and prosodic parameters, for which no such information is currently available.</p>
          </li>
        </ul>
        <p>Although interdependent, each of these three scientific challenges constitutes a founding research direction for the MULTISPEECH project. Consequently, the research program is organized along three research directions, each one matching a scientific challenge.
A large part of the research is conducted on French speech data; English and German are also considered in speech recognition experiments and language learning.
The machine-learning-based approaches can be adapted to other languages, depending on the availability of corresponding speech corpora.
Most of our research on signal processing, speech recognition, speech synthesis, etc., relies on deep learning approaches.</p>
      </div>
      <!--FIN du corps du module-->
    </div>
  </body>
</html>
