

Section: Partnerships and Cooperations

National Initiatives

EQUIPEX ORTOLANG

  • Project acronym: ORTOLANG (http://www.ortolang.fr)

  • Project title: Open Resources and TOols for LANGuage

  • Duration: September 2012 - May 2016 (phase I, signed in January 2013)

  • Coordinator: Jean-Marie Pierrel, ATILF (Nancy)

  • Other partners: LPL (Aix-en-Provence), LORIA (Nancy), Modyco (Paris), LLL (Orléans), INIST (Nancy)

  • Abstract: The aim of ORTOLANG is to propose a network infrastructure offering a repository of language data (corpora, lexicons, dictionaries, etc.) together with readily available, well-documented tools for its processing. This will enable a real mutualization of research on the analysis, modeling and automatic processing of the French language. It will also facilitate the use and transfer of resources and tools developed in public laboratories to industrial partners, in particular SMEs, which often cannot afford to develop such language processing resources and tools because of the cost of their realization. Moreover, it will promote the French language and the regional languages of France by sharing the knowledge acquired by public laboratories.

Several teams of the LORIA laboratory contribute to this Equipex, mainly with respect to providing tools for speech and language processing. MULTISPEECH contributes text-speech alignment and speech visualization tools.

ANR-DFG IFCASL

  • Project acronym: IFCASL

  • Project title: Individualized feedback in computer-assisted spoken language learning

  • Duration: March 2013 - February 2016

  • Coordinator: Jürgen Trouvain, Saarland University

  • Other partners: Saarland University (COLI department)

  • Abstract: The main objective of IFCASL is to investigate learning of oral French by German speakers, and oral German by French speakers at the phonetic level.

The work involved the design and recording of a French-German learner corpus. French speakers were recorded in Nancy, whereas German speakers were recorded in Saarbrücken. An automatic speech-text alignment process was applied to all the data. Then, the French speech data (native and non-native) were manually checked and annotated in France, and the German speech data (native and non-native) were manually checked and annotated in Germany. The corpora are currently used for analyzing non-native pronunciations and for studying feedback procedures.

ANR ContNomina

  • Project acronym: ContNomina

  • Project title: Exploitation of context for proper names recognition in diachronic audio documents

  • Duration: February 2013 - July 2016

  • Coordinator: Irina Illina, MULTISPEECH

  • Other partners: LIA, Synalp

  • Abstract: The ContNomina project focuses on the problem of proper names in automatic audio processing systems, exploiting the context of the processed documents as efficiently as possible. To do this, the project addresses the statistical modeling of contexts and of the relationships between contexts and proper names; the contextualization of the recognition module (through the dynamic adjustment of the lexicon and of the language model, in order to make them more accurate and more relevant in terms of lexical coverage, particularly with respect to proper names); and the detection of proper names (on the one hand, in text documents for building lists of proper names, and on the other hand, in the output of the recognition system to identify spoken proper names in the audio/video data).
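
The dynamic lexicon and language-model adjustment mentioned above can be illustrated with a minimal sketch. All names, pronunciations and probabilities below are hypothetical; the project's actual system operates on a full ASR lexicon and n-gram language model, not a unigram toy:

```python
# Toy sketch of lexicon/LM contextualization for proper names
# (illustrative only, not the project's actual implementation).
import re
from collections import Counter

def extract_proper_names(documents):
    """Collect capitalized, non-sentence-initial tokens as proper-name candidates."""
    names = Counter()
    for doc in documents:
        for sentence in re.split(r"[.!?]", doc):
            tokens = sentence.split()
            for tok in tokens[1:]:          # skip the sentence-initial position
                if tok[:1].isupper() and tok.isalpha():
                    names[tok] += 1
    return names

def contextualize(lexicon, unigram_lm, documents, boost=1e-4):
    """Add context proper names to the lexicon and re-normalize the unigram LM."""
    for name, count in extract_proper_names(documents).items():
        lexicon.setdefault(name, "<g2p_pending>")   # pronunciation generated later
        unigram_lm[name] = unigram_lm.get(name, 0.0) + boost * count
    total = sum(unigram_lm.values())
    return lexicon, {w: p / total for w, p in unigram_lm.items()}

docs = ["The mayor of Nancy met Irina Illina today.",
        "Later Illina presented the ContNomina project."]
lex, lm = contextualize({"the": "dh ah"}, {"the": 0.05}, docs)
```

Repeated names from the context documents receive a larger probability boost, which mirrors the intuition that names salient in the surrounding documents are more likely to be spoken.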

ANR DYCI2

  • Project acronym: DYCI2 (http://repmus.ircam.fr/dyci2/)

  • Project title: Creative Dynamics of Improvised Interaction

  • Duration: March 2015 - February 2018 (signed in October 2014)

  • Coordinator: Ircam (Paris)

  • Other partners: Inria (Nancy), University of La Rochelle

  • Abstract: The goal of this project is to design a music improvisation system which will be able to listen to the other musicians, improvise in their style, and modify its improvisation according to their feedback in real time.

ANR JCJC KAMoulox

  • Project acronym: KAMoulox

  • Project title: Kernel additive modelling for the unmixing of large audio archives

  • Duration: January 2016 - January 2019 (signed in October 2015)

  • Coordinator: Antoine Liutkus, MULTISPEECH

  • Abstract: The goal is to develop the theoretical and applied tools required to embed audio denoising and source separation in web-based audio archives. The target application is the processing of large audio archives, more precisely the famous "Archives du CNRS — Musée de l'homme", which gather about 50,000 recordings dating back to the early 1900s.
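
As a rough illustration of the kernel additive idea behind the project (each source is modelled by the median of its spectrogram estimate over a source-specific kernel neighbourhood, and the mixture is then redistributed between sources by Wiener-like masking), here is a self-contained toy on a synthetic magnitude spectrogram. The kernel shapes, sizes and data are assumptions chosen for illustration, not the project's actual configuration:

```python
# Minimal kernel additive modelling (KAM) sketch for two sources, run on a toy
# magnitude "spectrogram" stored as pure-Python lists (real systems use STFTs).

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def kam_separate(mix, n_iter=10):
    F, T = len(mix), len(mix[0])
    # Initial estimates: split the mixture energy equally between the sources.
    est = [[[v / 2 for v in row] for row in mix] for _ in range(2)]
    for _ in range(n_iter):
        # Source 0: stationary (e.g. a hum) -> median over a wide time kernel.
        fit0 = [[median([est[0][f][u] for u in range(T)]) for t in range(T)]
                for f in range(F)]
        # Source 1: broadband transient -> median over a small frequency kernel.
        fit1 = [[median([est[1][u][t]
                         for u in range(max(0, f - 1), min(F, f + 2))])
                 for t in range(T)] for f in range(F)]
        # Wiener-like masking redistributes the mixture between the two fits.
        for f in range(F):
            for t in range(T):
                tot = fit0[f][t] + fit1[f][t] + 1e-12
                est[0][f][t] = mix[f][t] * fit0[f][t] / tot
                est[1][f][t] = mix[f][t] * fit1[f][t] / tot
    return est

# Toy mixture: a constant hum (level 1.0) plus a broadband transient at frame 3.
mix = [[1.0] * 8 for _ in range(5)]
for f in range(5):
    mix[f][3] += 4.0
hum, transient = kam_separate(mix)
```

After a few iterations the vertical burst is attributed mostly to the transient source, because only the frequency-kernel model can fit it, while the masking step keeps the two estimates summing to the mixture in every time-frequency bin.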

ANR ORFEO

In this project, we are concerned with automatic speech-text alignment at the word and phoneme levels for audio files from several corpora gathered by the project. These corpora, orthographically transcribed with Transcriber, contain mainly spontaneous speech, recorded under various conditions with a large SNR range and many overlapping and anonymised speech segments. For the forced speech-text alignment phase, we applied our two-step methodology: the first step uses a detailed acoustic model to find the pronunciation variants; the second step then uses a more compact model to provide more temporally accurate boundaries.
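
The two-step procedure above can be sketched with a toy dynamic-programming aligner. Symbolic "frames" and a 0/1 mismatch cost stand in for real acoustic models, and the pass-specific (detailed vs. compact) models are collapsed into the same cost function here; everything below is a simplified assumption, not the actual system:

```python
# Toy two-pass forced alignment: pass 1 selects the pronunciation variant,
# pass 2 re-aligns it to obtain phone boundaries.

def align(phones, frames, cost):
    """Monotonic DP: each phone spans >= 1 frame; returns (total_cost, boundaries)."""
    INF = float("inf")
    n, m = len(phones), len(frames)
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            # Phone i-1 covers frames k..j-1 for some split point k.
            for k in range(i - 1, j):
                c = best[i - 1][k] + sum(cost(phones[i - 1], frames[t])
                                         for t in range(k, j))
                if c < best[i][j]:
                    best[i][j], back[i][j] = c, k
    # Backtrack the end frame index of each phone.
    bounds, j = [], m
    for i in range(n, 0, -1):
        bounds.append(j)
        j = back[i][j]
    return best[n][m], bounds[::-1]

def mismatch(phone, frame):
    return 0.0 if phone == frame else 1.0

# Pass 1: pick the pronunciation variant that aligns best ("detailed model").
frames = ["b", "b", "o", "o", "o", "t"]            # pseudo acoustic frames
variants = [["b", "o", "t"], ["b", "o", "n", "t"]]
best_variant = min(variants, key=lambda v: align(v, frames, mismatch)[0])

# Pass 2: re-align the chosen variant for the final boundaries
# (a more compact model in the real system; the same toy cost here).
_, boundaries = align(best_variant, frames, mismatch)
```

Here the variant without the extra "n" phone wins pass 1, and pass 2 returns the frame index at which each of its phones ends.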

FUI RAPSODIE

  • Project acronym: RAPSODIE (http://erocca.com/rapsodie)

  • Project title: Automatic Speech Recognition for Hard of Hearing or Handicapped People

  • Duration: March 2012 - February 2016 (signed in December 2012)

  • Coordinator: eRocca (Mieussy, Haute-Savoie)

  • Other partners: CEA (Grenoble), Inria (Nancy), CASTORAMA (France)

  • Abstract: The goal of the project is to realize a portable device that will help a hard-of-hearing person communicate with other people. To achieve this goal, the portable device will access a speech recognition system adapted to this task. Another application of the device will be vocal control of the environment for disabled persons.

In this project, MULTISPEECH is involved in optimizing the speech recognition models for the envisaged task, and also contributes to finding the best way of presenting the speech recognition results in order to maximize the communication efficiency between the hard-of-hearing person and the speaking person.

FUI VoiceHome

  • Project acronym: VoiceHome

  • Duration: February 2015 - July 2017

  • Coordinator: onMobile

  • Other partners: Orange, Delta Dore, Technicolor Connected Home, eSoftThings, Inria (Nancy), IRISA, LOUSTIC

  • Abstract: The goal of this project is to design a robust voice control system for smart home and multimedia applications. We are responsible for the robust automatic speech recognition module.

ADT Plavis

  • Project acronym: Plavis

  • Project title: Platform for acquisition and audiovisual speech synthesis

  • Duration: January 2015 - December 2016

  • Coordinator: Vincent Colotte, MULTISPEECH

  • Abstract: The objective of this project is to develop an acquisition platform and an audiovisual synthesis system (3D animation of the face synchronized with audio). The main purpose is to build a comprehensive platform for the acquisition and processing of audiovisual corpora (selection, acquisition and acoustic processing, 3D visual processing and linguistic processing). The acquisition is performed using a Kinect-like motion capture system, a Vicon system, or an EMA system. We also propose to develop a 3D audiovisual synthesis system that converts text into audio and the 3D information of a talking head. The system will incorporate a talking-head animation module to render the animated face synchronously with the audio. During the first year of the project, we have been setting up and testing the acquisition techniques that will be used. We have developed several tools to acquire the audiovisual data and to process it, and a synchronization step was developed.

ADT VisArtico

  • Project acronym: VisArtico

  • Project title: Software for the processing, analysis and visualization of articulatory data

  • Duration: November 2013 - October 2015

  • Coordinator: Slim Ouni, MULTISPEECH

  • Abstract: The Technological Development Action (ADT) Inria VisArtico aims at developing and improving VisArtico, an articulatory visualization software (see 6.5). In addition to improving the basic functionalities, several articulatory analysis and processing tools are being integrated.

CORExp

  • Project acronym: CORExp

  • Project title: Acquisition, Processing and Analysis of a Corpus for the Synthesis of Expressive Audiovisual Speech

  • Duration: December 2014 - December 2016

  • Coordinator: S. Ouni, MULTISPEECH

  • Cofunded by Inria and Région Lorraine

  • Abstract: The main objective of this project is the acquisition of a bimodal corpus of considerable size (several thousand sentences) to study expressiveness and emotions during speech (for example, how to decode facial expressions that are merged with the speech signal). The main purpose is to acquire, process and analyze the corpus and to study expressiveness; the results will be used for the expressive audiovisual speech synthesis system.

LORIA exploratory project

  • Project title: Acquisition and processing of multimodal corpus in the context of interactive human communication

  • Duration: June 2015 - May 2016

  • Coordinator: S. Ouni, MULTISPEECH

  • Abstract: The aim of this project is to study the various mechanisms involved in multimodal human communication, which can be oral, visual, gestural and tactile. The project focuses on the identification and acquisition of a very large corpus of multimodal data from multiple information sources, acquired in the context of interaction and communication between two or more people. We will set up and integrate the acquisition hardware and software, and thereafter acquire and structure the multimodal data.