MULTISPEECH - 2018 - Annual activity report

Project-Team Multispeech

Section: Partnerships and Cooperations

National Initiatives

ANR DYCI2

Project acronym: DYCI2 (http://repmus.ircam.fr/dyci2/)
Project title: Creative Dynamics of Improvised Interaction
Duration: March 2015 - February 2018
Coordinator: Ircam (Paris)
Other partners: Inria (Nancy), University of La Rochelle
Participants: Ken Déguernel, Nathan Libermann, Emmanuel Vincent
Abstract: The goal of this project was to design a music improvisation system able to listen to the other musicians, to improvise in their style, and to modify its improvisation according to their feedback in real time.

MULTISPEECH was responsible for designing a system able to improvise on multiple musical dimensions (melody, harmony) across multiple time scales.

ANR ArtSpeech

Project acronym: ArtSpeech
Project title: Synthèse articulatoire phonétique (Phonetic articulatory synthesis)
Duration: October 2015 - March 2019
Coordinator: Yves Laprie
Other partners: Gipsa-Lab (Grenoble), IADI (Nancy), LPP (Paris)
Participants: Ioannis Douros, Yves Laprie, Anastasiia Tsukanova
Abstract: The objective is to synthesize speech from text via the numerical simulation of the human speech production processes, i.e. the articulatory, aerodynamic and acoustic aspects. Corpus-based approaches have taken a dominant position in text-to-speech synthesis. They exploit speech databases of very good acoustic quality that cover a large number of expressions and phonetic contexts. This is sufficient to produce intelligible speech. However, these approaches face almost insurmountable obstacles as soon as parameters intimately related to the physical process of speech production have to be modified. In contrast, an approach that rests on the simulation of the physical speech production process makes explicit use of source parameters, of the anatomy and geometry of the vocal tract, and of a temporal supervision strategy. It thus offers direct control over the nature of the synthetic speech.

Static MRI acquisitions of vowels (images plus acoustic signal) were carried out this year, and their exploitation has started in order to explore the impact of articulatory modeling and of the plane wave assumption. Approximately 1000 images were delineated manually and used to generate speech signals with articulatory copy synthesis.

ANR JCJC KAMoulox

Project acronym: KAMoulox
Project title: Kernel additive modelling for the unmixing of large audio archives
Duration: January 2016 - September 2019
Coordinator: Antoine Liutkus (Inria Zenith)
Participants: Mathieu Fontaine, Antoine Liutkus
Abstract: The objective is to develop the theoretical and applied tools required to embed audio denoising and separation tools in web-based audio archives. The application scenario is to deal with large audio archives, and more precisely with the renowned “Archives du CNRS - Musée de l'homme”, which gather about 50,000 recordings dating back to the early 1900s.

PIA2 ISITE LUE

Project acronym: ISITE LUE
Project title: Lorraine Université d’Excellence
Duration: starting in 2016
Coordinator: Univ. Lorraine
Participants: Ioannis Douros, Yves Laprie
Abstract: The initiative aims to develop and strengthen the initial perimeter of excellence, within the scope of the social and economic challenges, so as to build an original model for a leading global engineering university, with a strong emphasis on technological research and on education through research. To this end, we have designed LUE as an “engine” for the development of excellence, which stimulates an original dialogue between fields of knowledge.

MULTISPEECH is mainly concerned with challenge number 6: “Knowledge engineering”, i.e., engineering applied to the field of knowledge and language, which represent our immaterial wealth while being a critical factor for the consistency of future choices. This project funds the PhD thesis of Ioannis Douros.

E-FRAN METAL

Project acronym: E-FRAN METAL
Project title: Modèles Et Traces au service de l’Apprentissage des Langues (Models and Traces for Language Learning)
Duration: October 2016 - September 2020
Coordinator: Anne Boyer (LORIA)
Other partners: Interpsy, LISEC, ESPE de Lorraine, D@NTE (Univ. Versailles Saint Quentin), Sailendra SAS, ITOP Education, Rectorat
Participants: Theo Biasutto-Lervat, Anne Bonneau, Vincent Colotte, Dominique Fohr, Denis Jouvet, Odile Mella, Slim Ouni, Anne-Laure Piat-Marchand, Elodie Gauthier, Thomas Girod
Abstract: METAL aims at improving the learning of languages (both written and oral components) through the development of new tools and the analysis of digital traces associated with students' learning, in order to adapt to the needs and rhythm of each learner.

MULTISPEECH is concerned with the oral language learning aspects.

ANR VOCADOM

Project acronym: VOCADOM (http://vocadom.imag.fr/)
Project title: Robust voice command adapted to the user and to the context for ambient assisted living
Duration: January 2017 - December 2020
Coordinator: CNRS - LIG (Grenoble)
Other partners: Inria (Nancy), Univ. Lyon 2 - GREPS, THEORIS (Paris)
Participants: Dominique Fohr, Md Sahidullah, Sunit Sivasankaran, Emmanuel Vincent
Abstract: The goal of this project is to design a robust voice control system for smart home applications.

MULTISPEECH is responsible for wake-up word detection, overlapping speech separation, and speaker recognition.

ANR JCJC DiSCogs

Project acronym: DiSCogs
Project title: Distant speech communication with heterogeneous unconstrained microphone arrays
Duration: September 2018 - March 2022
Coordinator: Romain Serizel
Participants: Nicolas Furnon, Irina Illina, Romain Serizel, Emmanuel Vincent
Collaborators: Télécom ParisTech, 7sensing
Abstract: The objective is to solve fundamental sound processing issues in order to exploit the many microphone-equipped devices that populate our everyday life. The proposed solution is to apply machine learning methods based on deep learning to recast the problem of synchronizing devices at the signal level as a multi-view learning problem that aims at extracting complementary information from the devices at hand.
