Bibliography
Major publications by the team in recent years
-
2A. Bonneau, D. Fohr, I. Illina, D. Jouvet, O. Mella, L. Mesbahi, L. Orosanu.
Gestion d'erreurs pour la fiabilisation des retours automatiques en apprentissage de la prosodie d'une langue seconde, in: Traitement Automatique des Langues, 2013, vol. 53, no 3.
https://hal.inria.fr/hal-00834278 -
3D. Jouvet, D. Fohr.
Combining Forward-based and Backward-based Decoders for Improved Speech Recognition Performance, in: InterSpeech - 14th Annual Conference of the International Speech Communication Association - 2013, Lyon, France, August 2013.
https://hal.inria.fr/hal-00834282 -
4K. Nathwani, J. A. Morales-Cordovilla, S. Sivasankaran, I. Illina, E. Vincent.
An extended experimental investigation of DNN uncertainty propagation for noise robust ASR, in: 5th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2017), San Francisco, United States, March 2017.
https://hal.inria.fr/hal-01446441 -
5A. A. Nugraha, A. Liutkus, E. Vincent.
Multichannel audio source separation with deep neural networks, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, June 2016, vol. 24, no 10, pp. 1652-1664. [ DOI : 10.1109/TASLP.2016.2580946 ]
https://hal.inria.fr/hal-01163369 -
6A. Ozerov, E. Vincent, F. Bimbot.
A General Flexible Framework for the Handling of Prior Information in Audio Source Separation, in: IEEE Transactions on Audio, Speech and Language Processing, May 2012, vol. 20, no 4, pp. 1118-1133.
https://hal.archives-ouvertes.fr/hal-00626962 -
7A. Piquard-Kipffer, C. Blonz.
Je peux voir les mots que tu dis ! Histoire d'un projet, in: 13ème édition du Festival du film de chercheur CNRS 2012, Nancy, France, June 2012.
https://hal.inria.fr/hal-01263907 -
8A. Piquard-Kipffer, L. Sprenger-Charolles.
Predicting reading level at the end of Grade 2 from skills assessed in kindergarten: contribution of phonemic discrimination (Follow-up of 85 French-speaking children from 4 to 8 years old), in: Topics in Cognitive Psychology, 2013.
https://hal.inria.fr/hal-00833951
Doctoral Dissertations and Habilitation Theses
-
9B. Dumortier.
Acoustic control of wind farms, Université de Lorraine, September 2018.
https://tel.archives-ouvertes.fr/tel-01897853 -
10K. Déguernel.
Learning of musical structures in the context of improvisation, Université de Lorraine, March 2018.
https://tel.archives-ouvertes.fr/tel-01735308
Articles in International Peer-Reviewed Journals
-
11N. Bertin, E. Camberlein, R. Lebarbenchon, E. Vincent, S. Sivasankaran, I. Illina, F. Bimbot.
VoiceHome-2, an extended corpus for multichannel speech processing in real homes, in: Speech Communication, 2018.
https://hal.inria.fr/hal-01923108 -
12K. Déguernel, E. Vincent, G. Assayag.
Probabilistic Factor Oracles for Multidimensional Machine Improvisation, in: Computer Music Journal, June 2018, vol. 42, no 2, pp. 52-66. [ DOI : 10.1162/comj_a_00460 ]
https://hal.inria.fr/hal-01693750 -
13A. Houidhek, V. Colotte, Z. Mnasri, D. Jouvet.
Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic, in: International Journal of Speech Technology, November 2018, pp. 1-12. [ DOI : 10.1007/s10772-018-09558-6 ]
https://hal.inria.fr/hal-01936963 -
14D. Jouvet, D. Langlois, M. A. Menacer, D. Fohr, O. Mella, K. Smaïli.
Adaptation of speech recognition vocabularies for improved transcription of YouTube videos, in: Journal of International Science and General Applications, March 2018, vol. 1, no 1, pp. 1-9.
https://hal.archives-ouvertes.fr/hal-01873801 -
15K. Nathwani, E. Vincent, I. Illina.
DNN Uncertainty Propagation using GMM-Derived Uncertainty Features for Noise Robust ASR, in: IEEE Signal Processing Letters, January 2018. [ DOI : 10.1109/LSP.2018.2791534 ]
https://hal.inria.fr/hal-01680658 -
16S. Ouni, G. Gris.
Dynamic Lip Animation from a Limited number of Control Points: Towards an Effective Audiovisual Spoken Communication, in: Speech Communication, February 2018, vol. 96, pp. 49-57. [ DOI : 10.1016/j.specom.2017.11.006 ]
https://hal.inria.fr/hal-01631397 -
17Z. Wang, E. Vincent, R. Serizel, Y. Yan.
Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments, in: Computer Speech and Language, May 2018, vol. 49, pp. 37-51. [ DOI : 10.1016/j.csl.2017.11.003 ]
https://hal.inria.fr/hal-01634449
International Conferences with Proceedings
-
18B. Abdullah, I. Illina, D. Fohr.
Dynamic Extension of ASR Lexicon Using Wikipedia Data, in: IEEE Workshop on Spoken and Language Technology (SLT), Athens, Greece, Proceedings of IEEE SLT, December 2018.
https://hal.archives-ouvertes.fr/hal-01874495 -
19J. Barker, S. Watanabe, E. Vincent, J. Trmal.
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines, in: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, September 2018, https://arxiv.org/abs/1803.10609.
https://hal.inria.fr/hal-01744021 -
20K. Bartkova, D. Jouvet.
Analysis of prosodic correlates of emotional speech data, in: ExLing 2018 - 9th Tutorial and Research Workshop on Experimental Linguistics, Paris, France, August 2018.
https://hal.inria.fr/hal-01889932 -
21T. Biasutto-Lervat, S. Ouni.
Phoneme-to-Articulatory mapping using bidirectional gated RNN, in: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, September 2018.
https://hal.inria.fr/hal-01862587 -
22A. Bonneau.
Impact of fluency and segmental categorization in L2: the case of French final fricatives uttered by German speakers, in: Speech Prosody 2018, Poznań, Poland, June 2018. [ DOI : 10.21437/speechprosody.2018-189 ]
https://hal.inria.fr/hal-01926657 -
23G. Carbajal, R. Serizel, E. Vincent, E. Humbert.
Multiple-input neural network-based residual echo suppression, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, April 2018, pp. 1-5.
https://hal.inria.fr/hal-01723630 -
24H. Delgado, M. Todisco, M. Sahidullah, N. Evans, T. Kinnunen, K. A. Lee, J. Yamagishi.
ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements, in: Odyssey 2018 - The Speaker and Language Recognition Workshop, Les Sables d'Olonne, France, June 2018.
https://hal.inria.fr/hal-01880206 -
25D. Di Carlo, A. Liutkus, K. Déguernel.
Interference reduction on full-length live recordings, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, Canada, IEEE, April 2018, pp. 736-740. [ DOI : 10.1109/ICASSP.2018.8462621 ]
https://hal.inria.fr/hal-01713889 -
26F. Fang, J. Yamagishi, I. Echizen, M. Sahidullah, T. Kinnunen.
Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems, in: WIFS 2018 - IEEE International Workshop on Information Forensics and Security, Hong Kong SAR, China, December 2018.
https://hal.inria.fr/hal-01889910 -
27M. Fontaine, F.-R. Stöter, A. Liutkus, U. Simsekli, R. Serizel, R. Badeau.
Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition, in: LVA ICA 2018 - 14th International Conference on Latent Variable Analysis and Signal Separation, Surrey, United Kingdom, July 2018.
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01766795 -
28B. Garcia-Zapirain, C. Castillo, A. Badiola, S. Zahia, A. Mendez, D. Langlois, D. Jouvet, J.-M. Torres-Moreno, M. Leszczuk, K. Smaïli.
A Proposed Methodology for Subjective Evaluation of Video and Text Summarization, in: MISSI 2018 - 11th edition of the International Conference on Multimedia and Network Information Systems, Wrocław, Poland, Advances in Intelligent Systems and Computing, Springer, September 2018, vol. 833, pp. 396-404. [ DOI : 10.1007/978-3-319-98678-4_40 ]
https://hal.archives-ouvertes.fr/hal-01873685 -
29M. L. Grega, K. Smaïli, M. Leszczuk, C.-E. González-Gallardo, J.-M. Torres-Moreno, E. Linhares Pontes, D. Fohr, O. Mella, M. A. Menacer, D. Jouvet.
An Integrated AMIS Prototype for Automated Summarization and Translation of Newscasts and Reports, in: MISSI 2018 - 11th International Conference on Multimedia and Network Information Systems, Wrocław, Poland, K. Choroś, M. Kopel, E. Kukla, A. Siemiński (editors), Springer, September 2018, vol. 833, pp. 415-423. [ DOI : 10.1007/978-3-319-98678-4_42 ]
https://hal.archives-ouvertes.fr/hal-01873680 -
30A. Houidhek, V. Colotte, Z. Mnasri, D. Jouvet.
DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation, in: SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Mons, Belgium, October 2018.
https://hal.inria.fr/hal-01904512 -
31N. Keriven, A. Deleforge, A. Liutkus.
Blind Source Separation Using Mixtures of Alpha-Stable Distributions, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, IEEE, April 2018, pp. 771-775, https://arxiv.org/abs/1711.04460. [ DOI : 10.1109/ICASSP.2018.8462095 ]
https://hal.inria.fr/hal-01633215 -
32T. Kinnunen, K. A. Lee, H. Delgado, N. Evans, M. Todisco, M. Sahidullah, J. Yamagishi, D. A. Reynolds.
t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification, in: Odyssey 2018 - The Speaker and Language Recognition Workshop, Les Sables d'Olonne, France, June 2018.
https://hal.inria.fr/hal-01880306 -
33Y. Laprie, B. Elie, A. Tsukanova, P.-A. Vuissoz.
Centerline articulatory models of the velum and epiglottis for articulatory synthesis of speech, in: EUSIPCO 2018 - 26th European Signal Processing Conference, Rome, Italy, September 2018.
https://hal.inria.fr/hal-01921928 -
34L. Lee, K. Bartkova, M. Dargnat, D. Jouvet.
Prosodic and Pragmatic Values of Discourse Particles in French, in: ExLing 2018 - 9th Tutorial and Research Workshop on Experimental Linguistics, Paris, France, August 2018.
https://hal.inria.fr/hal-01889925 -
35A. Liutkus, C. Rohlfing, A. Deleforge.
Audio source separation with magnitude priors: the BEADS model, in: ICASSP: International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, Signal Processing and Artificial Intelligence: Changing the World, April 2018, pp. 1-5. [ DOI : 10.1109/ICASSP.2018.8462515 ]
https://hal.inria.fr/hal-01713886 -
36H. Peic Tukuljac, A. Deleforge, R. Gribonval.
MULAN: A Blind and Off-Grid Method for Multichannel Echo Retrieval, in: NeurIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Montréal, Canada, December 2018, pp. 1-11, https://arxiv.org/abs/1810.13338.
https://hal.inria.fr/hal-01906385 -
37L. Perotin, R. Serizel, E. Vincent, A. Guérin.
CRNN-based joint azimuth and elevation localization with the Ambisonics intensity vector, in: IWAENC 2018 - 16th International Workshop on Acoustic Signal Enhancement, Tokyo, Japan, September 2018.
https://hal.inria.fr/hal-01840453 -
38L. Perotin, R. Serizel, E. Vincent, A. Guérin.
Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings, in: 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Calgary, Canada, April 2018.
https://hal.inria.fr/hal-01699759 -
39R. Scheibler, D. Di Carlo, A. Deleforge, I. Dokmanić.
Separake: Source Separation with a Little Help From Echoes, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, April 2018.
https://hal.inria.fr/hal-01909531 -
40R. Serizel, N. Turpault, H. Eghbal-Zadeh, A. Parag Shah.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments, in: Workshop on Detection and Classification of Acoustic Scenes and Events, Woking, United Kingdom, November 2018, https://arxiv.org/abs/1807.10501 - Submitted to DCASE2018 Workshop.
https://hal.inria.fr/hal-01850270 -
41S. Sieranoja, M. Sahidullah, T. Kinnunen, J. Komulainen, A. Hadid.
Audiovisual Synchrony Detection with Optimized Audio Features, in: ICSIP 2018 - 3rd International Conference on Signal and Image Processing, Shenzhen, China, July 2018.
https://hal.inria.fr/hal-01889918 -
42S. Sivasankaran, B. M. L. Srivastava, S. Sitaram, K. Bali, M. Choudhury.
Phone Merging for Code-switched Speech Recognition, in: Third Workshop on Computational Approaches to Linguistic Code-switching, Melbourne, Australia, co-located with ACL 2018, July 2018.
https://hal.inria.fr/hal-01800466 -
43S. Sivasankaran, E. Vincent, D. Fohr.
Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment, in: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, September 2018.
https://hal.archives-ouvertes.fr/hal-01817519 -
45M. Strauss, P. Mordel, V. Miguet, A. Deleforge.
DREGON: Dataset and Methods for UAV-Embedded Sound Source Localization, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018), Madrid, Spain, October 2018.
https://hal.inria.fr/hal-01854878 -
46L. Terissi, G. Sad, M. Cerda, S. Ouni, R. Galvez, J. B. Gómez, B. Girau, N. Hitschfeld-Kahler.
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information, in: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, September 2018.
https://hal.inria.fr/hal-01862585 -
47M. Todisco, H. Delgado, K. A. Lee, M. Sahidullah, N. Evans, T. Kinnunen, J. Yamagishi.
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion, in: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, ISCA, September 2018. [ DOI : 10.21437/Interspeech.2018-2289 ]
https://hal.inria.fr/hal-01889934 -
48Z. Wang, J. Li, Y. Yan, E. Vincent.
Semi-supervised learning with deep neural networks for relative transfer function inverse regression, in: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, April 2018.
https://hal.inria.fr/hal-01797886 -
49I. Zangar, Z. Mnasri, V. Colotte, D. Jouvet, A. Houidhek.
Duration modeling using DNN for Arabic speech synthesis, in: Speech Prosody 2018 - 9th International Conference on Speech Prosody, Poznań, Poland, June 2018.
https://hal.inria.fr/hal-01889917
National Conferences with Proceedings
-
50N. Libermann, F. Bimbot, E. Vincent.
Exploration de dépendances structurelles mélodiques par réseaux de neurones récurrents, in: JIM 2018 - Journées d'Informatique Musicale, Amiens, France, May 2018, pp. 81-86.
https://hal.archives-ouvertes.fr/hal-01791381
Conferences without Proceedings
-
51M. Vacher, E. Vincent, M.-E. Bobillier Chaumon, T. Joubert, F. Portet, D. Fohr, S. Caffiau, T. Desot.
The VocADom Project: Speech Interaction for Well-being and Reliance Improvement, in: MobileHCI 2018 - 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, Barcelona, Spain, September 2018.
https://hal.archives-ouvertes.fr/hal-01830217
Scientific Books (or Scientific Book chapters)
-
52A. Deleforge, A. Schmidt, W. Kellermann.
Audio-Motor Integration for Robot Audition, in: Multimodal Behavior Analysis in the Wild, Academic Press, November 2018, pp. 1-27.
https://hal.inria.fr/hal-01929388 -
53C. Févotte, E. Vincent, A. Ozerov.
Single-channel audio source separation with NMF: divergences, constraints and algorithms, in: Audio Source Separation, Springer, March 2018.
https://hal.inria.fr/hal-01631185 -
54T. Gerkmann, E. Vincent.
Spectral masking and filtering, in: Audio source separation and speech enhancement, E. Vincent, T. Virtanen, S. Gannot (editors), Wiley, August 2018.
https://hal.inria.fr/hal-01881425 -
55A. A. Nugraha, A. Liutkus, E. Vincent.
Deep neural network based multichannel audio source separation, in: Audio Source Separation, Springer, March 2018.
https://hal.inria.fr/hal-01633858 -
56A. Ozerov, C. Févotte, E. Vincent.
An introduction to multichannel NMF for audio source separation, in: Audio Source Separation, Signals and Communication Technology, Springer, March 2018.
https://hal.inria.fr/hal-01631187 -
57M. Sahidullah, H. Delgado, M. Todisco, T. Kinnunen, N. Evans, J. Yamagishi, K.-A. Lee.
Introduction to Voice Presentation Attack Detection and Recent Advances, in: Handbook of Biometric Anti-Spoofing: Presentation Attack Detection, S. Marcel, M. S. Nixon, J. Fierrez, N. Evans (editors), Advances in Computer Vision and Pattern Recognition, Springer, 2019, pp. 321-361.
https://hal.inria.fr/hal-01974528 -
58A. Tsukanova, B. Elie, Y. Laprie.
Articulatory Speech Synthesis from Static Context-Aware Articulatory Targets, in: Studies on Speech Production, Q. Fang, J. Dang, P. Perrier, J. Wei, L. Wang, N. Yan (editors), Lecture Notes in Computer Science, Springer, 2018, no 10733, pp. 37-47, Revised Selected Papers of the 11th International Seminar, ISSP 2017, Tianjin, China, October 16-19, 2017. [ DOI : 10.1007/978-3-030-00126-1_4 ]
https://hal.archives-ouvertes.fr/hal-01937950 -
59E. Vincent, S. Gannot, T. Virtanen.
Acoustics - Spatial properties, in: Audio source separation and speech enhancement, E. Vincent, T. Virtanen, S. Gannot (editors), Wiley, August 2018.
https://hal.inria.fr/hal-01881423 -
60E. Vincent, S. Gannot, T. Virtanen.
Introduction, in: Audio source separation and speech enhancement, E. Vincent, T. Virtanen, S. Gannot (editors), Wiley, August 2018.
https://hal.inria.fr/hal-01881422 -
61E. Vincent, T. Virtanen, S. Gannot.
Perspectives, in: Audio source separation and speech enhancement, E. Vincent, T. Virtanen, S. Gannot (editors), Wiley, August 2018.
https://hal.inria.fr/hal-01881424 -
62T. Virtanen, E. Vincent, S. Gannot.
Time-frequency processing - Spectral properties, in: Audio source separation and speech enhancement, E. Vincent, T. Virtanen, S. Gannot (editors), Wiley, August 2018.
https://hal.inria.fr/hal-01881426 -
63M. Zitt, A. Lelu, M. Cadot, G. Cabanac.
Bibliometric delineation of scientific fields, in: Handbook of Science and Technology Indicators, W. Glänzel, H. F. Moed, U. Schmoch, M. Thelwall (editors), Springer International Publishing, 2018. [ DOI : 10.1007/978-3-030-02511-3 ]
https://hal.archives-ouvertes.fr/hal-01942528
Books or Proceedings Editing
-
64E. Vincent, T. Virtanen, S. Gannot (editors)
Audio source separation and speech enhancement, Wiley, August 2018, 504 p. [ DOI : 10.1002/9781119279860 ]
https://hal.inria.fr/hal-01881431
Internal Reports
-
65M. Cadot, A. Lelu, M. Zitt.
Benchmarking seventeen clustering methods on a text dataset: Comparaison empirique de dix-sept méthodes de classification non-supervisée sur un corpus textuel, LORIA, March 2018, French version in supplementary file.
https://hal.archives-ouvertes.fr/hal-01532894
Patents
-
66S. Ouni, G. Gris.
Image processing device, March 2018, no US2018/0061109 A1.
https://hal.inria.fr/hal-01862639
Other Publications
-
67T. Kinnunen, R. G. Hautamäki, V. Vestman, M. Sahidullah.
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection, 2018, a slightly shorter version has been submitted to IEEE ICASSP 2019.
https://hal.inria.fr/hal-01937767 -
68L. Perotin, R. Serizel, E. Vincent, A. Guérin.
CRNN-based multiple DoA estimation using Ambisonics acoustic intensity features, July 2018, Submitted to the IEEE Journal of Selected Topics in Signal Processing, Special Issue on Acoustic Source Localization and Tracking in Dynamic Real-life Scenes.
https://hal.inria.fr/hal-01839883
-
69L. Sprenger-Charolles, P. Colé, D. Béchennec, A. Kipffer-Piquard.
French normative data on reading and related skills from EVALEC, a new computerized battery of tests (end Grade 1, Grade 2, Grade 3, and Grade 4), in: European Review of Applied Psychology / Revue Européenne de Psychologie Appliquée, 2005, no 55, pp. 157-186.
https://hal.inria.fr/inria-00184979