Brain-inspired machine learning algorithms combined with big data have recently achieved spectacular results, equalling or beating humans on specific high-level tasks (e.g., the game of go). However, there are still many domains in which even human infants outperform machines: unsupervised learning of rules and language, common sense reasoning, and, more generally, cognitive flexibility (the ability to quickly transfer competence from one domain to another).
The aim of the Cognitive Computing team is to reverse engineer such human abilities, i.e., to construct effective and scalable algorithms which perform as well as (or better than) humans when provided with similar data, to study their mathematical and algorithmic properties, and to test their empirical validity as models of humans by comparing their output with behavioral and neuroscientific data. The expected results are more adaptable and autonomous machine learning algorithms for complex tasks, and quantitative models of cognitive processes which can be used to predict human developmental and processing data. Most of the work focuses on speech and language and on common sense reasoning.
In recent years, Artificial Intelligence (AI) has achieved important landmarks in matching or surpassing human-level performance on a number of high-level tasks (playing chess and go, driving cars, categorizing pictures, etc.). These strong advances were obtained by deploying, on large amounts of data, massively parallel learning architectures with simple brain-inspired ‘neuronal’ elements. However, human brains still outperform machines in several key areas (language, social interactions, common sense reasoning, motor skills), and are more flexible: whereas machines require extensive expert knowledge and massive training for each particular application, humans learn autonomously over several time scales: over the developmental scale (months), human infants acquire cognitive skills with noisy data and little or no expert feedback (weakly supervised or unsupervised learning); over the short time scale (minutes, seconds), humans combine previously acquired skills to solve new tasks and apply rules systematically to draw inferences on the basis of extremely scarce data (learning to learn, domain adaptation, one- or zero-shot learning).
The general aim of CoML, following the roadmap described in , is to bridge the gap in cognitive flexibility between humans and machines in language processing and common sense reasoning. We conduct work in three areas: weakly supervised and unsupervised algorithms, datasets and benchmarks, and machine intelligence evaluation.
Much of standard machine learning is construed as regression or classification (mapping input data to expert-provided labels). Human infants rarely learn in this fashion, at least before going to school: they learn language, social cognition, and common sense autonomously (without expert labels), and when adults provide feedback, it is ambiguous and noisy and cannot be taken as a gold standard. Modeling or mimicking such achievements requires deploying unsupervised or weakly supervised algorithms, which are less well understood than their supervised counterparts.
We take inspiration from infants' landmarks during their first years of life: they are able to learn acoustic models, a lexicon, and substantive elements of language models and world models from raw sensory inputs. Building on previous work, we use deep neural network (DNN) and Bayesian architectures to model the emergence of linguistic representations without supervision. Our focus is to establish how the labels used in supervised settings can be replaced by weaker signals coming either from multi-modal input or from hierarchically organised linguistic levels.
At the level of phonetic representations, we study how cross-modal information (lips and self-feedback from articulation) can supplement top-down lexical information in a weakly supervised setting. We use siamese architectures or deep CCA algorithms to combine the different views. We study how an attentional framework and uncertainty estimation can flexibly combine these sources of information in order to adapt to situations where one view is selectively degraded.
At the level of lexical representations, we study how audio/visual parallel information (i.e., spoken descriptions of images or activities) can help in segmenting and clustering word forms and, vice versa, help in deriving useful visual features. To achieve this, we will use architectures deployed in image captioning or sequence-to-sequence translation.
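As an illustration of the kind of weak audio-visual pairing signal we have in mind, the sketch below (PyTorch; the encoders, dimensions and in-batch matching loss are illustrative assumptions, not a specific published model) embeds a spoken description and an image into a shared space and trains them to match, which provides a supervisory signal both for clustering word forms and for shaping visual features.

```python
# Illustrative audio-visual grounding objective (dimensions, encoders and
# loss are assumptions, not a specific published model): a spoken description
# and the image it describes are embedded in a shared space and trained to
# match, using the other items of the batch as negatives.
import torch
import torch.nn as nn

audio_encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
image_encoder = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 64))

def matching_loss(audio_feats, image_feats):
    """Cross-entropy over the audio-image similarity matrix: each spoken
    description should be most similar to its own image (in-batch negatives)."""
    a = audio_encoder(audio_feats)   # (batch, 64) embeddings of spoken captions
    v = image_encoder(image_feats)   # (batch, 64) embeddings of images
    sim = a @ v.t()                  # pairwise similarity scores
    targets = torch.arange(len(a))   # the i-th caption matches the i-th image
    return nn.functional.cross_entropy(sim, targets)

# toy batch: mean-pooled audio features and precomputed image features
loss = matching_loss(torch.randn(16, 40), torch.randn(16, 2048))
loss.backward()
```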
At the level of semantic and conceptual representations, we study how it is possible to learn elements of the laws of physics through the observation of videos (object permanence, solidity, spatio-temporal continuity, inertia, etc.), and how objects and relations between objects are mapped onto language.
Increasingly complex machine learning systems are being incorporated into real-life applications (e.g., self-driving cars, personal assistants), even though they cannot be formally verified, statistically guaranteed, or even explained. In these cases, a well-defined empirical approach to evaluation can offer interesting insights into the functioning of these algorithms and some control over them.
Several approaches exist to evaluate the 'cognitive' abilities of machines, from the subjective comparison of human and machine performance to application-specific metrics (e.g., in speech, word error rate). A recent idea consists in evaluating an AI system in terms of its abilities , i.e., functional components within a more global cognitive architecture . Psychophysical testing can offer batteries of tests using simple tasks that are easy to understand by humans or animals (e.g., judging whether two stimuli are the same or different, or whether one stimulus is ‘typical’), and which can be made selective to a specific component and to rare but difficult or adversarial cases. Evaluations of learning rate, domain adaptation and transfer learning are simple applications of these measures. Psychophysically inspired tests have been proposed for unsupervised speech and language learning , .
Infants learn their first language in a spontaneous fashion, across large variations in the amount of speech they hear and in the nature of the infant/adult interaction. In some linguistic communities, adults barely address infants until they can themselves speak. Despite these large variations in quantity and content, language learning proceeds at a similar pace. Documenting such resilience is an essential step in understanding the nature of the learning algorithms used by human infants. Hence, we propose to collect and/or analyse large datasets of inputs to infants and correlate them with outcome measures (phonetic learning, vocabulary growth, syntactic learning, etc.).
We plan to apply our algorithms for the unsupervised discovery of speech units to problems relevant to language documentation and to the construction of speech processing pipelines for under-resourced languages.
Daylong recordings of speech in the wild give rise to a number of specific analysis difficulties. We plan to use our expertise in speech processing to develop tools for performing signal processing and helping with the annotation of such resources, for the purpose of phonetic or linguistic analysis.
Keywords: Speech recognition - Speech-text alignment
Functional Description: The Abkhazia software makes it easy to obtain simple baselines for supervised ASR (using Kaldi) and ABX tasks (using ABXpy) on the large corpora of speech recordings typically used in speech engineering, linguistics or cognitive science research.
Contact: Emmanuel Dupoux
Term Discovery Evaluation
Keywords: NLP - Speech recognition - Speech
Scientific Description: This toolbox allows the user to judge the quality of a word discovery algorithm. It evaluates algorithms on the following criteria:
- Boundary: how well the algorithm finds the actual word boundaries
- Group: how well the algorithm groups similar word forms
- Token/Type: how well the algorithm finds all the words of the corpus (types) and all their occurrences (tokens)
- NED: mean normalized edit distance across all the word pairs found by the algorithm (a minimal illustration is sketched after this entry)
- Coverage: the proportion of discoverable phones in the corpus found by the algorithm
Functional Description: Toolbox to evaluate algorithms that segment speech into words. It allows the user to measure how well an algorithm segments speech into words and clusters similar word forms.
Contact: Emmanuel Dupoux
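To make the NED criterion concrete, here is a minimal sketch (not the toolbox's actual API) computing the mean normalized edit distance over discovered pairs, each pair being represented by the phone transcriptions of its two fragments.

```python
# Minimal sketch of the NED criterion (not the toolbox's actual API): mean
# normalized Levenshtein distance over all pairs of fragments discovered by
# a term discovery system, each fragment given as a phone sequence.

def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (x != y)))     # substitution
        prev = cur
    return prev[-1]

def ned(discovered_pairs):
    """Mean normalized edit distance over (phones_a, phones_b) pairs."""
    dists = [edit_distance(a, b) / max(len(a), len(b), 1)
             for a, b in discovered_pairs]
    return sum(dists) / len(dists)

# two hypothetical pairs discovered by a term discovery system
pairs = [(["b", "a", "t"], ["b", "a", "d"]),
         (["k", "a", "t"], ["k", "a", "t"])]
print(ned(pairs))  # 0.1667: the first pair differs in one phone out of three
```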
Keywords: Evaluation - Speech recognition - Machine learning
Functional Description: The ABX package gives a performance score to speech recognition systems by measuring their capacity to discriminate linguistic contrasts (accents, phonemes, speakers, etc.); a toy illustration of the ABX task is given after this entry.
Contact: Emmanuel Dupoux
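The following toy example illustrates the logic of the ABX task (it is not the ABXpy API): for every triplet (A, B, X) where A and X belong to the same category and B to another, an error is counted whenever X is acoustically closer to B than to A.

```python
# Toy illustration of the ABX discrimination task (not the ABXpy API):
# given two categories, count the triplets (A, B, X) for which X, drawn
# from the same category as A, is nevertheless closer to B.
import numpy as np
from itertools import product

def abx_error_rate(cat_a, cat_b, distance):
    errors, total = 0, 0
    for a, x in product(cat_a, cat_a):
        if a is x:                    # A and X must be distinct tokens
            continue
        for b in cat_b:
            errors += distance(b, x) < distance(a, x)
            total += 1
    return errors / total

# mean-pooled embeddings of two (hypothetical) phone categories
phone_i = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
phone_e = [np.array([0.2, 1.0]), np.array([0.1, 0.8])]
euclidean = lambda u, v: float(np.linalg.norm(u - v))
print(abx_error_rate(phone_i, phone_e, euclidean))  # 0.0: well separated
```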
Keyword: File format
Functional Description: The h5features python package provides easy-to-use and efficient storage of large feature datasets in the HDF5 binary format; an illustration of the underlying layout is given after this entry.
Contact: Emmanuel Dupoux
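The sketch below uses h5py directly to illustrate the kind of per-utterance features/times layout the package handles; it is not the h5features API itself.

```python
# Illustration of storing per-utterance feature matrices and frame times in
# a single HDF5 file (written with h5py for clarity; not the h5features API).
import h5py
import numpy as np

features = {"utt_001": np.random.randn(200, 13),   # 200 frames of 13-d MFCCs
            "utt_002": np.random.randn(150, 13)}

with h5py.File("features.h5", "w") as f:
    for utt, feats in features.items():
        grp = f.create_group(utt)
        grp.create_dataset("features", data=feats, compression="gzip")
        # frame center times, assuming a 10 ms hop (an illustrative choice)
        grp.create_dataset("times", data=np.arange(len(feats)) * 0.01)

with h5py.File("features.h5", "r") as f:
    mfcc = f["utt_001/features"][...]
    print(mfcc.shape)  # (200, 13)
```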
Speech and language processing in human infants and adults is particularly efficient. We use these abilities as sources of inspiration for developing novel machine learning and speech technology algorithms. In this area, our results are as follows:
Recent work has explored deep architectures for learning multimodal speech representations (e.g., audio and images, articulation and audio) in a supervised way. In , we investigate the role of combining different speech modalities, i.e., audio and visual information representing the lips' movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. The mono-task learning consists in applying a Siamese network to the concatenation of the two modalities, while the multi-task learning receives several different combinations of modalities at train time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonetic features which are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.
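The mono-task setting can be sketched as follows (PyTorch; the layer sizes and contrastive loss are illustrative assumptions, not the exact configuration used in the paper): the audio and visual views of each token are concatenated, embedded by a shared network, and trained with same/different side information.

```python
# Illustrative mono-task siamese setup (dimensions and loss are assumptions,
# not the paper's exact configuration): concatenated audio+visual inputs are
# embedded by a shared encoder trained with same/different side information.
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=20, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))

    def forward(self, audio, visual):
        # mono-task: the two modalities are simply concatenated
        return self.net(torch.cat([audio, visual], dim=-1))

def contrastive_loss(e1, e2, same, margin=1.0):
    """Pull same-word pairs together, push different-word pairs apart."""
    dist = torch.norm(e1 - e2, dim=-1)
    return (same * dist.pow(2)
            + (1 - same) * torch.clamp(margin - dist, min=0).pow(2)).mean()

encoder = SiameseEncoder()
audio1, visual1 = torch.randn(8, 40), torch.randn(8, 20)
audio2, visual2 = torch.randn(8, 40), torch.randn(8, 20)
same = torch.randint(0, 2, (8,)).float()   # 1 = same word, 0 = different word
loss = contrastive_loss(encoder(audio1, visual1), encoder(audio2, visual2), same)
loss.backward()
```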
In , we explore the role of speech register and prosody for the task of word segmentation. Since these two factors are thought to play an important role in early language acquisition, we aim to quantify their contribution for this task. We study a Japanese corpus containing both infant- and adult-directed speech and we apply four different word segmentation models, with and without knowledge of prosodic boundaries. The results showed that the difference between registers is smaller than previously reported and that prosodic boundary information helps more adult- than infant-directed speech.
Phonemic segmentation of speech is a critical step of speech recognition systems. In , we propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists in analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
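The core idea can be sketched in a few lines (illustrative; the trivial predictor below stands in for the trained Markov chain or RNN of the paper): compute a frame-by-frame prediction error over MFCCs and hypothesize a boundary at each local maximum.

```python
# Sketch of boundary detection from prediction error (illustrative; a trained
# Markov chain or RNN would replace the trivial 'copy the previous frame'
# predictor used here for self-containedness).
import numpy as np

def prediction_errors(mfcc):
    """Frame-by-frame prediction error of a 'previous frame' predictor."""
    return np.linalg.norm(mfcc[1:] - mfcc[:-1], axis=1)

def hypothesized_boundaries(errors):
    """Frame indices where the prediction error is a local maximum."""
    return [t for t in range(1, len(errors) - 1)
            if errors[t] > errors[t - 1] and errors[t] > errors[t + 1]]

mfcc = np.random.randn(100, 13)              # stand-in for real MFCC frames
print(hypothesized_boundaries(prediction_errors(mfcc))[:10])
```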
In , we describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.
Machine learning algorithms are typically evaluated in terms of end-to-end tasks, but it is very often difficult to get a grasp of how they achieve these tasks, what their breaking points might be, and, more generally, how they would compare to the algorithms used by humans to do the same tasks. This is especially true of deep learning systems, which are particularly opaque. The team develops evaluation methods based on psycholinguistic/linguistic criteria and deploys them for the systematic comparison of systems.
What is the information captured by neural network models of language? In , we address this question in the case of character-level recurrent neural language models. These models do not have explicit word representations; do they acquire implicit ones? We assess the lexical capacity of a network using the lexical decision task common in psycholinguistics: the system is required to decide whether or not a string of characters forms a word. We explore how accuracy on this task is affected by the architecture of the network, focusing on cell type (LSTM vs. SRN), depth and width. We also compare these architectural properties to a simple count of the parameters of the network. The overall number of parameters in the network turns out to be the most important predictor of accuracy; in particular, there is little evidence that deeper networks are beneficial for this task.
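The probe itself is easy to state; the sketch below uses a character bigram model as a stand-in for the recurrent character-level LMs studied in the paper: a string is accepted as a word when its per-character log-probability exceeds some threshold.

```python
# Illustrative lexical decision probe (a character bigram model stands in for
# the character-level recurrent LMs studied in the paper): score strings by
# per-character log-probability and accept those above a threshold as words.
import math
from collections import Counter

def train_bigram(words):
    counts, context = Counter(), Counter()
    for w in words:
        chars = ["<s>"] + list(w) + ["</s>"]
        for a, b in zip(chars, chars[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return counts, context

def logprob_per_char(word, counts, context, alpha=1.0, vocab=30):
    """Add-alpha smoothed bigram log-probability, averaged per character."""
    chars = ["<s>"] + list(word) + ["</s>"]
    lp = sum(math.log((counts[(a, b)] + alpha) / (context[a] + alpha * vocab))
             for a, b in zip(chars, chars[1:]))
    return lp / (len(chars) - 1)

counts, context = train_bigram(["cat", "cats", "dog", "dogs", "catalog"])
for s in ["cat", "ctq"]:                     # a word vs. a non-word
    print(s, round(logprob_per_char(s, counts, context), 2))
```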
Infants acquire their language based on whatever linguistic input is available around them. The extent of variation that can be found across languages, cultures and socio-economic backgrounds provides strong constraints (lower bounds on the amount of data, upper bounds on noise, variation and ambiguity) for language learning algorithms.
In , we provide an estimation of how frequently, and from whom, children aged 0-11 years (Ns between 9 and 24) receive one-on-one verbal input among Tsimane forager-horticulturalists of lowland Bolivia. Analyses of systematic daytime behavioral observations reveal
In , we provide a new measure of how the acoustic realizations of a given phonetic segment are affected by coarticulation with the preceding and following phonetic context. While coarticulation has been extensively studied using descriptive phonetic measurements, little is known about the functional impact of coarticulation for speech processing, and in particular, learnability. Here, we use DTW-based similarity defined on raw acoustic features and ABX scores to derive a measure of the effect of coarticulation on phonetic discriminability. This measure does not rely on defining segment-specific phonetic cues (formants, duration, etc.) and can be applied systematically and automatically to any segment in large scale corpora. We illustrate our method using stimuli in English and Japanese. We replicate some well-known results, i.e., stronger anticipatory than perseveratory coarticulation and stronger coarticulation for lax/short vowels than for tense/long vowels. We then quantify for the first time the impact of coarticulation across different segment types (like vowels and consonants).
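The DTW-based similarity underlying this measure can be sketched in plain NumPy (an illustration, not the actual evaluation pipeline): the distance between two realizations of a segment is the accumulated cost of the best alignment of their MFCC frames.

```python
# Sketch of the DTW-based acoustic distance used to compare realizations of
# the same segment in different contexts (an illustration, not the actual
# evaluation pipeline).
import numpy as np

def dtw_distance(x, y):
    """DTW cost between two (frames x dims) feature matrices, with Euclidean
    frame distances, normalized by the summed lengths."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

# stand-ins for MFCC matrices of one phone extracted from two contexts
a, b = np.random.randn(40, 13), np.random.randn(35, 13)
print(dtw_distance(a, b))
```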
In this section, we focus on the use of machine learning algorithms for speech and language processing to derive testable quantitative predictions about humans (adults or infants).
In , we aim to quantify the relative contributions of phonetic categories and acoustic detail to phonotactically induced perceptual vowel epenthesis in Japanese listeners. A vowel identification task tested whether a vowel was perceived within illegal consonant clusters and, if so, which vowel was heard. Cross-spliced stimuli were used in which the vowel coarticulation present in the cluster did not match the quality of the flanking vowel. Two clusters were used, /hp/ and /kp/, the former containing larger amounts of resonances of the preceding vowel. While both the flanking vowel and coarticulation influenced vowel quality, the influence of coarticulation was larger, especially for /hp/.
In , we explore the well-documented example of vowel epenthesis, a phenomenon in which non-existent vowels are hallucinated by listeners for stimuli containing illegal consonantal sequences. As reported in previous work, this occurs in Japanese (JP) and Brazilian Portuguese (BP), languages for which the 'default' epenthetic vowels are /u/ and /i/, respectively. In a perceptual experiment, we corroborate the finding that the quality of this illusory vowel is language-dependent, but also that this default choice can be overridden by coarticulatory information present on the consonant cluster. In a second step, we analyse recordings of JP and BP speakers producing 'epenthesized' versions of the stimuli from the perceptual task. Results reveal that the default vowel corresponds to the vowel with the most reduced acoustic characteristics, which is also the one whose formants are acoustically closest to the formant transitions present in consonantal clusters. Lastly, we model behavioural responses from the perceptual experiment with an exemplar model using dynamic time warping (DTW)-based similarity measures on MFCCs.
A range of computational approaches have been used to model the discovery of word forms from continuous speech by infants. Typically, these algorithms are evaluated with respect to the ideal 'gold standard' word segmentation and lexicon. These metrics assess how well an algorithm matches the adult state, but may not reflect the intermediate states of the child's lexical development. In , we set up a new evaluation method based on the correlation between word frequency counts derived from the application of an algorithm onto a corpus of child-directed speech, and the proportion of infants knowing the words according to parental reports. We evaluate a representative set of 4 algorithms, applied to transcriptions of the Brent corpus, which have been phonologized using either phonemes or syllables as basic units. Results show remarkable variation in the extent to which these 8 algorithm-unit combinations predicted infant vocabulary, with some of these predictions surpassing those derived from the adult gold standard segmentation. We argue that infant vocabulary prediction provides a useful complement to traditional evaluation; for example, the best predictor model was also one of the worst in terms of segmentation score, and there was no clear relationship between token or boundary F-score and vocabulary prediction.
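In outline, the evaluation reduces to correlating the per-word counts produced by a segmentation algorithm with the proportion of infants reported to know each word, as in the following sketch (the word lists and numbers are made up for illustration).

```python
# Outline of the infant vocabulary prediction evaluation (all numbers are
# made up for illustration): correlate the word counts produced by a
# segmentation algorithm with parental-report knowledge proportions.
from scipy.stats import spearmanr

# hypothetical output of a segmentation algorithm: word -> discovered count
segmented_counts = {"baby": 120, "dog": 80, "bottle": 60, "tomorrow": 5}
# hypothetical parental reports: word -> proportion of infants knowing it
known_by_infants = {"baby": 0.90, "dog": 0.75, "bottle": 0.70, "tomorrow": 0.05}

words = sorted(set(segmented_counts) & set(known_by_infants))
rho, pval = spearmanr([segmented_counts[w] for w in words],
                      [known_by_infants[w] for w in words])
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```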
A central assumption of most computational models of language acquisition is the reliance on statistical processes. This predicts that the frequency of particular sounds or contrasts in a given language should have a massive effect on perception. Surprisingly, this has so far not been put to empirical test. In , we looked for indicators of frequency-dependent perceptual attunement in the brains of 5–8-month-old Dutch infants. We tested the discrimination of tokens containing a highly frequent [haet-he:t] and a highly infrequent [hYt-hø:t] native vowel contrast, as well as a non-native [ht̂-hæt] vowel contrast, in a behavioral visual habituation paradigm (Experiment 1). Infants discriminated both native contrasts similarly well, but did not discriminate the non-native contrast. We sought further evidence for subtle differences in the processing of the two native contrasts using near-infrared spectroscopy and a within-participant design (Experiment 2). The neuroimaging data did not provide additional evidence that responses to native contrasts are modulated by frequency of exposure. These results suggest that even large differences in exposure to a native contrast may not directly translate into behavioral and neural indicators of perceptual attunement, raising the possibility that frequency of exposure does not influence improvements in discriminating native contrasts.
Grant from MSR (Zero Resources Challenge, 2017) - 5K€
AWS Grant (Zero Resources Challenge, 2017) - 20K€
Collaboration with the Willow Team:
co-advising with J. Sivic and I. Laptev of a PhD student: Ronan Riochet.
construction of a naive physics benchmark
Transatlantic Platform "Digging into Data". Title: "Analysis of Children’s Language Experiences Around the World (ACLEW)"; coordinating PI: M. Soderstrom; leader of tools development and co-PI: E. Dupoux; 2017–2020; 5 countries; total budget: 1.4M€.
ERC Advanced Grant (BOOTPHON, PI: E. Dupoux, Budget 2.4M€).
Johns Hopkins University, Baltimore, USA: S. Khudanpur, H. Hermansky
RIKEN Institute, Tokyo, Japan: R. Mazuka
Valentina Gliozzi (Professor, Univ. di Torino, Visiting Professor Spring 2017)
Zero Resource Challenge 2017, held as a special session of IEEE ASRU 2017, Okinawa.
Executive committee of SIGMORPHON (Association for Computational Linguistics Special Interest Group, http://
Invited editor for international conferences: Interspeech, NIPS, ACL, etc. (around 5-10 papers per conference, 2 conferences per year)
Member of the editorial board of: Mathématiques et Sciences Humaines, L'Année Psychologique, Frontiers in Psychology.
Invited Reviewer for Frontiers in Psychology, Cognitive Science, Cognition, IEEE/ACM Transactions on Audio, Speech and Language Processing, Speech Communication, etc. (around 4 papers per year)
Learning in Machines and Brains (CIFAR) invited talk, 2017, Paris.
CBMM (Center for Brains, Minds and Machines) Workshop on Speech Representation, Perception and Recognition. Invited talk, Feb 02-03, 2017, MIT.
E. Dupoux is invited expert for ERC, ANR, and other granting agencies (around 2 per year).
Executive committee of the Foundation Cognition, the research programme IRIS-PSL "Sciences des Données et Données des Sciences", the industrial chair Almerys (2016-) and the collective organization DARCLE (www.
Licence : E. Dupoux, "Introduction to the cognitive science of language", 8h, L2, PSL, France
Master : E. Dupoux, "Theoretical Cognitive Science: Connections and symbols", 8h, M1/M2, PSL and Paris 5, Paris, France
Master : E. Dupoux, "Cognitive Engineering", 80h, M2, ITI-PSL, Paris, France
Doctorat : E. Dupoux, "Computational models of cognitive development", 32h, Séminaire EHESS, Paris, France
Six PhD theses are currently being conducted in the team. Two are scheduled to be defended in September 2018.
E. Dupoux participated in the PhD jury of Martin Felipe Perez-Guevara on November 29, 2017 at UPMC (supervisor: C. Pallier).
E. Dupoux spoke at two general public conferences on speech recognition, one organised by France is AI (Oct 17, 2017) and the other by Paris Sciences & Data (Dec 7, 2017); the audience was composed of entrepreneurs in machine learning. He also gave a training course in speech and language technology organised by the Institut de l'Ecole Normale (June 21) aimed at information technology professionals. He gave three interviews on the limits of deep learning in general public outlets (Le Monde, La Recherche).
N. Zeghidour gave a high-level presentation on AI to 30 students and 8 teachers from Sciences Po and Ecole 42 during the Policy Innovation Lab.