Showing 1–20 of 20 results for author: Dunbar, E

Searching in archive cs.

  1. arXiv:2312.01515  [pdf, other]

    cs.CL cs.SD eess.AS

    Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

    Authors: Sean Robertson, Ewan Dunbar

    Abstract: It has been generally assumed in the automatic speech recognition (ASR) literature that it is better for models to have access to wider context windows. Yet, many of the potential reasons this might be true in the supervised setting do not necessarily transfer over to the case of unsupervised learning. We investigate how much context is necessary to achieve high-quality pre-trained acoustic models…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Repository at https://github.com/sdrobert/scpc. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    ACM Class: I.2.7

  2. arXiv:2310.03018  [pdf, other]

    eess.AS cs.CL cs.SD

    Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

    Authors: Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

    Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech enco…

    Submitted 18 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024 (v2)

  3. arXiv:2210.15775  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating context-invariance in unsupervised speech representations

    Authors: Mark Hallap, Emmanuel Dupoux, Ewan Dunbar

    Abstract: Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling. Inspiration comes from the promise of "discovering the phonemes" of a language or a similar low-bitrate encoding. However, one of the critical properties of phoneme transcriptions is cont…

    Submitted 30 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  4. arXiv:2210.15759  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

    Authors: Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux

    Abstract: Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defi…

    Submitted 27 October, 2022; originally announced October 2022.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, Volume 16, Issue 6, pages 1211-1226, October 2022. Print ISSN: 1932-4553; Online ISSN: 1941-0484; Digital Object Identifier: 10.1109/JSTSP.2022.3206084

  5. arXiv:2210.02956  [pdf, other]

    cs.CL

    Are word boundaries useful for unsupervised language learning?

    Authors: Tu Anh Nguyen, Maureen de Seyssel, Robin Algayres, Patricia Roze, Ewan Dunbar, Emmanuel Dupoux

    Abstract: Word or word-fragment based Language Models (LM) are typically preferred over character-based ones in many downstream applications. This may not be surprising as words seem more linguistically relevant units than characters. Words provide at least two kinds of relevant information: boundary information and meaningful units. However, word boundary information may be absent or unreliable in the case…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: This is an archived version from September 2020

  6. arXiv:2206.01685  [pdf, other]

    q-bio.NC cs.AI cs.CL

    Toward a realistic model of speech processing in the brain with self-supervised learning

    Authors: Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

    Abstract: Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts of data, (2) unobtainable supervised labels, (3) textual rather than raw sensory input, and / or (4) implausibly large memory (e.g. thousands of contextual wor…

    Submitted 20 March, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

    Journal ref: Neural Information Processing Systems (NeurIPS), 2022

  7. arXiv:2205.15823  [pdf, other]

    cs.CL cs.SD eess.AS

    Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

    Authors: Juliette Millet, Ioana Chitoran, Ewan Dunbar

    Abstract: Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2021. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 661-673, Online. Association for Computational Linguistics

  8. arXiv:2205.15819  [pdf, other]

    cs.CL cs.SD eess.AS

    Do self-supervised speech models develop human-like perception biases?

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2022. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7591-7605, Dublin, Ireland. Association for Computational Linguistics

  9. arXiv:2104.14700  [pdf, ps, other]

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2021: Spoken language modelling

    Authors: Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (C…

    Submitted 9 August, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588

  10. arXiv:2102.11749  [pdf, other]

    cs.CL cs.AI

    Paraphrases do not explain word analogies

    Authors: Louis Fournier, Ewan Dunbar

    Abstract: Many types of distributional word embeddings (weakly) encode linguistic regularities as directions (the difference between "jump" and "jumped" will be in a similar direction to that of "walk" and "walked," and so on). Several attempts have been made to explain this fact. We respond to Allen and Hospedales' recent (ICML, 2019) theoretical explanation, which claims that word2vec and GloVe will encod…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

  11. arXiv:2011.11588  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

    Authors: Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

    Abstract: We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a com…

    Submitted 1 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 14 pages, including references and supplementary material

  12. arXiv:2010.05967  [pdf, other]

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

    Authors: Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speec…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  13. arXiv:2010.05961  [pdf, other]

    cs.CL cs.AI

    Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  14. arXiv:2010.03446  [pdf, other]

    cs.CL cs.AI

    Analogies minus analogy test: measuring regularities in word embeddings

    Authors: Louis Fournier, Emmanuel Dupoux, Ewan Dunbar

    Abstract: Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar direction…

    Submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of CoNLL 2020

  15. arXiv:2005.03418  [pdf, other]

    cs.CL cs.SD eess.AS

    The Perceptimatic English Benchmark for Speech Perception Models

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted to CogSci Conference 2020

  16. arXiv:1911.06573  [pdf, other]

    eess.AS cs.CL cs.SD

    Independent and automatic evaluation of acoustic-to-articulatory inversion models

    Authors: Maud Parrot, Juliette Millet, Ewan Dunbar

    Abstract: Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. However, to be useful in these settings, articulatory reconstruction must be speaker independent. Furthermore, as most research focuses on single, small datasets with few speakers, robust articulatory reconstruction could profit from combining…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: 5 pages, 1 figure

  17. arXiv:1904.11469  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Challenge 2019: TTS without T

    Authors: Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery datase…

    Submitted 7 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019

  18. arXiv:1812.08718  [pdf, other]

    cs.CL

    RNNs Implicitly Implement Tensor Product Representations

    Authors: R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

    Abstract: Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively c…

    Submitted 5 March, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: Accepted to ICLR 2019

  19. arXiv:1712.04313  [pdf, ps, other]

    cs.CL

    The Zero Resource Speech Challenge 2017

    Authors: Ewan Dunbar, Xuan Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux

    Abstract: We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japan

  20. arXiv:1704.06913  [pdf, other]

    cs.CL cs.LG

    Learning weakly supervised multimodal phoneme embeddings

    Authors: Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

    Abstract: Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In par…

    Submitted 18 October, 2017; v1 submitted 23 April, 2017; originally announced April 2017.