Showing 1–15 of 15 results for author: Zaiem, S

Searching in archive eess.
  1. arXiv:2407.00756  [pdf, other]

    eess.AS cs.SD

    Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation of large encoders, while the latter hurts the robustness acquired during pretraining, especially in low-resource scenarios. This work explores middle-…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 Pages

  2. arXiv:2407.00463  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, et al. (7 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper prese…

    Submitted 18 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  3. arXiv:2406.10735  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

    Authors: Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli

    Abstract: Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. Current audio tokenization methods fall into two categories: Semantic tokens, acquired through quantization of Self-Supervised Learning (SSL) models, and N…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures, 2 tables, Accepted at Interspeech 2024

  4. arXiv:2309.12712  [pdf, other]

    eess.AS cs.LG cs.SD

    Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

    Authors: Hugo Malard, Salah Zaiem, Robin Algayres

    Abstract: Recent progress in Automatic Speech Recognition (ASR) has been coupled with a substantial increase in the model sizes, which may now contain billions of parameters, leading to slow inferences even with adapted hardware. In this context, several ASR models exist in various sizes, with different inference costs leading to different performance levels. Based on the observation that smaller models per…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  5. arXiv:2309.11327  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition

    Authors: Ahmed Amine Ben Abdallah, Ata Kabboudi, Amir Kanoun, Salah Zaiem

    Abstract: Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address the aforementioned ASR challenge, focusing on the Tunisian dialect. First, textual and audio data is collected and in some cases annotated. Second, we explore s…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: 6 pages, submitted to ICASSP 2024

  6. arXiv:2309.09546  [pdf, other]

    eess.AS cs.CL cs.SD

    Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

    Authors: George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

    Abstract: The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech r…

    Submitted 22 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted at the ICASSP Workshop Self-supervision in Audio, Speech and Beyond 2024

  7. arXiv:2308.14456  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has bee…

    Submitted 21 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 18 Pages

  8. arXiv:2306.00481  [pdf, other]

    eess.AS cs.LG

    Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets. Despite this, speech SSL representations may fail while facing an acoustic mismatch between the pretraining and target datasets. To address this issue, we propose a novel supervised domain adaptation method, designe…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages, INTERSPEECH 2023

  9. arXiv:2306.00452  [pdf, ps, other]

    eess.AS cs.LG

    Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. Howe…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages

    Journal ref: INTERSPEECH 2023

  10. arXiv:2303.06740  [pdf, other]

    eess.AS cs.LG

    Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

    Authors: Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance might be sanctioned with longer inferences. This article explores different approaches…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Submitted to ICASSP "Self-supervision in Audio, Speech and Beyond" workshop

  11. arXiv:2204.04170  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to mo…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  12. arXiv:2107.00594  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Pretext Tasks selection for multitask self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba

    Abstract: Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularl…

    Submitted 11 November, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

  13. arXiv:2104.14297  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Speech Recognition from Federated Acoustic Models

    Authors: Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

    Abstract: Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data di…

    Submitted 9 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

  14. arXiv:2104.07388  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Conditional independence for pretext task selection in Self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. A common pretext task consists in pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data where various meaningful signal processing fea…

    Submitted 1 July, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: 5 pages, Accepted for presentation at Interspeech 2021

  15. arXiv:2007.13542  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Evaluating the reliability of acoustic speech embeddings

    Authors: Robin Algayres, Mohamed Salah Zaiem, Benoit Sagot, Emmanuel Dupoux

    Abstract: Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way. Here, we systematically compare two p…

    Submitted 6 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: Conference paper at Interspeech 2020