
Showing 1–17 of 17 results for author: Aldarmaki, H

Searching in archive cs.
  1. arXiv:2405.02578  [pdf]

    cs.CL

    Mixat: A Data Set of Bilingual Emirati-English Speech

    Authors: Maryam Al Ali, Hanan Aldarmaki

    Abstract: This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular, to bilingual Emirati speakers who often mix and switch between their local dialect and English. The data set consists of 15 hours of speech derived from two public podcasts featur…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: SIGUL 2024

  2. arXiv:2311.10771  [pdf, other]

    cs.CL cs.SD

    Automatic Restoration of Diacritics for Speech Data Sets

    Authors: Sara Shatnawi, Sawsan Alqahtani, Hanan Aldarmaki

    Abstract: Automatic text-based diacritic restoration models generally have high diacritic error rates when applied to speech transcripts as a result of domain and style shifts in spoken language. In this work, we explore the possibility of improving the performance of automatic diacritic restoration when applied to speech data by utilizing parallel spoken utterances. In particular, we use the pre-trained Wh…

    Submitted 6 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  3. arXiv:2311.09319  [pdf, other]

    cs.CL cs.AI

    Spoken Word2Vec: Learning Skipgram Embeddings from Speech

    Authors: Mohammad Amaan Sayeed, Hanan Aldarmaki

    Abstract: Text word embeddings that encode distributional semantics work by modeling contextual similarities of frequently occurring words. Acoustic word embeddings, on the other hand, typically encode low-level phonetic similarities. Semantic embeddings for spoken words have been previously explored using analogous algorithms to Word2Vec, but the resulting vectors still mainly encoded phonetic rather than…

    Submitted 1 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  4. arXiv:2310.16621  [pdf]

    cs.CL cs.AI cs.SD eess.AS

    ArTST: Arabic Text and Speech Transformer

    Authors: Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Aldarmaki

    Abstract: We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-traine…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 11 pages, 1 figure, SIGARAB ArabicNLP 2023

  5. arXiv:2310.13812  [pdf, other]

    cs.CL cs.SD eess.AS

    Yet Another Model for Arabic Dialect Identification

    Authors: Ajinkya Kulkarni, Hanan Aldarmaki

    Abstract: In this paper, we describe a spoken Arabic dialect identification (ADI) model for Arabic that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as w…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at ArabicNLP 2023

  6. arXiv:2310.07423  [pdf, other]

    cs.CL cs.SD eess.AS

    Adapting the adapters for code-switching in multilingual ASR

    Authors: Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

    Abstract: Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usab…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  7. arXiv:2305.16337  [pdf, other]

    cs.CL cs.AI

    Handling Realistic Label Noise in BERT Text Classification

    Authors: Maha Tufail Agro, Hanan Aldarmaki

    Abstract: Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers. Several methods have been proposed to counteract the effect of random label noise in supervised classification, and some studies have shown that BERT is already robust against high rates of randomly…

    Submitted 20 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  8. arXiv:2303.00069  [pdf, other]

    cs.CL cs.SD eess.AS

    ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

    Authors: Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki

    Abstract: At present, Text-to-speech (TTS) systems that are trained with high-quality transcribed speech data using end-to-end neural models can generate speech that is intelligible, natural, and closely resembles human speech. These models are trained with relatively large single-speaker professionally recorded audio, typically extracted from audiobooks. Meanwhile, due to the scarcity of freely available s…

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: None

    MSC Class: none

  9. Diacritic Recognition Performance in Arabic ASR

    Authors: Hanan Aldarmaki, Ahmad Ghannam

    Abstract: We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition (ASR) systems. As most existing Arabic speech corpora do not contain all diacritical marks, which represent short vowels and other phonetic information in Arabic script, current state-of-the-art ASR models do not produce full diacritization in their output. Automatic text-based diacritization has pre…

    Submitted 27 February, 2023; originally announced February 2023.

    Report number: 2344

    Journal ref: Proceedings of INTERSPEECH 2023

  10. arXiv:2301.01020  [pdf, other]

    cs.CL cs.SD eess.AS

    Supervised Acoustic Embeddings And Their Transferability Across Languages

    Authors: Sreepratha Ram, Hanan Aldarmaki

    Abstract: In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variations and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, including frame-level feature representations and Acoustic Word Embedd…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Presented at ICNLSP 2022

  11. Unsupervised Automatic Speech Recognition: A Review

    Authors: Hanan Aldarmaki, Asad Ullah, Nazar Zaki

    Abstract: Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised segmentation…

    Submitted 20 March, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: 26 pages + 10 pages of references, 3 figures. Speech Communication (2022)

  12. Homograph Disambiguation Through Selective Diacritic Restoration

    Authors: Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab

    Abstract: Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overal…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: accepted in WANLP 2019

  13. arXiv:1909.03104  [pdf, other]

    cs.CL

    Efficient Sentence Embedding using Discrete Cosine Transform

    Authors: Nada Almarwani, Hanan Aldarmaki, Mona Diab

    Abstract: Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks potentially yield superior classification performance, the improvements in classification accuracy are typically mediocre compared to the simple vector averaging. As an efficient alternative, we propose the us…

    Submitted 8 January, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: To appear in EMNLP 2019

    Journal ref: EMNLP 2019

  14. arXiv:1904.05542  [pdf, ps, other]

    cs.CL

    Scalable Cross-Lingual Transfer of Neural Sentence Embeddings

    Authors: Hanan Aldarmaki, Mona Diab

    Abstract: We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models. We evaluate three alignment frameworks applied to these models: joint modeling, representation transfer learning, and sentence mapping, using parallel text to guide the alignment. Our results suppo…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: accepted in *SEM 2019

  15. arXiv:1903.03243  [pdf, ps, other]

    cs.CL

    Context-Aware Cross-Lingual Mapping

    Authors: Hanan Aldarmaki, Mona Diab

    Abstract: Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space. Word vectors, however, are most commonly used for sentence or document-level representations that are calculated as the weighted average of word embeddings. In this paper, we propose an alternative to word-level mapping that bette…

    Submitted 31 March, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: NAACL-HLT 2019 (short paper). 5 pages, 1 figure

  16. arXiv:1806.04713  [pdf, other]

    cs.CL

    Evaluation of Unsupervised Compositional Representations

    Authors: Hanan Aldarmaki, Mona Diab

    Abstract: We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks. Our results confirm that weighted vector averaging can outperform context-sensitive models in most benchmarks, but structural features encoded in RNN models can also be useful in certain classification tasks. We anal…

    Submitted 14 June, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

    Comments: 12 pages, 5 figures. COLING 2018

    Journal ref: Proceedings of the 27th International Conference on Computational Linguistics (2018)

  17. arXiv:1712.06961  [pdf]

    cs.CL

    Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

    Authors: Hanan Aldarmaki, Mahesh Mohan, Mona Diab

    Abstract: Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual…

    Submitted 29 January, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: 10 pages, 8 figures; will appear in Transactions of the Association for Computational Linguistics (TACL)

    Journal ref: TACL 6 (2018) 185-196