DOI: 10.1145/3508546.3508641

Lightweight domain adaptation: A filtering pipeline to improve accuracy of an Automatic Speech Recognition (ASR) engine

Published: 25 February 2022

    Abstract

    Transformer models have accelerated the field of speech recognition; a low word error rate (WER) is demonstrably achievable under varying conditions. However, most ASR engines are trained on acoustic and language models constructed from corpora that include news feeds, books, and blogs in order to demonstrate generalization, leading to errors when the model is applied to a specific domain. While the increase in WER is acute for very specific domains (health and medicine), our work shows that it is sizable even when the domain is general (hospitality). For such domains, a lightweight adaptation approach can help: lightweight because the adaptation does not require extensive post-hoc training of additional domain-specific acoustic or language models that act as adjuncts to the base ASR engine. We present such a lightweight filtering pipeline, which seamlessly integrates lightweight models (n-gram, decision trees) with powerful, pre-trained, bi-directional transformer models, all working in conjunction to select the 1-best hypothesis word. Our pipeline reduces the WER by 1.6% to 2.5% absolute while treating the ASR engine as a black box, and without requiring additional complex discriminative training.
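    The abstract describes combining lightweight n-gram statistics with a contextual (BERT-style) scorer to pick a 1-best word from competing hypotheses. The paper's actual models, features, and weighting are not given on this page, so the following is only a toy sketch under assumed details: a tiny add-one-smoothed bigram model trained on a hypothetical hospitality corpus, a stub dictionary standing in for a masked-LM confidence score, and an assumed linear interpolation weight `alpha`.

```python
from collections import Counter

def train_bigram(corpus):
    """Build a tiny add-one-smoothed bigram LM from in-domain sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.lower().split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def prob(prev, word):
        # Laplace (add-one) smoothing so unseen bigrams get nonzero mass.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return prob

def select_best(prev_word, candidates, ngram_prob, contextual_score, alpha=0.5):
    """Linearly interpolate n-gram and contextual scores; return the 1-best word."""
    def score(w):
        return alpha * ngram_prob(prev_word, w) + (1 - alpha) * contextual_score(w)
    return max(candidates, key=score)

# Hypothetical in-domain (hospitality) corpus.
corpus = [
    "book a suite for two nights",
    "the suite has a king bed",
    "late checkout for the suite",
]
prob = train_bigram(corpus)

# Stand-in for a masked-LM confidence; a real pipeline would mask the
# uncertain slot and read the model's probability for each candidate.
contextual = {"suite": 0.7, "sweet": 0.2}
best = select_best("a", ["suite", "sweet"], prob,
                   lambda w: contextual.get(w, 0.0))
print(best)  # → suite
```

    In this toy setup the in-domain bigram and the contextual score agree, so the acoustically confusable "sweet" is filtered out without retraining the ASR engine itself.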


    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. ASR
    2. BERT
    3. decision trees
    4. domain adaptation
    5. filtering pipeline
    6. transformer architecture

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%

