DOI: 10.1145/3508546.3508641

Lightweight domain adaptation: A filtering pipeline to improve accuracy of an Automatic Speech Recognition (ASR) engine

Published: 25 February 2022

    Abstract

    Transformer models have accelerated the field of speech recognition; a low word error rate (WER) is demonstrably achievable under varying conditions. However, most ASR engines are trained on acoustic and language models constructed from corpora that include news feeds, books, and blogs in order to demonstrate generalization, leading to errors when the model is applied to a specific domain. While the increase in WER is acute for very specific domains (health and medicine), our work shows that it is sizable even when the domain is general (hospitality). For such domains, a lightweight adaptation approach can help: lightweight because the adaptation does not require extensive post-hoc training of additional domain-specific acoustic or language models that act as adjuncts to the base ASR engine. We present such a lightweight filtering pipeline, which seamlessly integrates lightweight models (n-gram, decision trees) with powerful, pre-trained, bi-directional transformer models, all working in conjunction to select the 1-best hypothesis word. Our pipeline reduces the WER by 1.6% to 2.5% absolute while treating the ASR engine as a black box, and without requiring additional complex discriminative training.
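    The abstract describes combining lightweight n-gram statistics with a contextual (BERT-style) scorer to pick a 1-best word from competing hypotheses. The paper's actual models, features, and weighting are not given on this page, so the following is only a toy sketch under assumed details: a tiny add-one-smoothed bigram model trained on a hypothetical hospitality corpus, a stub dictionary standing in for a masked-LM confidence score, and an assumed linear interpolation weight `alpha`.

```python
from collections import Counter

def train_bigram(corpus):
    """Build a tiny add-one-smoothed bigram LM from in-domain sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.lower().split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def prob(prev, word):
        # Laplace (add-one) smoothing so unseen bigrams get nonzero mass.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return prob

def select_best(prev_word, candidates, ngram_prob, contextual_score, alpha=0.5):
    """Linearly interpolate n-gram and contextual scores; return the 1-best word."""
    def score(w):
        return alpha * ngram_prob(prev_word, w) + (1 - alpha) * contextual_score(w)
    return max(candidates, key=score)

# Hypothetical in-domain (hospitality) corpus.
corpus = [
    "book a suite for two nights",
    "the suite has a king bed",
    "late checkout for the suite",
]
prob = train_bigram(corpus)

# Stand-in for a masked-LM confidence; a real pipeline would mask the
# uncertain slot and read the model's probability for each candidate.
contextual = {"suite": 0.7, "sweet": 0.2}
best = select_best("a", ["suite", "sweet"], prob,
                   lambda w: contextual.get(w, 0.0))
print(best)  # → suite
```

    In this toy setup the in-domain bigram and the contextual score agree, so the acoustically confusable "sweet" is filtered out without retraining the ASR engine itself.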


    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. ASR
    2. BERT
    3. decision trees
    4. domain adaptation
    5. filtering pipeline
    6. transformer architecture

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%

