L2RS: A Learning-to-Rescore Mechanism for Hybrid Speech Recognition

Authors: Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, Qiang Yang

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 1157 - 1166

https://doi.org/10.1145/3474085.3481542

Published: 17 October 2021

Abstract

This paper aims to advance the performance of industrial ASR systems by exploring a more effective method for N-best rescoring, a critical step that greatly affects the final recognition accuracy. Existing rescoring approaches suffer from the following issues: (i) their performance is limited because they optimize an unnecessarily harder problem, namely predicting accurate grammatical legitimacy scores for the N-best hypotheses rather than directly predicting their partial order with respect to a specific acoustic input; (ii) they find it hard to incorporate the varied information produced by advanced natural language processing (NLP) models such as BERT to achieve a comprehensive evaluation of each N-best candidate. To relieve these drawbacks, we propose a simple yet effective mechanism, Learning-to-Rescore (L2RS), to empower ASR systems with state-of-the-art information retrieval (IR) techniques. Specifically, L2RS utilizes a wide range of textual information from state-of-the-art NLP models and automatically decides their weights to directly learn the ranking order of the N-best hypotheses with respect to a specific acoustic input. We incorporate various features, including BERT sentence embeddings, topic vectors, and perplexity scores produced by an n-gram language model (LM), a topic modeling LM, BERT, and an RNNLM, to train the rescoring model. Experimental results on a public dataset show that L2RS outperforms not only traditional rescoring methods but also its deep neural network counterparts by a substantial margin of 20.85% in terms of NDCG@10. The L2RS toolkit has been successfully deployed for many online commercial services at WeBank Co., Ltd, China's leading digital bank. The efficacy and applicability of L2RS are validated on real-life online customer datasets.
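To make the ranking formulation concrete, the following is a minimal sketch, not the authors' implementation, of how an N-best list might be reordered by a weighted combination of per-hypothesis features and then evaluated with NDCG@k. The function names, the two-feature layout (negated perplexities from two language models), the hand-set weights, and the graded relevance labels are all assumptions made for illustration. L2RS learns its weights from data, and the abstract does not specify how relevance labels are derived. The sketch uses the common exponential-gain form of DCG, which may differ from the variant used in the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (exponential gain form)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def rescore(features, weights):
    """Return hypothesis indices sorted by a weighted feature sum, best first.

    `features` holds one feature vector per hypothesis. The weights are fixed
    by hand here (an assumption); a learning-to-rank model would fit them.
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-best list: each row is [-perplexity_lm_a, -perplexity_lm_b].
features = [[-120.0, -95.0], [-80.0, -70.0], [-100.0, -110.0]]
weights = [0.5, 0.5]

# Hypothetical graded relevance per hypothesis (higher means closer to the reference).
relevance = [0, 2, 1]

order = rescore(features, weights)
ranked_relevance = [relevance[i] for i in order]
print(order, ndcg_at_k(ranked_relevance, 10))
```

In this toy case the weighted scores happen to order the hypotheses by relevance, so the NDCG is at its maximum; a ranking that placed the least relevant hypothesis first would score lower.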
