LSTM-LM with long-term history for first-pass decoding in conversational speech recognition

X Chen, S Parthasarathy, W Gale, S Chang, M Zeng
arXiv preprint arXiv:2010.11349, 2020
LSTM language models (LSTM-LMs) have proven powerful and yield significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Because of their unbounded history states and computational load, most previous studies apply LSTM-LMs in a second pass for rescoring. Recent work shows that it is feasible and computationally affordable to adopt LSTM-LMs in first-pass decoding within a dynamic (or tree-based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on the fly for first-pass decoding. Furthermore, motivated by the long-term history modeling of LSTM-LMs, the use of context beyond the current utterance is explored for first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of the LSTM-LM across utterances and can be used to guide the first-pass search effectively. Experimental results on our internal meeting transcription system show that incorporating this contextual information with LSTM-LMs in first-pass decoding yields significant performance improvements over applying the contextual information in second-pass rescoring.
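The core mechanism described in the abstract is carrying the LSTM-LM's recurrent state across utterance boundaries so that first-pass LM scores are conditioned on earlier utterances in the conversation. The following is a minimal sketch of that idea, not the authors' implementation: the model class, vocabulary size, dimensions, and the session data are illustrative assumptions, and integration with the WFST decoder is only indicated in comments.

```python
# Minimal sketch (assumed, not the paper's code): an LSTM-LM whose hidden state
# is carried across utterances, so scores for the current utterance are
# conditioned on long-term conversational history.
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word ids; state: (h, c) from prior utterances
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.proj(out), state

# Cross-utterance scoring: the (h, c) state returned for one utterance is fed
# back in when scoring the next, so earlier utterances act as context.
lm = LSTMLM()
session = [torch.randint(0, 10000, (1, 12)),   # utterance 1 (token ids, dummy data)
           torch.randint(0, 10000, (1, 8))]    # utterance 2
state = None
for utt in session:
    logits, state = lm(utt, state)
    log_probs = torch.log_softmax(logits, dim=-1)
    # In first-pass decoding, these LM scores would be combined on the fly with
    # the acoustic/WFST scores during the search, rather than used for rescoring.
```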