Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

Yang, Zhanheng; Sun, Sining; Wang, Xiong; Zhang, Yike; Ma, Long; Xie, Lei

arXiv:2301.06735 (cs)

[Submitted on 17 Jan 2023 (v1), last revised 8 Jun 2023 (this version, v3)]

Abstract: It is difficult for an E2E ASR system to recognize words, such as entities, that appear infrequently in the training data. A widely used method to mitigate this issue is feeding contextual information into the acoustic model. Previous works have shown that a compact and accurate contextual list can boost performance significantly. In this paper, we propose an efficient approach to obtain a high-quality contextual list for a unified streaming/non-streaming E2E model. Specifically, we use the phone-level streaming output to first filter the predefined contextual word list, and then fuse the filtered list into the non-causal encoder and decoder to generate the final recognition results. Our approach improves the accuracy of the contextual ASR system and speeds up the inference process. Experiments on two datasets demonstrate over 20% character error rate (CER) reduction compared to the baseline system. Meanwhile, the real-time factor (RTF) of our system can be stabilized within 0.15 when the size of the contextual word list grows over 6,000.
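
The first-stage filtering idea can be illustrated with a minimal sketch. The abstract does not specify the actual filtering algorithm, so the code below swaps in a generic fuzzy phone match (sliding-window Levenshtein distance) purely as a stand-in. The function names, the toy lexicon, the phone symbols, and the `max_error_rate` threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_context_words(
    streamed_phones: Sequence[str],
    lexicon: Dict[str, List[str]],
    max_error_rate: float = 0.25,
) -> List[str]:
    """Keep words whose pronunciation roughly occurs in the phone output.

    Assumption: a word is kept when its best-matching window of the
    streamed phone sequence differs in at most `max_error_rate` of the
    word's phones. This is an illustrative rule, not the paper's.
    """
    kept = []
    for word, phones in lexicon.items():
        n = len(phones)
        if n == 0 or n > len(streamed_phones):
            continue
        best = min(
            edit_distance(phones, streamed_phones[s:s + n])
            for s in range(len(streamed_phones) - n + 1)
        )
        if best / n <= max_error_rate:
            kept.append(word)
    return kept


# Hypothetical phone-level streaming output and contextual word list.
streamed = ["k", "ao", "l", "j", "ae", "k", "s", "ah", "n"]
lexicon = {
    "jackson": ["jh", "ae", "k", "s", "ah", "n"],
    "boston": ["b", "aa", "s", "t", "ah", "n"],
    "call": ["k", "ao", "l"],
}
print(filter_context_words(streamed, lexicon))  # ['jackson', 'call']
```

In a setup like this, only the surviving words would be passed on to the second (non-streaming) pass, which is one plausible reason a compact list keeps inference cost from growing with the size of the full list.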

Comments:	Accepted by Interspeech 2023
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2301.06735 [cs.SD]
	(or arXiv:2301.06735v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2301.06735

Submission history

From: Zhanheng Yang
[v1] Tue, 17 Jan 2023 07:29:26 UTC (123 KB)
[v2] Sun, 21 May 2023 07:11:38 UTC (123 KB)
[v3] Thu, 8 Jun 2023 13:29:38 UTC (122 KB)
