Multi-blank Transducers for Speech Recognition

Xu, Hainan; Jia, Fei; Majumdar, Somshubra; Watanabe, Shinji; Ginsburg, Boris

arXiv:2211.03541 (eess)

[Submitted on 4 Nov 2022 (v1), last revised 11 Apr 2024 (this version, v2)]

Abstract: This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR). In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. We refer to the added symbols as big blanks, and the method multi-blank RNN-T. For training multi-blank RNN-Ts, we propose a novel logit under-normalization method in order to prioritize emissions of big blanks. With experiments on multiple languages and datasets, we show that multi-blank RNN-T methods could bring relative speedups of over +90%/+139% to model inference for English Librispeech and German Multilingual Librispeech datasets, respectively. The multi-blank RNN-T method also improves ASR accuracy consistently. We will release our implementation of the method in the NeMo (this https URL) toolkit.
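
The frame-skipping idea in the abstract can be illustrated with a minimal greedy-decoding sketch. This is not the NeMo implementation: the function names, the token ids, the choice of big-blank durations (2 and 4 frames), and the toy "oracle" that stands in for the encoder, predictor, and joint network are all assumptions made for illustration. The only behavior taken from the abstract is that a standard blank advances the frame index by one, while a big blank advances it by two or more.

```python
from typing import Callable, Dict, List, Tuple

NUM_FRAMES = 8
# Hypothetical toy alignment: (frame index, tokens emitted so far) -> token id.
TOKEN_AT = {(0, 0): 1, (5, 1): 2}


def make_toy_step(durations: Dict[int, int]) -> Callable[[int, Tuple[int, ...]], int]:
    """Build a stand-in for the joint network's argmax (an assumption, not a real model).

    It emits the scheduled token when one is due, otherwise the blank symbol
    with the longest duration that does not skip past the next token.
    """
    def step(t: int, hyp: Tuple[int, ...]) -> int:
        key = (t, len(hyp))
        if key in TOKEN_AT:
            return TOKEN_AT[key]
        upcoming = [f for (f, n) in TOKEN_AT if n == len(hyp) and f > t]
        gap = (min(upcoming) if upcoming else NUM_FRAMES) - t
        for symbol, duration in sorted(durations.items(), key=lambda kv: -kv[1]):
            if duration <= gap:
                return symbol
        raise ValueError("durations must include a blank of duration 1")
    return step


def greedy_decode(
    num_frames: int,
    step_fn: Callable[[int, Tuple[int, ...]], int],
    blank_durations: Dict[int, int],
    max_symbols_per_frame: int = 10,
) -> Tuple[List[int], int]:
    """Greedy transducer decoding; returns (tokens, number of step_fn calls)."""
    t, calls, emitted_here = 0, 0, 0
    hyp: List[int] = []
    while t < num_frames:
        symbol = step_fn(t, tuple(hyp))
        calls += 1
        if symbol in blank_durations:
            # Any blank moves forward in time by its duration in frames.
            t += blank_durations[symbol]
            emitted_here = 0
        else:
            # A non-blank token is appended without consuming a frame.
            hyp.append(symbol)
            emitted_here += 1
            if emitted_here >= max_symbols_per_frame:
                t += 1
                emitted_here = 0
    return hyp, calls


# Standard RNN-T: one blank (id 0) that consumes exactly one frame.
standard = {0: 1}
# Multi-blank: the same blank plus two hypothetical big blanks (ids 3 and 4).
multi_blank = {0: 1, 3: 2, 4: 4}

std_tokens, std_calls = greedy_decode(NUM_FRAMES, make_toy_step(standard), standard)
mb_tokens, mb_calls = greedy_decode(NUM_FRAMES, make_toy_step(multi_blank), multi_blank)
print(std_tokens, std_calls)
print(mb_tokens, mb_calls)
```

In this toy setup both decoders return the same token sequence, but the multi-blank variant needs fewer decoding steps because each big blank skips several frames at once. The speedups reported in the abstract come from the authors' experiments with real models, not from this sketch.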

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2211.03541 [eess.AS]
	(or arXiv:2211.03541v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.03541
Journal reference:	ICASSP 2023

Submission history

From: Hainan Xu
[v1] Fri, 4 Nov 2022 16:24:46 UTC (2,370 KB)
[v2] Thu, 11 Apr 2024 22:58:21 UTC (2,371 KB)
