Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers

Le, Duc; Seide, Frank; Wang, Yuhao; Li, Yang; Schubert, Kjell; Kalinli, Ozlem; Seltzer, Michael L.


arXiv:2211.00896 (eess)

[Submitted on 2 Nov 2022 (v1), last revised 4 Mar 2023 (this version, v2)]

Abstract: We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy. With the rise in popularity of neural-transducer type models like the RNN-T for on-device ASR, optimizing RNN-T's runtime efficiency is of great interest. While previous work has primarily focused on the optimization of RNN-T's acoustic encoder and predictor, this paper focuses the attention on the joiner. We show that despite being only a small part of RNN-T, the joiner has a large impact on the overall model's runtime efficiency. We propose to utilize HAT-style joiner factorization for the purpose of skipping the more expensive non-blank computation when the blank probability exceeds a certain threshold. Since the blank probability can be computed very efficiently and the RNN-T output is dominated by blanks, our proposed method leads to a 26-30% decoding speed-up and 43-53% reduction in on-device power consumption, all the while incurring no accuracy degradation and being relatively simple to implement.
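The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the callable-based interface, and the threshold value of 0.9 are all assumptions made for illustration. It assumes a HAT-style factorization in which the blank probability comes from a sigmoid over a single blank logit and the non-blank labels share the remaining probability mass through a softmax.

```python
import math


def sigmoid(x):
    # Logistic function; maps a single blank logit to a probability.
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Numerically stable softmax over the non-blank label logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def factorized_joiner_step(blank_logit_fn, label_logits_fn, threshold=0.9):
    """One joiner evaluation with blank thresholding (hypothetical helper).

    blank_logit_fn:  cheap callable returning the blank logit (assumed).
    label_logits_fn: expensive callable returning non-blank logits (assumed).
    threshold:       illustrative value, not taken from the paper.

    Returns (p_blank, label_probs), where label_probs is None when the
    non-blank computation was skipped.
    """
    p_blank = sigmoid(blank_logit_fn())
    if p_blank > threshold:
        # Confident blank: never call the expensive non-blank branch.
        return p_blank, None
    # Otherwise the labels share the remaining (1 - p_blank) mass.
    label_probs = [(1.0 - p_blank) * p for p in softmax(label_logits_fn())]
    return p_blank, label_probs


if __name__ == "__main__":
    # A high blank logit skips the label branch; a low one evaluates it.
    print(factorized_joiner_step(lambda: 5.0, lambda: [1.0, 2.0, 3.0]))
    print(factorized_joiner_step(lambda: 0.0, lambda: [1.0, 2.0, 3.0]))
```

Passing the two branches as callables makes the saving explicit: when the blank probability clears the threshold, the non-blank projection and softmax are never evaluated for that step.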

Comments:	Accepted for publication at ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2211.00896 [eess.AS]
	(or arXiv:2211.00896v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.00896

Submission history

From: Duc Le
[v1] Wed, 2 Nov 2022 05:42:53 UTC (169 KB)
[v2] Sat, 4 Mar 2023 22:08:23 UTC (169 KB)
