Improved training for online end-to-end speech recognition systems

Kim, Suyoun; Seltzer, Michael L.; Li, Jinyu; Zhao, Rui

arXiv:1711.02212 (cs)

[Submitted on 6 Nov 2017 (v1), last revised 30 Aug 2018 (this version, v2)]

Abstract: Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time-consuming and, more importantly, impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19% relative improvement in word error rate compared to a randomly initialized baseline system.
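
As a rough illustration of the teacher-student idea described in the abstract, the sketch below computes a frame-level KL divergence between the output posteriors of a teacher model and those of a student model, with optional label smoothing applied to the teacher targets. This is a generic distillation loss written under stated assumptions, not necessarily the formulation used in the paper: the function names (`softmax`, `smooth_labels`, `kl_divergence`, `teacher_student_loss`), the use of raw per-frame logits as input, and the choice to smooth the teacher distribution toward uniform are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(probs, epsilon):
    """Label smoothing: mix a distribution with the uniform distribution."""
    k = len(probs)
    return [(1.0 - epsilon) * p + epsilon / k for p in probs]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same label set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def teacher_student_loss(teacher_logits, student_logits, epsilon=0.0):
    """Average per-frame KL between teacher targets and student posteriors.

    teacher_logits, student_logits: lists of frames, each a list of logits.
    epsilon: assumed smoothing weight applied to the teacher targets.
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        target = smooth_labels(softmax(t_frame), epsilon)
        total += kl_divergence(target, softmax(s_frame))
    return total / len(teacher_logits)


# Toy example: two frames, three output labels (values are made up).
teacher = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = teacher_student_loss(teacher, student, epsilon=0.1)
```

In this sketch the loss is zero when the student reproduces the teacher's posteriors exactly (with `epsilon=0.0`) and grows as the two distributions diverge. In practice the teacher would be the offline model and the student the online model, with the loss minimized over the student's parameters only.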

Comments:	Interspeech 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1711.02212 [cs.CL]
	(or arXiv:1711.02212v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1711.02212

Submission history

From: Suyoun Kim
[v1] Mon, 6 Nov 2017 22:59:48 UTC (4,056 KB)
[v2] Thu, 30 Aug 2018 19:39:17 UTC (7,179 KB)
