Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

Settle, Shane; Audhkhasi, Kartik; Livescu, Karen; Picheny, Michael

arXiv:1903.12306 (cs)

[Submitted on 29 Mar 2019]

Abstract: Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems. However, A2W systems can have difficulties at training time when data is limited, and at decoding time when recognizing words outside the training vocabulary. To address these shortcomings, we investigate the use of recently proposed acoustic and acoustically grounded word embedding techniques in A2W systems. The idea is based on treating the final pre-softmax weight matrix of an A2W recognizer as a matrix of word embedding vectors, and using an externally trained set of word embeddings to improve the quality of this matrix. In particular, we introduce two ideas: (1) enforcing similarity at training time between the external embeddings and the recognizer weights, and (2) using the word embeddings at test time for predicting out-of-vocabulary words. Our word embedding model is acoustically grounded, that is, it is learned jointly with acoustic embeddings so as to encode the words' acoustic-phonetic content; and it is parametric, so that it can embed any arbitrary (potentially out-of-vocabulary) sequence of characters. We find that both techniques improve the performance of an A2W recognizer on conversational telephone speech.
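
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the squared-L2 form of the similarity penalty, and the dot-product scoring of out-of-vocabulary candidates are all assumptions made for illustration, since the abstract does not specify the loss or the decoding rule. Here `W` stands for the pre-softmax weight matrix (one row per in-vocabulary word), `E` for externally trained embeddings of the same words, and `h` for a hypothetical acoustic feature vector at one output step.

```python
import numpy as np


def embedding_similarity_penalty(W, E):
    """Mean squared L2 distance between matching rows of W and E.

    Assumed form of a training-time term that pulls the recognizer
    weights toward the external embeddings (idea 1).
    """
    W = np.asarray(W, dtype=float)
    E = np.asarray(E, dtype=float)
    if W.shape != E.shape:
        raise ValueError("W and E must have the same shape")
    return float(np.mean(np.sum((W - E) ** 2, axis=1)))


def predict_with_extended_vocab(h, W, in_vocab, oov_embeddings, oov_words):
    """Pick the best-scoring word over in-vocabulary and OOV candidates.

    In-vocabulary words are scored with rows of W; OOV candidates are
    scored with their external embeddings (idea 2, assumed scoring rule).
    """
    h = np.asarray(h, dtype=float)
    scores = np.concatenate([np.asarray(W, dtype=float) @ h,
                             np.asarray(oov_embeddings, dtype=float) @ h])
    words = list(in_vocab) + list(oov_words)
    return words[int(np.argmax(scores))]


if __name__ == "__main__":
    W = np.eye(2)                        # toy recognizer weights
    E = np.array([[1.0, 0.0], [0.0, 0.0]])  # toy external embeddings
    print(embedding_similarity_penalty(W, E))
    print(predict_with_extended_vocab([1.0, 1.0], W, ["a", "b"],
                                      np.array([[2.0, 2.0]]), ["c"]))
```

In a real system the penalty would be added to the recognizer's training loss with a tunable weight, and the OOV embeddings would come from the parametric character-sequence embedding model the abstract describes.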

Comments: To appear at ICASSP 2019
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1903.12306 [cs.CL] (or arXiv:1903.12306v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.1903.12306

Submission history

From: Shane Settle
[v1] Fri, 29 Mar 2019 00:44:14 UTC (435 KB)
