Speaker-aware speech-transformer

Fan, Zhiyun; Li, Jie; Zhou, Shiyu; Xu, Bo

Computer Science > Computation and Language

arXiv:2001.01557 (cs)

[Submitted on 2 Jan 2020]

Title:Speaker-aware speech-transformer

Authors:Zhiyun Fan, Jie Li, Shiyu Zhou, Bo Xu

View PDF

Abstract:Recently, end-to-end (E2E) models become a competitive alternative to the conventional hybrid automatic speech recognition (ASR) systems. However, they still suffer from speaker mismatch in training and testing condition. In this paper, we use Speech-Transformer (ST) as the study platform to investigate speaker aware training of E2E models. We propose a model called Speaker-Aware Speech-Transformer (SAST), which is a standard ST equipped with a speaker attention module (SAM). The SAM has a static speaker knowledge block (SKB) that is made of i-vectors. At each time step, the encoder output attends to the i-vectors in the block, and generates a weighted combined speaker embedding vector, which helps the model to normalize the speaker variations. The SAST model trained in this way becomes independent of specific training speakers and thus generalizes better to unseen testing speakers. We investigate different factors of SAM. Experimental results on the AISHELL-1 task show that SAST achieves a relative 6.5% CER reduction (CERR) over the speaker-independent (SI) baseline. Moreover, we demonstrate that SAST still works quite well even if the i-vectors in SKB all come from a different data source other than the acoustic training set.

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2001.01557 [cs.CL]
	(or arXiv:2001.01557v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2001.01557

Submission history

From: Zhiyun Fan [view email]
[v1] Thu, 2 Jan 2020 15:04:08 UTC (203 KB)

Full-text links:

Access Paper:

view license

Current browse context:

cs.CL

< prev | next >

new | recent | 2020-01

Change to browse by:

cs
cs.SD
eess
eess.AS

References & Citations

DBLP - CS Bibliography

listing | bibtex

Jie Li
Shiyu Zhou
Bo Xu

export BibTeX citation

Computer Science > Computation and Language

Title:Speaker-aware speech-transformer

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Speaker-aware speech-transformer

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators