DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Liu, Sen; Guo, Yiwei; Du, Chenpeng; Chen, Xie; Yu, Kai


arXiv:2306.14145 (cs)

[Submitted on 25 Jun 2023]



Abstract: Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain the speaker's timbre (i.e., speaker similarity) and eliminate the accent from the speaker's first language (i.e., nativeness). In this paper, we demonstrate that vector-quantized (VQ) acoustic features contain less speaker information than mel-spectrograms. Based on this finding, we propose a novel dual speaker embedding TTS (DSE-TTS) framework for CTTS with authentic speaking style. Here, one embedding is fed to the acoustic model to learn the linguistic speaking style, while the other is integrated into the vocoder to mimic the target speaker's timbre. Experiments show that by combining both embeddings, DSE-TTS significantly outperforms the state-of-the-art SANE-TTS in cross-lingual synthesis, especially in terms of nativeness.
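
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `DualSpeakerEmbedding`, `toy_acoustic_model`, `toy_vocoder`, and `synthesize` are hypothetical, the "models" are arithmetic stand-ins rather than neural networks, and it is assumed here that the acoustic model emits VQ indices that the vocoder then consumes. The only point shown is the routing: one embedding conditions the acoustic model, the other conditions the vocoder.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class DualSpeakerEmbedding:
    """Hypothetical container for the two embeddings named in the abstract."""
    style_embedding: Vector   # routed to the acoustic model (speaking style)
    timbre_embedding: Vector  # routed to the vocoder (speaker timbre)


def toy_acoustic_model(phonemes: List[str], style_embedding: Vector,
                       codebook_size: int = 8) -> List[int]:
    """Stand-in acoustic model: one VQ index per phoneme, shifted by the style embedding."""
    bias = int(sum(style_embedding) * 10)
    return [(sum(map(ord, p)) + bias) % codebook_size for p in phonemes]


def toy_vocoder(vq_indices: List[int], timbre_embedding: Vector) -> List[float]:
    """Stand-in vocoder: scales each VQ index by a gain derived from the timbre embedding."""
    gain = sum(timbre_embedding) / len(timbre_embedding)
    return [gain * idx for idx in vq_indices]


def synthesize(phonemes: List[str], speakers: DualSpeakerEmbedding) -> List[float]:
    """Route each embedding to its own stage: style -> acoustic model, timbre -> vocoder."""
    vq_indices = toy_acoustic_model(phonemes, speakers.style_embedding)
    return toy_vocoder(vq_indices, speakers.timbre_embedding)


if __name__ == "__main__":
    speakers = DualSpeakerEmbedding(style_embedding=[0.1, 0.2],
                                    timbre_embedding=[1.0, 1.0])
    print(synthesize(["n", "i", "h", "ao"], speakers))
```

In this toy setup, changing `timbre_embedding` alters only the vocoder output and leaves the VQ indices untouched, which mirrors the separation of style and timbre that the abstract describes.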

Comments:	Accepted to Interspeech 2023
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.14145 [cs.SD]
	(or arXiv:2306.14145v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2306.14145

Submission history

From: Sen Liu
[v1] Sun, 25 Jun 2023 06:46:36 UTC (1,196 KB)
