Learning Latent Representations for Speech Generation and Transformation

Hsu, Wei-Ning; Zhang, Yu; Glass, James

arXiv:1704.04222v1 (cs)

[Submitted on 13 Apr 2017 (this version), latest version 22 Sep 2017 (v2)]

Abstract: An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous success in modeling natural images. In this paper, we apply a convolutional VAE to model the generative process of natural speech. We derive latent space arithmetic operations to disentangle learned latent representations. We demonstrate the capability of our model to modify the phonetic content or the speaker identity for speech segments using the derived operations, without the need for parallel supervisory data.
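
The abstract does not spell out the latent space arithmetic, so the following is only a minimal sketch of one common form such an operation can take, not the paper's derivation. It assumes an attribute (for example a speaker) can be summarized by the mean of the latent vectors of segments that share it, and that a segment is modified by subtracting the source mean and adding the target mean. The function names and the toy vectors are hypothetical, and a real system would obtain the vectors from a trained VAE encoder and pass the result to its decoder.

```python
from statistics import fmean


def attribute_mean(latents):
    """Average a list of equal-length latent vectors, dimension by dimension."""
    return [fmean(dim) for dim in zip(*latents)]


def shift_attribute(z, source_mean, target_mean):
    """Shift z from the source attribute to the target: z - mu_src + mu_tgt."""
    return [zi - s + t for zi, s, t in zip(z, source_mean, target_mean)]


# Toy 3-dimensional latent vectors (hypothetical values, not from the paper).
speaker_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
speaker_b = [[0.0, 4.0, 2.0], [0.0, 6.0, 2.0]]

mu_a = attribute_mean(speaker_a)  # assumed summary of speaker A
mu_b = attribute_mean(speaker_b)  # assumed summary of speaker B

z = [1.0, 0.0, 2.0]               # a segment encoded from speaker A
z_shifted = shift_attribute(z, mu_a, mu_b)
print(z_shifted)
```

In this toy setup only the dimensions where the two means differ are changed, which is the intuition behind using such a shift to alter one attribute while leaving the rest of the representation alone.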

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1704.04222 [cs.CL]
	(or arXiv:1704.04222v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1704.04222

Submission history

From: Wei-Ning Hsu
[v1] Thu, 13 Apr 2017 17:41:11 UTC (645 KB)
[v2] Fri, 22 Sep 2017 16:41:54 UTC (1,180 KB)