Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

Wang, Chao; Li, Zhonghao; Tang, Benlai; Yin, Xiang; Wan, Yuan; Yu, Yibiao; Ma, Zejun

arXiv:2110.04754 (cs)

[Submitted on 10 Oct 2021]

Abstract: Methods based on phonetic posteriorgrams (PPGs) have recently become popular in non-parallel singing voice conversion systems. However, because PPGs lack acoustic information, the style and naturalness of the converted singing voices remain limited. To address this, we use an acoustic reference encoder to implicitly model singing characteristics. We experiment with different auxiliary features as input to the reference encoder, including mel spectrograms, HuBERT features, and the middle hidden feature (PPG-Mid) of a pretrained automatic speech recognition (ASR) model, and find that the HuBERT feature is the best choice. In addition, we use a contrastive predictive coding (CPC) module to further smooth the voices by predicting future observations in latent space. Experiments show that, compared with the baseline models, our proposed model significantly improves the naturalness of the converted singing voices and their similarity to the target singer. Moreover, our proposed model can also make speakers for whom only speech data is available sing.
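
The abstract does not spell out the CPC objective, so the following is only a minimal sketch of what "predicting future observations in latent space" commonly means. It assumes the standard InfoNCE loss with in-batch negatives and a single linear predictor, which may differ from what the paper actually uses. The names `info_nce_loss`, `context`, `future`, and `W` are hypothetical and do not come from the paper.

```python
import numpy as np

def info_nce_loss(context, future, W):
    """InfoNCE loss for a single prediction step (illustrative only).

    context: (N, Dc) context vectors c_t, one per sample in the batch
    future:  (N, Dz) latent vectors z_{t+k}; row i is the positive for context i
    W:       (Dc, Dz) linear predictor for step k (hypothetical parameter)
    """
    pred = context @ W            # (N, Dz) predicted future latents
    scores = pred @ future.T      # (N, N); scores[i, j] = <pred_i, z_j>
    # Subtract the row-wise max for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; all other rows act as negatives.
    return -np.mean(np.diag(log_softmax))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.normal(size=(8, 16))
    z = rng.normal(size=(8, 16))
    W = rng.normal(size=(16, 16)) * 0.1
    print(info_nce_loss(c, z, W))
```

Under this assumed objective, the loss is small when each predicted latent scores higher against its own future frame than against the other frames in the batch, which is one way a model can be encouraged to produce temporally coherent latents.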

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2110.04754 [cs.SD]
	(or arXiv:2110.04754v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2110.04754

Submission history

From: Benlai Tang
[v1] Sun, 10 Oct 2021 10:27:20 UTC (447 KB)
