Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

Ye, Zipeng; Xia, Mengfei; Yi, Ran; Zhang, Juyong; Lai, Yu-Kun; Huang, Xuwei; Zhang, Guoxin; Liu, Yong-jin

doi:10.1109/TMM.2022.3142387

arXiv:2201.05986 (cs)

[Submitted on 16 Jan 2022]

Abstract: In this paper, we present a dynamic convolution kernel (DCK) strategy for convolutional neural networks. Using a fully convolutional network with the proposed DCKs, high-quality talking-face video can be generated from multi-modal sources (i.e., unmatched audio and video) in real time, and the trained model is robust to different identities, head postures, and input audio. The DCKs are designed specifically for audio-driven talking-face video generation, which leads to a simple yet effective end-to-end system. We also provide a theoretical analysis that explains why DCKs work. Experimental results show that our method generates high-quality talking-face video with background at 60 fps. Comparisons with state-of-the-art methods demonstrate the superiority of our method.
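
The abstract does not describe how the kernels are produced, so the following is only a minimal sketch of the general idea of a dynamic convolution: the kernel applied to an image is computed from a second input (here an audio feature vector) instead of being a fixed learned weight. The single linear layer, the 3x3 single-channel kernel, and the names `make_kernel`, `conv2d_valid`, and `dynamic_conv` are all assumptions made for illustration and are not taken from the paper.

```python
def make_kernel(audio_feat, weight, bias, k):
    """Map an audio feature vector to a k x k kernel.

    Assumption: one linear layer (weight has k*k rows, one per kernel entry).
    """
    flat = [sum(w * a for w, a in zip(row, audio_feat)) + b
            for row, b in zip(weight, bias)]
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation on a single-channel image."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for x in range(w - k + 1)]
            for y in range(h - k + 1)]


def dynamic_conv(image, audio_feat, weight, bias, k=3):
    """Convolve the image with a kernel generated from the audio feature."""
    return conv2d_valid(image, make_kernel(audio_feat, weight, bias, k))


# Toy usage: only the kernel centre depends on the first audio feature,
# so the same image is scaled differently for different audio inputs.
image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
weight = [[0.0, 0.0] for _ in range(9)]
weight[4] = [1.0, 0.0]
bias = [0.0] * 9
loud = dynamic_conv(image, [2.0, 0.0], weight, bias)
quiet = dynamic_conv(image, [0.5, 0.0], weight, bias)
```

In this toy setup the two calls share the image and the linear layer but produce different outputs, because the kernel itself changes with the audio feature. That is the property the term "dynamic" refers to here.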

Comments:	in IEEE Transactions on Multimedia
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2201.05986 [cs.CV]
	(or arXiv:2201.05986v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2201.05986
Related DOI:	https://doi.org/10.1109/TMM.2022.3142387

Submission history

From: Zipeng Ye
[v1] Sun, 16 Jan 2022 07:07:59 UTC (5,606 KB)
