Research article | Open access
DOI: 10.1145/3503161.3548002

D2Animator: Dual Distillation of StyleGAN For High-Resolution Face Animation

Published: 10 October 2022

Abstract

Style-based generator architectures (e.g., StyleGAN v1 and v2) have greatly improved the controllability and explainability of Generative Adversarial Networks (GANs). Many researchers have applied pretrained style-based generators to image manipulation and video editing by exploiting the correspondence between linear interpolation in the latent space and semantic transformations in the synthesized image manifold. However, most previous studies focus on manipulating separate, discrete attributes, which is insufficient for animating a still image into videos with complex and diverse poses and expressions. In this work, we devise a dual distillation strategy (D2Animator) for generating high-resolution face animation videos conditioned on the identity and pose of different images. Specifically, we first introduce a Clustering-based Distiller (CluDistiller) that distills diverse interpolation directions in the latent space and synthesizes identity-consistent faces with various poses and expressions, such as blinking, frowning, and looking up/down. We then propose an Augmentation-based Distiller (AugDistiller) that learns to encode arbitrary face deformations as combinations of interpolation directions by training on augmentation samples synthesized by CluDistiller. By assembling the two distillation methods, D2Animator can generate high-resolution face animation videos without training on video sequences. Extensive experiments on self-driving, cross-identity, and sequence-driving tasks demonstrate the superiority of D2Animator over existing StyleGAN manipulation and face animation methods in both generation quality and animation fidelity.
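
The core mechanism the abstract relies on is that a semantic edit corresponds to a linear walk w' = w + a*d in StyleGAN's latent W space, and that a bank of reusable directions d can be distilled by clustering latent offsets. The sketch below illustrates this idea; it is a minimal sketch, not the authors' code, assuming NumPy and scikit-learn and substituting synthetic latent codes for a real pretrained generator. Names such as num_directions and animate are illustrative, and the paper's actual CluDistiller and AugDistiller objectives are not reproduced here.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    latent_dim = 512  # dimensionality of StyleGAN's W space

    # Stand-ins for pairs of latent codes whose rendered faces differ by some
    # deformation (blink, frown, head turn, ...). In practice these offsets
    # would come from a pretrained StyleGAN generator, not random noise.
    w_src = rng.normal(size=(1000, latent_dim))
    w_dst = w_src + rng.normal(scale=0.1, size=(1000, latent_dim))
    offsets = w_dst - w_src

    # Clustering-style distillation: compress the offsets into a small bank
    # of reusable interpolation directions (unit-normalized centroids).
    num_directions = 16  # illustrative choice, not the paper's setting
    kmeans = KMeans(n_clusters=num_directions, n_init=10,
                    random_state=0).fit(offsets)
    directions = kmeans.cluster_centers_
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    def animate(w_identity, coeffs, alpha=3.0):
        """Shift an identity code along a weighted combination of distilled
        directions; decoding the result with the generator yields one frame."""
        return w_identity + alpha * (coeffs @ directions)

    # One frame driven by a sparse mix of two directions; a per-frame sequence
    # of coefficient vectors yields an animation.
    coeffs = np.zeros(num_directions)
    coeffs[[2, 7]] = [0.8, -0.4]
    w_frame = animate(w_src[0], coeffs)
    print(w_frame.shape)  # (512,); G.synthesis(w_frame) would render the image

In this picture, AugDistiller's role would be to learn the mapping from an observed face deformation to the coefficient vector coeffs, so that arbitrary motion from a driving frame can be expressed in the distilled direction basis.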

Supplementary Material

MP4 File (MM2022_camera_ready_d_2_animator_dual_distillation.mp4)
"Presentation video", "High-Resolution Face Animation"

Cited By

  • (2024) On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation. Machine Intelligence Research, 21(3), 495-513. DOI: 10.1007/s11633-023-1474-0. Online publication date: 21 March 2024.

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022

    Author Tags

    1. GANs
    2. face animation
    3. high-resolution image generation

    Qualifiers

    • Research-article

    Funding Sources

    • Science and Technology Innovation 2030 "Brain Science and Brain-like Research" Major Project

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%
