
PADVG: A Simple Baseline of Active Protection for Audio-Driven Video Generation

Published: 08 March 2024

Abstract

Over the past few years, deep generative models have evolved rapidly, enabling the synthesis of highly realistic content while also raising security concerns about illegal misuse. Active protection for generative models has therefore been proposed recently, aiming to generate samples that carry hidden messages for future identification while preserving the original generation quality. However, existing active protection methods are designed specifically for generative adversarial networks (GANs) and are restricted to unconditional image generation. We observe that they achieve limited identification performance and visual quality on audio-driven video generation, which is conditioned on target audio and a source input and must maintain consistent context, e.g., identity and movement, across frame sequences. To address this issue, we introduce a simple yet effective active Protection framework for Audio-Driven Video Generation, named PADVG. Specifically, we present a novel frame-shared embedding module in which the messages to hide are first transformed into frame-shared message coefficients. These coefficients are then assembled with the intermediate feature maps of the video generator at multiple feature levels to produce the embedded video frames. In addition, PADVG employs two visual consistency losses: (i) an intra-frame loss that keeps frames visually consistent across different hidden messages; and (ii) an inter-frame loss that preserves visual consistency across different video frames. Moreover, we propose an auxiliary denoising training strategy that perturbs the assembled features with learnable pixel-level noise, improving identification performance while enhancing robustness against real-world disturbances. Extensive experiments demonstrate that PADVG can effectively identify videos generated by audio-driven models while achieving high visual quality.



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 6
June 2024, 715 pages
EISSN: 1551-6865
DOI: 10.1145/3613638
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2024
Online AM: 16 January 2024
Accepted: 10 December 2023
Revised: 22 November 2023
Received: 05 April 2023
Published in TOMM Volume 20, Issue 6


Author Tags

  1. Active protection
  2. audio-driven video generative models

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • National NSF of China

