Presentation + Paper
Efficient and consistent zero-shot video generation with diffusion models
7 June 2024
Ethan Frakes, Umar Khalid, Chen Chen
Abstract
Recent diffusion-based generative models employ methods such as one-shot fine-tuning of an image diffusion model for video generation. However, this fine-tuning leads to long video generation times and suboptimal efficiency. To eliminate this overhead, zero-shot text-to-video models drop the fine-tuning step entirely and generate novel videos from a text prompt alone. While zero-shot generation greatly reduces generation time, many models rely on inefficient cross-frame attention processors, hindering the diffusion model's use for real-time video generation. We address this issue by introducing more efficient attention processors to a video diffusion model. Specifically, we use attention processors (i.e., xFormers, FlashAttention, and HyperAttention) that are highly optimized for efficiency and hardware parallelization. We apply these processors to a zero-shot video generator and test them with both older diffusion models such as Stable Diffusion 1.5 and newer, higher-quality models such as Stable Diffusion XL. Our results show that efficient attention processors alone reduce generation time by around 25% with no change in video quality. Combined with higher-quality base models, efficient attention processors give zero-shot generation a substantial boost in both efficiency and quality, greatly expanding the applicability of video diffusion models to real-time video generation.
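The central technique, swapping a diffusion model's default attention processors for hardware-optimized ones, can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the Hugging Face diffusers library and the xformers package, and enables xFormers memory-efficient attention on a Stable Diffusion 1.5 image pipeline, whereas the paper applies the same substitution inside a zero-shot text-to-video generator.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load Stable Diffusion 1.5, one of the base models tested in the paper.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Swap the default attention processors for xFormers' memory-efficient
    # implementation. The attention output is mathematically equivalent,
    # so output quality is unchanged while speed and memory use improve.
    pipe.enable_xformers_memory_efficient_attention()

    # Generate one image; the prompt is an arbitrary placeholder.
    image = pipe("an astronaut riding a horse on the moon").images[0]
    image.save("frame.png")

Because efficient attention computes the same function as standard attention, the substitution is a drop-in change, which is consistent with the paper's finding that generation time falls while video quality is unaffected.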
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Ethan Frakes, Umar Khalid, and Chen Chen "Efficient and consistent zero-shot video generation with diffusion models", Proc. SPIE 13034, Real-Time Image Processing and Deep Learning 2024, 1303407 (7 June 2024); https://doi.org/10.1117/12.3013575
KEYWORDS
Video, Diffusion, Video processing, Video acceleration, Depth maps, Motion models, Visual process modeling