Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Wang, Hongjie; Liu, Difan; Kang, Yan; Li, Yijun; Lin, Zhe; Jha, Niraj K.; Liu, Yuchen

arXiv:2405.05252 (cs)

[Submitted on 8 May 2024]

Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: this https URL.
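The abstract does not give the G-WPR formula or the recovery rule, so the following is only a minimal sketch of the general idea: rank tokens with a PageRank-style vote over an attention map, keep the top-scoring ones, and refill the pruned positions before a convolution. The function names (`rank_tokens`, `prune`, `recover`), the iteration count, the fixed budget `keep`, and the use of attention weights as the similarity measure are all assumptions made for illustration, not the paper's method. A schedule in the spirit of DSAP would vary `keep` across denoising timesteps; here it is held constant.

```python
def rank_tokens(attn, iters=5):
    """PageRank-style scores: a token matters if important tokens attend to it.

    attn[i][j] is the attention weight from query token i to key token j
    (each row is assumed to sum to 1). This is an illustrative stand-in,
    not the paper's G-WPR.
    """
    n = len(attn)
    score = [1.0 / n] * n
    for _ in range(iters):
        # Each token i distributes its current score along its attention row.
        new = [sum(attn[i][j] * score[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        score = [v / total for v in new]
    return score


def prune(attn, keep):
    """Return the sorted indices of the `keep` highest-scoring tokens."""
    score = rank_tokens(attn)
    order = sorted(range(len(score)), key=lambda j: score[j], reverse=True)
    return sorted(order[:keep])


def recover(kept_idx, kept_feats, attn, n):
    """Rebuild a length-n token list so a convolution sees a full grid.

    Assumption: each pruned token copies the feature of the kept token
    it attends to most strongly.
    """
    out = [None] * n
    for pos, idx in enumerate(kept_idx):
        out[idx] = kept_feats[pos]
    for i in range(n):
        if out[i] is None:
            best = max(range(len(kept_idx)), key=lambda p: attn[i][kept_idx[p]])
            out[i] = kept_feats[best]
    return out


# Toy 4-token attention map: tokens 0 and 1 receive most of the attention.
attn = [
    [0.60, 0.30, 0.05, 0.05],
    [0.50, 0.40, 0.05, 0.05],
    [0.45, 0.35, 0.10, 0.10],
    [0.30, 0.50, 0.10, 0.10],
]
kept = prune(attn, keep=2)
full = recover(kept, ["f0", "f1"], attn, n=4)
```

In this toy case tokens 2 and 3 are pruned, and each is refilled from whichever kept token it attends to most. A real implementation would operate on batched tensors inside the attention layers rather than on Python lists.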

Comments:	Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Cite as:	arXiv:2405.05252 [cs.CV]
	(or arXiv:2405.05252v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.05252

Submission history

From: Hongjie Wang
[v1] Wed, 8 May 2024 17:56:47 UTC (31,641 KB)
