Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior

Li, Zhiyuan; Zhou, Yanhui; Wei, Hao; Ge, Chenyang; Jiang, Jingwen

arXiv:2404.18820v2 (eess)

[Submitted on 29 Apr 2024 (v1), revised 13 Jun 2024 (this version, v2), latest version 4 Sep 2024 (v4)]

Abstract: Image compression at extremely low bitrates (below 0.1 bits per pixel, bpp) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat the latent representation of images in the diffusion space as guidance, employing a VAE-based compression approach to compress images and initially decode the compressed information into content variables. The second stage leverages pre-trained Stable Diffusion to reconstruct images under the guidance of the content variables. Specifically, we introduce a small control module to inject content information while keeping the Stable Diffusion model fixed to maintain its generative capability. Furthermore, we design a space alignment loss to force the content variables to align with the diffusion space and provide the necessary constraints for optimization. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates.
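To make the 0.1 bpp threshold concrete, the following minimal sketch converts between file size and bitrate. It relies only on the standard definition of bits per pixel (total bits divided by pixel count). The helper names `bits_per_pixel` and `byte_budget` and the 768x512 image size are illustrative assumptions, not taken from the paper.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Return the bitrate in bits per pixel for a compressed file of num_bytes."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return 8 * num_bytes / (width * height)


def byte_budget(bpp: float, width: int, height: int) -> int:
    """Return the largest whole number of bytes that does not exceed `bpp`."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return int(bpp * width * height // 8)


if __name__ == "__main__":
    # Hypothetical 768x512 image (an assumed size, chosen only for illustration).
    width, height = 768, 512
    budget = byte_budget(0.1, width, height)
    print(f"byte budget at 0.1 bpp: {budget}")
    print(f"resulting bitrate: {bits_per_pixel(budget, width, height):.5f} bpp")
```

Under these assumptions, an image of that size must fit in under 5 KB to count as "extremely low bitrate" in the abstract's sense, which helps explain why a strong generative prior is needed to fill in the missing detail.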

Comments:	Submitted to IEEE TCSVT
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.18820 [eess.IV]
	(or arXiv:2404.18820v2 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2404.18820

Submission history

From: Zhiyuan Li
[v1] Mon, 29 Apr 2024 16:02:38 UTC (9,274 KB)
[v2] Thu, 13 Jun 2024 05:41:27 UTC (13,525 KB)
[v3] Sun, 28 Jul 2024 05:34:10 UTC (24,770 KB)
[v4] Wed, 4 Sep 2024 03:35:58 UTC (21,271 KB)
