research-article

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Authors:

Lingyu DuanAuthors Info & Claims

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 1431 - 1442

https://doi.org/10.1145/3581783.3611851

Published: 27 October 2023 Publication History

Abstract

Traditional image codecs prioritize signal fidelity and human perception, often neglecting machine vision tasks. Deep learning approaches have shown promising coding performance by leveraging rich semantic embeddings that can be optimized for both human and machine vision. However, these compact embeddings struggle to represent low-level details like contours and textures, leading to imperfect reconstructions. Additionally, existing learning-based coding tools lack scalability. To address these challenges, this paper presents a content-adaptive diffusion model for scalable image compression. The method encodes accurate texture through a diffusion process, enhancing human perception while preserving important features for machine vision tasks. It employs a Markov palette diffusion model with commonly-used feature extractors and image generators, enabling efficient data compression. By utilizing collaborative texture-semantic feature extraction and pseudo-label generation, the approach accurately learns texture information. A content-adaptive Markov palette diffusion model is then applied to capture both low-level texture and high-level semantic knowledge in a scalable manner. This framework enables elegant compression ratio control by flexibly selecting intermediate diffusion states, eliminating the need for deep learning model re-training at different operating points. Extensive experiments demonstrate the effectiveness of the proposed framework in image reconstruction and downstream machine vision tasks such as object detection, segmentation, and facial landmark detection. It achieves superior perceptual quality scores compared to state-of-the-art methods.

References

[1]

Johannes Ballé, Nick Johnston, and David Minnen. 2018. Integer networks for data compression with latent-variable models. In International Conference on Learning Representations.

[2]

Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. 2018. Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018).

[3]

Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, and Tom Goldstein. 2022. Cold diffusion: Inverting arbitrary image transforms without noise. arXiv preprint arXiv:2208.09392 (2022).

[4]

Yochai Blau and Tomer Michaeli. 2018. The perception-distortion tradeoff. In Proceedings of the IEEE conference on computer vision and pattern recognition. 6228--6237.

[5]

Ronald Newbold Bracewell. 1986. The Fourier transform and its applications NY, McGraw-Hill.

[6]

Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Jianle Chen, Gary J Sullivan, and Jens-Rainer Ohm. 2021. Overview of the versatile video coding (VVC) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 3736--3764.

[7]

Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. 2019. Variable rate deep image compression with a conditional autoencoder. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3146--3154.

[8]

Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. 2020. Retinaface: Single-shot multi-level face localisation in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5203--5212.

[9]

Yu Deng, Jiaolong Yang, Dong Chen, Fang Wen, and Xin Tong. 2020. Disentangled and controllable face image generation via 3d imitative-contrastive learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5154--5163.

[10]

Keyan Ding, Yi Liu, Xueyi Zou, Shiqi Wang, and Kede Ma. 2021. Locally adaptive structure and texture similarity for image quality assessment. In Proceedings of the 29th ACM International Conference on Multimedia. 2483--2491.

Digital Library

[11]

Tim Dockhorn, Arash Vahdat, and Karsten Kreis. 2022. Genie: Higher-order denoising diffusion solvers. Advances in Neural Information Processing Systems 35 (2022), 30150--30166.

[12]

Lingyu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, and Wen Gao. 2020. Video coding for machines: A paradigm of collaborative compression and intelligent analytics. IEEE Transactions on Image Processing 29 (2020), 8680--8695.

Digital Library

[13]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets in advances in neural information processing systems (NIPS). Curran Associates, Inc. Red Hook, NY, USA (2014), 2672--2680.

[14]

Noor Fathima Goose, Jens Petersen, Auke Wiggers, Tianlin Xu, and Guillaume Sautiere. 2023. Neural Image Compression with a Diffusion-Based Decoder. arXiv preprint arXiv:2301.05489 (2023).

[15]

John A Hartigan and Manchek A Wong. 1979. Algorithm AS 136: A k-means clustering algorithm. Journal of the royal statistical society. series c (applied statistics) 28, 1 (1979), 100--108.

[16]

Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang. 2022. Elic: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5718--5727.

[17]

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729--9738.

[18]

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems. 6626--6637.

[19]

Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. 2022. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022).

[20]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.

[21]

Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. 2022. Cascaded Diffusion Models for High Fidelity Image Generation. J. Mach. Learn. Res. 23, 47 (2022), 1--33.

[22]

Weixin Hong, Tong Chen, Ming Lu, Shiliang Pu, and Zhan Ma. 2020. Efficient neural image decoding via fixed-point inference. IEEE Transactions on Circuits and Systems for Video Technology 31, 9 (2020), 3618--3630.

[23]

Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architec-ture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 4401--4410.

[24]

Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).

[25]

Jani Lainema, Frank Bossen, Woo-Jin Han, Junghye Min, and Kemal Ugur. 2012. Intra coding of the HEVC standard. IEEE transactions on circuits and systems for video technology 22, 12 (2012), 1792--1801.

Digital Library

[26]

Kwot Sin Lee, Ngoc-Trung Tran, and Ngai-Man Cheung. 2021. Infomax-gan: Improved adversarial image generation via information maximization and contrastive learning. In Proceedings of the IEEE/CVF winter conference on applications of computer vision. 3942--3952.

[27]

Yue Li, Dong Liu, Houqiang Li, Li Li, Feng Wu, Hong Zhang, and Haitao Yang. 2017. Convolutional neural network-based block up-sampling for intra frame coding. IEEE Transactions on Circuits and Systems for Video Technology 28, 9 (2017), 2316--2330.

[28]

Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, and Radu Timofte. 2021. Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4076--4085.

[29]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV). Springer, 740--755.

[30]

Fan Liu and Yong Deng. 2020. Determine the number of unknown targets in open world based on elbow method. IEEE Transactions on Fuzzy Systems 29, 5 (2020), 986--995.

[31]

Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, and Hongsheng Li. 2021. Divco: Diverse conditional image synthesis via contrastive generative adversarial network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 16377--16386.

[32]

Eric Luhman and Troy Luhman. 2021. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388 (2021).

[33]

Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, and Bo Dai. 2022. Accelerating diffusion models via early stop of the diffusion process. arXiv preprint arXiv:2205.12524 (2022).

[34]

Siwei Ma, Tiejun Huang, Cliff Reader, and Wen Gao. 2015. AVS2? Making video coding smarter [standards in a nutshell]. IEEE Signal Processing Magazine 32, 2 (2015), 172--183.

[35]

Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. 2023. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14297--14306.

[36]

Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, and Luc Van Gool. 2018. Conditional probability models for deep image compression. In Advances in Neural Information Processing Systems (NeurIPS). 4390--4401.

[37]

Fabian Mentzer, George D Toderici, Michael Tschannen, and Eirikur Agustsson. 2020. High-fidelity generative image compression. Advances in Neural Information Processing Systems 33 (2020), 11913--11924.

[38]

David Minnen, Johannes Ballé, and George D Toderici. 2018. Joint autoregressive and hierarchical priors for learned image compression. Advances in neural information processing systems 31 (2018).

[39]

David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, and Saurabh Singh. 2017. Spatially adaptive image compression using a tiled deep network. In 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2796--2800.

Digital Library

[40]

Debargha Mukherjee and Sanjit K Mitra. 2014. WebP: A new image format for the web. Journal of Signal Processing Systems 74, 3 (2014), 327--338.

[41]

Taesung Park, Alexei A Efros, Richard Zhang, and Jun-Yan Zhu. 2020. Contrastive learning for unpaired image-to-image translation. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IX 16. Springer, 319--345.

Digital Library

[42]

Jonathan Pfaff, Alexey Filippov, Shan Liu, Xin Zhao, Jianle Chen, Santiago DeLuxán-Hernández, Thomas Wiegand, Vasily Rufitskiy, Adarsh Krishnan Ramasubramonian, and Geert Van der Auwera. 2021. Intra prediction and mode coding in VVC. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 3834--3847.

[43]

Pradeep Ramachandran, Dzung T Nguyen, Vinod Pandit, Cheng Xu, Jianle Li, San Li, Shijun Li, Wenli Xu, Wei Liu, Zongming Li, et al. 2013. x265: A HEVC/H.265 Video Encoder Implementation. IEEE Transactions on Circuits and Systems for Video Technology 23, 9 (2013), 1485--1497.

[44]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684--10695.

[45]

Tim Salimans and Jonathan Ho. 2022. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512 (2022).

[46]

Jérémy Salomon and Thomas Lecroq. 2012. WebP: A new image format for the Web. Signal Processing: Image Communication 27, 3 (2012), 157--167.

[47]

Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[48]

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning. PMLR, 2256--2265.

[49]

Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on circuits and systems for video technology 22, 12 (2012), 1649--1668.

Digital Library

[50]

Wanjie Sun and Zhenzhong Chen. 2020. Learned image downscaling for upscaling using content adaptive resampler. IEEE Transactions on Image Processing 29 (2020), 4027--4040.

Digital Library

[51]

Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE transactions on consumer electronics 38, 1 (1992), xviii--xxxiv.

Digital Library

[52]

Ce Wang, Bin He, Shengsen Wu, Renjie Wan, Boxin Shi, and Ling-Yu Duan. 2023. Coarse-to-fine Disentangling Demoiréing Framework for Recaptured Screen Images. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).

Digital Library

[53]

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al. 2022. Internimage: Exploring large-scale vision foundation models with deformable convolutions. arXiv preprint arXiv:2211.05778 (2022).

[54]

Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, and Lei Li. 2021. Dense contrastive learning for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3024--3033.

[55]

Daniel Watson, William Chan, Jonathan Ho, and Mohammad Norouzi. 2021. Learning fast samplers for diffusion models by differentiating through sample quality. In International Conference on Learning Representations.

[56]

Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H. 264/AVC video coding standard. IEEE Transactions on circuits and systems for video technology 13, 7 (2003), 560--576.

Digital Library

[57]

Mingqing Xiao, Shuxin Zheng, Chang Liu, Zhouchen Lin, and Tie-Yan Liu. 2023. Invertible Rescaling Network and Its Extensions. International Journal of Computer Vision 131, 1 (2023), 134--159.

Digital Library

[58]

Yueqi Xie, Ka Leong Cheng, and Qifeng Chen. 2021. Enhanced invertible encoding for learned image compression. In Proceedings of the 29th ACM international conference on multimedia. 162--170.

Digital Library

[59]

Shuo Yang, Ping Luo, Chen-Change Loy, and Xiaoou Tang. 2016. Wider face: A face detection benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5525--5533.

[60]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586--595.

[61]

Huangjie Zheng, Pengcheng He, Weizhu Chen, and Mingyuan Zhou. 2022. Truncated diffusion probabilistic models. stat 1050 (2022), 7.

[62]

Yingbo Zhou, Pengcheng Zhao, Weiqin Tong, and Yongxin Zhu. 2021. CDL-GAN: contrastive distance learning generative adversarial network for image generation. Applied Sciences 11, 4 (2021), 1380.

Index Terms

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach
1. Computing methodologies
2. Information systems
  1. Data management systems
    1. Data structures
      1. Data layout
        Data compression

Index terms have been assigned to the content through auto-classification.

Recommendations

Lossless-by-Lossy Coding for Scalable Lossless Image Compression

This paper presents a method of scalable lossless image compression by means of lossy coding. A progressive decoding capability and a full decoding for the lossless rendition are equipped with the losslessly encoded bit stream. Embedded coding is ...
Conditional Entropy Coding of VQ Indexes for Image Compression
DCC '97: Proceedings of the Conference on Data Compression

Vector quantization (VQ) is a source coding methodology with provable rate-distortion optimality. However, despite more than two decades of intensive research, VQ theoretical promise is yet to be fully realized in image compression practice. Restricted ...
A novel approach to medical image compression

As medical/biological imaging facilities move towards complete film-less imaging, compression plays a key role. Although lossy compression techniques yield high compression rates, the medical community has been reluctant to adopt these methods, largely ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik
University of Ottawa, Canada & MBZUAI, UAE
,
Tao Mei
HiDream.ai, China
,
Rita Cucchiara
University of Modena and Reggio Emilia, Italy
,
Program Chairs:
Marco Bertini
University of Florence, Italy
,
Diana Patricia Tobon Vallejo
Unversidad de Medellin, Colombia
,
Pradeep K. Atrey
University at Albany, State University of New York, USA
,
M. Shamim Hossain
M. Shamim Hossain (King Saud University, KSA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

the Basic and Frontier Research Project of PCL
the National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Upcoming Conference

MM '24

Sponsor:
sigmm

The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne , VIC , Australia

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
371
Total Downloads

Downloads (Last 12 months)371
Downloads (Last 6 weeks)33

Reflects downloads up to 27 Jul 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents