research-article

A Universal Optimization Framework for Learning-based Image Codec

Authors:

Yan Lu

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 1

Article No.: 16, Pages 1 - 19

https://doi.org/10.1145/3580499

Published: 25 August 2023

Abstract

Recently, machine learning-based image compression has attracted increasing interest and is approaching the state-of-the-art compression ratio. Unlike traditional codecs, however, it lacks a universal optimization method for finding an efficient representation of each individual image. In this paper, we develop a plug-and-play optimization framework for achieving a higher compression ratio, which can be flexibly applied to existing and potential future compression networks. To make the latent representation more efficient, we propose a novel latent optimization algorithm that adaptively removes redundancy for each image. Additionally, inspired by the benefit of side information in traditional codecs, we introduce side information into our framework and integrate side information optimization with latent optimization to further improve the compression ratio. In particular, with joint side information and latent optimization, we can achieve fine rate control using only a single model instead of training different models for different rate-distortion trade-offs, which significantly reduces the training and storage cost of supporting multiple bit rates. Experimental results demonstrate that the proposed framework substantially improves the compression ratio of machine learning-based codecs, achieving more than 10% additional bit rate saving on three different representative network structures. With the proposed optimization framework, we achieve 7.6% bit rate saving against the latest traditional coding standard, VVC, on the Kodak dataset, yielding the state-of-the-art compression ratio.
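As a rough illustration of what per-image latent optimization can look like, the sketch below refines the latent of a single input against a toy rate-distortion cost. Everything in it is an assumption made for illustration and is not the algorithm proposed in the paper: the "decoder" is a fixed random linear map, the rate is approximated by the L1 norm of the latent, the initial latent comes from a pseudo-inverse standing in for an encoder, and the solver is plain proximal gradient descent (ISTA). The names `rd_cost`, `refine_latent`, and `lam` are hypothetical.

```python
import numpy as np


def rd_cost(y, x, decoder, lam):
    """Toy rate-distortion cost: L1 rate proxy plus lam-weighted squared error."""
    residual = x - decoder @ y
    return float(np.abs(y).sum() + lam * (residual @ residual))


def refine_latent(y0, x, decoder, lam, steps=200):
    """Refine one input's latent with proximal gradient descent (ISTA)."""
    # Lipschitz constant of the gradient of the distortion term.
    lipschitz = 2.0 * lam * np.linalg.norm(decoder, 2) ** 2
    step = 1.0 / lipschitz
    y = y0.copy()
    for _ in range(steps):
        # Gradient of lam * ||x - decoder @ y||^2 with respect to y.
        grad = 2.0 * lam * decoder.T @ (decoder @ y - x)
        z = y - step * grad
        # Soft-thresholding is the proximal operator of the L1 rate proxy.
        y = np.sign(z) * np.maximum(np.abs(z) - step, 0.0)
    return y


rng = np.random.default_rng(0)
decoder = rng.normal(size=(64, 16))  # hypothetical fixed linear "decoder"
x = rng.normal(size=64)              # hypothetical flattened "image"
y0 = np.linalg.pinv(decoder) @ x     # stand-in for the encoder output

lam = 0.1
y_opt = refine_latent(y0, x, decoder, lam)
cost_before = rd_cost(y0, x, decoder, lam)
cost_after = rd_cost(y_opt, x, decoder, lam)
print(f"cost before: {cost_before:.3f}, after: {cost_after:.3f}")
```

In this toy setting the decoder stays frozen and only the latent changes, which mirrors the plug-and-play idea of improving the representation of each image without retraining the network. Varying `lam` shifts the balance between the rate proxy and the distortion term for the same fixed decoder.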

Cited By

Bowen D, Haiquan W, Yuxuan L, Zhao J, Ma Y, Runhe H. (2024). Fair and Robust Federated Learning via Decentralized and Adaptive Aggregation based on Blockchain. ACM Transactions on Sensor Networks. DOI: 10.1145/3673656. Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3673656
Lin W, Zhang Y, Dai W, Liu H, See J, Xiong H. (2024). Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-23). DOI: 10.1145/3649503. Online publication date: 27-Mar-2024
https://dl.acm.org/doi/10.1145/3649503

Index Terms

A Universal Optimization Framework for Learning-based Image Codec
1. Computing methodologies
  1. Computer graphics
    1. Image compression

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 1

January 2024

639 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3613542

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2023

Online AM: 03 July 2023

Accepted: 04 January 2023

Revised: 14 October 2022

Received: 04 May 2022

Published in TOMM Volume 20, Issue 1

Qualifiers

Research-article
