research-article

A Unified Framework for Jointly Compressing Visual and Semantic Data

Authors:

Hong-Kai Xiong

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 7

Article No.: 221, Pages 1 - 24

https://doi.org/10.1145/3654800

Published: 15 May 2024

Abstract

The rapid advancement of multimedia and imaging technologies has produced increasingly diverse visual and semantic data. A wide range of applications, such as remote-assisted driving, require the combined storage and transmission of various visual and semantic data. However, existing works do not sufficiently exploit the redundancy between different types of data. In this article, we propose a unified framework to jointly compress a diverse spectrum of visual and semantic data, including images, point clouds, segmentation maps, object attributes, and relations. We develop a unifying process that embeds the representations of these data into a joint embedding graph according to their categories, which enables flexible handling of joint compression tasks for various visual and semantic data. To fully leverage the redundancy between different data types, we further introduce an embedding-based adaptive joint encoding process and a Semantic Adaptation Module that efficiently encode diverse data based on the learned embeddings in the joint embedding graph. Experiments on the Cityscapes, MSCOCO, and KITTI datasets demonstrate the superiority of our framework, highlighting promising steps toward scalable multimedia processing.
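
The redundancy argument above can be illustrated with a small information-theoretic sketch. It is not the framework proposed in the article. It only shows the general fact that the joint entropy of two correlated sources never exceeds the sum of their separate entropies, which is why coding them jointly can cost fewer bits. The toy labels and attributes, and the helper name `entropy_bits`, are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: one segmentation label and one object attribute
# per region. In this example the attribute is fully determined by the label.
labels = ["road", "road", "car", "car", "person", "road", "car", "person"]
attrs = ["static", "static", "moving", "moving", "moving", "static", "moving", "moving"]

h_labels = entropy_bits(labels)                  # cost of coding labels alone
h_attrs = entropy_bits(attrs)                    # cost of coding attributes alone
h_joint = entropy_bits(list(zip(labels, attrs)))  # cost of coding the pairs jointly

separate_cost = h_labels + h_attrs

print(f"separate: {separate_cost:.3f} bits/region")
print(f"joint:    {h_joint:.3f} bits/region")
```

Because the attribute here is a deterministic function of the label, the joint entropy equals the label entropy, and the attribute costs no additional bits under an ideal joint coder. Real visual and semantic data are only partially correlated, so the saving would be smaller.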

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Image compression

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 7

July 2024

973 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3613662

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 May 2024

Online AM: 28 March 2024

Accepted: 21 March 2024

Revised: 26 January 2024

Received: 11 December 2023

Published in TOMM Volume 20, Issue 7

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Ministry of Higher Education (MOHE) Malaysia FRGS Scheme
