
A Unified Framework for Jointly Compressing Visual and Semantic Data

Published: 15 May 2024

    Abstract

    The rapid advancement of multimedia and imaging technologies has produced increasingly diverse visual and semantic data. A wide range of applications, such as remote-assisted driving, require these data to be stored and transmitted together. However, existing methods do not sufficiently exploit the redundancy between different types of data. In this article, we propose a unified framework to jointly compress a diverse spectrum of visual and semantic data, including images, point clouds, segmentation maps, object attributes, and relations. We develop a unifying process that embeds the representations of these data into a joint embedding graph according to their categories, which enables flexible handling of joint compression tasks for various visual and semantic data. To fully leverage the redundancy between different data types, we further introduce an embedding-based adaptive joint encoding process and a Semantic Adaptation Module to efficiently encode diverse data based on the learned embeddings in the joint embedding graph. Experiments on the Cityscapes, MSCOCO, and KITTI datasets demonstrate the superiority of our framework, highlighting promising steps toward scalable multimedia processing.



      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 7
      July 2024
      973 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3613662
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 15 May 2024
      Online AM: 28 March 2024
      Accepted: 21 March 2024
      Revised: 26 January 2024
      Received: 11 December 2023
      Published in TOMM Volume 20, Issue 7

      Author Tags

      1. Joint visual and semantic data compression
      2. visual semantic data
      3. multimedia processing
      4. unified compression framework

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Ministry of Higher Education (MOHE) Malaysia FRGS Scheme
