DOI:10.1145/3503161.3548136
research-article

Disparity-based Stereo Image Compression with Aligned Cross-View Priors

Published: 10 October 2022

    Abstract

    With the wide application of stereo images in various fields, research on stereo image compression (SIC) has attracted extensive attention from academia and industry. The core of SIC is to fully exploit the mutual information between the left and right images and to reduce the redundancy between views as much as possible. In this paper, we propose DispSIC, an end-to-end trainable deep neural network in which we jointly train a stereo matching model to assist the image compression task. Based on the stereo matching results (i.e., the disparity), the right image can be easily warped to the left view, and only the residuals between the left and right views are encoded for the left image. DispSIC adopts a three-branch auto-encoder architecture, which encodes the right image, the disparity map, and the residuals, respectively. During training, the whole network learns how to adaptively allocate bitrate among these three parts, achieving better rate-distortion performance at the cost of a low disparity-map bitrate. Moreover, we propose a conditional entropy model with aligned cross-view priors for SIC, which takes the warped latents of the right image as priors to improve the accuracy of the probability estimation for the left image. Experimental results demonstrate that the proposed method achieves superior performance compared to other existing SIC methods on the KITTI and InStereo2K datasets, both quantitatively and qualitatively.
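    The warping, residual, and aligned-prior ideas in the abstract can be illustrated with a toy sketch. This is not the DispSIC network: it assumes a single-row, rectified image pair with integer left-view disparities, and it stands in for the learned conditional entropy model with a fixed-scale Gaussian whose mean is the prior. All function names and pixel values are hypothetical.

```python
import math


def warp_right_to_left(right, disparity):
    """Warp a rectified right-view image into the left view.

    Assumption: disparities are non-negative integers defined on the left
    view, so the left pixel at column x matches the right pixel at column
    x - d. Source columns outside the image are clamped to the border.
    """
    warped = []
    for row, disp_row in zip(right, disparity):
        last = len(row) - 1
        warped.append([row[min(max(x - d, 0), last)]
                       for x, d in enumerate(disp_row)])
    return warped


def subtract(a, b):
    """Element-wise difference of two images stored as nested lists."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise sum of two images stored as nested lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def gaussian_bits(value, mean, scale):
    """Bits needed for an integer under a Gaussian integrated over
    [value - 0.5, value + 0.5]; a stand-in for a learned entropy model."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mean) / (scale * math.sqrt(2.0))))
    prob = max(cdf(value + 0.5) - cdf(value - 0.5), 1e-9)
    return -math.log2(prob)


def total_bits(image, prior, scale=2.0):
    """Sum the estimated bits of `image` when `prior` supplies the means."""
    return sum(gaussian_bits(v, m, scale)
               for row, prior_row in zip(image, prior)
               for v, m in zip(row, prior_row))


# Hypothetical one-row stereo pair: the left view is the right view shifted
# by one pixel, plus a little noise.
right = [[10, 20, 30, 40, 50]]
left = [[11, 10, 21, 30, 40]]
disparity = [[1, 1, 1, 1, 1]]

warped = warp_right_to_left(right, disparity)   # [[10, 10, 20, 30, 40]]
res = subtract(left, warped)                    # [[1, 0, 1, 0, 0]]
reconstructed = add(warped, res)                # decoder side: equals left

# An aligned prior (the warped right view) predicts the left view better than
# the unaligned right view, so the estimated bit cost is lower.
aligned_bits = total_bits(left, warped)
unaligned_bits = total_bits(left, right)
```

    In this sketch the residual is small wherever the disparity is correct, which is the reason only the residuals need to be coded for the left image. In practice the warp would be differentiable and sub-pixel, and the prior would be applied to latents rather than pixels.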

    Supplementary Material

    MP4 File (MM-fp1638.mp4)
    Recently, learning-based stereo image compression methods have achieved promising results. However, they still suffer from high computational complexity and insufficient compression performance. In this paper, we propose a disparity-based stereo image compression framework, namely DispSIC, which significantly outperforms state-of-the-art deep stereo image compression methods.


    Cited By

    • (2023) Improving Multi-generation Robustness of Learned Image Compression. 2023 IEEE International Conference on Multimedia and Expo (ICME), 2525-2530. DOI:10.1109/ICME55011.2023.00430. Online publication date: Jul-2023

      Published In

      MM '22: Proceedings of the 30th ACM International Conference on Multimedia
      October 2022
      7537 pages
      ISBN:9781450392037
      DOI:10.1145/3503161
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. deep learning
      2. stereo image compression
      3. stereo matching

      Acceptance Rates

      Overall Acceptance Rate 995 of 4,171 submissions, 24%
