Digging into Depth and Color Spaces: A Mapping Constraint Network for Depth Super-Resolution

Published: 30 October 2024

Abstract

Scene depth super-resolution (DSR) is an inherently ill-posed problem: a given low-resolution (LR) depth map, which carries only limited depth information, admits an extremely large space of one-to-many mappings to plausible high-resolution (HR) depth maps. This makes the task highly challenging, since identifying an optimal solution among this multitude of potential mappings is intricate. While simple constraints have been proposed to address the DSR task, the relationship among the LR depth map, the HR depth map, and the color image has not been thoroughly investigated. In this paper, we introduce a novel mapping constraint network (MCNet) that incorporates additional constraints derived from both LR depth maps and color images, aiming to narrow the space of mapping functions and enhance DSR performance. Specifically, alongside the primary DSR network (DSRNet) dedicated to learning the LR-to-HR mapping, we develop an auxiliary degradation network (ADNet) that operates in reverse, regenerating the LR depth map from the reconstructed HR depth map to obtain depth features in LR space. To enhance DSRNet's learning of the LR-to-HR mapping, we introduce two mapping constraints in LR space: (1) a cycle-consistent constraint, which provides additional supervision by establishing a closed loop between the LR-to-HR and HR-to-LR mappings, and (2) a region-level contrastive constraint, which reinforces region-specific HR representations by explicitly modeling the consistency between LR and HR spaces. To leverage the color image effectively, we introduce a feature screening module that adaptively fuses color features at different layers, simultaneously maintaining strong structural context and suppressing texture distraction through subspace generation and image projection. Comprehensive experiments on synthetic and real-world benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art DSR methods. Our MCNet achieves average MAD reductions of 3.7% and 7.5% over the state-of-the-art DSR method for the ×8 and ×16 cases on the Middlebury dataset, respectively, without incurring additional cost during inference.
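Since the abstract only sketches the two LR-space constraints at a high level, the following is a minimal PyTorch-style sketch of how a cycle-consistent constraint and a region-level contrastive constraint could be wired around an LR-to-HR network and an HR-to-LR degradation network. The module definitions (`DSRNet`, `ADNet`, the toy feature encoder), the InfoNCE-style formulation, and the loss weights are all illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSRNet(nn.Module):
    """Toy LR-to-HR depth network: bicubic upsampling plus a residual refinement.
    Stands in for the paper's DSRNet, whose architecture the abstract does not specify."""
    def __init__(self, scale: int = 8):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, lr_depth):
        up = F.interpolate(lr_depth, scale_factor=self.scale,
                           mode="bicubic", align_corners=False)
        return up + self.body(up)

class ADNet(nn.Module):
    """Toy HR-to-LR degradation network: learned filtering after average pooling.
    Stands in for the paper's auxiliary degradation network."""
    def __init__(self, scale: int = 8):
        super().__init__()
        self.body = nn.Conv2d(1, 1, 3, padding=1)
        self.scale = scale

    def forward(self, hr_depth):
        return self.body(F.avg_pool2d(hr_depth, self.scale))

def region_contrastive_loss(lr_feat, hr_feat, grid: int = 4, tau: float = 0.1):
    """InfoNCE over region-pooled features: each LR region is pulled toward the
    HR region at the same location (positive) and pushed from the others (negatives)."""
    b = lr_feat.size(0)
    z_lr = F.adaptive_avg_pool2d(lr_feat, grid).flatten(2).transpose(1, 2)  # B x R x C
    z_hr = F.adaptive_avg_pool2d(hr_feat, grid).flatten(2).transpose(1, 2)  # B x R x C
    z_lr, z_hr = F.normalize(z_lr, dim=-1), F.normalize(z_hr, dim=-1)
    logits = torch.bmm(z_lr, z_hr.transpose(1, 2)) / tau                    # B x R x R
    target = torch.arange(grid * grid, device=lr_feat.device).expand(b, -1)
    return F.cross_entropy(logits.reshape(-1, grid * grid), target.reshape(-1))

scale = 8
dsr, adn = DSRNet(scale), ADNet(scale)
enc = nn.Conv2d(1, 32, 3, padding=1)  # toy feature extractor for the contrast term
opt = torch.optim.Adam(
    [*dsr.parameters(), *adn.parameters(), *enc.parameters()], lr=1e-4)

lr_depth = torch.rand(4, 1, 32, 32)    # toy LR depth batch
hr_gt = torch.rand(4, 1, 256, 256)     # toy HR ground truth

hr_pred = dsr(lr_depth)                # LR-to-HR mapping
lr_cycle = adn(hr_pred)                # HR-to-LR mapping closes the loop

loss_rec = F.l1_loss(hr_pred, hr_gt)        # primary reconstruction term
loss_cyc = F.l1_loss(lr_cycle, lr_depth)    # cycle-consistent constraint in LR space
# Contrast region features of the LR input against the degraded HR prediction;
# the real method contrasts learned LR/HR representations, not raw depth.
loss_con = region_contrastive_loss(enc(lr_depth),
                                   enc(F.avg_pool2d(hr_pred, scale)))

opt.zero_grad()
(loss_rec + 0.1 * loss_cyc + 0.01 * loss_con).backward()  # assumed weights
opt.step()
```

The design point mirrored here is that both auxiliary losses live in LR space, so ADNet and the contrastive head can be discarded after training, consistent with the abstract's claim of no additional cost during inference.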


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 10
    October 2024, 729 pages
    EISSN: 1551-6865
    DOI: 10.1145/3613707
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 October 2024
    Online AM: 10 July 2024
    Accepted: 29 June 2024
    Revised: 19 June 2024
    Received: 16 April 2024
    Published in TOMM Volume 20, Issue 10


    Author Tags

    1. Depth map
    2. super-resolution
    3. cycle-consistent
    4. contrastive constraint
    5. screening

    Qualifiers

    • Research-article

