DOI: 10.1145/3552458.3556451
Research article

PSINet: Progressive Saliency Iteration Network for RGB-D Salient Object Detection

Published: 10 October 2022
  • Abstract

    RGB-D Salient Object Detection (RGB-D SOD) is a pixel-level dense prediction task that highlights the prominent object in a scene by combining color information with depth constraints. Attention mechanisms have been widely employed in SOD due to their ability to capture important cues. However, most existing attention mechanisms (e.g., spatial attention, channel attention, self-attention) mainly exploit pixel-level attention maps, ignoring the region properties of salient objects. To remedy this issue, we propose a progressive saliency iteration network (PSINet) with region-wise saliency attention to improve the regional integrity of salient objects in an iterative manner. Specifically, two-stream Swin Transformers are first employed to extract RGB and depth features. Second, a multi-modality alternate and inverse module (AIM) is designed to extract complementary features from RGB-D images in an interleaved manner, which breaks down the inconsistency between the cross-modal data while fully capturing their complementarity. Third, a triple progressive iteration decoder (TPID) is proposed to optimize the salient objects: a coarse saliency map, generated by integrating multi-scale features with a U-Net, is used as a region-wise attention map in a region-wise saliency attention module (RSAM), which emphasizes the prominent regions of the features. Finally, the regional integrity of salient objects is gradually optimized from coarse to fine by iterating the above steps in the TPID. Quantitative and qualitative experiments demonstrate that the proposed model performs favorably against 19 state-of-the-art (SOTA) saliency detectors on five benchmark RGB-D SOD datasets.
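    The coarse-to-fine loop described in the abstract (coarse saliency map reused as a region-wise attention map, then iterated through the decoder) can be illustrated with a toy NumPy sketch. This is an illustrative assumption, not the authors' RSAM/TPID implementation: `region_wise_attention` and the sigmoid "decoder" stand-in below are hypothetical simplifications of the paper's learned modules.

    ```python
    import numpy as np

    def region_wise_attention(feat, sal):
        """Toy stand-in for RSAM: emphasize features inside the coarsely
        salient region. feat: (C, H, W) features, sal: (H, W) map in [0, 1]."""
        # Residual re-weighting: salient regions are amplified, the rest kept.
        return feat * (1.0 + sal[None, :, :])

    def progressive_refine(feat, coarse_sal, steps=3):
        """Toy stand-in for the TPID loop: attend -> decode -> new saliency map,
        repeated so the map is refined from coarse to fine."""
        sal = coarse_sal
        for _ in range(steps):
            attended = region_wise_attention(feat, sal)
            # Stand-in "decoder": channel mean squashed to [0, 1] by a sigmoid.
            sal = 1.0 / (1.0 + np.exp(-attended.mean(axis=0)))
        return sal

    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 8, 8))   # pretend fused RGB-D features
    coarse = rng.random((8, 8))             # pretend U-Net coarse saliency map
    refined = progressive_refine(feat, coarse)
    print(refined.shape)
    ```

    In the actual network each iteration runs through learned decoder layers and supervision, but the structural point is the same: the previous saliency map gates which regions of the features the next decoding pass emphasizes.
    
    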

    Supplementary Material

    MP4 File (HCMA22-hcma12p.mp4)
    This video introduces our work on salient object detection, titled "Progressive Saliency Iteration Network for RGB-D Salient Object Detection"; it covers the introduction, motivation, solution, experiments, and ablation study.



      Published In

      HCMA '22: Proceedings of the 3rd International Workshop on Human-Centric Multimedia Analysis
      October 2022
      106 pages
      ISBN:9781450394925
      DOI:10.1145/3552458
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. attention
      2. progressive iteration
      3. regional integrity
      4. rgb-d images
      5. salient object detection

      Qualifiers

      • Research-article

      Funding Sources

      • University-level key projects of Anhui University of Science and Technology
      • University Synergy Innovation Program of Anhui Province
      • the National Natural Science Foundation of China
      • University-level general projects of Anhui University of Science and Technology
      • Natural Science Research Project of Colleges and Universities in Anhui Province
      • Anhui Natural Science Foundation

      Conference

      MM '22

      Acceptance Rates

      HCMA '22 Paper Acceptance Rate 12 of 21 submissions, 57%;
      Overall Acceptance Rate 12 of 21 submissions, 57%

