
Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection

Published: 15 May 2024

Abstract

While significant progress has been made in salient object detection in recent years, limitations remain in heterogeneous modality fusion and salient feature integrity learning. The former stems largely from insufficient attention to fusing cross-scale information between modalities when processing multi-modal heterogeneous data, coupled with the absence of mechanisms that adaptively control each modality's contribution. The latter stems from the shortcomings of existing approaches in predicting the integrity of salient regions. To address these problems, we propose a Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection (HFIL-Net). For the first challenge, we design an Advanced Semantic Guidance Aggregation (ASGA) module, which uses three fusion blocks to aggregate three types of information: within-scale cross-modal, within-modal cross-scale, and cross-modal cross-scale. In addition, we embed local fusion factor matrices in the ASGA module and use global fusion factor matrices in the Multi-modal Information Adaptive Fusion module to adaptively control the respective contributions from different perspectives during fusion. For the second issue, we introduce the Feature Integrity Learning and Refinement module, which leverages the "part-whole" relationships of capsule networks to learn feature integrity and further refines the learned features through attention mechanisms. Extensive experiments on seven challenging standard datasets demonstrate that the proposed HFIL-Net outperforms 17 state-of-the-art detection methods. Code and results are available at https://github.com/BojueGao/HFIL-Net.
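To make the fusion-factor idea concrete, below is a minimal PyTorch sketch, not the authors' implementation, of the kind of adaptive aggregation the abstract describes: RGB and depth features from two scales are combined along three paths (within-scale cross-modal, within-modal cross-scale, and cross-modal cross-scale), each gated by a learnable fusion factor matrix. All class, variable, and parameter names here are illustrative assumptions.

```python
# Hedged sketch of adaptive multi-path fusion with learnable factor matrices.
# This is NOT the HFIL-Net code; it only illustrates the general mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusionSketch(nn.Module):
    def __init__(self, channels: int, spatial: int):
        super().__init__()
        # One learnable fusion factor matrix per fusion path; a sigmoid keeps
        # each element in (0, 1) so it acts as a per-position soft gate that
        # balances the two inputs of that path.
        self.factor_cm = nn.Parameter(torch.zeros(1, 1, spatial, spatial))  # within-scale cross-modal
        self.factor_cs = nn.Parameter(torch.zeros(1, 1, spatial, spatial))  # within-modal cross-scale
        self.factor_cc = nn.Parameter(torch.zeros(1, 1, spatial, spatial))  # cross-modal cross-scale
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, rgb_hi, rgb_lo, dep_hi, dep_lo):
        # Bring the lower-resolution features up to the high-resolution grid.
        size = rgb_hi.shape[-2:]
        rgb_lo = F.interpolate(rgb_lo, size=size, mode="bilinear", align_corners=False)
        dep_lo = F.interpolate(dep_lo, size=size, mode="bilinear", align_corners=False)

        a = torch.sigmoid(self.factor_cm)
        b = torch.sigmoid(self.factor_cs)
        c = torch.sigmoid(self.factor_cc)

        cross_modal = a * rgb_hi + (1 - a) * dep_hi  # within-scale, across modalities
        cross_scale = b * rgb_hi + (1 - b) * rgb_lo  # within-modal, across scales
        cm_cs       = c * rgb_hi + (1 - c) * dep_lo  # across both modality and scale

        # Merge the three aggregated streams back to the base channel width.
        return self.merge(torch.cat([cross_modal, cross_scale, cm_cs], dim=1))

# Usage: two scales of RGB/depth features with matching channel width.
if __name__ == "__main__":
    fuse = AdaptiveFusionSketch(channels=64, spatial=32)
    rgb_hi, dep_hi = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    rgb_lo, dep_lo = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
    print(fuse(rgb_hi, rgb_lo, dep_hi, dep_lo).shape)  # torch.Size([2, 64, 32, 32])
```

Because the gates are learned parameters rather than fixed weights, the contribution of each modality and scale is tuned by training, which is the "adaptive control" the abstract refers to; the paper's local and global factor matrices apply this idea at different granularities.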


Cited By

  • (2024) A Novel Multi-view Hypergraph Adaptive Fusion Approach for Representation Learning. Proceedings of the Third International Workshop on Social and Metaverse Computing, Sensing and Networking, 43–49. https://doi.org/10.1145/3698387.3700000
  • (2024) Visual Saliency Detection Based on Global Guidance Map and Background Attention. IEEE Access, 12, 95434–95446. https://doi.org/10.1109/ACCESS.2024.3414285


Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 7
    July 2024, 973 pages
    EISSN: 1551-6865
    DOI: 10.1145/3613662
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 May 2024
    Online AM: 05 April 2024
    Accepted: 03 April 2024
    Revised: 13 March 2024
    Received: 23 December 2023
    Published in TOMM Volume 20, Issue 7


    Author Tags

    1. Salient object detection
    2. heterogeneous modality fusion
    3. capsule network
    4. integrity learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Joint Funds of Liaoning Science and Technology Program (Key R&D Plan)
    • Liaoning Revitalization Talents Program
    • Taishan Scholars Program of Shandong Province
    • Fundamental Research Funds for the Central Universities

Article Metrics

    • Downloads (last 12 months): 325
    • Downloads (last 6 weeks): 63
    Reflects downloads up to 10 Nov 2024

