Multi-modality information refinement fusion network for RGB-D salient object detection

Original article
Published in: The Visual Computer

Abstract

RGB-D salient object detection (SOD) has attracted increasing research interest in recent years. Because RGB and depth images are produced by different imaging mechanisms, the two modalities carry different information. Consequently, effectively fusing multi-modality features and aggregating multi-scale features to generate accurate saliency predictions remain open problems. In this article, we present a Multi-Modality Information Refinement Fusion Network (MIRFNet) for RGB-D SOD to address these problems. Specifically, a Feature-Enhancement and Cross-Refinement Module (FCM) is proposed to reduce redundant features and narrow the gap between cross-modality data, thereby achieving effective multi-modality feature fusion. In FCM, the Feature-Enhancement step uses attention mechanisms to obtain enhanced features that contain less redundant information and more common salient information, and the Cross-Refinement step uses the enhanced features to reduce the gap between cross-modality features and fuse them effectively. Then, we propose an Edge Guidance Module (EGM) to extract edge information from RGB features. Finally, to effectively aggregate multi-level features and achieve accurate saliency prediction, we design a Feature-Aggregation and Edge-Refinement Module (FEM), which introduces modality-specific information and edge information to enable sufficient information interaction. In FEM, the Feature-Aggregation step aggregates multi-scale features with modality-specific information, and the Edge-Refinement step uses edge information to refine the aggregated features. Extensive experiments on five datasets demonstrate that MIRFNet achieves performance comparable to that of 12 other state-of-the-art (SOTA) methods.
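The abstract describes FCM only at a high level: attention-based enhancement of each modality, followed by cross-refinement and fusion. The NumPy sketch below illustrates one plausible reading of that pipeline; it is not the authors' implementation. It assumes SE-style channel attention followed by a simplified, parameter-free spatial gate for the Feature-Enhancement step, and, for the Cross-Refinement step, a residual gating in which each modality is modulated by a spatial map computed from the other. All function names (`channel_attention`, `spatial_gate`, `enhance`, `cross_refine_fuse`, `fcm_sketch`), the weight shapes, and the summation-based fusion are hypothetical; a real network would apply learned convolutions to backbone features.

```python
import numpy as np

# Hypothetical sketch of an FCM-style fusion step; NOT the MIRFNet implementation.
# All features are (C, H, W) arrays; weights are random stand-ins for learned ones.


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE-style channel attention. w1: (C // r, C), w2: (C, C // r)."""
    squeezed = feat.mean(axis=(1, 2))                        # global average pooling -> (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # FC -> ReLU -> FC -> sigmoid
    return feat * weights[:, None, None]                     # rescale each channel


def spatial_gate(feat: np.ndarray) -> np.ndarray:
    """Simplified spatial attention map in (0, 1): sigmoid(mean + max over channels)."""
    return sigmoid(feat.mean(axis=0) + feat.max(axis=0))     # -> (H, W)


def enhance(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Feature-Enhancement sketch: channel attention, then spatial attention."""
    ca = channel_attention(feat, w1, w2)
    return ca * spatial_gate(ca)[None, :, :]


def cross_refine_fuse(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Cross-Refinement sketch: each modality is residually gated by the other, then summed."""
    rgb_ref = rgb + rgb * spatial_gate(depth)[None, :, :]
    depth_ref = depth + depth * spatial_gate(rgb)[None, :, :]
    return rgb_ref + depth_ref


def fcm_sketch(rgb: np.ndarray, depth: np.ndarray, params: dict) -> np.ndarray:
    """Enhance each modality separately, then cross-refine and fuse."""
    rgb_e = enhance(rgb, *params["rgb"])
    depth_e = enhance(depth, *params["depth"])
    return cross_refine_fuse(rgb_e, depth_e)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, r = 8, 4, 4, 2                                  # toy sizes and reduction ratio r
    params = {
        m: (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
        for m in ("rgb", "depth")
    }
    rgb = rng.standard_normal((C, H, W))
    depth = rng.standard_normal((C, H, W))
    fused = fcm_sketch(rgb, depth, params)
    print(fused.shape)  # (8, 4, 4)
```

Running the script prints the fused feature shape, `(8, 4, 4)`. Because each gate lies in (0, 1), the residual form `x + x * gate` keeps every output between the original features and twice their value, so a weak gate from one modality cannot erase the other modality's features; this is one common way to limit the influence of noisy depth maps, assumed here rather than taken from the paper.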

Figures: Fig. 1 to Fig. 11; Algorithm 1.

Data Availability

The data that has been used is confidential.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Hua Bao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bao, H., Fan, B. Multi-modality information refinement fusion network for RGB-D salient object detection. Vis Comput 40, 4183–4199 (2024). https://doi.org/10.1007/s00371-023-03076-6

DOI: https://doi.org/10.1007/s00371-023-03076-6