
DSFNet: dynamic selection-fusion networks for video salient object detection

Published in: Multimedia Tools and Applications

Abstract

Effectively fusing spatiotemporal cues is key to improving the accuracy of video salient object detection. Although most existing methods have achieved considerable success with their fusion strategies, the reliability of the spatiotemporal cues themselves has received little attention, and unreliable cues can corrupt the final saliency results. In this work, we propose a dynamic selection-fusion network (DSFNet) for video salient object detection, built from two branches. The first is a spatial learning network, which learns from the video sequence to produce per-frame spatial saliency. The second is a spatiotemporal contrast network, which derives dynamic spatiotemporal saliency from the video sequence and its corresponding optical flow images in a synchronized manner. To further screen and fuse the spatiotemporal cues, we develop a set of joint selection modules, namely a contrast transformation module (CTM), a contrast analysis module (CAM) and a selection guidance module (SGM), which together select reliable spatiotemporal features. In addition, a fusion refinement module (FRM) further refines and enhances the fused features. Experimental results show that the proposed method markedly outperforms competing algorithms in handling distorted motion information and spatiotemporally irrelevant saliency cues.
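To make the dual-branch idea concrete, the sketch below shows one possible way to gate and fuse an RGB (spatial) feature stream with an optical-flow (spatiotemporal) feature stream. This is a minimal, hypothetical illustration only, not the authors' DSFNet implementation: the class names (SelectionGate, DualBranchVSOD), the tiny single-layer encoders and the channel-attention gating formulation are assumptions made for readability.

```python
# Hypothetical sketch of a dual-branch selection-fusion design for video
# salient object detection. NOT the authors' code; all module choices here
# (encoders, gate, decoder) are illustrative placeholders.
import torch
import torch.nn as nn


class SelectionGate(nn.Module):
    """Weighs the spatial vs. spatiotemporal feature maps before fusion,
    loosely analogous in spirit to the selection modules described above."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # global context
            nn.Conv2d(channels * 2, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=1),        # one score per branch
            nn.Softmax(dim=1),
        )

    def forward(self, f_spatial, f_temporal):
        w = self.score(torch.cat([f_spatial, f_temporal], dim=1))  # (B, 2, 1, 1)
        return w[:, 0:1] * f_spatial + w[:, 1:2] * f_temporal


class DualBranchVSOD(nn.Module):
    """Two encoders (RGB frame and optical flow), gated fusion, simple decoder."""

    def __init__(self, channels=64):
        super().__init__()
        self.rgb_encoder = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),
                                         nn.ReLU(inplace=True))
        self.flow_encoder = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1),
                                          nn.ReLU(inplace=True))
        self.gate = SelectionGate(channels)
        self.decoder = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel saliency logit

    def forward(self, frame, flow):
        f_s = self.rgb_encoder(frame)   # spatial branch
        f_t = self.flow_encoder(flow)   # spatiotemporal (motion) branch
        fused = self.gate(f_s, f_t)     # favour the more reliable cue
        return torch.sigmoid(self.decoder(fused))


# Usage: both inputs are (B, 3, H, W) tensors (frame and its optical flow image).
# saliency = DualBranchVSOD()(frame_batch, flow_batch)  -> (B, 1, H, W)
```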


Data availability

The datasets generated during and/or analysed during the current study are not publicly available due to [REASON(S) WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China Youth Fund (No.62202142), the Science and Technology Foundation of Henan Province of China (No.212102210156) and the Scientific Research Key Foundation of Higher Education Institutions of Henan Province (No.23A520025).

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jun Wang, Zhu Huang, Ziqing Huang and Miaohui Zhang. The first draft of the manuscript was written by Xing Ren and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xing Ren.

Ethics declarations

Informed consent

The data used did not involve human participants or animal studies.

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "DSFNet: Dynamic Selection-fusion Networks for Video Salient Object Detection".

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, J., Huang, Z., Huang, Z. et al. DSFNet: dynamic selection-fusion networks for video salient object detection. Multimed Tools Appl 83, 53139–53164 (2024). https://doi.org/10.1007/s11042-023-17614-w

