Abstract
How to effectively fuse spatiotemporal cues is key to improving the accuracy of video salient object detection. Although most existing methods have achieved considerable success with their fusion strategies, the reliability of spatiotemporal cues needs further investigation, since unreliable cues can corrupt the final saliency results. In this work, we propose a novel dynamic selection-fusion network (DSFNet) for video salient object detection, jointly constructed from two branches. The first is a spatial learning network, which learns from the video sequence to obtain the spatial saliency of each frame. The second is a spatiotemporal contrast network, which obtains dynamic spatiotemporal saliency in a synchronized state by learning from the video sequence and the corresponding optical flow images. To further screen and fuse the spatiotemporal cues, a series of joint selection modules is developed, mainly comprising a contrast transformation module (CTM), a contrast analysis module (CAM) and a selection guidance module (SGM), which together select reliable spatiotemporal features. In addition, a fusion refinement module (FRM) is designed to further refine and enhance the input features. Experimental results show that the proposed method significantly outperforms other algorithms in handling motion-information distortion and spatiotemporally irrelevant saliency.
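The core idea of selection before fusion, down-weighting the motion cue wherever it looks unreliable, can be illustrated with a minimal NumPy sketch. Note this is only a hand-crafted toy: in DSFNet the selection modules (CTM, CAM, SGM) and the fusion refinement module (FRM) are learned networks, and the function names and agreement-based weighting below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def reliability_weight(spatial, motion):
    # Pixel-wise agreement between the two saliency cues, in [0, 1].
    # Low agreement suggests the motion cue is unreliable
    # (e.g. distorted optical flow on a dynamic background).
    return 1.0 - np.abs(spatial - motion)

def select_and_fuse(spatial, motion):
    # Dynamic selection-fusion: trust the motion cue only where it
    # agrees with the spatial cue; elsewhere fall back on spatial
    # saliency, so unreliable motion cannot corrupt the result.
    w = reliability_weight(spatial, motion)
    return np.clip(w * motion + (1.0 - w) * spatial, 0.0, 1.0)

# Toy 2x2 saliency maps in [0, 1]
spatial = np.array([[0.9, 0.1],
                    [0.5, 0.0]])
motion = np.array([[0.8, 0.9],
                   [0.5, 0.0]])
fused = select_and_fuse(spatial, motion)
```

Where the cues agree (bottom row), the fused map simply keeps their shared value; where they contradict each other (top-right pixel, 0.1 vs 0.9), the weight collapses and the fused value stays close to the spatial cue.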
Data availability
The datasets generated during and/or analysed during the current study are not publicly available due to [REASON(S) WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China Youth Fund (No.62202142), the Science and Technology Foundation of Henan Province of China (No.212102210156) and the Scientific Research Key Foundation of Higher Education Institutions of Henan Province (No.23A520025).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jun Wang, Zhu Huang, Ziqing Huang and Miaohui Zhang. The first draft of the manuscript was written by Xing Ren and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Informed consent
The data used in this study did not involve human participants or animal studies.
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "DSFNet: Dynamic Selection-fusion Networks for Video Salient Object Detection".
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Huang, Z., Huang, Z. et al. DSFNet: dynamic selection-fusion networks for video salient object detection. Multimed Tools Appl 83, 53139–53164 (2024). https://doi.org/10.1007/s11042-023-17614-w