Local-Global Interaction and Progressive Aggregation for Video Salient Object Detection

Min, Dingyao; Zhang, Chao; Lu, Yukang; Fu, Keren; Zhao, Qijun

doi:10.1007/978-981-99-1645-0_9


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1793)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Video salient object detection (VSOD) aims to locate and segment visually distinctive objects in a video sequence. Two problems remain poorly handled in VSOD. First, when spatio-temporal information is unequal and unreliable in complex scenes, existing methods exploit only local information from different hierarchies for interaction and neglect the role of global saliency information. Second, they ignore fused high-level features and thus pay little attention to refining the modality-specific features. To alleviate these issues, we propose a novel framework named IANet, which contains local-global interaction (LGI) modules and progressive aggregation (PA) modules. LGI locally captures complementary representations so that RGB and optical flow (OF) features enhance each other, and globally learns confidence weights for the corresponding saliency branch to support finer interaction. PA evolves and aggregates RGB features, OF features, and up-sampled features from the higher level, refining saliency-related features progressively. Together, these interaction and aggregation designs improve performance. Experimental results on six benchmark datasets demonstrate the superiority of IANet over nine cutting-edge VSOD models.
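The abstract gives no formulas, so the following is only a rough sketch of the two ideas it names, not the authors' implementation. It assumes features are flat lists of floats, stands in a hand-written sigmoid gate for the local interaction, and replaces the learned confidence weights with a simple heuristic (how far a coarse saliency map is from the undecided value 0.5). All function names (`local_interaction`, `global_weights`, `lgi_sketch`, `pa_sketch`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def local_interaction(rgb, flow):
    """Assumed local step: each modality is gated by the other, with a residual."""
    rgb_enh = [r + r * sigmoid(f) for r, f in zip(rgb, flow)]
    flow_enh = [f + f * sigmoid(r) for r, f in zip(rgb, flow)]
    return rgb_enh, flow_enh


def branch_confidence(saliency):
    """Heuristic stand-in for a learned confidence: mean distance from 0.5, in [0, 1]."""
    return sum(abs(2.0 * p - 1.0) for p in saliency) / len(saliency)


def global_weights(sal_rgb, sal_flow):
    """Softmax over the two branch confidences, so the weights sum to 1."""
    e_rgb = math.exp(branch_confidence(sal_rgb))
    e_flow = math.exp(branch_confidence(sal_flow))
    total = e_rgb + e_flow
    return e_rgb / total, e_flow / total


def lgi_sketch(rgb, flow, sal_rgb, sal_flow):
    """Local enhancement followed by global, confidence-based re-weighting."""
    rgb_enh, flow_enh = local_interaction(rgb, flow)
    w_rgb, w_flow = global_weights(sal_rgb, sal_flow)
    return [w_rgb * r for r in rgb_enh], [w_flow * f for f in flow_enh]


def upsample_nearest(feat, factor):
    """Nearest-neighbour up-sampling of a flat feature list."""
    return [v for v in feat for _ in range(factor)]


def pa_sketch(rgb, flow, higher):
    """Assumed aggregation: sum both modalities with the up-sampled higher level.

    len(rgb) must be a whole multiple of len(higher).
    """
    up = upsample_nearest(higher, len(rgb) // len(higher))
    return [r + f + u for r, f, u in zip(rgb, flow, up)]
```

In this sketch a decisive RGB saliency map paired with an undecided OF map shifts weight toward the RGB branch, which mirrors the stated motivation of down-weighting an unreliable modality. A real model would learn both the gates and the weights from data.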



Acknowledgements

This work was supported by the NSFC (62176169, 62176170, 61971005), the SCU-Luzhou Municipal People's Government Strategic Cooperation Project (2020CDLZ-10), and the Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2022KFMS001, ZNJW2022ZZMS001).

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, China
Dingyao Min, Yukang Lu, Keren Fu & Qijun Zhao
Intelligent Policing Key Laboratory of Sichuan Province, Luzhou, 646000, China
Chao Zhang
Sichuan Police College, Luzhou, 646000, China
Chao Zhang


Corresponding authors

Correspondence to Chao Zhang or Keren Fu.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt


Cite this paper

Min, D., Zhang, C., Lu, Y., Fu, K., Zhao, Q. (2023). Local-Global Interaction and Progressive Aggregation for Video Salient Object Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1793. Springer, Singapore. https://doi.org/10.1007/978-981-99-1645-0_9


DOI: https://doi.org/10.1007/978-981-99-1645-0_9
Published: 14 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1644-3
Online ISBN: 978-981-99-1645-0
eBook Packages: Computer Science, Computer Science (R0)
