Self-attention recurrent network for saliency detection

Multimedia Tools and Applications

Abstract

Feature maps at different layers of a deep neural network generally carry different semantics. Existing methods often ignore these characteristics, which may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network that effectively utilizes multi-scale feature maps according to their characteristics. Shallow layers generally contain more local information, whereas deep layers have advantages in global semantics. Our network therefore generates elaborate saliency maps by exploiting the different semantics of feature maps in different layers. On one hand, the local information of shallow layers is enhanced by a recurrent structure that shares convolution kernels across time steps. On the other hand, the global information of deep layers is utilized by a self-attention module, which generates attention weights for salient objects and backgrounds and thus achieves better performance. Experimental results on four widely used datasets demonstrate that our method has performance advantages over existing algorithms.
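
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, whose details are not available in this preview. It assumes 1-D signals instead of 2-D feature maps, hand-picked kernels instead of learned ones, a ReLU nonlinearity, and identity query/key/value projections. All function names (`conv1d_same`, `recurrent_conv`, `self_attention`) are hypothetical and chosen only for illustration.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernel)."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def recurrent_conv(x, ff_kernel, rec_kernel, steps):
    """Recurrent convolution: the SAME two kernels are reused at every step.

    Assumed update rule: state_t = relu(ff_kernel * x + rec_kernel * state_{t-1}).
    """
    feed_forward = conv1d_same(x, ff_kernel)      # computed once from the input
    state = [max(0.0, v) for v in feed_forward]   # state at step 0
    for _ in range(steps):
        recurrent = conv1d_same(state, rec_kernel)  # shared kernel, new state
        state = [max(0.0, f + r) for f, r in zip(feed_forward, recurrent)]
    return state


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over positions.

    `features` is a list of equal-length vectors, one per spatial position.
    Queries, keys and values are the features themselves (an assumption;
    practical modules usually apply learned projections first).
    Returns (attended_features, attention_weights).
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    attended = [
        [sum(w * v[c] for w, v in zip(row, features)) for c in range(d)]
        for row in weights
    ]
    return attended, weights
```

In this sketch, each extra step of `recurrent_conv` widens the neighbourhood that influences a position without adding parameters, which is one way to read "enhancing local information" in shallow layers. `self_attention` lets every position weight every other position, which matches the global role the abstract assigns to deep layers.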

Acknowledgments

This work was supported by the Science and Technology Development Plan of Jilin Province under Grant 20170204020GX and the National Science Foundation of China under Grant U1564211.

Author information

Corresponding author

Correspondence to Wenhui Li.

About this article

Cite this article

Sun, F., Li, W. & Guan, Y. Self-attention recurrent network for saliency detection. Multimed Tools Appl 78, 30793–30807 (2019). https://doi.org/10.1007/s11042-018-6591-3

  • DOI: https://doi.org/10.1007/s11042-018-6591-3
