Abstract
Feature maps in deep neural networks generally carry different semantics. Existing methods often ignore these characteristics, which may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network that effectively exploits multi-scale feature maps according to their characteristics. Shallow layers generally contain more local information, whereas deep layers have an advantage in global semantics. By exploiting the different semantics of feature maps at different layers, our network generates detailed saliency maps. On the one hand, the local information of shallow layers is enhanced by a recurrent structure that shares convolution kernels across time steps. On the other hand, the global information of deep layers is exploited by a self-attention module that generates attention weights for salient objects and backgrounds, thereby improving performance. Experimental results on four widely used datasets demonstrate that our method achieves better performance than existing algorithms.
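The abstract names two mechanisms but does not give their equations. The sketch below illustrates generic versions of both under stated assumptions; it is not the authors' implementation. The recurrent part assumes the recurrent convolutional layer form of Liang and Hu (CVPR 2015), x(t) = ReLU(w_ff * u + w_rec * x(t-1)), with the same kernels reused at every time step. The attention part assumes a SAGAN-style block (Zhang et al. 2018) with key, query, and value projections, a softmax over spatial positions, and a residual weight gamma. All function names (`conv2d_same`, `recurrent_conv`, `self_attention`), the single-channel convolution, the number of time steps, and the hand-set weights are hypothetical, and the code is plain Python for readability rather than speed.

```python
import math

# Illustrative sketch only: assumed RCL and SAGAN-style formulations,
# not the paper's exact layers. All names and weights are hypothetical.


def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding ("same" size).
    Assumes kernel k has odd height and width."""
    h, w = len(x), len(x[0])
    rh, rw = len(k) // 2, len(k[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a, krow in enumerate(k):
                for b, kv in enumerate(krow):
                    ii, jj = i + a - rh, j + b - rw
                    if 0 <= ii < h and 0 <= jj < w:
                        s += kv * x[ii][jj]
            out[i][j] = s
    return out


def relu_sum(p, q):
    """Element-wise ReLU(p + q) for two equally sized 2-D maps."""
    return [[max(0.0, a + b) for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]


def recurrent_conv(u, w_ff, w_rec, steps=3):
    """Assumed RCL form: x(0) = ReLU(w_ff * u),
    x(t) = ReLU(w_ff * u + w_rec * x(t-1)).
    The same w_ff and w_rec are reused at every time step."""
    ff = conv2d_same(u, w_ff)  # feed-forward input, computed once
    zeros = [[0.0] * len(row) for row in u]
    x = relu_sum(ff, zeros)
    for _ in range(steps):
        x = relu_sum(ff, conv2d_same(x, w_rec))  # shared recurrent kernel
    return x


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    mx = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - mx) for v in z]
    total = sum(e)
    return [v / total for v in e]


def self_attention(feats, w_f, w_g, w_h, gamma):
    """SAGAN-style self-attention over N positions, each a length-C vector.
    attn[j][i] is the weight position j assigns to position i (rows sum
    to 1); the output is gamma * attended values + input (residual)."""
    f = [matvec(w_f, x) for x in feats]  # keys
    g = [matvec(w_g, x) for x in feats]  # queries
    h = [matvec(w_h, x) for x in feats]  # values (length C)
    n = len(feats)
    attn, out = [], []
    for j in range(n):
        scores = [sum(a * b for a, b in zip(f[i], g[j])) for i in range(n)]
        beta = softmax(scores)
        attn.append(beta)
        o = [sum(beta[i] * h[i][c] for i in range(n)) for c in range(len(h[j]))]
        out.append([gamma * oc + xc for oc, xc in zip(o, feats[j])])
    return out, attn


# Usage with hand-set weights (hypothetical values).
u = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
half = [[0, 0, 0], [0, 0.5, 0], [0, 0, 0]]
refined = recurrent_conv(u, identity, half, steps=3)
print(refined[1][1])  # 1.875 = 1 + 0.5 + 0.25 + 0.125

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(feats, eye, eye, eye, gamma=0.5)
print([round(sum(row), 6) for row in attn])  # [1.0, 1.0, 1.0]
```

Reusing `w_rec` at every step keeps the parameter count constant as the number of iterations grows, while each extra step with a spatial kernel enlarges the receptive field of the refined map. How the paper turns attention weights into separate cues for salient objects and backgrounds is not described in the abstract and is not modeled here.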
Acknowledgments
This work was supported by the Science and Technology Development Plan of Jilin Province under Grant 20170204020GX and the National Science Foundation of China under Grant U1564211.
About this article
Cite this article
Sun, F., Li, W. & Guan, Y. Self-attention recurrent network for saliency detection. Multimed Tools Appl 78, 30793–30807 (2019). https://doi.org/10.1007/s11042-018-6591-3
DOI: https://doi.org/10.1007/s11042-018-6591-3