PPNet : pooling position attention network for semantic segmentation

Xu, Haixia; Wang, Wei; Wang, Shuailong; Zhou, Wei; Chen, Qi; Peng, Wei

doi:10.1007/s11042-023-16230-y

1230: Sentient Multimedia Systems and Visual Intelligence
Published: 02 September 2023

Volume 83, pages 37007–37023, (2024)
Multimedia Tools and Applications

Haixia Xu (ORCID: orcid.org/0000-0001-8587-7044)¹, Wei Wang¹, Shuailong Wang¹, Wei Zhou¹, Qi Chen¹ & Wei Peng¹

211 Accesses

Abstract

Semantic segmentation with attention modules has made great progress in many computer vision tasks. However, attention modules ignore some boundary information. To explore a more comprehensive map of context features, we propose a pooling position attention network (PPNet) for semantic segmentation. Based on the encoder-decoder structure, we introduce attention modules into the encoder to enhance the correlation between deep information. The pooling cross attention module (PCAM) weights deep semantic information and expands the feature recognition area, and the pooling position attention module (PPAM) operates on the weighted features to generate features with strong semantic information. Finally, the decoder fuses the enhanced deep features with shallow features to strengthen the dependency between pixels and achieve better semantic segmentation. Experiments show that our proposed PPNet outperforms other state-of-the-art models in segmentation accuracy on the PASCAL VOC 2012 and Cityscapes datasets.
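
The abstract gives no formulas for PCAM or PPAM, so the following is only a minimal sketch of the general idea of combining pooling with position attention, not the authors' modules. It assumes a generic dot-product position attention in which each pixel attends to average-pooled regions of the same feature map and the attended context is added back as a residual. The names `avg_pool`, `pooled_position_attention`, the window size `k`, and the scale `gamma` are hypothetical and chosen for illustration.

```python
import math


def avg_pool(feat, k):
    """Average non-overlapping k x k windows of an H x W x C feature map.

    Returns a flat list of pooled C-dimensional vectors.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for i in range(0, h - k + 1, k):
        for j in range(0, w - k + 1, k):
            window = [feat[i + di][j + dj] for di in range(k) for dj in range(k)]
            pooled.append([sum(v[ch] for v in window) / len(window) for ch in range(c)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooled_position_attention(feat, k=2, gamma=1.0):
    """Assumed illustration: every pixel attends to pooled regions.

    Pooling shrinks the set of keys from H*W pixels to (H/k)*(W/k) regions,
    so the attention map is smaller than in full pixel-to-pixel attention.
    """
    keys = avg_pool(feat, k)  # pooled vectors act as both keys and values
    out = []
    for row in feat:
        out_row = []
        for q in row:
            # Similarity between this pixel and each pooled region.
            weights = softmax([sum(a * b for a, b in zip(q, key)) for key in keys])
            # Attention-weighted sum of the pooled region features.
            context = [
                sum(w * key[ch] for w, key in zip(weights, keys))
                for ch in range(len(q))
            ]
            # Residual connection: original feature plus scaled context.
            out_row.append([x + gamma * c for x, c in zip(q, context)])
        out.append(out_row)
    return out
```

In a real network the queries, keys, and values would come from learned projections and the computation would run on tensors; the nested lists here only make the data flow explicit.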

Data Availability Statement

The data underlying this article will be shared on reasonable request to the corresponding author.

Acknowledgements

This work was supported in part by the Key Program of Hunan Provincial Department of Education (22A0127), by the Key Laboratory Fund Project (No. 2023ICIP07, No. 2023ICIP03), and in part by the Natural Science Foundation of China (No. 62003288).

Author information

Authors and Affiliations

School of Automation and Electronic Information, XiangTan University, XiangTan, China
Haixia Xu, Wei Wang, Shuailong Wang, Wei Zhou, Qi Chen & Wei Peng

Corresponding author

Correspondence to Haixia Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, H., Wang, W., Wang, S. et al. PPNet : pooling position attention network for semantic segmentation. Multimed Tools Appl 83, 37007–37023 (2024). https://doi.org/10.1007/s11042-023-16230-y

Received: 20 May 2022
Revised: 15 May 2023
Accepted: 04 July 2023
Published: 02 September 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11042-023-16230-y
