Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3655532.3655583acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicrsaConference Proceedingsconference-collections
research-article

Image Segmentation by Position-Edge-Aware Network

Published: 28 June 2024 Publication History
  • Get Citation Alerts
  • Abstract

    Existing segmentation methods still suffer from problems with the loss of location information and edge details of segmented objects, resulting in the inability to efficiently model long-range data dependencies, further affecting the segmentation performance of the model. To address this issue, an edge- and position-aware adaptive attention block (EAPA) is designed. It can aggregate features along two spatial directions and then calculate offsets from the input features to obtain a boundary-enhanced feature map. The attention block captures cross-channel information while incorporating boundary details and position sensitivity, enabling the extraction of fine-grained details. Furthermore, to further improve the model's segmentation performance and the feature extraction capability of edge detail information. This paper presents an element-by-element weighted (EW) feature fusion module. This module combines high-level and low-level features to generate a fused feature map that contains rich semantic information and edge details. Experimental results demonstrate that our proposed network architecture with these two modules performs better in the Cityscapes dataset, achieving superior segmentation performance.

    References

    [1]
    Ji Wan, Dayong Wang, Steven Chu Hong Hoi, Pengcheng Wu, Jianke Zhu, Yong-dong Zhang, and Jintao Li. Deep learning for content-based image retrieval: A comprehensive study. In Proceedings of the 22nd ACM international conference on Multimedia, pages 157–166, 2014.
    [2]
    Markus Oberweger, Paul Wohlhart, and Vincent Lepetit. Hands deep in deep learning for hand pose estimation. arXiv preprint arXiv:1502.06807, 2015.
    [3]
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus En-zweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
    [4]
    Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 603–612, 2019.
    [5]
    Qibin Hou, Li Zhang, Ming-Ming Cheng, and Jiashi Feng. Strip pooling: Rethinking spatial pooling for scene parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4003–4012, 2020.
    [6]
    Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
    [7]
    Jongchan Park, Sanghyun Woo, Joon-Young Lee, and In So Kweon. Bam: Bottle-neck attention module. arXiv preprint arXiv:1807.06514, 2018.
    [8]
    Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision, pages 3–19, 2018.
    [9]
    Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
    [10]
    Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
    [11]
    Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2017.
    [12]
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 37(9):1904–1916, 2015.
    [13]
    Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, and Jiaya Jia. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European conference on computer vision, pages 267–283, 2018.
    [14]
    Li Zhang, Dan Xu, Anurag Arnab, and Philip HS Torr. Dynamic graph message passing networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3726–3735, 2020.
    [15]
    Yuhui Yuan, Xilin Chen, and Jingdong Wang. Object-contextual representations for semantic segmentation. In Proceedings of the European conference on computer vision, pages 173–190. Springer, 2020.
    [16]
    Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 43(10):3349–3364, 2020.
    [17]
    Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng, and Shi-Min Hu. Segnext: Rethinking convolutional attention design for semantic segmentation. arXiv preprint arXiv:2209.08575, 2022.
    [18]
    Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Andrea Vedaldi. Gather-excite: Exploiting feature context in convolutional neural networks. Advances in neural information processing systems, 31, 2018.
    [19]
    Drew Linsley, Dan Shiebler, Sven Eberhardt, and Thomas Serre. Learning what and where to attend. arXiv preprint arXiv:1805.08819, 2018.
    [20]
    Diganta Misra, Trikay Nalamada, Ajay Uppili Arasanipalai, and Qibin Hou. Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, pages 3139–3148, 2021.
    [21]
    Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7794–7803, 2018.
    [22]
    Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, and Han Hu. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE international conference on computer vision workshops, pages 0–0, 2019.
    [23]
    Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.
    [24]
    Qibin Hou, Daquan Zhou, and Jiashi Feng. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 13713–13722, 2021.
    [25]
    Rohit Mohan and Abhinav Valada. Efficientps: Efficient panoptic segmentation. International Journal of Computer Vision, 129(5):1551–1579, 2021.
    [26]
    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
    [27]
    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9404–9413, 2019.

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICRSA '23: Proceedings of the 2023 6th International Conference on Robot Systems and Applications
    September 2023
    335 pages
    ISBN:9798400708039
    DOI:10.1145/3655532
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 June 2024

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Attention Mechanisms
    2. Panoptic Segmentation
    3. Semantic Segmentation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICRSA 2023

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 4
      Total Downloads
    • Downloads (Last 12 months)4
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 11 Aug 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media