DOI: 10.1145/3474085.3475402
research-article

End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation

Published: 17 October 2021

    Abstract

    Weakly supervised semantic segmentation (WSSS) is challenging because pixel-level object locations must be recovered from image-level annotations alone. Single-stage methods learn image- and pixel-level labels simultaneously to avoid complicated multi-stage computation and sophisticated training procedures. In this paper, we argue that using a single model for both image- and pixel-level classification forces a trade-off between the two targets and consequently weakens recognition capability, because the image-level task tends to learn position-independent features whereas the pixel-level task tends to be position-sensitive. Hence, we propose an effective encoder-decoder framework that explores object boundaries to resolve this dilemma. The encoder and decoder learn position-independent and position-sensitive features, respectively, during end-to-end training. In addition, a global soft pooling is proposed to suppress the activation of background pixels during encoder training and further improve class activation map (CAM) quality. The edge annotations for decoder training are synthesized from high-confidence CAMs and therefore do not require extra supervision. Extensive experiments on the Pascal VOC12 dataset demonstrate that our method achieves state-of-the-art performance among end-to-end approaches, reaching 63.6% and 65.7% mIoU on the val and test sets, respectively.
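
    The abstract names two ingredients without giving formulas: a global soft pooling that damps background activations, and edge labels synthesized from high-confidence CAMs. The sketch below is one plausible reading, not the authors' implementation. It assumes softmax-weighted spatial pooling as the "soft" pooling, and it assumes edges are marked where two confidently labeled neighboring pixels disagree. The function names, the thresholds (0.6 and 0.2), the temperature, and the ignore value 255 are all hypothetical.

```python
import numpy as np


def global_soft_pool(cam, temperature=1.0):
    """Softmax-weighted spatial pooling of a (C, H, W) activation map.

    Assumption: an illustrative form of soft pooling, not the paper's formula.
    Low-activation (likely background) pixels receive small weights, so they
    contribute less to the image-level score than under average pooling.
    """
    c = cam.shape[0]
    flat = cam.reshape(c, -1).astype(np.float64)
    z = flat / temperature
    z -= z.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(z)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * flat).sum(axis=1)          # one score per class


def synthesize_edge_labels(cam, fg_thresh=0.6, bg_thresh=0.2, ignore=255):
    """Build an (H, W) edge map from a (C, H, W) CAM scaled to [0, 1].

    Assumption: pixels above fg_thresh take the arg-max class, pixels below
    bg_thresh are background (0), everything else is uncertain (ignore).
    Output: 1 = edge, 0 = non-edge, ignore = uncertain pixel.
    """
    score = cam.max(axis=0)
    cls = cam.argmax(axis=0) + 1                 # class ids start at 1
    label = np.full(score.shape, ignore, dtype=np.int64)
    fg = score >= fg_thresh
    label[fg] = cls[fg]
    label[score <= bg_thresh] = 0

    edge = np.zeros(score.shape, dtype=bool)
    # Vertical neighbors: both pixels confident, labels differ.
    a, b = label[:-1, :], label[1:, :]
    d = (a != b) & (a != ignore) & (b != ignore)
    edge[:-1, :] |= d
    edge[1:, :] |= d
    # Horizontal neighbors.
    a, b = label[:, :-1], label[:, 1:]
    d = (a != b) & (a != ignore) & (b != ignore)
    edge[:, :-1] |= d
    edge[:, 1:] |= d

    out = edge.astype(np.int64)
    out[label == ignore] = ignore
    return out
```

    With a large temperature the pooling approaches plain global average pooling, and with a small one it approaches global max pooling. The edge map needs no annotation beyond the CAM itself, which is the property the abstract emphasizes.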

      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN:9781450386517
      DOI:10.1145/3474085
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. end-to-end
      2. image semantic segmentation
      3. weakly supervised

      Qualifiers

      • Research-article

      Conference

      MM '21: ACM Multimedia Conference
      October 20 - 24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall Acceptance Rate 995 of 4,171 submissions, 24%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 52
      • Downloads (last 6 weeks): 5
      Reflects downloads up to 26 Jul 2024.

      Cited By

      • (2024) Subclassified Loss: Rethinking Data Imbalance From Subclass Perspective for Semantic Segmentation. IEEE Transactions on Intelligent Vehicles 9:1 (1547-1558). DOI: 10.1109/TIV.2023.3325343. Online publication date: Jan-2024
      • (2024) FlipCAM: A feature-level flipping augmentation method for weakly supervised building extraction from high-resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing (1-1). DOI: 10.1109/TGRS.2024.3360276. Online publication date: 2024
      • (2024) Adjustable patch and feature prior token-based transformer for weakly supervised semantic segmentation. International Journal of Computers and Applications (1-10). DOI: 10.1080/1206212X.2024.2333122. Online publication date: 15-Apr-2024
      • (2023) Learning Cross-Channel Representations for Semantic Segmentation. IEEE Transactions on Multimedia 25 (2774-2787). DOI: 10.1109/TMM.2022.3151145. Online publication date: 1-Jan-2023
      • (2023) Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation. IEEE Transactions on Image Processing 32 (2960-2971). DOI: 10.1109/TIP.2023.3275913. Online publication date: 2023
      • (2023) One Model Is Enough: Toward Multiclass Weakly Supervised Remote Sensing Image Semantic Segmentation. IEEE Transactions on Geoscience and Remote Sensing 61 (1-13). DOI: 10.1109/TGRS.2023.3290242. Online publication date: 2023
