End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation

Authors:

Shancheng Fang,

Jianlong Tan

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2381 - 2390

https://doi.org/10.1145/3474085.3475402

Published: 17 October 2021

Abstract

Weakly supervised semantic segmentation (WSSS) is challenging because pixel-level object locations must be acquired from image-level annotations alone. Single-stage methods, in particular, learn image- and pixel-level labels simultaneously to avoid complicated multi-stage computation and sophisticated training procedures. In this paper, we argue that using a single model for both image- and pixel-level classification forces a trade-off between the two objectives and consequently weakens recognition capability, because the image-level task tends to learn position-independent features whereas the pixel-level task tends to be position-sensitive. Hence, we propose an effective encoder-decoder framework that explores object boundaries and resolves this dilemma. The encoder and decoder learn position-independent and position-sensitive features, respectively, during end-to-end training. In addition, we propose a global soft pooling that suppresses background pixels' activation during encoder training and further improves class activation map (CAM) quality. The edge annotations for decoder training are synthesized from high-confidence CAMs and require no extra supervision. Extensive experiments on the Pascal VOC12 dataset demonstrate that our method achieves state-of-the-art performance among end-to-end approaches, with mIoU scores of 63.6% and 65.7% on the val and test sets, respectively.
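
The idea of synthesizing edge annotations from high-confidence CAMs can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the thresholds `fg_thresh` and `bg_thresh`, the `IGNORE` marker, the `radius` window and both function names are assumptions made purely for illustration. The sketch keeps only pixels whose CAM score is confidently foreground or background, then marks a confident pixel as an edge when another confident pixel with a different label lies within a small window.

```python
import numpy as np

IGNORE = 255  # hypothetical marker for pixels that are not confident


def confident_labels(cams, fg_thresh=0.6, bg_thresh=0.2):
    """Turn CAM scores into sparse pseudo labels.

    cams: float array of shape (C, H, W), scores in [0, 1] for C foreground classes.
    Returns an (H, W) int array: 0 = background, c + 1 = class c, IGNORE = uncertain.
    The two thresholds are illustrative assumptions, not values from the paper.
    """
    best_score = cams.max(axis=0)
    best_class = cams.argmax(axis=0)
    labels = np.full(best_score.shape, IGNORE, dtype=np.int64)
    fg = best_score >= fg_thresh
    labels[fg] = best_class[fg] + 1
    labels[best_score <= bg_thresh] = 0
    return labels


def synthesize_edges(labels, radius=1):
    """Derive edge pseudo labels from sparse pseudo labels.

    A confident pixel becomes an edge (1) if any confident pixel inside its
    (2 * radius + 1)-wide window carries a different label, otherwise non-edge (0).
    Uncertain pixels stay IGNORE so a loss could skip them.
    """
    h, w = labels.shape
    # Pad with IGNORE so that windows at the image border see no extra labels.
    padded = np.pad(labels, radius, constant_values=IGNORE)
    edge = np.zeros((h, w), dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            neighbour = padded[radius + dy:radius + dy + h,
                               radius + dx:radius + dx + w]
            both_confident = (labels != IGNORE) & (neighbour != IGNORE)
            edge |= both_confident & (labels != neighbour)
    out = edge.astype(np.int64)
    out[labels == IGNORE] = IGNORE
    return out
```

Calling `synthesize_edges(confident_labels(cams), radius=2)` on a `(C, H, W)` CAM array yields an `(H, W)` map that a boundary decoder could in principle be trained on; a larger `radius` lets the window bridge the uncertain band that usually separates confident foreground from confident background.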

Index Terms

End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

China Postdoctoral Science Foundation
National Key Research and Development Program of China
National Natural Science Foundation of China

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Cited By

Qiu S, Cheng X, Lu H, Zhang H, Wan R, Xue X, Pu J. (2024). Subclassified Loss: Rethinking Data Imbalance From Subclass Perspective for Semantic Segmentation. IEEE Transactions on Intelligent Vehicles 9:1 (1547-1558). Online publication date: Jan-2024. https://doi.org/10.1109/TIV.2023.3325343

Zhang X, Su Q, Xiao P, Wang W, Li Z, He G. (2024). FlipCAM: A feature-level flipping augmentation method for weakly supervised building extraction from high-resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing (1-1). Online publication date: 2024. https://doi.org/10.1109/TGRS.2024.3360276

Li L, Zhang H, Xie G, Bai Y. (2024). Adjustable patch and feature prior token-based transformer for weakly supervised semantic segmentation. International Journal of Computers and Applications (1-10). Online publication date: 15-Apr-2024. https://doi.org/10.1080/1206212X.2024.2333122

Ma L, Xie H, Liu C, Zhang Y. (2023). Learning Cross-Channel Representations for Semantic Segmentation. IEEE Transactions on Multimedia 25 (2774-2787). Online publication date: 1-Jan-2023. https://dl.acm.org/doi/10.1109/TMM.2022.3151145

Chen T, Yao Y, Tang J. (2023). Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation. IEEE Transactions on Image Processing 32 (2960-2971). Online publication date: 2023. https://doi.org/10.1109/TIP.2023.3275913

Li Z, Zhang X, Xiao P. (2023). One Model Is Enough: Toward Multiclass Weakly Supervised Remote Sensing Image Semantic Segmentation. IEEE Transactions on Geoscience and Remote Sensing 61 (1-13). Online publication date: 2023. https://doi.org/10.1109/TGRS.2023.3290242
