Abstract
We consider semantic image segmentation. Our method is inspired by Bayesian deep learning, which improves image segmentation accuracy by modeling the uncertainty of the network output. Instead of modeling uncertainty, our method directly learns to predict the erroneous pixels of a segmentation network, which we formulate as a binary classification problem. This can speed up training compared with the Monte Carlo integration often used in Bayesian deep learning. It also allows us to train a branch to correct the labels of erroneous pixels. Our method consists of three stages: (i) predict the pixel-wise error probability of the initial result, (ii) redetermine new labels for pixels with high error probability, and (iii) fuse the initial result and the redetermined result according to the error probability. We employ an error-prediction branch in the network to predict pixel-wise error probabilities, and we introduce a detail branch to focus the training process on the erroneous pixels. We have experimentally validated our method on the Cityscapes and ADE20K datasets. Our model can easily be added to various advanced segmentation networks to improve their performance. Taking DeepLabv3+ as an example, our network achieves an mIoU of 82.88% on the Cityscapes test set and 45.73% on the ADE20K validation set, improving the corresponding DeepLabv3+ results by 0.74% and 0.13%, respectively.
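The three stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact fusion rule, so `fuse` assumes a per-pixel linear blend of the two softmax outputs weighted by the predicted error probability. The binary error target is assumed to be 1 wherever the initial argmax disagrees with the ground truth. `IGNORE_INDEX = 255` and the threshold `tau` in `high_error_mask` are hypothetical choices. In the actual network, `error_prob` would come from the error-prediction branch and `refined_logits` from the branch that redetermines labels.

```python
import numpy as np

IGNORE_INDEX = 255  # assumption: Cityscapes-style "void" label, excluded from the loss


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def error_targets(initial_logits, labels):
    """Stage (i) target: 1 where the initial prediction disagrees with the label.

    initial_logits: (C, H, W); labels: (H, W) integer class ids.
    Returns the binary error map and a mask of pixels that count in the loss.
    """
    pred = initial_logits.argmax(axis=0)  # (H, W) initial class labels
    valid = labels != IGNORE_INDEX
    target = (pred != labels).astype(np.float64)
    return target, valid


def error_bce(error_prob, target, valid, eps=1e-7):
    """Binary cross-entropy between predicted error probability and the error map."""
    p = np.clip(error_prob, eps, 1.0 - eps)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return bce[valid].mean()


def high_error_mask(error_prob, tau=0.5):
    """Stage (ii) selection: pixels whose error probability exceeds a hypothetical threshold."""
    return error_prob > tau


def fuse(initial_logits, refined_logits, error_prob):
    """Stage (iii), assumed form: blend class probabilities per pixel.

    Pixels with low error probability keep the initial distribution; pixels
    with high error probability defer to the refined (redetermined) branch.
    """
    p_init = softmax(initial_logits, axis=0)
    p_ref = softmax(refined_logits, axis=0)
    w = error_prob[None, ...]  # broadcast the (H, W) weight over the class axis
    return (1.0 - w) * p_init + w * p_ref


if __name__ == "__main__":
    # Toy example: 2 classes, a 1x2 image; the second pixel is initially wrong.
    initial = np.array([[[2.0, 0.0]], [[0.0, 1.0]]])  # (C, H, W)
    refined = np.array([[[2.0, 3.0]], [[0.0, 0.0]]])
    labels = np.array([[0, 0]])
    error_prob = np.array([[0.1, 0.9]])  # stand-in for the error-branch output
    target, valid = error_targets(initial, labels)
    print(target)  # [[0. 1.]]
    print(fuse(initial, refined, error_prob).argmax(axis=0))  # [[0 0]]
```

A convex blend keeps each pixel's fused output a valid probability distribution, so confidently correct pixels keep essentially their initial label while likely-erroneous pixels take the refined branch's label.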
Acknowledgements
We would like to thank the anonymous reviewers for their constructive comments. Weiwei Xu is partially supported by the National Natural Science Foundation of China (No. 61732016).
Author information
Lixue Gong received her M.S. degree from the College of Computer Science and Technology, Zhejiang University, in 2020, and her B.S. degree in digital media technology from Zhejiang University in 2017. Her research interests include image segmentation, image matting, and video enhancement.
Yiqun Zhang is currently a student in the College of Computer Science and Technology, Zhejiang University. She received her B.S. degree in digital media technology from Zhejiang University in 2019. Her research interests include image generation and segmentation.
Yunke Zhang is currently a Ph.D. candidate at Zhejiang University. He received his M.S. degree from Hangzhou Institute of Service Engineering, Hangzhou University in 2018, and his B.S. degree in software engineering from Zhengzhou University in 2015. His research interests include image and video matting and segmentation.
Yin Yang is an associate professor with the School of Computing, Clemson University. Previously, he was a faculty member with the Electrical and Computer Engineering Department of the University of New Mexico (UNM), where he remains a research faculty member in ECE and CS. He received his Ph.D. degree from the University of Texas at Dallas (with a David Daniel fellowship). He received an NSF CRII award in 2015 and an NSF CAREER award in 2019. His research aims to develop efficient and customized computing methods for challenging problems in graphics, animation, machine learning, vision, visualization, simulation, HCI, robotics, medicine, and other applied areas.
Weiwei Xu is currently a researcher at the State Key Lab of CAD&CG in Zhejiang University. He was a Qianjiang Professor at Hangzhou Normal University and a researcher in the Internet Graphics Group at Microsoft Research Asia from 2005 to 2012. He was a postdoctoral researcher at Ritsumeikan University in Japan for over a year. He received his Ph.D. degree in computer graphics from Zhejiang University, and his B.S. and master's degrees in computer science from Hohai University in 1996 and 1999, respectively.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Gong, L., Zhang, Y., Zhang, Y. et al. Erroneous pixel prediction for semantic image segmentation. Comp. Visual Media 8, 165–175 (2022). https://doi.org/10.1007/s41095-021-0235-7
DOI: https://doi.org/10.1007/s41095-021-0235-7