Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation

Authors:

Yefeng Zheng

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5214 - 5222

https://doi.org/10.1145/3581783.3611821

Published: 27 October 2023

Abstract

Semi-supervised learning (SSL) has attracted much attention in medical image segmentation because it alleviates the heavy burden of pixel-wise annotation by extracting knowledge from unlabeled data. Existing methods largely build on the success of convolutional neural networks (CNNs), enforcing consistency of predictions under small perturbations imposed on the networks or inputs. Two main concerns arise in this paradigm: (1) CNNs tend to retain discriminative local features while neglecting global dependencies, which leads to inaccurate localization; (2) CNNs omit reliable feature-level and pixel-level information, resulting in sketchy pseudo-labels, especially around confusing boundaries. In this paper, we revisit the semi-supervised learning model and develop a novel CNN-Transformer learning framework that enables effective segmentation of medical images by producing complementary and reliable features and pseudo-labels with bi-level uncertainty. Motivated by the insight that uncertainty estimation gives into feature discrimination, we explore the statistical and geometrical properties of features during network optimization and thereby introduce an alignment method that is more accurate and stable. We attach equal significance to pixel-level uncertainty estimation, which alleviates the influence of unreliable pseudo-labels during training and promotes the reliability of predictions. Experimental results show that our method significantly surpasses existing semi-supervised approaches on two public medical image segmentation datasets.
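
The abstract does not say how pixel-level uncertainty is computed, so the following is only a minimal sketch of the general idea of suppressing unreliable pseudo-labels during training. It assumes uncertainty is measured as the normalized predictive entropy of a teacher branch and that pixels above a fixed threshold are simply masked out of the loss. The function names, the hard threshold, and the teacher/student naming are illustrative assumptions, not the authors' method.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_entropy(probs, eps=1e-8):
    # probs has shape (C, H, W); returns an (H, W) map of predictive
    # entropy, normalized by log(C) so it lies approximately in [0, 1].
    num_classes = probs.shape[0]
    ent = -(probs * np.log(probs + eps)).sum(axis=0)
    return ent / np.log(num_classes)

def uncertainty_masked_pseudo_label_loss(student_logits, teacher_logits,
                                         threshold=0.5):
    # Assumption: the teacher's argmax is the pseudo-label, and pixels whose
    # normalized entropy is at or above `threshold` are treated as unreliable.
    t_probs = softmax(teacher_logits)
    pseudo = t_probs.argmax(axis=0)                # (H, W) hard pseudo-labels
    mask = pixel_entropy(t_probs) < threshold      # (H, W) reliable pixels
    s_logp = np.log(softmax(student_logits) + 1e-8)
    # Cross-entropy of the student against the pseudo-label at each pixel.
    ce = -np.take_along_axis(s_logp, pseudo[None], axis=0)[0]
    if mask.sum() == 0:
        return 0.0, mask
    # Average the loss over reliable pixels only.
    return float((ce * mask).sum() / mask.sum()), mask

# Two pixels, two classes: the first is confident, the second is ambiguous.
teacher = np.array([[[10.0, 0.0]], [[-10.0, 0.0]]])
loss, mask = uncertainty_masked_pseudo_label_loss(teacher, teacher)
print(mask)  # the ambiguous pixel is excluded from the loss
```

A soft weighting such as `1 - entropy` could replace the hard mask; which variant, if either, matches the paper cannot be determined from the abstract.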

Index Terms

Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Major Technological Innovation Project of Hangzhou
the National Key Research and Development Project
Japanese Ministry for Education, Science, Culture and Sports
Zhejiang Provincial Natural Science Foundation of China
Major Scientific Research Project of Zhejiang Lab

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%
