DOI: 10.1145/3581783.3611821
research-article

Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation

Published: 27 October 2023

Abstract

Semi-supervised learning (SSL) has attracted much attention in medical image segmentation because it alleviates the heavy burden of pixel-wise annotation by extracting knowledge from unlabeled data. Existing methods mostly build on the success of convolutional neural networks (CNNs) and enforce consistency of predictions under small perturbations imposed on the networks or inputs. Two main concerns arise in this paradigm: (1) CNNs tend to retain discriminative local features while neglecting global dependencies, which leads to inaccurate localization; (2) CNNs omit reliable feature-level and pixel-level information, resulting in coarse pseudo-labels, especially around ambiguous boundaries. In this paper, we revisit the semi-supervised learning model and develop a novel CNN-Transformer learning framework that enables effective segmentation of medical images by producing complementary, reliable features and pseudo-labels with bi-level uncertainty. Motivated by the use of uncertainty estimation to gain insight into feature discrimination, we explore the statistical and geometrical properties of features during network optimization and thereby introduce an alignment method that is more accurate and stable. We attach equal importance to pixel-level uncertainty estimation, which alleviates the influence of unreliable pseudo-labels during training and improves the reliability of predictions. Experimental results show that our method significantly surpasses existing semi-supervised approaches on two public medical image segmentation datasets.
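To make the idea of pixel-level uncertainty concrete, the sketch below shows one common way to suppress unreliable pseudo-labels: compute the normalized entropy of a teacher network's per-pixel class probabilities and keep only low-entropy pixels in the pseudo-label loss. This is a generic illustration and not the estimator used in the paper, which the abstract does not specify. The function names, the teacher/student split, and the `threshold` value are all assumptions made for the example.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pixel_entropy(probs, eps=1e-8):
    """Per-pixel entropy normalized by log(C); probs has shape (C, H, W)."""
    num_classes = probs.shape[0]
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)
    return entropy / np.log(num_classes)


def masked_pseudo_label_loss(student_logits, teacher_logits, threshold=0.5):
    """Cross-entropy against teacher pseudo-labels on low-uncertainty pixels.

    Hypothetical helper: `threshold` is an assumed hyperparameter, and
    entropy is only one of several possible uncertainty measures.
    """
    teacher_probs = softmax(teacher_logits)
    pseudo_labels = teacher_probs.argmax(axis=0)            # (H, W)
    reliable = pixel_entropy(teacher_probs) < threshold     # (H, W) boolean mask

    student_probs = softmax(student_logits)
    # Student probability assigned to the pseudo-label class at each pixel.
    picked = np.take_along_axis(student_probs, pseudo_labels[None], axis=0)[0]
    per_pixel_ce = -np.log(picked + 1e-8)

    if reliable.sum() == 0:
        return 0.0, reliable
    loss = (per_pixel_ce * reliable).sum() / reliable.sum()
    return float(loss), reliable
```

Calling `masked_pseudo_label_loss(student_logits, teacher_logits)` on arrays of shape `(C, H, W)` returns the averaged loss and the boolean reliability mask. A hard threshold is the simplest choice here; a soft weighting such as `1 - entropy` would be an equally plausible variant.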

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. semi-supervised learning
    2. transformer
    3. uncertainty estimation

    Qualifiers

    • Research-article

    Funding Sources

    • Major Technological Innovation Project of Hangzhou
    • the National Key Research and Development Project
    • Japanese Ministry for Education, Science, Culture and Sports
    • Zhejiang Provincial Natural Science Foundation of China
    • Major Scientific Research Project of Zhejiang Lab

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 281
    • Downloads (Last 12 months): 281
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 02 Sep 2024
