DOI: 10.1145/3581783.3612335

Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework

Published: 27 October 2023

Abstract

RGB images and medical hyperspectral images (MHSIs) are two widely used modalities in computational pathology. The former is cheap, easy, and fast to acquire but lacks pathological information such as physicochemical state. The latter is an emerging modality that captures electromagnetic radiation-matter interactions but suffers from high acquisition time and low spatial resolution. In this paper, we put forward a unified dual-task multi-modality self-supervised learning (SSL) framework, called Uni-Dual, which makes the most of both paired and unpaired RGB-MHSIs. Concretely, we design a unified SSL paradigm for RGB images and MHSIs with two tasks: (1) a discrimination learning task that learns high-level semantics by mining the cross-correlation across unpaired RGB-MHSIs, and (2) a reconstruction learning task that models low-level stochastic variations by deepening the interaction across RGB-MHSI pairs. Uni-Dual enjoys the following benefits: (1) a unified model that can be easily transferred to different downstream tasks on various modality combinations; (2) multi-constituent and structured information is learned from MHSIs and RGB images for low-cost, high-precision clinical use. Experiments on various downstream tasks with different modalities show that the proposed Uni-Dual substantially outperforms other competitive SSL methods.
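As a rough illustration of the two objectives described above (not the paper's exact formulation), the discrimination task can be read as a Barlow-Twins-style cross-correlation loss between the two modality embeddings, and the reconstruction task as a masked MSE. All function names, the standardization step, and the weights `lam` and `alpha` below are illustrative assumptions:

```python
import numpy as np

def cross_correlation_loss(z_rgb, z_mhsi, lam=5e-3):
    """Discrimination-style loss: push the cross-correlation matrix of the
    two modality embeddings toward the identity (diagonal -> 1, off-diagonal -> 0)."""
    # Standardize each embedding dimension over the batch.
    z1 = (z_rgb - z_rgb.mean(axis=0)) / (z_rgb.std(axis=0) + 1e-8)
    z2 = (z_mhsi - z_mhsi.mean(axis=0)) / (z_mhsi.std(axis=0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n  # (d, d) cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return float(on_diag + lam * off_diag)

def masked_reconstruction_loss(pred, target, mask):
    """Reconstruction-style loss: MSE averaged over masked positions only
    (mask == 1 where the input was hidden and must be reconstructed)."""
    sq_err = (pred - target) ** 2
    return float((sq_err * mask).sum() / max(mask.sum(), 1.0))

def uni_dual_loss(z_rgb, z_mhsi, pred, target, mask, alpha=1.0):
    """Combined dual-task objective with an assumed scalar weighting alpha."""
    return cross_correlation_loss(z_rgb, z_mhsi) + \
        alpha * masked_reconstruction_loss(pred, target, mask)
```

The cross-correlation term operates on batch statistics, so it applies even when the RGB and MHSI batches are unpaired; the masked-reconstruction term requires pixel-aligned pairs.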



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. medical hyperspectral images
  2. multi-modality image representations
  3. self-supervised learning

Qualifiers

  • Research-article


Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

