Application of Swin Transformer Model to Retrieve and Classify Endoscopic Images

  • Conference paper
Intelligent Systems and Data Science (ISDS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1950)

Abstract

Image classification and retrieval attract strong interest in the machine learning community, particularly in computer vision, and medical image retrieval is a prominent application. Ongoing advances in techniques such as Convolutional Neural Networks (CNNs) and Vision Transformers have led to numerous machine learning approaches to image retrieval, many with good performance. In this paper, the Swin Transformer model is used to build a specialized medical image retrieval system suited to gastric endoscopic images. The proposed technique uses the Swin Transformer's classification process to create feature vectors by combining image segments collected from local windows. These vectors make it easier to calculate similarity on the Kvasir dataset, which we extended with additional images. Empirical results show that the Swin Transformer model reaches a classification accuracy of 90.5% and a mean average precision at top 20 (mAP@20) of 85% when retrieving endoscopic images.
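The abstract describes feature vectors built from image segments gathered in local windows. The sketch below is a rough illustration of that windowing step. It partitions a toy H × W grid of token IDs into non-overlapping m × m windows and applies the cyclic shift that Swin Transformer uses before its shifted-window layers. It is a simplification under stated assumptions, not the authors' code. Real Swin operates on C-dimensional patch embeddings, runs self-attention inside each window, and masks attention between wrapped-around regions after the shift, and all of that is omitted here. The names `window_partition`, `cyclic_shift`, and `Grid` are hypothetical.

```python
from typing import List

# Hypothetical toy representation: a 2-D grid of token IDs standing in for
# an H x W map of patch embeddings (real Swin tokens are C-dimensional vectors).
Grid = List[List[int]]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid height and width must be divisible by the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            # In Swin, self-attention is computed only among tokens of one window.
            windows.append([row[left:left + m] for row in grid[top:top + m]])
    return windows


def cyclic_shift(grid: Grid, s: int) -> Grid:
    """Roll the grid up and left by s positions with wrap-around.

    This mimics the shift applied before partitioning in shifted-window layers.
    The attention mask that keeps wrapped regions apart is not modeled here.
    """
    rows = grid[s:] + grid[:s]
    return [row[s:] + row[:s] for row in rows]


if __name__ == "__main__":
    m = 2
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 toy map
    print(window_partition(grid, m)[0])                        # [[0, 1], [4, 5]]
    print(window_partition(cyclic_shift(grid, m // 2), m)[0])  # [[5, 6], [9, 10]]
```

The shift is half a window (⌊m/2⌋). As a result, tokens that sat on a window border in one layer share a window in the next, which lets information flow across windows without global attention.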
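Retrieval compares feature vectors, and evaluation reports mAP@20. The abstract does not specify the similarity measure or the exact AP@k normalization, so the sketch below makes several assumptions:

- Feature vectors are plain lists of floats, for example pooled outputs of a trained backbone.
- Similarity is cosine similarity.
- A retrieved image counts as relevant when it shares the query's class label.
- AP@k averages the precision at each rank within the top k where a relevant image appears.

These choices, the helper names, and the toy data are illustrative only and are not the authors' method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_top_k(query: Sequence[float], gallery: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k gallery vectors most similar to the query (ties broken by index)."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(gallery)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


def average_precision_at_k(retrieved_labels: List[str], query_label: str, k: int) -> float:
    """Assumed AP@k: mean of precision@rank over the relevant hits in the top k."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(retrieved_labels[:k], start=1):
        if label == query_label:  # relevance = same class label (assumption)
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(queries, query_labels, gallery, gallery_labels, k: int = 20) -> float:
    """mAP@k: AP@k averaged over all queries."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        top = retrieve_top_k(q, gallery, k)
        aps.append(average_precision_at_k([gallery_labels[i] for i in top], q_label, k))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy 2-D "feature vectors" and labels, invented purely for illustration.
    gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    gallery_labels = ["polyps", "polyps", "esophagitis", "esophagitis"]
    print(retrieve_top_k([1.0, 0.0], gallery, 3))  # [0, 1, 3]
    print(round(average_precision_at_k(["polyps", "esophagitis", "polyps"], "polyps", 20), 4))  # 0.8333
    print(mean_average_precision_at_k([[1.0, 0.0], [0.0, 1.0]], ["polyps", "esophagitis"],
                                      gallery, gallery_labels, k=20))  # 1.0
```

Sorting every gallery score costs O(N log N) per query. For a large gallery, a partial selection such as `heapq.nlargest` would avoid the full sort.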

Author information

Corresponding author

Correspondence to Ngo Duc Luu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Luu, N.D., Anh, V.T. (2024). Application of Swin Transformer Model to Retrieve and Classify Endoscopic Images. In: Thai-Nghe, N., Do, TN., Haddawy, P. (eds) Intelligent Systems and Data Science. ISDS 2023. Communications in Computer and Information Science, vol 1950. Springer, Singapore. https://doi.org/10.1007/978-981-99-7666-9_13

  • DOI: https://doi.org/10.1007/978-981-99-7666-9_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7665-2

  • Online ISBN: 978-981-99-7666-9

  • eBook Packages: Computer Science, Computer Science (R0)
