Abstract
Image classification and retrieval are active research topics in machine learning and computer vision, and medical image retrieval is a particular focus. Many machine learning approaches have been applied to image retrieval, and ongoing developments in techniques such as Convolutional Neural Networks (CNNs) and Vision Transformers have produced good results. In this paper, we use the Swin Transformer model to build a specialized medical image retrieval system for gastric endoscopic images. The proposed technique uses the Swin Transformer's classification process to build feature vectors by combining the image patches gathered from local windows, which makes it easier to compute similarity between images. We evaluate the system on the Kvasir dataset, which we extended with additional images. Empirical results show that the Swin Transformer model achieves a classification accuracy of 90.5% and a mean average precision at top 20 (mAP@20) of 85% for endoscopic image retrieval.
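To make the retrieval metric concrete, the following is a minimal sketch of similarity-based retrieval and mAP@k over precomputed feature vectors. It is not the authors' implementation. The abstract does not name the similarity measure or the exact normalization of average precision, so cosine similarity and normalization by the number of relevant hits in the top k are assumptions here. All function names and the toy vectors are hypothetical, and the feature vectors stand in for the output of a Swin Transformer backbone.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, gallery, k=20):
    """Return the labels of the k gallery items most similar to the query.

    gallery is a list of (feature_vector, label) pairs.
    """
    ranked = sorted(
        gallery,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [label for _, label in ranked[:k]]


def average_precision_at_k(retrieved_labels, query_label, k=20):
    """AP@k, normalized by the number of relevant hits in the top k (assumption)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(retrieved_labels[:k], start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(queries, gallery, k=20):
    """Mean of AP@k over a list of (feature_vector, label) queries."""
    scores = [
        average_precision_at_k(retrieve(vec, gallery, k), label, k)
        for vec, label in queries
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-D features; real features would come from the trained model.
gallery = [
    ([1.0, 0.0], "polyps"),
    ([0.9, 0.1], "polyps"),
    ([0.0, 1.0], "ulcerative-colitis"),
    ([0.1, 0.9], "ulcerative-colitis"),
]
queries = [([1.0, 0.05], "polyps"), ([0.05, 1.0], "ulcerative-colitis")]
toy_map = mean_average_precision_at_k(queries, gallery, k=2)
```

Under this definition, a query whose top 20 results all share its class scores 1.0, and the reported figure would be the average of such per-query scores over the test queries. Other conventions divide by min(k, number of relevant gallery images) instead, which gives lower values when relevant images are missed.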
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luu, N.D., Anh, V.T. (2024). Application of Swin Transformer Model to Retrieve and Classify Endoscopic Images. In: Thai-Nghe, N., Do, TN., Haddawy, P. (eds) Intelligent Systems and Data Science. ISDS 2023. Communications in Computer and Information Science, vol 1950. Springer, Singapore. https://doi.org/10.1007/978-981-99-7666-9_13
DOI: https://doi.org/10.1007/978-981-99-7666-9_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7665-2
Online ISBN: 978-981-99-7666-9
eBook Packages: Computer Science, Computer Science (R0)