
Hybrid U-Net: Instrument Semantic Segmentation in RMIS

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1965)


Abstract

Accurate semantic segmentation of surgical instruments from images captured by the laparoscopic system plays a crucial role in ensuring the reliability of vision-based Robot-Assisted Minimally Invasive Surgery. Despite numerous notable advancements in semantic segmentation, the achieved accuracy still falls short of the requirements for surgical safety. To further enhance accuracy, we propose several modifications to a conventional medical image segmentation network, including a modified Feature Pyramid Module. Within this module, Patch-Embedding with varying rates and Self-Attention Blocks are employed to mitigate the loss of feature information while simultaneously expanding the receptive field. As for the network architecture, all feature maps extracted by the encoder are integrated into the modified Feature Pyramid Module via element-wise connections, and the module's output is then transmitted to the decoder blocks at each stage. Owing to these hybrid properties, the proposed method is called Hybrid U-Net. Experiments conducted on two available medical datasets show that the proposed method outperforms recent methods in accuracy on both datasets.
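
The abstract names the core mechanism but not its exact configuration. Below is a minimal PyTorch sketch of how the described pieces could fit together: each encoder stage is patch-embedded at its own rate, refined by a self-attention block, and the attended maps are fused element-wise into a single pyramid output shared with every decoder stage. All names (PatchSelfAttention, HybridFPM), channel sizes, patch rates, and the bilinear-resize fusion are illustrative assumptions, not the authors' exact design.

```python
# Sketch of the hybrid design described in the abstract, under stated
# assumptions: per-stage patch embedding at varying rates, self-attention
# over the resulting tokens, and element-wise fusion of all encoder scales.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSelfAttention(nn.Module):
    """Patch-embed one encoder feature map, run a self-attention block
    over the tokens, and project back to a spatial map."""

    def __init__(self, in_ch: int, dim: int, patch: int, heads: int = 4):
        super().__init__()
        # Strided conv = non-overlapping patch embedding at rate `patch`.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(x)                      # (B, dim, H/p, W/p)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, H*W/p^2, dim)
        tokens = self.attn(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class HybridFPM(nn.Module):
    """Assumed modified Feature Pyramid Module: every encoder stage gets
    its own patch rate; attended maps are resized to the finest grid and
    fused by element-wise addition."""

    def __init__(self, enc_chs=(64, 128, 256, 512), dim=256,
                 patches=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            PatchSelfAttention(c, dim, p) for c, p in zip(enc_chs, patches))

    def forward(self, feats):
        target = feats[0].shape[-2:]  # fuse at the finest spatial scale
        fused = 0
        for branch, f in zip(self.branches, feats):
            y = branch(f)
            fused = fused + F.interpolate(
                y, size=target, mode="bilinear", align_corners=False)
        return fused  # shared with the decoder blocks at each stage


if __name__ == "__main__":
    # Four encoder stages of a U-Net-like backbone, deeper and coarser.
    feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
             for i, c in enumerate((64, 128, 256, 512))]
    out = HybridFPM()(feats)
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```

Coarser patch rates shrink the token grid, which keeps attention affordable while widening the effective receptive field; embedding directly from the full-resolution encoder features, rather than after pooling, is one plausible way to limit the feature-information loss the abstract refers to.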



Acknowledgements

This work was supported in part by the Natural Science Foundation of Shandong Province under grant No. ZR2022QF058 and by the Shandong Province Science and Technology-Oriented Minor Enterprise Innovation Capability Enhancement Project under grants No. 2023TSGC0406 and No. 2022TSGC1328. We thank Dr. Max Allan for granting us access to the MICCAI EndoVis 2017 dataset, without which our experiments would not have been possible.

Author information

Corresponding author

Correspondence to Huajian Song.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, Y. et al. (2024). Hybrid U-Net: Instrument Semantic Segmentation in RMIS. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1965. Springer, Singapore. https://doi.org/10.1007/978-981-99-8145-8_32

  • DOI: https://doi.org/10.1007/978-981-99-8145-8_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8144-1

  • Online ISBN: 978-981-99-8145-8

  • eBook Packages: Computer Science (R0)
