Abstract
Most early deep-learning methods for medical image segmentation adopt a fully convolutional structure, but the fixed size of the convolutional window limits the modeling of long-range dependencies. The Vision Transformer (ViT) has powerful global modeling capabilities, but it represents low-level feature detail poorly. To address these problems, we propose a novel encoder structure and design a new U-shaped network for medical image segmentation, called Pie-UNet. First, to address the lack of locality in ViT and the lack of global perception in CNNs, we encode global and local information separately and let the two branches complement each other through parallel interaction. Second, we propose a local structure-aware ViT, called Rwin Transformer, to enhance the local detail representation of ViT itself. Third, to further refine the local representation, we construct a focal modulator based on large kernels. Finally, we propose a pre-fusion approach to optimize the information interaction between heterogeneous structures. Experimental results demonstrate that Pie-UNet achieves more accurate segmentation results than several existing medical image segmentation methods.
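The idea of encoding local and global information in two parallel, interacting branches can be illustrated with a toy NumPy sketch. This is not the Pie-UNet implementation: the abstract does not specify the Rwin Transformer, the focal modulator, or the pre-fusion step, so the depthwise convolution, the single-head self-attention, the additive exchange, and the gate values `a` and `b` below are all illustrative assumptions, and every function name is hypothetical.

```python
import numpy as np

def local_branch(x, kernel):
    """Depthwise k x k convolution with zero padding.
    x: (C, H, W) feature map, kernel: (C, k, k). Stand-in for a CNN branch."""
    c, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each kernel tap weights a shifted view of the padded input.
            out += kernel[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def global_branch(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions.
    x: (C, H, W); wq, wk, wv: (C, C). Stand-in for a ViT branch."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                 # (N, C), one token per pixel
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(c)                  # (N, N) pairwise similarities
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)  # row-wise softmax
    return (attn @ v).T.reshape(c, h, w)

def parallel_stage(x_local, x_global, kernel, wq, wk, wv, a=0.5, b=0.5):
    """One toy encoder stage: run both branches side by side, then exchange
    features additively. The gates a and b are assumed, not from the paper."""
    y_local = local_branch(x_local, kernel)
    y_global = global_branch(x_global, wq, wk, wv)
    out_local = y_local + a * y_global    # global context into the local path
    out_global = y_global + b * y_local   # local detail into the global path
    return out_local, out_global
```

Stacking several such stages, each followed by downsampling, would give the encoder half of a U-shaped network; the local path keeps fine detail while the global path relates distant positions.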
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62102331, the Natural Science Foundation of Sichuan Province under Grant 2022NSFSC0839, and the Doctoral Research Fund Project of Southwest University of Science and Technology under Grant 22zx7110.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, Y., Zhang, X., Chen, Y., Yang, S., Sun, F. (2023). Pie-UNet: A Novel Parallel Interaction Encoder for Medical Image Segmentation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14255. Springer, Cham. https://doi.org/10.1007/978-3-031-44210-0_45
DOI: https://doi.org/10.1007/978-3-031-44210-0_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44209-4
Online ISBN: 978-3-031-44210-0
eBook Packages: Computer Science (R0)