
CT-Net: Asymmetric compound branch Transformer for medical image segmentation

Neural Netw. 2024 Feb:170:298-311. doi: 10.1016/j.neunet.2023.11.034. Epub 2023 Nov 14.

Abstract

The Transformer architecture has been widely applied to image segmentation owing to its powerful ability to capture long-range dependencies. However, it is comparatively weak at capturing local features and requires large amounts of training data. Medical image segmentation, by contrast, places strong demands on local features and typically involves small datasets, so existing Transformer networks suffer a significant drop in performance when applied directly to this task. To address these issues, we design a new medical image segmentation architecture called CT-Net. It effectively extracts local and global representations using an asymmetric asynchronous branch parallel structure while avoiding unnecessary computational cost. In addition, we propose a high-density information fusion strategy that efficiently fuses the features of the two branches using a fusion module with only 0.05M parameters. This strategy ensures high portability and makes it possible to apply transfer learning directly to alleviate dataset dependency. Finally, we design a parameter-adjustable multi-perceptive loss function for this architecture that optimizes training from both pixel-level and global perspectives. We test the network on 5 different tasks across 9 datasets; compared with SwinUNet, CT-Net improves IoU by 7.3% and 1.8% on the GlaS and MoNuSeg datasets respectively, and improves the average DSC on the Synapse dataset by 3.5%.
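The abstract quotes a fusion module of only 0.05M parameters. As a rough plausibility check (the channel widths below are assumptions for illustration, not values from the paper), a single 1x1 convolution that concatenates two 192-channel branch outputs and projects them to 128 channels lands in the same order of magnitude:

```python
def conv1x1_params(in_ch, out_ch, bias=True):
    # Parameter count of a 1x1 convolution: one weight per
    # (input channel, output channel) pair, plus optional biases.
    return in_ch * out_ch + (out_ch if bias else 0)

# Hypothetical fusion: concatenate two 192-channel branch feature maps
# (CNN branch + Transformer branch) and project back to 128 channels.
params = conv1x1_params(192 * 2, 128)
print(params / 1e6)  # ~0.049M, the same order as the 0.05M module cited above
```

This illustrates why such a fusion module is cheap enough to keep fixed while swapping pretrained branches, which is what makes direct transfer learning practical.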
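The exact form of the multi-perceptive loss is not given in the abstract. As a minimal sketch of the general idea of combining a pixel-level objective with a global one, a common choice is binary cross-entropy (per-pixel) blended with soft Dice loss (whole-mask overlap), with a hypothetical balancing parameter `alpha` standing in for the paper's adjustable parameter:

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    # Pixel-level term: mean binary cross-entropy over all pixels.
    n = len(probs)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / n

def dice_loss(probs, targets, eps=1e-7):
    # Global term: 1 - soft Dice coefficient computed over the whole mask.
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def multi_perceptive_loss(probs, targets, alpha=0.5):
    # alpha is a hypothetical weight trading off the pixel-level term
    # against the global term; it is not the paper's exact parameter.
    return alpha * bce_loss(probs, targets) + (1 - alpha) * dice_loss(probs, targets)
```

For example, a near-perfect prediction such as `multi_perceptive_loss([0.99, 0.01, 0.99], [1, 0, 1])` yields a loss close to zero, while a poorly calibrated one scores much higher under both terms.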

Keywords: CNN; Feature fusion; Loss function; Medical image segmentation; Transformer.

MeSH terms

  • Image Processing, Computer-Assisted
  • Learning*
  • Synapses*
  • Tomography, X-Ray Computed
  • Upper Extremity