License: CC BY 4.0
arXiv:2312.15273v1 [cs.CV] 23 Dec 2023

Benefit from public unlabeled data: A Frangi filtering-based pretraining network for 3D cerebrovascular segmentation

Gen Shi, Hao Lu, Hui Hui (hui.hui@ia.ac.cn), Jie Tian (tian@ieee.org). School of Engineering Medicine and School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China, and also with the Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology of China, Beijing, 100191, China; State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 10086, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Abstract

Precise cerebrovascular segmentation in time-of-flight magnetic resonance angiography (TOF-MRA) data is crucial for clinical computer-aided diagnosis. However, the sparse distribution of cerebrovascular structures in TOF-MRA makes manual data labeling exceedingly costly. Unlabeled TOF-MRA data have the potential to enhance model performance significantly. In this study, we construct the largest preprocessed unlabeled TOF-MRA dataset to date (1510 subjects). We also provide three additional labeled datasets totaling 113 subjects. Furthermore, we propose a simple yet effective pretraining strategy based on Frangi filtering, which is known for enhancing vessel-like structures, to fully leverage the unlabeled data for 3D cerebrovascular segmentation. Specifically, we develop a Frangi filtering-based preprocessing workflow to handle the large-scale unlabeled dataset, together with a multi-task pretraining strategy that effectively uses the preprocessed data. This approach maximizes the knowledge gained from the unlabeled data. The pretrained model is evaluated on four cerebrovascular segmentation datasets, where it outperforms state-of-the-art semi- and self-supervised methods by approximately 3%. Ablation studies further demonstrate the effectiveness of the pretraining method and its generalizability across backbone structures. The code and data are openly available at: https://github.com/shigen-StoneRoot/FFPN.

journal: Medical Image Analysis
Fig. 1: Illustration of using unlabeled TOF-MRA data through (a) semi-supervised learning, (b) self-supervised learning and (c) our pretraining method.

1 Introduction

TOF-MRA is a non-invasive medical imaging technique for visualizing cerebral blood vessels. It provides detailed images of the blood vessels in the brain, enabling early detection and treatment of potentially life-threatening conditions Özsarlak et al. (2004); Hassouna et al. (2006). Accurate vessel segmentation is a crucial preprocessing step in TOF-MRA image analysis: it provides surgeons with essential information about the location, size, and connectivity of blood vessels and facilitates surgical planning and intervention Ni et al. (2020).

Compared with the segmentation of other biological tissues, brain vessel segmentation in TOF-MRA is more challenging. Most organs are compact, roughly spherical, and spatially concentrated, whereas blood vessels are tubular and sparsely distributed in TOF-MRA Chen et al. (2022a); Xia et al. (2022). This increases the difficulty and cost of manually labeling TOF-MRA data. In contrast, unlabeled TOF-MRA data are easily accessible from public resources (e.g., IXI and OASIS3). Therefore, making full use of public unlabeled data is essential for cerebrovascular segmentation with limited manually labeled data.

There are two fundamental families of deep learning methods that can use unlabeled data: semi-supervised learning (SemiSL) Qi and Luo (2020); Yang et al. (2022) and self-supervised learning (SSL) Ericsson et al. (2022); Huang et al. (2023) (see Fig. 1a and b). SemiSL methods typically use unlabeled data together with labeled data during training, learning the consistency between them Cheplygina et al. (2019). This requires repeated access to the unlabeled data and increases computational demands whenever a different research institute wants to use these data. Moreover, public unlabeled data are collected from multiple sources and may be highly heterogeneous relative to a single-site dataset. The resulting distribution shift can degrade the performance of SemiSL methods Chen et al. (2019). SSL methods, in contrast, usually pretrain the model through a pretext task Zhang et al. (2023), after which the pretrained model can be reused by different institutes without access to the unlabeled data. This reduces computational cost when the unlabeled dataset is large and protects patient privacy, since different institutes do not directly access the unlabeled data Asadian et al. (2022).

However, the performance of SSL methods depends heavily on the choice of pretext task. Pretext tasks such as contrastive learning between augmented data Chen et al. (2020); Liao et al. (2022) and masked image modeling He et al. (2022) differ significantly from the downstream task (e.g., cerebrovascular segmentation in this study). This is advantageous when there are multiple downstream tasks, but a specific single downstream task may not benefit much from such SSL methods. Incorporating prior knowledge of the specific downstream task into the pretraining procedure may therefore be valuable. Frangi filtering is an effective technique for enhancing and extracting tubular structures in medical images Frangi et al. (1998). It analyzes the local intensity and the Hessian matrix of the image to detect vessel-like structures, which makes it useful for vessel segmentation and analysis in medical imaging, and it plays a fundamental role as a preprocessing step in traditional vessel segmentation algorithms. This makes Frangi filtering a natural candidate to incorporate into the pretraining task.

Motivated by the above analysis, we propose a Frangi filtering-based pretraining network (FFPN) to make full use of public unlabeled TOF-MRA data. In particular, we develop a preprocessing workflow to handle large-scale unlabeled data. This workflow uses Frangi filtering to enhance vessel structures and incorporates thresholding and connected component analysis to obtain a coarse segmentation of the vessels. The vessel-enhanced images and coarse vessel segmentations are then used for multi-task pretraining. Our model is pretrained on a large-scale TOF-MRA dataset of 1510 volumes and evaluated on four labeled datasets. The results show that it notably outperforms state-of-the-art SemiSL and SSL methods. Ablation studies further confirm that the proposed pretraining task helps the model achieve better performance with fewer labeled data. Moreover, the pretraining task is not tied to a specific model architecture and consistently improves performance across various backbone structures.

The main contributions of this study can be summarized as follows:

  • 1.

    We develop an automated preprocessing workflow to efficiently handle unlabeled data. The workflow produces the vessel-enhanced images and coarse vessel segmentations used in the pretraining procedure.

  • 2.

    We propose a simple yet effective pretraining strategy using the preprocessed TOF-MRA data. The pretraining task comprises regression learning, coarse label segmentation, and maximum intensity projection (MIP) consistency learning. The results show that the proposed method clearly outperforms existing models.

  • 3.

    We construct a large unlabeled dataset (1510 volumes) and three manually annotated datasets (113 volumes in total). These data are expected to advance semi-supervised, weakly supervised, and self-supervised learning methods for cerebrovascular segmentation.

Fig. 2: (a) The illustration of our proposed preprocessing workflow. (b) The overall framework of our proposed pretraining procedure with multi-task learning.

2 Related Work

2.1 Cerebrovascular Segmentation

Traditional methods for cerebrovascular segmentation typically rely on the continuity of grayscale variations in images to perform threshold-based segmentation Otsu (1979); Frangi et al. (1998). In addition, statistical modeling-based approaches exploit the fact that different brain tissues exhibit distinct grayscale distributions Duan et al. (2019); Alderliesten et al. (2006), and they have been successful in segmenting cerebrovascular structures in TOF-MRA images. In recent years, deep learning methods have shown remarkable performance in vessel segmentation tasks Sanchesa et al. (2019); Sichtermann et al. (2019); Qi et al. (2023). For instance, Livne et al. used a UNet-based model to segment vessels in patients with cerebrovascular disease Livne et al. (2019), and Chen et al. proposed an attention-assisted generative adversarial network (A-SegAN) for automatic cerebrovascular segmentation Chen et al. (2022b). For more related research, see the review by Chen et al. (2023).

2.2 SemiSL and SSL Methods for Medical Images

Numerous SemiSL and SSL methods have been developed to exploit unlabeled medical data and improve model performance Wang et al. (2023); Yu et al. (2023); Shi et al. (2023); Liao et al. (2023). For instance, Chen et al. proposed a generative consistency-based semi-supervised (GCS) model, which exploits the consistency of perturbed data to enhance cerebrovascular segmentation performance Chen et al. (2022a). SSL-based methods have also demonstrated the effectiveness of pretraining in medical imaging research. For example, Tang et al. introduced a multi-task pretraining strategy using a 3D Swin transformer, achieving significant success in multi-organ segmentation tasks Tang et al. (2022). For more details, see the comprehensive reviews Zhang et al. (2023); Jiao et al. (2022); Krishnan et al. (2022).

3 Methods

In this section, we introduce the proposed FFPN model. The preprocessing workflow and the pretraining procedure are shown in Fig. 2a and b, respectively. First, we preprocess the unlabeled TOF-MRA data to obtain the vessel-enhanced image (VEI) and the coarse vessel segmentation (CVS). Then, during pretraining, the VEI and CVS serve as the regression and segmentation targets, respectively. Details are given in the following subsections.

3.1 Preprocessing Method for Unlabeled Data

The preprocessing workflow consists of three steps: image size cropping, vessel-structure enhancement, and coarse segmentation.

3.1.1 Image size cropping:

TOF-MRA images often contain a large number of background pixels that are irrelevant to the vessel segmentation task. Removing them reduces computational cost, which matters especially for a large-scale unlabeled dataset. Previous approaches have used non-zero masks to crop raw images Isensee et al. (2021). However, background pixels do not always have zero intensity. In this study, we propose a novel method that calculates the cropping mask from the variation of pixel intensities along the Z-axis (slice direction).

For a given image $I_i$ with dimensions $(H, W, D)$, we compute three projection images along the Z-axis: the Average Intensity Projection (AIP), the Maximum Intensity Projection (MIP), and the Intensity Variation Map (IVM), as follows:

$$
\begin{aligned}
AIP_i &= \mathrm{Average}(I_i) \\
MIP_i &= \mathrm{Maximum}(I_i) \\
IVM_i &= \mathrm{STD}(I_i)
\end{aligned}
\tag{1}
$$

To create the cropping mask, we apply threshold segmentation. Taking the MIP as an example, we generate a mask by retaining only the positions of the top percentage (e.g., 35%) of pixels in the MIP image. The resulting AIP, MIP and IVM masks are then merged with a bitwise AND operation, and small regions containing fewer than 200 pixels are removed. Finally, we obtain a square cropping mask that is applied to each slice of $I_i$.

In summary, our proposed approach enables the generation of a cropping mask based on the intensity variations along the Z-axis in TOF-MRA images. By applying this mask, background pixels are effectively eliminated, resulting in a significant reduction in computational costs.
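The cropping step can be sketched with NumPy and SciPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, each projection is thresholded at a quantile so that its top 35% of pixels survive, small regions are removed with 2D connected component labeling (SciPy's default 4-connectivity), and the "square" mask is assumed to be the bounding box of the surviving pixels expanded to a square around its center.

```python
import numpy as np
from scipy import ndimage


def top_fraction_mask(img2d, fraction):
    """Boolean mask of pixels whose values lie in the top `fraction` (0.35 = top 35%)."""
    threshold = np.quantile(img2d, 1.0 - fraction)
    return img2d >= threshold


def remove_small_regions(mask, min_size=200):
    """Drop 2D connected components (4-connectivity) with fewer than `min_size` pixels."""
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]


def square_slice(lo, hi, side, limit):
    """Slice of length `side` (clipped to `limit`) centred on [lo, hi) and kept inside the image."""
    side = min(side, limit)
    start = (lo + hi) // 2 - side // 2
    start = int(min(max(start, 0), limit - side))
    return slice(start, start + side)


def cropping_box(volume, fraction=0.35, min_size=200):
    """Return (row_slice, col_slice) of a square in-plane crop for an (H, W, D) volume."""
    aip = volume.mean(axis=2)  # Average Intensity Projection along Z
    mip = volume.max(axis=2)   # Maximum Intensity Projection along Z
    ivm = volume.std(axis=2)   # Intensity Variation Map (standard deviation along Z)
    mask = (top_fraction_mask(aip, fraction)
            & top_fraction_mask(mip, fraction)
            & top_fraction_mask(ivm, fraction))  # bitwise AND of the three masks
    mask = remove_small_regions(mask, min_size)
    rows, cols = np.nonzero(mask)
    if rows.size == 0:  # nothing survived: keep the full image
        return slice(0, mask.shape[0]), slice(0, mask.shape[1])
    side = max(rows.max() - rows.min() + 1, cols.max() - cols.min() + 1)
    return (square_slice(rows.min(), rows.max() + 1, side, mask.shape[0]),
            square_slice(cols.min(), cols.max() + 1, side, mask.shape[1]))
```

Applying the crop is then `volume[rows, cols, :]`, i.e. the same square window on every slice. Using the intensity variation along Z rather than a non-zero test is what lets this work when the background is noisy rather than exactly zero.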

3.1.2 Frangi filtering:

For 3D TOF-MRA images, Frangi filtering involves several steps to enhance the visibility of blood vessels. First, it computes at each pixel the Hessian matrix, i.e., the matrix of second-order partial derivatives of the image intensity:

$$
H = \begin{bmatrix}
\frac{\partial^2 I}{\partial x^2} & \frac{\partial^2 I}{\partial x \partial y} & \frac{\partial^2 I}{\partial x \partial z} \\
\frac{\partial^2 I}{\partial y \partial x} & \frac{\partial^2 I}{\partial y^2} & \frac{\partial^2 I}{\partial y \partial z} \\
\frac{\partial^2 I}{\partial z \partial x} & \frac{\partial^2 I}{\partial z \partial y} & \frac{\partial^2 I}{\partial z^2}
\end{bmatrix}
\tag{2}
$$

where $\frac{\partial^2 I}{\partial x^2}$ is the second partial derivative of the image intensity $I$ with respect to the $x$ coordinate, and similarly for the other entries. Next, the eigenvalues of the Hessian matrix, denoted $\lambda_1$, $\lambda_2$ and $\lambda_3$, are calculated. These eigenvalues describe the local second-order structure of the image: their magnitudes and signs, together with the orientations of the corresponding eigenvectors, help identify pixels that belong to blood vessels.

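With the eigenvalues sorted by magnitude, $|\lambda_1| \le |\lambda_2| \le |\lambda_3|$, a bright tubular structure on a dark background, as vessels appear in TOF-MRA, yields $\lambda_1 \approx 0$ and $\lambda_2 \approx \lambda_3$ with large magnitude and negative sign Frangi et al. (1998). The response is strongest at the vessel center, weaker toward the vessel edges, and small for irrelevant structures and background. By analyzing the eigenvalues, pixels with vascular features are selected, and the VEI is obtained at this step.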
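The eigenvalue analysis above is usually condensed into Frangi's vesselness measure. The sketch below is a minimal NumPy/SciPy version for bright vessels on a dark background, not the authors' exact filter: Hessians are computed with Gaussian derivatives at a few scales (with σ² scale normalization), eigenvalues are sorted by magnitude, and the commonly used settings alpha = beta = 0.5 and c equal to half the maximum Hessian norm, as suggested by Frangi et al. (1998), are assumed; the scales in `sigmas` are illustrative. In practice an existing implementation such as `skimage.filters.frangi` (with `black_ridges=False` for bright vessels) could be used instead.

```python
import numpy as np
from scipy import ndimage


def hessian_eigenvalues(volume, sigma):
    """Eigenvalues of the scale-normalised Hessian at each voxel, sorted so |l1| <= |l2| <= |l3|."""
    volume = np.asarray(volume, dtype=np.float64)
    hess = np.empty(volume.shape + (3, 3))
    for a in range(3):
        for b in range(a, 3):
            order = [0, 0, 0]
            order[a] += 1
            order[b] += 1
            # Second derivative d2I/(dx_a dx_b) of the Gaussian-smoothed image at scale sigma.
            d = sigma ** 2 * ndimage.gaussian_filter(volume, sigma, order=order)
            hess[..., a, b] = d
            hess[..., b, a] = d  # the Hessian is symmetric
    eig = np.linalg.eigvalsh(hess)  # real eigenvalues, ascending by value
    idx = np.argsort(np.abs(eig), axis=-1)
    return np.take_along_axis(eig, idx, axis=-1)


def frangi_vesselness(volume, sigmas=(1.0, 2.0), alpha=0.5, beta=0.5, c=None):
    """Multi-scale Frangi vesselness for bright tubular structures on a dark background."""
    eps = 1e-12
    out = np.zeros(np.shape(volume))
    for sigma in sigmas:
        l1, l2, l3 = np.moveaxis(hessian_eigenvalues(volume, sigma), -1, 0)
        ra = np.abs(l2) / (np.abs(l3) + eps)               # plate-like vs line-like
        rb = np.abs(l1) / (np.sqrt(np.abs(l2 * l3)) + eps)  # blob-like vs line-like
        s = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)            # second-order structureness
        c_s = 0.5 * s.max() if c is None else c            # half the max Hessian norm
        v = ((1.0 - np.exp(-ra ** 2 / (2 * alpha ** 2)))
             * np.exp(-rb ** 2 / (2 * beta ** 2))
             * (1.0 - np.exp(-s ** 2 / (2 * c_s ** 2 + eps))))
        v[(l2 > 0) | (l3 > 0)] = 0.0  # bright vessels require l2 < 0 and l3 < 0
        out = np.maximum(out, v)      # keep the strongest response over scales
    return out
```

On a synthetic bright tube running along Z, the response is close to 0.75 on the centerline and essentially zero far from it, which matches the qualitative behavior described above.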

3.1.3 Coarse segmentation:

To obtain a coarse segmentation of the blood vessels, thresholding and connected component analysis are applied. First, a vessel mask is created by retaining only the positions of the top percentage (e.g., 5%) of pixels in each VEI. This thresholding step separates vessel pixels from the background and other structures. Then, connected component analysis is performed on the vessel mask, and the largest $k$ connected regions, where $k$ is empirically chosen between 3 and 5, are retained as the coarse segmentation of the blood vessels. Combining Frangi filtering, thresholding and connected component analysis yields the CVS, which provides an initial approximation of the blood vessel structure in the 3D TOF-MRA images.
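A minimal sketch of this coarse-segmentation step follows, assuming the VEI is a non-negative NumPy array (as a vesselness map is). `coarse_vessel_segmentation` is a hypothetical helper; excluding zero-valued voxels (so that a mostly empty VEI does not become an all-foreground mask) and SciPy's default 3D face connectivity (6-neighborhood) are assumptions, not details given in the text.

```python
import numpy as np
from scipy import ndimage


def coarse_vessel_segmentation(vei, top_fraction=0.05, k=4):
    """Top-`top_fraction` threshold of a VEI followed by keeping the k largest 3D components."""
    threshold = np.quantile(vei, 1.0 - top_fraction)
    # Keep the brightest voxels; zero voxels are excluded so an empty background never passes.
    mask = (vei >= threshold) & (vei > 0)
    labels, n = ndimage.label(mask)  # default 3D structure: 6-connected (face) neighbours
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                              # label 0 is the background
    largest = np.argsort(sizes)[::-1][:k]     # labels of the k largest components
    largest = largest[sizes[largest] > 0]     # guard against k > n
    return np.isin(labels, largest)
```

Keeping only the largest few components is what discards isolated bright specks and, because skull responses tend to form separate components, also helps keep skull pixels out of the CVS.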

Method | TubeTK-42 (Dice / clDice / HD95) | Brains (Dice / clDice / HD95) | IXI-45 (Dice / clDice / HD95) | EDEN (Dice / clDice / HD95)
UNet (MICCAI’15) | 72.98 / 81.33 / 9.09 | 73.77 / 75.86 / 3.72 | 79.82 / 75.23 / 10.25 | 79.79 / 80.62 / 9.89
DTC (AAAI’21) | 69.31 / 76.35 / 25.60 | 64.36 / 64.86 / 34.72 | 70.44 / 63.48 / 63.31 | 71.65 / 65.65 / 91.93
GCS (TMI’22) | 18.11 / 33.40 / 22.50 | 10.37 / 9.28 / 24.92 | 25.05 / 21.32 / 55.10 | 61.36 / 67.85 / 28.42
GBDL (CVPR’22) | 36.40 / 55.67 / 16.16 | 39.79 / 52.84 / 63.71 | 12.46 / 20.37 / 42.43 | 51.78 / 64.48 / 11.41
SSNet (MICCAI’22) | 65.15 / 71.84 / 28.65 | 64.22 / 63.88 / 35.04 | 73.55 / 65.14 / 52.05 | 69.72 / 64.26 / 96.65
BCP (CVPR’23) | 73.36 / 81.95 / 8.33 | 77.25 / 80.31 / 2.88 | 81.37 / 80.64 / 6.17 | 86.44 / 89.41 / 2.71
PCRL (ICCV’21) | 72.90 / 81.92 / 7.82 | 75.94 / 79.09 / 2.74 | 83.16 / 83.69 / 4.05 | 86.15 / 88.71 / 2.91
MAE (CVPR’22) | 72.39 / 79.37 / 11.82 | 77.56 / 79.70 / 3.49 | 83.21 / 80.14 / 10.04 | 84.62 / 85.90 / 6.73
Swin UNETR (CVPR’22) | 74.66 / 82.18 / 9.71 | 77.76 / 80.40 / 3.30 | 80.64 / 79.01 / 6.37 | 81.23 / 81.87 / 12.90
UniMiSS (ECCV’22) | 70.54 / 82.59 / 9.03 | 77.63 / 79.62 / 3.03 | 85.45 / 83.49 / 3.83 | 88.02 / 88.51 / 2.91
GVSL (CVPR’23) | 74.27 / 82.57 / 8.30 | 77.87 / 79.68 / 3.16 | 85.92 / 84.82 / 3.35 | 85.15 / 89.39 / 2.61
FFPN (Ours) | 77.53 / 85.71 / 6.69 | 80.47 / 83.28 / 2.19 | 87.24 / 86.25 / 2.75 | 90.75 / 91.53 / 1.24
Preprocessing | 29.64 / 31.27 / 71.29 | 31.73 / 37.01 / 38.55 | 43.18 / 55.50 / 38.56 | 68.48 / 69.43 / 46.18
Table 1: The cerebrovascular segmentation results on TubeTK-42, Brains, IXI-45 and EDEN datasets

3.2 Pretraining task and Loss Function

Our proposed pretraining process contains three tasks: regression learning, coarse vessel segmentation learning, and consistency learning on the MIP. Let $f(\cdot)$ be the backbone neural network, and let $V_i$ and $S_i$ be the VEI and CVS of subject $i$. The backbone $f(\cdot)$ encodes the raw image $I_i$, and a regression head $Rgn(\cdot)$ and a segmentation head $Seg(\cdot)$ predict $V_i$ and $S_i$, respectively:

$$
\begin{aligned}
\hat{V}_i &= Rgn(f(I_i)) \\
\hat{S}_i &= Seg(f(I_i))
\end{aligned}
\tag{3}
$$

where $Rgn(\cdot)$ and $Seg(\cdot)$ are convolutions with kernel size 1. The $\ell_1$ loss is used as the regression loss, and the sum of the Dice loss and the binary cross-entropy (BCE) loss is used as the segmentation loss:

$$
\mathcal{L}_{rgn} = \|\hat{V}_i - V_i\|_1 \tag{4}
$$
$$
\mathcal{L}_{seg} = \mathrm{Diceloss}(\hat{S}_i, S_i) + \mathrm{BCEloss}(\hat{S}_i, S_i) \tag{5}
$$
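For concreteness, the two pretext losses in Eqs. (4) and (5) can be written as below. This NumPy sketch only evaluates loss values (an actual pretraining run would use a deep learning framework with automatic differentiation), and it assumes details the text does not specify: the segmentation head outputs logits that pass through a sigmoid, the $\ell_1$ norm is averaged over voxels, and the Dice loss is the soft Dice loss. All function names are hypothetical.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error, i.e. the l1 norm of Eq. (4) averaged over voxels."""
    return float(np.mean(np.abs(pred - target)))


def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary target."""
    intersection = np.sum(prob * target)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(prob) + np.sum(target) + eps))


def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy averaged over voxels (probabilities clipped for stability)."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def pretext_losses(v_hat, v, seg_logits, s):
    """Return (L_rgn, L_seg) for one subject, assuming Seg(.) outputs logits."""
    prob = 1.0 / (1.0 + np.exp(-seg_logits))  # sigmoid
    l_rgn = l1_loss(v_hat, v)
    l_seg = soft_dice_loss(prob, s) + bce_loss(prob, s)
    return l_rgn, l_seg
```

Pairing Dice with BCE is a common choice for sparse foreground such as vessels: the Dice term is insensitive to the large number of background voxels, while BCE provides smooth per-voxel gradients.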

In this study, we do not perform skull stripping on the preprocessed data. In this case, Frangi filtering also enhances skull pixels, whereas the CVS no longer contains them after connected component analysis. To focus the model on vascular structures, we add a consistency loss: the $\ell_1$ loss between the maximum intensity projections, along the Z-axis, of $\hat{S}_i \odot \hat{V}_i$ and $\hat{S}_i \odot I_i$, where $\odot$ denotes element-wise multiplication:

$$
\begin{aligned}
\mathrm{MIP}_{VS} &= \mathrm{Maximum}(\hat{S}_i \odot \hat{V}_i) \\
\mathrm{MIP}_{IS} &= \mathrm{Maximum}(\hat{S}_i \odot I_i) \\
\mathcal{L}_{consistency} &= \|\mathrm{MIP}_{VS} - \mathrm{MIP}_{IS}\|_1
\end{aligned}
\tag{6}
$$

Because $\hat{S}_i \odot \hat{V}_i$ and $\hat{S}_i \odot I_i$ are both masked by the predicted vessel segmentation, most skull pixels are removed from them. Moreover, MIP images of TOF-MRA are commonly used for the initial assessment of vascular morphology. The consistency loss therefore encourages the model to focus on learning vascular structures.
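The MIP consistency term of Eq. (6) reduces to two projections and an $\ell_1$ distance. In the NumPy sketch below, the Z-axis is assumed to be the last array axis (matching the $(H, W, D)$ layout of Section 3.1.1), $\hat{S}_i$ is assumed to hold probabilities in [0, 1], the $\ell_1$ norm is averaged over pixels, and $I_i$ and $\hat{V}_i$ are assumed to be on comparable intensity scales; these are illustrative assumptions, and `mip_consistency_loss` is a hypothetical name.

```python
import numpy as np


def mip_consistency_loss(s_hat, v_hat, image, axis=-1):
    """l1 distance between the Z-axis MIPs of S_hat * V_hat and S_hat * I (Eq. 6)."""
    mip_vs = np.max(s_hat * v_hat, axis=axis)  # MIP_VS: predicted VEI masked by predicted vessels
    mip_is = np.max(s_hat * image, axis=axis)  # MIP_IS: raw image masked by predicted vessels
    return float(np.mean(np.abs(mip_vs - mip_is)))
```

In a framework such as PyTorch the same expression would use `torch.amax` over the Z dimension, so that gradients flow through the maximum to both heads.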

Finally, the total loss function $\mathcal{L}$ is:

$$
\mathcal{L} = \gamma_1 \mathcal{L}_{rgn} + \gamma_2 \mathcal{L}_{seg} + \gamma_3 \mathcal{L}_{consistency} \tag{7}
$$

where $\gamma_1$, $\gamma_2$ and $\gamma_3$ are the loss weights. We empirically set $\gamma_1 = 0.4$, $\gamma_2 = 0.4$ and $\gamma_3 = 0.2$ in this study.

Fig. 3: Statistical validation on the Brains dataset using the t-test. '*' indicates $p < 0.05$ and '**' indicates $p < 0.001$.

3.3 Backbone

We select Swin UNETR Hatamizadeh et al. (2021) as the backbone in this study. Unlike the vanilla model, successive convolutional layers are used in the initial embedding layers Zhou et al. (2021a). This setting enhances the capture of spatial location information, which is potentially beneficial for segmenting sparsely distributed cerebrovascular structures. We also examine the impact of our pretraining strategy on other backbone structures in the results section.

Fig. 4: 3D visualization comparing the results of our method and the competing baseline models.

4 Experimental Setup

4.1 Datasets

Unlabeled Datasets for Pretraining: We collect a large unlabeled TOF-MRA dataset of 1510 subjects from 5 public datasets: IXI (525 subjects), OASIS3 (525 subjects) LaMontagne et al. (2019), BrainAneurysm (280 subjects) Di Noto et al. (2023), ADAM (113 subjects) and TubeTK (67 subjects). These data are preprocessed with the proposed workflow, and the model is pretrained on them through the proposed tasks.

Labeled Datasets for Downstream Evaluation: We evaluate the proposed method on four labeled datasets.

  • 1.

    TubeTK-42: The full TubeTK dataset contains 109 subjects; the 67 unlabeled subjects are used for pretraining and the 42 labeled subjects for fine-tuning and evaluation. The image size is (448, 448, 128) and the spacing is (0.5134, 0.5134, 0.8000).

  • 2.

    Brains: This dataset contains 56 subjects. Image sizes vary, with an average size of (286, 320, 204). The spacing is (0.6188, 0.6188, 0.6200).

  • 3.

    IXI-45: This dataset is provided by Chen et al. (2022b). It contains 45 subjects from the whole IXI dataset (570 subjects). The image size is (1024, 1024, 92). The provided images have been processed and therefore do not contain spacing information. The original spacing of the IXI dataset is (0.2637, 0.2637, 0.8000).

  • 4.

    EDEN: This dataset contains 15 subjects. Unlike the other datasets, we also annotate the blood vessels on the skull in this dataset. The image size is (672, 672, 210) and the spacing is (0.2976, 0.2976, 0.4500).

Note that the pretraining and evaluation datasets are completely disjoint. For each labeled dataset, 80% of the subjects are used for training and the remaining 20% for testing. Detailed information about the datasets can be found in the Appendix.

4.2 Evaluation Metrics

Three metrics are used to assess cerebrovascular segmentation performance: the Dice similarity coefficient (Dice), the 95th-percentile Hausdorff distance (HD95), and the centerline Dice coefficient (clDice) Shit et al. (2021). Dice and HD95 mainly measure the degree of overlap and the shape differences between segmentations, whereas clDice accounts for the topological structure of the segmented regions, which makes it more suitable for evaluating tubular structures.
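The difference between Dice and clDice can be seen in a short sketch. Following the clDice definition, topology precision is the fraction of the predicted skeleton inside the reference mask and topology sensitivity is the fraction of the reference skeleton inside the predicted mask; clDice is their harmonic mean. In this NumPy sketch the skeletons are assumed to be precomputed (for example with `skimage.morphology.skeletonize`, which handles 3D arrays in recent releases), and the function names are hypothetical.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def cl_dice(pred, gt, pred_skel, gt_skel):
    """Centerline Dice from binary masks and their precomputed skeletons."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_skel, gt_skel = pred_skel.astype(bool), gt_skel.astype(bool)
    # Topology precision: share of the predicted centerline lying inside the reference.
    tprec = np.logical_and(pred_skel, gt).sum() / max(pred_skel.sum(), 1)
    # Topology sensitivity: share of the reference centerline covered by the prediction.
    tsens = np.logical_and(gt_skel, pred).sum() / max(gt_skel.sum(), 1)
    return 0.0 if tprec + tsens == 0 else float(2 * tprec * tsens / (tprec + tsens))
```

For a 3×3 tube of length 5 whose prediction stops after 3 slices, both metrics equal 0.75, but clDice reaches that value purely through centerline coverage (precision 1.0, sensitivity 0.6), which is why it reacts directly to broken vessel segments.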

4.3 Baseline Models and Implementation

We compare the 3D cerebrovascular segmentation performance of the proposed FFPN model with existing methods that use unlabeled data, including the SemiSL methods BCP Bai et al. (2023), GBDL Wang and Lukasiewicz (2022), GCS Chen et al. (2022a), SS-Net Wu et al. (2022) and DTC Luo et al. (2021), and the SSL methods MAE He et al. (2022), PCRL Zhou et al. (2021b), UniMiSS Xie et al. (2022), Swin UNETR Tang et al. (2022) and GVSL He et al. (2023). As a reference, we also include the basic UNet Ronneberger et al. (2015) model, which does not use the unlabeled data.

For our proposed FFPN model, the encoder has 4 stages comprising [2, 2, 6, 2] transformer blocks, respectively. The initial patch size is 2 and the feature size is 24. For pretraining, the model is trained with the AdamW optimizer for 200k iterations, using a warm-up cosine scheduler with 20k warm-up iterations. The cropping patch size is (160, 160, 64), the batch size is 2, the weight decay is $3 \times 10^{-5}$, and the learning rate is $5 \times 10^{-3}$. For fine-tuning, the model is trained with the AdamW optimizer for 5k iterations, with a learning rate of $5 \times 10^{-3}$ (halved for the encoder) and a cropping patch size of (192, 192, 96).
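The exact form of the warm-up cosine scheduler is not specified. A common choice, assumed here, is linear warm-up to the base learning rate followed by cosine decay to zero at the final iteration; the function below implements that assumption with the pretraining values stated above ($5 \times 10^{-3}$, 20k warm-up iterations, 200k total). The hypothetical `finetune_learning_rates` helper only encodes the stated rule that the encoder uses half the fine-tuning learning rate; the "decoder" group is assumed to cover all non-encoder parameters.

```python
import math


def warmup_cosine_lr(step, base_lr=5e-3, warmup_steps=20_000, total_steps=200_000, min_lr=0.0):
    """Learning rate at `step`: linear warm-up, then cosine decay to `min_lr` (assumed form)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warm-up reaching base_lr
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def finetune_learning_rates(base_lr=5e-3):
    """Per-group fine-tuning learning rates: the encoder uses half the base rate."""
    return {"encoder": base_lr / 2, "decoder": base_lr}
```

With a framework optimizer, these two groups would map to separate parameter groups, and `warmup_cosine_lr` would be evaluated once per iteration.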

5 Experimental Results

5.1 Comparison with state-of-the-art Methods

The cerebrovascular segmentation results on the four datasets are shown in Table 1. The proposed method achieves the best performance among all compared models on all four datasets in terms of all three metrics. On the TubeTK-42 and Brains datasets, the clDice scores of our method are 85.71 and 83.28, approximately 3 percentage points higher than those of the second-best model. On the IXI-45 and EDEN datasets, our method also improves clDice over the second-best model, by about 1.4 and 2.1 percentage points, respectively.

To further demonstrate the superiority of FFPN, a statistical analysis is shown in Fig. 3. Specifically, we conduct an additional 5-fold experiment on the largest dataset (Brains) using FFPN and the most competitive baseline models (GVSL, Swin UNETR and PCRL). The t-test results show that the improvement of FFPN is statistically significant for all three metrics (the result for clDice closely mirrors that for Dice).

We provide a 3D visualization to support an intuitive evaluation (see Fig. 4). Compared with the other methods, the segmentation produced by FFPN shows markedly better vessel continuity, particularly at locations 1, 2, 3 and 4. FFPN also produces fewer false positive points (e.g., locations 5 and 6).

Fig. 5: Effect of the pretraining strategy as the percentage of labeled data varies.
Loss Function Dice clDice HD95
Scratch 72.08 80.16 11.46
$\mathcal{L}_{rgn}$ 72.34 81.11 9.62
$\mathcal{L}_{seg}$ 73.46 81.58 9.10
$\mathcal{L}_{consistency}$ 72.15 80.78 10.13
$\mathcal{L}_{rgn}+\mathcal{L}_{seg}$ 74.18 82.17 8.14
$\mathcal{L}_{rgn}+\mathcal{L}_{seg}+\mathcal{L}_{consistency}$ 74.64 82.94 7.57
Table 2: The effectiveness of different loss functions.

5.2 Influence of the Percentage of Manual Labeled Data

We then conduct a study to evaluate the effectiveness of our pretraining method using a reduced amount of manually labeled data. The results are presented in Fig. 5.

Pretraining consistently improves model performance across different proportions of labeled data. With all available labeled data, our pretraining method achieves Dice scores 1.03 and 0.83 percentage points higher than training from scratch on the IXI-45 and TubeTK-42 datasets, respectively. With less than half of the labeled data, the pretrained model improves by 4.39 points on IXI-45 and 2.56 points on TubeTK-42. Moreover, the performance gap between models trained with and without pretraining widens as the amount of labeled data decreases.

5.3 Effectiveness of Pretraining Loss Function

In this section, we evaluate the effectiveness of the three proposed loss functions. Specifically, the model is pretrained with different loss functions, and their effectiveness is assessed after finetuning. We take the TubeTK-42 dataset with 45.45% labeled data (i.e., 15 subjects) as an example; the results are presented in Table 2.

Among the single-task pretraining settings, segmentation learning performs best, with Dice, clDice, and HD95 values of 73.46, 81.58, and 9.10, respectively. Regression learning yields slightly smaller gains, while consistency learning alone does not notably improve performance over training from scratch. Combining the regression and segmentation losses further improves performance, and using all three pretraining tasks yields the best results (Dice 74.64, clDice 82.94, HD95 7.57).
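The ablation in Table 2 amounts to switching terms of a weighted multi-task objective on and off. The sketch below illustrates that structure only. The exact definitions of $\mathcal{L}_{rgn}$, $\mathcal{L}_{seg}$ and $\mathcal{L}_{consistency}$ are given elsewhere in the paper; the forms used here (MSE regression to a Frangi response, soft Dice against a coarse vessel mask, MSE between two predictions) and the equal default weights are hypothetical stand-ins, as are all function names.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))


def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary mask."""
    prob = np.asarray(prob, float)
    target = np.asarray(target, float)
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))


def pretrain_loss(vesselness_pred, frangi_target, mask_prob, coarse_mask,
                  view_a, view_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three stand-in pretraining losses (hypothetical forms)."""
    parts = {
        "rgn": mse(vesselness_pred, frangi_target),     # regression to Frangi response
        "seg": soft_dice_loss(mask_prob, coarse_mask),  # segmentation of coarse mask
        "consistency": mse(view_a, view_b),             # agreement of two predictions
    }
    keys = ("rgn", "seg", "consistency")
    total = sum(w * parts[k] for w, k in zip(weights, keys))
    return total, parts
```

Setting a weight to zero drops the corresponding term, which mirrors the single-task and two-task rows of Table 2.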

We also present a slice-view visualization in Fig. 6. Pretraining with $\mathcal{L}_{seg}$ reduces the number of false positive points in the output (e.g., locations 2 and 3). However, because the coarse segmentation masks may retain some skull information, certain skull structures may still be segmented (e.g., location 4). Overall, incorporating all pretraining tasks yields better segmentation, with improved vessel continuity (see location 1) and fewer false positive points.

Fig. 6: Illustration of the effectiveness of pretraining loss functions from the slice view. The numbers indicate clDice scores.
Loaded component Dice clDice HD95
Scratch 80.79 78.08 8.41
Encoder 81.71 79.90 12.08
Decoder 81.14 76.22 11.19
Encoder + Decoder 83.77 81.57 6.59
All 85.19 83.09 4.43
Table 3: Effect of loading pretrained parameters for different model components.

5.4 Influence of Model Components for Pretraining

Unlike previous SSL methods, which typically load only the encoder parameters for downstream tasks, our approach loads the encoder, the decoder, and the segmentation head. In this section, we explore the impact of loading different model components, taking the IXI-45 dataset with 41.67% labeled data (i.e., 15 subjects) as an example. The results are presented in Table 3.

Loading only the encoder yields a notable improvement, raising the clDice score from 78.08 to 79.90, although HD95 worsens from 8.41 to 12.08. Loading only the decoder performs worse than loading only the encoder, with a clDice score of 76.22, which is also below that of training from scratch. Loading both the encoder and decoder gives a substantial boost, with the clDice score increasing to 81.57. When all components, including the segmentation head, are loaded, the model achieves the highest performance across all metrics (Dice 85.19, clDice 83.09, HD95 4.43).
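Loading a subset of pretrained components, as in Table 3, can be done by filtering a checkpoint's parameter dictionary by key prefix. The sketch below assumes parameter names are prefixed with `encoder.`, `decoder.` and `head.`; these prefixes and the function name are hypothetical and depend on how the model is defined.

```python
# Table 3 rows mapped to the (hypothetical) component prefixes they load.
CONFIGS = {
    "Scratch": [],
    "Encoder": ["encoder"],
    "Decoder": ["decoder"],
    "Encoder + Decoder": ["encoder", "decoder"],
    "All": ["encoder", "decoder", "head"],
}


def select_components(state_dict, components):
    """Keep only entries whose key starts with '<component>.' for a listed component."""
    prefixes = tuple(f"{name}." for name in components)
    # str.startswith(()) is False, so an empty list selects nothing (training from scratch).
    return {k: v for k, v in state_dict.items() if k.startswith(prefixes)}
```

With PyTorch, the filtered dictionary could then be passed to `model.load_state_dict(subset, strict=False)`, which leaves the remaining parameters at their random initialization.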

Dataset EDEN
Backbone Scratch Finetune Improvement
UNet 79.79 84.26 +4.47%
AttentionUNet 78.44 88.20 +9.76%
UNETR 83.32 88.06 +4.74%
Swin UNETR 88.87 90.75 +1.88%
Dataset IXI-45
Backbone Scratch Finetune Improvement
UNet 79.82 85.24 +5.41%
AttentionUNet 81.52 84.62 +3.10%
UNETR 80.01 83.85 +3.84%
Swin UNETR 86.21 87.24 +1.03%
Table 4: The effectiveness of our proposed pretraining method on various backbone structures. Values are Dice scores; Improvement is the Finetune score minus the Scratch score, in percentage points.

5.5 The Effectiveness on Different Backbone Structures

In this section, we assess the effectiveness of the proposed pretraining method on different backbone structures. We focus on four widely used medical segmentation backbones, covering both convolutional neural network (CNN)-based models (UNet Ronneberger et al. (2015) and AttentionUNet Oktay et al. (2018)) and Transformer-based models (UNETR Hatamizadeh et al. (2022) and Swin UNETR Hatamizadeh et al. (2021)). The results are summarized in Table 4.

On the EDEN dataset, the proposed pretraining improves over training from scratch for all backbone architectures. For the UNet backbone, the Dice score increases from 79.79 to 84.26, an improvement of 4.47 percentage points. The AttentionUNet and UNETR backbones improve by 9.76 and 4.74 points, respectively. Even the stronger Swin UNETR backbone still gains 1.88 points.

The IXI-45 dataset shows similar trends: pretraining consistently improves over training from scratch. The UNet backbone improves by 5.41 points, while the AttentionUNet, UNETR, and Swin UNETR backbones improve by 3.10, 3.84, and 1.03 points, respectively.

5.6 The Robustness of Our Pretraining Method Regarding the Unlabeled Dataset Heterogeneity

We conduct this experiment to assess the robustness of the proposed method to multi-source heterogeneous pretraining data, using the IXI dataset as an illustrative case.

Specifically, we keep the amount of unlabeled data fixed (525 subjects) while varying the proportion drawn from the unlabeled IXI dataset, and then test the model on the labeled IXI-45 dataset. The best-performing SemiSL method, BCP, is used for comparison, and the results are shown in Fig. 7. The performance of BCP drops markedly as the proportion of homogeneous (IXI) data decreases, whereas our method maintains stable, high segmentation performance. This result suggests that the proposed pretraining strategy is robust to heterogeneity in the unlabeled data.

Fig. 7: The robustness of our proposed method. The horizontal axis indicates the composition of the unlabeled data.

6 Discussion

To unleash the potential of unlabeled TOF-MRA data for 3D cerebrovascular segmentation, our approach incorporates Frangi filtering as a key element in the pretraining phase. The proposed FFPN model demonstrates enhanced robustness and superior segmentation performance, notably outshining other SemiSL and SSL methodologies.
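Since Frangi filtering is central to the approach, a minimal single-scale sketch of the Frangi et al. (1998) vesselness measure for bright vessels on a dark background (as in TOF-MRA) is given below. It uses finite-difference Hessians without the Gaussian pre-smoothing and multiscale maximum of the original filter, and the defaults (alpha = beta = 0.5, c = half the maximum Hessian norm) are common choices, not the settings used in this paper, which are not stated in this section. For real volumes, scikit-image's `skimage.filters.frangi` with `black_ridges=False` provides a multiscale implementation.

```python
import numpy as np


def hessian_eigenvalues(vol):
    """Voxel-wise Hessian eigenvalues of a 3D volume, sorted by increasing magnitude."""
    grads = np.gradient(vol)  # first derivatives along each axis
    hess = np.empty(vol.shape + (3, 3))
    for i, g in enumerate(grads):
        for j, gij in enumerate(np.gradient(g)):
            hess[..., i, j] = gij
    hess = 0.5 * (hess + np.swapaxes(hess, -1, -2))  # enforce symmetry
    eig = np.linalg.eigvalsh(hess)
    order = np.argsort(np.abs(eig), axis=-1)
    return np.take_along_axis(eig, order, axis=-1)


def frangi_vesselness(vol, alpha=0.5, beta=0.5, c=None, eps=1e-10):
    """Single-scale Frangi vesselness for bright tubes on a dark background."""
    eig = hessian_eigenvalues(np.asarray(vol, dtype=float))
    l1, l2, l3 = eig[..., 0], eig[..., 1], eig[..., 2]
    ra = np.abs(l2) / (np.abs(l3) + eps)                # plate-like vs. line-like
    rb = np.abs(l1) / (np.sqrt(np.abs(l2 * l3)) + eps)  # blob-like vs. line-like
    s = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)            # second-order structureness
    if c is None:
        c = 0.5 * s.max() + eps
    v = ((1 - np.exp(-ra ** 2 / (2 * alpha ** 2)))
         * np.exp(-rb ** 2 / (2 * beta ** 2))
         * (1 - np.exp(-s ** 2 / (2 * c ** 2))))
    v[(l2 > 0) | (l3 > 0)] = 0.0  # bright vessels require l2, l3 < 0
    return v


if __name__ == "__main__":
    # Synthetic bright tube running along axis 0 with a Gaussian cross-section.
    z, y, x = np.meshgrid(np.arange(16), np.arange(21), np.arange(21), indexing="ij")
    tube = np.exp(-((y - 10) ** 2 + (x - 10) ** 2) / (2 * 2.0 ** 2))
    v = frangi_vesselness(tube)
    print("centerline:", v[8, 10, 10], "background:", v[8, 0, 0])
```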

Our proposed cropping strategy reduces the computational cost of processing such large 3D MRA datasets. We take the IXI dataset as an example to show its effectiveness, computing the cropping rate (CR) for each subject as follows:

$\mathrm{CR}=\frac{H\cdot W\cdot D-H^{\prime}\cdot W^{\prime}\cdot D^{\prime}}{H\cdot W\cdot D} \quad (8)$

where $(H, W, D)$ is the original image size and $(H^{\prime}, W^{\prime}, D^{\prime})$ is the image size after cropping. A higher CR means that more background voxels are removed. The CR distribution over the IXI dataset is shown in Fig. 8. The average CR is 42.99%, and 80% of the subjects have a CR of at least 38.49%, i.e., nearly 40% of their voxels are removed as background. Representative subjects are shown in Fig. 9. Non-zero cropping is widely used in 3D medical image analysis, but noise in MRI scans leaves few truly zero-valued background voxels, so the resulting bounding box is often barely smaller than the full volume. Our proposed strategy addresses this issue by removing as many background voxels as possible, which becomes increasingly important for large 3D datasets, where it reduces both memory use and computation.
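Eq. (8), and the reason plain non-zero cropping fails on noisy volumes, can be illustrated with a small sketch. The fixed-threshold bounding-box crop below is a stand-in, not the paper's cropping strategy (which is described elsewhere in the paper); in practice a threshold might be chosen automatically, e.g. with Otsu's method Otsu (1979). The function names are hypothetical.

```python
import numpy as np


def cropping_rate(orig_shape, cropped_shape):
    """Eq. (8): fraction of voxels removed, (HWD - H'W'D') / (HWD)."""
    n_orig = int(np.prod(orig_shape))
    n_crop = int(np.prod(cropped_shape))
    return (n_orig - n_crop) / n_orig


def bbox_crop(vol, threshold=0.0):
    """Crop to the bounding box of voxels with intensity > threshold.

    threshold=0.0 reproduces plain non-zero cropping.
    """
    mask = vol > threshold
    if not mask.any():
        return vol
    coords = np.nonzero(mask)
    slices = tuple(slice(int(c.min()), int(c.max()) + 1) for c in coords)
    return vol[slices]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.uniform(0.01, 0.05, size=(64, 64, 32))  # low-level noise, never exactly zero
    vol[16:48, 16:48, 8:24] = 1.0                      # bright foreground region
    for th in (0.0, 0.5):
        cropped = bbox_crop(vol, threshold=th)
        print(th, cropped.shape, cropping_rate(vol.shape, cropped.shape))
```

In this toy volume, non-zero cropping keeps every voxel (CR = 0) because the noise floor is never exactly zero, while a threshold above the noise level recovers the foreground bounding box.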

Our method is not tied to a specific backbone structure and generalizes well across architectures. More importantly, the proposed pretraining strategy may narrow the performance gap among backbones. As Table 4 shows, on IXI-45 the Dice gap between Swin UNETR and UNet is about 6.4 points before pretraining (86.21 vs. 79.82) and approximately 2 points after (87.24 vs. 85.24); on EDEN it narrows from 9.08 to 6.49 points. This indicates that a lightweight model (e.g., the vanilla UNet) can achieve competitive performance after the proposed pretraining, which is relevant for deploying lightweight networks in clinical settings.
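The backbone gaps discussed above can be recomputed directly from Table 4. The short script below only restates the table's Dice scores and performs the subtraction; the names are illustrative.

```python
# Dice scores (Scratch, Finetune) copied from Table 4.
TABLE4 = {
    "EDEN": {"UNet": (79.79, 84.26), "Swin UNETR": (88.87, 90.75)},
    "IXI-45": {"UNet": (79.82, 85.24), "Swin UNETR": (86.21, 87.24)},
}


def backbone_gap(dataset, strong="Swin UNETR", weak="UNet"):
    """Dice gap (strong - weak) before and after pretraining, in points."""
    rows = TABLE4[dataset]
    before = round(rows[strong][0] - rows[weak][0], 2)
    after = round(rows[strong][1] - rows[weak][1], 2)
    return before, after


for ds in TABLE4:
    print(ds, backbone_gap(ds))
```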

One limitation of this study is that our method may not effectively use data from other modalities that do not fully reflect vessel morphology (e.g., T1-weighted MRI). The proposed pretraining relies on a preprocessing workflow for cerebrovascular extraction, whereas other modalities may contain little vessel-related information. Conversely, any modality capable of capturing vessel morphology, such as computed tomographic angiography, digital subtraction angiography, or optical coherence tomography, may benefit from the proposed method. Moreover, our pretraining method can serve as an auxiliary training approach for an SSL model trained on other modalities or for general large vision models, helping to build domain-specific models for cerebrovascular segmentation.

7 Conclusion

In this study, we introduce a Frangi filtering-based pretraining network to effectively leverage unlabeled TOF-MRA data. Our method incorporates prior knowledge of tubular structures into the pretraining tasks and outperforms other SemiSL and SSL methods on four cerebrovascular segmentation datasets. The ablation studies further show that the proposed method substantially improves model performance, particularly when labeled data are limited. Moreover, the method is not restricted to a specific architecture and improves the performance of various backbone structures.

Fig. 8: The cropping rate in the IXI dataset.
Fig. 9: Three representative subjects for the cropping strategy. The number indicates the image size.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant: 62027901, 81930053, 81227901; Beijing Natural Science Foundation: JQ22023; CAS Youth Innovation Promotion Association under Grant Y2022055. The authors would like to acknowledge the instrumental and technical support of Multimodal Biomedical Imaging Experimental Platform, Institute of Automation, Chinese Academy of Sciences.

References

  • Alderliesten et al. (2006) Alderliesten, T., Bosman, P.A., Niessen, W.J., 2006. Towards a real-time minimally-invasive vascular intervention simulation system. IEEE Transactions on Medical Imaging 26, 128–132.
  • Asadian et al. (2022) Asadian, A., Weidner, E., Jiang, L., 2022. Self-supervised pretraining for differentially private learning. arXiv preprint arXiv:2206.07125.
  • Bai et al. (2023) Bai, Y., Chen, D., Li, Q., Shen, W., Wang, Y., 2023. Bidirectional copy-paste for semi-supervised medical image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11514–11524.
  • Chen et al. (2022a) Chen, C., Zhou, K., Wang, Z., Xiao, R., 2022a. Generative consistency for semi-supervised cerebrovascular segmentation from tof-mra. IEEE Transactions on Medical Imaging 42, 346–353.
  • Chen et al. (2023) Chen, C., Zhou, K., Wang, Z., Zhang, Q., Xiao, R., 2023. All answers are in the images: A review of deep learning for cerebrovascular segmentation. Computerized Medical Imaging and Graphics, 102229.
  • Chen et al. (2019) Chen, K., Yao, L., Zhang, D., Chang, X., Long, G., Wang, S., 2019. Distributionally robust semi-supervised learning for people-centric sensing, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3321–3328.
  • Chen et al. (2020) Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: International conference on machine learning, PMLR. pp. 1597–1607.
  • Chen et al. (2022b) Chen, Y., Jin, D., Guo, B., Bai, X., 2022b. Attention-assisted adversarial model for cerebrovascular segmentation in 3d tof-mra volumes. IEEE Transactions on Medical Imaging 41, 3520–3532.
  • Cheplygina et al. (2019) Cheplygina, V., de Bruijne, M., Pluim, J.P., 2019. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical image analysis 54, 280–296.
  • Di Noto et al. (2023) Di Noto, T., Marie, G., Tourbier, S., Alemán-Gómez, Y., Esteban, O., Saliou, G., Cuadra, M.B., Hagmann, P., Richiardi, J., 2023. Towards automated brain aneurysm detection in tof-mra: Open data, weak labels, and anatomical knowledge. Neuroinformatics 21, 21–34.
  • Duan et al. (2019) Duan, H.H., Su, G.Q., Huang, Y.C., Song, L.T., Nie, S.D., 2019. Segmentation of pulmonary vascular tree by incorporating vessel enhancement filter and variational region-growing. Journal of X-ray science and technology 27, 343–360.
  • Ericsson et al. (2022) Ericsson, L., Gouk, H., Loy, C.C., Hospedales, T.M., 2022. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine 39, 42–62. doi:10.1109/MSP.2021.3134634.
  • Frangi et al. (1998) Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A., 1998. Multiscale vessel enhancement filtering, in: Medical Image Computing and Computer-Assisted Intervention—MICCAI’98: First International Conference Cambridge, MA, USA, October 11–13, 1998 Proceedings 1, Springer. pp. 130–137.
  • Hassouna et al. (2006) Hassouna, M.S., Farag, A.A., Hushek, S., Moriarty, T., 2006. Cerebrovascular segmentation from tof using stochastic models. Medical image analysis 10, 2–18.
  • Hatamizadeh et al. (2021) Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D., 2021. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images, in: International MICCAI Brainlesion Workshop, Springer. pp. 272–284.
  • Hatamizadeh et al. (2022) Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D., 2022. Unetr: Transformers for 3d medical image segmentation, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 574–584.
  • He et al. (2022) He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R., 2022. Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009.
  • He et al. (2023) He, Y., Yang, G., Ge, R., Chen, Y., Coatrieux, J.L., Wang, B., Li, S., 2023. Geometric visual similarity learning in 3d medical image self-supervised pre-training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9538–9547.
  • Huang et al. (2023) Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S., 2023. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digital Medicine 6, 74.
  • Isensee et al. (2021) Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H., 2021. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18, 203–211.
  • Jiao et al. (2022) Jiao, R., Zhang, Y., Ding, L., Cai, R., Zhang, J., 2022. Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation. arXiv preprint arXiv:2207.14191.
  • Krishnan et al. (2022) Krishnan, R., Rajpurkar, P., Topol, E.J., 2022. Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering 6, 1346–1352.
  • LaMontagne et al. (2019) LaMontagne, P.J., Benzinger, T.L., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., et al., 2019. Oasis-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and alzheimer disease. MedRxiv, 2019–12.
  • Liao et al. (2023) Liao, W., Li, X., Wang, Q., Xu, Y., Yin, Z., Xiong, H., 2023. Cupre: Cross-domain unsupervised pre-training for few-shot cell segmentation. arXiv preprint arXiv:2310.03981.
  • Liao et al. (2022) Liao, W., Xiong, H., Wang, Q., Mo, Y., Li, X., Liu, Y., Chen, Z., Huang, S., Dou, D., 2022. Muscle: Multi-task self-supervised continual learning to pre-train deep models for x-ray images of multiple body parts, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 151–161.
  • Livne et al. (2019) Livne, M., Rieger, J., Aydin, O.U., Taha, A.A., Akay, E.M., Kossen, T., Sobesky, J., Kelleher, J.D., Hildebrand, K., Frey, D., et al., 2019. A u-net deep learning framework for high performance vessel segmentation in patients with cerebrovascular disease. Frontiers in neuroscience 13, 97.
  • Luo et al. (2021) Luo, X., Chen, J., Song, T., Wang, G., 2021. Semi-supervised medical image segmentation through dual-task consistency, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8801–8809.
  • Ni et al. (2020) Ni, J., Wu, J., Wang, H., Tong, J., Chen, Z., Wong, K.K., Abbott, D., 2020. Global channel attention networks for intracranial vessel segmentation. Computers in biology and medicine 118, 103639.
  • Oktay et al. (2018) Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al., 2018. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
  • Otsu (1979) Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics 9, 62–66.
  • Özsarlak et al. (2004) Özsarlak, Ö., Van Goethem, J.W., Maes, M., Parizel, P.M., 2004. Mr angiography of the intracranial vessels: technical aspects and clinical applications. Neuroradiology 46, 955–972.
  • Qi and Luo (2020) Qi, G.J., Luo, J., 2020. Small data challenges in big data era: A survey of recent progress on unsupervised and semi-supervised methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 2168–2187.
  • Qi et al. (2023) Qi, Y., He, Y., Qi, X., Zhang, Y., Yang, G., 2023. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. arXiv preprint arXiv:2307.08388.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, Springer. pp. 234–241.
  • Sanchesa et al. (2019) Sanchesa, P., Meyer, C., Vigon, V., Naegel, B., 2019. Cerebrovascular network segmentation of mra images with deep learning, in: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019), IEEE. pp. 768–771.
  • Shi et al. (2023) Shi, G., Yin, L., An, Y., Li, G., Zhang, L., Bian, Z., Chen, Z., Zhang, H., Hui, H., Tian, J., 2023. Progressive pretraining network for 3d system matrix calibration in magnetic particle imaging. IEEE Transactions on Medical Imaging 42, 3639–3650. doi:10.1109/TMI.2023.3297173.
  • Shit et al. (2021) Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H., 2021. cldice-a novel topology-preserving loss function for tubular structure segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16560–16569.
  • Sichtermann et al. (2019) Sichtermann, T., Faron, A., Sijben, R., Teichert, N., Freiherr, J., Wiesmann, M., 2019. Deep learning–based detection of intracranial aneurysms in 3d tof-mra. American Journal of Neuroradiology 40, 25–32.
  • Tang et al. (2022) Tang, Y., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V., Hatamizadeh, A., 2022. Self-supervised pre-training of swin transformers for 3d medical image analysis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20730–20740.
  • Wang and Lukasiewicz (2022) Wang, J., Lukasiewicz, T., 2022. Rethinking bayesian deep learning methods for semi-supervised volumetric medical image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–190.
  • Wang et al. (2023) Wang, X., Tang, F., Chen, H., Cheung, C.Y., Heng, P.A., 2023. Deep semi-supervised multiple instance learning with self-correction for dme classification from oct images. Medical Image Analysis 83, 102673.
  • Wu et al. (2022) Wu, Y., Wu, Z., Wu, Q., Ge, Z., Cai, J., 2022. Exploring smoothness and class-separation for semi-supervised medical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 34–43.
  • Xia et al. (2022) Xia, L., Zhang, H., Wu, Y., Song, R., Ma, Y., Mou, L., Liu, J., Xie, Y., Ma, M., Zhao, Y., 2022. 3d vessel-like structure segmentation in medical images by an edge-reinforced network. Medical Image Analysis 82, 102581.
  • Xie et al. (2022) Xie, Y., Zhang, J., Xia, Y., Wu, Q., 2022. Unimiss: Universal medical self-supervised learning via breaking dimensionality barrier, in: European Conference on Computer Vision, Springer. pp. 558–575.
  • Yang et al. (2022) Yang, X., Song, Z., King, I., Xu, Z., 2022. A survey on deep semi-supervised learning. IEEE Transactions on Knowledge and Data Engineering.
  • Yu et al. (2023) Yu, K., Sun, L., Chen, J., Reynolds, M., Chaudhary, T., Batmanghelich, K., 2023. Drasclr: A self-supervised framework of learning disease-related and anatomy-specific representation for 3d lung ct images. Medical Image Analysis, 103062. doi:10.1016/j.media.2023.103062.
  • Zhang et al. (2023) Zhang, C., Zheng, H., Gu, Y., 2023. Dive into the details of self-supervised learning for medical image analysis. Medical Image Analysis, 102879.
  • Zhou et al. (2021a) Zhou, H.Y., Guo, J., Zhang, Y., Yu, L., Wang, L., Yu, Y., 2021a. nnformer: Interleaved transformer for volumetric segmentation. arXiv preprint arXiv:2109.03201.
  • Zhou et al. (2021b) Zhou, H.Y., Lu, C., Yang, S., Han, X., Yu, Y., 2021b. Preservational learning improves self-supervised medical image models by reconstructing diverse contexts, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3499–3509.

Supplementary Material

7.1 Pretraining Datasets

The pretraining dataset consists of 5 publicly accessible datasets:

  1. IXI (http://brain-development.org/ixi-dataset/): This dataset contains 570 subjects. The 45 subjects with vessel masks Chen et al. (2022b) are used as an evaluation dataset (IXI-45), and the remaining 525 subjects are used for pretraining. Of these 525 subjects, 497 have an image size of (512, 512, 100), 27 have (1024, 1024, 92), and one has (1024, 1024, 91). The spacing is (0.4688, 0.4688, 0.8000).

  2. OASIS (https://www.oasis-brains.org): This dataset contains 525 subjects. Spacing and image size vary across subjects; the average image size is (583, 765, 216). Most subjects (337) have a spacing of (0.2995, 0.2995, 0.6000), and the average spacing is (0.2977, 0.2977, 0.5933).

  3. TubeTK (https://public.kitware.com/Wiki/TubeTK/Data): This dataset contains 109 subjects. The 42 subjects with vessel masks are used as an evaluation dataset (TubeTK-42), and the remaining 67 subjects are used for pretraining. The image size is (448, 448, 128) and the spacing is (0.5134, 0.5134, 0.8000).

  4. ADAM (https://adam.isi.uu.nl/data/): This dataset contains 113 subjects. Each subject has one original TOF-MRA image and one bias-field-corrected TOF-MRA image; only the original images are used in this study. The average image size is (556, 556, 131). The spacing varies across subjects, with an average of (0.3524, 0.3524, 0.5447).

  5. BrainAneurysm (https://openneuro.org/datasets/ds003949/versions/1.0.1): This dataset contains 284 subjects, of which 127 are healthy controls and 157 are patients with brain aneurysms. Four subjects (sub-200, sub-235, sub-315 and sub-450) are removed due to incomplete image data. The average spacing is (0.4021, 0.4021, 0.6613), and the average image size is (466, 546, 147).
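As a consistency check, the per-dataset counts listed above can be tallied: they sum to the 1,510 unlabeled subjects reported in the abstract, and the three IXI image-size groups account for the 525 IXI pretraining subjects. The script only restates the numbers from the list; the names are illustrative.

```python
# Unlabeled pretraining subjects per dataset, as listed above.
PRETRAIN_SUBJECTS = {
    "IXI": 570 - 45,           # 45 labeled subjects held out (IXI-45)
    "OASIS": 525,
    "TubeTK": 109 - 42,        # 42 labeled subjects held out (TubeTK-42)
    "ADAM": 113,
    "BrainAneurysm": 284 - 4,  # 4 subjects with incomplete image data removed
}

# IXI image-size groups among the pretraining subjects.
IXI_SIZE_GROUPS = {(512, 512, 100): 497, (1024, 1024, 92): 27, (1024, 1024, 91): 1}

total = sum(PRETRAIN_SUBJECTS.values())
print(f"total unlabeled subjects: {total}")
```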

7.2 Evaluation Datasets

The raw image data of the three labeled datasets are accessible at the following public websites:

We provide the voxel-wise vessel mask for the three datasets in this study.

7.3 The implementations of baseline models