Benefit from public unlabeled data: A Frangi filtering-based pretraining network for 3D cerebrovascular segmentation
Abstract
Precise cerebrovascular segmentation in time-of-flight magnetic resonance angiography (TOF-MRA) data is crucial for clinical computer-aided diagnosis. However, the sparse distribution of cerebrovascular structures in TOF-MRA makes manual data labeling exceedingly costly. Unlabeled TOF-MRA data therefore hold the potential to enhance model performance significantly. In this study, we construct the largest preprocessed unlabeled TOF-MRA dataset to date (1510 subjects). We also provide three additional labeled datasets totaling 113 subjects. Furthermore, we propose a simple yet effective pretraining strategy based on Frangi filtering, known for enhancing vessel-like structures, to fully leverage the unlabeled data for 3D cerebrovascular segmentation. Specifically, we develop a Frangi filtering-based preprocessing workflow to handle the large-scale unlabeled dataset, and propose a multi-task pretraining strategy to effectively utilize the preprocessed data. By employing this approach, we maximize the knowledge gained from the unlabeled data. The pretrained model is evaluated on four cerebrovascular segmentation datasets. The results demonstrate the superior performance of our model, with an improvement of approximately 3% compared to state-of-the-art semi- and self-supervised methods. Furthermore, the ablation studies demonstrate the generalizability and effectiveness of the pretraining method across backbone structures. The code and data are open source at: https://github.com/shigen-StoneRoot/FFPN.
1 Introduction
TOF-MRA is a non-invasive medical imaging technique for visualizing cerebral blood vessels. It provides detailed images of the blood vessels in the brain, enabling early detection and treatment of potentially life-threatening conditions Özsarlak et al. (2004); Hassouna et al. (2006). Accurate vessel segmentation is a crucial preprocessing step in TOF-MRA image analysis: it provides surgeons with essential information about the location, size, and connectivity of blood vessels and facilitates surgical planning and intervention Ni et al. (2020).
Compared to the segmentation of other biological tissues, brain vessel segmentation based on TOF-MRA presents more challenges. Most organs typically have a compact, roughly spherical shape and are spatially concentrated, whereas blood vessels have a tubular structure and are sparsely distributed in TOF-MRA Chen et al. (2022a); Xia et al. (2022), which increases the difficulty and cost of obtaining manually labeled TOF-MRA data. In contrast, unlabeled TOF-MRA data are easily accessible from public resources (e.g., IXI and OASIS3). Therefore, making full use of public unlabeled data is essential for cerebrovascular segmentation with limited manually labeled data.
There are two fundamental deep learning-based approaches capable of using unlabeled data: semi-supervised learning (SemiSL) Qi and Luo (2020); Yang et al. (2022) and self-supervised learning (SSL) Ericsson et al. (2022); Huang et al. (2023) (see Fig. 1a and 1b). SemiSL-based methods typically utilize unlabeled data in conjunction with labeled data during the training process, learning the consistency between them Cheplygina et al. (2019). This process requires repeated access to the unlabeled data and increases the demand for computational resources whenever a different research institute wants to use these data. Besides, the public unlabeled data are collected from multiple sources and may be highly heterogeneous compared with a single-site dataset. The resulting distribution shift has the potential to adversely affect the performance of SemiSL methods Chen et al. (2019). SSL methods, in contrast, usually pretrain the model through a pretext task Zhang et al. (2023), and the pretrained model can then be reused by different institutes without access to the unlabeled data. This decreases the computational cost when the unlabeled data scale is large and protects patient privacy, since different institutes do not directly access the unlabeled data Asadian et al. (2022).
However, the performance of SSL methods depends heavily on the choice of pretext task. Common pretext tasks, such as contrastive learning between augmented data Chen et al. (2020); Liao et al. (2022) and masked image modeling He et al. (2022), differ significantly from the downstream task (here, cerebrovascular segmentation). This setting is advantageous when dealing with multiple downstream tasks, but a specific single downstream task might not benefit much from SSL methods. Therefore, incorporating prior knowledge of the specific downstream task into the pretraining procedure may be extremely valuable. Frangi filtering is an effective technique to enhance and extract tubular structures in medical images Frangi et al. (1998). The filtering process analyzes the local intensity and Hessian matrix of the image to detect vessel-like structures, making it useful for applications such as vessel segmentation and analysis in medical imaging. Frangi filtering plays a fundamental role as a preprocessing step in traditional vessel segmentation algorithms, so it is worth seriously considering incorporating this filtering technique into the pretraining task.
Driven by the abovementioned analysis, we propose a Frangi filtering-based pretraining network (called FFPN) to fully use the public unlabeled TOF-MRA data. In particular, we have developed a preprocessing workflow to handle large-scale unlabeled data. This preprocessing program utilizes Frangi filtering to enhance vessel structures. Additionally, it incorporates thresholding methods and connected component analysis to obtain a coarse segmentation of the vessels. Then, the vessel-enhanced images and coarse vessel segmentations are used for a multi-task pretraining procedure. Our model is pretrained on the large-scale TOF-MRA dataset with 1510 volumetric data and evaluated on four labeled datasets. The results demonstrate that it notably outperforms the state-of-the-art SemiSL and SSL methods. Besides, the ablation studies also confirm that our proposed pretraining task helps the model achieve better performance with fewer labeled data. Moreover, this pretraining task is not limited to specific model architectures and can significantly improve the performance across various backbone structures consistently.
The main contributions of this study can be summarized as follows:
1. We develop an automated preprocessing workflow to efficiently handle unlabeled data. The preprocessing provides the vessel-enhanced images and coarse vessel segmentations that are used for the pretraining procedure.
2. We propose a simple yet effective pretraining strategy that uses the preprocessed TOF-MRA data. This pretraining task contains regression learning, coarse label segmentation, and maximum intensity projection (MIP) consistency learning. The results show that the proposed method clearly outperforms existing models.
3. We construct a large unlabeled dataset (1510 volumes) and three manually annotated datasets (113 volumes in total). These data are expected to advance semi-supervised, weakly supervised, and self-supervised learning methods in cerebrovascular segmentation.
2 Related Work
2.1 Cerebrovascular Segmentation
Traditional methods for cerebrovascular segmentation typically rely on the continuity of grayscale variations in images to perform threshold-based segmentation Otsu (1979); Frangi et al. (1998). In addition, statistical modeling-based approaches exploit the fact that different brain tissues exhibit distinct pixel grayscale distributions during imaging Duan et al. (2019); Alderliesten et al. (2006). These approaches have been successful in the segmentation of cerebrovascular structures in TOF-MRA images. In recent years, deep learning methods have shown remarkable performance in vessel segmentation tasks Sanchesa et al. (2019); Sichtermann et al. (2019); Qi et al. (2023). For instance, Livne et al. utilized a UNet-based model to segment vessels in patients with cerebrovascular disease Livne et al. (2019). Chen et al. proposed an attention-assisted generative adversarial network (A-SegAN) for automatic cerebrovascular segmentation Chen et al. (2022b). Further related work is surveyed in the review by Chen et al. (2023).
2.2 SemiSL and SSL Methods for Medical Images
Numerous SemiSL and SSL methods have been developed to utilize unlabeled medical data and improve model performance Wang et al. (2023); Yu et al. (2023); Shi et al. (2023); Liao et al. (2023). For instance, Chen et al. proposed the generative consistency for semi-supervised (GCS) model, which calculates the consistency of perturbed data to enhance cerebrovascular segmentation performance Chen et al. (2022a). Moreover, SSL-based methods have also demonstrated the effectiveness of pretraining in medical imaging research. To illustrate, Tang et al. introduced a multitask pretraining strategy using a 3D Swin transformer, achieving significant success in multiple organ segmentation tasks. For more detailed information on the topic, see the comprehensive reviews Zhang et al. (2023); Jiao et al. (2022); Krishnan et al. (2022).
3 Methods
In this section, we introduce the proposed FFPN model. The preprocessing workflow and pretraining procedure are shown in Fig.2 a and b, respectively. First, we preprocess the unlabeled TOF-MRA data to obtain the vessel enhanced image (VEI) and coarse vessel segmentation (CVS). Subsequently, during the pretraining procedure, the VEI and CVS are utilized as regression and segmentation targets. Further details regarding this process will be elaborated in the upcoming sections.
3.1 Preprocessing Method for Unlabeled Data
The preprocessing consists of three steps: image size cropping, vessel-structure enhancement, and coarse segmentation.
3.1.1 Image size cropping:
The TOF-MRA images often contain a significant number of background pixels, which have no relevance to the vessel segmentation task. Removing these background pixels provides a key advantage by reducing computational costs, especially when dealing with a large-scale unlabeled dataset. Previous approaches have utilized non-zero masks to crop raw images Isensee et al. (2021). However, background pixels don’t always possess intensity values of zero. In this study, we propose a novel method to calculate the cropping mask based on the variation of pixel intensities along the Z-axis (slice direction).
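For a given image, denoted as $I$ with dimensions $H \times W \times D$, we compute three projection images along the Z-axis: Average Intensity Projection (AIP), Maximum Intensity Projection (MIP), and Intensity Variation Map (IVM). The AIP and MIP are calculated as follows:
$$\mathrm{AIP}(x, y) = \frac{1}{D}\sum_{z=1}^{D} I(x, y, z), \qquad \mathrm{MIP}(x, y) = \max_{1 \le z \le D} I(x, y, z) \tag{1}$$
while $\mathrm{IVM}(x, y)$ measures how strongly the intensity at position $(x, y)$ varies across the $D$ slices.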
To create the cropping mask, we employ threshold segmentation. Taking MIP as an example, we generate a mask by retaining only the positions of the top percentage (e.g., 35%) of pixels in the MIP image. The resulting masks, namely the AIP mask, MIP mask, and IVM mask, are then merged using a bitwise AND operation. Additionally, small regions containing fewer than 200 pixels are removed. Finally, we obtain a square cropping mask that is applied to each slice of the image.
In summary, our proposed approach enables the generation of a cropping mask based on the intensity variations along the Z-axis in TOF-MRA images. By applying this mask, background pixels are effectively eliminated, resulting in a significant reduction in computational costs.
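The cropping step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function name `projection_crop_box` is hypothetical, the IVM is approximated by the standard deviation along the Z-axis (the exact definition is an assumption here), and the removal of regions smaller than 200 pixels is omitted for brevity.

```python
import numpy as np

def projection_crop_box(volume, keep_fraction=0.35):
    """Return (x0, x1, y0, y1) of a square crop box for an (H, W, D) volume."""
    aip = volume.mean(axis=2)  # average intensity projection along z
    mip = volume.max(axis=2)   # maximum intensity projection along z
    ivm = volume.std(axis=2)   # ASSUMPTION: std along z stands in for the IVM

    def top_fraction_mask(image):
        # Keep positions whose value lies in the top `keep_fraction` of pixels.
        return image >= np.quantile(image, 1.0 - keep_fraction)

    # Merge the three masks with a logical AND, as described in the text.
    mask = top_fraction_mask(aip) & top_fraction_mask(mip) & top_fraction_mask(ivm)
    xs, ys = np.nonzero(mask)
    h, w = volume.shape[:2]
    if xs.size == 0:
        return 0, h, 0, w  # nothing selected: fall back to the full image

    # Bounding box of the mask, expanded to a square and clipped to the image.
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    side = int(min(max(x1 - x0, y1 - y0), h, w))
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    x0 = int(np.clip(cx - side // 2, 0, h - side))
    y0 = int(np.clip(cy - side // 2, 0, w - side))
    return x0, x0 + side, y0, y0 + side
```

The returned box would then be applied to every slice, e.g. `volume[x0:x1, y0:y1, :]`.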
3.1.2 Frangi filtering:
For 3D TOF-MRA images, Frangi filtering involves several steps to enhance the visibility of blood vessels. First, the second-order partial derivatives of the image intensity are computed at each voxel and collected in the Hessian matrix, defined as:
$$\mathcal{H} = \begin{bmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{bmatrix} \tag{2}$$
where $I_{xx}$ represents the second partial derivative of the image intensity with respect to the $x$ coordinate, and similarly for the other partial derivatives. Next, the eigenvalues of the Hessian matrix, denoted as $\lambda_1$, $\lambda_2$, and $\lambda_3$, are calculated. These eigenvalues provide information about the local structure of the image. Specifically, the magnitudes and signs of the eigenvalues help identify pixels that belong to blood vessels.
In general, larger positive eigenvalues correspond to the central region of vessels, smaller positive eigenvalues represent the vessel edges, while negative eigenvalues typically indicate irrelevant structures and backgrounds. By analyzing the eigenvalues, pixels with vascular features are selected, and the VEI is obtained at this step.
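To make the Hessian step concrete, the sketch below computes per-voxel Hessian eigenvalues with plain finite differences. It is an illustration only: Frangi filtering as commonly implemented uses Gaussian derivatives at several scales, which this simplified version omits, and the function name `hessian_eigenvalues` is hypothetical. The eigenvalues are sorted by absolute value, the ordering typically used in Frangi-style analysis.

```python
import numpy as np

def hessian_eigenvalues(volume):
    """Eigenvalues of the 3x3 Hessian at every voxel, sorted by absolute value."""
    first = np.gradient(volume.astype(np.float64))  # [dI/dx, dI/dy, dI/dz]
    hessian = np.empty(volume.shape + (3, 3))
    for a, g in enumerate(first):
        second = np.gradient(g)  # derivatives of dI/d(axis a) along x, y, z
        for b in range(3):
            hessian[..., a, b] = second[b]
    # Finite differences are only approximately symmetric, so symmetrize.
    hessian = 0.5 * (hessian + np.swapaxes(hessian, -1, -2))
    eig = np.linalg.eigvalsh(hessian)
    order = np.argsort(np.abs(eig), axis=-1)
    return np.take_along_axis(eig, order, axis=-1)
```

A vesselness score would then be derived from the three eigenvalues at each voxel; that mapping is not reproduced here.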
3.1.3 Coarse segmentation:
To obtain a coarse segmentation of the blood vessels, a thresholding method and connected component analysis are applied. First, a vessel mask is created by retaining only the positions of the top percentage (e.g., 5%) of pixels in the VEI. This thresholding step helps separate the vessel pixels from the background and other structures. Then, connected component analysis is performed on the vessel mask. The $k$ largest connected regions, where $k$ can be empirically chosen between 3 and 5, are retained as the coarse segmentation result of blood vessels. By combining Frangi filtering, thresholding, and connected component analysis, the CVS is obtained, which provides an initial approximation of the blood vessel structure in the 3D TOF-MRA images.
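The thresholding and connected component steps can be sketched as below. This is a minimal illustration, assuming 6-connectivity (the connectivity is not stated in the text) and a simple breadth-first labeling that is adequate for small volumes; the names `label_components` and `coarse_vessel_segmentation` are hypothetical. In practice a library routine for connected component labeling would be much faster.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """Label 6-connected foreground components of a 3D boolean mask (0 = background)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already assigned to an earlier component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                n = (x + dx, y + dy, z + dz)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
    return labels, current

def coarse_vessel_segmentation(vei, top_fraction=0.05, k=3):
    """Keep the k largest components among the top `top_fraction` of VEI voxels."""
    mask = vei >= np.quantile(vei, 1.0 - top_fraction)
    labels, count = label_components(mask)
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    sizes[0] = 0  # ignore the background label
    keep = np.argsort(sizes)[::-1][:k]
    keep = keep[sizes[keep] > 0]
    return np.isin(labels, keep)
```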
Method | TubeTK-42 Dice | TubeTK-42 clDice | TubeTK-42 HD95 | Brains Dice | Brains clDice | Brains HD95 | IXI-45 Dice | IXI-45 clDice | IXI-45 HD95 | EDEN Dice | EDEN clDice | EDEN HD95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UNet (MICCAI’15) | 72.98 | 81.33 | 9.09 | 73.77 | 75.86 | 3.72 | 79.82 | 75.23 | 10.25 | 79.79 | 80.62 | 9.89 |
DTC (AAAI’21) | 69.31 | 76.35 | 25.60 | 64.36 | 64.86 | 34.72 | 70.44 | 63.48 | 63.31 | 71.65 | 65.65 | 91.93 |
GCS (TMI’22) | 18.11 | 33.40 | 22.50 | 10.37 | 9.28 | 24.92 | 25.05 | 21.32 | 55.10 | 61.36 | 67.85 | 28.42 |
GBDL (CVPR’22) | 36.40 | 55.67 | 16.16 | 39.79 | 52.84 | 63.71 | 12.46 | 20.37 | 42.43 | 51.78 | 64.48 | 11.41 |
SSNet (MICCAI’22) | 65.15 | 71.84 | 28.65 | 64.22 | 63.88 | 35.04 | 73.55 | 65.14 | 52.05 | 69.72 | 64.26 | 96.65 |
BCP (CVPR’23) | 73.36 | 81.95 | 8.33 | 77.25 | 80.31 | 2.88 | 81.37 | 80.64 | 6.17 | 86.44 | 89.41 | 2.71 |
PCRL (ICCV’21) | 72.90 | 81.92 | 7.82 | 75.94 | 79.09 | 2.74 | 83.16 | 83.69 | 4.05 | 86.15 | 88.71 | 2.91 |
MAE (CVPR’22) | 72.39 | 79.37 | 11.82 | 77.56 | 79.70 | 3.49 | 83.21 | 80.14 | 10.04 | 84.62 | 85.90 | 6.73 |
Swin UNETR (CVPR’22) | 74.66 | 82.18 | 9.71 | 77.76 | 80.40 | 3.30 | 80.64 | 79.01 | 6.37 | 81.23 | 81.87 | 12.90 |
UniMiSS (ECCV’22) | 70.54 | 82.59 | 9.03 | 77.63 | 79.62 | 3.03 | 85.45 | 83.49 | 3.83 | 88.02 | 88.51 | 2.91 |
GVSL (CVPR’23) | 74.27 | 82.57 | 8.30 | 77.87 | 79.68 | 3.16 | 85.92 | 84.82 | 3.35 | 85.15 | 89.39 | 2.61 |
FFPN (Ours) | 77.53 | 85.71 | 6.69 | 80.47 | 83.28 | 2.19 | 87.24 | 86.25 | 2.75 | 90.75 | 91.53 | 1.24 |
Preprocessing | 29.64 | 31.27 | 71.29 | 31.73 | 37.01 | 38.55 | 43.18 | 55.50 | 38.56 | 68.48 | 69.43 | 46.18 |
3.2 Pretraining task and Loss Function
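Our proposed pretraining process contains three tasks: regression learning, coarse vessel segmentation learning, and consistency learning in the MIP projection. Let $f(\cdot)$ be the backbone neural network, and let $V_i$ and $S_i$ be the VEI and CVS for subject $i$. $f$ is used to encode the raw image $X_i$, and a regression head and a segmentation head are used to predict $\hat{V}_i$ and $\hat{S}_i$, respectively. This can be formulated as:
$$\hat{V}_i = \mathrm{Conv}_{reg}(f(X_i)), \qquad \hat{S}_i = \mathrm{Conv}_{seg}(f(X_i)) \tag{3}$$
where $\mathrm{Conv}_{reg}$ and $\mathrm{Conv}_{seg}$ are convolution operations with kernel size 1. The regression loss is computed between $\hat{V}_i$ and $V_i$. For the segmentation loss, Dice loss and binary cross-entropy (BCE) are used. These can be formulated as:
$$\mathcal{L}_{reg} = \lVert \hat{V}_i - V_i \rVert \tag{4}$$
$$\mathcal{L}_{seg} = \mathcal{L}_{Dice}(\hat{S}_i, S_i) + \mathcal{L}_{BCE}(\hat{S}_i, S_i) \tag{5}$$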
In this study, we do not conduct skull stripping for the preprocessed data. In such case, the Frangi filtering will also enhance the skull pixels, while the CVS will not contain the skull pixels after connected component analysis. In order to focus more on learning the vascular structures, we incorporate an extra consistency loss by calculating the loss between and after applying maximum intensity projection along Z-axis. It can be formulated as:
(6) | |||
where and have almost removed the skull pixels. Meanwhile, the MIP images of TOF-MRA are commonly used for the initial assessment of vascular morphology. The consistency loss will help the model focus more on learning the vascular structures.
Finally, the total loss function will be:
(7) |
where and are the loss weight. We empirically set and in this study.
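As a rough illustration of how the three pretraining terms could be combined, the NumPy sketch below evaluates a regression term, a Dice plus binary cross-entropy segmentation term, and a MIP consistency term. Several choices are assumptions made only for this example and are not taken from the text: the regression term uses a mean absolute error, the consistency term compares the Z-axis MIP of the predicted segmentation with the MIP of the CVS, and the weights `w_seg` and `w_con` default to 1.0. All function and parameter names are hypothetical.

```python
import numpy as np

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary target."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)

def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy averaged over all voxels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def pretraining_loss(pred_vei, pred_seg, vei, cvs, w_seg=1.0, w_con=1.0):
    """Weighted sum of the three terms for (H, W, D) arrays.

    pred_seg is expected to hold probabilities in [0, 1].
    """
    # ASSUMPTION: mean absolute error as the regression loss.
    loss_reg = np.mean(np.abs(pred_vei - vei))
    loss_seg = dice_loss(pred_seg, cvs) + bce_loss(pred_seg, cvs)
    # ASSUMPTION: consistency compares MIPs along z of prediction and CVS.
    loss_con = np.mean(np.abs(pred_seg.max(axis=2) - cvs.max(axis=2)))
    return loss_reg + w_seg * loss_seg + w_con * loss_con
```

In an actual training loop these terms would be written with a differentiable framework; NumPy is used here only to show the arithmetic.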
3.3 Backbone
We select the Swin UNETR Hatamizadeh et al. (2021) as the backbone in this study. Different from the vanilla model, successive convolutional layers are used in the initial embedding layers Zhou et al. (2021a). This setting enhances the capture of spatial location information, a feature that is potentially beneficial to the segmentation of cerebrovascular structures with sparse distribution. Please note that we also examined the impact of our pretraining strategy on other backbone structures in the result section.
4 Experimental Setup
4.1 Datasets
Unlabeled Datasets for Pretraining: We collect a large unlabeled TOF-MRA dataset consisting of 1510 subjects from 5 public datasets: IXI (525 subjects), OASIS3 (525 subjects) LaMontagne et al. (2019), BrainAneurysm (280 subjects) Di Noto et al. (2023), ADAM (113 subjects), and TubeTK (67 subjects). We preprocess these data with our proposed workflow, and the model is pretrained on the proposed tasks.
Labeled Datasets for Downstream Evaluation: We evaluate our proposed method in four labeled datasets.
1. TubeTK-42: The full TubeTK dataset contains 109 subjects, with 67 unlabeled subjects used for pretraining and 42 labeled subjects used for finetuning and validation. The image size is (448, 448, 128) and the spacing is (0.5134, 0.5134, 0.8000).
2. Brains: This dataset contains 56 subjects. The image size varies across subjects, with an average of (286, 320, 204). The spacing is (0.6188, 0.6188, 0.6200).
3. IXI-45: This dataset is provided by Chen et al. (2022b). It contains 45 subjects from the whole IXI dataset (570 subjects). The image size is (1024, 1024, 92). The provided images have been processed and therefore do not contain spacing information. The original spacing of the IXI dataset is (0.2637, 0.2637, 0.8000).
4. EDEN: This dataset contains 15 subjects. Unlike the previous datasets, its annotations also include the blood vessels on the skull. The image size is (672, 672, 210) and the spacing is (0.2976, 0.2976, 0.4500).
Please note that the pretraining dataset and the evaluation datasets are completely isolated. For the labeled datasets, 80% of the subjects are used for training and the remaining subjects are used for testing. Detailed information about the datasets can be found in the Appendix.
4.2 Evaluation Metrics
Three metrics are used to assess the cerebrovascular segmentation performance: the Dice similarity coefficient (Dice), the 95% Hausdorff Distance (HD95), and the centerline Dice coefficient (clDice) Shit et al. (2021). The Dice and HD95 metrics primarily focus on the degree of overlap and shape differences in segmentation results, whereas clDice takes into account the topological structure of the segmented regions, making it more suitable for evaluating tubular structures.
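For reference, the Dice similarity coefficient between a predicted mask $P$ and a ground-truth mask $G$ is $2|P \cap G| / (|P| + |G|)$. A minimal NumPy sketch is given below; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen for this example. HD95 and clDice require surface distances and skeletonization, respectively, and are not sketched here.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom
```

Multiplying the result by 100 gives scores on the percentage scale used in the result tables.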
4.3 Baseline Models and Implementation
We compare the performance of 3D cerebrovascular segmentation between the proposed FFPN model and existing methods using unlabeled data, including the SemiSL methods BCP Bai et al. (2023), GBDL Wang and Lukasiewicz (2022), GCS Chen et al. (2022a), SS-Net Wu et al. (2022), DTC Luo et al. (2021) and the SSL methods MAE He et al. (2022), PCRL Zhou et al. (2021b), UniMiSS Xie et al. (2022), Swin UNETR Tang et al. (2022), GVSL He et al. (2023). Besides, we also include the basic UNet Ronneberger et al. (2015) model that does not utilize the unlabeled data to provide a comparison.
For our proposed FFPN model, the encoder has 4 stages which comprise of [2, 2, 6, 2] transformer blocks at each stage. The initial patch size is 2 and the feature size is 24. In pretraining task learning, the model is trained using AdamW optimizer for 200k iterations. Warm-up cosine scheduler of 20k iterations is used. The cropping patch size is (160, 160, 64) and the batch size is 2. The weight decay is and learning rate is . In finetune procedure, the model is trained using AdamW optimizer for 5k iterations. The learning rate is (half the learning rate for the encoder) and the cropping patch size is (192, 192, 96).
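The warm-up cosine schedule used during pretraining can be sketched as follows. The exact schedule shape is an assumption: this version ramps the learning rate linearly from zero over the 20k warm-up iterations and then follows a half cosine down to zero at 200k iterations. The function name and the `base_lr` argument are hypothetical placeholders for whatever peak learning rate is used.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps=20_000, total_steps=200_000):
    """Learning rate at `step` for linear warm-up followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warm-up.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr to 0 over the remaining iterations.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

With this shape, the learning rate peaks at iteration 20k and has fallen to half of its peak at iteration 110k.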
5 Experimental Results
5.1 Comparison with state-of-the-art Methods
The cerebrovascular segmentation results for the four datasets are shown in Table 1. The proposed method achieves the best performance among all compared models on the four datasets in terms of the three metrics. On the TubeTK-42 and Brains datasets, the clDice scores of our proposed method are 85.71 and 83.28, an improvement of approximately 3% over the second-best model. On the IXI-45 and EDEN datasets, our proposed method also achieves an improvement of around 2% compared with the second-best model.
To further demonstrate the superiority of FFPN, a statistical analysis is shown in Fig. 3. Specifically, we conduct another 5-fold experiment on the largest dataset (Brains) using FFPN and the baseline models with competitive performance (GVSL, Swin UNETR, and PCRL). The results show that the improvement of FFPN is statistically significant (t-test) across the three metrics (the outcome for the metric clDice aligning closely with that of Dice).
We provide a 3D visualization result to support an intuitive evaluation (see Fig. 4). Compared with other methods, the segmentation produced by FFPN reveals marked superiority in achieving continuity for the vessel structure, particularly noticeable at locations 1, 2, 3 and 4. Furthermore, it is worth highlighting that the FFPN-enabled segmentation leads to a reduction in false positive points (e.g., locations 5 and 6).
Loss Function | Dice | clDice | HD95 |
---|---|---|---|
Scratch | 72.08 | 80.16 | 11.46 |
Regression | 72.34 | 81.11 | 9.62 |
Segmentation | 73.46 | 81.58 | 9.10 |
Consistency | 72.15 | 80.78 | 10.13 |
Regression + Segmentation | 74.18 | 82.17 | 8.14 |
Regression + Segmentation + Consistency | 74.64 | 82.94 | 7.57 |
5.2 Influence of the Percentage of Manual Labeled Data
We then conduct a study to evaluate the effectiveness of our pretraining method using a reduced amount of manually labeled data. The results are presented in Fig. 5.
It is observed that pretraining consistently improves the performance of the model across different proportions of labeled data. When using all available labeled data, our pretraining method achieves a 1.03% higher Dice score for the IXI-45 dataset and 0.83% higher Dice score for the TubeTK-42 dataset compared to a model trained from scratch. Additionally, with less than half of the labeled data, the pretrained model demonstrates an improvement of 4.39% for the IXI-45 dataset and 2.56% for the TubeTK-42 dataset. Furthermore, as the availability of labeled images decreases, the performance gap between the models trained with and without pretraining increases.
5.3 Effectiveness of Pretraining Loss Function
In this section, we evaluate the effectiveness of the three proposed loss functions. Specifically, the model is pretrained using different loss functions, and their effectiveness is then assessed during the finetuning procedure. We choose the TubeTK-42 dataset with 45.45% labeled data (i.e., 15 subjects) as an example, and the results are presented in Table 2.
For single-task pretraining, the segmentation learning achieves the best performance with metrics Dice, clDice, and HD95 values of 73.46, 81.58, and 9.10, respectively. The regression learning also shows similar performance, while the consistency learning does not significantly improve the model’s performance. Moreover, when combining the regression and segmentation learning loss functions, the model’s performance is further improved. Taking into account all three learning tasks, the model attains the highest performance.
We also present a visualization result from the slice view in Fig. 6. Incorporating pretraining with aids in reducing the occurrence of false positive points in the generated output (e.g., locations 2 and 3). However, it is worth noting that due to the coarse segmentation results potentially retaining some information from the skull, there may still be instances where certain structures on the skull are annotated (e.g., location 4). Overall, by incorporating all pretraining tasks, the model achieves improved segmentation, characterized by enhanced continuity of vessels (see location 1) and a reduced presence of false positive points.
Loaded component | Dice | clDice | HD95 |
---|---|---|---|
Scratch | 80.79 | 78.08 | 8.41 |
Encoder | 81.71 | 79.90 | 12.08 |
Decoder | 81.14 | 76.22 | 11.19 |
Encoder + Decoder | 83.77 | 81.57 | 6.59 |
All | 85.19 | 83.09 | 4.43 |
5.4 Influence of Model Components for Pretraining
Different from previous SSL methods that typically only load the encoder parameters for downstream tasks, our proposed approach loads both the encoder and decoder parameters, as well as the segmentation header. In this section, we explore the impact of loading different model components. We take the IXI-45 dataset with 41.67% labeled data (i.e., 15 subjects) as an example, and the results are presented in Table 3.
When only the encoder is loaded, there is a notable improvement in performance, with the clDice score increasing to 79.90. When only the decoder is loaded, performance drops compared to the encoder-loaded model, as indicated by the clDice score of 76.22. When both the encoder and decoder are loaded, there is a substantial boost in performance, and the clDice score increases to 81.57. When all components, including the segmentation header, are loaded, we observe the highest performance across all metrics.
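The labeled-data percentages quoted in the ablation studies (45.45% for TubeTK-42 and 41.67% for IXI-45, both with 15 subjects) are consistent with dividing by the size of the 80% training split. The short check below reproduces them, assuming the training split is rounded down to a whole number of subjects (33 of 42 and 36 of 45); that rounding rule is inferred from the percentages rather than stated in the text, and the function name is hypothetical.

```python
def labeled_fraction(total_subjects, labeled_used):
    """Return (training split size, percentage of it that is labeled_used)."""
    # ASSUMPTION: the 80% training split is rounded down to whole subjects.
    train = total_subjects * 4 // 5
    return train, round(100 * labeled_used / train, 2)

print(labeled_fraction(42, 15))  # TubeTK-42
print(labeled_fraction(45, 15))  # IXI-45
```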
Dataset | Backbone | Scratch (Dice) | Finetune (Dice) | Improvement |
---|---|---|---|---|
EDEN | UNet | 79.79 | 84.26 | +4.47% |
EDEN | AttentionUNet | 78.44 | 88.20 | +9.76% |
EDEN | UNETR | 83.32 | 88.06 | +4.74% |
EDEN | Swin UNETR | 88.87 | 90.75 | +1.88% |
IXI-45 | UNet | 79.82 | 85.24 | +5.41% |
IXI-45 | AttentionUNet | 81.52 | 84.62 | +3.10% |
IXI-45 | UNETR | 80.01 | 83.85 | +3.84% |
IXI-45 | Swin UNETR | 86.21 | 87.24 | +1.03% |
5.5 The Effectiveness on Different Backbone Structures
In this section, we assess the effectiveness of our proposed pretraining learning on different backbone structures. We focus on four fundamental medical segmentation backbones, which encompass both convolutional neural network (CNN)-based models (UNet Ronneberger et al. (2015) and AttentionUNet Oktay et al. (2018)) and Transformer-based models (UNETR Hatamizadeh et al. (2022) and Swin UNETR Hatamizadeh et al. (2021)). The corresponding results are summarized in Table 4.
For the EDEN dataset, we observe improvements when using our proposed pretraining learning compared to training from scratch across all backbone architectures. Specifically, for the UNet backbone, the Dice score increases from 79.79 to 84.26, resulting in a significant improvement of 4.47%. Similarly, the AttentionUNet and UNETR backbones show improvements of 9.76% and 4.74%, respectively. Even for the advanced Swin UNETR backbone, there is still a notable 1.88% improvement.
In the case of the IXI-45 dataset, we observe similar trends. Our proposed pretraining learning consistently leads to improvements compared to training from scratch. The UNet backbone shows an improvement of 5.41%, while the AttentionUNet, UNETR, and Swin UNETR backbones show improvements of 3.10%, 3.84%, and 1.03%, respectively.
5.6 The Robustness of Our Pretraining Method Regarding the Unlabeled Dataset Heterogeneity
We carried out this experiment to assess the robustness of our proposed method in relation to multi-source heterogeneous pretraining data, using the IXI dataset as an illustrative case.
In particular, we keep the quantity of unlabeled data constant (525 subjects) while varying the proportion drawn from the unlabeled IXI dataset. We then test the model performance on the labeled IXI-45 dataset. The best-performing SemiSL method, BCP, is used for comparison, and the outcomes are depicted in Fig. 7. From these results, it is observed that the performance of BCP diminishes significantly as the amount of homogeneous data decreases. Conversely, our proposed method consistently shows stable and high segmentation performance. This result suggests that the proposed pretraining strategy maintains a high level of robustness.
6 Discussion
To unleash the potential of unlabeled TOF-MRA data for 3D cerebrovascular segmentation, our approach incorporates Frangi filtering as a key element in the pretraining phase. The proposed FFPN model demonstrates enhanced robustness and superior segmentation performance, notably outshining other SemiSL and SSL methodologies.
Our proposed cropping strategy can reduce the computation cost when processing such a large 3D MRA dataset. We take IXI dataset as an example to show the effectiveness of the proposed cropping strategy. We compute the cropping rate (CR) for each subject as the following formula:
$$\mathrm{CR} = 1 - \frac{S_{crop}}{S_{orig}} \tag{8}$$
where $S_{orig}$ is the original image size and $S_{crop}$ is the image size after cropping. A higher CR means that more background pixels are removed. The CR of the IXI dataset is shown in Fig. 8. The average CR is 42.99%, and 80% of the samples had nearly 40% (38.49%) of their volume removed as background. Besides, two representative examples are shown in Fig. 9. Non-zero cropping is widely employed in the analysis of 3D medical images. Yet the presence of noise in MRI scans undermines the effectiveness of this approach, leading to suboptimal image cropping. Our proposed method addresses this issue by removing as many background pixels as possible. This becomes important in scenarios involving extensive 3D datasets, where it makes the analysis more efficient.
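Reading the cropping rate as the fraction of voxels removed by cropping, it can be computed from the image shapes alone. The helper below is a small sketch under that reading; the function name and the example shapes are hypothetical.

```python
import math

def cropping_rate(original_shape, cropped_shape):
    """Fraction of voxels removed when cropping original_shape to cropped_shape."""
    return 1.0 - math.prod(cropped_shape) / math.prod(original_shape)

# Hypothetical example: cropping a 100 x 100 x 50 volume to 80 x 80 x 50.
print(cropping_rate((100, 100, 50), (80, 80, 50)))
```

Because only the in-plane size changes in this example, the rate equals one minus the ratio of the slice areas.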
Our proposed method is not limited by the backbone structure and generalizes well. More importantly, the proposed pretraining strategy may reduce the performance gap among the various backbone models. It is observed from Table 4 that, on the IXI-45 dataset, the Dice gap between Swin UNETR and UNet is around 6.4 before pretraining (86.21 vs. 79.82), while it shrinks to approximately 2 after using the proposed pretraining strategy (87.24 vs. 85.24). This indicates that a lightweight model (e.g., vanilla UNet) can also exhibit competitive performance after employing the proposed pretraining method, which holds significant implications for the use of lightweight networks in clinical settings.
One limitation of this study is that our method may not effectively use data from other modalities that cannot fully reflect the vessel morphology (e.g., T1). The proposed pretraining method relies on the preprocessing workflow for cerebrovascular extraction, while such modalities might not include any vessel-related information. In contrast, any modality capable of capturing vessel morphology, such as computed tomographic angiography, digital subtraction angiography, and optical coherence tomography, can benefit from the proposed method. Besides, our pretraining method can serve as an auxiliary training approach for an SSL model trained on other modalities or for general large vision models, and can help to construct a domain-specific model for cerebrovascular segmentation.
7 Conclusion
In this study, we introduce a Frangi filtering-based pretraining network to effectively leverage the unlabeled TOF-MRA data. Our method capitalizes on the prior knowledge of tubular structures by incorporating it into the pretraining tasks, and shows great superiority over other SemiSL and SSL methods on four cerebrovascular segmentation datasets. The ablation studies also show that our proposed method significantly improves model performance, particularly with a limited number of labeled data. Besides, our method is not restricted by model architecture and can enhance the performance of various backbone structures.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant: 62027901, 81930053, 81227901; Beijing Natural Science Foundation: JQ22023; CAS Youth Innovation Promotion Association under Grant Y2022055. The authors would like to acknowledge the instrumental and technical support of Multimodal Biomedical Imaging Experimental Platform, Institute of Automation, Chinese Academy of Sciences.
References
- Alderliesten et al. (2006) Alderliesten, T., Bosman, P.A., Niessen, W.J., 2006. Towards a real-time minimally-invasive vascular intervention simulation system. IEEE Transactions on Medical Imaging 26, 128–132.
- Asadian et al. (2022) Asadian, A., Weidner, E., Jiang, L., 2022. Self-supervised pretraining for differentially private learning. arXiv preprint arXiv:2206.07125.
- Bai et al. (2023) Bai, Y., Chen, D., Li, Q., Shen, W., Wang, Y., 2023. Bidirectional copy-paste for semi-supervised medical image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11514–11524.
- Chen et al. (2022a) Chen, C., Zhou, K., Wang, Z., Xiao, R., 2022a. Generative consistency for semi-supervised cerebrovascular segmentation from tof-mra. IEEE Transactions on Medical Imaging 42, 346–353.
- Chen et al. (2023) Chen, C., Zhou, K., Wang, Z., Zhang, Q., Xiao, R., 2023. All answers are in the images: A review of deep learning for cerebrovascular segmentation. Computerized Medical Imaging and Graphics, 102229.
- Chen et al. (2019) Chen, K., Yao, L., Zhang, D., Chang, X., Long, G., Wang, S., 2019. Distributionally robust semi-supervised learning for people-centric sensing, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3321–3328.
- Chen et al. (2020) Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: International conference on machine learning, PMLR. pp. 1597–1607.
- Chen et al. (2022b) Chen, Y., Jin, D., Guo, B., Bai, X., 2022b. Attention-assisted adversarial model for cerebrovascular segmentation in 3d tof-mra volumes. IEEE Transactions on Medical Imaging 41, 3520–3532.
- Cheplygina et al. (2019) Cheplygina, V., de Bruijne, M., Pluim, J.P., 2019. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical image analysis 54, 280–296.
- Di Noto et al. (2023) Di Noto, T., Marie, G., Tourbier, S., Alemán-Gómez, Y., Esteban, O., Saliou, G., Cuadra, M.B., Hagmann, P., Richiardi, J., 2023. Towards automated brain aneurysm detection in tof-mra: Open data, weak labels, and anatomical knowledge. Neuroinformatics 21, 21–34.
- Duan et al. (2019) Duan, H.H., Su, G.Q., Huang, Y.C., Song, L.T., Nie, S.D., 2019. Segmentation of pulmonary vascular tree by incorporating vessel enhancement filter and variational region-growing. Journal of X-ray science and technology 27, 343–360.
- Ericsson et al. (2022) Ericsson, L., Gouk, H., Loy, C.C., Hospedales, T.M., 2022. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine 39, 42–62. doi:10.1109/MSP.2021.3134634.
- Frangi et al. (1998) Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A., 1998. Multiscale vessel enhancement filtering, in: Medical Image Computing and Computer-Assisted Intervention—MICCAI’98: First International Conference Cambridge, MA, USA, October 11–13, 1998 Proceedings 1, Springer. pp. 130–137.
- Hassouna et al. (2006) Hassouna, M.S., Farag, A.A., Hushek, S., Moriarty, T., 2006. Cerebrovascular segmentation from tof using stochastic models. Medical image analysis 10, 2–18.
- Hatamizadeh et al. (2021) Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D., 2021. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images, in: International MICCAI Brainlesion Workshop, Springer. pp. 272–284.
- Hatamizadeh et al. (2022) Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D., 2022. Unetr: Transformers for 3d medical image segmentation, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 574–584.
- He et al. (2022) He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R., 2022. Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009.
- He et al. (2023) He, Y., Yang, G., Ge, R., Chen, Y., Coatrieux, J.L., Wang, B., Li, S., 2023. Geometric visual similarity learning in 3d medical image self-supervised pre-training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9538–9547.
- Huang et al. (2023) Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S., 2023. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digital Medicine 6, 74.
- Isensee et al. (2021) Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H., 2021. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18, 203–211.
- Jiao et al. (2022) Jiao, R., Zhang, Y., Ding, L., Cai, R., Zhang, J., 2022. Learning with limited annotations: a survey on deep semi-supervised learning for medical image segmentation. arXiv preprint arXiv:2207.14191.
- Krishnan et al. (2022) Krishnan, R., Rajpurkar, P., Topol, E.J., 2022. Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering 6, 1346–1352.
- LaMontagne et al. (2019) LaMontagne, P.J., Benzinger, T.L., Morris, J.C., Keefe, S., Hornbeck, R., Xiong, C., Grant, E., Hassenstab, J., Moulder, K., Vlassenko, A.G., et al., 2019. Oasis-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and alzheimer disease. MedRxiv, 2019–12.
- Liao et al. (2023) Liao, W., Li, X., Wang, Q., Xu, Y., Yin, Z., Xiong, H., 2023. Cupre: Cross-domain unsupervised pre-training for few-shot cell segmentation. arXiv preprint arXiv:2310.03981.
- Liao et al. (2022) Liao, W., Xiong, H., Wang, Q., Mo, Y., Li, X., Liu, Y., Chen, Z., Huang, S., Dou, D., 2022. Muscle: Multi-task self-supervised continual learning to pre-train deep models for x-ray images of multiple body parts, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 151–161.
- Livne et al. (2019) Livne, M., Rieger, J., Aydin, O.U., Taha, A.A., Akay, E.M., Kossen, T., Sobesky, J., Kelleher, J.D., Hildebrand, K., Frey, D., et al., 2019. A u-net deep learning framework for high performance vessel segmentation in patients with cerebrovascular disease. Frontiers in neuroscience 13, 97.
- Luo et al. (2021) Luo, X., Chen, J., Song, T., Wang, G., 2021. Semi-supervised medical image segmentation through dual-task consistency, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8801–8809.
- Ni et al. (2020) Ni, J., Wu, J., Wang, H., Tong, J., Chen, Z., Wong, K.K., Abbott, D., 2020. Global channel attention networks for intracranial vessel segmentation. Computers in biology and medicine 118, 103639.
- Oktay et al. (2018) Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al., 2018. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
- Otsu (1979) Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics 9, 62–66.
- Özsarlak et al. (2004) Özsarlak, Ö., Van Goethem, J.W., Maes, M., Parizel, P.M., 2004. Mr angiography of the intracranial vessels: technical aspects and clinical applications. Neuroradiology 46, 955–972.
- Qi and Luo (2020) Qi, G.J., Luo, J., 2020. Small data challenges in big data era: A survey of recent progress on unsupervised and semi-supervised methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 2168–2187.
- Qi et al. (2023) Qi, Y., He, Y., Qi, X., Zhang, Y., Yang, G., 2023. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. arXiv preprint arXiv:2307.08388.
- Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, Springer. pp. 234–241.
- Sanchesa et al. (2019) Sanchesa, P., Meyer, C., Vigon, V., Naegel, B., 2019. Cerebrovascular network segmentation of mra images with deep learning, in: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019), IEEE. pp. 768–771.
- Shi et al. (2023) Shi, G., Yin, L., An, Y., Li, G., Zhang, L., Bian, Z., Chen, Z., Zhang, H., Hui, H., Tian, J., 2023. Progressive pretraining network for 3d system matrix calibration in magnetic particle imaging. IEEE Transactions on Medical Imaging 42, 3639–3650. doi:10.1109/TMI.2023.3297173.
- Shit et al. (2021) Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H., 2021. cldice-a novel topology-preserving loss function for tubular structure segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16560–16569.
- Sichtermann et al. (2019) Sichtermann, T., Faron, A., Sijben, R., Teichert, N., Freiherr, J., Wiesmann, M., 2019. Deep learning–based detection of intracranial aneurysms in 3d tof-mra. American Journal of Neuroradiology 40, 25–32.
- Tang et al. (2022) Tang, Y., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V., Hatamizadeh, A., 2022. Self-supervised pre-training of swin transformers for 3d medical image analysis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20730–20740.
- Wang and Lukasiewicz (2022) Wang, J., Lukasiewicz, T., 2022. Rethinking bayesian deep learning methods for semi-supervised volumetric medical image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–190.
- Wang et al. (2023) Wang, X., Tang, F., Chen, H., Cheung, C.Y., Heng, P.A., 2023. Deep semi-supervised multiple instance learning with self-correction for dme classification from oct images. Medical Image Analysis 83, 102673.
- Wu et al. (2022) Wu, Y., Wu, Z., Wu, Q., Ge, Z., Cai, J., 2022. Exploring smoothness and class-separation for semi-supervised medical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 34–43.
- Xia et al. (2022) Xia, L., Zhang, H., Wu, Y., Song, R., Ma, Y., Mou, L., Liu, J., Xie, Y., Ma, M., Zhao, Y., 2022. 3d vessel-like structure segmentation in medical images by an edge-reinforced network. Medical Image Analysis 82, 102581.
- Xie et al. (2022) Xie, Y., Zhang, J., Xia, Y., Wu, Q., 2022. Unimiss: Universal medical self-supervised learning via breaking dimensionality barrier, in: European Conference on Computer Vision, Springer. pp. 558–575.
- Yang et al. (2022) Yang, X., Song, Z., King, I., Xu, Z., 2022. A survey on deep semi-supervised learning. IEEE Transactions on Knowledge and Data Engineering.
- Yu et al. (2023) Yu, K., Sun, L., Chen, J., Reynolds, M., Chaudhary, T., Batmanghelich, K., 2023. Drasclr: A self-supervised framework of learning disease-related and anatomy-specific representation for 3d lung ct images. Medical Image Analysis, 103062. doi:10.1016/j.media.2023.103062.
- Zhang et al. (2023) Zhang, C., Zheng, H., Gu, Y., 2023. Dive into the details of self-supervised learning for medical image analysis. Medical Image Analysis, 102879.
- Zhou et al. (2021a) Zhou, H.Y., Guo, J., Zhang, Y., Yu, L., Wang, L., Yu, Y., 2021a. nnformer: Interleaved transformer for volumetric segmentation. arXiv preprint arXiv:2109.03201.
- Zhou et al. (2021b) Zhou, H.Y., Lu, C., Yang, S., Han, X., Yu, Y., 2021b. Preservational learning improves self-supervised medical image models by reconstructing diverse contexts, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3499–3509.
Supplementary Material
7.1 Pretraining Datasets
The pretraining dataset consists of 5 publicly accessible datasets:
- 1. IXI (http://brain-development.org/ixi-dataset/): This dataset contains 570 subjects. The 45 subjects with vessel masks Chen et al. (2022b) are used for the evaluation dataset. The remaining 525 subjects are used for pretraining. The image size of most data (497 subjects) is (512, 512, 100). The image size of 27 subjects is (1024, 1024, 92) and that of one subject is (1024, 1024, 91). The spacing is (0.4688, 0.4688, 0.8000).
- 2. OASIS (https://www.oasis-brains.org): This dataset contains 525 subjects. The spacing and image size are not identical among the subjects. The average image size is (583, 765, 216). The spacing of most data (337 subjects) is (0.2995, 0.2995, 0.6000), and the average spacing is (0.2977, 0.2977, 0.5933).
- 3. TubeTK (https://public.kitware.com/Wiki/TubeTK/Data): This dataset contains 109 subjects. The 42 subjects with vessel masks are used for the evaluation dataset. The remaining 67 subjects are used for pretraining. The image size is (448, 448, 128) and the spacing is (0.5134, 0.5134, 0.8000).
- 4. ADAM (https://adam.isi.uu.nl/data/): This dataset contains 113 subjects. Each subject contains one original TOF-MRA image and one bias-field-corrected TOF-MRA image. Only the original data are used in this study. The average image size is (556, 556, 131). The spacing varies across subjects, and the average spacing is (0.3524, 0.3524, 0.5447).
- 5. BrainAneurysm (https://openneuro.org/datasets/ds003949/versions/1.0.1): This dataset contains 284 subjects, of which 127 are healthy controls and 157 are patients with brain aneurysms. Four subjects (sub-200, sub-235, sub-315 and sub-450) are removed because of incomplete image data. The average spacing is (0.4021, 0.4021, 0.6613), and the average image size is (466, 546, 147).
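As a quick consistency check, the subject counts listed above can be tallied as follows. This is a minimal sketch, assuming that every OASIS and ADAM subject is used for pretraining and that the four removed BrainAneurysm subjects are subtracted from the 284 listed; the dictionary layout and the function name are hypothetical.

```python
# (subjects in the dataset, subjects not used for pretraining), as listed above.
PRETRAINING_SOURCES = {
    "IXI": (570, 45),           # 45 labeled subjects are reserved for evaluation
    "OASIS": (525, 0),          # assumed: all subjects are used for pretraining
    "TubeTK": (109, 42),        # 42 labeled subjects are reserved for evaluation
    "ADAM": (113, 0),           # assumed: all subjects are used for pretraining
    "BrainAneurysm": (284, 4),  # 4 subjects removed for incomplete image data
}


def pretraining_counts(sources):
    """Return the number of pretraining subjects contributed by each dataset."""
    return {name: total - excluded for name, (total, excluded) in sources.items()}


counts = pretraining_counts(PRETRAINING_SOURCES)
total_pretraining = sum(counts.values())
```

Under these assumptions the per-dataset contributions are 525, 525, 67, 113 and 280 subjects, which add up to the 1510 unlabeled subjects reported in the abstract.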
7.2 Evaluation Datasets
The raw image data of the three labeled datasets are accessible at the following public websites:
- 1. TubeTK-42: https://public.kitware.com/Wiki/TubeTK/Data
- 2.
- 3.
We provide the voxel-wise vessel masks for the three datasets in this study.
7.3 The implementations of baseline models
For the baseline models, we use the following publicly available implementations.
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
The UNet is implemented in MONAI (https://monai.io/).