Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: dingbat

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: CC BY-NC-ND 4.0
arXiv:2403.09262v1 [eess.IV] 14 Mar 2024
11institutetext: Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE 11email: {fadillah.maani,anees.hashmi}@mbzuai.ac.ae 22institutetext: Northwestern University, Illinois, USA

Advanced Tumor Segmentation in Medical Imaging: An Ensemble Approach for BraTS 2023 Adult Glioma and Pediatric Tumor Tasks

Fadillah Maani*{}^{*}start_FLOATSUPERSCRIPT * end_FLOATSUPERSCRIPT 11    Anees Ur Rehman Hashmi*{}^{*}start_FLOATSUPERSCRIPT * end_FLOATSUPERSCRIPT 11    Mariam Aljuboory 1122    Numan Saeed 11    Ikboljon Sobirov 11    Mohammad Yaqub 11
Abstract

Automated segmentation proves to be a valuable tool in precisely detecting tumors within medical images. The accurate identification and segmentation of tumor types hold paramount importance in diagnosing, monitoring, and treating highly fatal brain tumors. The BraTS challenge serves as a platform for researchers to tackle this issue by participating in open challenges focused on tumor segmentation. This study outlines our methodology for segmenting tumors in the context of two distinct tasks from the BraTS 2023 challenge: Adult Glioma and Pediatric Tumors. Our approach leverages two encoder-decoder-based CNN models, namely SegResNet and MedNeXt, for segmenting three distinct subregions of tumors. We further introduce a set of robust postprocessing to improve the segmentation, especially for the newly introduced BraTS 2023 metrics. The specifics of our approach and comprehensive performance analyses are expounded upon in this work. Our proposed approach achieves third place in the BraTS 2023 Adult Glioma Segmentation Challenge with an average of 0.8313 and 36.38 Dice and HD95 scores on the test set, respectively.

Keywords:
BraTS MRI Glioma Tumor Segmentation BraTS-PEDs Challenge BraTS-adult
**footnotetext: Equal contribution

1 Introduction

Cancerous brain tumors are one of the deadliest types of central nervous system tumors [1], and they account for the highest number of cancer-related deaths in pediatrics. Glioma is a brain tumor that originates from glial cells, which provide structure and support to the nerve cells. Astrocytoma is the type of glioma that occurs in the astrocytes ( a type of glial cells), which are responsible for a healthy brain environment. The treatment options for brain tumors include surgery, chemotherapy, and radiation therapy. Neurologists, oncologists, and radiologists work together to develop the treatment plan for patients. Magnetic Resonance Imaging (MRI) scans provide information on the patient’s internal structure, tissue, and organs. The scans are used for treatment plans and to assess the performance of the treatment [15].

Radiologists predict tumor classification and whereabouts from MRI scans. As a result of the shortage of healthcare workers in some countries, radiologists are often overworked and under constant pressure, which at times naturally leads to human error. Manual segmentation of the tumor can be very time-consuming for radiologists. Automatic segmentation can increase the accuracy of tumor classifications and improve the workload for radiologists by providing them with additional resources that can help them feel supported.

The annual Medical Image Computing and Computer Assisted Interventions (MICCAI) conference hosts medical imaging challenges for research teams to participate internationally. One of MICCAI’s challenges is BraTS [13], the brain tumor segmentation that consists of nine tasks this year. Initially, BraTS started as a challenge that only aimed at adult glioma [5, 4, 3]; however, recently, it has expanded its dataset to increase the diversity of the segmentation tasks. The additional population in the expanded dataset includes sub-Saharan African patients, pediatrics, and meningioma tumors. The aim of expanding the dataset is to account for different brain tumors, a diverse range of image quality, and tumor sizes. This paper focuses on the adult and pediatric datasets.

Artificial intelligence uses neural networks to train on pre-existing data to learn the boundaries of brain tumors. Automatic segmentation is made possible by deep learning models that analyze the MRI dataset. As a result, physicians can use the algorithms to effectively and accurately identify tumors. This paper highlights the use of MedNeXt and SegResNet models for fully automated brain tumor segmentation. The training and validation occurred simultaneously when testing the different models on the dataset. BraTS provided researchers with the ground truth to accurately assess the performance of the models based on the predictions. The tumor segmentations were outlined by radiologists and reviewed by neurologists to ensure the accuracy of the data.

Manual segmentations are often time-consuming; semi-automated segmentation is a computer-based model with human contributions. Generally, the expert radiologist needs to manually guide the algorithm by providing the outlines for the segmentations. Then, the model uses the information imputed and automatically segments the region of interest in scans. This technique was one of the initial transitions to artificial intelligence segmentations  [21]. Two-stage segmentation frameworks are fully automatic but consist of two steps: balancing the classes and refining to the proper proportions. For brain tumors, there are several different imbalance classes; balancing the classes allows for equal representation and prevents biased training. Refining the predictions can improve the segmentation results by matching the accurate class distribution  [16]. While both the semi-automated and two-stage segmentations improve the workload in radiology, fully automated segmentations are the most efficient and consistent.

Our main contributions are the following:

  • A proposed ensemble of deep learning models for adult and pediatric brain tumor segmentation from the BraTS 2023 challenge.

  • An integration of the deep supervision component with the models and investigation of its effects on the performance.

  • A thorough investigation and analysis of post-processing techniques for the brain tumor segmentation tasks.

2 Methods

2.1 Dataset

BraTS-Adult Glioma. The dataset contains multi-institutional structural MRI scans of four different contrasts: pre and post-gadolinium T1-weighted (T1 and T1CE), T2-weighted (T2), and T2-weighted fluid-attenuated inversion recovery (T2-FLAIR). The dataset was acquired from multiple institutions with high variance in many aspects, including brain shape, appearance, and tumor morphology. The dataset [2, 9] comprises 1251 training and 219 validation brain MRI scans. Furthermore, the final model performance will be evaluated on the testing set that will not be released.

BraTS-PEDs. A total of 228 high-quality are acquired from 3 different institutions, including Children’s Brain Tumor Network (CBTN), Boston’s Children Hospital, and Yale University. The acquired MRI modalities are T1, T1Gd, T2, and T2-FLAIR. All the included pediatric subjects contain histologically-approved high-grade glioma, i.e., high-grade astrocytoma and diffuse midline glioma (DMG), including radiologically or histologically-proven diffuse intrinsic pontine glioma (DIPG) [10, 9]. The released training and validation datasets comprise 99 and 45 subjects, respectively.

Segmentation labels. The provided annotations consist of the GD-enhancing tumor (ET), the peritumoral edematous or invaded tissue (ED), and the necrotic tumor core (NCR). Instead of being evaluated on these labels, segmentation performance is assessed based on the different glioma sub-regions: enhancing tumor, tumor core (NCR + ET), and whole tumor (NCR + ET + ED).

Preprocessing. The provided MRI scans were preprocessed by co-registration of the four modalities to a standard SR124 template [17], isotropic interpolating to meet 1mm31𝑚superscript𝑚31mm^{3}1 italic_m italic_m start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT resolution and skull-stripping. All of the MRI image sizes are uniform 240×240×155240240155240\times 240\times 155240 × 240 × 155. We further preprocess each scan by cropping the foreground, normalizing voxels with non-zero intensities, and finally stacking the four modalities into a single image. Yet, we experienced a bottleneck when we applied the preprocessing steps on the fly. Thus, we preprocess all MRI scans and store them in the .npy format, and then load the Numpy arrays during training.

Refer to caption
Figure 1: The MedNeXt network [19].

2.2 Models

We conducted a comprehensive performance analysis by comparing 2 different segmentation models with varying sets of hyper-parameters. Initially, we employed MedNeXt [19], a novel 3D segmentation network inspired by ConvNeXt architecture, recently introduced into the field. Alongside this, we utilized SegResNet [14], a CNN-based segmentation model developed by the winning team of the BraTs 2018 challenge. These models were trained to predict 3 classes in 3 different output channels (TC, WT, ET); however, we also conducted experiments where we trained the models separately for one underperforming class (ET). In addition, our model input size is 128×128×128128128128128\times 128\times 128128 × 128 × 128. The models are illustrated in Figure 1 and Figure 2. The details are mentioned in the sections below.

MedNeXt. architecture draws inspiration from vision transformer [6] and incorporates them into the kernel segmentation network design. This combines the benefits of ConvNeXT [11]-like structures in a UNet [18]-like design. Consequently, MedNeXt harnesses the inherent strengths of CNN models while integrating transformer-inspired ConvNeXt blocks tailored for 3D segmentation tasks. Notably, this design also implements deep supervision (DS) that can alleviate the problem of vanishing gradients, thus enhancing model training. Our experimentation encompassed two variants of the MedNeXt model: ’base’ (B) and ’medium’ (M) from the standard MedNeXt implementation ***https://github.com/MIC-DKFZ/MedNeXt.

SegResNet. adopts a CNN-based encoder-decoder architecture that exhibits a relatively straightforward yet highly effective design for the 3D segmentation task. This architecture was originally introduced by the BraTs 2018 winning team, who achieved the highest dice score in segmenting tumor sub-regions by employing an ensemble of ten models. SegResNet has a ResNet-based [7] asymmetric architecture, containing UNet-like encoder-decoder blocks but with skip connections on both the encoder and decoder, allowing better gradient propagation. It also uses Group Normalization [20] that is suggested to work better by the authors, especially in small batch size scenarios.

Moreover, SegResNet incorporates an additional VAE (Variational Autoencoder) branch in the decoder during the training only. This VAE branch allows reconstructing the original image using features derived from the encoder bottleneck and does not contain skip connections from the encoder. This ensures a better regularization for the model training and allows the model to learn rich features for segmentation. During our experimentation, we conducted training iterations of both with and without the VAE regularization in order to assess its impact on the performance across the targeted tasks.

Refer to caption
Figure 2: The SegResNet network [14].

2.3 Inference

Prediction from a single network. Our model input size is 128×128×128128128128128\times 128\times 128128 × 128 × 128, smaller than the MR image size. We implement the sliding window inference technique with 0.5 overlaps to predict tumor probabilities for each voxel. We apply test-time-augmentation (TTA) by flipping an input image through all possible flip combinations (8 combinations) and aggregating the mean probabilities.

Ensemble. We train each model on the 5-fold CV setting, resulting in 5 trained networks for every training. To predict an input image during inference, we pass the image to each network to estimate tumor probabilities. Then, we aggregate the outputs from the 5-fold CV networks by taking the mean probabilities.

Ensembling multiple models can help improve overall performance [22] by leveraging the inherent strength of every model. In this work, we ensemble models output on probability level by weighted averaging to give importance to every model on each channel (weight_tc,weight_wt,weight_et)𝑤𝑒𝑖𝑔𝑡_𝑡𝑐𝑤𝑒𝑖𝑔𝑡_𝑤𝑡𝑤𝑒𝑖𝑔𝑡_𝑒𝑡(weight\_tc,weight\_wt,weight\_et)( italic_w italic_e italic_i italic_g italic_h italic_t _ italic_t italic_c , italic_w italic_e italic_i italic_g italic_h italic_t _ italic_w italic_t , italic_w italic_e italic_i italic_g italic_h italic_t _ italic_e italic_t ). The pseudocode of our model ensembling is given in Algorithm 1.

Algorithm 1 Model Ensembling
N𝑁Nitalic_N models with each corresponding weighting for every channel (TC, WT, ET) and an input brain MRI scan x3×H×W×D𝑥superscript3𝐻𝑊𝐷x\in\mathbb{R}^{3\times H\ \times W\times D}italic_x ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W × italic_D end_POSTSUPERSCRIPT
y𝟎3×H×W×D𝑦superscript03𝐻𝑊𝐷y\leftarrow\mathbf{0}^{3\times H\times W\times D}italic_y ← bold_0 start_POSTSUPERSCRIPT 3 × italic_H × italic_W × italic_D end_POSTSUPERSCRIPT
sumw𝟎3𝑠𝑢subscript𝑚𝑤superscript03sum_{w}\leftarrow\mathbf{0}^{3}italic_s italic_u italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ← bold_0 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT
for n=1,2,N𝑛12𝑁n=1,2,\dots Nitalic_n = 1 , 2 , … italic_N do
     yy+𝚖𝚘𝚍𝚎𝚕𝚜[n](x)*𝚠𝚎𝚒𝚐𝚑𝚝𝚒𝚗𝚐𝚜[n]𝑦𝑦𝚖𝚘𝚍𝚎𝚕𝚜delimited-[]𝑛𝑥𝚠𝚎𝚒𝚐𝚑𝚝𝚒𝚗𝚐𝚜delimited-[]𝑛y\leftarrow y+\texttt{models}[n](x)*\texttt{weightings}[n]italic_y ← italic_y + models [ italic_n ] ( italic_x ) * weightings [ italic_n ]
     sumwsumw+𝚠𝚎𝚒𝚐𝚑𝚝𝚒𝚗𝚐𝚜[n]𝑠𝑢subscript𝑚𝑤𝑠𝑢subscript𝑚𝑤𝚠𝚎𝚒𝚐𝚑𝚝𝚒𝚗𝚐𝚜delimited-[]𝑛sum_{w}\leftarrow sum_{w}+\texttt{weightings}[n]italic_s italic_u italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ← italic_s italic_u italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT + weightings [ italic_n ]
end for
yy/sumw𝑦𝑦𝑠𝑢subscript𝑚𝑤y\leftarrow y/sum_{w}italic_y ← italic_y / italic_s italic_u italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT
return y

Postprocessing. The postprocessing step plays a crucial role in the overall performance, especially for this year’s competition, as the organizer decided to change the evaluation focus from study-wise to lesion-wise performance, where False positive (FP) and negative (FN) are penalized severely with 0.0 Dice and 374 HD95 scores. Our experiments show that raw segmentation prediction contains many FPs due to small-size predicted lesions. To alleviate this, we do the following for each output channel (TC, WT, and ET);

  1. 1.

    Perform thresholding with a specific threshold for each channel.

  2. 2.

    Perform connected component analysis to group predicted connected tumor voxels into lesions.

  3. 3.

    Filter every group based on tumorous voxel count and the mean of tumorous voxel probabilities.

In short, we implement two postprocessing functions as described in Algorithm 2 and Algorithm 3.

Algorithm 2 AsDiscrete(TTC,TWT,TETsubscript𝑇𝑇𝐶subscript𝑇𝑊𝑇subscript𝑇𝐸𝑇T_{TC},T_{WT},T_{ET}italic_T start_POSTSUBSCRIPT italic_T italic_C end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_W italic_T end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_E italic_T end_POSTSUBSCRIPT)
Threshold values for each channel (TTC,TWT,TETsubscript𝑇𝑇𝐶subscript𝑇𝑊𝑇subscript𝑇𝐸𝑇T_{TC},T_{WT},T_{ET}italic_T start_POSTSUBSCRIPT italic_T italic_C end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_W italic_T end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_E italic_T end_POSTSUBSCRIPT), and a predicted tumor heatmap x3×H×W×D𝑥superscript3𝐻𝑊𝐷x\in\mathbb{R}^{3\times H\ \times W\times D}italic_x ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W × italic_D end_POSTSUPERSCRIPT where the channels correspond to TC, WT, and ET respectively
0<TTC,TWT,TET<1formulae-sequence0subscript𝑇𝑇𝐶subscript𝑇𝑊𝑇subscript𝑇𝐸𝑇10<T_{TC},T_{WT},T_{ET}<10 < italic_T start_POSTSUBSCRIPT italic_T italic_C end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_W italic_T end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_E italic_T end_POSTSUBSCRIPT < 1
y𝟎3×H×W×D𝑦superscript03𝐻𝑊𝐷y\leftarrow\mathbf{0}^{3\times H\ \times W\times D}italic_y ← bold_0 start_POSTSUPERSCRIPT 3 × italic_H × italic_W × italic_D end_POSTSUPERSCRIPT
for w,h,d=𝚛𝚊𝚗𝚐𝚎(W),𝚛𝚊𝚗𝚐𝚎(H),𝚛𝚊𝚗𝚐𝚎(D)formulae-sequence𝑤𝑑𝚛𝚊𝚗𝚐𝚎𝑊𝚛𝚊𝚗𝚐𝚎𝐻𝚛𝚊𝚗𝚐𝚎𝐷w,h,d=\texttt{range}(W),\texttt{range}(H),\texttt{range}(D)italic_w , italic_h , italic_d = range ( italic_W ) , range ( italic_H ) , range ( italic_D ) do
     if x[1,w,h,d]TTC𝑥1𝑤𝑑subscript𝑇𝑇𝐶x[1,w,h,d]\geq T_{TC}italic_x [ 1 , italic_w , italic_h , italic_d ] ≥ italic_T start_POSTSUBSCRIPT italic_T italic_C end_POSTSUBSCRIPT then
         y[1,w,h,d]1𝑦1𝑤𝑑1y[1,w,h,d]\leftarrow 1italic_y [ 1 , italic_w , italic_h , italic_d ] ← 1
     end if
     if x[2,w,h,d]TWT𝑥2𝑤𝑑subscript𝑇𝑊𝑇x[2,w,h,d]\geq T_{WT}italic_x [ 2 , italic_w , italic_h , italic_d ] ≥ italic_T start_POSTSUBSCRIPT italic_W italic_T end_POSTSUBSCRIPT then
         y[2,w,h,d]1𝑦2𝑤𝑑1y[2,w,h,d]\leftarrow 1italic_y [ 2 , italic_w , italic_h , italic_d ] ← 1
     end if
     if x[3,w,h,d]TET𝑥3𝑤𝑑subscript𝑇𝐸𝑇x[3,w,h,d]\geq T_{ET}italic_x [ 3 , italic_w , italic_h , italic_d ] ≥ italic_T start_POSTSUBSCRIPT italic_E italic_T end_POSTSUBSCRIPT then
         y[3,w,h,d]1𝑦3𝑤𝑑1y[3,w,h,d]\leftarrow 1italic_y [ 3 , italic_w , italic_h , italic_d ] ← 1
     end if
end for
return y𝑦yitalic_y
Algorithm 3 FilterObjects(Ts,usubscript𝑇𝑠𝑢T_{s,u}italic_T start_POSTSUBSCRIPT italic_s , italic_u end_POSTSUBSCRIPT, Ts,lsubscript𝑇𝑠𝑙T_{s,l}italic_T start_POSTSUBSCRIPT italic_s , italic_l end_POSTSUBSCRIPT, Tp,usubscript𝑇𝑝𝑢T_{p,u}italic_T start_POSTSUBSCRIPT italic_p , italic_u end_POSTSUBSCRIPT, Tp,msubscript𝑇𝑝𝑚T_{p,m}italic_T start_POSTSUBSCRIPT italic_p , italic_m end_POSTSUBSCRIPT)
A predicted tumor heatmap and binary map xp,xbH×W×Dsubscript𝑥𝑝subscript𝑥𝑏superscript𝐻𝑊𝐷x_{p},x_{b}\in\mathbb{R}^{H\ \times W\times D}italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_H × italic_W × italic_D end_POSTSUPERSCRIPT, upper size threshold (Ts,usubscript𝑇𝑠𝑢T_{s,u}italic_T start_POSTSUBSCRIPT italic_s , italic_u end_POSTSUBSCRIPT), lower size threshold (Ts,lsubscript𝑇𝑠𝑙T_{s,l}italic_T start_POSTSUBSCRIPT italic_s , italic_l end_POSTSUBSCRIPT), upper probability threshold (Tp,usubscript𝑇𝑝𝑢T_{p,u}italic_T start_POSTSUBSCRIPT italic_p , italic_u end_POSTSUBSCRIPT), and mid probability threshold (Tp,msubscript𝑇𝑝𝑚T_{p,m}italic_T start_POSTSUBSCRIPT italic_p , italic_m end_POSTSUBSCRIPT)
Ts,uTs,lsubscript𝑇𝑠𝑢subscript𝑇𝑠𝑙T_{s,u}\geq T_{s,l}italic_T start_POSTSUBSCRIPT italic_s , italic_u end_POSTSUBSCRIPT ≥ italic_T start_POSTSUBSCRIPT italic_s , italic_l end_POSTSUBSCRIPT, Ts,l0subscript𝑇𝑠𝑙0T_{s,l}\geq 0italic_T start_POSTSUBSCRIPT italic_s , italic_l end_POSTSUBSCRIPT ≥ 0, and 0Tp,u,Tp,m<0formulae-sequence0subscript𝑇𝑝𝑢subscript𝑇𝑝𝑚00\leq T_{p,u},T_{p,m}<00 ≤ italic_T start_POSTSUBSCRIPT italic_p , italic_u end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_p , italic_m end_POSTSUBSCRIPT < 0
y𝟎H×W×D𝑦superscript0𝐻𝑊𝐷y\leftarrow\mathbf{0}^{H\ \times W\times D}italic_y ← bold_0 start_POSTSUPERSCRIPT italic_H × italic_W × italic_D end_POSTSUPERSCRIPT
yccget_connected_components(xb)subscript𝑦𝑐𝑐get_connected_componentssubscript𝑥𝑏y_{cc}\leftarrow\texttt{get\_connected\_components}(x_{b})italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT ← get_connected_components ( italic_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT )
Nccget_the_number_of_ccs(ycc)subscript𝑁𝑐𝑐get_the_number_of_ccssubscript𝑦𝑐𝑐N_{cc}\leftarrow\texttt{get\_the\_number\_of\_ccs}(y_{cc})italic_N start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT ← get_the_number_of_ccs ( italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT )
for n𝚛𝚊𝚗𝚐𝚎(Ncc)𝑛𝚛𝚊𝚗𝚐𝚎subscript𝑁𝑐𝑐n\in\texttt{range}(N_{cc})italic_n ∈ range ( italic_N start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT ) do
     𝚜𝚒𝚣𝚎count_tumor_pixels_of_nth_cc(ycc,n)𝚜𝚒𝚣𝚎count_tumor_pixels_of_nth_ccsubscript𝑦𝑐𝑐𝑛\texttt{size}\leftarrow\texttt{count\_tumor\_pixels\_of\_nth\_cc}(y_{cc},n)size ← count_tumor_pixels_of_nth_cc ( italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT , italic_n )
     𝚖𝚎𝚊𝚗get_mean_prob_of_nth_cc(xp,ycc,n)𝚖𝚎𝚊𝚗get_mean_prob_of_nth_ccsubscript𝑥𝑝subscript𝑦𝑐𝑐𝑛\texttt{mean}\leftarrow\texttt{get\_mean\_prob\_of\_nth\_cc}(x_{p},y_{cc},n)mean ← get_mean_prob_of_nth_cc ( italic_x start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT , italic_n )
     if 𝚜𝚒𝚣𝚎Ts,u𝚜𝚒𝚣𝚎subscript𝑇𝑠𝑢\texttt{size}\geq T_{s,u}size ≥ italic_T start_POSTSUBSCRIPT italic_s , italic_u end_POSTSUBSCRIPT then
         if 𝚖𝚎𝚊𝚗Tp,u𝚖𝚎𝚊𝚗subscript𝑇𝑝𝑢\texttt{mean}\geq T_{p,u}mean ≥ italic_T start_POSTSUBSCRIPT italic_p , italic_u end_POSTSUBSCRIPT then yinsert_cc_to_y(y,ycc,n)𝑦insert_cc_to_y𝑦subscript𝑦𝑐𝑐𝑛y\leftarrow\texttt{insert\_cc\_to\_y}(y,y_{cc},n)italic_y ← insert_cc_to_y ( italic_y , italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT , italic_n )
         end if
     else if Ts,l𝚜𝚒𝚣𝚎<Ts,usubscript𝑇𝑠𝑙𝚜𝚒𝚣𝚎subscript𝑇𝑠𝑢T_{s,l}\leq\texttt{size}<T_{s,u}italic_T start_POSTSUBSCRIPT italic_s , italic_l end_POSTSUBSCRIPT ≤ size < italic_T start_POSTSUBSCRIPT italic_s , italic_u end_POSTSUBSCRIPT then
         if 𝚖𝚎𝚊𝚗Tp,m𝚖𝚎𝚊𝚗subscript𝑇𝑝𝑚\texttt{mean}\geq T_{p,m}mean ≥ italic_T start_POSTSUBSCRIPT italic_p , italic_m end_POSTSUBSCRIPT then yinsert_cc_to_y(y,ycc,n)𝑦insert_cc_to_y𝑦subscript𝑦𝑐𝑐𝑛y\leftarrow\texttt{insert\_cc\_to\_y}(y,y_{cc},n)italic_y ← insert_cc_to_y ( italic_y , italic_y start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT , italic_n )
         end if
     end if
end for
return y𝑦yitalic_y

2.4 Experimental setup

We follow the 5-fold CV training setting by partitioning the training data into five subsets, performing training on four of them, and validating one subset on each iteration. We train our networks based on the region-based training mechanism [8] for 150 epochs, batch size of 2, and apply on-the-fly data augmentation consisting of the random spatial crop to 128×128×128128128128128\times 128\times 128128 × 128 × 128 size, random flips, and random intensity scaling as well as shifting. For the objective function, we apply batch dice loss and focal loss with 2.0 γ𝛾\gammaitalic_γ and then sum them to get the total loss. We optimize our networks using AdamW optimizer [12], and we use the cosine-annealing with linear-warmup scheduler. The optimizer and scheduler hyperparameters are 1e-4 base learning rate (LR), 1e-6 weight decay, 8 warmup epochs, 1e-7 initial LR, 1e-6 final LR, and 150 maximum epochs. In addition, we conducted some experiments involving deep supervision by additionally applying the loss at each decoder stage and giving lower importance weights on lower resolutions by a factor of 1/2.

We apply AsDiscrete (Algorithm 2) and FilterObjects (Algorithm 3) on each channel at the output prediction. The postprocessing hyperparameters are selected experimentally as we found that different model configurations have different output characteristics, leading to different suboptimum hyperparameters.

3 Results

Refer to caption
Figure 3: The figure shows the qualitative results for the adult-glioma task on the validation sample for three cases.
Table 1: Ablation study on post-processing using 5-fold CV. We utilize MedNeXt B-3 with deep supervision. Our post-processing steps significantly affect the BraTS 2023 Score, while the Legacy Score is not much affected. Notes: (a) Test-time augmentations (TTA), (b) Replace ET to TC if total predicted ET area is small, (c) Filter connected components (tumor objects) based on size, (d) Filter connected components based on mean confidence of each tumor object.
BraTS 2023 Score Legacy Score
Dice HD95 Dice HD95
a b c d ET TC WT Avg ET TC WT Avg ET TC WT Avg ET TC WT Avg
78.82 84.55 70.88 78.08 47.66 33.84 93.57 58.36 87.14 91.38 93.29 90.60 11.69 7.50 6.59 8.59
79.99 86.30 77.45 81.25 43.25 26.62 67.94 45.94 87.24 91.40 93.46 90.70 11.82 7.19 6.54 8.52
80.91 86.30 77.45 81.55 41.26 26.62 67.94 45.27 88.08 91.40 93.40 90.98 11.06 7.19 6.54 8.26
85.51 88.63 89.06 87.74 21.83 19.28 22.96 21.36 88.29 91.10 93.34 90.91 10.49 9.89 7.02 9.13
84.94 88.42 89.61 87.66 24.47 18.36 20.83 21.22 88.19 91.46 93.48 91.04 10.34 7.47 6.35 8.06
86.01 88.70 89.44 88.05 20.59 19.11 21.59 20.43 88.38 91.07 93.36 90.94 10.35 10.10 6.81 9.09
Table 2: Adult-glioma performance on the validation leaderboard. DS indicates using Deep Supervision during training. The postprocessing hyperparameters for each submission were selected experimentally. The final postprocessing steps (*) are AsDiscrete(0.5, 0.5, 0.4), FilterObjects(2000, 100, 0.85, 0.925) for WT, FilterObjects(95, 70, 0.71, 0.5) for ET, FilterObjects(350, 350, 0, 0) for TC. The final ensemble weightings are MedNeXt-DS=(0,1,1)011(0,1,1)( 0 , 1 , 1 ), SegResNet-DS=(0,1,0)010(0,1,0)( 0 , 1 , 0 ), SegResNet=(1,0,0)100(1,0,0)( 1 , 0 , 0 ).
Dice HD95
Model ET TC WT Avg ET TC WT Avg
SegResNet 0.8280 0.8606 0.9044 0.8643 23.90 17.89 12.52 18.10
SegResNet-DS 0.8239 0.8595 0.9016 0.8617 27.46 16.24 13.48 19.06
MedNeXt 0.8400 0.8486 0.9059 0.8648 22.25 26.71 12.49 20.48
MedNeXt-DS 0.8363 0.8486 0.9051 0.8633 23.92 28.27 12.65 21.61
MedNeXt-DS
    + SegResNet-DS
0.8346 0.8622 0.9063 0.8677 23.70 18.71 11.70 18.04
*MedNeXt-DS
    + SegResNet-DS
    + SegResNet
0.8432 0.8627 0.9063 0.8707 17.37 13.10 11.70 14.06
Table 3: Performance on the test set in both tasks. Our final submission for the adult glioma segmentation task was ranked 3rdsuperscript3𝑟𝑑3^{rd}3 start_POSTSUPERSCRIPT italic_r italic_d end_POSTSUPERSCRIPT in the final test set leaderboard. DS indicates using Deep Supervision during training, and ET indicates the models specifically trained for class ET in the pediatric tumor segmentation task.
Dice HD95
Task Model ET TC WT Avg ET TC WT Avg
Adult
    Glioma
MedNeXt-DS +
     SegResNet-DS
0.8198 0.8233 0.8508 0.8313 35.15 39.86 34.12 36.38
Pediatric
    Tumors
SegResNet-ET +
    SegResNet
0.5522 0.77 0.7755 0.6992 45.32 30.35 30.45 35.37
Table 4: The table presents the validation results for BraTs-Pediatrics segmentation obtained from the leaderboard. In the WT (weighted-loss) training scheme, the focal loss weight is set to 2. Our final prediction is the result of various model combinations. Models specifically trained for class ET are marked as **-ET. The optimal performance was attained with a 5-Fold-CV of SegResNet (predicting WT and TC) and an additional 5-Fold-CV of SegResNet (predicting ET).
Dice HD95
Model ET TC WT Avg ET TC WT Avg
MedNeXt 0.3338 0.7829 0.8291 0.6586 202.33 16.76 20.96 60.17
SegResNet 0.2674 0.7930 0.8334 0.6313 183.82 19.98 19.32 55.94
SegResNet-WL 0.3238 0.7588 0.8060 0.6295 204.02 24.75 26.18 63.90
SegFormer ET + MedNeXt 0.5015 0.7829 0.8291 0.7045 149.81 16.76 20.97 47.06
MedNeXt ET model + MedNeXt 0.4004 0.7765 0.8325 0.6698 189.98 17.14 19.61 56.85
SegResNet ET model + SegResNet-VAE 0.5595 0.7777 0.8018 0.7130 124.91 21.56 29.76 44.23
SegResNet ET model + SegResNet 0.5595 0.78106 0.8206 0.7204 124.91 25.30 24.77 58.32

We trained our pipeline using the 5-fold cross-validation (CV) on the training data and performed an evaluation using the internal validation set to select the best set of hyperparameters. We further utilized highly effective post-processing steps (see section 1) on the model predictions to get the final output, which was submitted to the online leaderboard. After rigorous experimentation on different models on the internal validation set, we narrowed it down to two models for the external validation based on the online leaderboard. SegResNet and MedNeXt models are selected for the final submissions. The dice similarity coefficient (DSC) and 95% Hausdorff distance (HD95) from the online validation for the Adults-Glioma and Pediatrics task are mentioned in Table 2 and Table 4, respectively. Furthermore, the performance for both tasks on the final test set is given in Table 3.

Table 1 shows the effect of different post-processing steps on the performance evaluated using the local validation set. This ablations study used the MedNeXt-B-3 model with deep supervision (DS). The best-performing setting is achieved by using test-time augmentations (TTA) along with filtering connected components based on each tumor object’s size and mean confidence. We achieved approximately a 10% increase in average Dice score using and a large drop in the HD95 distance, suggesting the effectiveness of the used post-processing.

We further report the performance metrics on the adult Glioma task in table 2. In the vanilla 5-fold CV, both baseline models (SegResNet and MedNeXt) achieve a similar performance of approx. 0.86 mean DSC. Moreover, using DS shows a slight positive trend in HD95 distance. Following this, we create an ensemble of MedNeXt-DS, SegResNet-DS, and SegResNet without DS to achieve the highest performance in terms of mean DSC and HD95, 0.871 and 14.06, respectively. Our best-performing setting is a weighted combination of all three used models, where we used MedNeXt-DS for class TC and WT, SegResNet-DS for class TC, and SegResNet for ET class only. We used this combination for the final submission. Table 3 shows the performance on the hidden test set, where our approach achieved 0.8313 and 36.38 mean DSC and HD95, respectively. The qualitative results for the best, median, and worst performing validation samples are shown in Figure 3.

Refer to caption
Figure 4: The figure shows the qualitative results for the pediatric task on the validation sample for the best-performing model.

Following a similar experimental setting, we applied our proposed pipeline to the pediatric tumor segmentation task as well. Table 4 shows that the baseline MedNeXt and SegResNet models achieve a mean DSC of 0.65 and 0.63 and a mean HD95 of 60.17 and 55.94, respectively, with sub-optimal performance for the ET class. We developed two techniques to tackle this issue: (i) a weighted loss (WL) based model that penalizes the ET class more than other classes and (ii) a standalone model for the ET class only. While SegResNet-WL showed some improvement in the ET class performance, it did not perform on par with approach (ii), where a separate network is trained for the ET class only. Our best-performing model for this task is a combination of a SegResNet-ET and a multi-class SegResNet model, achieving a mean DSC of 0.72 and HD95 of 58.32. We used our best-performing combination for the test set submission. Table 3 shows that our approach achieved 0.6992 and 35.37 mean DSC and HD95 on the hidden test set, respectively. The qualitative results for the best, median, and worst performing validation samples are shown in figure 4.

4 Discussion

In this paper, we summarize our proposed methodology for the BraTS 2023 adult and pediatrics glioma segmentation competition. Our pipeline is based on the two highly efficient segmentation models, namely; MedNeXt and SegResNet. Our work benefited from the power of DS and an ensemble strategy involving heterogeneous models to tackle a challenging real-life problem. The primary focus was based on integrating suitable post-processing steps with deep supervision and the ensemble of diverse models, resulting in substantial performance gains. We conducted comprehensive experiments to show the significance of understanding the clinical problem and implementing domain-specific processing steps to augment the efficiency of deep-learning models, ultimately providing a significant boost in performance. The post-processing steps stood very helpful in following this year’s new scoring system that is designed carefully based on the clinical diagnosis and heavily penalizes missing even a small tumor region.

Furthermore, we used a similar approach for both tasks, motivated by the similarity between both tasks; however, the performance in both tasks remained significantly different. In the adult-glioma challenge, all three classes showed similar results, while the ET class in the pediatric dataset suffered much lower performance when compared to the other two classes. This discrepancy can potentially be due to the size of the ET regions as compared to the entire brain volume and other classes. The models find it hard to distinguish the regions, especially when one class is imbalanced, as in the case of the pediatric dataset. Another reason is that the ET class is a small area within the TC region, which itself is within the WT region. This suppresses the ET class significantly, affecting smooth model learning. Furthermore, the size of the available dataset for the pediatric task is significantly smaller than the adult-glioma dataset, which could also contribute to this performance difference in both tasks.

5 Conclusion

The paper studies different approaches to segmenting the tumor region in the brain. We used the BraTS 2023 challenge datasets for the adult glioma and pediatric tumor. Both datasets are multi-modal, multi-class segmentation tasks, with four modalities to input and three classes to predict. The automatic approach for segmenting tumor regions is highly beneficial for clinical practice, helping clinicians speed up their work. Our approach combines the advantages of deep supervision and an ensemble of models for the successful segmentation of the brain tumor from the MR images. To conclude, we achieve third place in the BraTS 2023 Adult Glioma Challenge.

References

  • [1] B. M. Alexander and T. F. Cloughesy. Adult glioblastoma. Journal of Clinical Oncology, 35(21):2402–2409, 2017.
  • [2] U. Baid, S. Ghodasara, S. Mohan, M. Bilello, E. Calabrese, E. Colak, K. Farahani, J. Kalpathy-Cramer, F. C. Kitamura, S. Pati, et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv preprint arXiv:2107.02314, 2021.
  • [3] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, J. Freymann, K. Farahani, and C. Davatzikos. Segmentation labels and radiomic features for the pre-operative scans of the tcga-gbm collection (2017). DOI: https://doi. org/10.7937 K, 9.
  • [4] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, J. Freymann, K. Farahani, and C. Davatzikos. Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection. The cancer imaging archive, 286, 2017.
  • [5] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos. Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data, 4(1):1–13, 2017.
  • [6] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • [7] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pages 630–645. Springer, 2016.
  • [8] F. Isensee, P. F. Jäger, P. M. Full, P. Vollmuth, and K. H. Maier-Hein. nnu-net for brain tumor segmentation. In A. Crimi and S. Bakas, editors, Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 118–132, Cham, 2021. Springer International Publishing.
  • [9] A. Karargyris, R. Umeton, M. J. Sheller, et al. Federated benchmarking of medical artificial intelligence with medperf. Nature Machine Intelligence, 5:799–810, 2023.
  • [10] A. F. Kazerooni et al. The brain tumor segmentation (BraTS) challenge 2023: Focus on pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs). arXiv preprint arXiv:2305.17033, 2023.
  • [11] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022.
  • [12] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
  • [13] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging, 34(10):1993–2024, 2014.
  • [14] A. Myronenko. 3d mri brain tumor segmentation using autoencoder regularization. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II, volume 4. Springer International Publishing, 2019.
  • [15] A. M. Owrangi, P. B. Greer, and C. K. Glide-Hurst. Mri-only treatment planning: benefits and challenges. Physics in Medicine & Biology, 63(5):05TR01, 2018.
  • [16] S. Pereira, A. Pinto, V. Alves, and C. A. Silva. Brain tumor segmentation using convolutional neural networks in mri images. IEEE transactions on medical imaging, 35(5):1240–1251, 2016.
  • [17] T. Rohlfing et al. The SRI24 multichannel atlas of normal adult human brain structure. Human Brain Mapping, 31(5):798–819, 2010.
  • [18] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 9351:234–241, 2015.
  • [19] S. Roy, G. Koehler, C. Ulrich, M. Baumgartner, J. Petersen, F. Isensee, P. F. Jaeger, and K. Maier-Hein. Mednext: Transformer-driven scaling of convnets for medical image segmentation. arXiv preprint arXiv:2303.09975, 2023.
  • [20] Y. Wu and K. He. Group normalization. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
  • [21] K. Xie, J. Yang, Z. Zhang, and Y. Zhu. Semi-automated brain tumor and edema segmentation using mri. European journal of radiology, 56(1):12–19, 2005.
  • [22] R. A. Zeineldin, M. E. Karar, O. Burgert, and F. Mathis-Ullrich. Multimodal cnn networks for brain tumor segmentation in mri: A brats 2022 challenge solution. In S. Bakas, A. Crimi, U. Baid, S. Malec, M. Pytlarz, B. Baheti, M. Zenk, and R. Dorent, editors, Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 127–137, Cham, 2023. Springer Nature Switzerland.