
How good nnU-Net for Segmenting Cardiac MRI: A Comprehensive Evaluation

Malitha Gunawardhana, Fangqiang Xu, Jichao Zhao
Auckland Bioengineering Institute, University of Auckland, New Zealand
Abstract

Cardiac segmentation is a critical task in medical imaging: detailed analysis of heart structures is crucial for diagnosing and treating cardiovascular diseases. With the advent of deep learning, automated segmentation techniques have made remarkable progress, achieving high accuracy and efficiency compared with traditional manual methods. Among these techniques, the nnU-Net framework stands out as a robust and versatile tool for medical image segmentation. In this study, we evaluate the performance of nnU-Net in segmenting cardiac magnetic resonance images (MRIs). Using five cardiac segmentation datasets, we evaluate several nnU-Net configurations: 2D, 3D full resolution, 3D low resolution, 3D cascade, and ensemble models. Our study benchmarks these configurations and examines whether new models need to be developed for specific cardiac segmentation tasks.

keywords:
MRI, Segmentation, nnU-Net, Benchmark

1 Introduction

Cardiovascular diseases (CVDs) accounted for an estimated 19.05 million deaths globally in 2020, reflecting an 18.71% increase from 2010. Despite this rise, the age-standardized death rate decreased by 12.19%, reaching 239.80 per 100,000 population. Additionally, the total crude prevalence of CVD worldwide reached 607.64 million cases in 2020, marking a 29.01% increase compared to 2010 (Tsao et al., 2023). These statistics underscore the urgent need for advanced diagnostic and therapeutic approaches in cardiology.

Accurate segmentation of cardiac structures is essential for understanding heart function, planning interventions, and monitoring disease progression. For example, locating and quantifying fibrosis and scars has been shown to be valuable for treatment stratification of patients with atrial fibrillation (AF) (Allessie et al., 2002; Boldt et al., 2004) and ventricular tachycardia (Ukwatta et al., 2015). These techniques provide critical guidance for surgical or ablation procedures (Vergara and Marrouche, 2011), and imaging of post-ablation scars offers valuable insights into treatment outcomes (Peters et al., 2007).

Cardiac segmentation involves the precise delineation of key anatomical structures within the heart, including the myocardium, ventricles, atria, and major vessels. In particular, Late Gadolinium Enhancement Magnetic Resonance Imaging (LGE-MRI) has emerged as an invaluable technique in cardiac imaging. LGE-MRI excels in highlighting areas of myocardial scarring and fibrosis, which are critical indicators in the diagnosis and management of various cardiac conditions, including myocardial infarction, cardiomyopathies, and arrhythmias (Akkaya et al., 2013; Bisbal et al., 2014).

Historically, manual segmentation by expert radiologists and cardiologists has been considered the gold standard for cardiac image analysis. However, manual segmentation is time-consuming, often requiring hours for a single dataset, which makes it impractical in busy clinical settings (Tobon-Gomez et al., 2015). Automated segmentation methods, especially those based on deep learning, have transformed cardiac image analysis. Compared with manual approaches, they offer consistency and reproducibility, because they eliminate inter-observer variability; speed, because deep learning models can segment cardiac structures within seconds; scalability to large datasets; and the potential for continuous improvement, because models can be fine-tuned and updated as more data become available.

Over the past decade, numerous methods have been developed for automated cardiac segmentation, each with its own strengths and limitations. To improve segmentation accuracy and robustness, these methods have exploited uncertainty (Yang et al., 2019; Arega et al., 2022), semi-supervised learning (Shi et al., 2024; Mazher et al., 2022), curriculum learning (Jiang et al., 2022), and multi-task learning (Chen et al., 2019).

Despite these advancements, there remains a notable gap in the literature regarding the comprehensive evaluation of one particular architecture that has shown remarkable success in medical image segmentation across various domains: the nnU-Net (no-new-Net) (Isensee et al., 2021). The nnU-Net is a self-configuring method based on the U-Net architecture that automatically adapts preprocessing, network architecture, training, and post-processing to the specifics of a given dataset. While nnU-Net has demonstrated state-of-the-art performance in numerous biomedical segmentation challenges (Isensee et al., 2024), its potential in the specific context of cardiac segmentation has not been thoroughly explored. This presents a significant research opportunity, as cardiac MRI poses unique challenges due to its high contrast between normal and scarred myocardium, potential artefacts, and variability in image quality across different scanners and institutions.

In this study, we aim to bridge this knowledge gap by conducting a comprehensive analysis of nnU-Net's performance in segmenting cardiac MRI. We use five widely used datasets for this task: the LAScarQS 2022 dataset (Zhuang et al., 2023), the 2018 LASC dataset (Xiong et al., 2021), the ACDC dataset (Bernard et al., 2018), and the MnM (Campello et al., 2021) and MnM2 (Martín-Isla et al., 2023) datasets. To the best of our knowledge, this is the first study to focus exclusively on this combination of method and imaging modality. Through this analysis, we aim to provide the medical imaging community with insights into the capabilities and limitations of nnU-Net for segmenting cardiac MRI. Our findings may inform future algorithm development, the clinical adoption of automated segmentation tools, and standardization efforts in cardiac image analysis.

The remainder of this paper is organized as follows: Section 2 provides a detailed background on the nnU-Net architecture. Section 3 describes our methodology, including dataset preparation, experimental setup, and evaluation metrics. Section 4 presents our results and analysis. Section 5 discusses the implications of our findings, the limitations of the study, and future research directions.

2 nnU-Net Architecture

The nnU-Net framework is designed for semantic segmentation and can handle both 2D and 3D images with arbitrary input modalities or channels. It accommodates varying voxel spacings and anisotropies and remains robust even when class distributions are highly imbalanced. As a supervised method, nnU-Net requires annotated training cases for the target application. The number of training cases needed varies with the complexity of the segmentation task, although nnU-Net often requires fewer cases than other solutions because of its extensive data augmentation.

nnU-Net expects to process entire images at once during preprocessing and postprocessing, which makes it unsuitable for exceedingly large images. Nevertheless, it has been successfully tested on images ranging from 40×40×40 voxels up to 1500×1500×1500 voxels in 3D, and from 40×40 pixels up to approximately 30000×30000 pixels in 2D. The largest image size it can handle depends on the available RAM.
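The RAM limit follows from simple arithmetic: a dense image needs (number of voxels) × (bytes per voxel) of memory. The sketch below assumes float32 storage (4 bytes per voxel) and a single channel; `image_memory_gib` is a hypothetical helper, and real peak usage during preprocessing is higher because resampled copies and intermediate arrays coexist in memory.

```python
import math


def image_memory_gib(shape, bytes_per_voxel=4):
    """Approximate RAM (GiB) needed to hold one dense, single-channel image.

    Assumes float32 storage (4 bytes per voxel) by default. Actual peak usage
    during preprocessing is higher, because resampled copies and other
    intermediate arrays are held in memory at the same time.
    """
    n_voxels = math.prod(shape)
    return n_voxels * bytes_per_voxel / 1024**3


# Upper ends of the tested sizes quoted above, plus one LASC scan (576 x 576 x 88)
for name, shape in [("3D, 1500^3", (1500, 1500, 1500)),
                    ("2D, 30000^2", (30000, 30000)),
                    ("LASC scan", (576, 576, 88))]:
    print(f"{name}: {image_memory_gib(shape):.2f} GiB")
```

At the upper end of the quoted range, a single 1500×1500×1500 float32 volume occupies about 12.6 GiB on its own, whereas a 576×576×88 LASC scan needs roughly 0.11 GiB.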

When presented with a new dataset, nnU-Net analyzes the provided training cases to create a 'dataset fingerprint'. Based on this fingerprint, it creates several U-Net configurations tailored to the dataset:

  • 2D U-Net: applicable to both 2D and 3D datasets.

  • 3D Full Resolution U-Net: operates on high-resolution images and is intended for 3D datasets.

  • 3D Low Resolution U-Net: operates on downsampled, low-resolution images.

  • 3D Cascade Full Resolution U-Net: a 3D U-Net cascade in which a low-resolution 3D U-Net first produces a coarse segmentation, which a second, high-resolution 3D U-Net then refines. This configuration is used for large 3D datasets.
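The dataset fingerprint can be thought of as a small set of summary statistics over the training cases. The sketch below is a simplified, hypothetical version (`dataset_fingerprint` is not nnU-Net's API): it collects the median voxel spacing, the median image shape, and intensity statistics of labelled foreground voxels, and it omits nnU-Net's cropping and voxel subsampling. In nnU-Net, the global intensity statistics are used mainly for CT normalization; non-CT images such as the MRIs used here are normalized per image with a z-score.

```python
import numpy as np


def dataset_fingerprint(cases):
    """Simplified, hypothetical dataset fingerprint in the spirit of nnU-Net.

    `cases` is a list of (image, spacing, label) tuples: `image` and `label`
    are numpy arrays of the same shape, `spacing` is the voxel size (mm) per axis.
    nnU-Net additionally crops each case to its non-zero region and subsamples
    foreground voxels; both steps are omitted here.
    """
    spacings = np.array([spacing for _, spacing, _ in cases], dtype=float)
    shapes = np.array([image.shape for image, _, _ in cases], dtype=float)
    # Pool the intensities of labelled (foreground) voxels across all cases
    foreground = np.concatenate([image[label > 0] for image, _, label in cases])
    return {
        "median_spacing": np.median(spacings, axis=0),
        "median_shape": np.median(shapes, axis=0),
        "fg_mean": float(foreground.mean()),
        "fg_std": float(foreground.std()),
        "fg_p00_5": float(np.percentile(foreground, 0.5)),
        "fg_p99_5": float(np.percentile(foreground, 99.5)),
    }


# Synthetic example: three volumes with different numbers of slices
rng = np.random.default_rng(0)
cases = []
for n_slices in (80, 88, 96):
    image = rng.normal(100.0, 20.0, size=(n_slices, 64, 64))
    label = np.zeros(image.shape, dtype=np.uint8)
    label[:, 20:40, 20:40] = 1  # a box-shaped "structure"
    cases.append((image, (2.5, 0.625, 0.625), label))

fingerprint = dataset_fingerprint(cases)
print(fingerprint["median_shape"])  # [88. 64. 64.]
print(fingerprint["median_spacing"])
```

Summaries such as the median spacing and median shape are what the rule-based parameters (target spacing, patch size) are later derived from.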

For datasets with smaller image sizes, the U-Net cascade (and thus the 3D low-resolution configuration) is excluded, as the patch size of the full-resolution U-Net is sufficient to cover a significant portion of the input images. The configuration of nnU-Net’s segmentation pipelines is based on a three-step approach:

  • Fixed Parameters: These parameters remain constant and are not adapted. Through the development of nnU-Net, a robust configuration was identified that includes the loss function, most data augmentation strategies, and the learning rate.

  • Rule-Based Parameters: These parameters are adjusted based on the dataset fingerprint using heuristic rules. For instance, network topology, which includes pooling behaviour and network depth, is adapted to the patch size. The patch size, network topology, and batch size are optimized jointly, considering GPU memory constraints.

  • Empirical Parameters: These parameters are determined empirically, by evaluating the trained models with cross-validation. This involves selecting the most suitable U-Net configuration for the dataset (2D, 3D full resolution, 3D low resolution, 3D cascade, or an ensemble of these) and optimizing the postprocessing strategy.
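As an illustration of a rule-based parameter, the sketch below derives how many times each axis of a patch is downsampled, in the spirit of nnU-Net's topology rules. It is a simplified, hypothetical heuristic, not nnU-Net's planner code: an axis is halved only while it stays at least `min_feature_size` voxels (4 here) and while its spacing is at most twice the finest current spacing, so a coarse through-plane axis is pooled later than the in-plane axes. The values of `min_feature_size` and `max_pools` are assumptions.

```python
def pooling_plan(patch_size, spacing, min_feature_size=4, max_pools=5):
    """Hypothetical, simplified nnU-Net-style rule for per-axis downsampling.

    Returns one pooling kernel per encoder stage (2 = halve that axis, 1 = keep it).
    """
    size = list(patch_size)
    spacing = [float(s) for s in spacing]
    stages = []
    for _ in range(max_pools):
        finest = min(spacing)
        kernel = []
        for axis in range(len(size)):
            big_enough = size[axis] // 2 >= min_feature_size
            # Delay pooling along an axis that is much coarser than the finest one
            not_too_coarse = spacing[axis] <= 2 * finest
            kernel.append(2 if big_enough and not_too_coarse else 1)
        if all(k == 1 for k in kernel):
            break  # no axis can be pooled any further
        for axis, k in enumerate(kernel):
            size[axis] //= k
            spacing[axis] *= k
        stages.append(kernel)
    return stages


# Anisotropic patch (slices, rows, columns) with 2.5 mm slices and 0.625 mm pixels
for stage, kernel in enumerate(pooling_plan((20, 192, 192), (2.5, 0.625, 0.625)), 1):
    print(stage, kernel)
```

In this example the through-plane axis is skipped at the first stage, pooled at the next two, and then left alone once it would drop below four voxels, giving a 5×6×6 feature map at the deepest stage.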

nnU-Net’s systematic approach to configuring segmentation pipelines based on dataset-specific characteristics and robust default settings makes it a versatile and powerful tool for semantic segmentation tasks.
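The ensemble configuration evaluated in this study combines the outputs of several trained configurations. A minimal sketch, assuming each configuration exports per-class softmax probabilities for a case (`ensemble_predict` is a hypothetical name): the probability maps are averaged voxel-wise and the label with the highest mean probability is selected, which mirrors the averaging of predicted probabilities that nnU-Net uses when ensembling.

```python
import numpy as np


def ensemble_predict(prob_maps):
    """Average per-class probability maps from several configurations, then take the argmax.

    Each element of `prob_maps` has shape (n_classes, *spatial), e.g. the softmax
    output of the 2D and the 3D full-resolution model for the same case.
    """
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return np.argmax(mean_probs, axis=0)


# Toy case: 2 classes (background, foreground) over 3 voxels
probs_2d = np.array([[0.6, 0.4, 0.2],
                     [0.4, 0.6, 0.8]])
probs_3d = np.array([[0.8, 0.3, 0.7],
                     [0.2, 0.7, 0.3]])
print(ensemble_predict([probs_2d, probs_3d]))  # [0 1 1]
```

In the toy example, the third voxel is labelled foreground by the 2D model and background by the 3D model; averaging resolves it to foreground because the 2D model is more confident (0.8 vs. 0.7).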

3 Experiment

3.1 Datasets

Figure 1: Visualization of the long-axis and short-axis views in both the end-diastole and end-systole phases for the MnM2 dataset. The right ventricle (RV) is highlighted in white, the myocardium (MYO) in yellow, and the left ventricle (LV) in red.
Table 1: Summary of Cardiac MRI Datasets
Dataset Task Labels Training Testing
LAScarQS Task 1 LA cavity, scars 50 10
LAScarQS Task 2 LA cavity 130 20
LASC - LA cavity 100 54
ACDC End-Diastole LV, MYO, RV 100 50
ACDC End-Systole LV, MYO, RV 100 50
MnM-1 End-Diastole LV, MYO, RV 150 136
MnM-1 End-Systole LV, MYO, RV 150 136
MnM-2 Short Axis, End-Diastole LV, MYO, RV 200 160
MnM-2 Short Axis, End-Systole LV, MYO, RV 200 160
MnM-2 Long Axis, End-Diastole LV, MYO, RV 200 160
MnM-2 Long Axis, End-Systole LV, MYO, RV 200 160

In this study, we used five datasets, namely the Left Atrial and Scar Quantification and Segmentation Challenge (LAScarQS) 2022 dataset (Zhuang et al., 2023), the 2018 Atria Segmentation Challenge (LASC) dataset (Xiong et al., 2021), the Automated Cardiac Diagnosis Challenge (ACDC) 2017 dataset (Bernard et al., 2018), the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Image Segmentation Challenge (MnM) dataset (Campello et al., 2021), and the MnM2 dataset (Martín-Isla et al., 2023).

3.1.1 LAScarQS Challenge Dataset

The LAScarQS challenge comprises two tasks. The first task involves segmenting the left atrium (LA) cavity and scars, while the second focuses solely on segmenting the LA cavity. For Task 1, the challenge provides 60 training images with corresponding labels and 10 validation images without labels. Task 2 provides 130 training images with labels and 20 validation images without labels. Because labels are available only for the training images, we used the training data for both training and testing. For Task 1, we allocated 50 images for training and the remaining 10 for testing. For Task 2, we used 115 images for training and 15 for testing.

The LGE-MRIs in this challenge were sourced from the University of Utah, Beth Israel Deaconess Medical Center, and King's College London. The scans were performed using Siemens Avanto 1.5 T, Siemens Verio 3 T, or Philips Achieva 1.5 T MRI scanners. Scans were acquired either free-breathing with navigator gating or with navigator gating and fat suppression. The spatial resolution of the scans was 1.25 × 1.25 × 2.5 mm, 1.4 × 1.4 × 1.4 mm, or 1.3 × 1.3 × 4.0 mm. Patients were scanned either before ablation or between one and six months after ablation.

3.1.2 2018 Left Atria Segmentation Challenge (LASC) Dataset

The 2018 Left Atria Segmentation Challenge (LASC) focused on segmentation of the LA cavity. The dataset included 100 training images and 54 testing images, all provided with 3D binary masks of the LA cavity. Each LGE-MRI scan had a spatial resolution of 0.625 × 0.625 × 0.625 mm³, with dimensions of either 576 × 576 × 88 or 640 × 640 × 88 voxels. The images were acquired using either a 1.5 Tesla Avanto or a 3.0 Tesla Verio whole-body scanner (Siemens Medical Solutions, Erlangen, Germany). The LA cavity volumes were segmented in consensus by three trained observers, providing high-quality ground truth annotations for both training and evaluation.

3.1.3 Automated Cardiac Diagnosis Challenge (ACDC) 2017 Dataset

The Automated Cardiac Diagnosis Challenge (ACDC) 2017 dataset comprises 150 MRI scans divided into five subgroups: normal, previous myocardial infarction, dilated cardiomyopathy, hypertrophic cardiomyopathy, and abnormal right ventricle. The scans were collected over six years using two MRI scanners with different magnetic field strengths: 1.5 T (Siemens Aera, Siemens Medical Solutions, Germany) and 3.0 T (Siemens Trio Tim, Siemens Medical Solutions, Germany). Cine MRI images were acquired under breath-hold conditions with retrospective or prospective gating, using a steady-state free precession (SSFP) sequence in the short-axis orientation. Each scan consists of a series of short-axis slices covering the left ventricle (LV) from base to apex, with a slice thickness of 5 mm (occasionally 8 mm) and sometimes an interslice gap of 5 mm, so slices are spaced every 5 or 10 mm depending on the examination. The spatial resolution ranges from 1.37 to 1.68 mm²/pixel, and each series includes 28 to 40 images, covering the cardiac cycle completely or partially. The dataset is divided into 100 training images and 50 testing images for the segmentation of the LV, myocardium (MYO), and right ventricle (RV) in both the end-systolic (ES) and end-diastolic (ED) phases.

3.1.4 Multi-Centre, Multi-Vendor, and Multi-Disease Cardiac Image Segmentation Challenge (MnM-1 and MnM-2)

The MnM challenge has been held twice, first in 2020 (MnM-1) and then in 2021 (MnM-2). MnM-1 included a total of 345 scans, with 209 designated for training and 136 for testing. Participants were tasked with segmenting the left ventricle (LV), myocardium (MYO), and right ventricle (RV) in both the end-systolic (ES) and end-diastolic (ED) phases. The scans were obtained from clinical centres in three countries (Spain, Germany, and Canada) using MRI scanners from four vendors: Siemens, General Electric, Philips, and Canon.

MnM-2 provided a training set of 200 images and a testing set of 160 images. As in MnM-1, segmentation was required for the LV, MYO, and RV in both the ES and ED phases. However, MnM-2 included both short-axis (ShA) and long-axis (LoA) views. The LoA view is a plane along the long axis of the heart, showing it from base to apex, whereas the ShA view is perpendicular to the long axis and shows roughly circular cross-sections of the ventricles. Figure 1 shows the LoA and ShA views in both the ES and ED phases. The MnM-2 data were acquired at clinical centres in Spain using MRI scanners from three vendors: Siemens, General Electric, and Philips.

A summary of the datasets is shown in Table 1.

3.2 Implementation Details

In this study, we evaluated the four U-Net configurations provided by nnU-Net (2D, 3D full resolution, 3D low resolution, and 3D cascade) as well as an ensemble of configurations. The 3D low-resolution and cascade configurations could not be evaluated on every dataset: for datasets with small images, nnU-Net omits the U-Net cascade (and consequently the 3D low-resolution configuration) because the patch size of the full-resolution U-Net already covers a substantial portion of the input images.
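The omission rule above can be expressed as a coverage test: compare the number of voxels in the full-resolution patch with the number of voxels in the median image. The sketch below is a hypothetical simplification (`needs_cascade` and the patch sizes are illustrative); the 0.25 threshold is an assumption that we believe is close to nnU-Net v2's default, not a value taken from this study.

```python
import math


def needs_cascade(fullres_patch_size, median_image_shape, coverage_threshold=0.25):
    """Decide whether a low-resolution stage (and hence the cascade) is worthwhile.

    If the 3D full-resolution patch already covers at least `coverage_threshold`
    of the median image volume, a coarse first stage adds little global context,
    so the 3D low-resolution and cascade configurations are skipped.
    The 0.25 default is an assumed value for illustration.
    """
    coverage = math.prod(fullres_patch_size) / math.prod(median_image_shape)
    return coverage < coverage_threshold


# Hypothetical shapes: a short-axis cine stack vs. a large 3D LGE volume (LASC-sized)
print(needs_cascade((20, 256, 224), (10, 240, 216)))  # False: patch exceeds the image
print(needs_cascade((80, 160, 160), (88, 576, 576)))  # True: patch covers ~7% of the volume
```

This is consistent with the observation that the cascade is available for large 3D LGE volumes but not for small short-axis cine stacks.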

The models were trained on an NVIDIA A100 80GB PCIe GPU for 1000 epochs using the stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01. We used five-fold cross-validation to obtain robust and reliable performance estimates. For tasks with multiple labels, the models were trained as multi-class segmentation tasks.
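The training setup above can be sketched as follows. nnU-Net's default SGD training decays the learning rate polynomially from its initial value (0.01 here) towards zero over the run, with exponent 0.9. The k-fold split below is a hypothetical, simplified stand-in for nnU-Net's own split generation, shown only to make the cross-validation protocol concrete; the function names and the random seed are assumptions.

```python
import random


def poly_lr(epoch, max_epochs=1000, initial_lr=0.01, exponent=0.9):
    """Polynomial learning-rate decay used by nnU-Net's default SGD training."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


def k_fold_splits(case_ids, n_folds=5, seed=0):
    """Hypothetical k-fold split: shuffle once, deal cases into disjoint validation folds.

    Every case appears in exactly one validation fold; each fold trains on the rest.
    """
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[k::n_folds] for k in range(n_folds)]
    return [{"train": [c for c in ids if c not in val], "val": val}
            for val in folds]


for epoch in (0, 250, 500, 750, 999):
    print(f"epoch {epoch:4d}: lr = {poly_lr(epoch):.5f}")

# e.g. the 50 LAScarQS Task 1 training cases
splits = k_fold_splits([f"case_{i:03d}" for i in range(50)])
print([len(s["val"]) for s in splits])  # [10, 10, 10, 10, 10]
```

With five folds, every training case is used for validation exactly once, and each of the five models sees 80% of the training cases.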

(a) Ground truth, (b) 2D nnU-Net, (c) 3D full resolution, (d) 3D low resolution, (e) 3D cascade, (f) Ensemble
Figure 2: Comparison of the ground truth and predictions from different nnU-Net configurations for LAScarQS Task 1. The left atrial (LA) cavity is highlighted in green, and LA scars are highlighted in blue. Visualized using Amira 3D software.

3.3 Evaluation Metrics

To assess the performance of our segmentation models, we employ a comprehensive set of evaluation metrics: Dice Similarity Coefficient (DSC), Jaccard Index, Hausdorff Distance (HD), Mean Surface Distance (MSD), and the 95th percentile Hausdorff Distance (HD95). Each of these metrics provides unique insights into different aspects of the segmentation quality, offering a holistic view of model performance.

3.3.1 Dice Score

The Dice Similarity Coefficient (DSC) measures the overlap between the predicted segmentation and the ground truth. It is calculated as twice the number of voxels in the intersection of the two masks divided by the total number of voxels in both masks. A higher DSC indicates better performance, signifying greater similarity between the predicted and ground truth segmentations.

$$\mathrm{DSC} = \frac{2\,|P \cap Q|}{|P| + |Q|} \qquad (1)$$

where P and Q are the ground truth and predicted masks.
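Eq. (1) translates directly into a few lines of NumPy. Because the models are trained as multi-class segmentations, Dice is computed one label at a time by binarizing each label map. The sketch below is illustrative (`dice` and `dice_per_label` are hypothetical helper names), and scoring two empty masks as 1.0 is a convention choice that evaluation tools handle differently.

```python
import numpy as np


def dice(gt, pred):
    """Dice similarity coefficient between two binary masks, Eq. (1)."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    total = gt.sum() + pred.sum()
    if total == 0:
        return 1.0  # both masks empty: scored as perfect agreement (a convention)
    return float(2.0 * np.logical_and(gt, pred).sum() / total)


def dice_per_label(gt_labels, pred_labels, labels):
    """Dice for each foreground label of a multi-class label map."""
    return {k: dice(gt_labels == k, pred_labels == k) for k in labels}


# 2x2 ground-truth square vs. a 2x3 prediction that fully contains it
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1
print(dice(gt, pred))  # 0.8  (2 * 4 / (4 + 6))
```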

3.3.2 Jaccard Index

The Jaccard Index, also known as the Intersection over Union (IoU), quantifies the similarity between the predicted and ground truth segmentations. It is defined as the area of overlap divided by the area of the union of the predicted and ground truth masks. Like the DSC, a higher Jaccard Index denotes better segmentation performance.

$$\mathrm{Jaccard} = \frac{|P \cap Q|}{|P \cup Q|} \qquad (2)$$

where P and Q are the ground truth and predicted masks.
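Eqs. (1) and (2) are linked: for a single case, writing I = |P ∩ Q| and S = |P| + |Q| gives J = I / (S - I) = D / (2 - D). The sketch below (hypothetical helper names) computes the Jaccard index and checks this identity. Table values are averaged over cases, so they satisfy the relation only approximately; for example, the 2D cavity Dice of 0.926 in Table 2 maps to 0.862, close to the reported 0.863.

```python
import numpy as np


def jaccard(gt, pred):
    """Jaccard index (IoU) between two binary masks, Eq. (2)."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return 1.0  # both masks empty (same convention as for Dice)
    return float(np.logical_and(gt, pred).sum() / union)


def dice_to_jaccard(d):
    """Per-case identity J = D / (2 - D), derived from Eqs. (1) and (2)."""
    return d / (2.0 - d)


gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
print(jaccard(gt, pred), dice_to_jaccard(0.8))  # both 2/3 (up to rounding)
print(round(dice_to_jaccard(0.926), 3))         # 0.862 (2D cavity Dice from Table 2)
```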

3.3.3 Hausdorff Distance

The Hausdorff Distance (HD) measures the largest distance from a point in one segmentation to the nearest point in the other, taken over both directions (predicted to ground truth and ground truth to predicted). It therefore captures the worst-case boundary discrepancy. Lower HD values indicate more accurate boundary delineation.

$$\mathrm{HD}(P,Q) = \max\big(h(P,Q),\, h(Q,P)\big) \qquad (3)$$

where $h(P,Q)$ is the oriented Hausdorff distance from $P$ to $Q$:

$$h(P,Q) = \max_{p_i \in P} \min_{q_j \in Q} \rho(p_i, q_j) \qquad (4)$$

and $\rho(p_i, q_j)$ is the Euclidean distance between points $p_i$ and $q_j$.
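A brute-force sketch of Eqs. (3) and (4) on point sets is shown below. Voxel indices are scaled by the voxel spacing so that distances are in millimetres, which matters for anisotropic MRI. The helper names are hypothetical, and the all-pairs distance matrix is only practical for modest numbers of boundary points; production tools typically use distance transforms or spatial search structures instead.

```python
import numpy as np


def directed_hausdorff(p_points, q_points):
    """Oriented Hausdorff distance h(P, Q) from Eq. (4)."""
    p = np.asarray(p_points, dtype=float)
    q = np.asarray(q_points, dtype=float)
    # rho(p_i, q_j) for all pairs: shape (len(p), len(q))
    dist = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    # Nearest q_j for every p_i, then the worst such case
    return float(dist.min(axis=1).max())


def hausdorff(p_points, q_points):
    """Symmetric Hausdorff distance, Eq. (3)."""
    return max(directed_hausdorff(p_points, q_points),
               directed_hausdorff(q_points, p_points))


def mask_to_points(mask, spacing):
    """Coordinates (in mm) of the foreground voxels of a binary mask."""
    return np.argwhere(mask) * np.asarray(spacing, dtype=float)


P = [[0.0, 0.0], [1.0, 0.0]]
Q = [[0.0, 0.0], [4.0, 0.0]]
print(directed_hausdorff(P, Q), directed_hausdorff(Q, P), hausdorff(P, Q))  # 1.0 3.0 3.0
```

The toy example shows that the two oriented distances differ (1 vs. 3), which is why Eq. (3) takes their maximum.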

3.3.4 Mean Surface Distance

The Mean Surface Distance (MSD) calculates the average distance from the points on the surface of one segmentation to the nearest points on the surface of the other; here, P and Q denote the sets of surface points of the two segmentations. Lower MSD values indicate closer average alignment between the predicted and ground truth boundaries.

$$\mathrm{MSD}(P,Q) = \frac{1}{|P|} \sum_{p_i \in P} \min_{q_j \in Q} \rho(p_i, q_j) \qquad (5)$$
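The sketch below extracts surface (boundary) voxels with a face-connected neighbourhood test and then evaluates Eq. (5). Eq. (5) averages in one direction only, from P to Q; many evaluation tools report a symmetric variant (for example, averaging over both directions), so reported MSD values can differ between toolkits. The helper names are hypothetical.

```python
import numpy as np


def surface_points(mask, spacing):
    """Boundary voxels of a binary mask (face connectivity), scaled to mm.

    A foreground voxel is interior if all of its face neighbours are foreground;
    every other foreground voxel lies on the surface.
    """
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1)  # False border, so voxels at the array edge count as surface
    centre = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[centre]
    return np.argwhere(m & ~interior) * np.asarray(spacing, dtype=float)


def mean_surface_distance(p_surface, q_surface):
    """Eq. (5): mean distance from each point of P to its nearest point of Q."""
    p = np.asarray(p_surface, dtype=float)
    q = np.asarray(q_surface, dtype=float)
    dist = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return float(dist.min(axis=1).mean())


mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True  # 3x3 square: 8 boundary voxels, 1 interior voxel
surface = surface_points(mask, (1.0, 1.0))
print(len(surface))  # 8
print(mean_surface_distance([[0, 0], [1, 0]], [[0, 0], [4, 0]]))  # 0.5
```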

3.3.5 95th percentile Hausdorff Distance

The 95th percentile Hausdorff Distance (HD95) is similar to the HD but focuses on the 95th percentile of the distances between the predicted and ground truth surfaces, thereby mitigating the impact of outliers. A lower HD95 value indicates more consistent boundary accuracy, discounting extreme deviations.
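HD95 replaces the maximum in Eq. (3) with the 95th percentile of the surface distances. Conventions vary: the sketch below pools the nearest-point distances from both directions before taking the percentile, while other tools take the larger of the two directional percentiles. The function name is hypothetical. The example adds a single spurious point to an otherwise perfect prediction: HD jumps to 100, but HD95 stays at 0.

```python
import numpy as np


def hd_percentile(p_surface, q_surface, percentile=95.0):
    """Percentile of the pooled surface distances in both directions.

    percentile=95 gives HD95 under the "pooled" convention; percentile=100
    returns the maximum, i.e. the Hausdorff distance of Eq. (3).
    """
    p = np.asarray(p_surface, dtype=float)
    q = np.asarray(q_surface, dtype=float)
    dist = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    pooled = np.concatenate([dist.min(axis=1),   # P -> Q nearest distances
                             dist.min(axis=0)])  # Q -> P nearest distances
    return float(np.percentile(pooled, percentile))


# Perfect boundary plus one spurious predicted point 100 mm away
gt_surface = [[i, 0] for i in range(20)]
pred_surface = gt_surface + [[0, 100]]
print(hd_percentile(gt_surface, pred_surface, 100))  # 100.0 (HD)
print(hd_percentile(gt_surface, pred_surface, 95))   # 0.0   (HD95)
```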

Together, these metrics provide a robust framework for evaluating segmentation performance, with higher DSC and Jaccard Index values and lower HD, MSD, and HD95 values indicating superior model performance.

Figure 3: Comparison of the ground truth and predictions from different nnU-Net configurations (2D, 3D full resolution, 3D low resolution, 3D cascade, and ensemble) in three anatomical views (axial, sagittal, and coronal) for the LASC dataset. The cavity is highlighted in red. Visualized using ITK-SNAP software.

4 Results

Table 2: Performance of LAScarQS (Task 1). The best cavity segmentation values are in red, and the best scar segmentation values are in blue. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance, HD95-95th percentile of HD.
Model Label DSC Jaccard HD MSD HD95
2D Cavity 0.926 0.863 12.952 0.805 3.402
2D Scar 0.438 0.283 37.166 2.539 13.036
3D full Cavity 0.939 0.884 12.622 0.666 3.088
3D full Scar 0.443 0.288 37.060 2.512 12.620
3D low Cavity 0.937 0.882 13.942 0.711 3.254
3D low Scar 0.411 0.262 37.294 2.789 13.425
3D cas Cavity 0.939 0.885 12.601 0.674 3.138
3D cas Scar 0.449 0.293 38.125 2.530 12.554
Ensem Cavity 0.939 0.886 12.486 0.663 3.041
Ensem Scar 0.439 0.285 37.078 2.590 12.850
Table 3: Performance comparison of dice scores in nnU-Net variations and other models in LAScarQS-Task1 for scar and cavity segmentation.
Paper Scars Cavity
Punithakumar and Noga (2022) 0.660 0.907
Jiang et al. (2022) 0.641 0.902
Arega et al. (2022) 0.634 0.898
Mazher et al. (2022) 0.602 0.875
Zhang et al. (2022b) 0.598 0.880
Lefebvre et al. (2022) 0.553 0.938
nnU (2D) 0.439 0.926
nnU (3D full res) 0.443 0.939
nnU (3D low res) 0.411 0.937
nnU (3D cascade) 0.449 0.939
nnU (Ensemble) 0.439 0.939

4.1 LAScarQS

Table 4: Performance of LAScarQS (Task 2). The best cavity segmentation values are in blue. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance, HD95-95th percentile of HD
Model DSC Jaccard HD MSD HD95
2D 0.930 0.869 13.971 0.733 3.018
3D full res. 0.937 0.882 12.971 0.672 2.880
3D low res 0.935 0.879 12.741 0.692 3.069
3D cascade 0.937 0.882 12.807 0.667 2.746
Ensemble 0.938 0.883 12.767 0.652 2.737

Compared with other methods, the largest gap appears in LAScarQS Task 1 scar segmentation, where the best competing method exceeds the best nnU-Net model by 21.1 percentage points in Dice score (0.660 vs. 0.449; Table 3). The HD values for scar segmentation are also much higher than those for the cavity (Table 2). Despite their weaker scar segmentation, the best nnU-Net models (3D full resolution, 3D cascade, and ensemble) achieve the highest cavity Dice score, 0.939, marginally above Lefebvre et al. (2022) at 0.938. The same trend appears in LAScarQS Task 2 cavity segmentation (Table 5), where even the nnU-Net (2D) model outperforms the other methods in Dice score. In Task 2, the nnU-Net ensemble achieves the best Dice score and MSD, while the nnU-Net (3D low res) model achieves the best HD. nnU-Net remains competitive even though it was trained on fewer images than the other challenge methods, as part of the labelled training set was held out for testing. The best nnU-Net models lead not only in Dice score but also in HD and MSD. Figure 2 provides a qualitative comparison of LAScarQS Task 1 predictions, visualized using Amira 3D software (Stalling et al., 2005).

Table 5: Performance comparison of nnU-Net variations and other models in LAScarQS-Task2. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance
Paper DSC HD MSD
Lefebvre et al. (2022) 0.889 26.270 2.179
Tu et al. (2022) 0.890 17.124 1.706
Liu et al. (2022a) 0.866 - -
Zhang et al. (2022b) 0.890 16.450 1.715
Zhang et al. (2022a) 0.878 - 0.710
Khan et al. (2022) 0.846 105.700 3.390
Xie et al. (2022) 0.872 22.394 -
Zhou et al. (2022) 0.875 24.731 2.233
Jiang et al. (2022) 0.881 18.755 1.782
Li and Li (2022) 0.883 20.883 1.794
Arega et al. (2022) 0.890 16.907 1.720
Punithakumar and Noga (2022) 0.893 15.860 1.613
Mazher et al. (2022) 0.886 18.389 1.813
Singh et al. (2023a) 0.929 12.960 0.890
Singh et al. (2023b) 0.919 15.430 -
nnU (2D) 0.930 13.971 0.733
nnU (3D full res) 0.937 12.971 0.672
nnU (3D low res) 0.935 12.741 0.692
nnU (3D cascade) 0.937 12.807 0.667
nnU (Ensemble) 0.938 12.767 0.652

4.2 LASC

For the LASC dataset, the ensemble model achieves the highest performance among the nnU-Net configurations (Table 6). According to Table 7, nnU-Net is competitive with the other methods: only Singh et al. (2023a) surpasses the best nnU-Net model, by 0.1 percentage points in Dice score (0.935 vs. 0.934). Interestingly, even the nnU-Net (2D) model matches one of the latest models (Xu et al., 2024), with a Dice score of 0.926, and the 3D configurations and the ensemble surpass it without any task-specific modifications. We assess the qualitative performance of the nnU-Net models using ITK-SNAP software (Yushkevich et al., 2016) in axial, sagittal, and coronal views, as shown in Figure 3.

Table 6: Performance of LASC dataset. The best cavity segmentation values are in blue. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance, HD95-95th percentile of HD
Model DSC Jaccard HD MSD HD95
2D 0.926 0.863 17.583 1.052 3.930
3D full res. 0.933 0.875 17.485 0.972 3.681
3D low res 0.931 0.872 16.877 0.991 3.727
3D cascade 0.933 0.874 17.553 0.984 3.756
Ensemble 0.934 0.877 16.873 0.954 3.628
Table 7: Performance comparison of nnU-Net variations and other models in LASC dataset. DSC- Dice Score
Publication DSC
Xia et al. (2019) 0.932
Bian et al. (2018) 0.926
Vesal et al. (2019) 0.925
Yang et al. (2019) 0.925
Li et al. (2019) 0.923
Chen et al. (2022a) 0.920
Chen et al. (2023) 0.932
Li et al. (2023) 0.919
Liu et al. (2019) 0.903
Borra et al. (2019) 0.898
Puybareau et al. (2018) 0.923
Uslu et al. (2021) 0.920
Chen et al. (2021) 0.913
Chen et al. (2022b) 0.923
Qi et al. (2023) 0.921
Zhao et al. (2023) 0.911
Singh et al. (2023a) 0.935
Singh et al. (2023b) 0.934
Milletari et al. (2016) 0.919
Lourenço et al. (2021) 0.910
Zhao et al. (2021) 0.918
Liu et al. (2022b) 0.920
Xu et al. (2024) 0.926
nnU (2D) 0.926
nnU (3D full res) 0.933
nnU (3D low res) 0.931
nnU (3D cascade) 0.933
nnU (Ensemble) 0.934

4.3 ACDC

Performance on the ACDC dataset is evaluated in two phases: end-diastole (ED) (Table 8) and end-systole (ES) (Table 9). In both phases, the ensemble generally outperforms the other nnU-Net configurations. Surprisingly, the 2D nnU-Net achieves a higher RV Dice score than both the 3D and ensemble models in the ED phase. Compared with other approaches (Table 10), nnU-Net lags in LV segmentation in both the ED and ES phases, by 2.4 and 4.6 percentage points in Dice score, respectively. The same pattern holds for MYO segmentation, where the best competing methods exceed the best nnU-Net Dice scores by 0.8 percentage points in both the ED and ES phases. In RV segmentation, however, nnU-Net outperforms the other methods in both phases, in terms of both Dice score and HD. Figure 4 compares the ground truth with predictions from the nnU-Net (2D), nnU-Net (3D full res), and ensemble models in both the ED and ES phases.
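The gaps quoted above are absolute differences in Dice score, expressed in percentage points. The short calculation below reproduces them from Table 10 by comparing the best competing method with the best nnU-Net variant in each column; positive values mean the competing method is ahead. The dictionary keys and helper name are illustrative.

```python
# Dice scores transcribed from Table 10: best competing method vs. best nnU-Net variant
best_other = {"LV-ED": 0.968, "LV-ES": 0.938, "MYO-ED": 0.906,
              "MYO-ES": 0.923, "RV-ED": 0.955, "RV-ES": 0.904}
best_nnunet = {"LV-ED": 0.944, "LV-ES": 0.892, "MYO-ED": 0.898,
               "MYO-ES": 0.915, "RV-ED": 0.965, "RV-ES": 0.927}


def gap_in_points(other, ours):
    """Absolute Dice gap in percentage points (positive = competing method ahead)."""
    return round(100 * (other - ours), 1)


gaps = {k: gap_in_points(best_other[k], best_nnunet[k]) for k in best_other}
print(gaps)
```

The negative RV entries (-1.0 in ED, -2.3 in ES) correspond to the phases in which nnU-Net leads.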

Figure 4: Comparison of Ground Truth and Predictions from nnU-Net Variants (2D, 3D Full Resolution, and Ensemble) on the ACDC Dataset for End Systole (ES) and End Diastole (ED) Phases. The right ventricle (RV) is highlighted in red, the myocardium (MYO) is in yellow, and the left ventricle (LV) is in white.
Figure 5: Comparison of Ground Truth and Predictions from nnU-Net Variants (2D, 3D Full Resolution, and Ensemble) on the MnM Dataset for End Systole (ES) and End Diastole (ED) Phases. The right ventricle (RV) is highlighted in green, the myocardium (MYO) is in yellow, and the left ventricle (LV) is in red.
Table 8: Performance of ACDC Dataset for End-Diastole (ED) phase. The best values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance, HD95-95th percentile of HD, LV- Left Ventricle, MYO- Myocardium, RV- Right Ventricle.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.942 0.892 10.438 0.467 3.152
2D MYO 0.897 0.814 10.050 0.331 1.583
2D RV 0.965 0.933 6.739 0.347 2.350
3D full res. LV 0.934 0.880 11.494 0.617 3.861
3D full res. MYO 0.889 0.801 8.057 0.367 2.135
3D full res. RV 0.959 0.922 8.486 0.443 2.720
Ensemble LV 0.944 0.896 10.716 0.459 3.110
Ensemble MYO 0.898 0.816 9.884 0.325 1.818
Ensemble RV 0.963 0.930 9.584 0.404 2.474
Table 9: Performance of ACDC Dataset for End-Systole (ES) phase. The best values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC- Dice Score, HD - Hausdorff Distance, MSD- Mean Surface Distance, HD95-95th percentile of HD, LV- Left Ventricle, MYO- Myocardium, RV- Right Ventricle.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.885 0.799 12.678 0.827 4.318
2D MYO 0.913 0.841 8.231 0.384 1.975
2D RV 0.927 0.868 6.795 0.483 2.713
3D full res. LV 0.882 0.793 12.743 0.911 5.245
3D full res. MYO 0.906 0.829 8.785 0.456 2.551
3D full res. RV 0.901 0.831 9.028 0.972 4.968
Ensemble LV 0.892 0.809 12.200 0.751 4.208
Ensemble MYO 0.915 0.844 8.460 0.384 2.193
Ensemble RV 0.922 0.861 8.321 0.608 3.477
Table 10: Performance comparison of nnU-Net variations and other models on the ACDC dataset. LV- Left Ventricle, MYO- Myocardium, RV- Right Ventricle, ED- End Diastole, ES - End Systole, DSC- Dice Score, HD - Hausdorff Distance.
Method LV-ED-DSC LV-ED-HD LV-ES-DSC LV-ES-HD MYO-ED-DSC MYO-ED-HD MYO-ES-DSC MYO-ES-HD RV-ED-DSC RV-ED-HD RV-ES-DSC RV-ES-HD
Guo et al. (2021) 0.968 5.814 0.935 7.361 0.906 7.469 0.923 7.702 0.955 8.877 0.894 11.649
Isensee et al. (2018) 0.967 5.476 0.928 6.921 0.904 7.014 0.923 7.328 0.951 8.205 0.904 11.665
Simantiris and Tziritas (2020) 0.967 6.366 0.928 7.573 0.891 8.264 0.904 9.575 0.936 13.289 0.889 14.367
Berihu Girum et al. (2021) 0.968 6.422 0.916 9.305 0.894 8.998 0.906 9.922 0.939 11.326 0.893 13.306
Ammar et al. (2021) 0.968 7.993 0.911 10.528 0.891 10.575 0.901 13.891 0.929 14.189 0.886 16.042
Zotti et al. (2018b) 0.964 6.180 0.912 8.386 0.886 9.586 0.902 9.291 0.934 11.052 0.885 12.650
Khened et al. (2018) 0.964 8.129 0.917 8.968 0.889 9.841 0.898 12.582 0.935 13.994 0.879 13.930
Baumgartner et al. (2018) 0.963 6.526 0.911 9.170 0.892 8.703 0.901 10.637 0.932 12.670 0.883 14.691
Painchaud et al. (2020) 0.961 6.152 0.911 8.278 0.881 8.651 0.897 9.598 0.933 13.718 0.884 13.323
Wolterink et al. (2018) 0.961 7.515 0.918 6.603 0.875 11.121 0.894 10.687 0.928 11.879 0.872 13.399
Calisto and Lai-Yuen (2020) 0.958 5.592 0.903 8.644 0.873 8.197 0.895 8.318 0.936 10.183 0.884 12.234
Zotti et al. (2018a) 0.957 6.641 0.905 8.706 0.884 8.708 0.896 9.264 0.941 10.318 0.882 14.053
Singh et al. (2023c) 0.967 5.526 0.935 6.913 0.902 8.094 0.921 7.772 0.949 9.187 0.900 11.556
Singh et al. (2023a) 0.967 5.652 0.938 6.878 0.905 7.389 0.923 7.373 0.950 8.513 0.895 12.167
Singh et al. (2023b) 0.968 5.859 0.937 6.529 0.904 7.723 0.922 7.221 0.952 8.788 0.890 11.926
nnU-Net (2D) 0.942 10.438 0.885 12.678 0.897 10.050 0.913 8.231 0.965 6.739 0.927 6.795
nnU-Net (3D) 0.934 11.494 0.882 12.743 0.889 8.057 0.906 8.785 0.959 8.486 0.901 9.028
nnU-Net (Ensemble) 0.944 10.716 0.892 12.200 0.898 9.884 0.915 8.460 0.963 9.584 0.922 8.321

4.4 MnM

As for the ACDC dataset, MnM performance is evaluated in both the ES (Table 11) and ED (Table 12) phases. In the ES phase, the 2D nnU-Net outperforms both the 3D and ensemble models in RV segmentation, while the 3D full-resolution model achieves the highest Dice score for LV segmentation. In the ED phase, the ensemble model performs best for all three structures. Compared with other approaches (Table 14), nnU-Net achieves higher Dice scores for RV segmentation in both phases and for LV segmentation in the ES phase. In the remaining cases, other approaches surpass nnU-Net by small margins, typically less than 1%. Figure 5 compares the performance of the nnU-Net models in the ES and ED phases.
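The tables report both DSC and the Jaccard index. For a single case the two are linked by J = D / (2 - D); averages taken over many cases need not satisfy this identity exactly. The following minimal NumPy sketch computes both metrics on hypothetical binary masks (the helper names are ours, not nnU-Net's):

```python
import numpy as np


def dice(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, ref).sum() / denom


def jaccard(pred, ref):
    """Jaccard index (intersection over union) between two binary masks."""
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    union = np.logical_or(pred, ref).sum()
    return 1.0 if union == 0 else np.logical_and(pred, ref).sum() / union


# Hypothetical 1D masks: 3 overlapping foreground pixels, 4 in each mask.
pred = np.array([1, 1, 1, 1, 0, 0])
ref = np.array([0, 1, 1, 1, 1, 0])
d, j = dice(pred, ref), jaccard(pred, ref)
print(d, j, d / (2 - d))  # for one case, J equals D / (2 - D)
```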

Table 11: Performance on the MnM dataset for the End-Systole (ES) phase. The best segmentation values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC - Dice score, HD - Hausdorff distance, MSD - mean surface distance, HD95 - 95th percentile of HD.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.888 0.833 12.681 2.368 8.513
2D MYO 0.800 0.689 15.428 1.905 7.304
2D RV 0.893 0.821 14.595 1.450 6.411
3D full res. LV 0.909 0.842 8.576 0.944 4.507
3D full res. MYO 0.841 0.734 11.141 0.812 3.952
3D full res. RV 0.871 0.784 13.130 1.258 5.673
Ensemble LV 0.888 0.803 8.486 0.956 4.432
Ensemble MYO 0.864 0.762 9.872 0.613 3.542
Ensemble RV 0.852 0.751 12.658 1.083 5.366
Table 12: Performance on the MnM dataset for the End-Diastole (ED) phase. The best segmentation values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC - Dice score, HD - Hausdorff distance, MSD - mean surface distance, HD95 - 95th percentile of HD.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.936 0.882 7.517 0.728 3.871
2D MYO 0.824 0.706 10.738 0.592 3.676
2D RV 0.909 0.836 11.601 0.900 4.578
3D full res. LV 0.933 0.877 8.199 0.819 4.261
3D full res. MYO 0.819 0.699 10.776 0.580 3.484
3D full res. RV 0.908 0.836 11.520 0.870 4.501
Ensemble LV 0.937 0.883 7.393 0.725 3.761
Ensemble MYO 0.826 0.709 9.944 0.527 3.138
Ensemble RV 0.913 0.843 10.847 0.818 4.208

4.5 MnM2

Unlike the MnM challenge, MnM2 is analyzed under four conditions: Short Axis (ShA) ED phase (Table 13), ShA ES phase (Table 15), Long Axis (LoA) ED phase (Table 16), and LoA ES phase (Table 17). In both ShA phases, the ensemble achieves the best performance, and the 2D model generally outperforms the 3D model. For LoA segmentation, each image has shape H × W × 1, i.e., a single slice along the z-axis, which makes the 3D full-resolution configuration particularly effective; therefore, only 3D full-resolution results are reported. The challenge organizers report only RV segmentation results (Table 18). For ShA ES segmentation, nnU-Net outperforms the other models by 2.4%. In the remaining cases (ShA ED, LoA ED, and LoA ES), other models surpass nnU-Net, but by less than 1%.

In summary, the ensemble models perform strongly across all datasets. Surprisingly, in some cases the 2D models outperform the 3D models and even the ensemble models. The largest gap in favor of other methods occurs in LAScarQS Task 1 scar segmentation. Table 19 summarizes, for each task, the highest Dice score obtained by nnU-Net, the highest Dice score reported by other methods, and their absolute difference (%).
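The Abs. Difference column of Table 19 appears to be the absolute Dice gap multiplied by 100, i.e., expressed in percentage points. The short sketch below reproduces three rows using only the values printed in that table; the helper name is ours.

```python
# Dice values copied from Table 19: (task, best nnU-Net, best other method).
rows = [
    ("ACDC ED LV", 0.944, 0.968),
    ("LAScarQS Task-1 LA scar", 0.449, 0.660),
    ("MnM2 ShA ES RV", 0.938, 0.914),
]


def abs_difference_pp(nnunet_dice, other_dice):
    """Absolute Dice difference in percentage points, rounded to one decimal."""
    return round(abs(nnunet_dice - other_dice) * 100, 1)


diffs = {task: abs_difference_pp(a, b) for task, a, b in rows}
for task, a, b in rows:
    higher = "nnU-Net" if a > b else "other method"
    print(f"{task}: {diffs[task]} points ({higher} higher)")
```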

Table 13: Performance on the MnM2 dataset for the Short Axis (ShA) End-Diastole (ED) phase. The best values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC - Dice score, HD - Hausdorff distance, MSD - mean surface distance, HD95 - 95th percentile of HD.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.957 0.920 8.268 0.515 3.170
2D MYO 0.867 0.769 12.238 0.442 2.842
2D RV 0.934 0.879 10.050 0.766 4.084
3D full res. LV 0.955 0.916 8.361 0.565 3.571
3D full res. MYO 0.862 0.761 12.035 0.426 2.561
3D full res. RV 0.934 0.878 10.394 0.779 4.200
Ensemble LV 0.958 0.921 8.029 0.496 3.256
Ensemble MYO 0.869 0.772 11.492 0.396 2.371
Ensemble RV 0.937 0.884 11.079 0.742 4.021
Table 14: Performance comparison of nnU-Net variants and other models on the MnM dataset. LV - Left Ventricle, MYO - Myocardium, RV - Right Ventricle, ED - End Diastole, ES - End Systole, DSC - Dice score, HD - Hausdorff distance.
Method LV-ED-DSC LV-ED-HD LV-ES-DSC LV-ES-HD MYO-ED-DSC MYO-ED-HD MYO-ES-DSC MYO-ES-HD RV-ED-DSC RV-ED-HD RV-ES-DSC RV-ES-HD
Full et al. (2021) 0.939 9.1 0.886 9.1 0.839 12.8 0.867 10.6 0.910 11.8 0.860 12.7
Parreño et al. (2021) 0.939 11.3 0.884 11.4 0.826 15.2 0.856 14.0 0.886 15.4 0.829 16.7
Zhang et al. (2021) 0.938 9.3 0.880 9.5 0.830 12.9 0.861 10.8 0.909 12.3 0.850 13.0
Ma (2021) 0.935 9.5 0.875 10.5 0.825 13.3 0.856 11.6 0.906 12.3 0.844 13.0
Saber et al. (2021) 0.933 13.4 0.867 14.0 0.812 17.1 0.839 18.2 0.876 15.7 0.815 18.1
Kong and Shadden (2021) 0.931 10.0 0.877 9.8 0.816 13.7 0.850 11.3 0.893 14.3 0.827 15.2
Singh et al. (2023c) 0.928 7.15 0.890 7.6 0.834 10.2 0.868 9.6 0.902 10.6 0.852 11.7
Corral Acero et al. (2021) 0.927 11.2 0.877 9.7 0.815 14.0 0.852 11.1 0.892 13.6 0.834 15.0
Li et al. (2021a) 0.922 15.5 0.857 17.5 0.809 18.0 0.836 17.2 0.867 16.6 0.802 19.1
Khader et al. (2021) 0.914 12.1 0.853 12.0 0.768 17.2 0.814 15.2 0.850 17.5 0.794 17.0
Carscadden et al. (2021) 0.913 14.5 0.851 13.0 0.776 17.8 0.809 14.5 0.791 30.7 0.732 32.9
Scannell et al. (2021) 0.905 13.6 0.848 15.5 0.772 17.2 0.820 17.5 0.876 16.2 0.809 19.6
Huang et al. (2021) 0.896 15.7 0.772 23.0 0.761 17.9 0.721 20.2 0.820 21.0 0.698 29.5
Liu et al. (2021b) 0.889 16.0 0.835 14.2 0.785 22.1 0.808 18.9 0.814 22.1 0.758 22.0
Li et al. (2021c) 0.797 21.9 0.716 25.8 0.668 31.6 0.673 33.0 0.552 49.1 0.517 52.0
Singh et al. (2023a) 0.940 7.5 0.890 7.7 0.839 10.3 0.870 9.9 0.909 10.2 0.856 11.4
nnU-Net (2D) 0.936 7.5 0.888 12.7 0.824 10.7 0.800 15.4 0.909 11.6 0.893 14.6
nnU-Net (3D) 0.933 8.2 0.909 8.6 0.819 10.8 0.841 11.1 0.908 11.5 0.871 13.1
nnU-Net (Ensemble) 0.937 7.4 0.888 8.5 0.826 9.9 0.864 9.9 0.913 10.8 0.852 12.7
Table 15: Performance on the MnM2 dataset for the Short Axis (ShA) End-Systole (ES) phase. The best segmentation values for the Left Ventricle (LV), Myocardium (MYO), and Right Ventricle (RV) are highlighted in orange, blue, and red, respectively. DSC - Dice score, HD - Hausdorff distance, MSD - mean surface distance, HD95 - 95th percentile of HD.
Model Label DSC Jaccard HD MSD HD95
2D LV 0.958 0.920 8.350 0.513 3.170
2D MYO 0.867 0.770 11.928 0.428 2.571
2D RV 0.934 0.879 11.228 0.817 4.520
3D full res. LV 0.956 0.916 8.233 0.561 3.481
3D full res. MYO 0.862 0.761 12.004 0.426 2.555
3D full res. RV 0.934 0.878 10.301 0.779 4.302
Ensemble LV 0.958 0.920 8.225 0.503 3.264
Ensemble MYO 0.868 0.771 11.684 0.398 2.341
Ensemble RV 0.938 0.885 11.119 0.722 3.930
Table 16: Performance on the MnM2 dataset for the Long Axis (LoA) End-Diastole (ED) phase.
Model Label DSC Jaccard HD MSD HD95
3D full res. LV 0.968 0.938 4.082 0.871 2.977
3D full res. MYO 0.878 0.786 6.504 0.662 2.151
3D full res. RV 0.934 0.878 6.055 1.262 4.075
Table 17: Performance on the MnM2 dataset for the Long Axis (LoA) End-Systole (ES) phase.
Model Label DSC Jaccard HD MSD HD95
3D full res. LV 0.948 0.904 4.432 1.076 3.246
3D full res. MYO 0.891 0.809 5.342 0.837 2.809
3D full res. RV 0.899 0.822 6.108 1.457 4.254
Table 18: Performance comparison of nnU-Net variants and other models on the MnM2 dataset (Right Ventricle only). ShA - Short Axis, LoA - Long Axis, ED - End Diastole, ES - End Systole, DSC - Dice score, HD - Hausdorff distance.
Method ShA-ED-DSC ShA-ED-HD ShA-ES-DSC ShA-ES-HD LoA-ED-DSC LoA-ED-HD LoA-ES-DSC LoA-ES-HD
Fulton et al. (2021) 0.934 9.610 0.910 10.032 0.935 6.227 0.904 5.935
Arega et al. (2021) 0.932 10.078 0.910 9.782 0.935 6.028 0.905 6.188
Punithakumar et al. (2021) 0.940 10.122 0.914 9.987 0.931 6.337 0.904 5.976
Li et al. (2021b) 0.933 10.563 0.907 10.050 0.930 6.246 0.902 6.097
Sun et al. (2022) 0.937 10.879 0.913 9.874 0.935 6.056 0.904 6.031
Al Khalil et al. (2021) 0.927 9.941 0.897 10.307 0.907 8.444 0.883 7.265
Liu et al. (2021a) 0.932 10.517 0.903 10.101 0.934 7.721 0.896 6.019
Jabbar et al. (2021) 0.923 11.258 0.897 11.062 0.910 7.757 0.882 6.933
Queirós (2021) 0.924 11.327 0.898 11.447 0.922 7.173 0.900 6.391
Galati and Zuluaga (2021) 0.916 11.681 0.890 11.747 0.924 7.840 0.894 6.978
Mazher et al. (2021) 0.909 15.275 0.880 14.606 0.888 8.333 0.854 8.347
Gao and Zhuang (2022) 0.844 15.495 0.821 16.750 0.887 9.733 0.851 9.659
Beetz et al. (2021) 0.873 16.682 0.820 17.913 0.896 8.570 0.864 7.591
Tautz et al. (2021) 0.883 17.024 0.838 18.003 0.849 13.303 0.809 13.716
Galazis et al. (2021) 0.852 19.430 0.821 19.117 0.814 18.629 0.781 17.198
nnU-Net (2D) 0.934 10.050 0.934 11.228 - - - -
nnU-Net (3D full res.) 0.934 10.393 0.934 10.301 0.934 6.055 0.900 6.108
nnU-Net (Ensemble) 0.937 11.079 0.938 11.119 - - - -
Table 19: Comparison of the best Dice scores of nnU-Net and of other methods. ED - End Diastole, ES - End Systole, ShA - Short Axis, LoA - Long Axis, LV - Left Ventricle, MYO - Myocardium, RV - Right Ventricle, LA - Left Atrium.
Dataset Sub-task Anatomical-region nnU-Net Other-methods Abs.-difference-(%)
ACDC ED LV 0.944 0.968 2.4
ACDC ES LV 0.892 0.938 4.6
ACDC ED MYO 0.898 0.906 0.8
ACDC ES MYO 0.915 0.923 0.8
ACDC ED RV 0.963 0.963 0.0
ACDC ES RV 0.922 0.904 1.8
LAScarQS Task-1 LA-Scar 0.449 0.660 21.1
LAScarQS Task-1 LA-Cavity 0.939 0.938 0.1
LAScarQS Task-2 LA-Cavity 0.938 0.929 0.9
MnM1 ED LV 0.937 0.940 0.3
MnM1 ES LV 0.909 0.890 1.9
MnM1 ED MYO 0.826 0.834 0.8
MnM1 ES MYO 0.864 0.870 0.6
MnM1 ED RV 0.913 0.910 0.3
MnM1 ES RV 0.893 0.860 3.3
MnM2 ShA-ED RV 0.937 0.940 0.3
MnM2 ShA-ES RV 0.938 0.914 2.4
MnM2 LoA-ED RV 0.934 0.935 0.1
MnM2 LoA-ES RV 0.900 0.905 0.5
LASC - LA-Cavity 0.934 0.935 0.1
Figure 6: Total pixel distribution of 60 images of LAScarQS Task 1.

5 Discussion

In this section, we discuss and analyze our findings in detail.

5.1 Lower performance in LAScarQS scar segmentation

In LAScarQS Task 1 scar segmentation, nnU-Net clearly underperforms the other available models: its best Dice score is 0.449, compared with 0.660 for the best competing method (Table 19). Several factors may contribute to this gap.

First, the main challenge lies in the nature of the target region. Scar tissue occupies only a tiny fraction of the image compared with the LA cavity: as shown in Figure 6, the cavity accounts for nearly 0.7% of the pixels and the scar for less than 0.1%. This severe class imbalance makes it difficult for the model to distinguish and segment the scar accurately. Although nnU-Net is robust for larger and more continuous regions, it struggles to achieve the precision required for such small and sparse areas.
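To make the imbalance concrete, the sketch below computes per-class voxel fractions with `np.bincount`. The volume is a hypothetical toy constructed to mirror the roughly 0.7% cavity and under 0.1% scar proportions in Figure 6; it is not LAScarQS data, and the label codes are assumptions.

```python
import numpy as np


def class_fractions(labels, num_classes):
    """Fraction of voxels assigned to each integer class label."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes)
    return counts / counts.sum()


# Hypothetical toy volume with assumed labels: 0 = background, 1 = LA cavity, 2 = scar.
vol = np.zeros((10, 10, 10), dtype=np.int64)
vol.flat[:7] = 1  # 7 of 1000 voxels -> 0.7 %
vol.flat[7] = 2   # 1 of 1000 voxels -> 0.1 %

fractions = class_fractions(vol, num_classes=3)
print(fractions)
```

With fractions this small, a scar prediction that misses or adds even a handful of voxels changes its Dice score substantially, whereas the same voxel error is negligible for the cavity.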

Second, the characteristics of the data further complicate the task. Unlike the LA cavity, which appears as a continuous and relatively homogeneous region, scar tissue is often irregular and dispersed. This fragmented nature poses a substantial challenge for segmentation models, particularly those such as nnU-Net that rely heavily on the spatial continuity and context provided by larger regions.
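Fragmentation can be quantified by counting connected components. The sketch below implements a breadth-first flood fill with face connectivity (4-connectivity in 2D, 6-connectivity in 3D) on hypothetical toy masks; in practice `scipy.ndimage.label` performs the same labelling.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """Count face-connected foreground components in an n-D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    # Face neighbours: +/-1 along each axis.
    offsets = []
    for axis in range(mask.ndim):
        for step in (-1, 1):
            off = [0] * mask.ndim
            off[axis] = step
            offsets.append(off)
    n_components = 0
    for start in map(tuple, np.argwhere(mask)):
        if seen[start]:
            continue
        n_components += 1  # an unvisited foreground voxel starts a new component
        seen[start] = True
        queue = deque([start])
        while queue:  # breadth-first flood fill of the current component
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                inside = all(0 <= i < s for i, s in zip(nb, mask.shape))
                if inside and mask[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
    return n_components


cavity = np.zeros((8, 8), dtype=bool)
cavity[2:6, 2:6] = True          # one compact blob, like a cavity
scar = np.zeros((8, 8), dtype=bool)
scar[0, 0] = scar[0, 1] = True   # two touching pixels form one piece
scar[4, 4] = scar[7, 7] = True   # two isolated pixels
print(count_components(cavity), count_components(scar))
```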

In addition, most state-of-the-art methods for scar segmentation adopt a two-stage approach. The first stage performs a coarse segmentation to identify regions of interest (ROIs), and the second stage refines the segmentation within these regions. This two-step process allows more focused learning and better handling of small, irregular structures, which leads to superior performance in scar segmentation.
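A minimal sketch of the coarse-to-fine idea follows. It assumes hypothetical threshold functions (`coarse_model`, `fine_model`) as stand-ins for real networks and a fixed margin around the coarse bounding box; it is not the pipeline of any specific method in Table 19.

```python
import numpy as np


def roi_slices(coarse_mask, margin):
    """Bounding box of a coarse mask, padded by `margin` and clipped to the array."""
    coords = np.argwhere(coarse_mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, coarse_mask.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


def two_stage_segment(image, coarse_model, fine_model, margin=4):
    """Coarse ROI detection on the full image, then refinement inside the crop."""
    coarse = coarse_model(image)
    if not coarse.any():
        return np.zeros(image.shape, dtype=bool)  # nothing detected
    roi = roi_slices(coarse, margin)
    refined = np.zeros(image.shape, dtype=bool)
    refined[roi] = fine_model(image[roi])  # the fine stage only sees the crop
    return refined


# Hypothetical stand-ins for the two networks (simple thresholds).
coarse_model = lambda img: img > 0.9
fine_model = lambda crop: crop > 0.5

image = np.zeros((64, 64))
image[20:30, 40:50] = 1.0  # bright target region
image[5, 5] = 0.6          # distractor far from the target
seg = two_stage_segment(image, coarse_model, fine_model)
print(roi_slices(coarse_model(image), 4), seg.sum())
```

The distractor at pixel (5, 5) lies outside the ROI and is ignored by the second stage, whereas a single-stage threshold of 0.5 over the whole image would include it.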

In contrast, the nnU-Net framework primarily uses a single-stage approach; even its 3D cascade refines a low-resolution prediction of the whole volume rather than a cropped ROI. While this design is advantageous for its simplicity and lower computational cost, it may not provide the granularity and focus needed to segment small, irregular structures such as scar tissue. Without an initial coarse localization stage, nnU-Net must rely solely on its ability to capture fine details in a single pass, which is considerably harder for such tasks.

Moreover, the fragmented nature of scar tissue can inflate HD values. Because HD is the largest distance between the predicted and reference boundaries, it is highly sensitive to outliers and disjoint regions, which are characteristic of scar tissue. As a result, even a single small spurious or missed scar fragment far from the rest can produce a disproportionately high HD.
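The sensitivity of HD to outliers, and the relative robustness of HD95, can be seen on toy point sets. The sketch below treats boundary points as given (extracting surfaces from masks is omitted) and computes HD95 as the 95th percentile of the pooled distances in both directions, which is one common convention; other tools define it slightly differently.

```python
import numpy as np


def surface_distances(a, b):
    """Distance from each point in `a` to its nearest point in `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def hausdorff(a, b, percentile=100.0):
    """Symmetric Hausdorff distance; percentile=95 gives HD95 (pooled convention)."""
    d = np.concatenate([surface_distances(a, b), surface_distances(b, a)])
    return float(np.percentile(d, percentile))


ref = np.array([[x, 0.0] for x in range(20)])  # reference boundary points
pred = np.vstack([ref, [[0.0, 50.0]]])         # same boundary plus one stray fragment
print(hausdorff(pred, ref), hausdorff(pred, ref, 95.0))
```

A single stray point 50 units away sets HD to 50, while HD95 remains 0 because the outlier is only one of 41 pooled distances.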

Lastly, the standard data augmentation and preprocessing techniques employed by nnU-Net, while effective for general segmentation tasks, might not be sufficiently tailored to the unique challenges presented by scar tissue segmentation. Employing more specialized augmentation techniques that better simulate the variability and appearance of scar tissues could potentially enhance the model’s performance.

5.2 Ensemble Results

When comparing the nnU-Net ensemble with the individual 2D and 3D variants, it is important to note that ensembling can improve performance but does not guarantee it. For an ensemble to clearly outperform a single model, its members must be diverse, i.e., they must make different errors so that they compensate for each other's weaknesses. When the signal in the data is dominated by a few strong predictors, however, most models, including the ensemble members, capture this dominant information in the same way. Their predictions then become highly correlated, which limits the benefit of combining them. Thus, where the nnU-Net ensemble performs worse than an individual 2D or 3D variant, a lack of diversity among its members is a plausible contributing factor: without sufficient diversity, the ensemble performs merely on par with, or even below, the best individual model.
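nnU-Net ensembles configurations by averaging their predicted softmax probabilities. The minimal binary sketch below applies the same averaging to toy foreground probabilities (the helper names and values are hypothetical) and shows why diversity matters. When the two members err on different pixels, the average corrects both mistakes; when they err on the same pixel, the ensemble inherits the error.

```python
import numpy as np


def dice(pred, ref):
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())


def ensemble(*probs, threshold=0.5):
    """Average foreground probabilities across members, then threshold."""
    return np.mean(probs, axis=0) >= threshold


gt = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
p_a = np.array([0.4, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])             # misses pixel 0
p_b_diverse = np.array([0.9, 0.4, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])     # misses pixel 1
p_b_correlated = np.array([0.3, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])  # also misses pixel 0

single = dice(p_a >= 0.5, gt)
diverse = dice(ensemble(p_a, p_b_diverse), gt)
correlated = dice(ensemble(p_a, p_b_correlated), gt)
print(single, diverse, correlated)
```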

5.3 Higher performance in 2D model compared to 3D model

In our analysis of the ACDC, MnM, and MnM2 datasets, the 2D nnU-Net often achieves higher Dice scores than its 3D counterpart (for example, for every structure and phase in ACDC). This can be attributed to several factors related to both the nature of MRI data and the architectural differences between 2D and 3D models.

First, the inherent characteristics of MRI data play a crucial role. Cardiac MR images typically have high in-plane resolution but much lower through-plane resolution (Upendra et al., 2021). This favors 2D models, which can fully exploit the high-resolution in-plane information without being affected by the coarse sampling along the z-axis.
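The degree of anisotropy can be expressed as the ratio of the coarsest to the finest voxel spacing. In the sketch below, the spacing is a hypothetical cine-MRI example, and the threshold of 3 is an assumption modelled on the anisotropy rule nnU-Net applies when resampling. Treat both values as illustrative.

```python
def anisotropy_ratio(spacing):
    """Ratio of the coarsest to the finest voxel spacing (mm)."""
    return max(spacing) / min(spacing)


def is_anisotropic(spacing, threshold=3.0):
    # Assumed threshold of 3, modelled on nnU-Net's rule for resampling
    # the low-resolution axis separately.
    return anisotropy_ratio(spacing) > threshold


cine_spacing = (10.0, 1.25, 1.25)  # hypothetical (z, y, x) spacing of a cine stack
iso_spacing = (1.0, 1.0, 1.0)
print(anisotropy_ratio(cine_spacing), is_anisotropic(cine_spacing), is_anisotropic(iso_spacing))
```

A ratio of 8 means each voxel is eight times as long along z as in-plane, so a 3 × 3 × 3 kernel covers far more anatomy between slices than within them.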

Second, the increased complexity of 3D models presents both advantages and challenges. While 3D architectures can capture volumetric context, they also have substantially more parameters, because each convolution kernel spans an additional dimension. More parameters require larger training datasets to reach optimal performance, so when the available data are limited, 2D models may generalize better from the available samples.
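The parameter growth follows directly from the kernel size: a 3 × 3 × 3 kernel has 27 weights per input-output channel pair, compared with 9 for a 3 × 3 kernel. The short worked count below covers one layer with 32 input and 32 output channels; the channel numbers are chosen purely for illustration, and a bias term is included.

```python
def conv_params(c_in, c_out, kernel=3, dims=2, bias=True):
    """Learnable parameters of one convolution layer with a square/cubic kernel."""
    return kernel ** dims * c_in * c_out + (c_out if bias else 0)


p2d = conv_params(32, 32, dims=2)  # 3x3 kernel
p3d = conv_params(32, 32, dims=3)  # 3x3x3 kernel
print(p2d, p3d, round(p3d / p2d, 2))
```

Each 3D layer therefore carries roughly three times as many parameters as its 2D equivalent, and this factor compounds across the network.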

The computational demands of 3D models also impact their performance. These architectures require substantially more GPU memory, which can impose constraints on critical training hyperparameters such as patch size and batch size. Smaller patch sizes, often necessitated by memory limitations, may restrict the spatial context available to the model during training. This reduced context can be particularly detrimental in tasks where long-range spatial dependencies are crucial for accurate segmentation.
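A back-of-the-envelope estimate of activation memory illustrates this constraint. The batch sizes and patch shapes below are hypothetical, not nnU-Net's planned values, and only a single float32 feature map is counted.

```python
def feature_map_mib(batch, channels, spatial, bytes_per_value=4):
    """Memory (MiB) of one feature-map tensor of shape (batch, channels, *spatial)."""
    voxels = 1
    for s in spatial:
        voxels *= s
    return batch * channels * voxels * bytes_per_value / 2 ** 20


mem_2d = feature_map_mib(batch=16, channels=32, spatial=(256, 256))
mem_3d = feature_map_mib(batch=2, channels=32, spatial=(20, 256, 256))
print(mem_2d, mem_3d)
```

Even with one eighth of the batch size, the 3D feature map needs 2.5 times the memory of the 2D one, and a network stores many such maps for backpropagation.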

Furthermore, the nature of the segmentation task itself may favor 2D approaches. If the key features for accurate segmentation are predominantly visible within individual slices, the additional complexity introduced by 3D models in capturing inter-slice relationships may not provide significant benefits. In fact, this added complexity could potentially introduce noise or irrelevant information into the learning process, leading to suboptimal performance.

5.4 Effect of the configurations of nnU-Net

nnU-Net automatically adapts the batch size, patch size, and network topology to each dataset, whereas the loss function and optimizer are fixed. In our experiments, all nnU-Net configurations use their default loss, a combination of Dice loss and cross-entropy loss (DiceCE loss). However, in scenarios with class imbalance, alternative loss functions such as DiceHD loss (Dice loss combined with a Hausdorff distance loss) and DiceFocal loss (Dice loss combined with focal loss) have demonstrated superior performance (Ma et al., 2021). Incorporating these loss functions into nnU-Net could therefore improve segmentation results.
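The sketch below writes out soft Dice, cross-entropy, and focal loss in NumPy and combines them into DiceCE and DiceFocal. It is a simplified illustration that assumes an unweighted sum of terms and a per-image soft Dice averaged over classes. nnU-Net's own implementation differs in details, and DiceHD is omitted because it requires distance transforms.

```python
import numpy as np

EPS = 1e-8


def soft_dice_loss(probs, onehot):
    """1 - mean soft Dice over classes; arrays shaped (classes, *spatial)."""
    axes = tuple(range(1, probs.ndim))
    inter = (probs * onehot).sum(axis=axes)
    denom = probs.sum(axis=axes) + onehot.sum(axis=axes)
    return 1.0 - np.mean((2.0 * inter + EPS) / (denom + EPS))


def cross_entropy(probs, onehot):
    """Pixel-averaged cross-entropy."""
    return -np.mean((onehot * np.log(np.clip(probs, EPS, 1.0))).sum(axis=0))


def focal_loss(probs, onehot, gamma=2.0):
    """Cross-entropy scaled by (1 - p) ** gamma, down-weighting easy pixels."""
    p = np.clip(probs, EPS, 1.0)
    return -np.mean((onehot * (1.0 - p) ** gamma * np.log(p)).sum(axis=0))


def dice_ce(probs, onehot):
    return soft_dice_loss(probs, onehot) + cross_entropy(probs, onehot)


def dice_focal(probs, onehot, gamma=2.0):
    return soft_dice_loss(probs, onehot) + focal_loss(probs, onehot, gamma)


# Toy 2-class, 4-pixel example: class 1 is the rare foreground.
onehot = np.array([[1, 1, 1, 0], [0, 0, 0, 1]], dtype=float)
uncertain = np.full_like(onehot, 0.5)  # 50/50 prediction everywhere
print(dice_ce(onehot, onehot), dice_ce(uncertain, onehot), dice_focal(uncertain, onehot))
```

Because the focal term multiplies each pixel's cross-entropy by (1 - p)^γ, confidently correct pixels contribute almost nothing, which shifts the training signal toward hard pixels such as those of a rare class.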

Furthermore, nnU-Net uses stochastic gradient descent (SGD) with Nesterov momentum by default. Nonetheless, recent studies have shown that the Adam optimizer can achieve comparable or even superior results in segmentation tasks (Rajinikanth et al., 2022). Integrating Adam into the nnU-Net framework may therefore improve performance in certain cases.
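To contrast the two update rules, the following scalar sketch implements SGD with Nesterov momentum (in the form used by PyTorch's `torch.optim.SGD` with `nesterov=True`) and Adam, both applied to the toy objective f(x) = x²/2. The learning rates and step counts are illustrative assumptions. To our knowledge, nnU-Net's defaults are Nesterov momentum 0.99 and an initial learning rate of 0.01 with polynomial decay.

```python
import math


def sgd_nesterov(grad_fn, x, lr=0.1, momentum=0.9, steps=100):
    """SGD with Nesterov momentum (PyTorch-style update, no dampening)."""
    v = 0.0
    for _ in range(steps):
        g = grad_fn(x)
        v = momentum * v + g
        x -= lr * (g + momentum * v)
    return x


def adam(grad_fn, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Adam with bias-corrected first- and second-moment estimates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


grad = lambda x: x  # gradient of f(x) = x**2 / 2, minimum at x = 0
x_sgd, x_adam = sgd_nesterov(grad, 5.0), adam(grad, 5.0)
print(x_sgd, x_adam)
```

Adam rescales each step by a running estimate of the gradient magnitude, whereas the SGD step scales directly with the gradient. Which behaves better for a given segmentation task remains an empirical question.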

6 Conclusion

In this study, we evaluated nnU-Net configurations on five cardiac MRI segmentation datasets. Through extensive experimentation over more than 130 training cycles, we conducted a comprehensive performance analysis of these models. Our comparison with existing methods showed that nnU-Net not only performs competitively but also frequently surpasses state-of-the-art techniques, including some of the most recent methods on certain datasets.

Our findings underscore the robustness and adaptability of nnU-Net for cardiac MRI segmentation tasks. The model’s consistent performance across different datasets highlights its potential as a reliable tool for clinical applications. However, this study also raises an important question: when is it necessary to develop new models specifically tailored for particular cardiac segmentation tasks?

The answer depends on the intricacies and demands of specific scenarios. While nnU-Net provides a strong baseline, certain cases present unique challenges that require bespoke solutions. For example, nnU-Net showed clear limitations in segmenting small, fragmented structures such as atrial scar, where specialized models performed better. Additionally, some analyses require integrating information from multiple imaging modalities (e.g., combining MRI with CT), and specialized models may be necessary to merge and interpret such data effectively.

Our study focused exclusively on cardiac datasets and a single imaging modality. Future research should expand to other anatomical regions, such as the brain and abdomen, and incorporate additional imaging modalities such as CT, X-ray, and ultrasound. This would not only test the generalizability of nnU-Net but also reveal potential limitations and areas for improvement.

In conclusion, nnU-Net provides a robust and versatile foundation for cardiac MRI segmentation, but developing new models tailored to specific clinical needs and challenges remains essential, particularly for difficult targets such as atrial scar. General-purpose frameworks offer significant advantages, yet ongoing innovation and customization are still needed to address the unique complexities of different medical imaging tasks and to fully harness the potential of deep learning in medical imaging.

References

  • Akkaya et al. (2013) Mehmet Akkaya, Koji Higuchi, Matthias Koopmann, Nathan Burgon, Ercan Erdogan, Kavitha Damal, Eugene Kholmovski, Chris McGann, and Nassir F Marrouche. Relationship between left atrial tissue structural remodelling detected using late gadolinium enhancement mri and left ventricular hypertrophy in patients with atrial fibrillation. Europace, 15(12):1725–1732, 2013.
  • Al Khalil et al. (2021) Yasmina Al Khalil, Sina Amirrajab, Josien Pluim, and Marcel Breeuwer. Late fusion u-net with gan-based augmentation for generalizable cardiac mri segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 360–373. Springer, 2021.
  • Allessie et al. (2002) Maurits Allessie, Jannie Ausma, and Ulrich Schotten. Electrical, contractile and structural remodeling during atrial fibrillation. Cardiovascular research, 54(2):230–246, 2002.
  • Ammar et al. (2021) Abderazzak Ammar, Omar Bouattane, and Mohamed Youssfi. Automatic cardiac cine mri segmentation and heart disease classification. Computerized Medical Imaging and Graphics, 88:101864, 2021.
  • Arega et al. (2021) Tewodros Weldebirhan Arega, François Legrand, Stéphanie Bricq, and Fabrice Meriaudeau. Using mri-specific data augmentation to enhance the segmentation of right ventricle in multi-disease, multi-center and multi-view cardiac mri. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 250–258. Springer, 2021.
  • Arega et al. (2022) Tewodros Weldebirhan Arega, Stéphanie Bricq, and Fabrice Meriaudeau. Using polynomial loss and uncertainty information for robust left atrial and scar quantification and segmentation. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 133–144. Springer, 2022.
  • Baumgartner et al. (2018) Christian F Baumgartner, Lisa M Koch, Marc Pollefeys, and Ender Konukoglu. An exploration of 2d and 3d deep learning techniques for cardiac mr image segmentation. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8, pages 111–119. Springer, 2018.
  • Beetz et al. (2021) Marcel Beetz, Jorge Corral Acero, and Vicente Grau. A multi-view crossover attention u-net cascade with fourier domain adaptation for multi-domain cardiac mri segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 323–334. Springer, 2021.
  • Berihu Girum et al. (2021) Kibrom Berihu Girum, Gilles Créhange, and Alain Lalande. Learning with context feedback loop for robust medical image segmentation. arXiv e-prints, pages arXiv–2103, 2021.
  • Bernard et al. (2018) Olivier Bernard, Alain Lalande, Clement Zotti, Frederick Cervenansky, Xin Yang, Pheng-Ann Heng, Irem Cetin, Karim Lekadir, Oscar Camara, Miguel Angel Gonzalez Ballester, et al. Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE transactions on medical imaging, 37(11):2514–2525, 2018.
  • Bian et al. (2018) Cheng Bian, Xin Yang, Jianqiang Ma, Shen Zheng, Yu-An Liu, Reza Nezafat, Pheng-Ann Heng, and Yefeng Zheng. Pyramid network with online hard example mining for accurate left atrium segmentation. In international workshop on statistical atlases and computational models of the heart, pages 237–245. Springer, 2018.
  • Bisbal et al. (2014) Felipe Bisbal, Esther Guiu, Pilar Cabanas-Grandío, Antonio Berruezo, Susana Prat-Gonzalez, Bárbara Vidal, Cesar Garrido, David Andreu, Juan Fernandez-Armenta, Jose María Tolosana, et al. Cmr-guided approach to localize and ablate gaps in repeat af ablation procedure. JACC: Cardiovascular Imaging, 7(7):653–663, 2014.
  • Boldt et al. (2004) Andreas Boldt, Ulrike Wetzel, Joerg Lauschke, J Weigl, Jf Gummert, Gerd Hindricks, Hans Kottkamp, and Stefan Dhein. Fibrosis in left atrial tissue of patients with atrial fibrillation with and without underlying mitral valve disease. Heart, 90(4):400–405, 2004.
  • Borra et al. (2019) Davide Borra, Alessandro Masci, Lorena Esposito, Alice Andalò, Claudio Fabbri, and Cristiana Corsi. A semantic-wise convolutional neural network approach for 3-d left atrium segmentation from late gadolinium enhanced magnetic resonance imaging. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 329–338. Springer, 2019.
  • Calisto and Lai-Yuen (2020) Maria Baldeon Calisto and Susana K Lai-Yuen. Adaen-net: An ensemble of adaptive 2d–3d fully convolutional networks for medical image segmentation. Neural Networks, 126:76–94, 2020.
  • Campello et al. (2021) Victor M Campello, Polyxeni Gkontra, Cristian Izquierdo, Carlos Martin-Isla, Alireza Sojoudi, Peter M Full, Klaus Maier-Hein, Yao Zhang, Zhiqiang He, Jun Ma, et al. Multi-centre, multi-vendor and multi-disease cardiac segmentation: the m&ms challenge. IEEE Transactions on Medical Imaging, 40(12):3543–3554, 2021.
  • Carscadden et al. (2021) Adam Carscadden, Michelle Noga, and Kumaradevan Punithakumar. A deep convolutional neural network approach for the segmentation of cardiac structures from mri sequences. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 250–258. Springer, 2021.
  • Chen et al. (2019) Chen Chen, Wenjia Bai, and Daniel Rueckert. Multi-task learning for left atrial segmentation on ge-mri. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 292–301. Springer, 2019.
  • Chen et al. (2021) Jun Chen, Guang Yang, Habib Khan, Heye Zhang, Yanping Zhang, Shu Zhao, Raad Mohiaddin, Tom Wong, David Firmin, and Jennifer Keegan. Jas-gan: generative adversarial network based joint atrium and scar segmentations on unbalanced atrial targets. IEEE Journal of Biomedical and Health Informatics, 26(1):103–114, 2021.
  • Chen et al. (2022a) Shaolong Chen, Changzhen Qiu, Weiping Yang, and Zhiyong Zhang. Combining edge guidance and feature pyramid for medical image segmentation. Biomedical signal processing and control, 78:103960, 2022a.
  • Chen et al. (2022b) Shaolong Chen, Changzhen Qiu, Weiping Yang, and Zhiyong Zhang. Multiresolution aggregation transformer unet based on multiscale input and coordinate attention for medical image segmentation. Sensors, 22(10):3820, 2022b.
  • Chen et al. (2023) Shaolong Chen, Lijie Zhong, Changzhen Qiu, Zhiyong Zhang, and Xiaodong Zhang. Transformer-based multilevel region and edge aggregation network for magnetic resonance image segmentation. Computers in Biology and Medicine, 152:106427, 2023.
  • Corral Acero et al. (2021) Jorge Corral Acero, Vaanathi Sundaresan, Nicola Dinsdale, Vicente Grau, and Mark Jenkinson. A 2-step deep learning method with domain adaptation for multi-centre, multi-vendor and multi-disease cardiac magnetic resonance segmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 196–207. Springer, 2021.
  • Full et al. (2021) Peter M Full, Fabian Isensee, Paul F Jäger, and Klaus Maier-Hein. Studying robustness of semantic segmentation under domain shift in cardiac mri. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 238–249. Springer, 2021.
  • Fulton et al. (2021) Mitchell J Fulton, Christoffer R Heckman, and Mark E Rentschler. Deformable bayesian convolutional networks for disease-robust cardiac mri segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 296–305. Springer, 2021.
  • Galati and Zuluaga (2021) Francesco Galati and Maria A Zuluaga. Using out-of-distribution detection for model refinement in cardiac image segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 374–382. Springer, 2021.
  • Galazis et al. (2021) Christoforos Galazis, Huiyi Wu, Zhuoyu Li, Camille Petri, Anil A Bharath, and Marta Varela. Tempera: Spatial transformer feature pyramid network for cardiac mri segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 268–276. Springer, 2021.
  • Gao and Zhuang (2022) Zheyao Gao and Xiahai Zhuang. Consistency based co-segmentation for multi-view cardiac mri using vision transformer. In Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge: 12th International Workshop, STACOM 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Revised Selected Papers 12, pages 306–314. Springer, 2022.
  • Guo et al. (2021) Fumin Guo, Matthew Ng, Idan Roifman, and Graham Wright. Cardiac mri left ventricular segmentation and function quantification using pre-trained neural networks. In International Conference on Functional Imaging and Modeling of the Heart, pages 46–54. Springer, 2021.
  • Huang et al. (2021) Xiaoqiong Huang, Zejian Chen, Xin Yang, Zhendong Liu, Yuxin Zou, Mingyuan Luo, Wufeng Xue, and Dong Ni. Style-invariant cardiac image segmentation with test-time augmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 305–315. Springer, 2021.
  • Isensee et al. (2018) Fabian Isensee, Paul F Jaeger, Peter M Full, Ivo Wolf, Sandy Engelhardt, and Klaus H Maier-Hein. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8, pages 120–129. Springer, 2018.
  • Isensee et al. (2021) Fabian Isensee, Paul F Jaeger, Simon AA Kohl, Jens Petersen, and Klaus H Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2):203–211, 2021.
  • Isensee et al. (2024) Fabian Isensee, Tassilo Wald, Constantin Ulrich, Michael Baumgartner, Saikat Roy, Klaus Maier-Hein, and Paul F Jaeger. nnU-Net revisited: A call for rigorous validation in 3D medical image segmentation. arXiv preprint arXiv:2404.09556, 2024.
  • Jabbar et al. (2021) Sana Jabbar, Syed Talha Bukhari, and Hassan Mohy-ud Din. Multi-view SA-LA Net: A framework for simultaneous segmentation of RV on multi-view cardiac MR images. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 277–286. Springer, 2021.
  • Jiang et al. (2022) Lei Jiang, Yan Li, Yifan Wang, Hengfei Cui, Yong Xia, and Yanning Zhang. Deep U-Net architecture with curriculum learning for left atrial segmentation. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 115–123. Springer, 2022.
  • Khader et al. (2021) Firas Khader, Justus Schock, Daniel Truhn, Fabian Morsbach, and Christoph Haarburger. Adaptive preprocessing for generalization in cardiac MR image segmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 269–276. Springer, 2021.
  • Khan et al. (2022) Abbas Khan, Omnia Alwazzan, Martin Benning, and Greg Slabaugh. Sequential segmentation of the left atrium and atrial scars using a multi-scale weight sharing network and boundary-based processing. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 69–82. Springer, 2022.
  • Khened et al. (2018) Mahendra Khened, Varghese Alex, and Ganapathy Krishnamurthi. Densely connected fully convolutional network for short-axis cardiac cine MR image segmentation and heart diagnosis using random forest. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8, pages 140–151. Springer, 2018.
  • Kong and Shadden (2021) Fanwei Kong and Shawn C Shadden. A generalizable deep-learning approach for cardiac magnetic resonance image segmentation using image augmentation and attention U-Net. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 287–296. Springer, 2021.
  • Lefebvre et al. (2022) Arthur L Lefebvre, Carolyna AP Yamamoto, Julie K Shade, Ryan P Bradley, Rebecca A Yu, Rheeda L Ali, Dan M Popescu, Adityo Prakosa, Eugene G Kholmovski, and Natalia A Trayanova. LASSNet: A four steps deep neural network for left atrial segmentation and scar quantification. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 1–15. Springer, 2022.
  • Li et al. (2019) Caizi Li, Qianqian Tong, Xiangyun Liao, Weixin Si, Yinzi Sun, Qiong Wang, and Pheng-Ann Heng. Attention based hierarchical aggregation network for 3D left atrial segmentation. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 255–264. Springer, 2019.
  • Li and Li (2022) Feiyan Li and Weisheng Li. Cross-domain segmentation of left atrium based on multi-scale decision level fusion. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 124–132. Springer, 2022.
  • Li et al. (2023) Feiyan Li, Weisheng Li, Xinbo Gao, Rui Liu, and Bin Xiao. Comprehensive information integration network for left atrium segmentation on LGE CMR images. Biomedical Signal Processing and Control, 81:104537, 2023.
  • Li et al. (2021a) Hongwei Li, Jianguo Zhang, and Bjoern Menze. Generalisable cardiac structure segmentation via attentional and stacked image adaptation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 297–304. Springer, 2021a.
  • Li et al. (2021b) Lei Li, Wangbin Ding, Liqin Huang, and Xiahai Zhuang. Right ventricular segmentation from short- and long-axis MRIs via information transition. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 259–267. Springer, 2021b.
  • Li et al. (2021c) Lei Li, Veronika A Zimmer, Wangbin Ding, Fuping Wu, Liqin Huang, Julia A Schnabel, and Xiahai Zhuang. Random style transfer based domain generalization networks integrating shape and spatial information. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 208–218. Springer, 2021c.
  • Liu et al. (2021a) Di Liu, Zhennan Yan, Qi Chang, Leon Axel, and Dimitris N Metaxas. Refined deep layer aggregation for multi-disease, multi-view & multi-center cardiac MR segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 315–322. Springer, 2021a.
  • Liu et al. (2022a) Tianyi Liu, Size Hou, Jiayuan Zhu, Zilong Zhao, and Haochuan Jiang. UGformer for robust left atrium and scar segmentation across scanners. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 36–48. Springer, 2022a.
  • Liu et al. (2021b) Xiao Liu, Spyridon Thermos, Agisilaos Chartsias, Alison O’Neil, and Sotirios A Tsaftaris. Disentangled representations for domain-generalized cardiac segmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 187–195. Springer, 2021b.
  • Liu et al. (2019) Yashu Liu, Yangyang Dai, Cong Yan, and Kuanquan Wang. Deep learning based method for left atrial segmentation in GE-MRI. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 311–318. Springer, 2019.
  • Liu et al. (2022b) Yashu Liu, Wei Wang, Gongning Luo, Kuanquan Wang, Dong Liang, and Shuo Li. Uncertainty-guided symmetric multilevel supervision network for 3D left atrium segmentation in late gadolinium-enhanced MRI. Medical Physics, 49(7):4554–4565, 2022b.
  • Lourenço et al. (2021) Ana Lourenço, Eric Kerfoot, Connor Dibblin, Ebraham Alskaf, Mustafa Anjari, Anil A Bharath, Andrew P King, Henry Chubb, Teresa M Correia, and Marta Varela. Left atrial ejection fraction estimation using seganet for fully automated segmentation of cine MRI. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 137–145. Springer, 2021.
  • Ma (2021) Jun Ma. Histogram matching augmentation for domain adaptation with application to multi-centre, multi-vendor and multi-disease cardiac image segmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 177–186. Springer, 2021.
  • Ma et al. (2021) Jun Ma, Jianan Chen, Matthew Ng, Rui Huang, Yu Li, Chen Li, Xiaoping Yang, and Anne L Martel. Loss odyssey in medical image segmentation. Medical Image Analysis, 71:102035, 2021.
  • Martín-Isla et al. (2023) Carlos Martín-Isla, Víctor M Campello, Cristian Izquierdo, Kaisar Kushibar, Carla Sendra-Balcells, Polyxeni Gkontra, Alireza Sojoudi, Mitchell J Fulton, Tewodros Weldebirhan Arega, Kumaradevan Punithakumar, et al. Deep learning segmentation of the right ventricle in cardiac MRI: The M&Ms challenge. IEEE Journal of Biomedical and Health Informatics, 27(7):3302–3313, 2023.
  • Mazher et al. (2021) Moona Mazher, Abdul Qayyum, Abdesslam Benzinou, Mohamed Abdel-Nasser, and Domenec Puig. Multi-disease, multi-view and multi-center right ventricular segmentation in cardiac MRI using efficient late-ensemble deep learning approach. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 335–343. Springer, 2021.
  • Mazher et al. (2022) Moona Mazher, Abdul Qayyum, Mohamed Abdel-Nasser, and Domenec Puig. Automatic semi-supervised left atrial segmentation using deep-supervision 3DResUNet with pseudo labeling approach for LAScarQS 2022 challenge. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 153–161. Springer, 2022.
  • Milletari et al. (2016) Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
  • Painchaud et al. (2020) Nathan Painchaud, Youssef Skandarani, Thierry Judge, Olivier Bernard, Alain Lalande, and Pierre-Marc Jodoin. Cardiac segmentation with strong anatomical guarantees. IEEE Transactions on Medical Imaging, 39(11):3703–3713, 2020.
  • Parreño et al. (2021) Mario Parreño, Roberto Paredes, and Alberto Albiol. Deidentifying MRI data domain by iterative backpropagation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 277–286. Springer, 2021.
  • Peters et al. (2007) Dana C Peters, John V Wylie, Thomas H Hauser, Kraig V Kissinger, René M Botnar, Vidal Essebag, Mark E Josephson, and Warren J Manning. Detection of pulmonary vein and left atrial scar after catheter ablation with three-dimensional navigator-gated delayed enhancement MR imaging: initial experience. Radiology, 243(3):690–695, 2007.
  • Punithakumar and Noga (2022) Kumaradevan Punithakumar and Michelle Noga. Automated segmentation of the left atrium and scar using deep convolutional neural networks. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 145–152. Springer, 2022.
  • Punithakumar et al. (2021) Kumaradevan Punithakumar, Adam Carscadden, and Michelle Noga. Automated segmentation of the right ventricle from magnetic resonance imaging using deep convolutional neural networks. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 344–351. Springer, 2021.
  • Puybareau et al. (2018) Élodie Puybareau, Zhou Zhao, Younes Khoudli, Edwin Carlinet, Yongchao Xu, Jérôme Lacotte, and Thierry Géraud. Left atrial segmentation in a few seconds using fully convolutional network and transfer learning. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 339–347. Springer, 2018.
  • Qi et al. (2023) Yushi Qi, Chunhu Hu, Liling Zuo, Bo Yang, and Youlong Lv. Cardiac magnetic resonance image segmentation method based on multi-scale feature fusion and sequence relationship learning. Sensors, 23(2):690, 2023.
  • Queirós (2021) Sandro Queirós. Right ventricular segmentation in multi-view cardiac MRI using a unified U-Net model. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 287–295. Springer, 2021.
  • Rajinikanth et al. (2022) Venkatesan Rajinikanth, Seifedine Kadry, Robertas Damaševičius, D Sankaran, Mazin Abed Mohammed, and Shrinithi Chander. Skin melanoma segmentation using VGG-UNet with Adam/SGD optimizer: a study. In 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), pages 982–986. IEEE, 2022.
  • Saber et al. (2021) Mina Saber, Dina Abdelrauof, and Mustafa Elattar. Multi-center, multi-vendor, and multi-disease cardiac image segmentation using scale-independent multi-gate UNet. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 259–268. Springer, 2021.
  • Scannell et al. (2021) Cian M Scannell, Amedeo Chiribiri, and Mitko Veta. Domain-adversarial learning for multi-centre, multi-vendor, and multi-disease cardiac MR image segmentation. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 228–237. Springer, 2021.
  • Shi et al. (2024) Zhebin Shi, Mingfeng Jiang, Yang Li, Bo Wei, Zefeng Wang, Yongquan Wu, Tao Tan, and Guang Yang. MLC: Multi-level consistency learning for semi-supervised left atrium segmentation. Expert Systems with Applications, 244:122903, 2024.
  • Simantiris and Tziritas (2020) Georgios Simantiris and Georgios Tziritas. Cardiac MRI segmentation with a dilated CNN incorporating domain-specific constraints. IEEE Journal of Selected Topics in Signal Processing, 14(6):1235–1243, 2020.
  • Singh et al. (2023a) Kamal Raj Singh, Ambalika Sharma, and Girish Kumar Singh. Attention-guided residual w-net for supervised cardiac magnetic resonance imaging segmentation. Biomedical Signal Processing and Control, 86:105177, 2023a.
  • Singh et al. (2023b) Kamal Raj Singh, Ambalika Sharma, and Girish Kumar Singh. MADRU-Net: Multi-scale attention-based cardiac MRI segmentation using deep residual U-Net. IEEE Transactions on Instrumentation and Measurement, 2023b.
  • Singh et al. (2023c) Kamal Raj Singh, Ambalika Sharma, and Girish Kumar Singh. W-net: Novel deep supervision for deep learning-based cardiac magnetic resonance imaging segmentation. IETE Journal of Research, 69(12):8960–8976, 2023c.
  • Stalling et al. (2005) Detlev Stalling, Malte Westerhoff, Hans-Christian Hege, et al. Amira: A highly interactive system for visual data analysis. The Visualization Handbook, 38:749–767, 2005.
  • Sun et al. (2022) Xiaowu Sun, Li-Hsin Cheng, and Rob J van der Geest. Right ventricle segmentation via registration and multi-input modalities in cardiac magnetic resonance imaging from multi-disease, multi-view and multi-center. In Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge: 12th International Workshop, STACOM 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Revised Selected Papers 12, pages 241–249. Springer, 2022.
  • Tautz et al. (2021) Lennart Tautz, Lars Walczak, Chiara Manini, Anja Hennemuth, and Markus Hüllebrand. 3D right ventricle reconstruction from 2D U-Net segmentation of sparse short-axis and 4-chamber cardiac cine MRI views. In International Workshop on Statistical Atlases and Computational Models of the Heart, pages 352–359. Springer, 2021.
  • Tobon-Gomez et al. (2015) Catalina Tobon-Gomez, Arjan J Geers, Jochen Peters, Jürgen Weese, Karen Pinto, Rashed Karim, Mohammed Ammar, Abdelaziz Daoudi, Jan Margeta, Zulma Sandoval, et al. Benchmark for algorithms segmenting the left atrium from 3D CT and MRI datasets. IEEE Transactions on Medical Imaging, 34(7):1460–1473, 2015.
  • Tsao et al. (2023) Connie W Tsao, Aaron W Aday, Zaid I Almarzooq, Cheryl AM Anderson, Pankaj Arora, Christy L Avery, Carissa M Baker-Smith, Andrea Z Beaton, Amelia K Boehme, Alfred E Buxton, et al. Heart disease and stroke statistics—2023 update: a report from the American Heart Association. Circulation, 147(8):e93–e621, 2023.
  • Tu et al. (2022) Can Tu, Ziyan Huang, Zhongying Deng, Yuncheng Yang, Chenglong Ma, Junjun He, Jin Ye, Haoyu Wang, and Xiaowei Ding. Self pre-training with single-scale adapter for left atrial segmentation. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 24–35. Springer, 2022.
  • Ukwatta et al. (2015) Eranga Ukwatta, Hermenegild Arevalo, Martin Rajchl, James White, Farhad Pashakhanloo, Adityo Prakosa, Daniel A Herzka, Elliot McVeigh, Albert C Lardo, Natalia A Trayanova, et al. Image-based reconstruction of three-dimensional myocardial infarct geometry for patient-specific modeling of cardiac electrophysiology. Medical Physics, 42(8):4579–4590, 2015.
  • Upendra et al. (2021) Roshan Reddy Upendra, Richard Simon, and Cristian A Linte. A deep learning framework for image super-resolution for late gadolinium enhanced cardiac MRI. In 2021 Computing in Cardiology (CinC), volume 48, pages 1–4. IEEE, 2021.
  • Uslu et al. (2021) Fatmatülzehra Uslu, Marta Varela, Georgia Boniface, Thakshayene Mahenthran, Henry Chubb, and Anil A Bharath. LA-Net: A multi-task deep network for the segmentation of the left atrium. IEEE Transactions on Medical Imaging, 41(2):456–464, 2021.
  • Vergara and Marrouche (2011) Gaston R Vergara and Nassir F Marrouche. Tailored management of atrial fibrillation using a LGE-MRI based model: from the clinic to the electrophysiology laboratory. Journal of Cardiovascular Electrophysiology, 22(4):481–487, 2011.
  • Vesal et al. (2019) Sulaiman Vesal, Nishant Ravikumar, and Andreas Maier. Dilated convolutions in neural networks for left atrial segmentation in 3D gadolinium enhanced-MRI. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 319–328. Springer, 2019.
  • Wolterink et al. (2018) Jelmer M Wolterink, Tim Leiner, Max A Viergever, and Ivana Išgum. Automatic segmentation and disease classification using cardiac cine MR images. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8, pages 101–110. Springer, 2018.
  • Xia et al. (2019) Qing Xia, Yuxin Yao, Zhiqiang Hu, and Aimin Hao. Automatic 3D atrial segmentation from GE-MRIs using volumetric fully convolutional networks. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 211–220. Springer, 2019.
  • Xie et al. (2022) Tongtong Xie, Zhengeng Yang, and Hongshan Yu. LA-HRNet: High-resolution network for automatic left atrial segmentation in multi-center LEG MRI. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 83–92. Springer, 2022.
  • Xiong et al. (2021) Zhaohan Xiong, Qing Xia, Zhiqiang Hu, Ning Huang, Cheng Bian, Yefeng Zheng, Sulaiman Vesal, Nishant Ravikumar, Andreas Maier, Xin Yang, et al. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Medical Image Analysis, 67:101832, 2021.
  • Xu et al. (2024) Fangqiang Xu, Wenxuan Tu, Fan Feng, Malitha Gunawardhana, Jiayuan Yang, Yun Gu, and Jichao Zhao. Dynamic position transformation and boundary refinement network for left atrial segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024.
  • Yang et al. (2019) Xin Yang, Na Wang, Yi Wang, Xu Wang, Reza Nezafat, Dong Ni, and Pheng-Ann Heng. Combating uncertainty with novel losses for automatic left atrium segmentation. In Statistical Atlases and Computational Models of the Heart. Atrial Segmentation and LV Quantification Challenges: 9th International Workshop, STACOM 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers 9, pages 246–254. Springer, 2019.
  • Yushkevich et al. (2016) Paul A Yushkevich, Yang Gao, and Guido Gerig. ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. In 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 3342–3345. IEEE, 2016.
  • Zhang et al. (2022a) Xuru Zhang, Xinye Yang, Lihua Huang, and Liqin Huang. Two stage of histogram matching augmentation for domain generalization: Application to left atrial segmentation. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 60–68. Springer, 2022a.
  • Zhang et al. (2021) Yao Zhang, Jiawei Yang, Feng Hou, Yang Liu, Yixin Wang, Jiang Tian, Cheng Zhong, Yang Zhang, and Zhiqiang He. Semi-supervised cardiac image segmentation via label propagation and style transfer. In Statistical Atlases and Computational Models of the Heart. M&Ms and EMIDEC Challenges: 11th International Workshop, STACOM 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, Revised Selected Papers 11, pages 219–227. Springer, 2021.
  • Zhang et al. (2022b) Yuchen Zhang, Yanda Meng, and Yalin Zheng. Automatically segment the left atrium and scars from LGE-MRIs using a boundary-focused nnU-Net. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 49–59. Springer, 2022b.
  • Zhao et al. (2023) Chenji Zhao, Shun Xiang, Yuanquan Wang, Zhaoxi Cai, Jun Shen, Shoujun Zhou, Di Zhao, Weihua Su, Shijie Guo, and Shuo Li. Context-aware network fusing transformer and V-Net for semi-supervised segmentation of 3D left atrium. Expert Systems with Applications, 214:119105, 2023.
  • Zhao et al. (2021) Zhou Zhao, Elodie Puybareau, Nicolas Boutry, and Thierry Géraud. Do not treat boundaries and regions differently: An example on heart left atrial segmentation. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 7447–7453. IEEE, 2021.
  • Zhou et al. (2022) Siping Zhou, Kai-Ni Wang, and Guang-Quan Zhou. Edge-enhanced feature guided joint segmentation of left atrial and scars in LGE MRI images. In Challenge on Left Atrial and Scar Quantification and Segmentation, pages 93–105. Springer, 2022.
  • Zhuang et al. (2023) Xiahai Zhuang, Lei Li, Sihan Wang, and Fuping Wu. Left Atrial and Scar Quantification and Segmentation: First Challenge, LAScarQS 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18, 2022, Proceedings, volume 13586. Springer Nature, 2023.
  • Zotti et al. (2018a) Clément Zotti, Zhiming Luo, Olivier Humbert, Alain Lalande, and Pierre-Marc Jodoin. Gridnet with automatic shape prior registration for automatic MRI cardiac segmentation. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers 8, pages 73–81. Springer, 2018a.
  • Zotti et al. (2018b) Clement Zotti, Zhiming Luo, Alain Lalande, and Pierre-Marc Jodoin. Convolutional neural network with shape prior applied to cardiac MRI segmentation. IEEE Journal of Biomedical and Health Informatics, 23(3):1119–1128, 2018b.