MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations

Lidia Garrucho, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Claire-Anne Reidel, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Kaisar Kushibar, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Smriti Joshi, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Richard Osuala, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain; Institute of Machine Learning in Biomedical Imaging, Helmholtz Center Munich, Munich, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
Apostolia Tsirikoglou, Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
Maciej Bobowicz, 2nd Dept. of Radiology, Medical University of Gdansk, Gdansk, Poland
Javier del Riego, Área de Radiología Mamaria y Ginecológica (UDIAT CD), Parc Taulí Hospital Universitari, Sabadell, Spain
Alessandro Catanese, Unitat de Diagnòstic per la Imatge de la Mama (UDIM), Hospital Germans Trias i Pujol, Badalona, Spain
Katarzyna Gwoździewicz, 2nd Dept. of Radiology, Medical University of Gdansk, Gdansk, Poland
Maria-Laura Cosaka, Centro Mamario Instituto Alexander Fleming, Buenos Aires, Argentina
Pasant M. Abo-Elhoda, Department of Diagnostic & Interventional Radiology and Molecular Imaging, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Sara W. Tantawy, Department of Diagnostic & Interventional Radiology and Molecular Imaging, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Shorouq S. Sakrana, Department of Diagnostic & Interventional Radiology and Molecular Imaging, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Norhan O. Shawky-Abdelfatah, Department of Diagnostic & Interventional Radiology and Molecular Imaging, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Amr Muhammad Abdo-Salem, Department of Diagnostic & Interventional Radiology and Molecular Imaging, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Androniki Kozana, Department of Radiology, University Hospital of Heraklion, Stavrakia, Greece
Eugen Divjak, Department of Diagnostic and Interventional Radiology, University Hospital Dubrava, Zagreb, Croatia; University of Zagreb, School of Medicine, Zagreb, Croatia
Gordana Ivanac, Department of Diagnostic and Interventional Radiology, University Hospital Dubrava, Zagreb, Croatia; University of Zagreb, School of Medicine, Zagreb, Croatia
Katerina Nikiforaki, Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
Michail E. Klontzas, Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece
Rosa García-Dosdá, Medical Imaging and Radiology, Universitary and Politechnic Hospital La Fe, Valencia, Spain
Meltem Gulsun-Akpinar, Department of Radiology, Hacettepe University Faculty of Medicine Sihhiye, Ankara, Turkey
Oğuz Lafcı, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria
Ritse Mann, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, The Netherlands
Carlos Martín-Isla, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Fred Prior, University of Arkansas for Medical Sciences, Little Rock, AR, US
Kostas Marias, Department of Electrical and Computer Engineering, Hellenic Mediterranean University, Heraklion, Greece; Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece
Martijn P.A. Starmans, Department of Radiology and Nuclear Medicine, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands; Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
Fredrik Strand, Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden; Breast Radiology, Karolinska University Hospital, Stockholm, Sweden
Oliver Díaz, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Laura Igual, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain
Karim Lekadir, Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585 (08007), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain

Abstract

Current research in breast cancer Magnetic Resonance Imaging (MRI), especially with Artificial Intelligence (AI), faces challenges due to the lack of expert segmentations. To address this, we introduce the MAMA-MIA dataset, comprising 1506 multi-center dynamic contrast-enhanced MRI (DCE-MRI) cases with expert segmentations of primary tumors and non-mass enhancement areas. These cases were sourced from four publicly available collections in The Cancer Imaging Archive (TCIA). Initially, we trained a deep learning model to automatically segment the cases, generating preliminary segmentations that significantly reduced expert segmentation time. Sixteen experts, averaging 9 years of experience in breast cancer, then corrected these segmentations, resulting in the final expert segmentations. Additionally, two radiologists visually inspected the automatic segmentations to support future quality control studies. Alongside the expert segmentations, we provide 49 harmonized demographic, clinical, and imaging variables and the pretrained weights of the well-known nnUNet architecture trained on the full DCE-MRI images and the expert segmentations. This dataset aims to accelerate the development and benchmarking of deep learning models and foster innovation in breast cancer diagnostics and treatment planning.

Background & Summary

Magnetic Resonance Imaging (MRI) is a highly sensitive imaging modality for breast cancer assessment, particularly in preoperative staging and treatment response evaluation. Breast MRI, specifically T1-weighted dynamic contrast-enhanced MRI (DCE-MRI), uses contrast agents to enhance blood vessels and tissues within the breast, aiding in the localization of tumors, which are often identified through angiogenesis [1]. The precise delineation of the tumor boundary, or tumor segmentation, allows for accurate quantitative evaluation of tumor characteristics such as shape, size, and volume, and can help to monitor disease progression and treatment effectiveness over time. In addition to their clinical value, gold-standard segmentations enable a more nuanced analysis of breast cancer characteristics and contribute to the development of AI models for improved diagnosis and prognosis. Radiomics [2], a method widely employed in machine learning applied to radiology, involves extracting numerous quantitative features from images to reveal hidden information, and it depends heavily on gold-standard segmentations [3]. Although radiomics has proven effective in predicting treatment response and survival status in breast cancer research, particularly using breast DCE-MRI images [4, 5, 6], current studies that use public datasets include only a small number of subjects (up to 300) due to the lack of public expert segmentations [7]. Besides the lack of expert segmentations, open-access DCE-MRI datasets are scarce and, currently, all the available collections are part of The Cancer Imaging Archive [8] (TCIA) from the United States. Within the existing collections, only 163 tumor segmentations from the I-SPY1/ACRIN 6657 trial [9], from a study by Chitalia et al. [10], are available in TCIA. Moreover, the existing collections are not harmonized in terms of folder structure, file naming, and clinical variables.
Similar to the M&Ms benchmark dataset in cardiac imaging [11, 12] and the BraTS dataset in brain imaging [13], our initiative introduces a multi-center breast cancer DCE-MRI dataset with 1506 expert segmentations, designed to facilitate the benchmarking of advanced medical imaging models involving AI.

The main contributions of our work are shown in Figure 1.

Figure 1: Summary of the main contributions in the MAMA-MIA dataset. The dataset includes three tables with the harmonized clinical and imaging data, the train and test splits, and the automatic segmentation quality scores, alongside the images and segmentations. Each case has its DCE-MRI phases in a folder under the images folder and two different segmentations: one corrected by experts and one automatic, without corrections.

First, we collected pre-treatment DCE-MRI cases from four different collections in TCIA, sourcing a total of 1506 cases. The selection criteria, shown in Figure 2, were to select pre-treatment DCE-MRI cases of patients who underwent neoadjuvant chemotherapy (NAC) in the months following diagnosis and for whom the corresponding clinical data were available: either the pathological complete response (pCR) to NAC or the five-year survival information.

Second, the clinical and imaging data of the selected cases from four collections were harmonized into a single table, containing a total of 21 clinical and 6 demographic variables, and 22 imaging parameters.

Third, a total of 16 experts participated in the segmentation of the primary tumors and non-mass enhanced areas present in the 1506 DCE-MRI cases. Manual segmentation of breast tumors in 3D MR images is both tedious and time-consuming. To facilitate this process, we initially trained a standard state-of-the-art deep learning model using private expert segmentations of DCE-MRI. This model generated preliminary segmentations that the 16 experts inspected, manually corrected, and verified, resulting in 1506 expert segmentations. Additionally, to support future benchmarking of AI-driven quality control models, two expert clinicians visually assessed the preliminary automatic segmentations, which served as the starting point that the experts later corrected.

Fourth, the dataset folder structure was harmonized and designed for easy retrieval, so that AI models can be trained in a plug-and-play manner.

Last, an additional contribution of this work is the pretrained weights of a vanilla nnUNet [14] tumor segmentation model, trained using the 1506 expert segmentations in the MAMA-MIA dataset. These pretrained weights can be used for inference or to fine-tune models for a wide variety of segmentation tasks involving MRI or other 3D medical image modalities.

We note that our dataset may have a potential bias introduced by the preliminary automatic segmentations and by the inter-annotator variability among the 16 experts performing the manual corrections. However, starting from an approximation is common practice: radiologists typically use similar functionalities in annotation software (e.g., thresholding) to obtain an initial outline of the lesion and save time. Despite these potential biases, our dataset represents the largest collection of expert segmentations in breast cancer MRI to date, with harmonized imaging and clinical data. It addresses a significant gap in the availability of gold-standard segmentations and thus adds substantial value to breast cancer research. A limitation of our dataset is that experts were asked to segment only the primary lesion in cases of multifocal or multicentric breast cancer. This choice was made because clinical information, such as tumor subtype and pathologic complete response, was only available for the primary lesion, and also to reduce the segmentation time.

In the following paragraphs, we describe some of the most important potential applications of the MAMA-MIA dataset.

Treatment Response and Survival Prediction.

Despite its benefits, NAC has associated side effects, making it desirable to predict patient response before treatment planning. Most deep learning methods predicting pCR to NAC using MRI data have been developed with fewer than 300 samples and are difficult to benchmark due to the lack of a standard dataset with expert segmentations [7]. The inclusion of treatment and survival outcomes together with the other clinical variables allows the MAMA-MIA dataset to be used as a benchmark and to develop AI models to predict treatment response and patient survival.

Automatic Segmentation of Breast Cancer in MRI.

Automated segmentation algorithms can process medical images much faster than manual delineation, reduce inter-observer variability, and provide more reproducible results. The 1506 expert segmentations in this dataset enable the development of large-scale, generalizable, and robust automatic tumor segmentation models. To this end, we provide the pretrained weights of a vanilla nnUNet segmentation model, trained with the 1506 DCE-MRI cases and the expert segmentations, to serve as a baseline and as a starting point for further improvement.

Segmentation Quality Control.

Visual inspection by expert radiologists is the gold standard for quality control, but it is challenging to apply on a large scale [15]. The expert segmentations, along with expert evaluations of automatic segmentations, serve as a foundation for robust quality control mechanisms in breast cancer MRI.

Image Synthesis.

The synthesis of realistic and diverse 2D MRI slices, as well as full 3D DCE-MRI volumes, not only supports the optimization of image analysis algorithms (e.g., via data augmentation, domain adaptation, or privacy preservation) but can also contribute to improved diagnostic decision making by radiologists (e.g., via simulation of treatment response, prediction of disease progression, or synthetic contrast media inpainting) [16, 17, 18, 19]. The patient demographics and clinical data in our dataset can further be used as conditioning inputs when training generative models, or to analyze their effect on the generated images and, together with the respective prediction results, to enable AI fairness analysis and bias mitigation (e.g., with respect to age or ethnicity).

Image Standardization.

The dataset includes both bilateral and unilateral images, with variations in magnetic field strength, number of slices, slice thickness, and scanner manufacturer, making it a valuable resource for developing new domain generalization and image standardization techniques. Additionally, exploring the dynamics of contrast evolution in tumors and their correlation with the acquisition times, which are included in the harmonized imaging data, is a promising avenue for future studies.

Fine-tuning of Foundational Models.

Foundation models like MedSAM [20], based on SAM [21], can address numerous segmentation tasks across various imaging modalities and have shown superior performance to some specialist models in medical image segmentation. However, MedSAM has some limitations, such as modality representation imbalances in its training data and difficulties in segmenting vessel-like structures. In breast cancer imaging, the SAM model has been investigated for the interactive segmentation of breast tumors in ultrasound images [22] and for mass segmentation in mammography [23]. Our dataset may contribute to the fast ingestion of three-dimensional medical imaging data to train data-hungry foundation models or to fine-tune them for breast MRI-specific tasks.

Methods

Data Collection and Preprocessing

The steps to collect the DCE-MRI cases forming the MAMA-MIA dataset are illustrated in Figure 2. The initial selection criterion was to gather all the open-access DCE-MRI studies of breast cancer patients who underwent NAC treatment. Four collections available on The Cancer Imaging Archive [8] (TCIA) met this requirement: the level 2b cohort [24] from the I-SPY1/ACRIN 6657 trial (I-SPY1) [9], the I-SPY2/ACRIN 6698 trial [25, 26], NACT-Pilot [27], and Duke-Breast-Cancer-MRI [28], referred to as ISPY1, ISPY2, NACT, and DUKE, respectively.

The second criterion was to select the DCE-MRI series acquired before NAC treatment commenced, often referred to as pre-treatment or timepoint T0. The third, and crucial, criterion was to exclude cases lacking information on treatment response or survival status. The final criterion involved quality control by the experts during cancer segmentation in the DCE-MRI images: experts discarded cases without sufficient contrast enhancement or with artifacts that significantly impeded segmentation.

The final MAMA-MIA collection comprises 1506 DCE-MRI cases that meet all the selection criteria. Figure 3 shows some sequences from the four different collections included in the dataset. Both bilateral and unilateral images from axial and sagittal views are present in the dataset. In the first post-contrast images, the visibility of malignant tissues is enhanced after contrast injection.

The dataset harmonization steps included data curation, image quality control, extraction of clinical and imaging data from DICOM headers, and establishing a standardized naming and folder structure for all the sequences in the new dataset. To ensure uniformity in image orientation, the sagittal images from NACT and ISPY1 were reoriented to the PSR (posterior-superior-right) coordinate system, while the axial images from DUKE and ISPY2 were reoriented to the LAS (left-anterior-superior) coordinate system. Maintaining a common image orientation per acquisition plane (axial and sagittal) is crucial for facilitating the integration of images into computational models and preventing undesired rotations.
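The reorientation step can be illustrated with a minimal numpy sketch. It assumes the NIfTI RAS+ world convention, in which an axis code names the direction a voxel axis points toward (as in nibabel's `aff2axcodes`), and it ignores oblique acquisitions by taking the dominant world axis of each affine column. The helper names `axcodes` and `reorient` are hypothetical; this is not the authors' conversion code, and in practice a library such as nibabel would usually handle this.

```python
import numpy as np

# Labels for the positive / negative direction of each RAS+ world axis (x, y, z).
_POS = ("R", "A", "S")
_NEG = ("L", "P", "I")


def _world_axis(code):
    """World axis index (0=x, 1=y, 2=z) that an axis code refers to."""
    return _POS.index(code) if code in _POS else _NEG.index(code)


def axcodes(affine):
    """Direction each voxel axis points toward, e.g. ('L', 'A', 'S')."""
    affine = np.asarray(affine, dtype=float)
    codes = []
    for col in range(3):
        vec = affine[:3, col]
        world = int(np.argmax(np.abs(vec)))  # dominant world axis of this voxel axis
        codes.append(_POS[world] if vec[world] > 0 else _NEG[world])
    return tuple(codes)


def reorient(data, affine, target):
    """Permute/flip a 3D array so its voxel axes point toward `target`, updating the affine."""
    affine = np.asarray(affine, dtype=float)
    current = axcodes(affine)
    # perm[k] = source axis lying along the same world axis as target[k]
    perm = [next(i for i, c in enumerate(current) if _world_axis(c) == _world_axis(t))
            for t in target]
    out = np.transpose(data, perm)
    new_affine = affine.copy()
    new_affine[:, :3] = affine[:, perm]
    for ax, t in enumerate(target):
        if current[perm[ax]] != t:  # axis points the opposite way: flip it
            n = out.shape[ax]
            out = np.flip(out, axis=ax)
            # The old last voxel becomes the new origin, and the axis direction is negated.
            new_affine[:3, 3] += new_affine[:3, ax] * (n - 1)
            new_affine[:3, ax] *= -1
    return out, new_affine
```

For example, `reorient(volume, affine, ("L", "A", "S"))` would target the axial convention and `("P", "S", "R")` the sagittal one. Updating the affine together with the array keeps every voxel at the same physical location, which is what prevents the undesired rotations mentioned above.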

Expert Segmentations

The dataset cohort comprises a highly heterogeneous group of locally advanced breast cancers, including patients with single tumors, multiple tumors (multifocal cases), non-mass enhanced areas where the cancer has spread, and bilateral breast cancers. In our dataset, both the automatic and the expert segmentations were performed within the Volume of Interest (VOI), excluding other cancerous findings outside the VOI. Only the tissues within the VOI were segmented because the clinical outcomes and tumor subtype information are available only for the primary tumor (delineated by the VOI) and not for the additional lesions in bilateral or multifocal breast cancers.

Selection of the Volume of Interest

The Volume of Interest (VOI) is a manually drawn 3D rectangular box that includes the entire enhanced region. The VOI strongly depends on the tumor morphology and can range from a few centimeters up to the full breast for more advanced tumors. The correct selection of the VOI is important for segmentation because the clinical information on tumor subtype and treatment response can only be linked to the volume within the VOI. In DUKE, the bounding boxes are provided in the clinical information of the collection. However, not all the datasets provide the 3D coordinates of the cancer in a straightforward manner. The NACT, ISPY1, and ISPY2 collections provide tumor volumetric analysis images for most of the DCE-MRI studies. These volumetric analysis images contain various annotations of the breast tissue, including pixel-level annotations of the peak enhanced region after contrast injection, also known as the Functional Tumor Volume (FTV). The FTV is obtained by applying thresholds to the pixel values of the percent enhancement (PE) and signal enhancement ratio (SER) images and combining the results [4]. Figure 4 shows the FTV in comparison with the manual segmentations of the tumor. As can be seen, the FTV region may contain the malignant tissues, but in most cases it does not precisely represent the tumor volume. Where the analysis mask was not available, an approximate VOI was extracted by applying the same filtering steps to the available SER and PE images. With these procedures, we obtained 3D bounding boxes encapsulating the tumor or non-mass enhanced area for all 1506 cases.
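The VOI extraction from PE and SER images can be sketched as follows. PE is computed here as the relative increase of the early post-contrast signal over the pre-contrast signal, and SER as the ratio of early to late enhancement, following their common definitions. The threshold values in `ftv_mask` are placeholders, not the values used in the I-SPY analysis or by the authors, and all helper names are hypothetical.

```python
import numpy as np


def enhancement_maps(s0, s1, s2, eps=1e-6):
    """PE (%) and SER maps from pre-contrast s0, early post-contrast s1 and late post-contrast s2."""
    s0, s1, s2 = (np.asarray(x, dtype=np.float64) for x in (s0, s1, s2))
    early = s1 - s0
    late = s2 - s0
    pe = 100.0 * early / np.maximum(s0, eps)
    valid = np.abs(late) > eps
    # SER is undefined where the late signal equals the baseline; set it to 0 there.
    ser = np.where(valid, early / np.where(valid, late, 1.0), 0.0)
    return pe, ser


def ftv_mask(pe, ser, pe_threshold=70.0, ser_threshold=0.9):
    """FTV-like mask: voxels passing both thresholds (placeholder values, not the paper's)."""
    return (pe >= pe_threshold) & (ser >= ser_threshold)


def bounding_box(mask):
    """Inclusive (min, max) voxel index per axis of a non-empty boolean mask."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("empty mask: no voxel passed the thresholds")
    return tuple((int(lo), int(hi)) for lo, hi in zip(coords.min(axis=0), coords.max(axis=0)))
```

In a real pipeline, a connected-component step would usually be needed before `bounding_box` so that a single enhancing region, rather than every voxel above threshold, defines the VOI.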

Automatic Segmentations

In this study, the preliminary automatic tumor segmentations were generated using the popular nnUNet framework [14]. A segmentation model was trained using a total of 331 primary tumor and non-mass enhanced (NME) segmentations from DUKE [28] and the TCGA-BRCA collection [29]. The training dataset encompasses 251 axial DCE-MRI cases from DUKE with expert segmentations shared by the authors of a treatment response study [30, 6], and 80 sagittal DCE-MRI cases from TCGA-BRCA with expert-validated automatic tumor segmentations [31] (Chicago Dynamic MRI Explorer 2005 Version), which increase the heterogeneity of the training data. The 331 expert-validated tumor segmentations were performed on the first post-contrast image; however, because patient movement was typically negligible, the same segmentation could be applied to all phases, including both pre- and post-contrast phases, which served as additional training data.
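Reusing one mask for every phase amounts to expanding each case into several (image, mask) training samples. The sketch below assumes a hypothetical in-memory index of phase and mask names. It also shows one way to assign cross-validation folds per case rather than per phase, so that phases of the same patient are never split between training and validation; the paper does not describe its fold assignment, so this is an illustrative design choice, not the authors' setup.

```python
from typing import Dict, List, Tuple


def expand_phases(cases: Dict[str, List[str]], masks: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return one (case_id, phase, mask) sample per phase, all sharing the case's mask."""
    samples = []
    for case_id in sorted(cases):
        for phase in cases[case_id]:
            samples.append((case_id, phase, masks[case_id]))
    return samples


def fold_of_sample(samples: List[Tuple[str, str, str]], k: int) -> List[int]:
    """Fold index per sample, assigned per case so one patient's phases share a fold."""
    case_ids = sorted({case_id for case_id, _, _ in samples})
    fold = {case_id: i % k for i, case_id in enumerate(case_ids)}
    return [fold[case_id] for case_id, _, _ in samples]
```

With hypothetical identifiers, a case with three phases yields three samples that all point to the same mask and all land in the same fold.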

Prior to training, the images were cropped to the Volume of Interest (VOI) and resampled to $1\times 1\times 1\ \mathrm{mm}^{3}$ isotropic voxel spacing. Additional data augmentation per patient involved cropping the images with margins of 0% and 25% around the VOI and random flipping. The final automatic segmentations were upsampled and pasted back into the original image space to obtain full-image segmentation masks. The nnUNet model achieved a mean validation Dice coefficient of $0.8287\pm 0.0112$ in a 5-fold cross-validation setting. Note that the DCE-MRI cases from the TCGA-BRCA collection were not included in the final MAMA-MIA dataset because no clinical information (tumor subtype, treatment response, or survival status) was available for them.
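The cropping, flipping, and paste-back steps can be sketched in numpy as below. We assume the margin is a fraction of the VOI extent added on each side and clipped at the image border. Resampling to 1 mm isotropic spacing (and the corresponding upsampling before pasting) is omitted, so `paste_back` expects a mask that is already at the original resolution. All function names are hypothetical.

```python
import numpy as np


def crop_with_margin(image, bbox, margin=0.25):
    """Crop to an inclusive bbox enlarged by `margin` x bbox extent per side, clipped to the image."""
    slices = []
    for (lo, hi), size in zip(bbox, image.shape):
        extent = hi - lo + 1
        pad = int(round(margin * extent))
        slices.append(slice(max(lo - pad, 0), min(hi + pad + 1, size)))
    slices = tuple(slices)
    return image[slices], slices


def random_flip(image, mask, rng):
    """Flip image and mask together along each axis with probability 0.5."""
    for axis in range(image.ndim):
        if rng.random() < 0.5:
            image = np.flip(image, axis=axis)
            mask = np.flip(mask, axis=axis)
    return image, mask


def paste_back(crop_mask, slices, full_shape):
    """Place a cropped segmentation into an empty mask with the original image shape."""
    full = np.zeros(full_shape, dtype=crop_mask.dtype)
    full[slices] = crop_mask
    return full
```

Returning the slices from `crop_with_margin` is what makes the paste-back exact: the same index window is reused to put the prediction back into full-image space.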

Visual Quality Control of the Automatic Segmentations

Two expert breast radiologists evaluated the quality of the preliminary automatic segmentations using an in-house graphical user interface (GUI). For each case, several 2D slices of the first post-contrast image in the axial, sagittal, and coronal planes were displayed with the segmentation contours highlighted in red. On the one hand, two full-image slices were displayed to help the experts quickly identify whether the primary tumor or NME region had been missed by the segmentation model. On the other hand, cropped images helped the experts assess the precision of the automatic segmentation within the VOI.

Based on the different images displayed, the experts were asked to assess the 1506 automatic segmentations as Good, Acceptable, Poor, or Missed. A Good segmentation indicated precision with no need for major corrections. An Acceptable segmentation captured the tumor but required improvement, with only a few incorrect pixels. A Poor segmentation lacked precision and contained numerous pixels outside the tumor region. Last, a segmentation categorized as Missed corresponded to an area of the breast unrelated to the tumor.

Expert Corrections

Of the 1506 DCE-MRI cases forming the MAMA-MIA dataset, only 160 manual segmentations, from the ISPY1 collection [10], were available on the TCIA platform. Additionally, the authors of a treatment response study using the DUKE dataset [30, 6] shared expert manual segmentations for another 251 cases included in the MAMA-MIA dataset. Therefore, 411 of the 1506 cases had expert manual segmentations. Our main contribution, together with the dataset harmonization and the expert-assessed automatic segmentations, is the manual segmentation of the remaining 1095 cases. A total of 16 experts from nine different institutions in Europe and Africa participated in the manual correction of the 3D segmentations. The group, with an average of 9 years of experience in breast cancer radiology, comprised fourteen breast radiologists, one surgical oncologist, and one medical physicist. The automatic segmentation quality scores from one of the experts were used to stratify the automatic segmentations and assign each expert an evenly distributed set of 70 cases. Along with the automatic segmentation, each case consisted of the pre-contrast and the first post-contrast phases. The experts were asked to segment the tumor in the first post-contrast phase, but the subtracted image or a later phase could be used as support during the manual correction process. The Mango viewer [32] was the tool selected to correct the automatic segmentations. The guidelines provided to the experts were: 1) in multifocal cases, segment only the primary tumor if the secondary tumors are not within the FTV; 2) avoid including healthy tissue as much as possible in non-mass enhanced cases; 3) exclude tissue markers (or clips) from the segmentations; 4) include tumor necrosis in the segmentation; 5) do not include intramammary lymph nodes; and 6) verify that the segmentation is consistent in all views, not only in the highest-resolution view.
Figure 5 shows some examples of first post-contrast images and the corresponding manual segmentations.
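The paper states that one expert's quality scores were used to stratify the cases and give each expert an evenly distributed set, but it does not give the exact procedure. One simple way to achieve this, shown here as an assumption, is to deal cases round-robin within each quality stratum while continuing the rotation across strata, which keeps both the per-expert totals and the per-category counts within one case of each other. Case and expert identifiers are hypothetical.

```python
from collections import defaultdict


def assign_stratified(scores, experts):
    """Deal cases to experts round-robin within each quality stratum.

    scores: dict mapping case ID -> quality label ('Good', 'Acceptable', 'Poor', 'Missed').
    Returns a dict mapping expert -> list of case IDs.
    """
    strata = defaultdict(list)
    for case_id, label in sorted(scores.items()):
        strata[label].append(case_id)
    assignment = {expert: [] for expert in experts}
    turn = 0  # the rotation continues across strata so that totals stay balanced
    for label in sorted(strata):
        for case_id in strata[label]:
            assignment[experts[turn % len(experts)]].append(case_id)
            turn += 1
    return assignment
```

Shuffling the cases within each stratum (with a fixed seed) before dealing would avoid any ordering effect from the case IDs while keeping the same balance guarantees.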

Baseline Segmentation Model using the Expert Segmentations

An additional contribution of this work is the pretrained weights of a vanilla nnUNet [14] tumor segmentation model, trained using the 1506 expert segmentations. The nnUNet model was trained with full DCE-MRI images as input for a total of 1000 epochs in a five-fold cross-validation setting, achieving a mean validation Dice coefficient of $0.70304\pm 0.0187$.

The preprocessing steps included z-scoring each DCE-MRI case using the mean and standard deviation computed over all its phases (pre- and post-contrast) and resampling to $1\times 1\times 1\ \mathrm{mm}^{3}$ isotropic voxel spacing. In the training pipeline, all post-contrast phases and the subtraction image (computed by subtracting the pre-contrast image from the first post-contrast image) were included as data augmentation. The model was evaluated only on the first post-contrast phases, which are the images the experts used to perform the segmentations.
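A minimal sketch of the per-case normalization and the subtraction image, assuming the phases are already resampled to a common grid. Because the mean and standard deviation are shared across all phases of a case, the relative enhancement between phases is preserved. Computing the subtraction from the normalized phases, rather than from raw intensities, is our assumption, and the function names are hypothetical.

```python
import numpy as np


def zscore_case(phases):
    """Stack all phases of one case and z-score them with one shared mean and std."""
    stack = np.stack([np.asarray(p, dtype=np.float32) for p in phases])
    mean, std = stack.mean(), stack.std()
    return (stack - mean) / (std if std > 0 else 1.0)


def subtraction_image(pre, first_post):
    """First post-contrast minus pre-contrast phase."""
    return np.asarray(first_post, dtype=np.float32) - np.asarray(pre, dtype=np.float32)
```

A typical call would be `z = zscore_case([pre, post1, post2])` followed by `subtraction_image(z[0], z[1])`. Normalizing each phase separately would instead remove the contrast uptake that the subtraction image is meant to highlight.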

Data Records

All data records, including the DCE-MRI images, the automatic and expert segmentations for each of the 1506 cases in the MAMA-MIA dataset, and the weights of the pretrained segmentation model, are available online in the MAMA-MIA Synapse repository (https://doi.org/10.7303/syn60868042). The data records also include three tables: one containing all the clinical and imaging information, another with the automatic segmentation quality scores from the two experts who evaluated the automatic segmentations using the GUI, and a third with the train and test split to promote reproducibility in future studies using the dataset. Figure 1 illustrates the file content and folder structure of the MAMA-MIA dataset. Each case identifier consists of the original collection acronym and the corresponding patient identification number (patient ID). For instance, the ISPY1_1221 case corresponds to the pre-treatment DCE-MRI sequences of patient ID 1221 from the ISPY1 collection. The different phases are named using the same case ID plus the corresponding phase number (ISPY1_1221_000X). For example, ISPY1_1221_0000 represents the pre-contrast phase and ISPY1_1221_0002 represents the second post-contrast phase.
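The naming convention can be captured in a few helpers. The `images/<case_id>/` layout follows Figure 1, while the `.nii.gz` extension and the helper names are assumptions for illustration; the notebooks in the released GitHub repository remain the authoritative reference for reading the data.

```python
from pathlib import Path


def split_case_id(case_id):
    """'ISPY1_1221' -> ('ISPY1', '1221'): collection acronym and patient ID."""
    collection, sep, patient = case_id.partition("_")
    if not sep or not patient:
        raise ValueError(f"unexpected case ID: {case_id!r}")
    return collection, patient


def phase_name(case_id, phase):
    """Phase 0 is pre-contrast; phases 1, 2, ... are successive post-contrast acquisitions."""
    return f"{case_id}_{phase:04d}"


def phase_path(root, case_id, phase, ext=".nii.gz"):
    """Assumed location of one phase: <root>/images/<case_id>/<case_id>_000X<ext>."""
    return Path(root) / "images" / case_id / (phase_name(case_id, phase) + ext)
```

Splitting only at the first underscore keeps any underscores that might appear inside a patient ID.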

Table 1 and Table 2 summarize the most representative dataset demographics, clinical variables, and image acquisition parameters. As described in Table 1, age and ethnicity information is available for more than 98% of the cases, while Body Mass Index (BMI) is available for 83% of cases. The MAMA-MIA dataset comprises 314 cases from women younger than 40 years old, constituting 21% of the total patients, and half of the dataset patients were younger than 50 years old at diagnosis. Therefore, the MAMA-MIA dataset can be considered well-balanced in terms of the young versus older population. Ethnicity distributions in the dataset are reflective of United States demographics [33], with 16% African American patients, less than 6% Asian and other ethnicities, and a majority Caucasian population (74.9%).

Clinical information available in more than 90% of the cases includes the presence of bilateral cancer at diagnosis, multi-focal cancer, tumor subtype, and pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) treatment. Other relevant information included in the dataset, albeit not present in all cases, comprises survival status in over 450 cases, different tumor receptors, days to recurrence or metastasis, agents prescribed during NAC, the necessity of mastectomy after treatment, and more. A comprehensive list of clinical and imaging variables included in the dataset can be found in the Excel Table as part of the Supplementary Material.

Table 2 presents the most common imaging information included in the dataset: the dates of the original collections, acquisition plane, magnetic field strength (Tesla) used for DCE-MRI acquisition, scanner manufacturers and models, number of bilateral and fat-suppressed DCE-MRIs, mean number of slices, slice thickness, pixel spacing, number of phases, and total number of cases obtained from each original collection. In DCE-MRI, the time interval between the pre-contrast acquisition (before contrast administration) and the subsequent post-contrast phases is an important factor. Table 3 summarizes the average time intervals between phases per dataset. Notably, older datasets such as ISPY1 and NACT have longer acquisition intervals than more recent collections.
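Inter-phase intervals such as those summarized in Table 3 can be derived from per-phase acquisition times stored in the DICOM TM format (`HHMMSS.FFFFFF`). The paper does not state which DICOM attribute was used; the sketch assumes the times have already been read (e.g., from Acquisition Time) in phase order, and it handles acquisitions that cross midnight. The helper names are hypothetical.

```python
def tm_to_seconds(tm):
    """Seconds since midnight for a DICOM TM value such as '101730.5' (HHMMSS.FFFFFF)."""
    main, _, frac = tm.strip().partition(".")
    hours = int(main[0:2])
    minutes = int(main[2:4]) if len(main) >= 4 else 0
    seconds = int(main[4:6]) if len(main) >= 6 else 0
    return hours * 3600 + minutes * 60 + seconds + (float("0." + frac) if frac else 0.0)


def phase_intervals(times):
    """Seconds between consecutive phases, given their acquisition times in phase order."""
    secs = [tm_to_seconds(t) for t in times]
    intervals = []
    for earlier, later in zip(secs, secs[1:]):
        delta = later - earlier
        if delta < 0:  # the acquisition crossed midnight
            delta += 24 * 3600
        intervals.append(delta)
    return intervals
```

Averaging these intervals per collection would give figures comparable in spirit to Table 3, subject to how each site populated the time attributes.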

Technical Validation

In this section, we provide a detailed validation of the segmentation quality in our dataset. Expert segmentations are inherently validated through expert review, but we also use the preliminary automatic segmentations as a baseline for further validation. To assess the quality of these automatic segmentations, two external expert radiologists evaluated them, categorizing each segmentation into four quality levels: Missed, Poor, Acceptable, and Good. We present a comprehensive analysis of these quality assessments, comparing the automatic segmentations with the final expert segmentations using the overlap-based Dice Similarity Coefficient (DSC) and the distance-based 95th percentile Hausdorff Distance (HD). Additionally, we explore the implications of these assessments and suggest potential improvements for future segmentation quality control.
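For reference, DSC is $2|A\cap B|/(|A|+|B|)$, and HD95 is a robust variant of the Hausdorff distance between the two segmentation surfaces. The numpy sketch below is a brute-force illustration suitable only for small masks; it is not the seg-metrics implementation used by the authors, and HD95 definitions vary between libraries (here, the maximum of the two directed 95th percentile surface distances, with a 6-connected surface).

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient 2|A & B| / (|A| + |B|); 1.0 when both masks are empty."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels with at least one 6-connected background neighbour."""
    padded = np.pad(np.asarray(mask, dtype=bool), 1)
    core = padded[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior) * np.asarray(spacing, dtype=float)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """Max of the two directed 95th-percentile surface distances (brute force, small masks only)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        return float("inf")
    dist = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1))
    return float(max(np.percentile(dist.min(axis=1), 95), np.percentile(dist.min(axis=0), 95)))
```

Using the 95th percentile instead of the maximum makes HD less sensitive to a few outlier voxels, which is why it is preferred for comparing segmentations with small spurious islands.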

A comprehensive list of expert quality scores and corresponding distance metrics is included in the Data Records as a CSV file, facilitating exhaustive analysis of automatic segmentation quality and expert agreement. Figure 6 illustrates the distribution of automatic segmentations across quality scores for the two experts, alongside the DSC and HD between the final expert segmentations and the preliminary automatic segmentations.

Overall, Expert 1 rated 669 (44.4%) automatic segmentations as Good quality, and Expert 2 rated 652 (43.3%) as Good (Figure 6 (a)). Good automatic segmentations had notably high DSC values and low HD, indicating strong agreement with the manual segmentations; these metrics confirm that GUI inspection was efficient and adequate for identifying Good automatic segmentations. Automatic segmentations rated as Acceptable showed similar DSC and HD ranges for both experts and closely resembled the manual segmentations (Figure 6 (b) and (c)). Expert 1 labeled 192 (12.7%) as Acceptable, while Expert 2 labeled 376 (24.9%). For the Poor category, Expert 1 was more stringent than Expert 2, labeling 325 (21.5%) automatic segmentations as Poor compared to Expert 2's 151 (10%). However, the standard deviation of the DSC for Poor segmentations by Expert 1 exceeded that of Expert 2, at times surpassing the DSC range of Acceptable segmentations by the same expert. Both experts considered fewer than 2% of the automatic segmentations to have Missed the tumor; these exhibited lower DSC values and larger HD, indicating large differences from the manual segmentations and confirming their poor quality. The disparity between experts in the number of automatic segmentations categorized as Missed suggests that information beyond the images displayed in the GUI may have been necessary to confirm tumor omission. The primary discrepancy between experts in evaluating automatic segmentation quality stemmed from the distinction between Poor and Acceptable. Future segmentation quality control protocols may therefore benefit from reducing the categories to three: Missed, Corrections Needed, and Good. This approach could reduce inter-observer variability while still identifying the automatic segmentations that require manual correction.
We consider the quality scores provided by the two experts for the automatic segmentations to be a valuable resource for studies on automatic segmentation quality control.
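The sketch below makes the proposed three-level scheme concrete. It collapses Acceptable and Poor into Corrections Needed and measures agreement between two raters with Cohen's kappa.

Cohen's kappa is used here only for illustration; it is not a statistic reported in this study. The six toy ratings are hypothetical and are not taken from the released scores.

```python
from collections import Counter

# Three-level scheme proposed in the text: Acceptable and Poor both mean
# "the automatic segmentation needs manual correction".
THREE_LEVEL = {
    "Good": "Good",
    "Acceptable": "Corrections Needed",
    "Poor": "Corrections Needed",
    "Missed": "Missed",
}


def collapse(ratings, mapping=THREE_LEVEL):
    """Map four-level quality scores onto the proposed three-level scheme."""
    return [mapping[r] for r in ratings]


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning nominal labels to the same cases."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every case: kappa is undefined,
        # and perfect agreement is reported by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy ratings for six cases (illustrative, not taken from the released CSV).
expert_1 = ["Good", "Poor", "Poor", "Acceptable", "Missed", "Good"]
expert_2 = ["Good", "Acceptable", "Poor", "Poor", "Missed", "Good"]
print("four categories: ", round(cohens_kappa(expert_1, expert_2), 3))
print("three categories:", round(cohens_kappa(collapse(expert_1), collapse(expert_2)), 3))
```

With these toy labels, the only disagreements are between Poor and Acceptable. Kappa therefore rises from about 0.54 with four categories to 1.0 with three.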

To illustrate the corrections the experts made to the automatic segmentations, Figure 7 groups examples by quality score. Voxels the experts added to the preliminary automatic segmentations are shown in green, and voxels they removed are shown in pink. For automatic segmentations rated Good during visual quality control, the corrections are minimal and mainly refine the tumor margins and other fine-grained details. The DSC and the 95th percentile Hausdorff distance (HD) quantify segmentation quality and the extent of the corrections. Visual inspection adds further insight for Missed cases such as ISPY2_566011, where the main lesion was not segmented and the expert had to segment it from scratch.
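The added (green) and removed (pink) voxels in Figure 7 correspond to simple set differences between the preliminary automatic mask and the final expert mask. Below is a minimal NumPy sketch. It assumes both masks are binary arrays on the same voxel grid, for example loaded from the released NIfTI files. The function name `correction_maps` and the toy masks are hypothetical.

```python
import numpy as np


def correction_maps(automatic, final):
    """Split expert corrections into added and removed voxels.

    automatic: preliminary automatic segmentation (binary array).
    final: expert-corrected segmentation (binary array of the same shape).
    """
    automatic = np.asarray(automatic, dtype=bool)
    final = np.asarray(final, dtype=bool)
    if automatic.shape != final.shape:
        raise ValueError("segmentations must have the same shape")
    added = final & ~automatic    # voxels the expert added (green in Figure 7)
    removed = automatic & ~final  # voxels the expert removed (pink in Figure 7)
    return added, removed


# Toy 3D example: a small cube that the "expert" extends by one slice and trims by one voxel.
automatic = np.zeros((4, 4, 4), dtype=bool)
automatic[1:3, 1:3, 1:3] = True
final = automatic.copy()
final[1:3, 1:3, 3] = True
final[1, 1, 1] = False

added, removed = correction_maps(automatic, final)
print("added voxels:", int(added.sum()), "removed voxels:", int(removed.sum()))
```

Multiplying the voxel counts by the voxel volume gives the corrected volume in physical units. When the automatic mask is empty, `removed` is empty and `added` equals the expert segmentation.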

Usage Notes

The dataset and data records are hosted on Synapse (https://doi.org/10.7303/syn60868042). We have also released a GitHub repository to facilitate the use of this dataset and help users become familiar with its data structure. The repository contains Jupyter notebooks to read and visualize the images, as well as code to run inference with the pretrained nnU-Net model.

Code availability

The GitHub repository of this dataset is available at https://github.com/LidiaGarrucho/MAMA-MIA. The automatic segmentation models were trained with the code from the nnU-Net GitHub repository in a 5-fold cross-validation setting. The original DICOM images were converted to NIfTI using the pycad Python library. The DSC and the 95th percentile Hausdorff distance (HD) are computed with the seg-metrics Python library (version 1.2.7) [34].
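To sanity-check metric values independently of seg-metrics, the DSC and the 95th percentile Hausdorff distance can be reimplemented with NumPy and SciPy. The sketch below is an independent approximation, not the seg-metrics implementation:

- It defines surface voxels as those removed by one binary erosion step.
- It pools the surface distances from both directions before taking the 95th percentile.

These conventions may differ from those of seg-metrics, so HD values can shift slightly. To obtain distances in mm, pass the voxel spacing in mm (in array axis order) as `spacing`.

```python
import numpy as np
from scipy import ndimage


def dice(seg_a, seg_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total


def _surface(mask):
    # Surface voxels: foreground voxels that one binary erosion step removes.
    return mask & ~ndimage.binary_erosion(mask)


def hd95(seg_a, seg_b, spacing=None):
    """95th percentile of the symmetric surface distances (units of `spacing`, e.g. mm)."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    if not a.any() or not b.any():
        return float("inf")  # undefined when either mask is empty
    surf_a, surf_b = _surface(a), _surface(b)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dist_to_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    distances = np.concatenate([dist_to_b[surf_a], dist_to_a[surf_b]])
    return float(np.percentile(distances, 95))


# Toy example: two 4x4x4 cubes offset by two voxels along the last axis.
a = np.zeros((10, 10, 10), dtype=bool)
a[2:6, 2:6, 2:6] = True
b = np.zeros_like(a)
b[2:6, 2:6, 4:8] = True
print("DSC:", dice(a, b), "HD95 (voxels):", hd95(a, b))
```

With the toy cubes, the DSC is 0.5 and the HD is 2 voxels. Passing `spacing=(1.0, 1.0, 3.0)` scales the HD to 6, because the offset lies along the last axis.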

Acknowledgements

The authors would like to express their gratitude to all the participants who contributed to this study. Special appreciation is extended to the experts who performed the manual segmentations of 1095 cases and to the two experts who conducted quality control on the automatic segmentations. We also thank Ritse Mann and Marco Caballo for providing the 251 manual segmentations of DUKE cases derived from their study [6]. We further acknowledge The Cancer Imaging Archive team for making the imaging and clinical data used in this study publicly available. We also acknowledge the University of Chicago lab of Maryellen Giger, whose members participated in the TCGA Breast Phenotype Research Group, for the 160 additional ISPY1 tumor segmentations.

Author contributions statement

L.G. designed and led the study. C.R. and L.G. collected the dataset, preprocessed the images, curated the clinical and scanner information, verified all the manual segmentations, and contributed to writing the manuscript. S.J. and R.O. assisted in curating the manual segmentations from Caballo et al.’s study. S.J. assisted in training the automatic segmentation models and contributed to writing the corresponding section of the paper. R.O. led the section on image synthesis. K.K. designed the evaluation tool used to assess the automatic segmentations. A.T. assisted the experts during the manual segmentation process and built the dataset tables in the paper. J.R. and A.C. provided the expert quality scores of the automatic segmentations in this study. R.M. manually annotated 251 cases from the DUKE dataset. F.P. helped with data management in TCIA. M.C., P.M.A., S.W.T., S.S.S., N.O.S., A.M.A., A.K., E.D., G.I., K.N., M.E.K., R.G., M.G., O.L., K.G., and M.B. were the sixteen experts who performed the manual segmentations of the remaining 1095 cases. K.K., A.T., C.M., M.P.S., K.M., F.S., O.D., L.I., and K.L. supervised the study. K.L. is the PI of the project that funded this research. All authors reviewed the manuscript.

Competing interests

The authors declare no competing interests.

Funding Statement

This project has received funding from the European Union’s Horizon 2020 research and innovation programmes under grant agreements No 952103 (EUCanImage) and No 101057699 (RadioVal). This work was also partially supported by the project FUTURE-ES (PID2021-126724OB-I00) from the Ministry of Science and Innovation of Spain. K.K. holds a Juan de la Cierva fellowship (reference number FJC2021-047659-I).

Tables

Table 1: Social (upper half) and clinical (bottom half) variables of the accumulated dataset, MAMA-MIA, consisting of four breast cancer datasets: ISPY1 [9], ISPY2 [25], DUKE [28], and NACT [27]. Age is measured in years. Ethnicity is categorized into Caucasian/White, African American/Black, Asian, and Other (Hispanic, American Indian/Alaskan Native, Hawaiian/Pacific Islander, multiple races) groups. BMI (Body Mass Index) is categorized into the indicated groups using the patient weight and height; if the patient height was missing, a default of 1.65 m was used. pCR stands for pathological Complete Response, and N/A for not available. The last row summarizes the number of cases per dataset and in total.

		MAMA-MIA
		ISPY1		ISPY2		DUKE		NACT		Total
Country		United States		United States		United States		United States		United States
Study period		2002 – 2006		2010 – 2016		2000 – 2014		1995 – 2002		1995 – 2016
		#	(%)	#	(%)	#	(%)	#	(%)	#	(%)
Age	< 40	35	(20.5)	208	(21.2)	61	(21.0)	13	(20.3)	317	(21.0)
	40-49	62	(36.3)	300	(30.6)	104	(35.7)	25	(39.1)	491	(32.6)
	50-59	56	(32.7)	317	(32.3)	74	(25.4)	18	(28.1)	465	(30.9)
	60-69	18	(10.5)	134	(13.7)	41	(14.1)	6	(9.4)	199	(13.2)
	>= 70	0	(0.0)	18	(1.8)	11	(3.8)	2	(3.1)	31	(2.1)
	N/A	0	(0.0)	3	(0.3)	0	(0.0)	0	(0.0)	3	(0.2)
Ethnicity	Caucasian	129	(75.4)	777	(79.3)	177	(60.8)	45	(70.3)	1128	(74.9)
	African American	31	(18.1)	116	(11.8)	91	(31.3)	3	(4.7)	241	(16.0)
	Asian	7	(4.1)	68	(6.9)	7	(2.4)	4	(6.2)	86	(5.7)
	Other	2	(1.2)	16	(1.6)	14	(4.8)	3	(4.7)	35	(2.3)
	N/A	2	(1.2)	3	(0.3)	2	(0.7)	9	(14.1)	16	(1.1)
BMI	Underweight	6	(3.5)	17	(1.7)	7	(2.4)	5	(7.8)	35	(2.3)
	Normal	57	(33.3)	301	(30.7)	73	(25.1)	39	(60.9)	470	(31.2)
	Overweight	40	(23.4)	224	(22.9)	71	(24.4)	15	(23.4)	350	(23.2)
	Obesity class I	23	(13.5)	155	(15.8)	50	(17.2)	5	(7.8)	233	(15.5)
	Obesity class II	10	(5.8)	62	(6.3)	19	(6.5)	0	(0.0)	91	(6.0)
	Obesity class III	26	(15.2)	39	(4.0)	13	(4.5)	0	(0.0)	78	(5.2)
	N/A	9	(5.3)	182	(18.6)	58	(19.9)	0	(0.0)	249	(16.5)
Implants	Yes	1	(0.6)	29	(3.0)	0	(0.0)	0	(0.0)	30	(2.0)
	No	170	(99.4)	951	(97.0)	291	(100.0)	64	(100.0)	1476	(98.0)
Bilateral cancer	Yes	3	(1.8)	20	(2.0)	7	(2.4)	0	(0.0)	30	(2.0)
	No	168	(98.2)	960	(98.0)	284	(97.6)	64	(100.0)	1476	(98.0)
Multifocal cancer	Yes	4	(2.3)	389	(39.7)	139	(47.8)	7	(10.9)	539	(35.8)
	No	7	(4.1)	591	(60.3)	152	(52.2)	57	(89.1)	807	(53.6)
	N/A	160	(93.6)	0	(0.0)	0	(0.0)	0	(0.0)	160	(10.6)
Tumor subtype	Luminal	67	(39.2)	381	(38.9)	123	(42.3)	21	(32.8)	592	(39.3)
	HER2 pos.	54	(31.6)	241	(24.6)	83	(28.5)	15	(23.4)	393	(26.1)
	Triple neg.	45	(26.3)	358	(36.5)	85	(29.2)	11	(17.2)	499	(33.1)
	N/A	5	(2.9)	0	(0.0)	0	(0.0)	17	(26.6)	22	(1.5)
pCR	Yes	49	(28.7)	316	(32.2)	64	(22.0)	11	(17.2)	440	(29.2)
	No	118	(69.0)	664	(67.8)	216	(74.2)	53	(82.8)	1051	(69.8)
	N/A	4	(2.3)	0	(0.0)	11	(3.8)	0	(0.0)	15	(1.0)
		171		980		291		64		1506
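Table 1 groups patients by BMI, computed from weight and height, with a default height of 1.65 m when height is missing. A minimal sketch of that grouping follows. It assumes the standard WHO adult cut-offs (18.5, 25, 30, 35, and 40 kg/m²), weights in kg and heights in m. The paper does not state the thresholds or units. Treating a missing weight as N/A is also an assumption.

```python
from typing import Optional

DEFAULT_HEIGHT_M = 1.65  # default height used in Table 1 when height is missing

# Assumed WHO adult cut-offs in kg/m^2: the table names the groups but not the thresholds.
BMI_UPPER_BOUNDS = [
    (18.5, "Underweight"),
    (25.0, "Normal"),
    (30.0, "Overweight"),
    (35.0, "Obesity class I"),
    (40.0, "Obesity class II"),
]


def bmi_category(weight_kg: Optional[float], height_m: Optional[float] = None) -> str:
    """Return the BMI group for a patient; weight in kg and height in m (assumed units)."""
    if weight_kg is None:
        return "N/A"  # assumption: cases without a recorded weight are counted as N/A
    # A missing (None) or zero height falls back to the default height.
    height = height_m if height_m else DEFAULT_HEIGHT_M
    bmi = weight_kg / height ** 2
    for upper_bound, label in BMI_UPPER_BOUNDS:
        if bmi < upper_bound:
            return label
    return "Obesity class III"  # BMI >= 40


for weight, height in [(70.0, 1.75), (90.0, None), (None, 1.60)]:
    print(weight, height, "->", bmi_category(weight, height))
```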

Table 2: Image acquisition variables of the accumulated dataset, MAMA-MIA, consisting of four breast cancer datasets: ISPY1 [9], ISPY2 [25], DUKE [28], and NACT [27]. The upper half shows general acquisition characteristics, while the bottom half shows specifics of the acquired sequences and slices. Magnetic field strength is measured in Tesla (T), and slice thickness and pixel spacing in mm. Other scanner models include:
- Siemens: MAGNETOM Symphony, SymphonyTim, TrioTim, Verio, Skyra, Sonata, Vision, Vision plus, Prisma fit, and Espree.
- GE: Signa Excite, Discovery MR750w, Discovery MR750, Signa HDx, and Optima.
- Philips: Gyroscan Intera, Ingenia, Achieva, and Intera.

Other image matrices (slice sizes, in pixels) include [320, 320], [400, 400], [416, 416], [432, 432], [448, 448], [480, 480], [528, 528], [560, 560], [576, 576], [640, 640], and [1024, 1024]. The number of phases includes the pre-contrast phase and all post-contrast phases. For a broader overview, the mean value and the minimum and maximum values (in brackets) are given for a selection of variables. The last row summarizes the number of cases per dataset and in total.

		MAMA-MIA
		ISPY1		ISPY2		DUKE		NACT		Total
		#	(%)	#	(%)	#	(%)	#	(%)	#	(%)
Acquisition plane	Axial	0	(0.0)	980	(100.0)	291	(100.0)	0	(0.0)	1271	(84.4)
	Sagittal	171	(100.0)	0	(0.0)	0	(0.0)	64	(100.0)	235	(15.6)
Magnetic field strength (T)	1.5	171	(100.0)	715	(73.0)	136	(46.7)	64	(100.0)	1086	(72.1)
	3.0	0	(0.0)	265	(27.0)	155	(53.3)	0	(0.0)	420	(27.9)
Fat suppression	Yes	170	(99.4)	976	(99.6)	290	(99.7)	64	(100.0)	1500	(99.6)
	No	1	(0.6)	4	(0.4)	1	(0.3)	0	(0.0)	6	(0.4)
Scanner manufacturer	SIEMENS (S)	44	(25.7)	252	(25.7)	115	(39.5)	0	(0.0)	411	(27.3)
	GE	115	(67.3)	611	(62.3)	176	(60.5)	64	(100.0)	966	(64.1)
	PHILIPS (P)	12	(7.0)	117	(11.9)	0	(0.0)	0	(0.0)	129	(8.6)
Scanner model	Avanto (S)	0	(0.0)	123	(12.6)	70	(24.1)	0	(0.0)	193	(12.8)
	SIGNA HDxt (GE)	0	(0.0)	536	(54.7)	59	(20.3)	0	(0.0)	595	(39.5)
	SIGNA GENESIS (GE)	103	(60.2)	0	(0.0)	0	(0.0)	64	(100.0)	167	(11.1)
	Other (S, GE, P)	68	(39.8)	321	(32.8)	162	(55.7)	0	(0.0)	551	(36.6)
Bilateral MRI	Yes	3	(1.8)	171	(17.4)	291	(100.0)	0	(0.0)	465	(30.9)
	No	168	(98.2)	809	(82.6)	0	(0.0)	64	(100.0)	1041	(69.1)
Image matrix	[256, 256]	156	(91.2)	33	(3.4)	0	(0.0)	62	(96.9)	251	(16.7)
	[384, 384]	0	(0.0)	149	(15.2)	0	(0.0)	0	(0.0)	149	(9.9)
	[512, 512]	15	(8.8)	721	(73.6)	176	(60.5)	2	(3.1)	914	(60.7)
	Other	0	(0.0)	77	(7.9)	115	(39.5)	0	(0.0)	192	(12.7)
Number of phases	3	146	(85.4)	0	(0.0)	4	(1.4)	58	(90.6)	208	(13.8)
	4-6	25	(14.6)	190	(19.4)	287	(98.6)	5	(7.8)	507	(33.7)
	>= 7	0	(0.0)	790	(80.6)	0	(0.0)	1	(1.6)	791	(52.5)
	mean [min, max]	3	[3, 6]	7	[4, 11]	4	[3, 6]	3	[3, 7]	6	[3, 11]
Number of slices	< 100	166	(97.1)	593	(60.5)	3	(1.0)	64	(100.0)	826	(54.8)
	100-199	1	(0.6)	369	(37.7)	257	(88.3)	0	(0.0)	627	(41.6)
	>= 200	4	(2.3)	18	(1.8)	31	(10.7)	0	(0.0)	53	(3.5)
	mean [min, max]	64	[44, 256]	106	[52, 256]	169	[60, 256]	60	[46, 64]	111	[44, 256]
Slice thickness	< 2.0	5	(2.9)	183	(18.7)	287	(98.6)	0	(0.0)	475	(31.5)
	2.0-2.9	131	(76.6)	796	(81.2)	4	(1.4)	64	(100.0)	995	(66.1)
	>= 3.0	35	(20.5)	1	(0.1)	0	(0.0)	0	(0.0)	36	(2.4)
	mean [min, max]	2.4	[1.5, 4.0]	2.0	[0.8, 3.0]	1.1	[1.0, 2.5]	2.0	[2.0, 2.4]	1.9	[0.8, 4.0]
Pixel spacing	< 0.5	15	(8.8)	2	(0.2)	0	(0.0)	2	(3.1)	19	(1.3)
	0.5-0.9	150	(87.7)	939	(95.8)	274	(94.2)	62	(96.9)	1425	(94.6)
	>= 1.0	6	(3.5)	39	(4.0)	17	(5.8)	0	(0.0)	62	(4.1)
	mean [min, max]	0.8	[0.4, 1.2]	0.7	[0.3, 1.4]	0.7	[0.5, 1.3]	0.7	[0.4, 0.9]	0.7	[0.3, 1.4]
		171		980		291		64		1506

Table 3: Average time intervals, in seconds, between consecutive DCE-MRI phases, with the corresponding minimum and maximum values in brackets. The first interval runs from the pre-contrast phase to the first post-contrast phase; the remaining intervals are between successive post-contrast phases. Intervals are shown up to the sixth post-contrast phase.

	Average Time Intervals
	pre to 1st post		1st to 2nd post		2nd to 3rd post		3rd to 4th post		4th to 5th post		5th to 6th post
ISPY1	390	[27, 915]	284	[20, 531]	324	[24, 899]	142	[121, 162]	142	[121, 162]	–	–
ISPY2	145	[77, 761]	92	[59, 206]	92	[58, 206]	92	[58, 217]	91	[58, 206]	90	[59, 198]
DUKE	241	[94, 922]	124	[75, 354]	114	[74, 169]	118	[88, 391]	–	–	–	–
NACT	442	[331, 752]	362	[290, 901]	314	[283, 435]	288	[286, 289]	286	[286, 286]	308	[308, 308]
MAMA-MIA	203	[27, 922]	131	[20, 901]	100	[24, 899]	96	[58, 391]	91	[58, 286]	92	[59, 308]
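The paper does not describe how these intervals were derived. The following sketch assumes each phase has a recorded start time, for example the DICOM AcquisitionTime attribute, stored in the TM format HHMMSS.FFFFFF. It computes the per-interval mean, minimum, and maximum in the layout of Table 3. The times below are hypothetical, and the sketch ignores acquisitions that cross midnight.

```python
from collections import defaultdict
from statistics import mean


def parse_dicom_time(tm: str) -> float:
    """Convert a DICOM TM value 'HHMMSS[.ffffff]' to seconds since midnight."""
    return int(tm[0:2]) * 3600 + int(tm[2:4]) * 60 + float(tm[4:])


def interval_summary(cases):
    """Summarize gaps between consecutive phases.

    cases: one list of phase start times (seconds) per case, pre-contrast first.
    Returns {k: (mean, min, max)}, where k=0 is pre to 1st post, k=1 is 1st to 2nd post, ...
    """
    gaps = defaultdict(list)
    for times in cases:
        for k in range(len(times) - 1):
            gaps[k].append(times[k + 1] - times[k])
    return {k: (mean(v), min(v), max(v)) for k, v in sorted(gaps.items())}


# Toy acquisition times (hypothetical), e.g. read from each phase's DICOM header.
case_1 = [parse_dicom_time(t) for t in ["101500", "101740", "101900"]]
case_2 = [parse_dicom_time(t) for t in ["093000.0", "093230.0", "093400.0", "093530.0"]]
for k, (avg, lo, hi) in interval_summary([case_1, case_2]).items():
    print(f"interval {k}: {avg:.0f} s [{lo:.0f}, {hi:.0f}]")
```

Pooling all cases of one dataset gives a per-dataset row. Pooling every case gives a row analogous to the MAMA-MIA total.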

References

[1] Mann, R. M., Cho, N. & Moy, L. Breast MRI: state of the art. Radiology 292, 520–536, https://doi.org/10.1148/radiol.2019182947 (2019).
[2] Lambin, P. et al. Radiomics: extracting more information from medical images using advanced feature analysis. European Journal of Cancer 48, 441–446, https://doi.org/10.1016/j.ejca.2011.11.036 (2012).
[3] Poirot, M. G. et al. Robustness of radiomics to variations in segmentation methods in multimodal brain MRI. Scientific Reports 12, 16712, https://doi.org/10.1038/s41598-022-20703-9 (2022).
[4] Hylton, N. M. et al. Neoadjuvant chemotherapy for breast cancer: functional tumor volume by MR imaging predicts recurrence-free survival—results from the ACRIN 6657/CALGB 150007 I-SPY 1 trial. Radiology 279, 44–55, https://doi.org/10.1148/radiol.2015150013 (2016).
[5] O’Donnell, J. et al. The accuracy of breast MRI radiomic methodologies in predicting pathological complete response to neoadjuvant chemotherapy: A systematic review and network meta-analysis. European Journal of Radiology 157, 110561, https://doi.org/10.1016/j.ejrad.2022.110561 (2022).
[6] Caballo, M. et al. Four-dimensional machine learning radiomics for the pretreatment assessment of breast cancer pathologic complete response to neoadjuvant chemotherapy in dynamic contrast-enhanced MRI. Journal of Magnetic Resonance Imaging 57, 97–110, https://doi.org/10.1002/jmri.28273 (2023).
[7] Khan, N., Adam, R., Huang, P., Maldjian, T. & Duong, T. Q. Deep learning prediction of pathologic complete response in breast cancer using MRI and other clinical data: a systematic review. Tomography 8, 2784–2795, https://doi.org/10.3390/tomography8060232 (2022).
[8] Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging 26, 1045–1057, https://doi.org/10.1007/s10278-013-9622-7 (2013).
[9] Newitt, D., Hylton, N. et al. Multi-center breast DCE-MRI data and segmentations from patients in the I-SPY 1/ACRIN 6657 trials. The Cancer Imaging Archive, https://doi.org/10.7937/K9/TCIA.2016.HdHpgJLK (2016).
[10] Chitalia, R. et al. Expert tumor annotations and radiomics for locally advanced breast cancer in DCE-MRI for ACRIN 6657/I-SPY1. Scientific Data 9, 440, https://doi.org/10.1038/s41597-022-01555-4 (2022).
[11] Campello, V. M. et al. Multi-centre, multi-vendor and multi-disease cardiac segmentation: the M&Ms challenge. IEEE Transactions on Medical Imaging 40, 3543–3554, https://doi.org/10.1109/TMI.2021.3090082 (2021).
[12] Martín-Isla, C. et al. Deep learning segmentation of the right ventricle in cardiac MRI: The M&Ms challenge. IEEE Journal of Biomedical and Health Informatics, https://doi.org/10.1109/JBHI.2023.3267857 (2023).
[13] Bakas, S. et al. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4, 1–13, https://doi.org/10.1038/sdata.2017.117 (2017).
[14] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18, 203–211, https://doi.org/10.1038/s41592-020-01008-z (2021).
[15] Robinson, R. et al. Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study. Journal of Cardiovascular Magnetic Resonance 21, 1–14, https://doi.org/10.1186/s12968-019-0523-x (2019).
[16] Osuala, R. et al. Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging. Medical Image Analysis 102704, https://doi.org/10.1016/j.media.2022.102704 (2022).
[17] Osuala, R. et al. Pre-to post-contrast breast MRI synthesis for enhanced tumour segmentation. In Medical Imaging 2024: Image Processing, vol. 12926, 226–237 (SPIE, 2024).
[18] Osuala, R. et al. Towards learning contrast kinetics with multi-condition latent diffusion models. arXiv preprint arXiv:2403.13890, https://doi.org/10.48550/arXiv.2403.13890 (2024).
[19] Müller-Franzes, G. et al. Using machine learning to reduce the need for contrast agents in breast MRI through synthetic images. Radiology 307, e222211, https://doi.org/10.1148/radiol.222211 (2023).
[20] Ma, J. & Wang, B. Segment anything in medical images. arXiv preprint arXiv:2304.12306, https://doi.org/10.48550/arXiv.2304.1230 (2023).
[21] Kirillov, A. et al. Segment anything. arXiv preprint arXiv:2304.02643, https://doi.org/10.48550/arXiv.2304.02643 (2023).
[22] Hu, M., Li, Y. & Yang, X. BreastSAM: A study of segment anything model for breast tumor detection in ultrasound images. arXiv preprint arXiv:2305.12447, https://doi.org/10.48550/arXiv.2305.12447 (2023).
[23] Xiong, X., Wang, C., Li, W. & Li, G. Mammo-SAM: Adapting foundation segment anything model for automatic breast mass segmentation in whole mammograms. In International Workshop on Machine Learning in Medical Imaging, 176–185, https://doi.org/10.1007/978-3-031-45673-2_18 (Springer, 2023).
[24] Hylton, N. M. et al. Locally advanced breast cancer: MR imaging for prediction of response to neoadjuvant chemotherapy—results from ACRIN 6657/I-SPY trial. Radiology 263, 663–672, https://doi.org/10.1148/radiol.12110748 (2012).
[25] Li, W. et al. I-SPY 2 breast dynamic contrast enhanced MRI trial (version 1) [data set]. The Cancer Imaging Archive, https://doi.org/10.7937/TCIA.D8Z0-9T85 (2022).
[26] Newitt, D. C. et al. ACRIN 6698/I-SPY2 breast DWI [data set]. The Cancer Imaging Archive, https://doi.org/10.7937/TCIA.KK02-6D95 (2021).
[27] Newitt, D. & Hylton, N. Single site breast DCE-MRI data and segmentations from patients undergoing neoadjuvant chemotherapy (version 3) [data set]. The Cancer Imaging Archive, https://doi.org/10.7937/K9/TCIA.2016.QHsyhJKy (2016).
[28] Saha, A. et al. Dynamic contrast-enhanced magnetic resonance images of breast cancer patients with tumor locations [data set]. The Cancer Imaging Archive, https://doi.org/10.7937/TCIA.e3sv-re93 (2021).
[29] Morris, E. et al. Using computer-extracted image phenotypes from tumors on breast MRI to predict stage [data set], https://doi.org/10.7937/K9/TCIA.2014.8SIPIY6G (2014).
[30] Saha, A. et al. A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features. British Journal of Cancer 119, 508–516, https://doi.org/10.1038/s41416-018-0185-8 (2018).
[31] Li, H. et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. npj Breast Cancer 2, 16012, https://doi.org/10.1038/npjbcancer.2016.12 (2016).
[32] Lancaster, J. L. & Martinez, M. J. Mango Viewer. http://rii.uthscsa.edu/mango/ (2019). Accessed: 2024-05-27.
[33] U.S. Census Bureau. QuickFacts: Houston, Texas/Harris County. United States Census (2016).
[34] Jia, J. Jingnan-jia/segmentation_metrics: v1.2.7. Zenodo, https://doi.org/10.5281/zenodo.12094185 (2024).