
A Survey of Publicly Available MRI Datasets For Potential Use in Artificial Intelligence Research



Article in Journal of Magnetic Resonance Imaging · October 2023


DOI: 10.1002/jmri.29101



REVIEW

A Survey of Publicly Available MRI Datasets for Potential Use in Artificial Intelligence Research
Katharine A. Dishner, BA,1,2* Bala McRae-Posani, MSc,1,3 Arka Bhowmik, PhD,1
Maxine S. Jochelson, MD,1 Andrei Holodny, MD,1,4,5 Katja Pinker, MD, PhD,1
Sarah Eskreis-Winkler, MD, PhD,1 and Joseph N. Stember, MD, PhD1,4

Artificial intelligence (AI) has the potential to bring transformative improvements to the field of radiology; yet, there are barriers to widespread clinical adoption. One of the most important barriers has been access to large, well-annotated, widely representative medical image datasets, which can be used to accurately train AI programs. Creating such datasets requires time and expertise and runs into constraints around data security and interoperability, patient privacy, and appropriate data use. Recognizing these challenges, several institutions have started curating and providing publicly available, high-quality datasets that can be accessed by researchers to advance AI models. The purpose of this work was to review the publicly available MRI datasets that can be used for AI research in radiology. Although this is an emerging field, a simple internet search for open MRI datasets presents an overwhelming number of results. Therefore, we decided to create a survey of the major publicly accessible MRI datasets in different subfields of radiology (brain, body, and musculoskeletal), and list the most important features of value to the AI researcher. To complete this review, we searched for publicly available MRI datasets and assessed them based on several parameters (number of subjects, demographics, area of interest, technical features, and annotations). We reviewed 110 datasets across sub-fields with 1,686,245 subjects in 12 different areas of interest ranging from spine to cardiac. This review is meant to serve as a reference for researchers to help spur advancements in the field of AI for radiology.
Level of Evidence: Level 4
Technical Efficacy: Stage 6
J. MAGN. RESON. IMAGING 2024;59:450–480.

Much has been published about the transformative potential of artificial intelligence (AI) for medicine.1–6 In radiology, AI advocates argue that adopting AI could facilitate workflow efficiencies, shorten reading times, facilitate earlier disease detection, and enhance diagnostic accuracy.4 Yet, widespread clinical adaptability and adoption of AI in radiology continues to lag behind its potential.7 Barriers include algorithmic and hardware limitations, regulatory hurdles, and, most importantly, the limited availability of appropriate training data. Deep learning usually requires large-scale, high-quality, and widely representative medical image datasets with clinical annotations. When it comes to AI algorithms, "quality in, quality out" is an abiding limiting principle, as the output of the algorithms can only be as good as the data used to train, validate, and test them.

The limited availability of high-quality and accessible datasets is accounted for by many factors. As mentioned above, effective training of neural networks, especially those based on traditional neural network optimizing approaches such as stochastic gradient descent, demands hundreds, if not thousands, of images, and hardware capabilities advanced enough to process such large amounts of data. This amount of imaging data is unlikely to exist at any single research institution. Issues of data security, interoperability, and proprietary concerns complicate

View this article online at wileyonlinelibrary.com. DOI: 10.1002/jmri.29101

Received Jul 31, 2023, Accepted for publication Oct 16, 2023.
*Address reprint requests to: K.A.D., SUNY Downstate College of Medicine, Brooklyn, NY 11203, USA. E-mail: katharine.dishner@downstate.edu
The first two authors are the Co-first authors.

From the 1Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, New York, USA; 2SUNY Downstate College of Medicine,
Brooklyn, New York, USA; 3Weill Cornell Medicine, New York City, New York, USA; 4Department of Radiology, Weill Cornell Medicine, New York City, New
York, USA; and 5Department of Neuroscience, Weill Cornell Graduate School of the Medical Sciences, New York City, New York, USA

© 2023 International Society for Magnetic Resonance in Medicine.


15222586, 2024, 2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jmri.29101 by Test, Wiley Online Library on [29/01/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Dishner et al.: MRI in AI Research

the pooling of data between universities and hospitals, which might otherwise allow them to reach a critical mass.8,9 The highest quality datasets, which would allow for accurate classification and detection, are those with plentiful labels and authoritative annotations, such as ground truth disease status, segmentations, and demographics. Such annotations require time and domain expertise from board-certified radiologists.2,9 An additional issue to consider is the dearth of data on rare diseases. Such data are inherently limited in their amount and likely only exist in specialized tertiary care institutions and academic hospitals that have the resources and expertise to treat these patients. Another factor is data heterogeneity, which is critical for the AI applications to be generalizable beyond the training datasets.10 Data heterogeneity can stem from images being captured on equipment made by different vendors at different institutions, as well as the variability in the protocols used during image capture, including choice of sequence parameters such as TE, TR, TI, length of acquisition, 2D vs. 3D, and so on. Equally important is the lack of diverse datasets containing images representative of a range of countries, ethnicities, and other socioeconomic variables, which can lead to unintended bias in the algorithms and ultimately affect patient care.11

Complicating these issues is the concern around protecting patient privacy and ensuring appropriate data use.12,13 Collating large datasets, especially across institutional boundaries, needs an increased emphasis on ensuring institutional review board (IRB) approvals and rigorous de-identification of image labels, and of the images themselves, where sometimes patients' personal effects could creep in as artifacts. This again is a resource-intensive activity, both in terms of time and labor, requiring specialized skills.

In recognition of the importance of having access to high-quality medical image datasets, and of the challenges involved in collating them, several admirable initiatives have been undertaken by universities, hospitals, government agencies, and other research institutes toward curating the databases and making them accessible to anyone who needs them for research purposes. These include institutional efforts such as The Cancer Imaging Archive (TCIA) in the US or Observatoire Français de la Sclérose en Plaques in France, and individual researchers who made their published research datasets accessible as best practice to advance other research using the data. An additional avenue for well-curated datasets has been public image analysis competitions, such as those conducted by the Radiological Society of North America and the brain tumor segmentation (BraTS) Challenge, which encourage collaboration and community building through competition, while advancing AI research in radiology.14

The purpose of this work was to review the publicly available MRI datasets, which can be used for AI research in radiology. Despite the relatively nascent nature of the field, a simple search for open MRI datasets on search engines like PubMed or Google presents a bewildering number of datasets that are highly variable in size, curation, credibility, and accessibility. Through this work, we hope to provide the reader with a ready review of the major publicly accessible MRI datasets in different subfields of radiology (brain, body, and musculoskeletal), and to list their most important features which might prompt further inquiry.

Materials and Methods

Data Collection
Between July 1 and 15, 2023, Google, Dataset Search, Scientific Data, and PubMed searches were performed with the following phrases: "Publicly available MRI databases for AI research," "Open access MRI databases for AI research," and "Publicly accessible MRI databases for AI research." The above searches were then repeated with the term "database" replaced with the term "dataset." The searches were not limited to any specific time period. Each search generated millions of results. Successive pages of search results were reviewed and each result was identified as a potential dataset to review. The concept of "theoretical saturation" from qualitative research was borrowed to guide the data collection.15 "Data saturation" was reached when enough dataset sources were collected such that no new information was being discovered by clicking on successive search result pages.

Criteria for Inclusion and Exclusion
Among the datasets, some had their own dedicated websites (e.g., ABIDE). All such datasets were included in this review. Others were part of a portal that hosted multiple (from 2 to >575) datasets (e.g., TCIA, OpenNeuro). For portals with 25 datasets or fewer, all datasets with human MR images were included. For portals with 26–50 datasets, all datasets with 50 or more subjects were included. The number of subjects was chosen as an inclusion criterion because AI applications train best on larger datasets. While 50 is still a relatively low number, some small data AI approaches could still be used for training on them. Further, smaller datasets could be pooled together to create larger and potentially more heterogeneous datasets. For portals with more than 50 datasets, only the 11 largest datasets were included. Finally, as a special case, given the abundance of brain MRI datasets, any dataset with n < 50 was excluded.

The list of search results was filtered down to 110 datasets with 1,686,245 subjects and 12 different areas of interest (see Fig. 1). Some of these datasets showed overlaps regarding subjects (e.g., some of the entries in the NKI Rockland Sample dataset were also part of the 1000 Functional Connectomes Project dataset).

Parameters Reported
For the choice of parameters to report on for each dataset, variables of interest for AI research were considered. Reported parameters included: the number of subjects, demographic information, type of access (e.g., if registration is required), single or multi-center study, organizing principle of the dataset (disease, anatomy, and healthy volunteers), technical specification of the images (sequence of the MRI and co-registration status), and availability of clinical metadata. Additionally, any labels/annotations relating to the heterogeneity of the datasets, and/or country of origin were included in the "notable features" column.

For each dataset, a hyperlink was included to allow readers to locate datasets of interest. While all of these are freely available open-access datasets, some of them are one-click downloads and some require registration with the website and agreeing to their data sharing policy before being allowed access. Some datasets are restricted for academic or research use only, while others allow commercial use. We highlighted these restrictions as applicable.

Results
The 110 datasets were organized into three major categories by anatomy: brain, body, and musculoskeletal. Figure 1 shows the breakdown of datasets by areas of interest within these categories. Table 1 presents an overview of the number of datasets and subjects in each sub-category within brain, body, and musculoskeletal categories.

Brain Datasets
The brain datasets were divided into neurodegeneration (n = 9, Table 2), development/psychology (n = 11, Table 3), healthy brain (n = 8, Table 4), and miscellaneous, which included brain cancer, traumatic brain injury (TBI), and COVID-19 (n = 10, Table 5). OpenNeuro is a portal that hosts close to 600 (as of July 2023 when we last accessed it) different brain MRI datasets covering many topics in neuroscience. Of these, the 11 largest datasets were included (Table 6).

Body and Musculoskeletal Datasets
The body datasets were divided into kidney/liver (n = 6, Table 7), prostate (n = 13, Table 8), cardiac (n = 7, Table 9), breast (n = 11, Table 10), and general (n = 11, Table 11). The Cancer Imaging Archive (TCIA) houses 49 human, publicly available MRI databases. We selected all datasets with 50 or more subjects to include (n = 10, Table 11). The musculoskeletal datasets were divided into spine (n = 9, Table 12) and knee (n = 4, Table 13).

FIGURE 1: 44.9% of the selected datasets were brain, 43.1% were body, and 11.9% were musculoskeletal.

TABLE 1. Overview of Reviewed Datasets

| Anatomy | Category | # Datasets | # Subjects |
|---|---|---|---|
| Brain | Neurodegeneration | 9 | 15,261 |
| | Development/psychology | 11 | 19,717 |
| | OpenNeuro | 11 | 4,744 |
| | Miscellaneous brain | 18 | 86,050 |
| | Total | 49 | 125,772 |
| Body | Prostate | 13 | 3,787 |
| | Breast | 11 | 3,605 |
| | Cardiac | 7 | 3,329 |
| | Kidney/liver | 6 | 577 |
| | General | 11 | 1,540,623 |
| | Total | 48 | 1,551,921 |
| Musculoskeletal | Knee | 4 | 7,821 |
| | Spine | 9 | 731 |
| | Total | 13 | 8,552 |
| Total | | 110 | 1,686,245 |

This table summarizes the number of datasets and number of subjects per subcategory within brain, body, and musculoskeletal system. There are a total of 110 datasets. Forty-nine were brain, 48 were body, and 13 were musculoskeletal. There were 125,772 subjects in the brain datasets; 1,551,921 subjects in the body datasets; and 8,552 subjects in the musculoskeletal datasets.
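The subtotals and grand totals reported in Table 1 can be re-derived from the per-category rows. The following is a minimal sketch, assuming the counts are transcribed by hand into a Python dictionary; the names `TABLE_1` and `anatomy_totals` are illustrative and do not come from the article. Shares of datasets computed this way use the Table 1 grand total as the denominator and need not match the Figure 1 caption to the last digit, since the caption does not state its denominator.

```python
# (datasets, subjects) per subcategory, transcribed by hand from Table 1.
TABLE_1 = {
    "Brain": {
        "Neurodegeneration": (9, 15_261),
        "Development/psychology": (11, 19_717),
        "OpenNeuro": (11, 4_744),
        "Miscellaneous brain": (18, 86_050),
    },
    "Body": {
        "Prostate": (13, 3_787),
        "Breast": (11, 3_605),
        "Cardiac": (7, 3_329),
        "Kidney/liver": (6, 577),
        "General": (11, 1_540_623),
    },
    "Musculoskeletal": {
        "Knee": (4, 7_821),
        "Spine": (9, 731),
    },
}


def anatomy_totals(table):
    """Sum dataset and subject counts over the subcategories of each anatomy."""
    return {
        anatomy: (
            sum(n for n, _ in rows.values()),
            sum(s for _, s in rows.values()),
        )
        for anatomy, rows in table.items()
    }


totals = anatomy_totals(TABLE_1)
grand_datasets = sum(n for n, _ in totals.values())
grand_subjects = sum(s for _, s in totals.values())

# Share of datasets per anatomy in percent, relative to the Table 1 grand total.
shares = {a: round(100 * n / grand_datasets, 1) for a, (n, _) in totals.items()}

for anatomy, (n, s) in totals.items():
    print(f"{anatomy}: {n} datasets, {s:,} subjects, {shares[anatomy]}% of datasets")
print(f"Total: {grand_datasets} datasets, {grand_subjects:,} subjects")
```

Keeping the rows as plain tuples makes it easy to extend the same check to the per-table subject counts quoted in the table notes.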

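The portal-size rules in the Criteria for Inclusion and Exclusion section can be read as a small decision procedure. The sketch below is one possible reading, not the authors' own tooling. It assumes that "largest" means ranked by number of subjects and that the brain-specific exclusion (n < 50) is applied after the portal rule. The `Dataset` class, the `is_brain` flag, and the function name `select_from_portal` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    subjects: int
    is_brain: bool = False  # hypothetical flag used for the brain special case


def select_from_portal(datasets):
    """Apply the portal-size inclusion rules to one portal's human MRI datasets."""
    n = len(datasets)
    if n <= 25:
        # Small portal: every human MRI dataset is kept.
        kept = list(datasets)
    elif n <= 50:
        # Mid-size portal: keep datasets with 50 or more subjects.
        kept = [d for d in datasets if d.subjects >= 50]
    else:
        # Large portal: keep only the 11 largest (assumed: by subject count).
        kept = sorted(datasets, key=lambda d: d.subjects, reverse=True)[:11]
    # Special case: brain datasets with fewer than 50 subjects are excluded.
    return [d for d in kept if not (d.is_brain and d.subjects < 50)]


if __name__ == "__main__":
    portal = [
        Dataset("body-small", 10),
        Dataset("brain-small", 20, is_brain=True),
        Dataset("brain-large", 500, is_brain=True),
    ]
    for d in select_from_portal(portal):
        print(d.name, d.subjects)
```

Datasets with their own dedicated websites were all included according to the text, so they would bypass this function entirely.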


TABLE 2. Neurodegeneration Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| Alzheimer's disease neuroimaging initiative | ADNI-1, ADNI-GO, ADNI-2, ADNI-3 | Academic and research purpose only, TA | Multicenter | 1921 | Alzheimer's disease (+normal controls) | DTI, fMRI, DWI, PWI, resting-state fMRI | All sequences are not co-registered | Originated from USA | Available TA. Age, inclusion and exclusion criteria differ for NC, MCI, and AD. |
| Dementias Platform | See note (a) | Academic, research, and commercial purposes, TA | Multicenter | 10,310 | Neurodegeneration (+normal controls) | Resting-state fMRI, task-based fMRI, T1w, T2w, and DWI | Not reported | This is a global dataset started in the UK | Available TA. Age (>18 years), inclusion and exclusion criteria differ for each cohort. |
| NITRC (Neuroimaging Tools and Resources Collaboratory) | Dallas lifespan brain study | Open access to all | Single | 315 | Alzheimer disease | MP-RAGE, DTI, DWI, task-based fMRI, resting-state fMRI | All sequences are not co-registered | Originated from USA | Not reported. Age (20–89 years), inclusion and exclusion criteria not reported. |
| NITRC | High-quality DWI of Parkinson's disease | Open access to all | Single | 53 | Parkinson's disease (+normal controls) | DWI (b-values 1000 and 2500 sec/mm²) | Co-registered | Originated from Belgium; the dataset contains 26 NC and 27 PD subjects | Limited metadata available. Age (47–81 years), inclusion and exclusion criteria by Ziegler et al.17 |
| OASIS brains datasets | OASIS-1: cross-sectional MRI data in young, middle aged, nondemented and demented older adults | Open access to all | Single | 416 | Alzheimer's disease (+normal controls) | T1w | Co-registered | Originated from USA | Limited metadata available. Age (18–96 years), inclusion and exclusion criteria by Marcus et al.18 |
| OASIS brains datasets | OASIS-2: longitudinal MRI data in nondemented and demented older adults | Open access to all | Single | 150 | Alzheimer's disease (+normal controls) | T1w | Co-registered | Originated from USA | Limited metadata available. Age (60–96 years), inclusion and exclusion criteria by Marcus et al.19 |
| OASIS brains datasets | OASIS-3: longitudinal multimodal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer's disease | Academic and research purpose only, TA (b) | Single | 1379 | Alzheimer's disease (+normal controls) | T1w, T2w, FLAIR, ASL, SWI, time of flight, resting-state fMRI, and DTI | All sequences are not co-registered but registered during postprocessing | Originated from USA | Available TA. Age (42–95 years), inclusion and exclusion criteria by LaMontagne et al.20 |
| OASIS brains datasets | OASIS-4: clinical cohort | Academic and research purpose only, TA (b) | Single | 663 | Alzheimer's disease (+normal controls) | T1w | Co-registered | Originated from USA | Clinical data available TA. Age (21–94 years), inclusion and exclusion criteria by Koenig et al.21 |
| BrainLife | MRI dataset of Nigerian brains | Academic and research purpose only, TA | Multicenter | 54 | Dementia and Parkinson's disease (+normal controls) | T1w, T2w, FLAIR, mask for T2w | NR | Originated from Africa | Available. Age (41–84 years), inclusion and exclusion criteria not reported. |

This table summarizes the neurodegenerative datasets. There were 9 datasets with 15,261 subjects across the datasets. Eleven different sequences/modalities were reported.
AD = Alzheimer disease; ADNI = Alzheimer's disease neuroimaging initiative; ASL = arterial spin labeling; DWI = diffusion-weighted imaging; DTI = diffusion tensor imaging; FLAIR = fluid attenuated inversion recovery; fMRI = functional MRI; MCI = mild cognitive impairment; MP-RAGE = magnetization prepared–rapid gradient echo T1w; NC = normal control; OASIS = open access series of imaging studies; PD = Parkinson's disease; PWI = proton weighted imaging; SWI = susceptibility-weighted imaging; TA = through application; T1w = T1-weighted; T2w = T2-weighted.
(a) This database consists of 19 smaller cohorts. Access to each cohort is only possible through application and upon agreement to the usage terms.
(b) Access to these databases is only possible through application and upon agreement to the usage terms.
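For readers who want to screen Table 2 by access terms before applying for data, the rows can be held as simple records. This is a minimal sketch under stated assumptions: the `Entry` class, the shortened access labels, and the helper `open_access` are illustrative, and only the dataset names, access types, and subject counts are transcribed from Table 2 (the four ADNI phases are kept as one entry, as in the table).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    dataset: str
    access: str    # shortened label: "open" or a "... (TA)" application-based tier
    subjects: int


# Names, access types, and subject counts transcribed by hand from Table 2.
TABLE_2 = [
    Entry("ADNI", "academic/research (TA)", 1921),
    Entry("Dementias Platform", "academic/research/commercial (TA)", 10_310),
    Entry("Dallas lifespan brain study", "open", 315),
    Entry("High-quality DWI of Parkinson's disease", "open", 53),
    Entry("OASIS-1", "open", 416),
    Entry("OASIS-2", "open", 150),
    Entry("OASIS-3", "academic/research (TA)", 1379),
    Entry("OASIS-4", "academic/research (TA)", 663),
    Entry("MRI dataset of Nigerian brains", "academic/research (TA)", 54),
]


def open_access(entries):
    """Return the entries that can be downloaded without an application."""
    return [e for e in entries if e.access == "open"]


total_subjects = sum(e.subjects for e in TABLE_2)
open_entries = open_access(TABLE_2)
open_subjects = sum(e.subjects for e in open_entries)

print(f"All entries: {len(TABLE_2)}, subjects: {total_subjects:,}")
print(f"Open access: {len(open_entries)}, subjects: {open_subjects:,}")
```

The same record layout could be extended with the remaining columns (centers, sequences, co-registration) if finer filtering is needed.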

TABLE 3. Development/Psychology Related Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| NITRC (Neuroimaging Tools and Resources Collaboratory) | The 1000 Functional Connectomes Project | Open access to all | Multicenter | 1288 | Further our understanding of the development of psychopathologies, neurocognitive impairment, developmental delays, etc. | Resting-state fMRI, MP-RAGE | All sequences are not co-registered | Data comes from 33 different institutions. Data originated from USA, China, Germany, Netherlands, Canada, Finland, UK, Taiwan, Austria | Limited metadata available. Age (children and adults), inclusion/exclusion criteria site dependent. |
| NITRC | CANDI Share: Schizophrenia Bulletin 2008 | Academic and research purpose only, TA | Multicenter | 103 | Bipolar (with and without psychosis), schizophrenia, and healthy controls | T1w, T2w | Co-registered | Dataset also contains anatomic segmentation of brain | Available. Age (6–17 years), inclusion and exclusion criteria provided by Frazier et al.22 |
| NITRC | Autism Brain Imaging Data Exchange I | Academic and research purpose only, TA | Multicenter | 1112 | Autism (+normal controls) | MP-RAGE, DTI, and resting-state fMRI | All sequences are not co-registered | Data comes from 17 different institutions. Data originated from USA, Germany, Netherlands, Ireland, Belgium | Available. Age (7–64 years), inclusion and exclusion criteria provided by Di Martino et al.23 |
| NITRC | Autism Brain Imaging Data Exchange II | Academic and research purpose only, TA | Multicenter | 1114 | Autism (+normal controls) | MP-RAGE, DTI, and resting-state fMRI | All sequences are not co-registered | Data comes from 19 different institutions. Data originated from USA, Switzerland, France, Belgium, Ireland | Available. Age (5–64 years), inclusion and exclusion criteria provided by Di Martino et al.24 |
| NY Langone Health | ABIDE corpus callosum and brain segmentation data | Open access to all | Multicenter | 1112 (mask) | Autism (+normal controls) | Same as ABIDE I dataset | Same as ABIDE I dataset | Same as ABIDE I dataset | Only mask available.25 All other details are same as ABIDE I dataset |
| NITRC | ADHD-200 | Open access to all | Multicenter | 776 | ADHD (+normal controls) | T1w, MP-RAGE, and resting-state fMRI | All sequences are not co-registered. Postprocessed data include denoised and co-registered volume. | Data comes from eight different institutions. Data originated from USA and China. | Available. Age (7–27 years), inclusion and exclusion criteria provided by Bellec et al.26 |
| NITRC | CMI Healthy Brain Network | Restricted access | Not reported | 10,000 (a) | Diagnosis and management of mental health and learning disorders | T1w, T2w, DKI, and resting-state fMRI | Not reported | Data originated from New York City area children and adolescents | Restricted access. Age (5–21 years), inclusion and exclusion by Alexander et al.27 |
| NITRC | Brain Genomics Superstructure Project | Restricted access | Multicenter | 1570 | Brain and behavior of healthy subjects | T1w, MP-RAGE, and resting-state fMRI | Not reported | Each functional acquisition is accompanied by a fully automated quality assessment and precomputed brain morphometrics. Data originated from USA. | Available. Age (18–35 years), inclusion and exclusion by Holmes et al.28 |
| NITRC | Pediatric Imaging, Neurocognition, and Genetics Study | Restricted access | Multicenter | 1400 | Individual differences in brain structure and connectivity, cognition, and personality | Not reported | Not reported | Data comes from 10 different institutions and includes data from children and adolescents. Data originated from USA. | Restricted access. Age (3–20 years), inclusion and exclusion NR. |
| OpenNeuro | TractoInferno: A large-scale, open-source, multisite database tractography | Open access to all | Multicenter | 354 | Individuals with bilingual, schizophrenia, bipolar disorder, ADHD, and sleepy brain | T1w, DTI, DWI, field maps, tractogram | All sequences are not co-registered. | Data came from six different institutions. Data originated from France, UK, USA, Sweden, Canada. | Restricted access. Age (21–58 years), inclusion and exclusion by Poulin et al.29 |
| Mri-share database | A brain MRI study of Bordeaux university students | Academic and research purpose only, TA | Single | 2000 | University student (neuropsychiatric condition such as migraine, depression and anxiety disorders, and substance abuse) | T1w, FLAIR, SWI, and resting-state fMRI | Not reported | Data originated from France | Restricted access. Age (18–35 years), inclusion and exclusion criteria by Tsuchida et al.30 |

This table represents the 11 datasets focused on brain MRIs that would aid researchers interested in development/psychology. There were 19,717 total subjects across the datasets.
ADHD = attention deficit hyperactivity disorder; CANDI = the child and adolescent neuro development initiative; CMI = child mind institute; DKI = diffusion kurtosis imaging; DTI = diffusion tensor imaging; DWI = diffusion-weighted imaging; FLAIR = fluid attenuated inversion recovery; fMRI = functional MRI; MP-RAGE = magnetization prepared–rapid gradient echo T1w; SWI = susceptibility weighted imaging; TA = through application; T1w = T1-weighted; T2w = T2-weighted.
(a) This study is prospective in nature and the number represents the set goal. The current volume of the dataset for the study is not reported.

TABLE 4. Healthy Neuro Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| NITRC | Beijing University–Enhanced, Eyes Open Eyes Closed, Short TR | Open access to all | Single | 256 | Healthy subjects | DTI, (MP-RAGE with and without skull), resting-state fMRI | Not co-registered | This data comes from a community sample at Beijing Normal University in China | Available. Age (18–26 years), inclusion and exclusion by Tian et al. and Yan et al.31,32 |
| NITRC | Consortium for reliability and reproducibility | Academic and research purpose only, TA | Multicenter | 1629 | Test–retest, reliability, reproducibility | Resting-state fMRI, DTI, ASL | Not reported | This data comes from 33 different institutions from Germany, China, USA, Canada | Available. Age (6–88 years), inclusion and exclusion not reported. |
| NITRC | INDI NKI/Rockland Sample | Open access to all | Single | 207 | Exploratory study of healthy subjects | Resting-state fMRI, DTI, MP-RAGE, T2w | Not reported | This data comes from subjects from USA | Available. Age (4–85 years), inclusion and exclusion not reported. |
| NITRC | IXI Dataset | Open access to all | Multicenter | 584 | Healthy subjects | T1w, T2w, PD, MRA, DWI | Co-registered | This data originated from UK and comes from three different scanners | Available. Age (19–86 years), inclusion/exclusion criteria study/atlas dependent. |
| NY Langone Health | fastMRI (Brain) | Academic and research purpose only, TA | Multicenter | 6,970 | Healthy and non-healthy brain MRIs | T1w, T2w, FLAIR | Not reported | Dataset originated from USA | Restricted access. Other details not reported. |
| Synapse | Cuban Human Brain Mapping Project | Open access to all | Single | 282 | Brain mapping of healthy subjects | T1w, DWI, field map | All sequences are not co-registered | Dataset originated from Cuba | Available. Age (18–68 years), inclusion and exclusion criteria by Valdes-Sosa et al.33 |
| Figshare | Pediatric Template of Brain Perfusion Dataset | Open access to all | Single | 120 | Perfusion study of pediatric brain | T1w, pCASL images, DTI, BOLD | Co-registered | Dataset originated from USA | Available. Age (7–18 years), inclusion and exclusion criteria by Avants et al.34 |
| Zenodo | M4Raw dataset | Open access to all | Single | 183 | Includes k-space data of healthy subjects | Only T1w, T2w, FLAIR | Not reported | Dataset originated from China wherein an open MRI is used | Available. Age (18–32 years), inclusion and exclusion criteria by Lyu et al.35 |

This table represents the 8 datasets consisting of brain MRIs of healthy patients. There were 10,231 subjects across the datasets. Eleven sequences/modalities were reported.
ASL = arterial spin labeling; BOLD = blood oxygen level-dependent; DTI = diffusion tensor imaging; DWI = diffusion-weighted imaging; FLAIR = fluid attenuated inversion recovery; fMRI = functional MRI; INDI = international neuroimaging data-sharing initiative; IXI = information extraction from images; MP-RAGE = magnetization-prepared rapid gradient-echo; MRA = MR angiography; NITRC = Neuroimaging Tools and Resources Collaboratory; NKI = Nathan Kline Institute; pCASL = pseudo continuous arterial spin labeled; PD = proton-density; TA = through application; T1w = T1-weighted; T2w = T2-weighted.

TABLE 5. Miscellaneous Brain Datasets (Cancer, Traumatic Brain Injury, and COVID-19)

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| TCIA | The University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) | Open access to all | Single | 495 | Glioblastoma, astrocytoma, oligodendroglioma | T2w, FLAIR, SWI, DWI, pre- and post-contrast T1w, ASL, and HARDI | All sequences are not co-registered; co-registered in post-processed data. | Images underwent automated segmentation and then were manually corrected by trained radiologists and approved by two expert reviewers. Segmentation included three major tumor compartments: enhancing tumor, non-enhancing/necrotic tumor, and surrounding FLAIR abnormality (edema). | Available. Age (17–94 years), inclusion criteria brain tumor.36 |
| TCIA | Multi-parametric MRI (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM) | Open access to all | Single | 630 | Glioblastoma (+segmentation mask) | T1w, T2w, DTI, FLAIR | All sequences are not co-registered; co-registered in post-processed data. | This dataset has labels that were used to highlight the following imaging features: intensity, volumetric, morphologic, histogram-based, and textural | Available. Age (18–70 years), inclusion criteria brain tumor.37 |
| Stanford University Center for Artificial Intelligence in Medicine and Imaging (AIMI) | BrainMetShare | Academic and research purpose only, TA | Single | 156 | Brain metastasis | T1 spin-echo pre-contrast, T1 spin-echo post-contrast, T1 gradient-echo post, T2 FLAIR post | Co-registered | 105 cases in this dataset have radiologist-drawn segmentations of the metastatic lesions | Available. Age (29–92 years), inclusion criteria brain lesion by Grøvik et al.38 |
| Center for Biomedical Image Computing & Analytics | BraTS Challenge 2021 | Open access to all | Multicenter | 4500 | Broad brain data including cancer | T1w, T2w, T2-FLAIR | All sequences are not co-registered; co-registered in post-processed data. | This dataset has ground truth annotations of the tumor sub-regions | Limited metadata by Baid et al.39 |

Not reported 66,935a TBI Not reported Not reported Restricted access

Volume 59, No. 2


15222586, 2024, 2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jmri.29101 by Test, Wiley Online Library on [29/01/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
TABLE 5. Continued

Sequence Co-
Portal Datasets Access Type Centers Subjects Theme Sequence/Modality registration Notable Features Clinical Metadata

February 2024
National Federal Interagency Academic and Dataset originated
Institutes of Traumatic Brain research purpose from USA
Health Injury Research only TA
(FITBIR)

TCIA COVID-19-NY- Open access to all Single 1384 COVID-19 brain Not reported Not reported Dataset originated Available.
SBU MRIs from USA
Age (18–90 years),
inclusion and
exclusion criteria not
reported.

Figshare Annotated brain Open access to all Multicenter 75 Harboring brain T1w Only one sequence Dataset originated Limited metadata
metastasis images metastasis from Spain and available.
with clinical and (+segmentation are from five
Age not reported,
radiomic data mask) different institutes
Inclusion criteria
brain lesion40

American College ACRIN-DSC-MR- Open access to all Multicenter 123 Recurrent glioblastoma Post treatment FLAIR All sequences are not Dataset originated Available.
of Radiology Brain and T1w co-registered from USA
Age (23–87 years),
Imaging
Inclusion and
network
exclusion41

Advanced Research ATLAS v 2.0 Academic and Single 1271 exams Stroke patient only T1w (+stroke region Only one sequence Data originated from Limited metadata
on Disability– research purpose (number of segmentation mask) USA available. Other
Anatomical only TA subjects NR) details by
Tracings of Liewet al.16
Lesions after
Stroke (data
from 20 cohort)

ISLES 2022 Stroke ISLES 2022 Open access to all Multicenter 250 Stroke patient only FLAIR, DWI, ADC All sequences are not Data originated from Only raw sequence and
lesion Grand map co-registered Germany, mask available. All
Challenge Switzerland other details by
Dataset (data Hernandez
from three Petzsche42
institutes and
segmentation
mask included)

This table represents the 10 datasets focused on brain MRIs of patients with brain tumors, traumatic brain injuries, and COVID-19. There were 75,819 subjects across the datasets. Nine different sequences/modalities were reported.
ADC = apparent diffusion coefficient; ASL = arterial spin labeling; ATLAS = anatomical tracings of lesions after stroke; COVID-19-NY-SBU = Stony Brook University COVID-19 positive cases; DTI = diffusion tensor imaging; DWI = diffusion-weighted imaging; FLAIR = fluid attenuated inversion recovery; HARDI = high angular resolution diffusion imaging; ISLES = ischemic stroke lesion segmentation challenge; SWI = susceptibility-weighted imaging; TA = through application; TCIA = the cancer imaging archive; T1w = T1-weighted; T2w = T2-weighted.
a The current volume of dataset indicates available brain MRIs as of June 2023. Additionally, it was unclear if this number was individual subjects or scans.

TABLE 6. OpenNeuro Brain Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenNeuro | Queensland Twin Imaging (QTIM) | Open access to all | Multicenter | 1202 | Neuroimaging dataset of young adult healthy twins and siblings | T1w and fMRI | Not co-registered | Data comes from Queensland, Australia | Available. Age (18–30 years), inclusion and exclusion criteria43–45 |
| OpenNeuro | A longitudinal neuroimaging dataset on language processing in children ages 5, 7, and 9 years old | Open access to all | Single | 322 | Language processing in healthy children | T1w and fMRI | All sequences are not co-registered | Data originated from USA | Available. Age (5–9 years), inclusion and exclusion criteria available. |
| OpenNeuro | Neurocognitive aging data release with behavioral, structural, and multi-echo functional MRI measures | Open access to all | Multicenter | 301 | Aging of healthy subjects | T1w, FLAIR, fMRI | Co-registered | Data originated from USA and Canada | Available. Age (18–34 years and 60–89 years), inclusion and exclusion criteria46 |
| OpenNeuro | Amsterdam Open MRI Collection (AOMIC)–ID1000 | Open access to all | Single | 928 | Healthy subjects (assessment of vision, emotion, memory, cognitive control, and response inhibition) | T1w, DWI, task-based fMRI, rest-state fMRI | Not co-registered | The study population came from Amsterdam | Available. Age (19–26 years), inclusion and exclusion criteria47 |
| OpenNeuro | Narratives | Open access to all | Not reported | 345 | Naturalistic language comprehension of healthy subjects | T1w and fMRI | Not co-registered | Dataset originated from USA | Available. Age (18–47 years), inclusion and exclusion not reported |
| OpenNeuro | MPI-Leipzig_Mind-Brain-Body | Open access to all | Multicenter | 318 | Mind, brain, body connection of healthy subjects | T1w, T2w, FLAIR, SWI, fMRI, DWI | All sequences are not co-registered | Dataset originated from Germany | Limited metadata available. Age (20–35 years and 59–77 years), inclusion and exclusion.48 |
| OpenNeuro | The human Voice Areas: spatial organization and inter-individual variability in temporal and extra-temporal cortices | Open access to all | Single | 217 | Auditory areas of the brain healthy subjects | T1w and fMRI | Not co-registered | Dataset originated from UK | Limited metadata available. Age (17–31 years), inclusion and exclusion.49 |
| OpenNeuro | UCLA Consortium for Neuropsychiatric Phenomics LA5c Study | Open access to all | Single | 272 | Neuropsychiatric patients (+normal controls) | T1w, DWI, fMRI | Not co-registered | Dataset originated from USA | Available. Age (21–50 years), inclusion and exclusion50 |
| OpenNeuro | Lausanne_TOF-MRA_Aneurysm_Cohort | Open access to all | Single | 284 | Aneurysms (+normal controls) | T1w and MRA | Not co-registered | Data comes from the Lausanne University Hospital in Switzerland | Available. Age (35–67 years), inclusion and exclusion criteria51 |
| OpenNeuro | Queensland Twin Adolescent Brain (QTAB) | Open access to all | Multicenter | 417 | Neuroimaging dataset of adolescent healthy twins | T1w, MP-RAGE, T2w, FLAIR, TSE, SWI, resting-state fMRI, DWI, ASL | All sequences are not co-registered | Data comes from Queensland, Australia | Available. Age (9–14 years and 10–16 years), inclusion and exclusion criteria52 |
| OpenNeuro | SUDMEX_CONN: The Mexican dataset of cocaine use disorder patients | Open access to all | Single | 138 | Subjects with cocaine use disorder (+normal controls) | T1w, DWI, resting state fMRI | Not co-registered | Data originated from Mexico | Available. Age (22–39 years), inclusion and exclusion criteria53 |

This table represents the 11 largest datasets from OpenNeuro. There were 4744 subjects across these datasets. Ten different sequences/modalities were reported.
ASL = arterial spin labeling; DWI = diffusion weighted imaging; FLAIR = fluid attenuated inversion recovery; fMRI = functional MRI; MPI = Max Planck Institute; MP-RAGE = magnetization prepared–rapid gradient echo T1w; MRA = MR angiography; SWI = susceptibility-weighted imaging; T1w = T1-weighted; T2w = T2-weighted; TSE = turbo spin echo; UCLA = University of California Los Angeles.
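A reader choosing among the Table 6 datasets will often want only those that provide a particular combination of sequences. The following is a minimal sketch of that selection, not part of the original survey. The sequence sets are transcribed from Table 6, with one simplifying assumption: the task-based, resting-state, and unspecified fMRI entries are all collapsed into a single "fMRI" label. The function name `datasets_with` is hypothetical.

```python
# Sequences per OpenNeuro dataset, transcribed from Table 6.
# Assumption: all fMRI variants (task-based, resting-state) are stored as "fMRI".
TABLE6_SEQUENCES = {
    "QTIM": {"T1w", "fMRI"},
    "Language processing in children": {"T1w", "fMRI"},
    "Neurocognitive aging": {"T1w", "FLAIR", "fMRI"},
    "AOMIC-ID1000": {"T1w", "DWI", "fMRI"},
    "Narratives": {"T1w", "fMRI"},
    "MPI-Leipzig_Mind-Brain-Body": {"T1w", "T2w", "FLAIR", "SWI", "fMRI", "DWI"},
    "Human Voice Areas": {"T1w", "fMRI"},
    "UCLA LA5c": {"T1w", "DWI", "fMRI"},
    "Lausanne_TOF-MRA_Aneurysm_Cohort": {"T1w", "MRA"},
    "QTAB": {"T1w", "MP-RAGE", "T2w", "FLAIR", "TSE", "SWI", "fMRI", "DWI", "ASL"},
    "SUDMEX_CONN": {"T1w", "DWI", "fMRI"},
}


def datasets_with(required, table=TABLE6_SEQUENCES):
    """Return the sorted names of datasets that list every required sequence."""
    required = set(required)
    return sorted(name for name, seqs in table.items() if required <= seqs)


# Datasets that list both a T1-weighted and a diffusion-weighted sequence.
t1_and_dwi = datasets_with({"T1w", "DWI"})
print(t1_and_dwi)
```

The subset test (`required <= seqs`) keeps the query independent of how many sequences a dataset offers, so the same helper works for single-sequence and multi-sequence requirements.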

TABLE 7. Kidney and Liver Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TCIA | The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC) | Open access to all | Multicenter | 97 | Liver hepatocellular carcinoma | T1w, T2w | Co-registered | Dataset came from multiple institutes having difference in MR protocols | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma Collection (TCGA-KIRC) | Open access to all | Multicenter | 267 | Renal clear cell carcinoma | T1w, in phase sequence, T2w FRFSE, T2w SSFSE | All sequences are not co-registered | Dataset came from multiple institutes having difference in MR protocols | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | The Clinical Proteomic Tumor Analysis Consortium Clear Cell Renal Cell Carcinoma Collection (CPTAC-CCRCC) | Open access to all | Not reported | 60 | Clear cell renal carcinoma | T1w, in phase sequence, out phase sequence, T2w FRFSE, T2w SSFSE, DWI, ADC | All sequences are not co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner. | Available image matched clinical and proteomics data from NIH OCCPR portal. |
| TCIA | The Cancer Genome Atlas Kidney Chromophobe Collection (TCGA-KICH) | Open access to all | Multicenter | 15 | Kidney cancer genotypes | T1w, in phase sequence, out phase sequence, T2w FatSat, T2w SSFSE, DWI | All sequences are not co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner. | Available image matched clinical, genetic and pathological data from NIH TCGA portal |
| TCIA | The Cancer Genome Atlas Cervical Kidney Renal Papillary Cell Carcinoma Collection (TCGA-KIRP) | Open access to all | Multicenter | 33 | Renal papillary cell carcinoma | T1w, in phase sequence, T2w | Co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner. | Available image matched clinical, genetic and pathological data from NIH TCGA portal |
| Zenodo | Duke Liver Database | Restricted Access | Single center | 105 | Routine liver MRI protocols | Axial in-phase, axial opposed, axial precontrast fat-suppressed T1w, and contrast-enhanced portal venous T1w | Not reported | Dataset originated from USA, and includes liver segmentation details of 95 subjects. | Limited metadata available. MR series keys and classification keys are available. |

This table represents the 6 datasets focused on kidney and/or liver. The datasets came from patients with cancer. There were 577 subjects across these datasets. Seven different sequences/modalities were reported in these datasets.
ADC = apparent diffusion coefficient; DWI = diffusion weighted imaging; FatSat = fat saturated; FR-FSE = fast relaxation fast spin echo; OCCPR = office of cancer clinical proteomics research; SSFSE = single-shot fast spin-echo sequence; TCGA = The Cancer Genome Atlas; TCIA = the cancer imaging archive; T1w = T1-weighted; T2w = T2-weighted.
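The subject total quoted in the Table 7 summary can be reproduced from the per-dataset counts. The sketch below is illustrative only and is not taken from the survey: the dictionary keys are shortened dataset labels from Table 7, and the helper name `total_subjects` is hypothetical.

```python
# Subject counts transcribed from the "Subjects" column of Table 7.
TABLE7_SUBJECTS = {
    "TCGA-LIHC": 97,
    "TCGA-KIRC": 267,
    "CPTAC-CCRCC": 60,
    "TCGA-KICH": 15,
    "TCGA-KIRP": 33,
    "Duke Liver Database": 105,
}


def total_subjects(counts):
    """Sum the per-dataset subject counts."""
    return sum(counts.values())


total = total_subjects(TABLE7_SUBJECTS)
print(f"{len(TABLE7_SUBJECTS)} datasets, {total} subjects")
```

The same tally applies to the other tables, with the caveat noted in their footnotes that some counts are exams rather than unique patients, or include subjects without MR images.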

TABLE 8. Prostate Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initiative for Collaborative Computer Vision Benchmarking | Initiative for Collaborative Computer Vision Benchmarking | Open access to all | Not reported | 12 | Prostate cancer | T2w, DCE, DWI, MRS | All sequences are not co-registered | Dataset includes ground-truth images of prostate gland, peripheral zone, central gland, and cancer | Not reported |
| NYU Langone Health | fastMRI (Prostate) | Academic and research purpose only TA | Multicenter | 312a | Abnormal and normal prostate MRIs | T2w, DWI | Not reported | Dataset has labels that indicate the presence and grade of prostate cancer | Restricted access. Age, inclusion and exclusion criteria not reported |
| TCIA | Prostate-Diagnosis | Open access to all | Single | 92 | Prostate cancer | T1w, T2w | Co-registered | Dataset includes segmentation masks for prostate gland, peripheral zone, and other important structures | Detailed pathology report is available |
| TCIA | PROSTATEx | Open access to all | Single | 346 | Prostate cancer | T2w, PD, DCE, DWI, ADC | All sequences are not co-registered | Dataset originated from Netherlands and includes Ktrans maps | Not reported |
| TCIA | Prostate-3T | Open access to all | Single | 64 | Prostate cancer | T2w | Single sequence | Dataset originated from Netherlands and segmentation mask | Not reported |
| TCIA | Prostate Fused-MRI-Pathology | Open access to all | Multicenter | 28 | Prostate cancer | T1w, T2w, DWI, and DCE | Co-registered | Dataset originated from USA, includes pathology images and mapping of extent of prostate cancer | Not reported |
| TCIA | Prostate-MRI | Open access to all | Single | 26 | Prostate cancer | T1w, T2w, DWI | Co-registered | Dataset originated from USA, includes pathology images | Not reported |
| TCIA | QIN-PROSTATE | Restricted access | Single | 22 | Prostate cancer | T1w, T2w, DWI, and ADC | Not co-registered | Dataset originated from USA | Not reported |
| TCIA | QIN-PROSTATE-Repeatability | Open access to all | Single | 15 | Prostate cancer | T1w, T2w, DWI, and ADC | Not co-registered | This dataset has manual segmentations of the total prostate gland, peripheral zone, suspected tumor, and normal regions (where applicable). | Not reported |
| TCIA | Prostate-MRI-US-Biopsy | Open access to all | Single | 1151 | Prostate cancer | T2w, DWI, and PWI | Co-registered | Dataset originated from USA | Not reported |
| Grand Challenge | Prostate158 | Restricted access | Single | 139 | Prostate cancer | T2w, DWI, ADC | Not reported | Dataset originated from Germany and includes segmentation mask of central gland, peripheral zone, and lesions | Restricted access. Age (<50 years), inclusion and exclusion criteria54 |
| Grand Challenge | PI-CAI Challenge | Academic and research purpose only TA | Multicenter | 1500 | Prostate cancer | T2w, DWI, ADC | Not reported | Dataset originated from Netherlands and includes segmentation details | Restricted access |
| Grand Challenge | PROMISE12 challenge | Open access to all | Multicenter | 80 | Prostate cancer (+prostate volume) | T2w | Only one sequence | Dataset originated from Norway, USA, Netherlands, and consists of segmentation details | Not reported |

This table represents the 13 datasets focused on prostate MRIs. The data was from patients with prostate cancer of various stages and disease progress. There were 3787a reported subjects across the datasets. There are eight reported MRI sequences/modalities.
a The dataset included “321 MRI exams” but did not specify if they were different patients.
ADC = apparent diffusion coefficient; DCE = dynamic contrast-enhanced; DWI = diffusion-weighted imaging; TA = through application; MRS = MR spectroscopy; PD = proton density; PWI = perfusion weighted imaging; QIN = quantitative imaging network; TCIA = the cancer imaging archive; T1w = T1-weighted; T2w = T2-weighted.

TABLE 9. Cardiac Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cardiac Atlas Project | Stonybrook Cardiac Data | Open access to all | Single | 45 | Heterogeneous cases such as healthy, hypertrophy, heart failure with and without infarction | Cine-cardiac MR | Single sequence | Dataset originated from USA | Limited metadata available. Age (23–88 years), inclusion and exclusion details are not reported, pathology keys are available |
| Cardiac Atlas Project | MESA | Academic and research purpose only TA | Multicenter | 2450 | Clinical cardiovascular disease (+normal controls) | MRA, T2w | Not reported | Dataset useful for subclinical cardiovascular disease and originated from six institutions within USA | Restricted access. Age (45–84 years), heterogeneous population such as white, African American, Hispanic, Chinese, inclusion and exclusion criteria not reported |
| Cardiac Atlas Project | DETERMINE | Academic and research purpose only TA | Multicenter | 450 | Coronary artery diseases and mild-to-moderate left ventricular dysfunction (+normal controls) | SSFP cine-cardiac MR (breath hold 8–15 sec), Sufficient short axis and long axis cardiac MR | Not reported | Dataset originated from four institutes within USA | Restricted access. Age, inclusion and exclusion criteria not reported |
| Cardiac Atlas Project | Society of Cardiovascular MR Consensus Contours | Academic and research purpose only TA | Single | 15 | Myocardial infarction, heart failure, Left ventricular hypertrophy (+normal control) | Short axis and long axis cardiac MR | Not reported | Dataset includes myocardial contours by expert readers, and deduced consensus contours from expert annotation. | Limited metadata available. Age (42–77 years), inclusion and exclusion criteria.55 |
| Cardiac Atlas Project | MITEA | Academic and research purpose only TA | Single | 134 | Cardiac disease and healthy controls | Multiplanar cine cardiac MR | Single sequence | Dataset includes cardiac MR and 3D segmentation mask of the left ventricular myocardium and cavity. | Limited metadata available. Age (18–74 years), inclusion and exclusion criteria.56 |
| Cardiac Atlas Project | Congenital Heart Disease | Academic and research purpose only TA | Multicenter | 202 | Congenital heart disease | SSFP cine-cardiac MR (breath hold), Sufficient short axis and long axis cardiac MR | Not reported | Dataset originated from USA and New Zealand | Available. Age (<62 years), inclusion and exclusion criteria.57 |
| Laboratory for Active and Attentive Vision (LAAV) | York University Cardiac MRI Dataset | Open access to all | Single | 33 | Left ventricles’ endocardial and epicardial segmentations | Short axis cardiac MR | Single sequence | Dataset includes ground truth of their left ventricles’ endocardial and epicardial segmentations | Limited metadata available. Age (2–17 years), disease keys provided, inclusion and exclusion criteria are not available. |

This table represents the 7 datasets focused on cardiac MRIs. There were 3329 reported subjects across the datasets.
DETERMINE = defibrillators to reduce risk by MRI evaluation; MESA = multiethnic study of atherosclerosis; MITEA = MR-informed three-dimensional echocardiography analysis;
MRA = MR angiography; SSFP = steady-state free precession; TA = through application; T2w = T2-weighted.
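Because access type determines how quickly a dataset can be used, it can help to split the Table 9 subject counts by access category. The following is a minimal sketch under stated assumptions: the counts and access labels are transcribed from Table 9, the two category strings ("open" and "TA" for through application) are shorthand introduced here, and `subjects_by_access` is a hypothetical helper.

```python
from collections import defaultdict

# (dataset, access category, subjects) transcribed from Table 9.
# "open" = open access to all; "TA" = academic and research purpose only, through application.
TABLE9_ROWS = [
    ("Stonybrook Cardiac Data", "open", 45),
    ("MESA", "TA", 2450),
    ("DETERMINE", "TA", 450),
    ("Society of Cardiovascular MR Consensus Contours", "TA", 15),
    ("MITEA", "TA", 134),
    ("Congenital Heart Disease", "TA", 202),
    ("York University Cardiac MRI Dataset", "open", 33),
]


def subjects_by_access(rows):
    """Total the subject counts separately for each access category."""
    totals = defaultdict(int)
    for _name, access, subjects in rows:
        totals[access] += subjects
    return dict(totals)


by_access = subjects_by_access(TABLE9_ROWS)
print(by_access)
print("all datasets:", sum(by_access.values()))
```

Under this transcription, only a small share of the cardiac subjects sits in the two openly downloadable datasets; the rest require an application.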

TABLE 10. Breast Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TCIA | RIDER Breast MRI | Open access to all | Single | 5 | Primary Breast cancer | T1w, DWI (b-values 0 and 800 sec/mm2), ADC | Co-registered | None | Not available |
| TCIA | ACRIN-Contralateral-Breast-MR (ACRIN 6667) | Open access to all | Multicenter | 984 | Breast cancer | T1w pre- and post-contrast, T2w | Co-registered | Dataset originated from multiple institutes within USA | Available. Age (>18 years), inclusion and exclusion criteria are available in protocol |
| TCIA | QIN Breast DCE-MRI | Open access to all | Multicenter | 10 | Breast cancer | DCE, T2w, T1w non-FatSat | Co-registered | Dataset consists of pre and post chemotherapy | Sequence keys for chemotherapy responder and non-responder provided, age, inclusion and exclusion criteria are not reported. |
| TCIA | TCGA-BRCA | Open access to all | Multicenter | 139 | Breast cancer | T1w, T2w | Co-registered | Dataset came from multiple institutes having difference in MR protocols | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | QIN-Breast | Open access to all | Single | 67 | Breast cancer | DWI, DCE, multiflip T1 map | Co-registered | Dataset consists of pre and post chemotherapy MRI | Sequence keys for chemotherapy responder and non-responder provided, age, inclusion and exclusion criteria are not reported. |
| TCIA | BREAST-DIAGNOSIS | Open access to all | Not reported | 88 | Breast cancer, benign (+normal controls) | T2w, STIR, BLISS (may include other sequences as well) | All sequences are not co-registered | Dataset includes position of mass, HER2 and Oncotype score | Detailed clinical and pathology report is available |
| TCIA | Duke-Breast-Cancer-MRI | Open access to all | Single | 922 | Breast cancer | DCE, T1w pre- and post-contrast | Co-registered | Dataset includes FGT and breast segmentation, radiomic features, and annotation boxes. | Detailed clinical and pathology report is available |
| TCIA | ACRIN 6698/I-SPY2 Breast DWI | Open access to all | Multicenter | 385 | Breast cancer | DWI, ADC maps, DCE, T2w | Co-registered | Dataset includes pre and post chemotherapy MRI and manual tumor contours | Age, race, clinical, pathology, and treatment response are available |
| TCIA | I-SPY2 Trial | Open access to all | Multicenter | 719 | Breast cancer | DCE, T2w | Co-registered | Dataset originated from 22 institutes and includes pre and post chemotherapy MRI. | Age, race, clinical, pathology, and treatment response are available |
| TCIA | ISPY1 (ACRIN 6657) | Open access to all | Multicenter | 222 | Breast cancer | DCE, T2w | Co-registered | Dataset includes pre and post chemotherapy MRI | Age, race, clinical, pathology, and treatment response are available |
| TCIA | Breast-MRI-NACT-Pilot | Open access to all | Single | 64 | Breast cancer | T1w Fat Suppressed, DCE | Not co-registered | Dataset includes pre and post chemotherapy MRI | Age, clinical, pathology, and treatment response are available |

This table represents the 11 datasets focused on breast MRIs. This data came from patients with several types of breast cancers. There were 3605 reported subjects across the datasets. Seven
different MRI sequences/modalities were reported.
ACRIN = American College of Radiology Imaging Network, ADC = apparent diffusion coefficient, BLISS = Bilateral breast imaging in the sagittal view with SeNSe, DCE = Dynamic
contrast-enhanced, DWI = diffusion-weighted imaging, FatSat = Fat saturated, I-SPY = Investigation of Serial studies to Predict Your Therapeutic Response with Imaging and Molecular
Analysis, NACT = neoadjuvant chemotherapy, QIN = Quantitative Imaging Network, RIDER = Reference Image Database to Evaluate Therapy Response, STIR = Short Tau Inversion
Recovery, TCGA = The Cancer Genome Atlas, TCIA = The Cancer Imaging Archive, T1w = T1-weighted, and T2w = T2-weighted.

TABLE 11. General Body Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TCIA | The Clinical Proteomic Tumor Analysis Consortium Sarcomas Collection (CPTAC-SAR) | Open access to all | Multicenter | 88 | Sarcomas of the abdomen, arm, bladder, chest, head–neck, kidney, leg, retroperitoneum, stomach, and uterus | T1w, T2w, DWI, ADC | Co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner | Available image matched clinical and proteomics data. |
| TCIA | The Cancer Genome Atlas Urothelial Bladder Carcinoma Collection (TCGA-BLCA) | Open access to all | Multicenter | 120a | Bladder endothelial carcinoma | T1w, T2w, DWI, ADC | Co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | Vestibular-Schwannoma-SEG** | Open access to all | Multicenter | 242 | Vestibular schwannoma | T1w, T2w | Co-registered | Dataset includes segmentation contours, radiotherapy dose and plan | Age (>18 years), inclusion and exclusion criteria are available. |
| TCIA | The Clinical Proteomic Tumor Analysis Consortium Lung Adenocarcinoma Collection (CPTAC-LUAD) | Open access to all | Multicenter | 244a | Adenocarcinoma of the lung | T1w, T2w, DWI, ADC | All sequences are not co-registered | Dataset is heterogeneous in terms of modality, acquisition protocol and scanner | Available image matched clinical, genomic and proteomics data. |
| TCIA | The Cancer Genome Atlas Ovarian Cancer Collection (TCGA-OV) | Open access to all | Multicenter | 143a | Ovarian serous cystadenocarcinoma | T1w, T2w, DWI, ADC | Co-registered | Dataset is heterogeneous in terms of modality, acquisition protocol and scanner | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | The Clinical Proteomic Tumor Analysis Consortium Pancreatic Ductal Adenocarcinoma Collection (CPTAC-PDA) | Open access to all | Multicenter | 168a | Ductal adenocarcinoma of the pancreas | T1w, T2w, DWI, ADC | Co-registered | Dataset is heterogeneous in terms of modality, acquisition protocol and scanner | Available image matched clinical, genomic and proteomics data. |
| TCIA | Soft-tissue-Sarcoma | Open access to all | Multicenter | 51 | Soft-tissue sarcoma of the extremities | T1w, T2w FatSat, STIR | Co-registered | Dataset also includes pre-treat FDG-PET/CT and 19 subjects have lung metastasis | Detail metadata available. Age (16–83 years), inclusion and exclusion criteria.58 |
| TCIA | The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma Collection (TCGA-CESC) | Open access to all | Single | 54 | Cervical squamous cell carcinoma and endocervical adenocarcinoma | T1w, T2w | Co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC) | Open access to all | Multicenter | 65a | Uterine cancer | T1w, T2w, PD | Not co-registered | Dataset is heterogeneous in terms of acquisition protocol and scanner | Available image matched clinical, genetic and pathological data from NIH TCGA portal. |
| TCIA | The Clinical Proteomic Tumor Analysis Consortium Uterine Corpus Endometrial Carcinoma Collection (CPTAC-UCEC) | Open access to all | Multicenter | 259a | Uterine cancer | T1w, T2w, DWI | Not co-registered | Dataset is heterogeneous in terms of modality, acquisition protocol and scanner | Available image matched clinical, genomic and proteomics data. |
| SIM | Scottish Medical Imaging (SIM) Archive | Academic, research, and commercial purposes TA | Multicenter | 1,539,189b | Lung, Chest (COVID positive and negative), cancer group and other category | T1w, T2w, other sequences available but not reported | Not reported | Dataset originated from 14 Scottish centers and includes annotations | Clinical metadata (such as electronic healthcare records, age, and ground truth) available upon request.59 |

This table represents the 11 datasets focused on general body MRIs (not specific to kidney, liver, prostate, breast, or cardiac). There were 1,540,865 reported subjects across the datasets.
ACRIN = American College of Radiology Imaging Network; ADC = apparent diffusion coefficient; DWI = diffusion-weighted imaging; FatSat = fat saturated; FDG-
PET = fluorodeoxyglucose-positron emission tomography; NSCLC = non-small cell lung cancer; SEG = segmentations; PD = proton-density weighted; STIR = short tau inversion recovery;
TA = through application; T1w = T1-weighted; T2w = T2-weighted; TCIA = the cancer imaging archive; TCGA = The Cancer Genome Atlas.
a
Not all subjects have MR images taken.
b
This dataset represents total number of MRI exams. The details on unique MRI patients are not reported.

TABLE 12. Spine Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| SpineWeb | Cross modality spinal images for spine workshop | Academic and research purpose only TA | Not reported | 30 | Lumbar region | Not reported | Not reported | None | None |
| SpineWeb | Intervertebral disc localization and segmentation multimodality MRI spine image database | Academic and research purpose only TA | Single | 8 | Lower spine (T11-L5); at least seven intervertebral discs of the lower spine | Not reported | Not reported | In this dataset, each intervertebral disc has a reference manual segmentation in the form of a binary mask | Available TA |
| SpineWeb | Intervertebral disc localization and segmentation: 3D T2-weighted Turbo Spin Echo MR image database | Academic and research purpose only TA | Single | 15 | At least seven intervertebral discs (IVDs) of the lower spine (T11-L5) | 3D T2-weighted turbo spin echo MR | Not reported | In this dataset, each intervertebral disc has a reference manual segmentation in the form of a binary mask | Available TA |
| SpineWeb | Multi-modality vertebra recognition in arbitrary views using 3D Deformable Hierarchical Model | Academic and research purpose only TA | Not reported | 20 | Spine | T1w, T2w | Not reported | This dataset includes manually annotated ground truth images | Available TA, other details by Cai et al.60 |
| SpineWeb | High anisotropy MRIs of the lower back | Academic and research purpose only TA | Multicenter | 17 | Spine/lower back | T1w, T2w, and TIRM | Not reported | Dataset includes manual segmentations | Available TA. Age (21–79 years), other details by Zukic et al.61 |
| SpineWeb | Intervertebral Disc (IVD) Localization and Segmentation from 3D Multi-Modality MR (M3) Images | Academic and research purpose only TA | Not reported | 12 | Lower spine | In-phase, opposed-phase, fat and water MR images | Co-registered | In this dataset, each intervertebral disc has a reference manual segmentation in the form of a binary mask | Available TA. |
| Mendeley | Lumbar Spine MRI | Open access to all^a | Multicenter | 515 | Lower spine | T1w and T2w | Not reported | None | Available radiologist clinical report. |
| GTU Vision Lab | Lumbar MRI Dataset | Open access to all | Not reported | 80 | Mid-sagittal MRI of spine | 2D views of T1w and T2w | Not co-registered | Each mid-sagittal view contains five lumbar vertebrae and six lumbar intervertebral discs. | Not reported |
| OSF | Lumbar vertebral body and intervertebral disc segmentation dataset | Open access to all | Single | 34 | Lower spine | T1w, T2w, T2w FatSat | Co-registered | Dataset includes binary masks of lumbar vertebral bodies (L1-L5) and intervertebral discs (L1_2, L2_3, L3_4, and L4_5) | Limited metadata available. Age (30–88 years), inclusion and exclusion criteria.62 |

This table represents the 9 datasets focused on MRIs of the spine, specifically the intervertebral discs. There were 1246 reported subjects and four reported sequences/modalities.
FatSat = fat saturated; TA = through application; T1w = T1-weighted; T2w = T2-weighted; TIRM = turbo inversion recovery magnitude; TSE = turbo spin echo.
^a Dataset download file is broken.

TABLE 13. Knee Datasets

| Portal | Datasets | Access Type | Centers | Subjects | Theme | Sequence/Modality | Sequence Co-registration | Notable Features | Clinical Metadata |
|---|---|---|---|---|---|---|---|---|---|
| NY Langone Health | fastMRI (Knee) | Academic and research purpose only TA | Multicenter | 1500 MRI scans^a | Normal and abnormal knee | PD, T2w | Not reported | None | Restricted access. Age, inclusion and exclusion criteria not reported. |
| Stanford University Center for Artificial Intelligence in Medicine and Imaging (AIMI) | MRNet-Knee MRIs | Academic and research purpose only TA | Single | 1370 | Abnormal knee, anterior cruciate ligament tears and meniscal tears (+normal control) | T1w, T2w FatSat, PD | Not reported | The data includes labels created based on the clinical reports | Restricted access. Age (38.3 ± 16.9 years), inclusion and exclusion criteria not reported. |
| Stanford University Center for Artificial Intelligence in Medicine and Imaging (AIMI) | Stanford Knee MRI with Multi-Task Evaluation (SKM-TEA) | Academic and research purpose only TA | Single | 155 | Abnormal knee (+normal control) | k-space data, T2w | Not reported | The data includes segmentations of six tissues and bounding boxes for 16 pathologies | Restricted access. Age, inclusion and exclusion criteria are not reported. |
| NIMH Data Archive | Osteoarthritis Initiative | Academic and research purpose only TA | Not reported | 4796 | Osteoarthritis | Restricted access | Restricted access | None | Restricted access. Metadata is available TA. |

This table represents the 4 datasets focused on MRIs of the knee. There were 7821^a reported subjects and three reported sequences.
PD = proton-density weighted; FatSat = fat saturated; TA = through application; T1w = T1-weighted; T2w = T2-weighted.
^a Dataset lacks the details of total number of subjects.
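
The knee total quoted in the table note can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and variable names are assumptions of this example, and the counts are copied from the Subjects column of Table 13. Note that the fastMRI entry counts MRI scans rather than unique subjects, as the footnote indicates.

```python
# Subject counts copied from the Subjects column of Table 13.
# The names used here are hypothetical, chosen for this illustration only.
knee_subjects = {
    "fastMRI (Knee)": 1500,  # MRI scans; unique subject count not reported
    "MRNet-Knee MRIs": 1370,
    "SKM-TEA": 155,
    "Osteoarthritis Initiative": 4796,
}

# Sum the per-dataset counts to reproduce the total in the table note.
total_reported = sum(knee_subjects.values())
print(total_reported)  # 7821
```

Because one entry is a scan count, the total is best read as an upper-level tally of reported figures rather than a count of unique individuals.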


Discussion

We reviewed a total of 110 datasets with 1,686,245 subjects, covering major sub-fields within radiology, including neuro, body, and musculoskeletal. We covered datasets from large imaging database portals (such as TCIA, NITRC, and SpineWeb), from Open-Access Journals (such as Scientific Data), and from Open Data Science Competitions (such as BraTS and ISLES). We note that, as expected, data heterogeneity and the amount and quality of annotations that are most relevant to training AI algorithms continue to be limited overall.

In a review format such as this, it is neither possible nor desirable to be exhaustive; instead, the goal was to review a representative range of datasets. The objective was to be inclusive, and thus a range of anatomical regions and pathologies were sampled. At times, this limited the coverage on any one area of interest, as compared to what a dedicated review on that topic might provide. Some areas of interest (e.g., neuroimaging) have more datasets available than others (e.g., musculoskeletal imaging). Where there was an abundance of datasets, the largest datasets were reviewed because AI algorithms generally train better with more data. A dedicated review of larger portals might have covered more of the datasets in greater detail, but this limited sampling may allow interested readers to identify portals of interest and delve deeper into other available datasets there.

This review used Google, Dataset Search, Scientific Data, and PubMed search engines to locate the datasets. There are alternative methods that a researcher might use to further expand the search. Many journals require or encourage authors to make their study data available publicly as a matter of best practice. Researchers could feasibly find MRI datasets from such open access journal articles. For example, Nature released a publication by Liew et al. in 2022 describing an open-source MRI database with 955 T1-weighted brain MRIs with manually segmented diverse lesions and metadata.16 Databases can also be obtained from data science competitions not reviewed here, such as MICCAI and Kaggle. Many of these datasets remain open to the public after the competitions have ended and are therefore a great resource for AI researchers. Further, it may be feasible to directly reach out to various institutions and inquire about open MRI databases and possible collaboration. These and potentially other methods of locating open-access datasets exist and could be a topic for additional review articles in the future.

Limitations

The brain category included 49 individual datasets, which covered topics from neurodegeneration to developmental/psychiatric conditions. One of the largest datasets in this section was the 1000 Functional Connectomes Project. Several other datasets, such as ABIDE, ADHD-200, CORR, NKI-Rockland, CMI Healthy Brain Network, and COBRE, were created using the data from 1000 Functional Connectomes. Thus, there is likely an overlap between the datasets. Additionally, OpenNeuro is a large portal that includes close to 600 different datasets, and the list is continually growing. We chose to include only the largest 11 datasets, which is not a strictly representative sample. There is a wealth of information available on the OpenNeuro portal, and interested researchers should review the site directly to find datasets that might meet their specific neuro-research needs.

The body category included 48 individual datasets covering prostate, breast, cardiac, kidney/liver, and general body. Here, we found TCIA to be a robust resource with 49 publicly available human MRI datasets. Given the amount of data, we chose to include only datasets with 50 subjects or more. Perusing the TCIA portal directly might lead interested researchers to find additional datasets of value to them.

The musculoskeletal category was the smallest with only 13 datasets. There are other areas within musculoskeletal radiology, such as shoulder, ankle, elbow, wrist, or hip, that were not represented in our review. Interested readers might find it useful to perform a dedicated search for publicly available musculoskeletal MRI databases.

Conclusion

While acknowledging the limitations, it is our hope that this review of the major publicly available datasets highlights the types of datasets available for use now. More importantly, we hope it encourages continued efforts by individuals and institutions to prepare and share high quality datasets openly with the aim of advancing the field of AI in radiology.

Funding Information

NIH-NCI P30CA008748 (Vickers, PI), ASNR AI Research Grant (Stember, PI), MSKCC Internal Seed Research Grant (Stember, PI), MSKCC Summer Medical Student Research Fellowship Grant, and NIH-NCI R25CA020449 (Wolchok, PI).

References

1. Bi WL, Hosny A, Schabath MB, et al. Artificial intelligence in cancer imaging: Clinical challenges and applications. CA Cancer J Clin 2019;69(2):127-157.
2. Soffer S, Ben-Cohen A, Shimon O, Amitai MM, Greenspan H, Klang E. Convolutional neural networks for radiologic images: A radiologist’s guide. Radiology 2019;290(3):590-606.
3. Lee JH, Hong H, Nam G, Hwang EJ, Park CM. Effect of human-AI interaction on detection of malignant lung nodules on chest radiographs. Radiology 2023;307(5):e222976.
4. van Leeuwen KG, de Rooij M, Schalekamp S, van Ginneken B, Rutten MJ. How does artificial intelligence in radiology improve efficiency and health outcomes? Pediatr Radiol 2022;52(11):2087-2093.
5. Uzun Ozsahin D, Ikechukwu Emegano D, Uzun B, Ozsahin I. The systematic review of artificial intelligence applications in breast cancer diagnosis. Diagnostics 2022;13(1):45.


6. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJ. Artificial intelligence in radiology. Nat Rev Cancer 2018;18(8):500-510.
7. Goldfarb A, Teodoridis F. Why is AI adoption in health care lagging? Brookings, Washington DC; 2022.
8. Chatterjee AR, Stalcup S, Sharma A, et al. Image sharing in radiology—A primer. Acad Radiol 2017;24(3):286-294.
9. Willemink MJ, Koszek WA, Hardell C, et al. Preparing medical imaging data for machine learning. Radiology 2020;295(1):4-15.
10. Bluemke DA, Moy L, Bredella MA, et al. Assessing radiology research on artificial intelligence: A brief guide for authors, reviewers, and readers—From the radiology editorial board. Radiol Soc N Am 2020;294:487-489.
11. Celi LA, Cellini J, Charpignon M-L, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review. PLoS Digital Health 2022;1(3):e0000022.
12. Sardanelli F, Alì M, Hunink MG, Houssami N, Sconfienza LM, Di Leo G. To share or not to share? Expected pros and cons of data sharing in radiological research. Eur Radiol 2018;28:2328-2335.
13. Geis JR, Brady AP, Wu CC, et al. Ethics of artificial intelligence in radiology: Summary of the joint European and north American multisociety statement. Radiology 2019;293(2):436-440.
14. Prevedello LM, Halabi SS, Shih G, et al. Challenges related to artificial intelligence research in medical imaging and the importance of image analysis competitions. Radiology 2019;1(1):e180031.
15. Glaser B, Strauss A. Discovery of grounded theory: Strategies for qualitative research: New York, Routledge; 2017.
16. Liew S-L, Lo BP, Donnelly MR, et al. A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms. Sci Data 2022;9(1):320.
17. Ziegler E, Rouillard M, André E, et al. Mapping track density changes in nigrostriatal and extranigral pathways in Parkinson’s disease. Neuroimage 2014;99:498-508.
18. Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL. Open access series of imaging studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J Cogn Neurosci 2007;19(9):1498-1507.
19. Marcus DS, Fotenos AF, Csernansky JG, Morris JC, Buckner RL. Open access series of imaging studies: Longitudinal MRI data in nondemented and demented older adults. J Cogn Neurosci 2010;22(12):2677-2684.
20. LaMontagne PJ, Benzinger TL, Morris JC, et al. OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease. medRxiv 2012.
21. Koenig LN, Day GS, Salter A, et al. Select atrophied regions in Alzheimer disease (SARA): An improved volumetric model for identifying Alzheimer disease dementia. Neuroimage Clin 2020;26:102248.
22. Frazier JA, Hodge SM, Breeze JL, et al. Diagnostic and sex effects on limbic volumes in early-onset bipolar disorder and schizophrenia. Schizophr Bull 2008;34(1):37-46.
23. Di Martino A, Yan C-G, Li Q, et al. The autism brain imaging data exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol Psychiatry 2014;19(6):659-667.
24. Di Martino A, O’connor D, Chen B, et al. Enhancing studies of the connectome in autism using the autism brain imaging data exchange II. Scientific Data 2017;4(1):1-15.
25. Kucharsky Hiess R, Alter R, Sojoudi S, Ardekani B, Kuzniecky R, Pardoe H. Corpus callosum area and brain volume in autism spectrum disorder: Quantitative analysis of structural MRI from the ABIDE database. J Autism Dev Disord 2015;45:3107-3114.
26. Bellec P, Chu C, Chouinard-Decorte F, Benhajali Y, Margulies DS, Craddock RC. The neuro bureau ADHD-200 preprocessed repository. Neuroimage 2017;144:275-286.
27. Alexander LM, Escalera J, Ai L, et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Scientific Data 2017;4(1):1-26.
28. Holmes AJ, Hollinshead MO, O’keefe TM, et al. Brain genomics Superstruct project initial data release with structural, functional, and behavioral measures. Scientific Data 2015;2(1):1-16.
29. Poulin P, Theaud G, Rheault F, et al. TractoInferno-A large-scale, open-source, multi-site database for machine learning dMRI tractography. Scientific Data 2022;9(1):725.
30. Tsuchida A, Laurent A, Crivello F, et al. The MRi-share database: Brain imaging in a cross-sectional cohort of 1870 university students. Brain Struct Funct 2021;226(7):2057-2085.
31. Tian L, Wang J, Yan C, He Y. Hemisphere-and gender-related differences in small-world brain networks: A resting-state functional MRI study. Neuroimage 2011;54(1):191-202.
32. Yan C, Gong G, Wang J, et al. Sex-and brain size–related small-world structural cortical networks in young adults: A DTI tractography study. Cereb Cortex 2011;21(2):449-458.
33. Valdes-Sosa PA, Galan-Garcia L, Bosch-Bayard J, et al. The Cuban human brain mapping project, a young and middle age population-based EEG, MRI, and cognition dataset. Scientific Data 2021;8(1):45.
34. Avants BB, Duda JT, Kilroy E, et al. The pediatric template of brain perfusion. Scientific Data 2015;2(1):1-17.
35. Lyu M, Mei L, Huang S, et al. M4Raw: A multi-contrast, multi-repetition, multi-channel MRI k-space dataset for low-field MRI research. Scientific Data 2023;10(1):264.
36. Calabrese E, Villanueva-Meyer JE, Rudie JD, et al. The university of California San Francisco preoperative diffuse glioma mri dataset. Radiol Artif Intell 2022;4(6):e220058.
37. Bakas S, Sako C, Akbari H, et al. The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: Advanced MRI, clinical, genomics, & radiomics. Scientific Data 2022;9(1):453.
38. Grøvik E, Yi D, Iv M, Tong E, Rubin D, Zaharchuk G. Deep learning enables automatic detection and segmentation of brain metastases on multisequence MRI. J Magn Reson Imaging 2020;51(1):175-182.
39. Baid U, Ghodasara S, Mohan S, et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv 2021;2107.02314.
40. Ocaña-Tienda B, Pérez-Beteta J, Villanueva-García JD, et al. A comprehensive dataset of annotated brain metastasis MR images with clinical and radiomic data. Scientific Data 2023;10(1):208.
41. Boxerman JL, Zhang Z, Safriel Y, et al. Early post-bevacizumab progression on contrast-enhanced MRI as a prognostic marker for overall survival in recurrent glioblastoma: Results from the ACRIN 6677/RTOG 0625 central reader study. Neuro Oncol 2013;15(7):945-954.
42. Hernandez Petzsche MR, de la Rosa E, Hanning U, et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Scientific Data 2022;9(1):762.
43. Strike LT, Hansell NK, Couvy-Duchesne B, et al. Genetic complexity of cortical structure: Differences in genetic and environmental factors influencing cortical surface area and thickness. Cereb Cortex 2019;29(3):952-962.
44. Blokland GA, McMahon KL, Thompson PM, Martin NG, de Zubicaray GI, Wright MJ. Heritability of working memory brain activation. J Neurosci 2011;31(30):10882-10890.
45. Sinclair B, Hansell NK, Blokland GA, et al. Heritability of the network architecture of intrinsic brain functional connectivity. Neuroimage 2015;121:243-252.
46. Spreng RN, Setton R, Alter U, et al. Neurocognitive aging data release with behavioral, structural and multi-echo functional MRI measures. Scientific Data 2022;9(1):119.
47. Snoek L, van der Miesen MM, Beemsterboer T, Van Der Leij A, Eigenhuis A, Steven SH. The Amsterdam open MRI collection, a set of multimodal MRI datasets for individual difference analyses. Scientific Data 2021;8(1):85.


48. Babayan A, Erbey M, Kumral D, et al. A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Scientific Data 2019;6(1):1-21.
49. Pernet CR, McAleer P, Latinus M, et al. The human voice areas: Spatial organization and inter-individual variability in temporal and extra-temporal cortices. Neuroimage 2015;119:164-174.
50. Poldrack RA, Congdon E, Triplett W, et al. A phenome-wide examination of neural and cognitive function. Scientific Data 2016;3(1):1-12.
51. Di Noto T, Marie G, Tourbier S, et al. Towards automated brain aneurysm detection in TOF-MRA: Open data, weak labels, and anatomical knowledge. Neuroinformatics 2023;21(1):21-34.
52. Strike LT, Hansell NK, Chuang K-H, et al. The Queensland twin adolescent brain project, a longitudinal study of adolescent brain development. Scientific Data 2023;10(1):195.
53. Angeles-Valdez D, Rasgado-Toledo J, Issa-Garcia V, et al. The Mexican magnetic resonance imaging dataset of patients with cocaine use disorder: SUDMEX CONN. Scientific Data 2022;9(1):133.
54. Adams LC, Makowski MR, Engel G, et al. Prostate158-an expert-annotated 3T MRI dataset and algorithm for prostate cancer detection. Comput Biol Med 2022;148:105817.
55. Suinesiaputra A, Bluemke DA, Cowan BR, et al. Quantification of LV function and mass by cardiovascular magnetic resonance: Multi-center variability and consensus contours. J Cardiovasc Magn Reson 2015;17:1-8.
56. Zhao D, Ferdian E, Maso Talou GD, et al. MITEA: A dataset for machine learning segmentation of the left ventricle in 3D echocardiography using subject-specific labels from cardiac magnetic resonance imaging. Front Cardiovasc Med 2023;9:1016703.
57. Govil S, Mauger C, Hegde S, et al. Biventricular shape modes discriminate pulmonary valve replacement in tetralogy of Fallot better than imaging indices. Sci Rep 2023;13(1):2335.
58. Vallières M, Freeman CR, Skamene SR, El Naqa I. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol 2015;60(14):5471-5496.
59. Baxter R, Nind T, Sutherland J, et al. The Scottish medical imaging archive: 57.3 million radiology studies linked to their medical records. Radiol Artif Intell 2023;e220266.
60. Cai Y, Osman S, Sharma M, Landis M, Li S. Multi-modality vertebra recognition in arbitrary views using 3D deformable hierarchical model. IEEE Trans Med Imaging 2015;34(8):1676-1693.
61. Zukic D, Vlasák A, Egger J, Hořínek D, Nimsky C, Kolb A. Robust detection and segmentation for diagnosis of vertebral diseases using routine MR images. Comput Graph Forum 2014;33:190-204.
62. Khalil YA, Becherucci EA, Kirschke JS, et al. Multi-scanner and multi-modal lumbar vertebral body and intervertebral disc segmentation database. Scientific Data 2022;9(1):97.
