Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
survey
Open access

A Systematic Collection of Medical Image Datasets for Deep Learning

Published: 27 November 2023 Publication History

Abstract

The astounding success made by artificial intelligence in healthcare and other fields proves that it can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data dependent and require large datasets for training. Many junior researchers face a lack of data for a variety of reasons. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require several other resources, such as professional equipment and expertise. That makes it difficult for novice and non-medical researchers to have access to medical data. Thus, as comprehensively as possible, this article provides a collection of medical image datasets with their associated challenges for deep learning research. We have collected the information of approximately 300 datasets and challenges mainly reported between 2007 and 2020 and categorized them into four categories: head and neck, chest and abdomen, pathology and blood, and others. The purpose of our work is to provide a list, as up-to-date and complete as possible, that can be used as a reference to easily find the datasets for medical image analysis and the information related to these datasets.

1 Introduction

Medical imaging technology has marked a new era in medicine since its inception [30]. The development of medical imaging began with the introduction of x-rays and has since evolved to include various other imaging techniques [72, 119], such as 3D Computed Tomography (CT) [1], Magnetic Resonance Imaging (MRI) [3, 6], nuclear medicine [2], ultrasound, endoscopy [4], and Optical Coherence Tomography (OCT) [169]. These medical image modalities, directly or indirectly, have contributed to the diagnosis and treatment of various diseases [232] and to the research on the human body’s structure and intrinsic mechanisms [217].
Medical images can provide critical insight into the diagnosis and treatment of many diseases. The human body’s different reactions to imaging modalities are used to produce scans of the body. Reflection and transmission are commonly used in medical imaging because the reflection or transmission ratio of different body tissues and substances are different. Some other methods acquire images by changing the energy transferred to the body, such as magnetic field changes or the rays radiated from a chemical agent.
Before modern Artificial Intelligence (AI) was applied in medical image analysis, radiologists and pathologists needed to manually look for the critical “biomarkers” in the patient’s scans. These biomarkers, such as tumors and nodules, are the basis for the medics to diagnose and devise treatment plans. Such a diagnostic process needs to be performed by medics with extensive medical knowledge and clinical experience. However, problems such as diagnostic bias and the lack of medical resources are prevalent and cannot be avoided. With the recent advancements in AI, which have achieved human-like performance in image recognition [33, 85, 164, 165, 166, 191, 198] and even won games like Go [190] and real-time strategy games [215], there has been a growing interest in developing AI-based automatic medical image analysis algorithms. As a result, AI has become a major research focus in medical image analysis and has achieved significant success in recent years [65, 98, 149, 163, 168, 172, 176].
Many researchers brought their focus to AI-based medical image analysis methods believing that it might be one of the solutions to the challenges (e.g., medical resource scarcity) and taking advantage of the technological progress [106, 137, 143, 189, 216]. Traditional medical image analysis focuses on detecting and classifying biomarkers for diagnosis and treatment. AI imitates the medic’s diagnosis through classification, segmentation, detection, regression, and other AI tasks in an automated or semi-automated way.
AI has achieved a significant performance for many computer vision tasks. This success is yet to be translated to the medical image analysis domain. Deep Learning (DL), a branch of AI, is a data-dependent method because it needs massive training data. However, when DL is applied to medical image analysis, the paucity of labeled data becomes a major challenge and a bottleneck.
Data scarcity is a common problem when applying DL methods to a specific domain, and this problem becomes more severe in the case of medical image analysis. Researchers who apply DL methods to medical image analysis research do not usually have a medical background, commonly computer scientists. They cannot collect data independently because of the lack of access to medical equipment and patients. They cannot also annotate the acquired data either because they lack the relevant medical knowledge. Furthermore, medical data is owned by institutions, which cannot easily make it public due to privacy and ethics restrictions.
To address some of these problems, MICCAI, ISBI, AAPM, and other conferences and institutions have launched many DL-related medical image analysis challenges. These aim to design and develop automatic or semi-automatic algorithms and promote medical image analysis research with computer-aided methods. Concurrently, some researchers and institutions also organize projects to collect medical datasets and publish them for research purposes.
Despite all these works, it is still challenging for novices of medical image analysis to find medical data. Therefore, we present this comprehensive survey of medical datasets and relevant challenges with the aim to help researchers easily find the required datasets for their research.
This article refers to other research papers with a number between square brackets and refers to the datasets listed in the tables with numbers between parentheses.
The article is organized as follows. Section 2 summarizes the datasets and challenges at a macro level and focuses on information relevant to medical data such as the years, body parts, and tasks. Sections 3 through 5 discuss the datasets and challenges of the head and neck, the chest and abdomen organs, and the pathology and blood. For other datasets, such as bone, skin, phantom, and animals, respectively, we place them into supplementary materials, because of the space limitation. We have also created a website with a git repo,1 which shows the list of these datasets and their respective challenges.

2 Medical Image Datasets

In this section, we provide an overview of the datasets and challenges. Our collection contains more than 300 medical image datasets and challenges organized between 2004 and 2020. This article focuses mainly on the ones between 2013 and 2020. Sections 2.1 through 2.4 provide information about the year, body parts, modalities, and tasks, respectively. In Section 2.5, we introduce the sources from where we have collected these datasets and the challenges. Details about the categorization of these datasets and challenges into four groups are provided in subsequent sections. We provide a taxonomy of our work in Figure 1 to help the reader navigate through the different sections.
Fig. 1.
Fig. 1. An overall taxonomy to outline the organization of the article.

2.1 Dataset Timeline

The timeline of these medical image datasets can be split into two, starting from 2013 as the watershed, since the excellent success of AlexNet [116] in the ILSVRC competition from 2012. The success of AlexNet motivated more researchers to solve a variety of problems [60, 66, 115, 132, 139] with different DL-based methods [84, 85, 91, 93, 174, 191]. In addition, it motivated many researchers to create and publish medical datasets. Consequently, many researchers focused on analyzing medical images with DL, which is one of the driving factors and reasons for the release of medical datasets. Another reason for us to mainly focus on datasets released after 2012 is that some of the datasets developed before 2012 are not aimed at computer-aided diagnosis with DL-based methods, such as Alzheimer’s Disease Neuroimaging Initiative (ADNI) 1 (55), although those data could be used for DL.
Figure 2(A) shows the statistics of the datasets and challenges per year between 2013 and 2020. As shown in Figure 2(A), the number of related datasets and challenges increased year by year because of the progress and success of DL in computer vision and medical image analysis. That led more and more researchers to focus on medical image analysis with DL-based methods, and more and more datasets and challenges with different body parts and tasks started to appear.
Fig. 2.
Fig. 2. Summary of medical image datasets and challenges from 2013 to 2020. Panel A shows the number of datasets and challenges published in each year. Panels B through D show the year-by-year trends along with the trends in relative numbers for each of the different categories by year. The numbers listed to the right are the summary count of each category, and the summary counts are not the same as the total numbers, because of (1) some of the categories are not shown and (2) a dataset counts two times if it includes two of the categories. Panels E through G show the most predominant body parts, data modalities, and main tasks with the percentage of their respective dataset.
Figure 2(B) shows the number of datasets related to the brain, heart, lung, liver, and eye. As can be noted, the proportion of the brain-related datasets is initially large. Then, the number of datasets related to these body parts (i.e., heart, lung, liver, and eye) is increased, but the proportion of the brain-related datasets starts to decrease.
Figure 2(C) shows that the number of datasets related to segmentation, classification, detection and other tasks is increased. The research mainly focused on the segmentation (28, 29, 54) task in the early stages. It then diversified to a variety of tasks or combinations of tasks that are also essential for clinical needs, including classification (154, 197, 259), segmentation (1, 2), detection (199, 232), regression (213, 214), generation (14, 19), tracking (220, 245), and registration (24, 25, 166), as time progressed (62).
Figure 2(D) shows no visible changes in the proportion of different modalities over the years, but the number of datasets related to different modalities increased.

2.2 Body Parts

As Figure 2(E) shows, the top-5 focused organs, in these datasets and challenges, include the brain, lung, heart, eye, and liver.
One of reasons datasets and challenges focus on these organs is that the tasks of datasets, such as a classification and segmentation, are helpful in the diagnosis and treatment of cancer (a major threat to human life) or other diseases. For example, eye-related diseases, which cause blindness, incited the collection of eye-related datasets and the release of challenges. Some other datasets and challenges focus on the small organs, such as the prostate, which are challenging to analyze due to the low resolution of images.
As AI has been shown to be more accurate than humans to tackle some complex tasks [79, 86, 188], researchers have attempted to solve the imaging problem of an organ with the AI methods successfully tested on other other organs. That has led the researchers to focus on more and more organs.

2.3 Modalities

There are several types of medical image modalities. As shown in Figure 2(F), the frequently used modalities to acquire medical datasets include MRI, CT, ultrasound, endoscopy, Positron Emission Tomography (PET), Computed Radiography (CR), electrocardiography, and OCT. Limited by space, we introduce the main modalities in the supplementary materials.
Magnetic Resonance (MR), CT, and other modalities are the most commonly used imaging modalities. MR can provide sharp images without harmful radiations of soft tissues. It is therefore widely used in the imaging of brain, heart, and many other small organs. CT is an economical and simple imaging approach, and it is widely used for the diagnosis of cancer (e. g., of the the neck, chest, and abdomen). A pathology image is different from MR and CT, because it is a cell-level imaging method. Pathology is widely used in cancer-related diagnosis.

2.4 Tasks

As Figure 2(G) shows, we categorize these tasks into seven categories: classification, segmentation, detection, regression, generation, registration, and tracking. Limited by space, we introduce these tasks in the supplementary materials.

2.5 Source and Destination

To conduct a systematic search for relevant datasets and challenges, we first reviewed papers that addressed medical image segmentation and classification. This enabled us to identify several sources for datasets, including Grand Challenge, a website that hosts medical imaging related competitions, and The Cancer Imaging Archive(TCIA) [56], a website for cancer-related competitions. Additionally, we utilized Google to search for other relevant websites and collections that featured datasets and challenges related to medical image analysis with DL. Through this process, we discovered Kaggle, Codalab, OpenNeuro, PhysioNet [77], and scientific journals such as Scientific Data, which were inspired by Grand Challenge and TCIA.
Next, we crawled through these websites to gather information about the datasets/competitions featured on these sites, carefully selecting datasets and challenges based on the available information.
The article lists datasets and challenges primarily sourced from Grand Challenge and Codalab, which are predominantly related to medical imaging. These were directly included in our work. However, the datasets and challenges from TCIA, Kaggle, PhysioNet, OpenNeuro, and journals were not all related to medical imaging or DL. Therefore, we filtered these datasets and challenges using specific keywords such as “machine learning,” “deep learning,” “segmentation,” and “classification.” Some datasets were removed because they were not suitable for DL and AI methods, due to the lack of annotations or other factors. It is worth noting that not all medical image analysis tasks require annotations. For example, the generation task, such as generating synthetic CT images from MR ones [44], does not require annotations. Furthermore, for senior researchers, data in TCIA, OpenNeuro, UK Biobank, and other databases can be useful after manual annotations.
We then categorized the remaining datasets and challenges into different groups. Systematically categorizing the datasets and challenges is challenging. Thus, we used an asymmetric categorization to group these datasets and challenges into four groups, as shown in Figure 1. This means that we did not use the same subtaxonomy in each category or subcategory.
First, we simply split the medical datasets and challenges into two groups: body level (Sections 3 and 4) and cell level (Section 5), according to the imaged body part. Second, we grouped the datasets and challenges of the brain, eye, and neck into one group (Section 3), because these are parts of the head. Third, we organized the datasets and challenges related to the chest and abdomen into the same group (Section 4). Finally, for the datasets and challenges that could not be categorized into the preceding groups, we grouped them under “other” and placed them in the supplementary materials due to the limitation of space. These datasets and challenges are related to the skin, bone, phantom, and animals.
The introduction of each group and subgroup includes mainly the type of modality, the task, the disease, and the body part. However, not all groups of datasets can be introduced in that way. For some groups, we introduce the datasets and challenges according to the domain-specific problems. For example, we categorize the pathology datasets into microscopic and macroscopic tasks.

2.6 Notes to the Tables

We have compiled tables in this article to present the datasets and challenges. These tables contain basic information such as the index, dataset/challenge name, release year, and other relevant details. Some information is not available in the table due to either non-provision or unavailability during our data collection process.
For all tables, the information includes the following:
Reference indexThe index to refer to the dataset or challenge in our work, and it is numbered in the order of appearing in the table.
Dataset/ChallengeThe dataset or challenge’s name or abbreviation of the name. The name in the table is the hyperlink to the website or home page of the dataset or challenge.
YearThe year that the dataset or challenge was released. For datasets, it is the year the paper went public or their website was noted. For challenges, it is the year that their “name” was declared, but for some challenges, of which the conferences will be hosted after 2020, we note 2020.
ModalitiesThe modalities of data. For abbreviations, we note the terms in the footnote of the table.
FocusSome words or a sentence to introduce the focus of datasets and challenges.
TasksThe tasks of the dataset or challenge designed or its focus. For abbreviations, we note the terms in the footnote of the table.
Lesion/TumorThe lesion or tumors that the dataset or challenge focused on. For abbreviations, we note the terms in the footnote of the table.
DiseasesThe diseases that the dataset or challenge focused on. For abbreviations, we note the terms in the footnote of the table.
OrgansThe organs that the dataset or challenge focused on. For abbreviations, we note the terms in the footnote of the table.
CategoryThe category of classification or segmentation.
StainThe stain that is used in pathological images.
ResolutionThe resolution of data in the pixel or the voxel. For some datasets and challenges, we note the size of it in millimeters.
ScaleThe number of data in the dataset or challenge. We might note the division of the dataset or challenge if the authors of it mentioned such information.
Multi-centerWhether the dataset or challenge includes multi-center data. We will not use the checkmark (\(\checkmark\)) and crossmark (\(\times\)) if any related information is left in the description of the dataset or challenge.
RankWhether the challenge includes a rank to compare the participants’ algorithms.
Conf.The name of the conference related to the challenge.
LicenseThe license of the dataset. We note the abbreviation in the footnote of the tables.

3 Head and Neck Related Datasets and Challenges

The head and neck are ones of the most significant parts of the human body because many essential organs, glands, and tissues are located there. Several researchers’ image analysis work relates to the head and neck. To make effective use of computers for research, diagnosis, and treatment, many researchers have released datasets and challenges, for example: (i) the analysis of tissue structure and functions (13, 5) and (ii) disease diagnosis (30, 40, 44).
Because the brain controls human emotions and actions/functions of other organs, the datasets and challenges significantly focus on the brain. First, we introduce the datasets and challenges related to the analysis of brain structure, function, imaging, and other basic tasks in Section 3.1. Second, we introduce the datasets and challenges related to brain disease diagnosis in Section 3.2.
Moreover, since the eyes are crucial to our vision, the computer-aided diagnosis of eye-related diseases is also an important research focus. The eye-related datasets and challenges are covered in Section 3.3. Finally, we introduce the remaining datasets and challenges related to the neck and the research of brain behavior and cognition in Section 3.4.

3.1 Structural Analysis Tasks of the Brain

The basic analysis and processing of the brain medical images are clinically critical for diagnosis, treatment, and other brain-related analysis tasks. The datasets and challenges we discuss are mainly for the segmentation tasks and center around the brain structure. In contrast, some datasets focus on imaging, including MRI acceleration (e.g., (14, 16)), the non-linear registration of different modalities (e.g., (24, 25)), and tractography (e.g., (28, 29)).
Table 1 lists the datasets and challenges related to basic brain image analysis, such as the segmentation of White Matter (WM) and Gray Matter (GM) and registration of images. One of the most popular tasks in Table 1 is the segmentation of WM, GM, and Cerebrospinal Fluid (CSF), and their respective datasets and challenges are introduced in Section 3.1.1. Meanwhile, other tissues and functional areas’ segmentation are also the focus of research, and their related datasets and challenges are discussed in Section 3.1.2. Section 3.1.3 describes the other basic tasks. Table 1 presents an overview of the datasets and challenges related to medical image analysis of the head and neck. The majority of these datasets provide MR images, whereas some offer CT, PET, or ultrasound data. Approximately half of the datasets and challenges in Table 1 include multi-center data, and some do not specify whether they contain multi-center data. Most of the challenges listed in Table 1 are affiliated with conferences such as MICCAI, ISBI, and MIDL, whereas others do not have an affiliation. All of the challenges shown in Table 1 provide a ranking of participants’ methods. Some of the challenges require registration to access the data and submit methods, whereas others are public under license.
Table 1.
Table 1. Summary of Datasets and Challenges for the Basic Brain Image Analysis

3.1.1 Segmentation of WM and GM.

The segmentation of WM, GM, CSF, and other parts of brain tissues is always a focus for researchers working on brain-related analysis and diagnosis [96, 150]. Similarly, for AI algorithms, it is also of significance to understand the human brain’s structure. Therefore, MICCAI and others have held many challenges (1–8) with this research focus, and researchers could design automatic algorithms to segment MR images into different parts. We introduce these datasets and challenges related to their modalities and tasks in the following.
Modality. Since these datasets and challenges focus on the segmentation of brain tissues, the data usually provided is MR images. Challenges (1–6) provide mainly two modalities: T1 (T1-weighted MRI, based on longitudinal relaxation time [8], providing structural information about tissues and commonly used for anatomical imaging), T2 (T2-weighted MRI, based on transversal relaxation time [8], highlighting fluid and helping to identify edema or swelling), whereas datasets (7, 8) only provide T1 for the WM hyperintensities segmentation task. Note that MR scans are sensitive to the hydrogen atom, and such a feature can effectively help image analysts distinguish between different tissues and parts of the image. Moreover, due to the color of the tissue imaged by MR, these scans are named as “white matter” and “gray matter.”
Task. The main focus of these datasets and challenges is the segmentation of WM, GM, and CSF. However, they do not only focus on that. Challenges (3–7) also provide the annotation of other parts of the brain, including basal ganglia, WM lesions, cerebellum, and infarction. Challenges (1, 2, 4, 6) use MR images of the neonatal brain and consider tissue volumes as an indicator of long-term neurodevelopmental performance [96]. Challenges (1, 3) increased regions to segment and provided different data compared with the previous challenges (2, 5). Challenge (8) focuses on the segmentation of WM hyperintensities in the brain.
Performance Metric. For the segmentation task, the Dice score is one of the most commonly used metrics, and all of these datasets and challenges adopt it as a performance measure. Besides the Dice score, datasets (3, 5, 8) also use Hausdorff distance and volumetric similarity as metrics; datasets (1, 2) use the average the Hausdorff distance and the average surface distance as one of their metrics; and moreover, dataset (8) also uses sensitivity and F1-score as metrics for performance evaluation.

3.1.2 Segmentation of Functional Areas and Other Tissues.

The segmentation of functional areas and tissues also has an essential meaning for brain-related research and computer-aided diagnosis. In this subsection, we introduce the datasets and challenges related to the segmentation of functional areas and tissues.
Tissues Segmentation. WM, GM, and CSF were introduced in Section 3.1.1; however, the segmentation of other brain tissues is also an active research area. Challenges (3, 5–7) aim to segment brain images into different tissues, including ventricles, cerebellum, brainstem, and basal ganglia. These challenges provide MR images and the voxel-level annotations of the regions of interest with 30 or 40 scans. Because these regions are essential for brain health, researchers need to overcome the challenges related to their size and shape to segment them. Dataset (9) is designed for the segmentation of the cerebellum from Diffusion-Weighted Imaging (DWI) (a.k.a. dMRI), which is a type of MRI that utilizes the movement of water molecules to generate images [7] and measures the random movement of water molecules within tissues and is particularly sensitive to acute strokes. However, dataset (10) focuses on the segmentation of the caudate from brain MR images.
Functional Areas. The segmentation of the human brain cortex into different functional areas is of great significance in education, clinical research, treatment, and other applications. Datasets (11, 12) provide images and annotations for the design of automatic algorithms to segment the brain cortex into different functional areas—in other words, brain atlas. Brain atlas is a map to annotate different functional areas or tissues. Dataset (11) uses the DTK protocol [111], which is modified from the DK protocol [62], and the DTK protocol includes 31 labels, details of which are listed at https://mindboggle.readthedocs.io/en/latest/labels.html. Dataset (12) is a commercial dataset for research in the segmentation of functional areas of the brain cortex.
Dataset (13) focuses on the segmentation of the hippocampus, which can be used to detect the progress of Alzheimer’s Disease (AD), according to challenge (13). It provides MR images from AD patients as the training set and the test set.

3.1.3 Imaging-Related Tasks.

In addition to the segmentation tasks of the brain tissues and the functional areas, some of the datasets and challenges also focus on the generation, registration, and tractography.
Generation. Datasets and challenges (14–16) aim to accelerate MRI or generate high-resolution MR images from low-resolution ones. Usually, high-resolution imaging requires higher cost, whereas low-resolution imaging is cheaper but affects analytical judgment and may lead to an incorrect diagnosis. These challenges provide many scans at low resolution to allow researchers to design algorithms to convert or map low-resolution images onto higher-resolution ones. The datasets and challenges mainly focus on the generation tasks. Another focus is cranioplasty (17–19) to generate a part of broken skull from CT images of the models of the broken skull. Additionally, challenges (20, 21) focus on another type of generation task, the reconstruction of diffusion MR images. Compared with challenge (21), challenge (20) focused on the effect of the local reconstruction accuracy on the quality of connectivity reconstruction. Challenge (22) focuses on the reconstruction of WM to answer the questions such as “What model (either biophysical or signal model) better explains the underlying tissue environment?” and “What dMRI sequence enables the most accurate estimation of these parameters?” Dataset (23) focuses on face reconstruction from PET and CT images, which aims to augmented reality in surgery or other tasks.
Registration. Registration is a research area that focuses on aligning images from different modalities. Challenges (24, 25) are centered on the registration of ultrasound data and MR images of the brain. Cross-modality registration can be challenging because the subject is not completely stationary, and ultrasound is a 2D imaging modality while MR is a 3D volume imaging modality. Therefore, these challenges aim to establish the topological relationship between the pre-operative MR image and intra-operative ultrasound. Challenge (24) provides registration tasks for pre- and post-resection, whereas challenge (25) does not. Challenge (26) focuses on the registration of diffusion MR images to eliminate variations caused by different vendors’ hardware devices and protocols.
Tractography. Tractography is another segmentation task and focuses on the segmentation and imaging of the fiber in the WM. Dataset (27) aims to segment the fiber bundles from brain images, including phantom, squirrel monkey, and macaque, whereas challenges (28, 29) focus on the tractography with Diffusion Tensor Imaging (DTI), which is a kind of DWI) that provides additional information about the direction of water movement within tissues, allowing for the assessment of WM tracts in the brain [7]. The details of the challenges’ tasks are different. Challenge (28) focused on the segmentation of the corticospinal tract from a set of neurosurgical cases, and challenge (29) focused on the segmentation and the analysis based on the segmentation.

3.2 Brain Disease Related Datasets and Challenges

Besides the structural analysis and image processing tasks, computer-aided diagnosis is also a research focus in healthcare. Medical image analysis plays a critical role in clinical research, diagnosis, and treatment. The datasets and challenges we have included are for two tasks: (i) the segmentation of lesions and (ii) the classification of diseases. For the segmentation task, the respective datasets and challenges focus on the lesion segmentation of the human brain and mark the lesion’s contour for diagnosis and treatment, and the relevant details are shown in Section 3.2.1. For classification tasks, the datasets and challenges have been used for the development of automatic algorithms to classify or predict diseases from medical images, and these datasets and challenges are presented in Section 3.2.2.

3.2.1 Datasets for Segmentation of Lesions.

Lesions in the brain affect a human’s healthy life and safety, and image analysis is an effective way to diagnose relevant diseases. In this subsection, related datasets and challenges are introduced, and they are reported in Table 2.
Table 2.
Table 2. Summary of Datasets and Challenges for the Brain Lesion Segmentation Tasks
The tasks of these datasets mainly include the segmentation of brain tumors (e. g., glioma), the segmentation of cerebral aneurysm, the segmentation of stroke lesion, the segmentation of Intra-Cranial Hemorrhage (ICH), and sclerosis lesion. Most of the datasets and challenges use MR images that include different submodalities, whereas some are using CT. Nearly half of these datasets and challenges listed in Table 2 are reported including multi-center data, whereas a few of them are reported as not included. The rest of them do not report whether multi-center data are used. The challenges in Table 2 provide rankings of the participants’ methods, and most of them are associated with MICCAI. Most of the challenges in Table 2 are required to register when accessing data, whereas other datasets and challenges are opened under specific licenses.
Glioma Datasets and Challenges. Gliomas are one of the most common and deadly brain malignancies [12]. Therefore, many challenges and datasets focus on the segmentation of glioma for its diagnosis and treatment. The BraTS challenge series (30–38) has been going on since 2012 to segment the glioma. The challenges of such a segmentation task are caused by the heterogeneous appearance and shape of gliomas. The heterogeneity of the glioma reflects its shape, modalities, and many different histological subregions, such as the peritumoral edema; the necrotic core; and the enhancing and the non-enhancing tumor core. Therefore, these series of challenges provide multi-modal MR scans to help researchers design and train algorithms to segment tumors and their subregions. The tasks of this challenge series include (i) low- and high-grade glioma segmentation (37, 38), (ii) survival prediction from pre-operative images (32, 33), and (iii) the quantification of segmentation uncertainty (30, 31). Besides the BraTS challenge series, dataset (40) is another one for the segmentation of low-grade glioma and provides T1-weight and T2-weight MR images with biopsy-proven gene status of each subject by FISH (fluorescence in-situ hybridization) [181]. Dataset (39) focuses on the processing of brain tumor and aims to design and evaluate DL-based automatic algorithms for glioblastoma segmentation and further research.
Cerebral Aneurysm Challenges. Cerebral aneurysm is life threatening because the rupture of the cerebral aneurysm is associated with high mortality and cognitive impairment in case of survival. Therefore, challenges (41–43) focus on the detection of the cerebral aneurysm for the strategy to prevent rupture. These three challenges respectively focus on detection, segmentation, and rupture risk assessment of the cerebral aneurysm.
Ischemic Stroke Lesion Datasets and Challenges. Similar to tumor segmentation, brain lesion segmentation also focuses on detecting brain abnormalities. However, the difference is that lesion segmentation deals with damaged tissues. Challenges (44–48) focus on stroke lesion segmentation because stroke is also life threatening and can disable the surviving patients. Stroke is often associated with high socioeconomic costs and disabilities. Automatic analysis algorithms help diagnose and treat stroke, since its manifestation is triggered by local thrombosis, hemodynamic factors, or embolic causes. In MR images, the infarct core can be identified with diffusion MR images, whereas the penumbra (which can be treated) can be characterized by perfusion MR images. The challenge ISLES 2015 (47) focuses on subacute ischemic stroke lesion segmentation and acute stroke outcome/penumbra estimation and provides 50 and 60 multi-modality MR scans of data for training and validation, respectively, for two subtasks (i. e., subacute ischemic stroke lesion segmentation and acute stroke outcome/penumbra estimation). The subsequent year’s challenge, ISLES 2016 (46), focuses on the segmentation of lesions and the prediction of the degree of disability. This challenge provides about 70 scans, including clinical parameters and MR modalities, such as DWI, ADC, and perfusion maps. The challenge ISLES 2017 (45) focuses on the segmentation with acute MR images, and ISLES 2018 (44), focuses on the segmentation task based on acute CT perfusion data. Moreover, dataset (48) i.e., ATLAS) focuses on the segmentation of the stroke lesion and the brain itself after stroke for further treatments.
ICH-Related Datasets. ICH is a sudden bleeding into the brain tissues or ventricles or both of them, which can be caused by traumatic brain injuries or other reasons [92]. ICH could lead to disability or even death if it is not treated on time. Normal diagnosis and treatment is based on the analysis of brain CT to localize the regions of ICH. Thus, dataset and challenge (49, 50) focus on the detection and segmentation of ICH to help medics locate the hemorrhage regions and decide on a treatment plan. Dataset (51) also provides data for the classification of normal or hemorrhage CT images.
Multiple Sclerosis Lesion Related Datasets. The multiple sclerosis lesion is another kind of lesion in the brain that is not life threatening but can cause disabilities. Datasets and challenges (52–54) are related to multiple sclerosis lesion segmentation with multi-modality MR data, such as T1w, T2w, and FLAIR (fluid-attenuated inversion recovery, a kind of MR image, “removing” signal from the CSF [5]).

3.2.2 Classification of Brain Disease.

Except for the lesion segmentation, brain disease classification also plays an essential role in healthcare. Brain-related diseases have a severe effect on patients’ health and their lives (e. g., AD [25, 67, 105, 199] and Parkinson’s disease (PD)). Therefore, effective diagnosis and early intervention can effectively reduce the health damage to patients, the effect on the social times of families, and the economical impact on society. In this section, we first introduce the datasets and challenges of AD (55,–59, 64), then introduce other diseases (65–68). Table 3 shows the relevant challenges and datasets.
Table 3.
Table 3. Summary of Datasets and Challenges for Brain Disease Classification Tasks
The datasets listed in Table 3 mainly use MRI and PT. For AD, PET is commonly used with specific contrast agents to find landmarks associated with AD. For other diseases, MRI or specific modalities are used to better detect diseases. AD has a more detailed division of cognitive impairment, whereas datasets of other diseases are binary classifications. The numbers of datasets and challenges that included or did not include multi-center data are nearly 50-50, whereas there is a dataset that did not report whether it was used or not. The challenges listed in Table 3 are required to register when accessing, whereas other datasets are opened or public under specific licenses.
Alzheimer’s Disease. AD affects a person’s behavior, cognition, memory, and daily life activities. Such progressive neurodegenerative disorder affects the normal daily life of patients because suffering from such a disease makes patients not know who they are and what they should do, which then progresses to the point until they forget everything they know. The disease takes an unbearable toll on the patient and leads to a high cost to their loved ones and to society. For example, according to Sudha et al. [9], AD became the sixth deadly cause in the United States in 2018 and costs around $200 to $300 billion.
Therefore, researchers are doing everything they can to explore the causes of AD and its treatments. Diagnosis based on medical images has become a research focus because early diagnosis and intervention have significance on the progress of this disease. Hence, many researchers work on the classification (i.e., prediction of AD using brain images). The datasets mainly include ADNI and Open Access Series of Imaging Studies (OASIS).
ADNI is a series of projects that aim to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of AD. It includes four stages: ADNI-1 (55), ADNI-GO (56), ADNI-2 (57), and ADNI-3 (58). These projects provide image data of the brain for researchers, and the modalities of images include MR (T1 and T2) and PET (FDG, PIB, Florbetapir, and AV-1451). These four stages consists of 1,400 subjects. The subjects can be categorized into normal cognition, Mild Cognitive Impairment (MCI), and AD, where MCI can be split into early mild cognitive impairment and later mild cognitive impairment.
OASIS is a series of projects aiming to provide neuroimaging data of the brain, which researchers can freely access. OASIS released three datasets: OASIS-1, OASIS-2, and OASIS-3. All of these three datasets are related to AD, but they are also used in functional areas segmentation and other tasks. OASIS-1 (59) contains 418 subjects aged from 18 to 96 years, and for subjects older than 60 years, there are 100 subjects diagnosed with AD. The dataset includes 434 MR sessions. OASIS-2 (60) contains 150 subjects, aged between 60 and 96 years, and each subject includes three or four MR sessions (T1). About 72 subjects were diagnosed as normal, whereas 51 subjects were diagnosed with AD. In addition, there are 14 subjects who were diagnosed as normal but were characterized as AD at a later visit. OASIS-3 (61) includes more than 1,000 subjects, more than 2,000 MR sessions (T1w, T2w, FLAIR, etc.), and more than 1,500 PET sessions (PIB, AV45, and FDG). The dataset includes 609 normal subjects and 489 AD subjects.
Moreover, there are many other challenges based on ADNI and OASIS or independence datasets. Challenge TADPOLE (62) is based on ADNI and aims at the prediction of the longitudinal evolution. Dataset (63) is based on OASIS and is released on Kaggle for the classification of AD. Challenge CADDementia (64) is an independent AD-related challenge to classify subjects into normal cognition, MCI, and AD.
Other Diseases. Similar to AD, other brain diseases are also important from the diagnosis and treatment perspective. However, the number of datasets and challenges of these diseases is not as large as AD. A few datasets focus on PD and Spinocerebellar Ataxia Type 2 (SCA2). Datasets ANT (65) and (66) provide images of PD with MR images and classification labels. Dataset (67) provides images and classification labels of SCA2. Dataset MTOP (68) provides images and annotations for the diagnosis of mild traumatic brain injury.

3.3 Eye-Related Datasets and Challenges

As the human’s imaging sensor, the eyes’ health is essential for human beings, and eye diseases may lead to blindness. We introduce the relevant challenges and datasets in this subsection and list them in Table 4. Many of the studies presented the size of their dataset, with approximately half of them separating it into training and testing sets. Table 4 includes a list of challenges that offered rankings for participants’ methods, with many of them being associated with MICCAI and ISBI. It is worth noting that access to these challenges often requires registration, whereas other datasets may be accessible with specific licenses.
Table 4.
Table 4. Summary of Datasets and Challenges of Eye Disease Related Tasks
Datasets according to the Modality. With regard to eye-related datasets and challenges, the main used modalities are fundus photography (69–72, 74–79, 84, 86) and OCT (73, 80, 81, 83, 85). The data provided by challenge (82) consists of surgery videos. Fundus photography can help medics evaluate health of the eye and locate retinal lesions because fundus photography clearly shows the important parts of the eye, such as the blood vessels and the optic disc. OCT is a new imaging approach that is safe for the eye and shows the retinal tissues in detail. However, it has also disadvantages—it is not suitable for diagnosing microangioma and the planning of retinal laser for photocoagulation treatment.
Datasets according to the Analysis Task. These datasets and challenges can be used for four tasks:
(1)
Classification tasks focus on classifying whether the subject has specific diseases or judging whether the subject is abnormal. Datasets and challenges (70–73, 76–78, 84) focus on predicting a single disease, whereas others (69, 75, 79–81) focus on diagnosing multiple diseases. Challenge (71) is an upgrade of challenge (72) that provided more data for participants.
(2)
Segmentation is another task, which provides more information compared to classification. Datasets and challenges focus on the segmentation of the tissues and lesions, such as disc (71, 72, 76, 79), Cup (71, 72), retinal lesion (76, 79), blood vessels (74), and retinal (83) and DME lesions (85), for further diagnosis and disease analysis.
(3)
Datasets and challenges (72, 73, 76, 78, 79, 86) focus on the detection of lesions or other landmarks. These tasks help medics locate key targets, such as areas and tissues, for effective diagnosis or provide feature details for other automated algorithms.
(4)
Unlike other tasks, the last one focuses on the annotation of the tools used for eye-related surgery (82).
Datasets according to Focused Eye Diseases. Researchers mainly focus on the following diseases:
Glaucoma (72, 75)
Closure glaucoma (73)
(Age-related) macular degeneration (75, 78, 80)
Cataract (75)
Diabetes retinopathy (75, 77, 79, 80, 81, 84, 86)
Hypertension (75)
Pathologic myopia (75, 76)
Diabetic macular edema (79, 85)
Besides these diseases, dataset (82) aims at the annotation of images and videos in surgery.

3.4 Datasets and Challenges of Other Subjects

Besides the brain’s structural analysis, the image processing, and the computer-aided diagnosis tasks, another important research focus is the human neck because it holds many essential glands and organs. This subsection discusses the datasets and challenges of the neck and teeth, covered in Sections 3.4.1 and 3.4.2, respectively. Moreover, many researchers are working on the analysis of behavior and cognition with DL-based methods. We discuss the details in Section 3.4.3. Table 5 includes datasets and challenges that cover various modalities, including CT, PET, MR, and ultrasound, with a focus on segmentation. Most of the challenges listed in Table 5 have rankings for submitted data and require registration for data access, whereas others have specific licenses. Some of these challenges are affiliated with MICCAI, whereas others are associated with AAPM, and one challenge is jointly hosted by ISBI and MICCAI.
Table 5.
Table 5. Summary of Datasets and Challenges of Head and Neck Related Diseases

3.4.1 Neck-Related Datasets.

The neck is essential for our health, as it holds many glands and organs. When these become abnormal, effective diagnosis and segmentation play an essential role in their treatments. The related image datasets and challenges are listed in Table 5.
Datasets and challenges (87–92) focus on the segmentation of glands and the lesions in relevant glands. Dataset (93) focuses on the binary classification tumor vs. normal. Challenge (94) aims at the task of thyroid gland nodules detection with ultrasound images and videos. Challenge (95) focuses on the nerves segmentation in the neck, whereas challenge (96) focuses on evaluating carotid bifurcation.

3.4.2 Cephalometric and Teeth Related Datasets.

Challenges (97, 98) focus on the diagnosis of dental x-ray images. The main tasks of these two challenges include landmark localization and caries segmentation. Challenge (97) provides around 400 cephalometric x-ray images with the annotation of landmarks by two experienced dentists. Challenge (98) provides about 120 bitewing images with experts’ annotations of different parts of the teeth.

3.4.3 Behavior and Cognition Datasets.

To understand what we see, hear, smell, and feel, our brain draws on neurons in the brain to compute and analyze the stimulations and understand what, where, why, and when questions and scenarios. Many researchers now use artificial neural networks as a research method to analyze the relationship between brain activities and stimulation. They use functional MR images to scan our brain activity, analyze the hemodynamic feedback, and identify the area of the neurons that react. Therefore, the analysis of the reactions of the brain in response to a specific stimulation is an important research focus. Researchers use DL to detect or decode the stimulation of subjects to work out the brain’s functionality. The related datasets are listed in Table 6. These datasets are all opened with specific licenses, and all of them are single-center data, which means that the publisher collected the data themselves. Due to the difficulty of collecting data, the scale of these datasets is not too large (less than 50).
Table 6.
Table 6. Summary of Datasets and Challenges Used for Behavioral and Perception Related Tasks
Some datasets (99–101) focus on classifying the stimulations or the subject’s attribution based on the subject’s functional MR images. Dataset (99) aims to identify whether the subject is a beginner or an expert in programming via the reaction of their brain to source codes. Dataset (100) focuses on diagnosing subjects with depression vs. subjects with no depression using audio stimulations and analyzing the subjects’ brain activity. Dataset (101) focuses on the influence of cannabis on the brain.
Datasets (102–109) focus on the encoding of the stimulations (i.e., brain activities’ decoding). Datasets (105, 107, 108) aim to rebuild what subjects have seen using DL-based methods from their brain activities using functional MR images. In addition, datasets (103, 109) work on the encoding of faces that subjects have seen from functional MR images with similar modalities.

3.5 Summary

Within this section, we have provided an overview of datasets and obstacles concerning the head and neck region. To present this information, we created six tables detailing each entry within the prior subsections. Table 1 specifically pertained to datasets and challenges centered around brain segmentation, registration, and generation. Table 2 listed datasets and challenges that mainly focused on the segment of brain images, and additionally, there were many other datasets and challenges focused on the tasks, such as generation and registration. The datasets and challenges, listed in Table 3, mainly focused on the classification of brain disease. Table 4 listed the datasets and challenges related to the classification, segmentation, detection, and other tasks associated with the diagnosis and surgery of eye disease. Table 5 included datasets and challenges about diseases of the head and neck, such as the segmentation of the glands or lesions in glands and cephalometric analysis. The datasets listed in Table 6 were related to behavior and perception.

4 Chest and Abdomen Related Datasets and Challenges

There are many vital organs in the chest and abdomen. For example, the heart is responsible for the blood supply, the lungs are responsible for breathing, and the kidneys are responsible for the production of urine to eliminate toxins from the body. Therefore, medical image analysis of organs in the chest and abdomen is an important research focus. Most of the tasks are computer-aided diagnosis with classification, detection, and segmentation of lesions being the most targeted tasks.
Many datasets and challenges aim to segment one or more organs in the chest and abdomen for diagnosis or treatment planning. Section 4.1 discusses the datasets and challenges related to inter- or intra-organ segmentation. Section 4.2 introduces the datasets and challenges that focus on the classification, detection, and segmentation tasks related to the diagnosis of diseases of the chest and abdominal organs. Section 4.3 describes the datasets and challenges of the chest and abdomen that are not categorized in other sections, including regression, tracking, registration, and other tasks related to the chest and abdominal organs.

4.1 Datasets for Chest and Abdominal Organ Segmentation

This subsection covers the datasets and challenges of the chest and abdominal organs that are used for anatomic segmentation tasks. The anatomic segmentation tasks include inter-organ segmentation (Section 4.1.1) and intra-organ segmentation (Section 4.1.2). Inter-organ segmentation aims to separate an organ from the background or mark the boundaries between multiple organs and the background. Intra-organ segmentation aims to segment the organ into different parts at the anatomical level.
Table 7 showcases the datasets and challenges that are dedicated to the segmentation of organs in the chest and abdominal regions. The majority of these datasets and challenges use CT and MR modalities. Information such as the number of images and division of training/testing sets were reported for most of these datasets and challenges. The challenges listed in Table 7 generally provide rankings for submitted methods, and many of them are affiliated with MICCAI, whereas a few are associated with ISBI or AAPM. Access to most of these challenges requires registration, whereas datasets and the remaining challenges are accessible under specific licenses.
Table 7.
Table 7. Summary of Datasets and Challenges for Chest and Abdominal Organ Segmentation Tasks

4.1.1 Datasets of the Chest and Abdominal Organs.

Inter-organ segmentation is a necessary information for the pre-planning of surgery and diagnosis. A well-segmented contour of the organs provides a precise mask, which helps produce accurate segmentation results for the diagnosis, treatment, and operation. This subsection introduces datasets and challenges for inter-organ segmentation of a single organ and of multiple organs.
Chest and Abdomen Datasets according to the Organ. Inter-organ segmentation is one of the most attractive topics of medical image analysis [97, 103]. It is a basic pre-processing step in diagnosis and treatment, as it provides contour of the organs. Some datasets and challenges focus on the larger organs, such as the liver and lungs, whereas others focus on multi-organ segmentation or smaller organs. This is because of challenges, such as imbalanced labels and blurred contour in low resolution, brought by multi-organs and smaller organs. The datasets and challenges that we have covered involve the following organs and parts:
Lung (110–117)
Liver (110, 111, 115, 116, 118–121)
Kidney (110, 115, 116, 118, 120, 122)
Prostate (111, 123–127)
Esophagus (113, 114, 120, 128)
Heart (111, 113, 128, 129)
Pancreas (111, 115, 120, 130)
Aorta (115, 120, 128)
Spleen (115, 118, 120)
Adrenal glands (115, 120)
Bladder (110, 115)
Gallbladder (115, 120)
Spinal cord (113, 114)
Trachea (115, 128)
Colon (111)
Breast (131)
Lymph (132)
Clavicles (133)
Stomach (120)
Chest and Abdomen Datasets according to Modality. The most commonly used image modalities for the chest and abdominal organs segmentation are MR and CT. As Table 1 shows, many datasets and challenges use MR images. MR images have higher resolution under certain conditions and have better resolution for soft body tissues and organs, such as the heart and prostate. Meanwhile, CT is the most widely used modality for organ segmentation and other tasks and diagnosis related to the chest and abdomen, such as the lung and liver, according to our research, because of its convenience, effectiveness, and low cost.
Chest and Abdomen Datasets according to Focus. The purpose of these datasets and challenges can be categorized into three groups: further analysis, benchmark, and radiotherapy. Most datasets and challenges that provide annotated organs’ contours are provided with the objective to focus on further analysis and treatments. One of the challenges of segmentation is to achieve a robust segmentation of the whole organ and separate it from the background, without omitting the lesions and tumors, and thus some test benchmarks (111, 115) are provided for researchers to evaluate their algorithms. Another challenge is the imbalance between different organs because of their sizes and shapes, and such an imbalance makes it challenging to segment small organs and provide valuable information for analysis and treatment. Dataset and challenges (113, 128) provide data for researchers to address the problem.
Single Chest and Abdominal Organ Segmentation. Single-organ contour segmentation tasks usually focus on segmentation with an anatomical purpose (112, 117, 123–125, 127) or the research of a region for subsequent tasks (119, 121, 122, 126, 130–132). The difficulty of the former task is that the lesions and tumors may affect the segmentation by separating the organ from the background, whereas the latter’s difficulty is for researchers and their algorithms to perform more precise segmentation. Challenge (133) focused on the segmentation of the clavicles.
Chest and Abdomen Multi-Organ Segmentation. Chest and abdomen multi-organ segmentation focuses distinguishing inter-organs. Some of these datasets and challenges (110, 111, 118) focus on the segmentation of multiple organs, including the relatively larger organs, which are easier to segment, and the relatively smaller organs, which can be more challenging to segment compared to the larger ones, especially when the model is handling larger and smaller organs at the same time. Similarly, some of these datasets and challenges (113, 128) focus on the “organ at risk,” which means that these organs are healthy but might be at risk because of radiation therapy. Dataset (115) aims to provide a benchmark for the segmentation algorithms. Dataset (120) focuses on multi-atlas-based methods, which are widely used in brain-related research.

4.1.2 Chest and Abdomen Intra-Organ Segmentation.

Different from inter-organ segmentation of the chest and abdominal organs, intra-organ segmentation aims to segment the organ into different parts. Just as the hand has five fingers, organs are made up of multiple parts, and a typical example is the Couinaud liver segmentation method [58]. This subsection introduces the datasets and challenges for organ segmentation. These datasets and challenges are listed in Table 7.
Heart-Related Datasets and Challenges. Most of these datasets and challenges (137–143) are related to heart segmentation. The most frequently used modalities are MR and ultrasound, and the aim is to segment the heart into the left atrium, chambers, valves, and other parts. Although MR and ultrasound can effectively image the different tissues of the heart, the heartbeat blurs the images and makes segmentation more difficult; for ultrasound, the dynamic nature of the images poses an additional challenge for segmentation algorithms.
Other Chest and Abdomen Body Parts. Challenge (144) provides 55 CT scans and focuses on the segmentation of the lung with the labeling of its different parts: outside the lungs, the left lung, the upper lobe of the left lung, the lower lobe of the left lung, the upper lobe of the right lung, the middle lobe of the right lung, and the lower lobe of the right lung. The biggest challenge is the effect of the lung lesions and diseases, such as tuberculosis and pulmonary emphysema, on the performance of segmentation. Moreover, challenges (145, 146) focus on segmentation of the lung vessels.

4.2 Datasets for Diagnosis of Chest and Abdominal Diseases

Diseases of organs in the chest and abdomen have a significant impact on human health. Therefore, many researchers work on this problem by analyzing medical images. Several researchers have designed automatic or semi-automatic algorithms for classification, segmentation, detection, and characterization tasks to help medics diagnose these diseases. In this subsection, we describe the datasets and challenges related to the diagnosis of diseases of the chest and abdomen, which are reported in Tables 8 through 10.
Table 8. Summary of Datasets and Challenges for Chest and Abdominal Organ Related Tasks I
Table 9. Summary of Datasets and Challenges for Chest and Abdominal Organ Related Tasks II
Table 10. Summary of Datasets and Challenges for Chest and Abdominal Organ Related Tasks III
The majority of datasets and challenges listed in Tables 8 through 10 provided information on the scale of the data, including whether they included multi-center data and whether they had rankings. Many of the challenges were associated with various conferences, such as MICCAI, ISBI, RSNA, AAPM, and ICIAR. Most of these challenges required registration to access the data, whereas other datasets and challenges were available under specific licenses. Additionally, most of them reported the division of the training/testing set.
Chest and Abdomen Datasets according to Modality. According to the datasets and challenges collected, CT is the most commonly used imaging modality for the chest and abdomen because of its suitable imaging quality and its ability to clearly display tissues and lesions. Some datasets and challenges also provide contrast-enhanced CT images for clearer imaging. Besides CT, other modalities include MR, x-ray digital radiographs, PET, and endoscopy, among others. MR images are used in breast-related diagnosis, cardiac tasks, soft tissue sarcoma detection, and ventilation imaging. Because of the organs’ size and the resolution of CT, which is limited by exposure time and radiation dose, MR is a more suitable modality for small or specific organs. PET is typically used together with other modalities, such as CT and MRI. The uptake of the radiotracer is related to metabolism, which means that the radiation density will be high in tumors, so PET is mostly used for tumor-related tasks. Endoscopy images are used for medical inspection of the stomach, intestines, and other organs.
Chest and Abdomen Datasets according to Classification of Diseases. The classification of diseases intends to determine whether a subject is healthy or not. A quick and early diagnosis can allow effective interventions to increase the probability of the patient recovering before the condition worsens.
The main focus of these datasets is to judge whether there is any cancer, lesion, or tumor, such as soft tissue sarcoma (147), prostate lesion (148, 149), lung cancer (150–153), and breast cancer (154).
Another focus is the classification of diseases. These diseases include mainly pneumothorax (155), cardiac diseases (156), tuberculosis (157), pneumonia (158), and COVID-19, which are discussed at the end of this subsection.
The endoscopy-related challenges (159–161) provide data with the aim of classifying RGB images and videos into “normal” vs. “abnormal” categories. The abnormal category in (160) includes VA (mucosa showing villous atrophy), CH (mucosa showing crypt hypertrophy), and VACH (mucosa showing both villous atrophy and crypt hypertrophy); the abnormal category in (161) includes GMP (gastric metaplasia), BAR (intestinal metaplasia), and NPL (neoplastic mucosa).
Dataset (162) focuses on classification based on diagnostic records. These datasets and challenges provide data for researchers to design AI-based algorithms to diagnose common diseases.
Chest and Abdomen Datasets for Attribute Classification. The characterization of tumors and lesions is also called attribute classification; it focuses on the characterization analysis of tumors and lesions that follows detection and segmentation. A typical example is the attribute classification of pulmonary nodules and lung cancer (164–168). The datasets and challenges usually provide CT scans annotated with different attributes, such as lesion type, spiculation, lesion localization, margin, lobulation, calcification, and cavity. Each attribute includes two or more categories. Another focus is the characterization of breast-related lesions and tumors (169, 170).
Chest and Abdomen Datasets for Detection. In most research and clinical situations, classification alone is not enough. Medics and researchers usually want to localize the lesions related to a disease. Treatment evaluation, planning, and interpretability are of particular interest to medics and DL researchers; thus, detection and segmentation are tasks receiving a lot of attention today. The detection task aims to find a region of interest and localize its position. The regions of interest usually include the following:
Lung cancer and tumor (167, 171–174)
Pulmonary nodule (164, 168, 174–176)
Celiac-related damage (177–180)
Action and artifact of surgeon (180, 181)
Other lung lesions (182, 183)
Polyp (179, 184)
Breast cancer (169, 185, 186)
Cervical cancer (187)
Liver cancer (188)
Chest and Abdomen Datasets for Segmentation. Segmentation is a refinement of the detection task because it provides pixel-level labels in addition to location. Pixel-level annotations can help researchers design algorithms for accurate and effective quantification, volume calculation, and other analyses and diagnoses of tumors and lesions at the pixel level (e.g., monitoring of tumor size; see the sketch after this list). According to the datasets and challenges we have collected, most of them aim at segmenting tumors and lesions from CT of the following:
Heart-associated conditions (189)
Kidney tumor (122)
Liver tumor (119)
Lung cancer (190–192)
Pneumothorax (155)
Polyp (179)
Pulmonary nodule (193)
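To make the pixel-level quantification mentioned above concrete, the following minimal sketch (in Python with NumPy) computes a lesion volume from a binary mask and voxel spacing, together with the Dice overlap commonly used to evaluate segmentations. The function names and the toy mask are illustrative assumptions, not part of any listed challenge.

```python
import numpy as np

def lesion_volume_ml(mask, spacing_mm):
    """Volume of a binary lesion mask, given voxel spacing in millimeters."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(mask.astype(bool).sum()) * voxel_mm3 / 1000.0  # mm^3 -> mL

def dice(pred, ref):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

# Toy example: an 8-voxel lesion on a 1 x 1 x 1 mm grid is 0.008 mL.
mask = np.zeros((64, 64, 64), dtype=np.uint8)
mask[30:32, 30:32, 30:32] = 1
print(lesion_volume_ml(mask, (1.0, 1.0, 1.0)))  # 0.008
```

Tracking such a volume across longitudinal scans is one simple way pixel-level labels support the monitoring of tumor size.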
Additionally, challenge (194) centers around the segmentation and ventilation-related tasks of the lung. Meanwhile, challenges (178, 180) concentrate on the segmentation of artifacts (e.g., polyps) in endoscopic images. The discrepancies between the two challenges are the specific tasks and corresponding data. Challenge (195) encompasses a range of intricate tasks associated with medical images and offers various modalities.
COVID-19. In 2020, COVID-19 became a research focus after causing more than 100 million infections and 2 million deaths worldwide. Various datasets and challenges focus on this devastating disease and provide data to help researchers develop DL models to detect COVID-19 from various medical image modalities.
With respect to modalities, most of these datasets and challenges use either CT or CR images, and some provide both modalities. One exception is dataset (196), which uses ultrasound images. These datasets provide image annotations labeled by radiologists.
Most of these datasets and challenges are related to classification tasks. Datasets (197–201) directly focus on diagnosing COVID-19 from normal subjects. In contrast, datasets and challenges (196, 202–205) focus on diagnosing COVID-19 from a few other similar diseases, which can also lead to lung opacity or other symptoms, such as Middle East respiratory syndrome, severe acute respiratory syndrome, and acute respiratory distress syndrome. Moreover, other datasets and challenges (206, 207) focus on the diagnosis task, with natural language processing or genomics.
Similarly, some other datasets (199, 208, 209) focus on the segmentation or detection of COVID-19-related lesions, such as ground-glass opacity, air-containing space, and pleural effusion.
Additionally, dataset (210) offers images and clinical data to support medical professionals and researchers in their diagnostic efforts and algorithm development.

4.3 Datasets for Other Chest and Abdomen Related Tasks

Besides the classification, detection, and segmentation tasks, there are several other tasks that are the current focus of research. In the following, we present the datasets and challenges related to these tasks and report them in Table 11.
Table 11. Summary of Datasets and Challenges of Other Medical Applications in Chest and Abdomen
The datasets and challenges presented in Table 11 primarily consist of MR, CT, and ultrasound data, with some providing information on the size of the dataset. Challenges in this table include rankings of participant methods, with many associated with MICCAI. Although most challenges require registration to access data, some datasets and challenges are openly available.
Chest and Abdomen Datasets for Regression. Similar to attribute classification, regression aims to compute or measure target attributes from given images, but its outputs are continuous. A typical example is fetal biometric measurements (211, 212). These challenges provide ultrasound images to help researchers design algorithms that measure such attributes to estimate gestational age and monitor the fetus’s growth. Another example is cardiac measurements (213–219). Of these, challenge (214) provided more clinical data than its predecessor (215). These datasets and challenges provide MR or ultrasound images for analyzing the heart’s attributes to detect heart disease.
Chest and Abdomen Datasets for Tracking. Tracking is a crucial task because organs and the body tend to move during imaging, and the motion characteristics of organs such as the heart provide valuable information. Challenges (220, 221) focus on tracking the liver in ultrasound data to support surgery and treatment analysis. The tasks of the two challenges are similar, but challenge (220) provided more data than the previous year’s edition. Datasets and challenges (222–224) provide ultrasound images to track and analyze the heart.
Chest and Abdomen Datasets for Registration. Challenge (225) focuses on CT registration of the lungs and provides CT scans with and without contrast enhancement. Challenges (226, 227) focus on registration between different modalities of the heart, providing MR, CT, and other modalities to register images of the beating heart. Challenge (228) likewise focuses on the registration of the lungs.
Datasets for Other Chest and Abdomen Related Tasks. Challenge (229) focuses on the detection of amniotic fluid. Challenges (230, 232) focus on localizing specific landmarks, including the amniotic fluid and the heart, using ultrasound and MR images. Challenge (233) focuses on the classification of surgery videos. Dataset (234) focuses on reconstruction of the coronary artery.

4.4 Summary

In this section, we provided a summary of the datasets and challenges related to the chest and abdomen. We organized them into five tables and described them in detail. Table 7 included datasets and challenges that concentrate on the segmentation of organs in the chest and abdomen, with some focused on single- or multiple-organ segmentation, whereas others focused on intra-organ segmentation. Tables 8, 9, and 10 presented datasets and challenges associated with diagnostic tasks such as disease classification, tumor segmentation, and lesion detection. Table 11 listed other datasets and challenges that focus on tasks like regression, tracking, registration, and generation.

5 Datasets and Challenges for Pathology and Blood

Although radiography, MRI, and other imaging modalities are used as a basis for diagnosis, pathology images serve as a gold standard, particularly for tumors and lesions. Digital pathology images are generally obtained by collecting tissue samples, slicing, staining, and imaging. Pathology images are therefore one of the mainstream image modalities used for diagnosis.
The focus of these datasets and challenges includes (i) the classification and segmentation of basic elements (e.g., cells and nuclei) in pathology images and (ii) blood-based diagnosis from images. In this section, we present datasets and challenges of pathology images (Section 5.1) and cover those of blood images in Section 5.2.

5.1 Datasets and Challenges for Pathology

Pathology images are used as one of the bases for cancer diagnosis [29]. Pathologists and automatic algorithms analyze images based on specific features, such as cancer cells and cells under mitosis. Many organizations and researchers provide datasets and challenges that focus on pathology at the microscopic and Whole Slide Image (WSI) levels. The relevant datasets and challenges are listed in Table 12.
Table 12. Summary of Datasets and Challenges for Pathology-Related Image Analysis
Table 12 contains datasets and challenges that primarily use H&E as the staining method. Most of the listed datasets and challenges report the size of the data and whether it includes multi-center data; the division of the training/testing set is also reported for most of them. Multi-center datasets and challenges are more prevalent than single-center ones. The challenges provide rankings for submitted methods and are mostly associated with MICCAI, ISBI, or other conferences. Some of the challenges require registration to access the data, whereas others are open under specific licenses.
Imaging Datasets and Challenges. In most situations, WSI is used in pathology diagnosis. Unlike CT or MR images, a pathology image is an optical image similar to a picture taken by a camera; one major difference is that a pathology image is formed by transillumination, whereas an ordinary photo is formed by reflection. Another difficulty lies in the size of the image. A WSI is stored in a multi-resolution pyramid structure, generally assembled from many small high-resolution image patches, and it might contain billions of pixels. Thus, WSI serves as a virtual microscope in diagnosis and clinical research, and many challenges use WSI, such as (235–240).
However, some machine learning based algorithms can hardly process and analyze a whole WSI directly—for example, cell segmentation (241, 242). In some situations, the original image is too large to be analyzed directly because of limited computing resources; in others, “end-to-end” DL-based algorithms struggle to use both low- and high-resolution features without zoom-in and zoom-out actions or intelligent agents. Therefore, pathology image patches are used in several other challenges, such as (243) for visual question answering, (244) for mitosis classification, and (241, 242) for multi-organ nucleus detection and segmentation.
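To make the pyramid structure concrete, the following minimal sketch uses the openslide-python library to inspect a slide’s resolution levels and read a single patch; the file name, patch coordinates, and patch size are illustrative assumptions rather than values prescribed by any challenge above.

```python
import openslide  # assumes openslide-python is installed

# Hypothetical pyramid-encoded slide file (e.g., .svs or .tiff).
slide = openslide.OpenSlide("example_slide.svs")

# Level 0 is full resolution; higher levels are downsampled overviews.
for level, (width, height) in enumerate(slide.level_dimensions):
    print(f"level {level}: {width} x {height} pixels, "
          f"downsample {slide.level_downsamples[level]:.1f}x")

# read_region takes coordinates in level-0 pixel space, whatever level is read.
patch = slide.read_region((10_000, 20_000), 0, (512, 512)).convert("RGB")
patch.save("patch_0.png")
slide.close()
```

Patch-based pipelines of this kind are what allow algorithms with limited memory to process a slide one tile at a time.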
Stains. Slides made from human tissue are colorless and require staining. Commonly used stains include hematoxylin, eosin, and diaminobenzidine. Usually, two or more stains are combined, and the most common combinations are Hematoxylin & Eosin (H&E) and hematoxylin & diaminobenzidine (H-DAB), as listed in Table 12.
Pathology Datasets according to Disease. Pathology slides are widely used in the diagnosis of many diseases, especially cancer. Cancer cells and tissues have different shapes compared to their normal counterpart. Thus, the diagnosis via pathology is the gold standard. Many datasets and challenges, such as (241, 242, 245), do not address any specific disease but do address cell-level tasks. In addition, many datasets and challenges target specific diseases, such as breast cancer (246, 247), cancers of the digestive system (248), cervical cancer (249, 250), lung cancer (251), thyroid cancer (252), and osteosarcoma (253).
Pathology Datasets according to Task. Generally speaking, the tasks used with these datasets and challenges can be classified into two categories: microscopic tasks and WSI-level tasks. The latter targets the diagnosis of diseases based on a classification task. Expanded from the simple classification tasks, many datasets and research methodologies focus on complex tasks, such as the segmentation of tumor cell areas (241, 242, 254) and the detection of pathological features (239, 248). Microscopic tasks derive from the clinical analysis to identify cells and detect mitosis to extract key features from pathology images to support further disease diagnosis. The following subsections expand on microscopic tasks and WSI-level tasks, respectively.

5.1.1 Microscopic-Related Datasets.

In this subsection, we introduce microscopic task related datasets and challenges. These tasks focus on microscopic feature extraction (e.g., nucleus features) to support further diagnosis and WSI-level tasks.
Data. Unlike the WSI level, the datasets and challenges that focus on microscopic tasks usually provide small patch-level images with high resolution. These patches are suitable for annotating microscopic-level objects and for resource-limited algorithms. The size of the images varies with the analysis task. For segmentation and detection of cells and nuclei, images are usually on the order of a 1,000-pixel square to contain a suitable number of cells or nuclei. For individual cell analysis tasks (e.g., mitosis determination), a patch usually covers a single cell. For other tasks (e.g., patch-level classification), the size varies from dataset to dataset.
Pathology Datasets for Cell Detection and Segmentation. The cell is one of the essential elements of a pathology image, and the analysis of cells is one of the most effective ways to extract pathology image features for diagnosis. Pathologists analyze the size, shape, pattern, and stain color of cells, drawing on their knowledge and expertise to judge these cells as normal or abnormal. Thus, many datasets and challenges focus on the segmentation and detection of cells. Ideally, cells and nuclei would lie neatly on the slide; in practice, during slide preparation, they may overlap or be located randomly. Aiming at this problem, challenges (249, 250) focus on segmentation and detection of overlapping cells and nuclei. Cells from different organs can differ in shape and size and pose different recognition and analysis challenges; therefore, challenges (241, 242) focus on multi-organ cell or nucleus segmentation.
Pathology Datasets for Patch-Level Classification. Generally, a WSI is too large to analyze every cell and the relationships between cells. DL-based methods can extract essential information from patch-level images to support diagnosis based on feature learning. The datasets and challenges that provide patch-level images mainly focus on classification, segmentation, or detection tasks. Challenges (252, 255–258) focus on patch-level image classification to determine whether metastatic or different tissue is present.
Datasets for Other Pathology Tasks. Besides the detection and segmentation of cells and patch-level classification, there are other microscopic tasks. Challenge (244) focuses on mitosis detection for nuclear atypia scoring; the atypical shape, size, and internal organization of cells are related to the progression of cancer, and the more advanced the cancer, the more atypical the cells appear. Challenge (245) focuses on cell tracking, that is, understanding how cells change shape and move as they interact with their surrounding environment, which is key to understanding the mechanobiology of cell migration and its implications for normal tissue development and many diseases. Challenge (243) focuses on visual question answering over pathology images, where the model is trained to pass a pathologist’s examination. Challenge (263) focuses on the tracking of particles that mimic moving viruses, vesicles, receptors, and microtubule tips.

5.1.2 Datasets for WSI-Level Tasks.

WSI-level pathology tasks focus on the diagnosis of cancer and on pathology image processing. A WSI contains the complete information needed to establish an accurate diagnosis for a patient, and automatic diagnosis algorithms can analyze a slide quickly, which is especially useful in developing countries with a shortage of experienced pathologists. However, directly analyzing a WSI is challenging for both pathologists and algorithms because its size can reach \(100,\!000 \times 100,\!000\) pixels. To address this, most of the current datasets and challenges focus on the classification and segmentation of biomarkers, cells, and other regions of interest. At the end of this subsection, we introduce other datasets and challenges related to regression and to the localization of tumors and biomarkers.
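A back-of-the-envelope calculation, sketched here under the assumption of 8-bit RGB pixels, shows why such a slide cannot simply be decoded into memory:

```python
# Uncompressed footprint of a single 100,000 x 100,000 RGB slide.
width = height = 100_000
channels, bytes_per_channel = 3, 1  # 8-bit RGB assumed
size_gib = width * height * channels * bytes_per_channel / 1024**3
print(f"{size_gib:.1f} GiB")  # roughly 27.9 GiB for one slide, before any processing
```

A single slide therefore exceeds the memory of most GPUs by an order of magnitude, which motivates the patch- and pyramid-based approaches discussed earlier.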
Datasets for Classification of WSI. The prime goal of examining pathology images, especially WSI, is to diagnose cancer. Thus, how to classify large WSIs with limited computing resources becomes a research challenge. Datasets and challenges (235, 236, 259, 260) focus on predicting cancer or grading WSIs, such as Gleason grading or HER2 evaluation. In addition, some datasets and challenges (238, 239, 247) focus on the classification of metastasized cancer. The main difference between (239) and (238) is the scale of the dataset, with the latter providing more data for training and testing.
Datasets for Segmentation and Detection of WSI. DL-based methods are often seen as black boxes that process pathology images; they have achieved state-of-the-art performance, but their interpretability remains limited. From the pathologists’ point of view, datasets and challenges (237–239, 248, 261) focus on segmentation and detection tasks to determine the critical elements leading to a particular diagnosis, such as cancer cell areas and signet ring cells.
Datasets for Other WSI Tasks. Besides classification and detection, there are a few other tasks based on WSI. These include the registration of pathology images (262) for data pre-processing and the localization of lymphocytes (255).

5.2 Blood-Related Datasets

Blood image analysis is the basis of the diagnosis of many diseases. In contrast to pathology images, blood samples’ images mainly contain blood cells, and blood-related datasets and challenges are aimed at blood-related cancer and cell counting. Similar to pathology images, these datasets and challenges also focus on the segmentation, detection, and classification of cells. The relevant datasets and challenges are listed in Table 13, with most of them reporting the size of data available. Only two of them mention the division of the training/testing set. About half of the challenges listed in Table 13 are linked with ISBI, and most of them require registration to access data. Other datasets and challenges are open.
Table 13. Summary of Datasets and Challenges of Blood-Related Image Analysis Tasks
One of the primary tasks of these datasets is the classification of cells, which focuses on identifying different types of cells. Dataset (264) focuses on classifying red blood cells, white blood cells, platelets, and other cells. Dataset (265) focuses on the classification of malignant and non-malignant cells. Challenges (266, 267) focus on distinguishing malignant cells from normal cells in B-ALL white blood cancer microscopic images. Other datasets and challenges focus on segmentation and detection of blood cells and biomarkers: (268) (lymphocytes), (269, 270) (multiple myeloma segmentation), (271) (mitochondria segmentation), and (272) (malaria detection). Dataset (273) focuses on stain normalization for white blood cancer (B-ALL & MM) microscopic images.

5.3 Summary

In this section, we provided an overview of the datasets and challenges related to pathology and blood. We organized them into two tables to better illustrate and describe them. Table 12 listed the datasets and challenges related to pathology, including tasks such as WSI-level classification and segmentation related to cancer, cell detection or segmentation, and patch-level tasks. Table 13 listed datasets and challenges related to blood-related microscopic images. Most of the datasets and challenges listed in Table 13 reported the scale of data, and some of them provided information on the division of the training/testing set. Half of the challenges were associated with ISBI, and most of them required registration when accessing data, whereas other datasets and challenges were open.

6 Discussion

The success of AI algorithms such as DL has led to their widespread use in several fields, including medical image analysis. Researchers with different levels of knowledge and backgrounds tackle image-based clinical tasks using computer vision tools to design automatic algorithms for different applications [59, 108, 137, 143, 228, 230, 231]. Although AI algorithms can successfully handle many tasks, several unsolved problems and challenges hinder the development of AI-based medical image analysis.

6.1 Problems and Challenges

DL-based algorithms learn from input images of real data through gradient descent. Large-scale annotated datasets and powerful DL models are key to successful DL development. For example, the success of AlexNet [116], GoogLeNet [198], and ResNet [85] is based on powerful models comprising millions of parameters. A large-scale dataset such as ImageNet [60] is also necessary to train a DL model with such a large number of parameters. However, when these methods are applied to medical image analysis, many domain-specific problems and challenges appear. This subsection discusses some of these challenges.

6.1.1 Data Scarcity.

The biggest challenge in the development of DL models is data scarcity. Different from other areas, the scale of the medical image datasets is usually smaller due to many limitations (e.g., ethical restrictions).
The commonly used datasets in traditional computer vision are larger in scale than medical image datasets. For example, the handwritten digits dataset MNIST [122] includes a training set of 60,000 examples and a testing set of 10,000 examples; the ImageNet dataset [60] includes millions of images for training and testing; and Microsoft COCO [132] includes more than 200,000 labeled images with about 2.5 million annotated instances. In contrast, many medical image datasets only include hundreds or at most thousands of images. For example, the challenge BraTS 2020 (30) includes 400 subjects with different modalities for each subject, the challenge REFUGE (72) provides about 1,200 images of the eye, the challenge LUNA 16 (166) provides 888 CT scans, our recently published dataset of pulmonary lesions [127] provides only 694 scans, and the challenge CAMELYON 17 (238) contains more than 1,000 WSI pathology images.
There are multiple reasons for the lack of data. The main one is restricted access to medical images by non-medical researchers (i.e., barriers between disciplines). The root causes of these barriers are the cost and difficulty of annotation and access restrictions due to ethics and privacy.
Access to Data. As mentioned in Section 1, the direct cause of data scarcity is that most non-medical researchers are not allowed to access medical data directly. Although a great deal of medical data is generated worldwide every day, most non-medical researchers have no authorization to access clinical data. The easily accessible data are the publicly available datasets, but these are rarely large enough to properly train a DL model.
Ethical Reasons. The ethics of medical data usage is a major bottleneck for researchers, particularly computer scientists, because they lack the resources to collect data and find it hard to access medical data. Medical records stored in databases often contain sensitive or private information, such as name, age, gender, and ID number. In some cases, medical images themselves can identify a patient; for example, if an MR scan includes the face, an intruder could identify the patient for malicious purposes. In most countries and regions, it is illegal to distribute such data with private information without the patients’ permission, and patients rarely consent to such distribution. Therefore, it is effectively impossible for DL researchers to gain authorization to access these datasets.
Even for desensitized data, DL researchers still need to pass ethical review before gaining authorization.
Annotation. Another root cause is the difficulty of annotating medical images. Unlike other computer vision areas, the annotation of medical images requires specialized professionals and knowledge. In autonomous driving, for example, annotating objects such as vehicles and pedestrians imposes no specific requirements on the annotators because most people can easily distinguish a car or a human. When annotating medical images, however, domain-specific knowledge is essential: few people without professional training can tell abnormal tissue from normal tissue, and it is impossible for a non-specialist to delineate a lesion’s contour or diagnose a disease.
This difficulty is not easily solved even when professionals are employed to annotate the data. First, the cost of annotating medical data is huge. Once researchers and their organization have obtained some data, they need to spend additional money to employ medics for labeling; this cost is enormous, particularly where medical resources are scarce or medical costs are high. For example, the challenge PALM (76) provides about 1,200 images with annotations, but its organizers involved only two clinical medics. Second, the physician who annotates the data is required to have rich clinical and diagnostic experience, further reducing the number of people suitable for the task. Third, to avoid subjectivity, each image needs to be annotated by two or more physicians, which raises another problem: what should be done if the labels of two annotators disagree? In many challenges, the organizers employ several junior physicians to annotate and a senior physician to adjudicate when the junior physicians’ annotations differ. For example, in the challenge AGE (73), each annotation is determined as the mean of four independent ophthalmologists’ annotations and is then manually verified by a senior glaucoma expert.
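This adjudication workflow can be approximated in code. The following minimal sketch fuses binary masks from several annotators by per-pixel voting and flags low-agreement pixels for senior review; the agreement threshold and the random annotations are illustrative assumptions, not the actual procedure of AGE (73).

```python
import numpy as np

def fuse_annotations(masks, agreement=1.0):
    """Fuse binary masks from several annotators by per-pixel voting.

    Returns a majority-vote mask plus a 'disputed' mask flagging pixels
    whose inter-annotator agreement falls below the threshold; disputed
    regions can be forwarded to a senior expert for adjudication.
    """
    votes = np.mean([np.asarray(m, dtype=float) for m in masks], axis=0)
    fused = votes >= 0.5
    confidence = np.maximum(votes, 1.0 - votes)  # fraction agreeing with majority
    disputed = confidence < agreement
    return fused, disputed

# Four hypothetical annotators labeling the same 64 x 64 image.
rng = np.random.default_rng(0)
annotators = [rng.random((64, 64)) > 0.4 for _ in range(4)]
fused, disputed = fuse_annotations(annotators)
print(fused.mean(), disputed.mean())  # fused foreground and disputed fractions
```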

6.1.2 Challenges of Medical Data.

The characteristics of medical images themselves pose difficulties for medical image analysis tasks.
There are many types and modalities of images used in medical image analysis. As in computer vision, the modalities include both 2D and 3D data. However, medical images have several other differences: although the average scale of a medical image dataset is smaller than that of a computer vision dataset, each sample is, on average, larger.
For 2D images, CR, WSI, and other modalities have larger variance in resolution and color than typical computer vision data. Some modalities need more bits to encode a pixel, whereas others are simply enormous; for example, CAMELYON 17 (238) includes only about 1,000 pathology images, yet the whole dataset is about 3 TB. Such datasets with few but large samples pose a challenge for AI algorithms and require researchers to design algorithms that can learn under limited computational resources while remaining useful for clinical diagnosis.
3D medical images such as CT and MRI are dense 3D data, in contrast to sparse data such as the point clouds used in autonomous driving. As in the BraTS series of challenges (30–38), many researchers face the challenge of designing algorithms that can effectively learn from a multi-modal dataset.
These characteristics of medical images require well-designed algorithms that are robust enough to fit the data without overfitting. This, in turn, increases the need for data and resources; learning suitable features from a small-sample dataset remains a challenge.

6.2 No Silver Bullet

The ideal scenario would be to find or invent a method that simultaneously solves all of the problems encountered. However, there is no silver bullet: the problems and challenges related to the data and the adopted methods cannot be entirely resolved, and sometimes a new problem arises as another is solved. Nevertheless, many ideas have been introduced to address the current problems, and we introduce them in this subsection.
With respect to the problems and challenges mentioned previously, researchers are working in two directions: (i) more effective models that need less data and (ii) more practical approaches to access data. To learn from small datasets, researchers use approaches such as few-shot learning and transfer learning. To access more data, researchers adopt three main approaches: federated learning, natural language processing for annotation, and active learning.

6.2.1 Practical Learning from Small Samples.

Many medical image datasets have a small number of samples. For example, challenge MRBrains13 (5) only includes 20 subjects for training and testing, whereas challenge KITS 19 (122) has about 200 subjects. Therefore, many researchers struggle to find a practical approach to learn from small samples.
Few-Shot Learning and Zero-Shot Learning. Few-shot learning addresses one of the critical pain points of DL-based medical image analysis: developing DL models with less data [222]. Humans can learn effectively from a few samples; unlike standard DL-based methods, a human learns to diagnose a disease from images without viewing tens of thousands of them (i.e., from only a few shots). Meta-learning, also called learning to learn, is a technique used to address the few-shot learning problem [90]; it learns meta-features from small amounts of data. The number of images in most medical datasets and challenges is small compared to regular computer vision datasets. Mondal et al. [155] use few-shot learning and a GAN, modified for semi-supervised learning, to segment medical images. Similar to few-shot learning, zero-shot learning targets entirely novel classes; Rezaei and Shahidi [175] review zero-shot learning from autonomous vehicles to COVID-19 diagnosis. However, zero-shot and few-shot learning also have disadvantages, such as domain gaps, overfitting, and limited interpretability.
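To illustrate the episodic idea behind meta-learning, the following is a minimal PyTorch sketch of one prototypical-network episode; the encoder is assumed to be any network mapping images to embeddings, and this is not the specific method of [155] or [175].

```python
import torch
import torch.nn.functional as F

def prototypical_episode(encoder, support_x, support_y, query_x, n_classes):
    """One few-shot 'episode' in the style of prototypical networks.

    encoder   : any network mapping images (N, C, H, W) to embeddings (N, d)
    support_x : a handful of labeled examples per class
    query_x   : examples to classify against the class prototypes
    """
    z_support = encoder(support_x)
    z_query = encoder(query_x)
    # A prototype is the mean embedding of each class's few support samples.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_classes)]
    )
    # Classify queries by negative Euclidean distance to each prototype.
    logits = -torch.cdist(z_query, prototypes)
    return F.log_softmax(logits, dim=1)
```

Training on many such episodes, each with only a few support samples per class, is what lets the model generalize to new classes from little data.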
Knowledge Transfer. Transfer learning is a method that recognizes and applies knowledge and skills learned from a previous task [236]. For example, WM and GM segmentation and multi-organ segmentation are both segmentation tasks [102]. Neural network training is usually independent, in the sense that a network is rarely trained on two tasks at once, but that does not mean the tasks are unrelated. Besides zero-shot and few-shot learning, transfer learning, or knowledge transfer, is another way to reuse knowledge from a previously learned task, and it can be applied between two similar tasks and across domains [107]. Its most significant advantage is that a large-scale dataset can be used to pre-train a neural network, which is then fine-tuned and transferred to the main task on a dataset with few samples.
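The following minimal sketch illustrates this pre-train-then-fine-tune recipe with an ImageNet-pretrained ResNet-18 from torchvision; the two-class medical task is an assumed example, not a prescription.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from weights pre-trained on a large natural image dataset,
# then fine-tune only a small new head on the medical task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for param in model.parameters():      # freeze the pretrained backbone
    param.requires_grad = False

n_classes = 2                          # e.g., lesion vs. normal (assumed task)
model.fc = nn.Linear(model.fc.in_features, n_classes)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ...then train as usual on the small medical dataset.
```

Freezing the backbone keeps the number of trainable parameters small, which is exactly what a few-hundred-sample medical dataset can support; unfreezing deeper layers is an option once more data are available.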

6.2.2 Effective Access to More Samples.

Besides finding practical approaches to learn from small samples, many researchers have been working on active learning and on federated learning, which aims to use data without accessing sensitive information. These approaches also reduce the annotation costs of DL algorithms.
Federated Learning. Federated learning provides another way to use data. As discussed previously, access to data is limited by privacy and other concerns. Instead of sharing data directly, federated learning shares the model so that private information is not leaked. Combined with other privacy protection methods, federated learning can effectively use the data held by each independent data center or medical center.
However, federated learning has two disadvantages: annotation and implementation. The annotation problem cannot be solved by model sharing and requires other methods [52, 155, 186]. The main difficulty is designing and implementing a scheme that shares only the features needed by the algorithms and not the private information; only a few institutions have attempted federated learning so far. For example, Intel and other institutions have applied federated learning to brain tumor related tasks in their research [187]. The main challenges in such implementations include the following (a minimal sketch of the underlying weight-averaging step follows the list):
(1) The implementation and proof of privacy protection,
(2) The methodology for sharing and updating millions of the model’s parameters,
(3) Preventing attacks on DL algorithms and leaks of data privacy on the Internet or computing nodes.
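As context for challenge (2), the following sketch shows a FedAvg-style aggregation step under PyTorch assumptions: clients share locally trained weights, never raw images. It deliberately omits the secure aggregation, encryption, and attack mitigation that make challenges (1) and (3) hard in practice.

```python
import torch

@torch.no_grad()
def federated_average(global_model, client_models, client_sizes):
    """FedAvg-style aggregation: clients share weights, never raw data.

    Each center trains a copy of the global model on its local data;
    only the resulting parameters, weighted by local dataset size,
    are sent back and averaged into the global model.
    """
    total = float(sum(client_sizes))
    averaged = {}
    for key, value in global_model.state_dict().items():
        averaged[key] = sum(
            m.state_dict()[key].float() * (n / total)
            for m, n in zip(client_models, client_sizes)
        ).to(value.dtype)
    global_model.load_state_dict(averaged)
    return global_model
```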
Natural Language Processing. Natural language processing is also a potential tool to automatically or semi-automatically annotate medical image data. It is standard procedure for a medic to provide a diagnostic report for a patient, particularly after a medical image is taken. Such large amounts of paired image and text data are therefore useful for medical image analysis after desensitization, and natural language processing can be used for annotation. Several natural language processing based methods (e.g., [109, 130, 214]) have been applied in medical research fields.
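As a simple illustration, the following sketch derives image-level labels from report text using keyword rules with a crude negation check; the rules and phrases are illustrative assumptions, and real report labelers are considerably more sophisticated.

```python
import re

# Hypothetical keyword rules mapping report phrases to image-level labels.
RULES = {
    "pneumothorax": re.compile(r"\bpneumothorax\b", re.I),
    "effusion": re.compile(r"\bpleural effusion\b", re.I),
}

def label_report(report):
    """Return {finding: 0 or 1} for each rule, with sentence-level negation."""
    labels = {}
    for name, pattern in RULES.items():
        match = pattern.search(report)
        if not match:
            labels[name] = 0
            continue
        # Crude negation check limited to the sentence containing the match.
        sentence_start = report.rfind(".", 0, match.start()) + 1
        sentence = report[sentence_start:match.end()]
        negated = re.search(r"\b(no|without|negative for)\b", sentence, re.I)
        labels[name] = 0 if negated else 1
    return labels

print(label_report("No pneumothorax. Small pleural effusion on the left."))
# {'pneumothorax': 0, 'effusion': 1}
```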
Active Learning. Active learning aims to reduce annotation costs by indirectly using the unlabeled data to select the “best” samples to annotate. Generally, data annotation for DL requires experts to label the data so that the neural network can learn from it, but active learning does not require many labeled samples at the beginning of training; in other words, active learning can “help” annotators label their data. It uses the knowledge learned from the labeled data to select unlabeled data for annotation, and the newly labeled data are used to train the network over the following epochs. Active learning [52, 186] is used in medical image analysis in a loop of (i) the algorithm learning from the data annotated by humans, (ii) the human annotating the unlabeled data selected by the algorithm, and (iii) the algorithm adding the newly labeled data to the training set. The advantage of active learning is obvious: annotators do not need to annotate all the data they have, and the neural network learns faster from this interactive process.
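The selection step of this loop can be as simple as least-confidence sampling, sketched below under the assumption of a classifier exposing scikit-learn’s predict_proba interface; the loop itself is outlined in comments with hypothetical helper names.

```python
import numpy as np

def select_for_annotation(model, unlabeled_pool, batch_size=10):
    """Pick the unlabeled samples the model is least certain about.

    Least-confidence sampling is one of several standard query strategies;
    model.predict_proba is assumed to return per-class probabilities.
    """
    probs = model.predict_proba(unlabeled_pool)
    uncertainty = 1.0 - probs.max(axis=1)  # low top-class probability = uncertain
    return np.argsort(-uncertainty)[:batch_size]

# The loop described above, with hypothetical annotate/append helpers:
# for _ in range(n_rounds):
#     model.fit(X_labeled, y_labeled)                 # (i) learn from labels
#     idx = select_for_annotation(model, X_unlabeled) # (ii) query the expert
#     X_labeled, y_labeled = append_new_labels(idx)   # (iii) grow training set
```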

7 Conclusion

In this work, we provided a comprehensive survey of the datasets and challenges for medical image analysis, collected between 2013 and 2020. The datasets and challenges were categorized into four themes: head and neck, chest and abdomen, pathology and blood, and others. We summarized the details of these themes and their data, discussed the problems and challenges of medical image analysis, and outlined possible solutions to them.

Footnote

1
The website with the git repo will be public after the article is accepted.

References

[1]
William R. Hendee and E. Russell Ritenour. 2002. Computed tomography. In Medical Imaging Physics (4th ed.). John Wiley & Sons, 251–264.
[2]
William R. Hendee and E. Russell Ritenour. 2002. Instrumentation for nuclear imaging. In Medical Imaging Physics (4th ed.). John Wiley & Sons, 197–216.
[3]
William R. Hendee and E. Russell Ritenour. 2002. Magnetic resonance imaging and spectroscopy. In Medical Imaging Physics (4th ed.). John Wiley & Sons, 367–388.
[4]
William R. Hendee and E. Russell Ritenour. 2002. Ultrasound waves. In Medical Imaging Physics (4th ed.). John Wiley & Sons, 303–316.
[5]
Robert W. Brown, Yu-Chung N. Cheng, E. Mark Haacke, Michael R. Thompson, and Ramesh Venkatesan (Eds.). 2014. Introductory signal acquisition methods. In Magnetic Resonance Imaging: Physical Principals and Sequence Design (2nd ed.). John Wiley & Sons, 113–139.
[6]
Robert W. Brown, Yu-Chung N. Cheng, E. Mark Haacke, Michael R. Thompson, and Ramesh Venkatesan (Eds.). 2014. Magnetic resonance imaging. In Magnetic Resonance Imaging: Physical Principals and Sequence Design (2nd ed.). John Wiley & Sons, 1–17.
[7]
Robert W. Brown, Yu-Chung N. Cheng, E. Mark Haacke, Michael R. Thompson, and Ramesh Venkatesan (Eds.). 2014. Random walks, relaxation, and diffusion. In Magnetic Resonance Imaging: Physical Principals and Sequence Design (2nd ed.). John Wiley & Sons, 619–636.
[8]
Robert W. Brown, Yu-Chung N. Cheng, E. Mark Haacke, Michael R. Thompson, and Ramesh Venkatesan (Eds.). 2014. Spin density, T1, and T2 quantification methods in MR imaging. In Magnetic Resonance Imaging: Physical Principals and Sequence Design (2nd ed.). John Wiley & Sons, 637–667.
[9]
Alzheimer’s Association 2020. 2020 Alzheimer’s disease facts and figures. Alzheimer’s and Dementia 16, 3 (March 2020), 391–460.
[10]
V. Sudha, K. Priyanka, T. Suvathi Kannathal, and S. Monisha. 2020. Diabetic retinopathy detection. International Journal of Engineering and Advanced Technology 9, 4 (2020), 1022–1026.
[11]
Hugo J. W. L. Aerts, Emmanuel Rios Velazquez, Ralph T. H. Leijenaar, Chintan Parmar, Patrick Grossmann, Sara Cavalho, Johan Bussink, René Monshouwer, Benjamin Haibe-Kains, Derek Rietveld, Frank Hoebers, Michelle M. Rietbergen, C. René Leemans, Andre Dekker, John Quackenbush, Robert J. Gillies, and Philippe Lambin. 2014. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications 5, 1 (Sept. 2014), 4006.
[12]
Rupal R. Agravat and Mehul S. Raval. 2021. A survey and analysis on automated glioma brain tumor segmentation and overall patient survival prediction. Archives of Computational Methods in Engineering 28, 5 (Aug. 2021), 4117–4152.
[13]
Paul S. Aisen, Ronald C. Petersen, Michael Donohue, and Michael W. Weiner. 2015. Alzheimer’s Disease Neuroimaging Initiative 2 clinical core: Progress and plans. Alzheimer’s and Dementia 11, 7 (July 2015), 734–739.
[14]
Zeynettin Akkus, Issa Ali, Jiří Sedlář, Jay P. Agrawal, Ian F. Parney, Caterina Giannini, and Bradley J. Erickson. 2017. Predicting deletion of chromosomal arms 1p/19q in low-grade gliomas from MR images using machine intelligence. Journal of Digital Imaging 30, 4 (Aug. 2017), 469–476.
[15]
Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy. 2020. Dataset of breast ultrasound images. Data in Brief 28 (2020), 104863.
[16]
Hassan Al Hajj, Mathieu Lamard, Pierre Henri Conze, Soumali Roychowdhury, Xiaowei Hu, Gabija Maršalkaitė, Odysseas Zisimopoulos, Muneer Ahmad Dedmari, Fenqiang Zhao, Jonas Prellberg, Manish Sahu, Adrian Galdran, Teresa Araújo, Duc My Vo, Chandan Panda, Navdeep Dahiya, Satoshi Kondo, Zhengbing Bian, Arash Vahdat, Jonas Bialopetravičius, Evangello Flouty, Chenhui Qiu, Sabrina Dill, Anirban Mukhopadhyay, Pedro Costa, Guilherme Aresta, Senthil Ramamurthy, Sang Woong Lee, Aurélio Campilho, Stefan Zachow, Shunren Xia, Sailesh Conjeti, Danail Stoyanov, Jogundas Armaitis, Pheng Ann Heng, William G. Macready, Béatrice Cochener, and Gwenolé Quellec. 2019. CATARACTS: Challenge on automatic tool annotation for cataRACT surgery. Medical Image Analysis 52 (Feb. 2019), 24–41.
[17]
Sharib Ali, Mariia Dmitrieva, Noha Ghatwary, Sophia Bano, Gorkem Polat, Alptekin Temizel, Adrian Krenzer, Amar Hekalo, Yun Bo Guo, Bogdan Matuszewski, Mourad Gridach, Irina Voiculescu, Vishnusai Yoganand, Arnav Chavan, Aryan Raj, Nhan T. Nguyen, Dat Q. Tran, Le Duy Huynh, Nicolas Boutry, Shahadate Rezvy, Haijian Chen, Yoon Ho Choi, Anand Subramanian, Velmurugan Balasubramanian, Xiaohong W. Gao, Hongyu Hu, Yusheng Liao, Danail Stoyanov, Christian Daul, Stefano Realdon, Renato Cannizzaro, Dominique Lamarque, Terry Tran-Nguyen, Adam Bailey, Barbara Braden, James E. East, and Jens Rittscher. 2021. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Medical Image Analysis 70 (2021), 102002.
[18]
Sharib Ali, Felix Zhou, Christian Daul, Barbara Braden, Adam Bailey, Stefano Realdon, James East, Georges Wagnieres, Victor Loschenov, Enrico Grisan, Walter Blondel, and Jens Rittscher. 2019. Endoscopy artefact detection (EAD 2019) dataset. arXiv:1905.03209 (2019).
[19]
Isabel Alkhasli, Katrin Sakreida, Felix M. Mottaghy, and Ferdinand Binkofski. 2019. Modulation of fronto-striatal functional connectivity using transcranial magnetic stimulation. Frontiers in Human Neuroscience 13 (June 2019), 190.
[20]
Guilherme Aresta, Teresa Araújo, Scotty Kwok, Sai Saketh Chennamsetty, Mohammed Safwan, Varghese Alex, Bahram Marami, Marcel Prastawa, Monica Chan, Michael Donovan, Gerardo Fernandez, Jack Zeineh, Matthias Kohl, Christoph Walz, Florian Ludwig, Stefan Braunewell, Maximilian Baust, Quoc Dang Vu, Minh Nguyen Nhat To, Eal Kim, Jin Tae Kwak, Sameh Galal, Veronica Sanchez-Freire, Nadia Brancati, Maria Frucci, Daniel Riccio, Yaqi Wang, Lingling Sun, Kaiqiang Ma, Jiannan Fang, Ismael Kone, Lahsen Boulmane, Aurélio Campilho, Catarina Eloy, António Polónia, and Paulo Aguiar. 2019. BACH: Grand challenge on breast cancer histology images. Medical Image Analysis 56 (Aug. 2019), 122–139.
[21]
Samuel G. Armato, Karen Drukker, Feng Li, Lubomir Hadjiiski, Georgia D. Tourassi, Roger M. Engelmann, Maryellen L. Giger, George Redmond, Keyvan Farahani, Justin S. Kirby, and Laurence P. Clarke. 2016. LUNGx challenge for computerized lung nodule classification. Journal of Medical Imaging 3, 4 (Dec. 2016), 044506.
[22]
Samuel G. Armato, Lubomir Hadjiiski, Georgia D. Tourassi, Karen Drukker, Maryellen L. Giger, Feng Li, George Redmond, Keyvan Farahani, Justin S. Kirby, and Laurence P. Clarke. 2015. Guest editorial: LUNGx challenge for computerized lung nodule classification: Reflections and lessons learned. Journal of Medical Imaging 2, 2 (June 2015), 020103.
[23]
Samuel G. Armato, Geoffrey McLennan, Luc Bidaut, Michael F. McNitt-Gray, Charles R. Meyer, Anthony P. Reeves, Binsheng Zhao, Denise R. Aberle, Claudia I. Henschke, Eric A. Hoffman, Ella A. Kazerooni, Heber MacMahon, Edwin J. R. Van Beek, David Yankelevitz, Alberto M. Biancardi, Peyton H. Bland, Matthew S. Brown, Roger M. Engelmann, Gary E. Laderach, Daniel Max, Richard C. Pais, David P. Y. Qing, Rachael Y. Roberts, Amanda R. Smith, Adam Starkey, Poonam Batra, Philip Caligiuri, Ali Farooqi, Gregory W. Gladish, C. Matilda Jude, Reginald F. Munden, Iva Petkovska, Leslie E. Quint, Lawrence H. Schwartz, Baskaran Sundaram, Lori E. Dodd, Charles Fenimore, David Gur, Nicholas Petrick, John Freymann, Justin Kirby, Brian Hughes, Alessi Vande Casteele, Sangeeta Gupte, Maha Sallam, Michael D. Heath, Michael H. Kuhn, Ekta Dharaiya, Richard Burns, David S. Fryd, Marcos Salganicoff, Vikram Anand, Uri Shreter, Stephen Vastagh, Barbara Y. Croft, and Laurence P. Clarke. 2011. The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics 38, 2 (Jan. 2011), 915–931.
[24]
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, and Molly Ni’Suilleabhain. 2017. Computer aided image segmentation and classification for viable and non-viable tumor identification in osteosarcoma. In Proceedings of the Pacific Symposium on Biocomputing. 195–206.
[25]
Jong Bin Bae, Subin Lee, Wonmo Jung, Sejin Park, Weonjin Kim, Hyunwoo Oh, Ji Won Han, Grace Eun Kim, Jun Sung Kim, Jae Hyoung Kim, and Ki Woong Kim. 2020. Identification of Alzheimer’s disease using a convolutional neural network model based on T1-weighted magnetic resonance imaging. Scientific Reports 10, 1 (Dec. 2020), 22252.
[26]
Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan M. Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc Andr Weber, Abhishek Mahajan, Ujjwal Baid, Elizabeth Gerstner, Dongjin Kwon, Gagan Acharya, Manu Agarwal, Mahbubul Alam, Alberto Albiol, Antonio Albiol, Francisco J. Albiol, Varghese Alex, Nigel Allinson, Pedro H. A. Amorim, Abhijit Amrutkar, Ganesh Anand, Simon Andermatt, Tal Arbel, Pablo Arbelaez, Aaron Avery, Muneeza Azmat, B. Pranjal, Wenjia Bai, Subhashis Banerjee, Bill Barth, Thomas Batchelder, Kayhan Batmanghelich, Enzo Battistella, Andrew Beers, Mikhail Belyaev, Martin Bendszus, Eze Benson, Jose Bernal, Halandur Nagaraja Bharath, George Biros, Sotirios Bisdas, James Brown, Mariano Cabezas, Shilei Cao, Jorge M. Cardoso, Eric N. Carver, Adri Casamitjana, Laura Silvana Castillo, Marcel Cat, Philippe Cattin, Albert Cérigues, Vinicius S. Chagas, Siddhartha Chandra, Yi Ju Chang, Shiyu Chang, Ken Chang, Joseph Chazalon, Shengcong Chen, Wei Chen, Jefferson W. Chen, Zhaolin Chen, Kun Cheng, Ahana Roy Choudhury, Roger Chylla, Albert Clrigues, Steven Colleman, Ramiro German Rodriguez Colmeiro, Marc Combalia, Anthony Costa, Xiaomeng Cui, Zhenzhen Dai, Lutao Dai, Laura Alexandra Daza, Eric Deutsch, Changxing Ding, Chao Dong, Shidu Dong, Wojciech Dudzik, Zach Eaton-Rosen, Gary Egan, Guilherme Escudero, Tho Estienne, Richard Everson, Jonathan Fabrizio, Yong Fan, Longwei Fang, Xue Feng, Enzo Ferrante, Lucas Fidon, Martin Fischer, Andrew P. French, Naomi Fridman, Huan Fu, David Fuentes, Yaozong Gao, Evan Gates, David Gering, Amir Gholami, Willi Gierke, Ben Glocker, Mingming Gong, Sandra Gonzlez-Vill, T. Grosges, Yuanfang Guan, Sheng Guo, Sudeep Gupta, Woo Sup Han, Il Song Han, Konstantin Harmuth, Huiguang He, Aura Hernndez-Sabat, Evelyn Herrmann, Naveen Himthani, Winston Hsu, Cheyu Hsu, Xiaojun Hu, Xiaobin Hu, Yan Hu, Yifan Hu, Rui Hua, Teng Yi Huang, Weilin Huang, Sabine van Huffel, Quan Huo, H. V. Vivek, Khan M. Iftekharuddin, Fabian Isensee, Mobarakol Islam, Aaron S. Jackson, Sachin R. Jambawalikar, Andrew Jesson, Weijian Jian, Peter Jin, V. Jeya Maria Jose, Alain Jungo, Bernhard Kainz, Konstantinos Kamnitsas, Po Yu Kao, Ayush Karnawat, Thomas Kellermeier, Adel Kermi, Kurt Keutzer, Mohamed Tarek Khadir, Mahendra Khened, Philipp Kickingereder, Geena Kim, Nik King, Haley Knapp, Urspeter Knecht, Lisa Kohli, Deren Kong, Xiangmao Kong, Simon Koppers, Avinash Kori, Ganapathy Krishnamurthi, Egor Krivov, Piyush Kumar, Kaisar Kushibar, Dmitrii Lachinov, Tryphon Lambrou, Joon Lee, Chengen Lee, Yuehchou Lee, Matthew Chung Hai Lee, Szidonia Lefkovits, Laszlo Lefkovits, James Levitt, Tengfei Li, Hongwei Li, Wenqi Li, Hongyang Li, Xiaochuan Li, Yuexiang Li, Heng Li, Zhenye Li, Xiaoyu Li, Zeju Li, Xiao Gang Li, Wenqi Li, Zheng Shen Lin, Fengming Lin, Pietro Lio, Chang Liu, Boqiang Liu, Xiang Liu, Mingyuan Liu, Ju Liu, Luyan Liu, Xavier Lladó, Marc Moreno Lopez, Pablo Ribalta Lorenzo, Zhentai Lu, Lin Luo, Zhigang Luo, Jun Ma, Kai Ma, Thomas Mackie, Anant Madabhushi, Issam Mahmoudi, Klaus H. Maier-Hein, Pradipta Maji, C. P. Mammen, Andreas Mang, B. S. 
Manjunath, Michal Marcinkiewicz, Steven McDonagh, Stephen McKenna, Richard McKinley, Miriam Mehl, Sachin Mehta, Raghav Mehta, Raphael Meier, Christoph Meinel, Dorit Merhof, Craig Meyer, Robert Miller, Sushmita Mitra, Aliasgar Moiyadi, David Molina-Garcia, Miguel A. B. Monteiro, Grzegorz Mrukwa, Andriy Myronenko, Jakub Nalepa, Thuyen Ngo, Dong Nie, Holly Ning, Chen Niu, Nicholas K. Nuechterlein, Eric Oermann, Arlindo Oliveira, Diego D. C. Oliveira, Arnau Oliver, Alexander F. I. Osman, Yu Nian Ou, Sebastien Ourselin, Nikos Paragios, Moo Sung Park, Brad Paschke, J. Gregory Pauloski, Kamlesh Pawar, Nick Pawlowski, Linmin Pei, Suting Peng, Silvio M. Pereira, Julian Perez-Beteta, Victor M. Perez-Garcia, Simon Pezold, Bao Pham, Ashish Phophalia, Gemma Piella, G. N. Pillai, Marie Piraud, Maxim Pisov, Anmol Popli, Michael P. Pound, Reza Pourreza, Prateek Prasanna, Vesnakovska Pr, Tony P. Pridmore, Santi Puch, Lodie Puybareau, Buyue Qian, Xu Qiao, Martin Rajchl, Swapnil Rane, Michael Rebsamen, Hongliang Ren, Xuhua Ren, Karthik Revanuru, Mina Rezaei, Oliver Rippel, Luis Carlos Rivera, Charlotte Robert, Bruce Rosen, Daniel Rueckert, Mohammed Safwan, Mostafa Salem, Joaquim Salvi, Irina Sanchez, Irina Snchez, Heitor M. Santos, Emmett Sartor, Dawid Schellingerhout, Klaudius Scheufele, Matthew R. Scott, Artur A. Scussel, Sara Sedlar, Juan Pablo Serrano-Rubio, N. Jon Shah, Nameetha Shah, Mazhar Shaikh, B. Uma Shankar, Zeina Shboul, Haipeng Shen, Dinggang Shen, Linlin Shen, Haocheng Shen, Varun Shenoy, Feng Shi, Hyung Eun Shin, Hai Shu, Diana Sima, Matthew Sinclair, Orjan Smedby, James M. Snyder, Mohammadreza Soltaninejad, Guidong Song, Mehul Soni, Jean Stawiaski, Shashank Subramanian, Li Sun, Roger Sun, Jiawei Sun, Kay Sun, Yu Sun, Guoxia Sun, Shuang Sun, Yannick R. Suter, Laszlo Szilagyi, Sanjay Talbar, Dacheng Tao, Dacheng Tao, Zhongzhao Teng, Siddhesh Thakur, Meenakshi H. Thakur, Sameer Tharakan, Pallavi Tiwari, Guillaume Tochon, Tuan Tran, Yuhsiang M. Tsai, Kuan Lun Tseng, Tran Anh Tuan, Vadim Turlapov, Nicholas Tustison, Maria Vakalopoulou, Sergi Valverde, Rami Vanguri, Evgeny Vasiliev, Jonathan Ventura, Luis Vera, Tom Vercauteren, C. A. Verrastro, Lasitha Vidyaratne, Veronica Vilaplana, Ajeet Vivekanandan, Guotai Wang, Qian Wang, Chiatse J. Wang, Weichung Wang, Duo Wang, Ruixuan Wang, Yuanyuan Wang, Chunliang Wang, Guotai Wang, Ning Wen, Xin Wen, Leon Weninger, Wolfgang Wick, Shaocheng Wu, Qiang Wu, Yihong Wu, Yong Xia, Yanwu Xu, Xiaowen Xu, Peiyuan Xu, Tsai Ling Yang, Xiaoping Yang, Hao Yu Yang, Junlin Yang, Haojin Yang, Guang Yang, Hongdou Yao, Xujiong Ye, Changchang Yin, Brett Young-Moxon, Jinhua Yu, Xiangyu Yue, Songtao Zhang, Angela Zhang, Kun Zhang, Xuejie Zhang, Lichi Zhang, Xiaoyue Zhang, Yazhuo Zhang, Lei Zhang, Jianguo Zhang, Xiang Zhang, Tianhao Zhang, Sicheng Zhao, Yu Zhao, Xiaomei Zhao, Liang Zhao, Yefeng Zheng, Liming Zhong, Chenhong Zhou, Xiaobing Zhou, Fan Zhou, Hongtu Zhu, Jin Zhu, Ying Zhuge, Weiwei Zong, Jayashree Kalpathy-Cramer, Keyvan Farahani, Christos Davatzikos, Koen van Leemput, and Bjoern Menze. 2018. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv:1811.02629 (2018). http://arxiv.org/abs/1811.02629
[27]
Simone Balocco, Carlo Gatta, Francesco Ciompi, Andreas Wahle, Petia Radeva, Stephane Carlier, Gozde Unal, Elias Sanidas, Josepa Mauri, Xavier Carillo, Tomas Kovarnik, Ching Wei Wang, Hsiang Chou Chen, Themis P. Exarchos, Dimitrios I. Fotiadis, François Destrempes, Guy Cloutier, Oriol Pujol, Marina Alberti, E. Gerardo Mendizabal-Ruiz, Mariano Rivera, Timur Aksoy, Richard W. Downe, and Ioannis A. Kakadiaris. 2014. Standardized evaluation methodology and reference database for evaluating IVUS image segmentation. Computerized Medical Imaging and Graphics 38, 2 (March 2014), 70–90.
[28]
Vivek Singh Bawa, Gurkirt Singh, Francis Kaping’a, Inna Skarga-Bandurova, Alice Leporini, Carmela Landolfo, Armando Stabile, Francesco Setti, Riccardo Muradore, Elettra Oleari, and Fabio Cuzzolin. 2020. ESAD: Endoscopic surgeon action detection dataset. arXiv:2006.07164 (2020). http://arxiv.org/abs/2006.07164
[29]
Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M. Van Der Laak, Meyke Hermsen, Quirine F. Manson, Maschenka Balkenhol, Oscar Geessink, Nikolaos Stathonikos, Marcory C. R. F. Van Dijk, Peter Bult, Francisco Beca, Andrew H. Beck, Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, Aoxiao Zhong, Qi Dou, Quanzheng Li, Hao Chen, Huang Jing Lin, Pheng Ann Heng, Christian Haß, Elia Bruni, Quincy Wong, Ugur Halici, Mustafa Ümit Öner, Rengul Cetin-Atalay, Matt Berseth, Vitali Khvatkov, Alexei Vylegzhanin, Oren Kraus, Muhammad Shaban, Nasir Rajpoot, Ruqayya Awan, Korsuk Sirinukunwattana, Talha Qaiser, Yee Wah Tsang, David Tellez, Jonas Annuscheit, Peter Hufnagl, Mira Valkonen, Kimmo Kartasalo, Leena Latonen, Pekka Ruusuvuori, Kaisa Liimatainen, Shadi Albarqouni, Bharti Mungal, Ami George, Stefanie Demirci, Nassir Navab, Seiryo Watanabe, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda, Hady Ahmady Phoulady, Vassili Kovalev, Alexander Kalinovsky, Vitali Liauchuk, Gloria Bueno, M. Milagro Fernandez-Carrobles, Ismael Serrano, Oscar Deniz, Daniel Racoceanu, and Rui Venâncio. 2017. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Journal of the American Medical Association 318, 22 (Dec. 2017), 2199–2210.
[30]
Eyal Bercovich and Marcia C. Javitt. 2018. Medical imaging: From roentgen to the digital revolution, and beyond. Rambam Maimonides Medical Journal 9, 4 (Oct. 2018), e0034.
[31]
Olivier Bernard, Johan G. Bosch, Brecht Heyde, Martino Alessandrini, Daniel Barbosa, Sorina Camarasu-Pop, Frederic Cervenansky, Sébastien Valette, Oana Mirea, Michel Bernier, Pierre Marc Jodoin, Jaime Santo Domingos, Richard V. Stebbing, Kevin Keraudren, Ozan Oktay, Jose Caballero, Wei Shi, Daniel Rueckert, Fausto Milletari, Seyed Ahmad Ahmadi, Erik Smistad, Frank Lindseth, Maartje Van Stralen, Chen Wang, Örjan Smedby, Erwan Donal, Mark Monaghan, Alex Papachristidis, Marcel L. Geleijnse, Elena Galli, and Jan D’Hooge. 2016. Standardized evaluation system for left ventricular segmentation algorithms in 3D echocardiography. IEEE Transactions on Medical Imaging 35, 4 (April 2016), 967–977.
[32]
Olivier Bernard, Alain Lalande, Clement Zotti, Frederick Cervenansky, Xin Yang, Pheng Ann Heng, Irem Cetin, Karim Lekadir, Oscar Camara, Miguel Angel Gonzalez Ballester, Gerard Sanroma, Sandy Napel, Steffen Petersen, Georgios Tziritas, Elias Grinias, Mahendra Khened, Varghese Alex Kollerathu, Ganapathy Krishnamurthi, Marc Michel Rohe, Xavier Pennec, Maxime Sermesant, Fabian Isensee, Paul Jager, Klaus H. Maier-Hein, Peter M. Full, Ivo Wolf, Sandy Engelhardt, Christian F. Baumgartner, Lisa M. Koch, Jelmer M. Wolterink, Ivana Isgum, Yeonggul Jang, Yoonmi Hong, Jay Patravali, Shubham Jain, Olivier Humbert, and Pierre Marc Jodoin. 2018. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Transactions on Medical Imaging 37, 11 (Nov. 2018), 2514–2525.
[33]
Dulari Bhatt, Chirag Patel, Hardik Talsania, Jigar Patel, Rasmika Vaghela, Sharnil Pandya, Kirit Modi, and Hemant Ghayvat. 2021. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 10, 20 (2021), 2470.
[34]
Patrick Bilic, Patrick Ferdinand Christ, Eugene Vorontsov, Grzegorz Chlebus, Hao Chen, Qi Dou, Chi Wing Fu, Xiao Han, Pheng Ann Heng, Jürgen Hesser, Samuel Kadoury, Tomasz Konopczynski, Miao Le, Chunming Li, Xiaomeng Li, Jana Lipkova, John Lowengrub, Hans Meine, Jan Hendrik Moltz, Chris Pal, Marie Piraud, Xiaojuan Qi, Jin Qi, Markus Rempfler, Karsten Roth, Andrea Schenk, Anjany Sekuboyina, Ping Zhou, Christian Hulsemeyer, Marcel Beetz, Florian Ettlinger, Felix Gruen, Georgios Kaissis, Fabian Lohöfer, Rickmer Braren, Julian Holch, Felix Hofmann, Wieland Sommer, Volker Heinemann, Colin Jacobs, Gabriel Efrain Humpire Mamani, Bram Van Ginneken, Gabriel Chartrand, An Tang, Michal Drozdzal, Samuel Kadoury, Avi Ben-Cohen, Eyal Klang, Marianne M. Amitai, Eli Konen, Hayit Greenspan, Johan Moreau, Alexandre Hostettler, Luc Soler, Refael Vivanti, Adi Szeskin, Naama Lev-Cohain, Jacob Sosna, Leo Joskowicz, Ashnil Kumar, Avinash Kore, Chunliang Wang, Dagan Feng, Fan Li, Ganapathy Krishnamurthi, Jian He, Jianrong Wu, Jinman Kim, Jinyi Zhou, Jun Ma, Junbo Li, Kevis Kokitsi Maninis, Krishna Chaitanya Kaluva, Lei Bi, Mahendra Khened, Miriam Bellver, Qizhong Lin, Xiaoping Yang, Yading Yuan, Yinan Chen, Yuanqiang Li, Yudong Qiu, Yuli Wu, and Bjoern Menze. 2019. The liver tumor segmentation benchmark (LiTS). arXiv:1901.04056 (2019). http://arxiv.org/abs/1901.04056
[35]
B. Nicolas Bloch, Ashali Jain, and C. Carl Jaffe. 2015. BREAST-DIAGNOSIS. Retrieved August 23, 2023 from
[36]
B. Nicolas Bloch, Ashali Jain, and C. Carl Jaffe. 2015. PROSTATE-DIAGNOSIS. Retrieved August 23, 2023 from
[37]
N. Bloch. 2015. NCI-ISBI 2013 Challenge: Automated Segmentation of Prostate Structures (ISBI-MR-Prostate-2013). Retrieved August 23, 2023 from
[38]
Hrvoje Bogunovic, Freerk Venhuizen, Sophie Klimscha, Stefanos Apostolopoulos, Alireza Bab-Hadiashar, Ulas Bagci, Mirza Faisal Beg, Loza Bekalo, Qiang Chen, Carlos Ciller, Karthik Gopinath, Amirali K. Gostar, Kiwan Jeon, Zexuan Ji, Sung Ho Kang, Dara D. Koozekanani, Donghuan Lu, Dustin Morley, Keshab K. Parhi, Hyoung Suk Park, Abdolreza Rashno, Marinko Sarunic, Saad Shaikh, Jayanthi Sivaswamy, Ruwan Tennakoon, Shivin Yadav, Sandro De Zanet, Sebastian M. Waldstein, Bianca S. Gerendas, Caroline Klaver, Clara I. Sanchez, and Ursula Schmidt-Erfurth. 2019. RETOUCH: The retinal OCT fluid detection and segmentation benchmark and challenge. IEEE Transactions on Medical Imaging 38, 8 (Aug. 2019), 1858–1874.
[39]
Dennis Bontempi, Sergio Benini, Alberto Signoroni, Michele Svanera, and Lars Muckli. 2020. CEREBRUM: A fast and fully-volumetric Convolutional Encoder-decodeR for weakly-supervised sEgmentation of BRain strUctures from out-of-the-scanner MRI. Medical Image Analysis 62 (Sept. 2020), 101688.
[40]
Peter Boord, Tara M. Madhyastha, Mary K. Askren, and Thomas J. Grabowski. 2017. Executive attention networks show altered relationship with default mode network in PD. NeuroImage: Clinical 13 (2017), 1–8.
[41]
Jannis Born, Gabriel Brändle, Manuel Cossio, Marion Disdier, Julie Goulet, Jérémie Roulin, and Nina Wiedemann. 2020. POCOVID-Net: Automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS). arXiv:2004.12084 (2020). http://arxiv.org/abs/2004.12084
[42]
Jannis Born, Nina Wiedemann, Gabriel Brändle, Charlotte Buhre, Bastian Rieck, and Karsten Borgwardt. 2020. Accelerating COVID-19 differential diagnosis with explainable ultrasound image analysis. arXiv:2009.06116 (2020).
[43]
Jiri Borovec, Jan Kybic, Ignacio Arganda-Carreras, Dmitry V. Sorokin, Gloria Bueno, Alexander V. Khvostikov, Spyridon Bakas, Eric I-Chao Chang, Stefan Heldmann, Kimmo Kartasalo, Leena Latonen, Johannes Lotz, Michelle Noga, Sarthak Pati, Kumaradevan Punithakumar, Pekka Ruusuvuori, Andrzej Skalski, Nazanin Tahmasebi, Masi Valkonen, Ludovic Venet, Yizhe Wang, Nick Weiss, Marek Wodzinski, Yu Xiang, Yan Xu, Yan Yan, Paul Yushkevich, Shengyu Zhao, and Arrate Munoz-Barrutia. 2020. ANHIR: Automatic non-rigid histological image registration challenge. IEEE Transactions on Medical Imaging 39, 10 (2020), 3042–3052.
[44]
M. Boulanger, Jean Claude Nunes, H. Chourak, A. Largent, S. Tahri, O. Acosta, R. De Crevoisier, C. Lafond, and A. Barateau. 2021. Deep learning methods to generate synthetic CT from MRI in radiotherapy: A literature review. Physica Medica 89 (Sept. 2021), 265–281.
[45]
Esther E. Bron, Marion Smits, Wiesje M. van der Flier, Hugo Vrenken, Frederik Barkhof, Philip Scheltens, Janne M. Papma, Rebecca M. E. Steketee, Carolina Méndez Orellana, Rozanna Meijboom, Madalena Pinto, Joana R. Meireles, Carolina Garrett, António J. Bastos-Leite, Ahmed Abdulkadir, Olaf Ronneberger, Nicola Amoroso, Roberto Bellotti, David Cárdenas-Peña, Andrés M. Álvarez-Meza, Chester V. Dolph, Khan M. Iftekharuddin, Simon F. Eskildsen, Pierrick Coupé, Vladimir S. Fonov, Katja Franke, Christian Gaser, Christian Ledig, Ricardo Guerrero, Tong Tong, Katherine R. Gray, Elaheh Moradi, Jussi Tohka, Alexandre Routier, Stanley Durrleman, Alessia Sarica, Giuseppe Di Fatta, Francesco Sensi, Andrea Chincarini, Garry M. Smith, Zhivko V. Stoyanov, Lauge Sørensen, Mads Nielsen, Sabina Tangaro, Paolo Inglese, Christian Wachinger, Martin Reuter, John C. van Swieten, Wiro J. Niessen, and Stefan Klein. 2015. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: The CADDementia challenge. NeuroImage 111 (May 2015), 562–579.
[46]
Mateusz Buda, Ashirbani Saha, Ruth Walsh, Sujata Ghate, Nianyi Li, Albert Świȩcicki, Joseph Y. Lo, and Maciej A. Mazurowski. 2020. Detection of masses and architectural distortions in digital breast tomosynthesis: A publicly available dataset of 5,060 patients and a deep learning model. arXiv:2011.07995 (2020). http://arxiv.org/abs/2011.07995
[47]
Gabriele Campanella, Matthew G. Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J. Busam, Edi Brogi, Victor E. Reuter, David S. Klimstra, and Thomas J. Fuchs. 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25, 8 (2019), 1301–1309.
[48]
Victor M. Campello, Polyxeni Gkontra, Cristian Izquierdo, Carlos Martin-Isla, Alireza Sojoudi, Peter M. Full, Klaus Maier-Hein, Yao Zhang, Zhiqiang He, Jun Ma, Mario Parreno, Alberto Albiol, Fanwei Kong, Shawn C. Shadden, Jorge Corral Acero, Vaanathi Sundaresan, Mina Saber, Mustafa Elattar, Hongwei Li, Bjoern Menze, Firas Khader, Christoph Haarburger, Cian M. Scannell, Mitko Veta, Adam Carscadden, Kumaradevan Punithakumar, Xiao Liu, Sotirios A. Tsaftaris, Xiaoqiong Huang, Xin Yang, Lei Li, Xiahai Zhuang, David Vilades, Martin L. Descalzo, Andrea Guala, Lucia La Mura, Matthias G. Friedrich, Ria Garg, Julie Lebel, Filipe Henriques, Mahir Karakas, Ersin Cavus, Steffen E. Petersen, Sergio Escalera, Santi Segui, Jose F. Rodriguez-Palomares, and Karim Lekadir. 2021. Multi-centre, multi-vendor and multi-disease cardiac segmentation: The M&Ms challenge. IEEE Transactions on Medical Imaging 40, 12 (2021), 3543–3554.
[49]
Aaron Carass, Snehashis Roy, Amod Jog, Jennifer L. Cuzzocreo, Elizabeth Magrath, Adrian Gherman, Julia Button, James Nguyen, Ferran Prados, Carole H. Sudre, Manuel Jorge Cardoso, Niamh Cawley, Olga Ciccarelli, Claudia A. M. Wheeler-Kingshott, Sébastien Ourselin, Laurence Catanese, Hrishikesh Deshpande, Pierre Maurel, Olivier Commowick, Christian Barillot, Xavier Tomas-Fernandez, Simon K. Warfield, Suthirth Vaidya, Abhijith Chunduru, Ramanathan Muthuganapathy, Ganapathy Krishnamurthi, Andrew Jesson, Tal Arbel, Oskar Maier, Heinz Handels, Leonardo O. Iheme, Devrim Unay, Saurabh Jain, Diana M. Sima, Dirk Smeets, Mohsen Ghafoorian, Bram Platel, Ariel Birenbaum, Hayit Greenspan, Pierre Louis Bazin, Peter A. Calabresi, Ciprian M. Crainiceanu, Lotta M. Ellingsen, Daniel S. Reich, Jerry L. Prince, and Dzung L. Pham. 2017. Longitudinal multiple sclerosis lesion segmentation: Resource and challenge. NeuroImage 148 (March 2017), 77–102.
[50]
Carlos E. Cardenas, A. Mohamed, G. Sharp, M. Gooding, H. Veeraraghavan, and J. Yang. 2019. AAPM RT-MAC Grand Challenge 2019 (AAPM-RT-MAC). Retrieved August 23, 2023 from
[51]
Johan D. Carlin and Nikolaus Kriegeskorte. 2017. Adjudicating between face-coding models with individual-face fMRI responses. PLoS Computational Biology 13, 7 (July 2017), e1005604.
[52]
Jacob Carse and Stephen McKenna. 2019. Active learning for patch-based digital pathology using convolutional neural networks to reduce annotation costs. In Digital Pathology. Lecture Notes in Computer Science, Vol. 11435. Springer, 20–27.
[53]
Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, and Elissa M. Aminoff. 2018. BOLD5000, a public fMRI dataset of 5000 images. arXiv:1809.01281 (2018).
[54]
Stephanie J. Chiu, Michael J. Allingham, Priyatham S. Mettu, Scott W. Cousins, Joseph A. Izatt, and Sina Farsiu. 2015. Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema. Biomedical Optics Express 6, 4 (April 2015), 1172.
[55]
Muhammad E. H. Chowdhury, Tawsifur Rahman, Amith Khandakar, Rashid Mazhar, Muhammad Abdul Kadir, Zaid Bin Mahbub, Khandakar Reajul Islam, Muhammad Salman Khan, Atif Iqbal, Nasser Al Emadi, Mamun Bin Ibne Reaz, and Mohammad Tariqul Islam. 2020. Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8 (2020), 132665–132676.
[56]
Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, Lawrence Tarbox, and Fred Prior. 2013. The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging 26, 6 (Dec. 2013), 1045–1057.
[57]
Olivier Commowick, Audrey Istace, Michaël Kain, Baptiste Laurent, Florent Leray, Mathieu Simon, Sorina Camarasu Pop, Pascal Girard, Roxana Améli, Jean Christophe Ferré, Anne Kerbrat, Thomas Tourdias, Frédéric Cervenansky, Tristan Glatard, Jérémy Beaumont, Senan Doyle, Florence Forbes, Jesse Knight, April Khademi, Amirreza Mahbod, Chunliang Wang, Richard McKinley, Franca Wagner, John Muschelli, Elizabeth Sweeney, Eloy Roura, Xavier Lladó, Michel M. Santos, Wellington P. Santos, Abel G. Silva-Filho, Xavier Tomas-Fernandez, Hélène Urien, Isabelle Bloch, Sergi Valverde, Mariano Cabezas, Francisco Javier Vera-Olmos, Norberto Malpica, Charles Guttmann, Sandra Vukusic, Gilles Edan, Michel Dojat, Martin Styner, Simon K. Warfield, François Cotton, and Christian Barillot. 2018. Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure. Scientific Reports 8, 1 (Dec. 2018), 13650.
[58]
C. Couinaud. 1999. Liver anatomy: Portal (and suprahepatic) or biliary segmentation. Digestive Surgery 16, 6 (1999), 459–467.
[59]
Aydin Demircioğlu, Magdalena Charis Stein, Moon Sung Kim, Henrike Geske, Anton S. Quinsten, Sebastian Blex, Lale Umutlu, and Kai Nassenstein. 2021. Detecting the pulmonary trunk in CT scout views using deep learning. Scientific Reports 11, 1 (Dec. 2021), 10215.
[60]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 248–255.
[61]
Shivang Desai, Ahmad Baghal, Thidathip Wongsurawat, Piroon Jenjaroenpun, Thomas Powell, Shaymaa Al-Shukri, Kim Gates, Phillip Farmer, Michael Rutherford, Geri Blake, Tracy Nolan, Kevin Sexton, William Bennett, Kirk Smith, Shorabuddin Syed, and Fred Prior. 2020. Chest imaging representing a COVID-19 positive rural U.S. population. Scientific Data 7 (2020), 414.
[62]
Rahul S. Desikan, Florent Ségonne, Bruce Fischl, Brian T. Quinn, Bradford C. Dickerson, Deborah Blacker, Randy L. Buckner, Anders M. Dale, R. Paul Maguire, Bradley T. Hyman, Marilyn S. Albert, and Ronald J. Killiany. 2006. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31, 3 (July 2006), 968–980.
[63]
Rahul Duggal, Anubha Gupta, Ritu Gupta, and Pramit Mallick. 2017. SD-Layer: Stain deconvolutional layer for CNNs in medical microscopic imaging. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2017. Lecture Notes in Computer Science, Vol. 10435. Springer, 435–443.
[64]
Rahul Duggal, Anubha Gupta, Ritu Gupta, Manya Wadhwa, and Chirag Ahuja. 2016. Overlapping cell nuclei segmentation in microscopic images using deep belief networks. In Proceedings of the 10th Indian Conference on Computer Vision, Graphics, and Image Processing (ICVGIP’16). Article 82, 8 pages.
[65]
Jan Egger, Christina Gsaxner, Antonio Pepe, Kelsey L. Pomykala, Frederic Jonske, Manuel Kurz, Jianning Li, and Jens Kleesiek. 2022. Medical deep learning–A systematic meta-review. Computer Methods and Programs in Biomedicine 221 (2022), 106874.
[66]
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. 2010. The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision 88, 2 (June 2010), 303–338.
[67]
Zhonghao Fan, Johann Li, Liang Zhang, Guangming Zhu, Ping Li, Xiaoyuan Lu, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun, Tao Hua, and Wei Wei. 2021. U-net based analysis of MRI for Alzheimer’s disease diagnosis. Neural Computing and Applications 33 (April 2021), 13587–13599.
[68]
Scott H. Faro and Feroze B. Mohamed. 2010. BOLD fMRI: A Guide to Functional Imaging for Neuroscientists. Springer, New York, NY.
[69]
Evangello Flouty, Abdolrahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes-Hurtado, Hinde Taleb, Santiago Barbarisi, Gwenolé Quellec, and Danail Stoyanov. 2019. CaDIS: Cataract dataset for image segmentation. arXiv:1906.11586 (2019). http://arxiv.org/abs/1906.11586
[70]
Huazhu Fu, Fei Li, José Ignacio Orlando, Hrvoje Bogunović, Xu Sun, Jingan Liao, Yanwu Xu, Shaochong Zhang, and Xiulan Zhang. 2020. ADAM: Automatic Detection challenge on Age-related Macular degeneration. Retrieved August 23, 2023 from
[71]
Huazhu Fu, Fei Li, Xu Sun, Xingxing Cao, Jingan Liao, José Ignacio Orlando, Xing Tao, Yuexiang Li, Shihao Zhang, Mingkui Tan, Chenglang Yuan, Cheng Bian, Ruitao Xie, Jiongcheng Li, Xiaomeng Li, Jing Wang, Le Geng, Panming Li, Huaying Hao, Jiang Liu, Yan Kong, Yongyong Ren, Hrvoje Bogunović, Xiulan Zhang, and Yanwu Xu. 2020. AGE challenge: Angle closure glaucoma evaluation in anterior segment optical coherence tomography. arXiv:2005.02258 (2020). http://arxiv.org/abs/2005.02258
[72]
Debashis Ganguly, Srabonti Chakraborty, Maricel Balitanas, and Tai-Hoon Kim. 2010. Medical imaging: A review. In Security-Enriched Urban Computing and Smart Grid, Tai-Hoon Kim, Adrian Stoica, and Ruay-Shiung Chang (Eds.). Springer, Berlin, Germany, 504–516.
[73]
Marios A. Gavrielides, Lisa M. Kinnard, Kyle J. Myers, Jennifer Peregoy, William F. Pritchard, Rongping Zeng, Juan Esparza, John Karanian, and Nicholas Petrick. 2010. A resource for the assessment of lung nodule size estimation methods: Database of thoracic CT scans of an anthropomorphic phantom. Optics Express 18, 14 (July 2010), 15244.
[74]
Shiv Gehlot, Anubha Gupta, and Ritu Gupta. 2020. SDCT-AuxNet\(\theta\): DCT augmented stain deconvolutional CNN with auxiliary classifier for cancer diagnosis. Medical Image Analysis 61 (April 2020), 101661.
[75]
Olivier Gevaert, Jiajing Xu, Chuong D. Hoang, Ann N. Leung, Yue Xu, Andrew Quon, Daniel L. Rubin, Sandy Napel, and Sylvia K. Plevritis. 2012. Non-small cell lung cancer: Identifying prognostic imaging biomarkers by leveraging public gene expression microarray data—Methods and preliminary results. Radiology 264, 2 (Aug. 2012), 387–396.
[76]
Jianqiao Zhou, Xiaohong Jia, Dong Ni, Alison Noble, Ruobing Huang, Tao Tan, and Manh The Van. 2020. Thyroid nodule segmentation and classification in ultrasound images. Retrieved August 23, 2023 from
[77]
A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101, 23 (June 2000), e215–e220.
[78]
Christina Gsaxner, Jürgen Wallner, Xiaojun Chen, Wolfgang Zemann, and Jan Egger. 2019. Facial model collection for medical augmented reality in oncologic cranio-maxillofacial surgery. Scientific Data 6, 1 (Dec. 2019), 310.
[79]
Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. 2020. Deep Image Reconstruction. Retrieved August 23, 2023 from
[80]
Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal. 2018. PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma. PLoS ONE 13, 12 (2018), e0207908.
[81]
Ritu Gupta, Pramit Mallick, Rahul Duggal, Anubha Gupta, and Ojaswa Sharma. 2017. Stain color normalization and segmentation of plasma cells in microscopic images as a prelude to development of computer assisted automated disease diagnostic tool in multiple myeloma. Clinical Lymphoma Myeloma and Leukemia 17, 1 (Feb. 2017), e99.
[82]
K. Hameeteman, M. A. Zuluaga, M. Freiman, L. Joskowicz, O. Cuisenaire, L. Flórez Valencia, M. A. Gülsün, K. Krissian, J. Mille, W. C. K. Wong, M. Orkisz, H. Tek, M. Hernández Hoyos, F. Benmansour, A. C. S. Chung, S. Rozie, M. van Gils, L. van den Borne, J. Sosna, P. Berman, N. Cohen, P. C. Douek, I. Sánchez, M. Aissat, M. Schaap, C. T. Metz, G. P. Krestin, A. van der Lugt, W. J. Niessen, and T. Van Walsum. 2011. Evaluation framework for carotid bifurcation lumen segmentation and stenosis grading. Medical Image Analysis 15, 4 (Aug. 2011), 477–488.
[83]
Stephanie A. Harmon, Thomas H. Sanford, Sheng Xu, Evrim B. Turkbey, Holger Roth, Ziyue Xu, Dong Yang, Andriy Myronenko, Victoria Anderson, Amel Amalou, Maxime Blain, Michael Kassin, Dilara Long, Nicole Varble, Stephanie M. Walker, Ulas Bagci, Anna Maria Ierardi, Elvira Stellato, Guido Giovanni Plensich, Giuseppe Franceschelli, Cristiano Girlando, Giovanni Irmici, Dominic Labella, Dima Hammoud, Ashkan Malayeri, Elizabeth Jones, Ronald M. Summers, Peter L. Choyke, Daguang Xu, Mona Flores, Kaku Tamura, Hirofumi Obinata, Hitoshi Mori, Francesca Patella, Maurizio Cariati, Gianpaolo Carrafiello, Peng An, Bradford J. Wood, and Baris Turkbey. 2020. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nature Communications 11, 1 (Dec. 2020), 4080.
[84]
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2020. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 2 (2020), 386–397.
[85]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 770–778.
[86]
Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, and Pengtao Xie. 2020. PATHVQA: 30000+ questions for medical visual question answering. arXiv:2003.10286 (2020). http://arxiv.org/abs/2003.10286
[87]
Tobias Heimann, Bram Van Ginneken, Martin A. Styner, Yulia Arzhaeva, Volker Aurich, Christian Bauer, Andreas Beck, Christoph Becker, Reinhard Beichel, György Bekes, Fernando Bello, Gerd Binnig, Horst Bischof, Alexander Bornik, Peter M. M. Cashman, Ying Chi, Andrés Córdova, Benoit M. Dawant, Márta Fidrich, Jacob D. Furst, Daisuke Furukawa, Lars Grenacher, Joachim Hornegger, Dagmar Kainmüller, Richard I. Kitney, Hidefumi Kobatake, Hans Lamecker, Thomas Lange, Jeongjin Lee, Brian Lennon, Rui Li, Senhu Li, Hans Peter Meinzer, Gábor Németh, Daniela S. Raicu, Anne Mareike Rau, Eva M. Van Rikxoort, Mikaël Rousson, László Ruskó, Kinda A. Saddi, Günter Schmidt, Dieter Seghers, Akinobu Shimizu, Pieter Slagmolen, Erich Sorantin, Grzegorz Soza, Ruchaneewan Susomboon, Jonathan M. Waite, Andreas Wimmer, and Ivo Wolf. 2009. Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Transactions on Medical Imaging 28, 8 (2009), 1251–1265.
[88]
Nicholas Heller, Niranjan Sathianathen, Arveen Kalapara, Edward Walczak, Keenan Moore, Heather Kaluzniak, Joel Rosenberg, Paul Blake, Zachary Rengel, Makinna Oestreich, Joshua Dean, Michael Tradewell, Aneri Shah, Resha Tejpaul, Zachary Edgerton, Matthew Peterson, Shaneabbas Raza, Subodh Regmi, Nikolaos Papanikolopoulos, and Christopher Weight. 2019. The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes. arXiv:1904.00445 (2019). http://arxiv.org/abs/1904.00445
[89]
Tomoyasu Horikawa and Yukiyasu Kamitani. 2017. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications 8, 1 (Aug. 2017), 15037.
[90]
Timothy M. Hospedales, Antreas Antoniou, Paul Micaelli, and Amos J. Storkey. 2022. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022), 5149–5169.
[91]
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017). http://arxiv.org/abs/1704.04861
[92]
Murtadha D. Hssayeni, Muayad S. Croock, Aymen D. Salman, Hassan Falah Al-Khafaji, Zakaria A. Yahya, and Behnaz Ghoraani. 2020. Intracranial hemorrhage segmentation using a deep convolutional model. Data 5, 1 (Feb. 2020), 14.
[93]
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 2261–2269.
[94]
Yoshiharu Ikutani, Takatomi Kubo, Satoshi Nishida, Hideaki Hata, Kenichi Matsumoto, Kazushi Ikeda, and Shinji Nishimoto. 2021. Expert programmers have fine-tuned cortical representations of source code. eNeuro 8, 1 (2021), ENEURO.0405-20.2020.
[95]
Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi, Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz, Bhavik N. Patel, Matthew P. Lungren, and Andrew Y. Ng. 2019. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19), the 31st Innovative Applications of Artificial Intelligence Conference (IAAI’19), and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’19). 590–597.
[96]
Ivana Išgum, Manon J. N. L. Benders, Brian Avants, M. Jorge Cardoso, Serena J. Counsell, Elda Fischi Gomez, Laura Gui, Petra S. Huppi, Karina J. Kersbergen, Antonios Makropoulos, Andrew Melbourne, Pim Moeskops, Christian P. Mol, Maria Kuklisova-Murgasova, Daniel Rueckert, Julia A. Schnabel, Vedran Srhoj-Egekher, Jue Wu, Siying Wang, Linda S. de Vries, and Max A. Viergever. 2015. Evaluation of automatic neonatal brain segmentation algorithms: The NeoBrainS12 challenge. Medical Image Analysis 20, 1 (Feb. 2015), 135–151.
[97]
Oscar Jimenez-Del-Toro, Henning Muller, Markus Krenn, Katharina Gruenberg, Abdel Aziz Taha, Marianne Winterstein, Ivan Eggel, Antonio Foncubierta-Rodriguez, Orcun Goksel, Andras Jakab, Georgios Kontokotsios, Georg Langs, Bjoern H. Menze, Tomas Salas Fernandez, Roger Schaer, Anna Walleyo, Marc Andre Weber, Yashin Dicente Cid, Tobias Gass, Mattias Heinrich, Fucang Jia, Fredrik Kahl, Razmig Kechichian, Dominic Mai, Assaf B. Spanier, Graham Vincent, Chunliang Wang, Daniel Wyeth, and Allan Hanbury. 2016. Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: VISCERAL anatomy benchmarks. IEEE Transactions on Medical Imaging 35, 11 (Nov. 2016), 2459–2475.
[98]
Yuan Jin, Antonio Pepe, Jianning Li, Christina Gsaxner, and Jan Egger. 2021. Deep learning and particle filter-based aortic dissection vessel tree segmentation. In Medical Imaging 2021: Biomedical Applications in Molecular, Structural, and Functional Imaging, Barjor S. Gimi and Andrzej Krol (Eds.), Vol. 11600. International Society for Optics and Photonics, SPIE, 116001W.
[99]
Alistair E. W. Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, and Steven Horng. 2019. The MIMIC-CXR Database. Retrieved August 23, 2023 from
[100]
C. Daniel Johnson, Mei-Hsiu Chen, Alicia Y. Toledano, Jay P. Heiken, Abraham Dachman, Mark D. Kuo, Christine O. Menias, Betina Siewert, Jugesh I. Cheema, Richard G. Obregon, Jeff L. Fidler, Peter Zimmerman, Karen M. Horton, Kevin Coakley, Revathy B. Iyer, Amy K. Hara, Robert A. Halvorsen, Giovanna Casola, Judy Yee, Benjamin A. Herman, Lawrence J. Burgart, and Paul J. Limburg. 2008. Accuracy of CT colonography for detection of large adenomas and cancers. New England Journal of Medicine 359, 12 (Sept. 2008), 1207–1217.
[101]
Rashed Karim, Lauren Emma Blake, Jiro Inoue, Qian Tao, Shuman Jia, R. James Housden, Pranav Bhagirath, Jean Luc Duval, Marta Varela, Jonathan Behar, Loic Cadour, Rob J. van der Geest, Hubert Cochet, Maria Drangova, Maxime Sermesant, Reza Razavi, Oleg Aslanidi, Ronak Rajani, and Kawal Rhode. 2018. Algorithms for left atrial wall segmentation and thickness—Evaluation on an open-source CT and MRI image database. Medical Image Analysis 50 (Dec. 2018), 36–53.
[102]
Davood Karimi, Simon K. Warfield, and Ali Gholipour. 2021. Transfer learning in medical image segmentation: New insights from analysis of the dynamics of model parameters and learned representations. Artificial Intelligence in Medicine 116 (2021), 102078.
[103]
A. Emre Kavur, N. Sinem Gezer, Mustafa Barış, Sinem Aslan, Pierre Henri Conze, Vladimir Groza, Duc Duy Pham, Soumick Chatterjee, Philipp Ernst, Savaş Özkan, Bora Baydar, Dmitry Lachinov, Shuo Han, Josef Pauli, Fabian Isensee, Matthias Perkonigg, Rachana Sathish, Ronnie Rajan, Debdoot Sheet, Gurbandurdy Dovletov, Oliver Speck, Andreas Nürnberger, Klaus H. Maier-Hein, Gözde Bozdağı Akar, Gözde Ünal, Oğuz Dicle, and M. Alper Selver. 2021. CHAOS challenge—Combined (CT-MR) healthy abdominal organ segmentation. Medical Image Analysis 69 (Jan. 2021), 101950.
[104]
Daniel S. Kermany, Michael Goldbaum, Wenjia Cai, Carolina C. S. Valentim, Huiying Liang, Sally L. Baxter, Alex McKeown, Ge Yang, Xiaokang Wu, Fangbing Yan, Justin Dong, Made K. Prasadha, Jacqueline Pei, Magdalena Ting, Jie Zhu, Christina Li, Sierra Hewett, Jason Dong, Ian Ziyar, Alexander Shi, Runze Zhang, Lianghong Zheng, Rui Hou, William Shi, Xin Fu, Yaou Duan, Viet A. N. Huu, Cindy Wen, Edward D. Zhang, Charlotte L. Zhang, Oulan Li, Xiaobo Wang, Michael A. Singer, Xiaodong Sun, Jie Xu, Ali Tafreshi, M. Anthony Lewis, Huimin Xia, and Kang Zhang. 2018. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 5 (Feb. 2018), 1122–1131.e9.
[105]
Bijen Khagi, Chung Ghiu Lee, and Goo Rak Kwon. 2019. Alzheimer’s disease classification from brain MRI based on transfer learning from CNN. In Proceedings of the 11th Biomedical Engineering International Conference (BMEiCON’18).
[106]
Salman Khan, Hossein Rahmani, Syed Afaq Ali Shah, and Mohammed Bennamoun. 2018. A Guide to Convolutional Neural Networks for Computer Vision. Synthesis Lectures on Computer Vision. Springer.
[107]
Hee E. Kim, Alejandro Cosa-Linan, Nandhini Santhanam, Mahboubeh Jannesari, Mate E. Maros, and Thomas Ganslandt. 2022. Transfer learning for medical image classification: A literature review. BMC Medical Imaging 22, 1 (April 2022), 69.
[108]
Jeoung Kun Kim, Yoo Jin Choo, Hyunkwang Shin, Gyu Sang Choi, and Min Cheol Chang. 2021. Prediction of ambulatory outcome in patients with corona radiata infarction using deep learning. Scientific Reports 11, 1 (Dec. 2021), 7989.
[109]
Yoojoong Kim, Jeong Hyeon Lee, Sunho Choi, Jeong Moon Lee, Jong Ho Kim, Junhee Seok, and Hyung Joon Joo. 2020. Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records. Scientific Reports 10, 1 (Dec. 2020), 20265.
[110]
H. A. Kirişli, M. Schaap, C. T. Metz, A. S. Dharampal, W. B. Meijboom, S. L. Papadopoulou, A. Dedic, K. Nieman, M. A. de Graaf, M. F. L. Meijs, M. J. Cramer, A. Broersen, S. Cetin, A. Eslami, L. Flórez-Valencia, K. L. Lor, B. Matuszewski, I. Melki, B. Mohr, I. Öksüz, R. Shahzad, C. Wang, P. H. Kitslaar, G. Unal, A. Katouzian, M. Orkisz, C. M. Chen, F. Precioso, L. Najman, S. Masood, D. Ünay, L. van Vliet, R. Moreno, R. Goldenberg, E. Vuçini, G. P. Krestin, W. J. Niessen, and T. Van Walsum. 2013. Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in computed tomography angiography. Medical Image Analysis 17, 8 (Dec. 2013), 859–876.
[111]
Arno Klein and Jason Tourville. 2012. 101 labeled brain images and a consistent human cortical labeling protocol. Frontiers in Neuroscience 6 (2012), 171.
[112]
Oldřich Kodym, Jianning Li, Antonio Pepe, Christina Gsaxner, Sasank Chilamkurthy, Jan Egger, and Michal Španěl. 2021. SkullBreak/SkullFix—Dataset for automatic cranial implant design and a benchmark for volumetric shape learning tasks. Data in Brief 35 (2021), 106902.
[113]
Laura Koenders, Janna Cousijn, Wilhelmina A. M. Vingerhoets, Wim Van Den Brink, Reinout W. Wiers, Carin J. Meijer, Marise W. J. Machielsen, Dick J. Veltman, Anneke E. Goudriaan, and Lieuwe De Haan. 2016. Grey matter changes associated with heavy cannabis use: A longitudinal sMRI study. PLoS ONE 11, 5 (May 2016), e0152482.
[114]
Lale Kostakoglu, Fenghai Duan, Michael O. Idowu, Paul R. Jolles, Harry D. Bear, Mark Muzi, Jean Cormack, John P. Muzi, Daniel A. Pryma, Jennifer M. Specht, Linda Hovanessian-Larsen, John Miliziano, Sharon Mallett, Anthony F. Shields, and David A. Mankoff. 2015. A phase II study of 3\(^{\prime }\)-Deoxy-3\(^{\prime }\)-18F-fluorothymidine PET in the assessment of early response of breast cancer to neoadjuvant chemotherapy: Results from ACRIN 6688. Journal of Nuclear Medicine 56, 11 (Nov. 2015), 1681–1689.
[115]
A. Krizhevsky, V. Nair, and G. Hinton. 2009. CIFAR-10 and CIFAR-100 Datasets. University of Toronto.
[116]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60 (2017), 84–90.
[117]
Hugo J. Kuijf, Adrià Casamitjana, D. Louis Collins, Mahsa Dadar, Achilleas Georgiou, Mohsen Ghafoorian, Dakai Jin, April Khademi, Jesse Knight, Hongwei Li, Xavier Lladó, J. Matthijs Biesbroek, Miguel Luna, Qaiser Mahmood, Richard Mckinley, Alireza Mehrtash, Sebastien Ourselin, Bo Yong Park, Hyunjin Park, Sang Hyun Park, Simon Pezold, Elodie Puybareau, Jeroen De Bresser, Leticia Rittner, Carole H. Sudre, Sergi Valverde, Veronica Vilaplana, Roland Wiest, Yongchao Xu, Ziyue Xu, Guodong Zeng, Jianguo Zhang, Guoyan Zheng, Rutger Heinen, Christopher Chen, Wiesje Van Der Flier, Frederik Barkhof, Max A. Viergever, Geert Jan Biessels, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, and M. Jorge Cardoso. 2019. Standardized assessment of automatic segmentation of white matter hyperintensities and results of the WMH segmentation challenge. IEEE Transactions on Medical Imaging 38, 11 (Nov. 2019), 2556–2568.
[118]
Neeraj Kumar, Ruchika Verma, Deepak Anand, Yanning Zhou, Omer Fahri Onder, Efstratios Tsougenis, Hao Chen, Pheng Ann Heng, Jiahui Li, Zhiqiang Hu, Yunzhi Wang, Navid Alemi Koohbanani, Mostafa Jahanifar, Neda Zamani Tajeddin, Ali Gooya, Nasir Rajpoot, Xuhua Ren, Sihang Zhou, Qian Wang, Dinggang Shen, Cheng Kun Yang, Chi Hung Weng, Wei Hsiang Yu, Chao Yuan Yeh, Shuang Yang, Shuoyu Xu, Pak Hei Yeung, Peng Sun, Amirreza Mahbod, Gerald Schaefer, Isabella Ellinger, Rupert Ecker, Orjan Smedby, Chunliang Wang, Benjamin Chidester, That Vinh Ton, Minh Triet Tran, Jian Ma, Minh N. Do, Simon Graham, Quoc Dang Vu, Jin Tae Kwak, Akshaykumar Gunda, Raviteja Chunduri, Corey Hu, Xiaoyang Zhou, Dariush Lotfi, Reza Safdari, Antanas Kascenas, Alison O’Neil, Dennis Eschweiler, Johannes Stegmaier, Yanping Cui, Baocai Yin, Kailin Chen, Xinmei Tian, Philipp Gruening, Erhardt Barth, Elad Arbel, Itay Remer, Amir Ben-Dor, Ekaterina Sirazitdinova, Matthias Kohl, Stefan Braunewell, Yuexiang Li, Xinpeng Xie, Linlin Shen, Jun Ma, Krishanu Das Baksi, Mohammad Azam Khan, Jaegul Choo, Adrian Colomer, Valery Naranjo, Linmin Pei, Khan M. Iftekharuddin, Kaushiki Roy, Debotosh Bhattacharjee, Anibal Pedraza, Maria Gloria Bueno, Sabarinathan Devanathan, Saravanan Radhakrishnan, Praveen Koduganty, Zihan Wu, Guanyu Cai, Xiaojie Liu, Yuqin Wang, and Amit Sethi. 2020. A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging 39, 5 (May 2020), 1380–1391.
[119]
Marjan Laal. 2013. Innovation process in medical imaging. Procedia—Social and Behavioral Sciences 81 (2013), 60–64.
[120]
Pamela LaMontagne, Tammie L. S. Benzinger, John Morris, Sarah Keefe, Russ Hornbeck, Chengjie Xiong, Elizabeth Grant, Jason Hassenstab, Krista Moulder, Andrei Vlassenko, Marcus Raichle, Carlos Cruchaga, and Daniel Marcus. 2019. OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease. medRxiv (Dec. 2019), 2019.12.13.19014902.
[121]
Sarah Leclerc, Erik Smistad, Joao Pedrosa, Andreas Ostvik, Frederic Cervenansky, Florian Espinosa, Torvald Espeland, Erik Andreas Rye Berg, Pierre Marc Jodoin, Thomas Grenier, Carole Lartizien, Jan Dhooge, Lasse Lovstakken, and Olivier Bernard. 2019. Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Transactions on Medical Imaging 38, 9 (Sept. 2019), 2198–2210.
[122]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998), 2278–2323.
[123]
Rebecca J. Lepping, Ruth Ann Atchley, Evangelia Chrysikou, Laura E. Martin, Alicia A. Clair, Rick E. Ingram, W. Kyle Simmons, and Cary R. Savage. 2016. Neural processing of emotional musical and nonmusical stimuli in depression. PLoS ONE 11, 6 (June 2016), e0156859.
[124]
Jianning Li, Christina Gsaxner, Antonio Pepe, Ana Morais, Victor Alves, Gord von Campe, Jürgen Wallner, and Jan Egger. 2021. Synthetic skull bone defects for automatic patient-specific craniofacial implant design. Scientific Data 8, 1 (Jan. 2021), 36.
[125]
Jianning Li, Antonio Pepe, Christina Gsaxner, Gord von Campe, and Jan Egger. 2020. A baseline approach for AutoImplant: The MICCAI 2020 cranial implant design challenge. In Multimodal Learning for Clinical Decision Support and Clinical Image-Based Procedures. Lecture Notes in Computer Science, Vol. 12445. Springer, 75–84. arXiv:2006.12449
[126]
Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, and Hongsheng Li. 2019. Signet ring cell detection with a semi-supervised learning framework. In Information Processing in Medical Imaging. Lecture Notes in Computer Science, Vol. 11492. Springer, 842–854.
[127]
Ping Li, Xiangwen Kong, Johann Li, Guangming Zhu, Xiaoyuan Lu, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun, and Tao Hua. 2021. A dataset of pulmonary lesions with multiple-level attributes and fine contours. Frontiers in Digital Health 2 (Feb. 2021), 609349.
[128]
Zhang Li, Zheyu Hu, Jiaolong Xu, Tao Tan, Hui Chen, Zhi Duan, Ping Liu, Jun Tang, Guoping Cai, Quchang Ouyang, Yuling Tang, Geert Litjens, and Qiang Li. 2018. Computer-aided diagnosis of lung carcinoma using deep learning—A pilot study. arXiv:1803.05471 (2018). http://arxiv.org/abs/1803.05471
[129]
D. Li, P. Wang, S. Li, T. Lu, J. HuangFu, and Y. Wang. 2020. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx). Retrieved August 23, 2023 from
[130]
Huiying Liang, Brian Y. Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu, Wenjia Cai, Daniel S. Kermany, Xin Sun, Jiancong Chen, Liya He, Jie Zhu, Pin Tian, Hua Shao, Lianghong Zheng, Rui Hou, Sierra Hewett, Gen Li, Ping Liang, Xuan Zang, Zhiqi Zhang, Liyan Pan, Huimin Cai, Rujuan Ling, Shuhua Li, Yongwang Cui, Shusheng Tang, Hong Ye, Xiaoyan Huang, Waner He, Wenqing Liang, Qing Zhang, Jianmin Jiang, Wei Yu, Jianqun Gao, Wanxing Ou, Yingmin Deng, Qiaozhen Hou, Bei Wang, Cuichan Yao, Yan Liang, Shu Zhang, Yaou Duan, Runze Zhang, Sarah Gibson, Charlotte L. Zhang, Oulan Li, Edward D. Zhang, Gabriel Karin, Nathan Nguyen, Xiaokang Wu, Cindy Wen, Jie Xu, Wenqin Xu, Bochu Wang, Winston Wang, Jing Li, Bianca Pizzato, Caroline Bao, Daoman Xiang, Wanting He, Suiqin He, Yugui Zhou, Weldon Haw, Michael Goldbaum, Adriana Tremoulet, Chun Nan Hsu, Hannah Carter, Long Zhu, Kang Zhang, and Huimin Xia. 2019. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nature Medicine 25, 3 (March 2019), 433–438.
[131]
Sook Lei Liew, Julia M. Anglin, Nick W. Banks, Matt Sondag, Kaori L. Ito, Hosung Kim, Jennifer Chan, Joyce Ito, Connie Jung, Nima Khoshab, Stephanie Lefebvre, William Nakamura, David Saldana, Allie Schmiesing, Cathy Tran, Danny Vo, Tyler Ard, Panthea Heydari, Bokkyu Kim, Lisa Aziz-Zadeh, Steven C. Cramer, Jingchun Liu, Surjo Soekadar, Jan Egil Nordvik, Lars T. Westlye, Junping Wang, Carolee Winstein, Chunshui Yu, Lei Ai, Bonhwang Koo, R. Cameron Craddock, Michael Milham, Matthew Lakich, Amy Pienta, and Alison Stroud. 2018. A large, open source dataset of stroke anatomical brain images and manual lesion segmentations. Scientific Data 5 (2018), 180011.
[132]
Tsung Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Springer, 740–755.
[133]
Geert Litjens, Peter Bandi, Babak Ehteshami Bejnordi, Oscar Geessink, Maschenka Balkenhol, Peter Bult, Altuna Halilovic, Meyke Hermsen, Rob van de Loo, Rob Vogels, Quirine F. Manson, Nikolas Stathonikos, Alexi Baidoshvili, Paul van Diest, Carla Wauters, Marcory van Dijk, and Jeroen van der Laak. 2018. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: The CAMELYON dataset. GigaScience 7, 6 (June 2018), giy065.
[134]
Geert Litjens, Oscar Debats, Jelle Barentsz, Nico Karssemeijer, and Henkjan Huisman. 2014. Computer-aided detection of prostate cancer in MRI. IEEE Transactions on Medical Imaging 33, 5 (May 2014), 1083–1092.
[135]
Geert Litjens, Robert Toth, Wendy van de Ven, Caroline Hoeks, Sjoerd Kerkstra, Bram van Ginneken, Graham Vincent, Gwenael Guillard, Neil Birbeck, Jindang Zhang, Robin Strand, Filip Malmberg, Yangming Ou, Christos Davatzikos, Matthias Kirschner, Florian Jung, Jing Yuan, Wu Qiu, Qinquan Gao, Philip Eddie Edwards, Bianca Maan, Ferdinand van der Heijden, Soumya Ghose, Jhimli Mitra, Jason Dowling, Dean Barratt, Henkjan Huisman, and Anant Madabhushi. 2014. Evaluation of prostate segmentation algorithms for MRI: The PROMISE12 challenge. Medical Image Analysis 18, 2 (2014), 359–373.
[136]
Justin Kirby. 2015. Prostate-3T. Retrieved August 23, 2023 from
[137]
Ning Liu, Ying Liu, Brent Logan, Zhiyuan Xu, Jian Tang, and Yanzhi Wang. 2019. Learning the dynamic treatment regimes from medical registry data through deep Q-network. Scientific Reports 9, 1 (Dec. 2019), 1495.
[138]
Pechin Lo, Bram Van Ginneken, Joseph M. Reinhardt, Tarunashree Yavarna, Pim A. De Jong, Benjamin Irving, Catalin Fetita, Margarete Ortner, Rômulo Pinho, Jan Sijbers, Marco Feuerstein, Anna Fabijanska, Christian Bauer, Reinhard Beichel, Carlos S. Mendoza, Rafael Wiemker, Jaesung Lee, Anthony P. Reeves, Silvia Born, Oliver Weinheimer, Eva M. Van Rikxoort, Juerg Tschirren, Ken Mori, Benjamin Odry, David P. Naidich, Ieneke Hartmann, Eric A. Hoffman, Mathias Prokop, Jesper H. Pedersen, and Marleen De Bruijne. 2012. Extraction of airways from CT (EXACT’09). IEEE Transactions on Medical Imaging 31, 11 (2012), 2093–2107.
[139]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, Los Alamitos, CA, 431–440.
[140]
Tara M. Madhyastha, Mary K. Askren, Peter Boord, and Thomas J. Grabowski. 2015. Dynamic connectivity at rest predicts attention task performance. Brain Connectivity 5, 1 (Feb. 2015), 45–59.
[141]
Gianluca Maguolo and Loris Nanni. 2020. A critic evaluation of methods for COVID-19 automatic detection from x-ray images. arXiv:2004.12823 (2020). http://arxiv.org/abs/2004.12823
[142]
Oskar Maier, Bjoern H. Menze, Janina von der Gablentz, Levin Häni, Mattias P. Heinrich, Matthias Liebrand, Stefan Winzeck, Abdul Basit, Paul Bentley, Liang Chen, Daan Christiaens, Francis Dutil, Karl Egger, Chaolu Feng, Ben Glocker, Michael Götz, Tom Haeck, Hanna Leena Halme, Mohammad Havaei, Khan M. Iftekharuddin, Pierre Marc Jodoin, Konstantinos Kamnitsas, Elias Kellner, Antti Korvenoja, Hugo Larochelle, Christian Ledig, Jia Hong Lee, Frederik Maes, Qaiser Mahmood, Klaus H. Maier-Hein, Richard McKinley, John Muschelli, Chris Pal, Linmin Pei, Janaki Raman Rangarajan, Syed M. S. Reza, David Robben, Daniel Rueckert, Eero Salli, Paul Suetens, Ching Wei Wang, Matthias Wilms, Jan S. Kirschke, Ulrike M. Krämer, Thomas F. Münte, Peter Schramm, Roland Wiest, Heinz Handels, and Mauricio Reyes. 2017. ISLES 2015—A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Medical Image Analysis 35 (Jan. 2017), 250–269.
[143]
Cui P. Mao, Fen R. Chen, Jiao H. Huo, Liang Zhang, Gui R. Zhang, Bing Zhang, and Xiao Q. Zhou. 2020. Altered resting-state functional connectivity and effective connectivity of the habenula in irritable bowel syndrome: A cross-sectional and machine learning study. Human Brain Mapping 41, 13 (Sept. 2020), 3655–3666.
[144]
Daniel S. Marcus, Anthony F. Fotenos, John G. Csernansky, John C. Morris, and Randy L. Buckner. 2010. Open Access Series of Imaging Studies: Longitudinal MRI data in nondemented and demented older adults. Journal of Cognitive Neuroscience 22, 12 (Dec. 2010), 2677–2684.
[145]
Daniel S. Marcus, Tracy H. Wang, Jamie Parker, John G. Csernansky, John C. Morris, and Randy L. Buckner. 2007. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19, 9 (Sept. 2007), 1498–1507.
[146]
Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li, Kaixian Yu, Hongtu Zhu, José G. Tamez-Peña, Aya Ismail, Timothy Wood, Hector Corrada Bravo, Minh Nguyen, Nanbo Sun, Jiashi Feng, B. T. Thomas Yeo, Gang Chen, Ke Qi, Shiyang Chen, Deqiang Qiu, Ionut Buciuman, Alex Kelner, Raluca Pop, Denisa Rimocea, Mostafa M. Ghazi, Mads Nielsen, Sebastien Ourselin, Lauge Sørensen, Vikram Venkatraghavan, Keli Liu, Christina Rabe, Paul Manser, Steven M. Hill, James Howlett, Zhiyue Huang, Steven Kiddle, Sach Mukherjee, Anaïs Rouanet, Bernd Taschler, Brian D. M. Tom, Simon R. White, Noel Faux, Suman Sedai, Javier de Velasco Oriol, Edgar E. V. Clemente, Karol Estrada, Leon Aksman, Andre Altmann, Cynthia M. Stonnington, Yalin Wang, Jianfeng Wu, Vivek Devadas, Clementine Fourrier, Lars Lau Raket, Aristeidis Sotiras, Guray Erus, Jimit Doshi, Christos Davatzikos, Jacob Vogel, Andrew Doyle, Angela Tam, Alex Diaz-Papkovich, Emmanuel Jammeh, Igor Koval, Paul Moore, Terry J. Lyons, John Gallacher, Jussi Tohka, Robert Ciszek, Bruno Jedynak, Kruti Pandya, Murat Bilgel, William Engels, Joseph Cole, Polina Golland, Stefan Klein, and Daniel C. Alexander. 2020. The Alzheimer’s Disease Prediction of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 year follow-up. arXiv:2002.03419 (2020). http://arxiv.org/abs/2002.03419
[147]
Mario Mascalchi, Chiara Marzi, Marco Giannelli, Stefano Ciulli, Andrea Bianchi, Andrea Ginestroni, Carlo Tessa, Emanuele Nicolai, Marco Aiello, Elena Salvatore, Andrea Soricelli, and Stefano Diciotti. 2018. Histogram analysis of DTI-derived indices reveals pontocerebellar degeneration and its progression in SCA2. PLoS ONE 13, 7 (July 2018), e0200258.
[148]
Christian Matek, Simone Schwarz, Karsten Spiekermann, and Carsten Marr. 2019. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence 1, 11 (Nov. 2019), 538–544.
[149]
Maryam Mehdizadeh, Cara MacNish, Di Xiao, David Alonso-Caneiro, Jason Kugelman, and Mohammed Bennamoun. 2021. Deep feature loss to denoise OCT images using deep neural networks. Journal of Biomedical Optics 26, 4 (April 2021), 046003.
[150]
Adriënne M. Mendrik, Koen L. Vincken, Hugo J. Kuijf, Marcel Breeuwer, Willem H. Bouvy, Jeroen De Bresser, Amir Alansary, Marleen De Bruijne, Aaron Carass, Ayman El-Baz, Amod Jog, Ranveer Katyal, Ali R. Khan, Fedde Van Der Lijn, Qaiser Mahmood, Ryan Mukherjee, Annegreet Van Opbroek, Sahil Paneri, Sérgio Pereira, Mikael Persson, Martin Rajchl, Duygu Sarikaya, Örjan Smedby, Carlos A. Silva, Henri A. Vrooman, Saurabh Vyas, Chunliang Wang, Liang Zhao, Geert Jan Biessels, and Max A. Viergever. 2015. MRBrainS challenge: Online evaluation framework for brain image segmentation in 3T MRI scans. Computational Intelligence and Neuroscience 2015 (2015), 1–16.
[151]
Bjoern H. Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, Levente Lanczi, Elizabeth Gerstner, Marc André Weber, Tal Arbel, Brian B. Avants, Nicholas Ayache, Patricia Buendia, D. Louis Collins, Nicolas Cordier, Jason J. Corso, Antonio Criminisi, Tilak Das, Hervé Delingette, Çağatay Demiralp, Christopher R. Durst, Michel Dojat, Senan Doyle, Joana Festa, Florence Forbes, Ezequiel Geremia, Ben Glocker, Polina Golland, Xiaotao Guo, Andac Hamamci, Khan M. Iftekharuddin, Raj Jena, Nigel M. John, Ender Konukoglu, Danial Lashkari, José António Mariz, Raphael Meier, Sérgio Pereira, Doina Precup, Stephen J. Price, Tammy Riklin Raviv, Syed M. S. Reza, Michael Ryan, Duygu Sarikaya, Lawrence Schwartz, Hoo Chang Shin, Jamie Shotton, Carlos A. Silva, Nuno Sousa, Nagesh K. Subbanna, Gabor Szekely, Thomas J. Taylor, Owen M. Thomas, Nicholas J. Tustison, Gozde Unal, Flor Vasseur, Max Wintermark, Dong Hye Ye, Liang Zhao, Binsheng Zhao, Darko Zikic, Marcel Prastawa, Mauricio Reyes, and Koen Van Leemput. 2015. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34, 10 (Oct. 2015), 1993–2024.
[152]
Rashika Mishra, Ovidiu Daescu, Patrick Leavey, Dinesh Rakheja, and Anita Sengupta. 2017. Histopathological diagnosis for viable and non-viable tumor prediction for osteosarcoma using convolutional neural network. In Bioinformatics Research and Applications. Lecture Notes in Computer Science, Vol. 10330. Springer, 12–23.
[153]
Rashika Mishra, Ovidiu Daescu, Patrick Leavey, Dinesh Rakheja, and Anita Sengupta. 2018. Convolutional neural network for histopathological analysis of osteosarcoma. Journal of Computational Biology 25 (2018), 313–325.
[154]
Yoichi Miyawaki, Hajime Uchida, Okito Yamashita, Masa-aki Sato, Yusuke Morito, Hiroki C. Tanabe, Norihiro Sadato, and Yukiyasu Kamitani. 2008. Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron 60, 5 (Dec. 2008), 915–929.
[155]
Arnab Kumar Mondal, Jose Dolz, and Christian Desrosiers. 2018. Few-shot 3D multi-modal medical image segmentation using generative adversarial learning. arXiv:1810.12241 (2018). http://arxiv.org/abs/1810.12241
[156]
Susanne G. Mueller, Michael W. Weiner, Leon J. Thal, Ronald C. Petersen, Clifford Jack, William Jagust, John Q. Trojanowski, Arthur W. Toga, and Laurel Beckett. 2005. The Alzheimer’s Disease Neuroimaging Initiative. Neuroimaging Clinics of North America 15, 4 (2005), 869–877.
[157]
David Newitt and Nola Hylton. 2016. Single site breast DCE-MRI data and segmentations from patients undergoing neoadjuvant chemotherapy. Cancer Imaging Archive. Retrieved August 23, 2023 from
[158]
Meindert Niemeijer, Bram Van Ginneken, Michael J. Cree, Atsushi Mizutani, Gwénolé Quellec, Clara I. Sanchez, Bob Zhang, Roberto Hornero, Mathieu Lamard, Chisako Muramatsu, Xiangqian Wu, Guy Cazuguel, Jane You, Agustín Mayo, Qin Li, Yuji Hatanaka, Béatrice Cochener, Christian Roux, Fakhri Karray, María Garcia, Hiroshi Fujita, and Michael D. Abramoff. 2010. Retinopathy online challenge: Automatic detection of microaneurysms in digital color fundus photographs. IEEE Transactions on Medical Imaging 29, 1 (2010), 185–195.
[159]
José Ignacio Orlando, Huazhu Fu, João Barbossa Breda, Karel van Keer, Deepti R. Bathula, Andrés Diaz-Pinto, Ruogu Fang, Pheng Ann Heng, Jeyoung Kim, Joon Ho Lee, Joonseok Lee, Xiaoxiao Li, Peng Liu, Shuai Lu, Balamurali Murugesan, Valery Naranjo, Sai Samarth R. Phaye, Sharath M. Shankaranarayana, Apoorva Sikka, Jaemin Son, Anton van den Hengel, Shujun Wang, Junyan Wu, Zifeng Wu, Guanghui Xu, Yongli Xu, Pengshuai Yin, Fei Li, Xiulan Zhang, Yanwu Xu, and Hrvoje Bogunović. 2020. REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis 59 (2020), 101570.
[160]
P. Leavey, H. B. Arunachalam, B. Armaselu, A. Sengupta, D. Rakheja, S. Skapek, K. Cederberg, J.-P. Bach, S. Glick, M. Ni’Suilleabhain, R. Mishra, M. Martinez, R. Ziraldo, and D. Leonard. 2017. American Society of Pediatric Hematology/Oncology (ASPHO), Palais des Congrès de Montréal, Montréal, Canada, April 26–29, 2017. Pediatric Blood & Cancer 64, Suppl. 1 (2017).
[161]
Danielle F. Pace, Adrian V. Dalca, Tal Geva, Andrew J. Powell, Mehdi H. Moghari, and Polina Golland. 2015. Interactive whole-heart segmentation in congenital heart disease. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science, Vol. 9351. Springer, 80–88.
[162]
Yongsheng Pan, Mingxia Liu, Yong Xia, and Dinggang Shen. 2019. Neighborhood-correction algorithm for classification of normal and malignant cells. In ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging. Lecture Notes in Bioengineering. Springer, Singapore.
[163]
Sharnil Pandya, Aanchal Thakur, Santosh Saxena, Nandita Jassal, Chirag Patel, Kirit Modi, Pooja Shah, Rahul Joshi, Sudhanshu Gonge, Kalyani Kadam, and Prachi Kadam. 2021. A study of the recent trends of immunology: Key challenges, domains, applications, datasets, and future directions. Sensors 21, 23 (2021), 7786.
[164]
Chirag Patel, Dulari Bhatt, Urvashi Sharma, Radhika Patel, Sharnil Pandya, Kirit Modi, Nagaraj Cholli, Akash Patel, Urvi Bhatt, Muhammad Ahmed Khan, Shubhankar Majumdar, Mohd Zuhair, Khushi Patel, Syed Aziz Shah, and Hemant Ghayvat. 2022. DBGC: Dimension-based generic convolution block for object recognition. Sensors 22, 5 (Feb. 2022), 1780.
[165]
Chirag I. Patel, Sanjay Garg, Tanish Zaveri, Asim Banerjee, and Ripal Patel. 2018. Human action recognition using fusion of features for unconstrained video sequences. Computers & Electrical Engineering 70 (2018), 284–301.
[166]
Chirag I. Patel, Dileep Labana, Sharnil Pandya, Kirit Modi, Hemant Ghayvat, and Muhammad Awais. 2020. Histogram of oriented gradient-based fusion of features for human action recognition in action video sequences. Sensors 20, 24 (2020), 7299.
[167]
Mohammad Peikari, Sherine Salama, Sharon Nofech-Mozes, and Anne L. Martel. 2017. Automatic cellularity assessment from post-treated breast surgical specimens. Cytometry Part A 91, 11 (2017), 1078–1087.
[168]
Hong Peng, Jiaohua Huo, Bo Li, Yuanyuan Cui, Hao Zhang, Liang Zhang, and Lin Ma. 2021. Predicting isocitrate dehydrogenase (IDH) mutation status in gliomas using multiparameter MRI radiomics features. Journal of Magnetic Resonance Imaging 53, 5 (May 2021), 1399–1407.
[169]
A. Gh. Podoleanu. 2012. Optical coherence tomography. Journal of Microscopy 247, 3 (2012), 209–219.
[170]
Prasanna Porwal, Samiksha Pachade, Manesh Kokare, Girish Deshmukh, Jaemin Son, Woong Bae, Lihong Liu, Jianzong Wang, Xinhui Liu, Liangxin Gao, Tian Bo Wu, Jing Xiao, Fengyan Wang, Baocai Yin, Yunzhi Wang, Gopichandh Danala, Linsheng He, Yoon Ho Choi, Yeong Chan Lee, Sang Hyuk Jung, Zhongyu Li, Xiaodan Sui, Junyan Wu, Xiaolong Li, Ting Zhou, Janos Toth, Agnes Baran, Avinash Kori, Sai Saketh Chennamsetty, Mohammed Safwan, Varghese Alex, Xingzheng Lyu, Li Cheng, Qinhao Chu, Pengcheng Li, Xin Ji, Sanyuan Zhang, Yaxin Shen, Ling Dai, Oindrila Saha, Rachana Sathish, Tânia Melo, Teresa Araújo, Balazs Harangi, Bin Sheng, Ruogu Fang, Debdoot Sheet, Andras Hajdu, Yuanjie Zheng, Ana Maria Mendonça, Shaoting Zhang, Aurélio Campilho, Bin Zheng, Dinggang Shen, Luca Giancardo, Gwenolé Quellec, and Fabrice Mériaudeau. 2020. IDRiD: Diabetic retinopathy—Segmentation and grading challenge. Medical Image Analysis 59 (Jan. 2020), 101561.
[171]
Gwenolé Quellec, Mathieu Lamard, Pierre Henri Conze, Pascale Massin, and Béatrice Cochener. 2020. Automatic detection of rare pathologies in fundus photographs using few-shot learning. Medical Image Analysis 61 (2020).
[172]
Lukas Radl, Yuan Jin, Antonio Pepe, Jianning Li, Christina Gsaxner, Fen hua Zhao, and Jan Egger. 2022. AVT: Multicenter aortic vessel tree CTA dataset collection with ground truth segmentation masks. Data in Brief 40 (2022), 107801.
[173]
Patrik F. Raudaschl, Paolo Zaffino, Gregory C. Sharp, Maria Francesca Spadea, Antong Chen, Benoit M. Dawant, Thomas Albrecht, Tobias Gass, Christoph Langguth, Marcel Luthi, Florian Jung, Oliver Knapp, Stefan Wesarg, Richard Mannion-Haworth, Mike Bowes, Annaliese Ashman, Gwenael Guillard, Alan Brett, Graham Vincent, Mauricio Orbes-Arteaga, David Cardenas-Pena, German Castellanos-Dominguez, Nava Aghdasi, Yangming Li, Angelique Berens, Kris Moe, Blake Hannaford, Rainer Schubert, and Karl D. Fritscher. 2017. Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015. Medical Physics 44, 5 (May 2017), 2020–2036.
[174]
Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv:1804.02767 (2018). http://arxiv.org/abs/1804.02767
[175]
Mahdi Rezaei and Mahsa Shahidi. 2020. Zero-shot learning and its applications from autonomous vehicles to Covid-19 diagnosis: A review. arXiv:2004.14143 (2020).
[176]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science, Vol. 9351. Springer, 234–241.
[177]
Holger R. Roth, Le Lu, Amal Farag, Hoo Chang Shin, Jiamin Liu, Evrim B. Turkbey, and Ronald M. Summers. 2015. DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science, Vol. 9349. Springer, 556–564.
[178]
Holger R. Roth, Le Lu, Ari Seff, Kevin M. Cherry, Joanne Hoffman, Shijun Wang, Jiamin Liu, Evrim Turkbey, and Ronald M. Summers. 2014. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014. Lecture Notes in Computer Science, Vol. 8673. Springer, 520–527.
[179]
Sylvia Rueda, Sana Fathima, Caroline L. Knight, Mohammad Yaqub, Aris T. Papageorghiou, Bahbibi Rahmatullah, Alessandro Foi, Matteo Maggioni, Antonietta Pepe, Jussi Tohka, Richard V. Stebbing, John E. McManigle, Anca Ciurte, Xavier Bresson, Meritxell Bach Cuadra, Changming Sun, Gennady V. Ponomarev, Mikhail S. Gelfand, Marat D. Kazanov, Ching Wei Wang, Hsiang Chou Chen, Chun Wei Peng, Chu Mei Hung, and J. Alison Noble. 2014. Evaluation and comparison of current fetal ultrasound image segmentation methods for biometric measurements: A grand challenge. IEEE Transactions on Medical Imaging 33, 4 (2014), 797–813.
[180]
Mirabela Rusu, Prabhakar Rajiah, Robert Gilkeson, Michael Yang, Christopher Donatelli, Rajat Thawani, Frank J. Jacono, Philip Linden, and Anant Madabhushi. 2017. Co-registration of pre-operative CT with ex vivo surgically excised ground glass nodules to define spatial extent of invasive adenocarcinoma on in vivo imaging: A proof-of-concept study. European Radiology 27, 10 (2017), 4209–4217.
[181]
David Scheie, Per Arne Andresen, Milada Cvancarova, Anne Signe Bø, Eirik Helseth, Kari Skullerud, and Klaus Beiske. 2006. Fluorescence in situ hybridization (FISH) on touch preparations: A reliable method for detecting loss of heterozygosity at 1p and 19q in oligodendroglial tumors. American Journal of Surgical Pathology 30, 7 (July 2006), 828–837.
[182]
K. M. Schmainda and M. Prah. 2018. Brain-Tumor-Progression. The Cancer Imaging Archive. Retrieved August 23, 2023.
[183]
Ari Seff, Le Lu, Adrian Barbu, Holger Roth, Hoo Chang Shin, and Ronald M. Summers. 2015. Leveraging mid-level semantic boundary cues for automated lymph node detection. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science, Vol. 9350. Springer, 53–61.
[184]
Ari Seff, Le Lu, Kevin M. Cherry, Holger R. Roth, Jiamin Liu, Shijun Wang, Joanne Hoffman, Evrim B. Turkbey, and Ronald M. Summers. 2014. 2D view aggregation for lymph node detection using a shallow hierarchy of linear classifiers. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014. Lecture Notes in Computer Science, Vol. 8673. Springer, 544–552.
[185]
Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S. N. Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina Fantacci, Bram Geurts, Robbert van der Gugten, Pheng Ann Heng, Bart Jansen, Michael M. J. de Kaste, Valentin Kotov, Jack Yu Hung Lin, Jeroen T. M. C. Manders, Alexander Sóñora-Mengana, Juan Carlos García-Naranjo, Evgenia Papavasileiou, Mathias Prokop, Marco Saletta, Cornelia M. Schaefer-Prokop, Ernst T. Scholten, Luuk Scholten, Miranda M. Snoeren, Ernesto Lopez Torres, Jef Vandemeulebroucke, Nicole Walasek, Guido C. A. Zuidhof, Bram van Ginneken, and Colin Jacobs. 2017. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Medical Image Analysis 42 (Dec. 2017), 1–13.
[186]
Wei Shao, Liang Sun, and Daoqiang Zhang. 2018. Deep active learning for nucleus classification in pathology images. In Proceedings of the International Symposium on Biomedical Imaging. IEEE, Los Alamitos, CA, 199–202.
[187]
Micah J. Sheller, G. Anthony Reina, Brandon Edwards, Jason Martin, and Spyridon Bakas. 2019. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Lecture Notes in Computer Science, Vol. 11383. Springer, 92–104.
[188]
Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. 2019. Deep image reconstruction from human brain activity. PLoS Computational Biology 15, 1 (Jan. 2019), e1006633.
[189]
Zhao Shi, Chongchang Miao, U. Joseph Schoepf, Rock H. Savage, Danielle M. Dargis, Chengwei Pan, Xue Chai, Xiu Li Li, Shuang Xia, Xin Zhang, Yan Gu, Yonggang Zhang, Bin Hu, Wenda Xu, Changsheng Zhou, Song Luo, Hao Wang, Li Mao, Kongming Liang, Lili Wen, Longjiang Zhou, Yizhou Yu, Guang Ming Lu, and Long Jiang Zhang. 2020. A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nature Communications 11, 1 (Dec. 2020), 6090.
[190]
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (Jan. 2016), 484–489.
[191]
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15): Conference Track Proceedings.
[192]
Amber L. Simpson, Michela Antonelli, Spyridon Bakas, Michel Bilello, Keyvan Farahani, Bram Van Ginneken, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc Gollub, Jennifer Golia-Pernicka, Stephan H. Heckers, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Eugene Vorontsov, Lena Maier-Hein, and M. Jorge Cardoso. 2019. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv:1902.09063 (2019). http://arxiv.org/abs/1902.09063
[193]
Roberto Souza, Oeslle Lucena, Julia Garrafa, David Gobbi, Marina Saluzzi, Simone Appenzeller, Letícia Rittner, Richard Frayne, and Roberto Lotufo. 2018. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement. NeuroImage 170 (April 2018), 482–494.
[194]
Assaf B. Spanier and Leo Joskowicz. 2014. Rule-based ventral cavity multi-organ automatic segmentation in CT scans. In Medical Computer Vision: Algorithms for Big Data. Lecture Notes in Computer Science, Vol. 8848. Springer, 163–170.
[195]
Sergii Stirenko, Yuriy Kochura, Oleg Alienin, Oleksandr Rokovyi, Yuri Gordienko, Peng Gang, and Wei Zeng. 2018. Chest x-ray analysis of tuberculosis by deep learning with segmentation and augmentation. In Proceedings of the 2018 IEEE 38th International Conference on Electronics and Nanotechnology (ELNANO’18). 422–428.
[196]
Yue Sun, Kun Gao, Zhengwang Wu, Guannan Li, Xiaopeng Zong, Zhihao Lei, Ying Wei, Jun Ma, Xiaoping Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys, Sarah Shultz, Longchuan Li, Sijie Niu, Weili Lin, Valerie Jewells, Dinggang Shen, Gang Li, and Li Wang. 2021. Multi-site infant brain segmentation algorithms: The iSeg-2019 challenge. IEEE Transactions on Medical Imaging 40, 5 (2021), 1363–1376.
[197]
Zaneta Swiderska-Chadaj, Hans Pinckaers, Mart van Rijthoven, Maschenka Balkenhol, Margarita Melnikova, Oscar Geessink, Quirine Manson, Mark Sherman, Antonio Polonia, Jeremy Parry, Mustapha Abubakar, Geert Litjens, Jeroen van der Laak, and Francesco Ciompi. 2019. Learning to detect lymphocytes in immunohistochemistry with deep learning. Medical Image Analysis 58 (2019), 101547.
[198]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1–9.
[199]
Ziqi Tang, Kangway V. Chuang, Charles DeCarli, Lee Way Jin, Laurel Beckett, Michael J. Keiser, and Brittany N. Dugger. 2019. Interpretable classification of Alzheimer’s disease pathologies with a convolutional neural network pipeline. Nature Communications 10, 1 (Dec. 2019), 2173.
[200]
Enzo Tartaglione, Carlo Alberto Barbano, Claudio Berzovini, Marco Calandri, and Marco Grangetto. 2020. Unveiling COVID-19 from chest x-ray with deep learning: A hurdles race with small data. International Journal of Environmental Research and Public Health 17, 18 (April 2020), 1–17.
[201]
Carlo Tessa, Nicola Toschi, Stefano Orsolini, Gaetano Valenza, Claudio Lucetti, Riccardo Barbieri, and Stefano Diciotti. 2019. Central modulation of parasympathetic outflow is impaired in de novo Parkinson’s disease patients. PLoS ONE 14, 1 (Jan. 2019), e0210324.
[202]
C. Tobon-Gomez, M. De Craene, K. McLeod, L. Tautz, W. Shi, A. Hennemuth, A. Prakosa, H. Wang, G. Carr-White, S. Kapetanakis, A. Lutz, V. Rasche, T. Schaeffter, C. Butakoff, O. Friman, T. Mansi, M. Sermesant, X. Zhuang, S. Ourselin, H. O. Peitgen, X. Pennec, R. Razavi, D. Rueckert, A. F. Frangi, and K. S. Rhode. 2013. Benchmarking framework for myocardial tracking and deformation algorithms: An open access database. Medical Image Analysis 17, 6 (Aug. 2013), 632–648.
[203]
Catalina Tobon-Gomez, Arjan J. Geers, Jochen Peters, Jürgen Weese, Karen Pinto, Rashed Karim, Mohammed Ammar, Abdelaziz Daoudi, Jan Margeta, Zulma Sandoval, Birgit Stender, Yefeng Zheng, Maria A. Zuluaga, Julian Betancur, Nicholas Ayache, Mohammed Amine Chikh, Jean Louis Dillenseger, B. Michael Kelm, Saïd Mahmoudi, Sébastien Ourselin, Alexander Schlaefer, Tobias Schaeffter, Reza Razavi, and Kawal S. Rhode. 2015. Benchmark for algorithms segmenting the left atrium from 3D CT and MRI datasets. IEEE Transactions on Medical Imaging 34, 7 (2015), 1460–1473.
[204]
Roger Trullo, Caroline Petitjean, Bernard Dubray, and Su Ruan. 2019. Multiorgan segmentation using distance-aware adversarial networks. Journal of Medical Imaging 6, 1 (2019), 1.
[205]
Vladimír Ulman, Martin Maška, Klas E. G. Magnusson, Olaf Ronneberger, Carsten Haubold, Nathalie Harder, Pavel Matula, Petr Matula, David Svoboda, Miroslav Radojevic, Ihor Smal, Karl Rohr, Joakim Jaldén, Helen M. Blau, Oleh Dzyubachyk, Boudewijn Lelieveldt, Pengdong Xiao, Yuexiang Li, Siu Yeung Cho, Alexandre C. Dufour, Jean Christophe Olivo-Marin, Constantino C. Reyes-Aldasoro, Jose A. Solis-Lemus, Robert Bensch, Thomas Brox, Johannes Stegmaier, Ralf Mikut, Steffen Wolf, Fred A. Hamprecht, Tiago Esteves, Pedro Quelhas, Ömer Demirel, Lars Malmström, Florian Jug, Pavel Tomancak, Erik Meijering, Arrate Muñoz-Barrutia, Michal Kozubek, and Carlos Ortiz-De-Solorzano. 2017. An objective comparison of cell-tracking algorithms. Nature Methods 14, 12 (Dec. 2017), 1141–1152.
[206]
M. Vallières, C. R. Freeman, S. R. Skamene, and I. El Naqa. 2015. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Physics in Medicine and Biology 60, 14 (July 2015), 5471–5496.
[207]
Martin Vallières, Emily Kay-Rivest, Léo Jean Perrin, Xavier Liem, Christophe Furstoss, Hugo J. W. L. Aerts, Nader Khaouam, Phuc Felix Nguyen-Tan, Chang Shu Wang, Khalil Sultanem, Jan Seuntjens, and Issam El Naqa. 2017. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Scientific Reports 7, 1 (Dec. 2017), 10117.
[208]
Thomas L. A. van den Heuvel, Dagmar de Bruijn, Chris L. de Korte, and Bram van Ginneken. 2018. Automated measurement of fetal head circumference using 2D ultrasound images. PLoS ONE 13, 8 (Aug. 2018), e0200412.
[209]
Bram van Ginneken, Samuel G. Armato, Bartjan de Hoop, Saskia van Amelsvoort-van de Vorst, Thomas Duindam, Meindert Niemeijer, Keelin Murphy, Arnold Schilham, Alessandra Retico, Maria Evelina Fantacci, Niccolò Camarlinghi, Francesco Bagagli, Ilaria Gori, Takeshi Hara, Hiroshi Fujita, Gianfranco Gargano, Roberto Bellotti, Sabina Tangaro, Lourdes Bolaños, Francesco De Carlo, Piergiorgio Cerello, Sorin Cristian Cheran, Ernesto Lopez Torres, and Mathias Prokop. 2010. Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: The ANODE09 study. Medical Image Analysis 14, 6 (2010), 707–722.
[210]
Bram van Ginneken, Tobias Heimann, and Martin Styner. 2007. 3D segmentation in the clinic: A grand challenge. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention. 7–15. http://grand-challenge2008.bigr.nl/proceedings/pdfs/msls08/Styner.pdf
[211]
Rufin VanRullen and Leila Reddy. 2019. Reconstructing faces from fMRI patterns using deep generative neural networks. Communications Biology 2, 1 (Oct. 2019), 193.
[212]
Hadi Varmazyar, Zahra Ghareaghaji, and Saber Malekzadeh. 2020. MRI hippocampus segmentation using deep learning autoencoders.
[213]
Ruchika Verma, Neeraj Kumar, Abhijeet Patil, Nikhil Cherian Kurian, Swapnil Rane, and Amit Sethi. 2020. Multi-organ nuclei segmentation and classification challenge 2020. (Feb. 2020), 1–3.
[214]
Natalia Viani, Riley Botelle, Jack Kerwin, Lucia Yin, Rashmi Patel, Robert Stewart, and Sumithra Velupillai. 2021. A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Scientific Reports 11, 1 (Dec. 2021), 757.
[215]
Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, and David Silver. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (Nov. 2019), 350–354.
[216]
Bo Wang, Shuo Jin, Qingsen Yan, Haibo Xu, Chuan Luo, Lai Wei, Wei Zhao, Xuexue Hou, Wenshuo Ma, Zhengqing Xu, Zhuozhao Zheng, Wenbo Sun, Lan Lan, Wei Zhang, Xiangdong Mu, Chenxi Shi, Zhongxiao Wang, Jihae Lee, Zijian Jin, Minggui Lin, Hongbo Jin, Liang Zhang, Jun Guo, Benqi Zhao, Zhizhong Ren, Shuhao Wang, Wei Xu, Xinghuan Wang, Jianming Wang, Zheng You, and Jiahong Dong. 2021. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system. Applied Soft Computing 98 (Jan. 2021), 106897.
[217]
Guotai Wang, Wenqi Li, Sébastien Ourselin, and Tom Vercauteren. 2017. Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. arXiv:1709.00382 (2017). http://arxiv.org/abs/1709.00382
[218]
Linda Wang, Zhong Qiu Lin, and Alexander Wong. 2020. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific Reports 10, 1 (Dec. 2020), 19549.
[219]
Li Wang, Dong Nie, Guannan Li, Élodie Puybareau, Jose Dolz, Qian Zhang, Fan Wang, Jing Xia, Zhengwang Wu, Jia Wei Chen, Kim Han Thung, Toan Duc Bui, Jitae Shin, Guodong Zeng, Guoyan Zheng, Vladimir S. Fonov, Andrew Doyle, Yongchao Xu, Pim Moeskops, Josien P. W. Pluim, Christian Desrosiers, Ismail Ben Ayed, Gerard Sanroma, Oualid M. Benkarim, Adrià Casamitjana, Verónica Vilaplana, Weili Lin, Gang Li, and Dinggang Shen. 2019. Benchmark on automatic six-month-old infant brain segmentation algorithms: The iSeg-2017 challenge. IEEE Transactions on Medical Imaging 38, 9 (Sept. 2019), 2219–2230.
[220]
Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex D. Wade, Kuansan Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, and Sebastian Kohlmeier. 2020. CORD-19: The COVID-19 open research dataset. arXiv:2004.10706 (2020). http://arxiv.org/abs/2004.10706
[221]
Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. 2017. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 3462–3471.
[222]
Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys 53, 3 (April 2020), Article 63, 34 pages.
[223]
Donglai Wei, Zudi Lin, Daniel Franco-Barranco, Nils Wendt, Xingyu Liu, Wenjie Yin, Xin Huang, Aarush Gupta, Won Dong Jang, Xueying Wang, Ignacio Arganda-Carreras, Jeff W. Lichtman, and Hanspeter Pfister. 2020. MitoEM dataset: Large-scale 3D mitochondria instance segmentation from EM images. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2020. Lecture Notes in Computer Science, Vol. 12265. Springer, 66–76.
[224]
Michael W. Weiner, Dallas P. Veitch, Paul S. Aisen, Laurel A. Beckett, Nigel J. Cairns, Robert C. Green, Danielle Harvey, Clifford R. Jack, William Jagust, John C. Morris, Ronald C. Petersen, Jennifer Salazar, Andrew J. Saykin, Leslie M. Shaw, Arthur W. Toga, and John Q. Trojanowski. 2017. The Alzheimer’s Disease NeuroImaging Initiative 3: Continued innovation for clinical trial improvement. Alzheimer’s and Dementia 13, 5 (May 2017), 561–571.
[225]
Zhaohan Xiong, Qing Xia, Zhiqiang Hu, Ning Huang, Cheng Bian, Yefeng Zheng, Sulaiman Vesal, Nishant Ravikumar, Andreas Maier, Xin Yang, Pheng-Ann Heng, Dong Ni, Caizi Li, Qianqian Tong, Weixin Si, Elodie Puybareau, Younes Khoudli, Thierry Géraud, Chen Chen, Wenjia Bai, Daniel Rueckert, Lingchao Xu, Xiahai Zhuang, Xinzhe Luo, Shuman Jia, Maxime Sermesant, Yashu Liu, Kuanquan Wang, Davide Borra, Alessandro Masci, Cristiana Corsi, Coen de Vente, Mitko Veta, Rashed Karim, Chandrakanth Jayachandran Preetha, Sandy Engelhardt, Menyun Qiao, Yuanyuan Wang, Qian Tao, Marta Nuñez-Garcia, Oscar Camara, Nicolo Savioli, Pablo Lamata, and Jichao Zhao. 2021. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Medical Image Analysis 67 (Jan. 2021), 101832.
[226]
Jinzhong Yang, Harini Veeraraghavan, Samuel G. Armato, Keyvan Farahani, Justin S. Kirby, Jayashree Kalpathy-Kramer, Wouter van Elmpt, Andre Dekker, Xiao Han, Xue Feng, Paul Aljabar, Bruno Oliveira, Brent van der Heyden, Leonid Zamdborg, Dao Lam, Mark Gooding, and Gregory C. Sharp. 2018. Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017. Medical Physics 45, 10 (2018), 4568–4581.
[227]
Xiaofeng Yang, Ning Wu, Guanghui Cheng, Zhengyang Zhou, David S. Yu, Jonathan J. Beitler, Walter J. Curran, and Tian Liu. 2014. Automated segmentation of the parotid gland based on atlas registration and machine learning: A longitudinal MRI study in head-and-neck radiation therapy. International Journal of Radiation Oncology Biology Physics 90, 5 (Dec. 2014), 1225–1233.
[228]
Jason Yim, Reena Chopra, Terry Spitz, Jim Winkens, Annette Obika, Christopher Kelly, Harry Askham, Marko Lukic, Josef Huemer, Katrin Fasler, Gabriella Moraes, Clemens Meyer, Marc Wilson, Jonathan Dixon, Cian Hughes, Geraint Rees, Peng T. Khaw, Alan Karthikesalingam, Dominic King, Demis Hassabis, Mustafa Suleyman, Trevor Back, Joseph R. Ledsam, Pearse A. Keane, and Jeffrey De Fauw. 2020. Predicting conversion to wet age-related macular degeneration using deep learning. Nature Medicine 26, 6 (June 2020), 892–899.
[229]
Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, and Yvonne W. Lui. 2018. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv:1811.08839 (2018). http://arxiv.org/abs/1811.08839
[230]
Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennamoun, Kun Qian, and Björn W. Schuller. 2020. MeDaS: An open-source platform as service to help break the walls between medicine and informatics. arXiv:2007.06013 (2020). http://arxiv.org/abs/2007.06013
[231]
Liang Zhang, Jiaming Zhang, Peiyi Shen, Guangming Zhu, Ping Li, Xiaoyuan Lu, Huan Zhang, Syed Afaq Shah, and Mohammed Bennamoun. 2020. Block level skip connections across cascaded V-Net for multi-organ segmentation. IEEE Transactions on Medical Imaging 39, 9 (2020), 2782–2793.
[232]
Qi Zhang, Yang Xiao, Wei Dai, Jingfeng Suo, Congzhi Wang, Jun Shi, and Hairong Zheng. 2016. Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics 72 (2016), 150–157.
[233]
S. Zhang, W. Yoshida, H. Mano, T. Yanagisawa, K. Shibata, M. Kawato, and B. Seymour. 2020. Cognitive control of sensory pain encoding in the pregenual anterior cingulate cortex. D1 - decoder construction in day 1, D2 - adaptive control in day 2. OpenNeuro. Retrieved August 23, 2023.
[234]
Binsheng Zhao, Leonard P. James, Chaya S. Moskowitz, Pingzhen Guo, Michelle S. Ginsberg, Robert A. Lefkowitz, Yilin Qin, Gregory J. Riely, Mark G. Kris, and Lawrence H. Schwartz. 2009. Evaluating variability in tumor measurements from same-day repeat CT scans of patients with non-small cell lung cancer. Radiology 252, 1 (July 2009), 263–272.
[235]
Jinyu Zhao, Xuehai He, Xingyi Yang, Yichen Zhang, Shanghang Zhang, and Pengtao Xie. 2020. COVID-CT-Dataset: A CT image dataset about COVID-19. arXiv:2003.13865 (2020). http://arxiv.org/abs/2003.13865
[236]
Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. 2020. A comprehensive survey on transfer learning. arXiv:1911.02685 [cs.LG] (2020).
[237]
Xiahai Zhuang. 2019. Multivariate mixture model for myocardial segmentation combining multi-source images. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 12 (2019), 2933–2946.
