1 Introduction

Intelligent medicine refers to establishing medical information platforms for health records through big data, the Internet of Things, cloud computing and other technologies, so as to enable interaction between patients and medical staff, medical institutions and medical devices, gradually realizing informatization (Zhao et al. 2020a, b). By using artificial intelligence technology, intelligent medicine partly replaces medical work previously completed by humans, making digital medicine highly intelligent and greatly improving working efficiency and accuracy, so as to construct a new medical system that spans the bottom level of genes, the middle level of disease data, and the higher level of diagnosis and surgery. Nowadays, the intelligent medical industry around the world is in a stage of rapid development and the market size is constantly expanding (Wu et al. 2018). With the continuous expansion of the market scale, more and more capital is turning to the intelligent medical industry. Table 1 shows previous review articles on artificial intelligence techniques for handling uncertain medical data.

Table 1 Overview of previous review articles of artificial intelligence techniques handling uncertain medical data

Meanwhile, with the continuous development of artificial intelligence techniques, the basic theories of intelligent medical decision-making have also been deepened and expanded. As a special type of machine learning, deep learning is constructed in the form of a multi-layer neural network (LeCun et al. 2015). Owing to its strong expressive ability and its capacity to fit sufficiently complex functions to features, deep learning has achieved good practical results in many fields, e.g. emotion recognition (Hossain and Muhammad 2019), visual object recognition (Li et al. 2022), object detection (Chen and Lu 2021), and drug discovery (Berrhail et al. 2022). At present, deep learning plays an important role in medical image analysis (Torfi et al. 2022), such as medical image segmentation (Jyothi and Singh 2023) and medical image classification (Murtaza et al. 2020).

However, such classical deep learning computes with crisp values, while imprecise, uncertain, and vague medical data are common in the process of diagnosis and treatment. Generally, uncertain medical data fall into three categories: noise, artifacts and high-dimensional unstructured information (Alizadehsani et al. 2021). Examples include noise among labels, unclear organ boundaries in images, imprecise test measurements, unstructured text describing a disease, missed cases in diagnosis information, and low-quality multimodal medical images. Moreover, deep learning faces many problems in intelligent medicine, including a lack of interpretability, over-fitting or under-fitting when the medical sample size is small, and limited ability to handle uncertain or imprecise circumstances. Therefore, deep learning models based on precise mathematics may seriously affect the outcome of precise diagnosis, reduce the accuracy of model predictions, and perform poorly on uncertain medical data.

Fortunately, fuzzy deep learning, which originates from fuzzy sets (Zadeh 1965), has a strong ability to deal with uncertain and imprecise information, providing a new perspective to alleviate the above problems. Firstly, fuzzy models based on membership functions are useful for effectively representing uncertain medical data. Secondly, fuzzy logic is similar to the way that humans think and perceive, which lends interpretability to the learning and representation process. Moreover, owing to its ability to handle uncertain and imprecise medical data, fuzzy deep learning is also suitable for few-shot learning, which is common in the medical domain. A state-of-the-art survey paper summarizes the research achievements on the fusion of deep learning and fuzzy systems in recent years, and presents the overall framework and graphical form of fusing deep learning and fuzzy systems, as well as some challenges and future research (Zheng et al. 2022a, b, c). In particular, fuzzy deep learning has been applied to handle uncertain medical data, e.g., images (Amaya-Rodriguez et al. 2019; Ramirez et al. 2019; Mohebbian et al. 2020; Huang et al. 2021), text records (Davoodi and Moradi 2018; Gangavarapu et al. 2019; Li et al. 2021a, b, c; Poli et al. 2021) and video files (Lawanot et al. 2019; Fathabadi et al. 2021; Verma and Dubey 2021). Meanwhile, fuzzy deep learning has been used to solve practical problems in medical scenarios, such as segmentation (Chouhan et al. 2018; Sevik et al. 2019), classification (Luo et al. 2020; Zhuang et al. 2020), natural language processing (Davoodi and Moradi 2018; Gangavarapu et al. 2019), prediction (Jiang, Li et al. 2021a, b, c; Nguyen et al. 2022) and fusion (Hermessi et al. 2019; Vinnarasi et al. 2021). Fuzzy deep learning not only handles uncertainty, but also improves the accuracy of medical data processing and adds interpretability to the learning model.
Specifically, a pyramid fuzzy block can find the uncertain pixels and reduce their weights, and fuzzy entropy can measure the uncertainty of pixels. Furthermore, fuzzy deep learning achieves favorable performance in a few-shot learning manner and improves the cross-dataset generalization ability in classification and disease grading tasks. Because of the embedding of fuzzy mathematics, the interpretability of the whole model is further enhanced.
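To make the idea of measuring pixel uncertainty with fuzzy entropy concrete, the following is a minimal sketch. It assumes the Shannon-type (De Luca and Termini) form of fuzzy entropy and a hypothetical 2 x 2 map of foreground memberships; the reviewed papers may use other entropy definitions, and the down-weighting rule shown here is only an illustration, not the pyramid fuzzy block itself.

```python
import numpy as np

def fuzzy_entropy(mu, eps=1e-12):
    """Shannon-type fuzzy entropy per element, scaled to [0, 1].

    mu holds membership degrees in [0, 1]; the entropy peaks at mu = 0.5,
    where a pixel is maximally ambiguous, and vanishes at mu = 0 or 1.
    """
    mu = np.clip(np.asarray(mu, dtype=float), eps, 1.0 - eps)
    h = -(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu))
    return h / np.log(2.0)

# Hypothetical foreground memberships of four pixels (assumed values).
membership = np.array([[0.02, 0.50],
                       [0.90, 0.65]])
uncertainty = fuzzy_entropy(membership)

# One simple way to reduce the influence of uncertain pixels (an assumption).
pixel_weight = np.clip(1.0 - uncertainty, 0.0, 1.0)
```

Pixels whose membership is close to 0.5 receive an entropy close to 1 and therefore a weight close to 0, while confidently labeled pixels keep most of their weight.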

This paper aims to review fuzzy deep learning for uncertain medical data. Firstly, the reviewed articles are selected from Web of Science, Scopus, IEEE Xplore and ACM Digital Library, and some statistical results reflect the rapid development of this research direction. Secondly, four types of frameworks of fuzzy deep learning models used for uncertain medical data provide an overall perspective on the techniques, and the literature on fuzzy deep learning for uncertain medical data is surveyed from three aspects: fuzzy deep learning models, uncertain medical data and application scenarios. Thirdly, the evaluation metrics are presented from the aspects of classification, segmentation and fusion. Finally, some critical discussions are provided on the advantages, challenges and future research directions of fuzzy deep learning for uncertain medical data.

The rest of the paper is organized as follows: Sect. 2 gives a descriptive analysis of the literature on fuzzy deep learning for uncertain medical data. Section 3 constructs four types of frameworks and introduces fuzzy deep learning for uncertain medical data from three aspects: widely used fuzzy deep learning models, uncertain medical data and application scenarios. Then, Sect. 4 analyzes frequently used evaluation metrics in the literature. Section 5 presents some critical discussions on advantages, challenges and future research directions. Finally, some conclusions are given in Sect. 6.

2 Descriptive analysis

This section introduces some descriptive analysis including materials of reviewed literature and statistical results.

2.1 Materials

In order to collect as many relevant publications as possible, the search process in Web of Science, Scopus, IEEE Xplore and ACM Digital Library combines search strings with Boolean operators to form our queries. Web of Science: TS = (fuzzy AND “deep learning” AND (medical OR medical)), database = core collection database; as of June 30th, 2022, 198 articles matched the constraints. Scopus: TITLE-ABS-KEY (fuzzy AND “deep learning” AND (medical OR medical)); as of June 30th, 2022, 263 articles matched the constraints. IEEE Xplore: searching for “fuzzy AND “deep learning” AND (medical OR medical)”; as of June 30th, 2022, 37 articles matched the constraints. ACM Digital Library: searching for “fuzzy AND “deep learning” AND (medical OR medical)”; as of June 30th, 2022, 107 articles matched the constraints. All of the above records are exported in BibTeX format, which includes detailed information such as titles, abstracts, keywords and authors. After that, the files are converted to Excel files for deduplication by Andrea Caputo’s method (Caputo and Kargina 2021), and the literature selection process is refined to make the topic more precise. Finally, 513 articles are selected for further analysis. The whole selection process of the reviewed literature is shown in Fig. 1.

Fig. 1 Flowchart of the material selection
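As a rough illustration of the merging and deduplication step, the sketch below tallies the per-database hit counts reported above and removes duplicate records by normalized title. The title-based key and the toy records are assumptions made for illustration only; this is not the deduplication procedure of Caputo and Kargina (2021), and the difference between the raw total and the 513 retained articles reflects both deduplication and the refinement of the selection.

```python
import re

def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace so variants match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record for each normalized title (assumed matching key)."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hit counts reported in the text (as of June 30th, 2022).
hits = {"Web of Science": 198, "Scopus": 263,
        "IEEE Xplore": 37, "ACM Digital Library": 107}
total_hits = sum(hits.values())
removed = total_hits - 513  # records dropped by deduplication and refinement

# Hypothetical records exported from two databases.
records = [
    {"title": "Fuzzy Deep Learning for MRI Segmentation", "source": "Scopus"},
    {"title": "Fuzzy deep learning for MRI segmentation.", "source": "Web of Science"},
    {"title": "A Neuro-Fuzzy Classifier for ECG Signals", "source": "IEEE Xplore"},
]
unique = deduplicate(records)
```

With the reported counts, the four queries return 605 records in total, of which 92 are removed before the 513 articles are analyzed.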

2.2 Statistical results

On the basis of the integrated materials retrieved from the different databases, we use bibliometrix and Excel to obtain some statistical results, including publications by year and a keyword analysis of the publications.

2.2.1 Publications by year

According to the integrated materials, the first publication appeared in 2014. To demonstrate the development of fuzzy deep learning for uncertain medical data, Fig. 2 shows the annual publication volume.

Fig. 2 The number of publications by year

In Fig. 2, it is clear that the number of relevant papers from 2014 to 2017 was relatively small. It has grown rapidly since then, from 28 papers in 2018 to 172 papers in 2021. In addition, as of June 30, 2022, 83 papers had been published in 2022. Given this growth trend, the number of articles published in 2022 is likely to exceed that in 2021. To sum up, fuzzy deep learning for uncertain medical data is attracting increasing attention and developing rapidly.

2.2.2 Keywords analysis of publications

Keywords are an important summary of a paper, so a keyword analysis of the collected papers intuitively reflects the current research hotspots of fuzzy deep learning for medical data. Thus, we conduct an analysis with bibliometrix based on the author keywords to study the keyword tree map and keyword dynamics. In addition, to be as precise as possible in the final statistics, keywords that have the same meaning, such as “CNN” and “convolutional neural network”, are merged.

From a statistical point of view, Fig. 3 shows the frequency and proportion of the top 15 keywords. Their frequency ranges from 133 to 8, of which “deep learning” has the largest frequency and proportion, followed by “CNN”, “image segmentation”, “machine learning”, “fuzzy logic”, etc. Moreover, we can conclude that the relevant theories in fuzzy deep learning are mainly applied to image segmentation, COVID-19, breast cancer, and image processing.

Fig. 3 Top 15 keywords

From the perspective of change over time, Fig. 4 shows the dynamics of the top 10 most frequently used keywords from 2014 to 2022, from which we can see the general growth trend of keywords. The development of this field was relatively slow from 2014 to 2017, and it has grown significantly since 2017. Among the keywords, “deep learning” and “CNN” have always been in the leading position, but after 2019, the number of “CNN” appearances began to decrease. It is worth mentioning that, with the outbreak of COVID-19, research on COVID-19 has risen sharply since 2019. In addition, the applications of “medical segmentation” and “classification” have increased steadily.

Fig. 4 Word dynamics of keywords

3 Fuzzy deep learning for uncertain medical data

Motivated by the different fusion patterns of fuzzy techniques and deep learning models, and after reviewing most of the existing literature, we divide fuzzy deep learning models for uncertain medical data into four types of frameworks: sequential framework, simultaneous framework, blending framework, and integrated framework.

At first, the sequential framework of fuzzy deep learning models is shown in Fig. 5. Fuzzy techniques, such as fuzzy logic, fuzzy C-means and fuzzy inference systems, are used for data normalization or image segmentation, and then deep learning models, such as deep neural networks, long short-term memory, autoencoders, deep belief networks and convolutional neural networks, are used for feature extraction or classification. For example, fuzzy C-means is applied for data normalization (Kumar et al. 2022). In another example, a fuzzy logic system performs fuzzy inference and transforms the image pixels into a fuzzy correlation map; the fuzzy logic output is then fed to the AlexNet convolutional neural network to classify the melanoma as benign or malignant (Yalcinkaya and Erbas 2021). Meanwhile, fuzzy entropy is used to handle low frequency sub-bands, and the high frequency and low frequency sub-bands are then fused by inverse NSST, which is one of the variants of convolutional neural network (Vinnarasi et al. 2021). Furthermore, the fuzzy C-means method is executed to cluster unlabeled data, and the clustered data are then input into the deep neural network to form the hybrid model (Joloudari et al. 2022). The fuzzy C-means method or a fuzzy gray level co-occurrence matrix is used for image segmentation, and the segmented images are delivered to an autoencoder or a convolutional neural network for feature representation or feature reduction (Hassan et al. 2017; Chauhan and Choi 2021; Yamunadevi and Ranjani 2021).

Fig. 5 The sequential framework of fuzzy deep learning models for uncertain medical data
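The fuzzy stage of the sequential framework can be sketched with a plain fuzzy C-means routine whose soft memberships are then handed to a downstream deep model. The implementation below is a minimal NumPy sketch under stated assumptions: the toy one-dimensional "intensity" data, the fuzzifier m = 2, and the choice to concatenate memberships with the raw features are all illustrative and not taken from any of the cited papers.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every sample to every center (epsilon avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy data: two groups of pixel intensities (assumed values).
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0.2, 0.03, 50),
                    rng.normal(0.8, 0.03, 50)])[:, None]
centers, U = fuzzy_c_means(X, n_clusters=2)

# Stage 2 input: raw feature plus soft memberships, ready for a deep model.
features = np.hstack([X, U])
```

In a real pipeline the `features` array (or a segmented image derived from `U`) would be passed to the deep learning model chosen for feature extraction or classification.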

Secondly, the simultaneous framework of fuzzy deep learning models is shown in Fig. 6. Fuzzy techniques and deep learning models are both applied to the same task, such as segmentation or classification, and their results are then fused to obtain the final results. This can offset the drawbacks of the individual methods and yield a more accurate diagnosis. For example, both a deep neural network and a fuzzy inference engine are applied for classification, and the final decision is obtained by averaging the outputs of the two models (Shaban et al. 2021).

Fig. 6 The simultaneous framework of fuzzy deep learning models for uncertain medical data
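The averaging-based fusion described for the simultaneous framework can be sketched in a few lines. The scores below are hypothetical positive-class outputs of the two branches, and the 0.5 decision threshold is an assumption; the sketch only illustrates the fusion rule, not the models of Shaban et al. (2021).

```python
import numpy as np

def fuse_by_average(p_deep, p_fuzzy, threshold=0.5):
    """Average two positive-class scores in [0, 1] and threshold the result."""
    p_deep = np.asarray(p_deep, dtype=float)
    p_fuzzy = np.asarray(p_fuzzy, dtype=float)
    fused = (p_deep + p_fuzzy) / 2.0
    return fused, fused >= threshold

# Hypothetical outputs for three patients from the two branches.
p_deep = [0.90, 0.40, 0.55]   # deep neural network branch
p_fuzzy = [0.70, 0.20, 0.35]  # fuzzy inference engine branch
fused, decision = fuse_by_average(p_deep, p_fuzzy)
```

The third patient shows why fusion matters: the deep branch alone would cross the threshold, but the averaged score does not.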

Thirdly, the blending framework of fuzzy deep learning models for uncertain medical data is shown in Fig. 7. A fuzzy logic system or fuzzy membership function is embedded into the deep learning framework. The first layer transforms the crisp values into fuzzy data through fuzzification and passes them to the second layer. All hidden layers of this type of fuzzy deep learning model are designed to represent the fuzzy if–then rules, which are obtained through a learning algorithm. The last layer is a defuzzification layer, which transforms fuzzy results into crisp values to obtain the final results. For example, interval type-2 possibilistic fuzzy C-means is embedded into the conventional fuzzy neural network to deal with the uncertainty of the inputs (Shen et al. 2020). By embedding fuzzy logic in an autoencoder and setting regularization parameters in the loss function, the performance of the learning model can be largely improved (Hwang et al. 2019). In addition, a fuzzy membership function can be implemented as the activation function in a deep learning model (Sharma et al. 2021).

Fig. 7 The blending framework of fuzzy deep learning models for uncertain medical data
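The fuzzification, rule and defuzzification layers of the blending framework can be illustrated with a small forward pass. The sketch assumes Gaussian membership functions, a product t-norm for rule firing, and a zero-order (constant-consequent) weighted-average defuzzification; the rule parameters are fixed by hand here, whereas in the reviewed models they would be obtained by a learning algorithm.

```python
import numpy as np

def gaussian_membership(x, centers, sigmas):
    """Fuzzification layer.

    x: (n_samples, n_features); centers, sigmas: (n_rules, n_features).
    Returns memberships of shape (n_samples, n_rules, n_features).
    """
    diff = x[:, None, :] - centers[None, :, :]
    return np.exp(-0.5 * (diff / sigmas[None, :, :]) ** 2)

def fuzzy_forward(x, centers, sigmas, consequents):
    """Fuzzify, fire the if-then rules, then defuzzify to crisp outputs."""
    mu = gaussian_membership(x, centers, sigmas)
    firing = mu.prod(axis=2)                         # product t-norm per rule
    weights = firing / firing.sum(axis=1, keepdims=True)
    return weights @ consequents                     # weighted-average defuzzification

# Two hand-set rules (assumed): "both features low -> 0", "both high -> 1".
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.full((2, 2), 0.5)
consequents = np.array([0.0, 1.0])

x = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
y = fuzzy_forward(x, centers, sigmas, consequents)
```

An input lying exactly between the two rule centers fires both rules equally and is mapped to the midpoint of their consequents, which is how graded, rather than crisp, evidence propagates through the layers.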

Finally, the integrated framework of fuzzy deep learning models for uncertain medical data is shown in Fig. 8. Different deep learning models are run at the same time for feature extraction or classification to produce probability maps, which are then provided to fuzzy techniques, such as the fuzzy integral and network-based fuzzy inference systems, to fuse the different results and obtain the final results. For instance, in pattern recognition problems, the fuzzy integral has been successful in combining the outputs of classifiers, and plays an important role in combining scores obtained from different convolutional neural network variants to obtain an effective final output (Banerjee et al. 2022). Similarly, a network-based fuzzy inference system is implemented to fuse different segmentation results from several convolutional neural networks to get the final multiple sclerosis lesion segmentation (Essa et al. 2020).

Fig. 8 The integrated framework of fuzzy deep learning models for uncertain medical data
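As one possible reading of fuzzy-integral fusion, the sketch below computes a discrete Choquet integral of per-model class scores. It assumes a symmetric fuzzy measure g(A) = (|A|/n)^q, which depends only on how many models are in the subset; this choice, the exponent q, and the example scores are assumptions for illustration, and the cited works may use other fuzzy measures or integrals.

```python
import numpy as np

def choquet_symmetric(scores, q=1.0):
    """Discrete Choquet integral with the symmetric measure g(A) = (|A|/n)**q.

    Scores are sorted in descending order; the i-th largest score is weighted
    by g(top i sources) - g(top i-1 sources). q = 1 gives the plain mean,
    q > 1 shifts weight toward the smaller scores (a more cautious fusion).
    """
    h = np.sort(np.asarray(scores, dtype=float))[::-1]
    n = h.size
    g = (np.arange(n + 1) / n) ** q   # g(empty set) = 0, ..., g(all sources) = 1
    return float(np.sum(h * np.diff(g)))

# Hypothetical class probabilities from three CNNs (rows) for two classes (columns).
probs = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.3, 0.7]])
fused = np.array([choquet_symmetric(probs[:, c], q=2.0) for c in range(2)])
predicted_class = int(np.argmax(fused))
```

Because the fused value always lies between the smallest and largest individual scores, the measure acts as a tunable compromise between pessimistic and optimistic aggregation of the classifiers.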

In the following, we review the literature from three aspects: fuzzy deep learning models, uncertain medical data, and application scenarios.

3.1 Fuzzy deep learning models

Fuzzy deep learning models frequently used for uncertain medical data include fuzzy deep neural network approaches, fuzzy long short-term memory approaches, fuzzy convolutional neural network approaches, fuzzy autoencoder approaches, fuzzy deep belief network approaches, deep fuzzy systems, and other neuro-fuzzy approaches. Each of them is explained as follows.

3.1.1 Fuzzy deep neural network

As one of the fundamental models in deep learning, Deep Neural Network (DNN) can be considered as a neural network with many hidden layers (Hinton et al. 2012). Combined with fuzzy techniques such as fuzzy C-means and the fuzzy active shape model, fuzzy DNN approaches address many problems in the medical domain. For example, fuzzy C-means combined with DNN is used for diagnosing coronary artery disease (Joloudari et al. 2022), achieving the best performance compared to neural network and single DNN models. Similarly, a density-based fuzzy C-means algorithm is applied to the segmentation of intracranial hemorrhage CT images, and image classification is then realized by DNN (Venugopal et al. 2021). DNN is also implemented to localize the kidney bounding box, and the weighted fuzzy active shape model is used to automatically segment the kidney capsule in 3D ultrasound images (Tabrizi et al. 2018). Moreover, DNN and a fuzzy inference engine are both designed for classification, and the final decision is made by averaging the output values of the two models, to offset the inherent weaknesses of either model and obtain a more accurate diagnosis (Shaban et al. 2021). A review article has investigated some soft computing approaches like fuzzy logic, artificial neural network, genetic algorithm, and deep learning in medical imaging modalities and processing (Devi et al. 2021).

3.1.2 Fuzzy long short-term memory network

Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber (1997), can learn long-term dependencies. Combined with fuzzy C-means, fuzzy recurrence plots, and adaptive neuro-fuzzy inference systems, LSTM has found new applications to uncertain medical data. In one recent approach, fuzzy C-means acts as a segmentation tool, and the segmented images are used as input for deep feature extraction. After feature extraction and feature transformation, a modified LSTM performs the final classification. Comparative results on several datasets show that the proposed method for skin disease is superior to conventional approaches in terms of computational effectiveness and classification accuracy (Elashiri et al. 2022). Fuzzy C-means and a dual attention mechanism can cope with the high complexity and high noise of medical images, improving segmentation performance; a bi-directional LSTM is then adopted to combine multi-scale features (Cai et al. 2021). A fuzzy recurrence plot of very short time series is used to process the initial data and serves as input for training, and then LSTM is applied for classification (Pham et al. 2019). LSTM and an adaptive neuro-fuzzy inference system are each used for forecasting COVID-19. Results show that both methods play key roles in predicting the number of beds or other types of medical facilities needed during the coronavirus pandemic (Shafiekhani et al. 2022).

3.1.3 Fuzzy convolutional neural network

Convolutional Neural Network (CNN) is one of the representative algorithms of deep learning (Gu et al. 2018), with a deep structure containing convolution computations. Fuzzy CNN approaches are presented in two forms: one is CNN combined with fuzzy C-means, and the other is CNN combined with fuzzy logic. For the first fusion form, fuzzy C-means improves the preprocessing result of the input MRI dataset, and CNNs are used for glioma detection (Amaya-Rodriguez et al. 2019). Similarly, fuzzy C-means is adopted for image segmentation during preprocessing (Kim et al. 2019; Sevik et al. 2019). A fast region-based CNN algorithm with fuzzy k-means clustering achieves automated diagnosis of different eye diseases (Nazir et al. 2020). Before tumor segmentation through convolutional block attention, fuzzy C-means is used for data normalization, and it performs better than the other two normalization techniques (Kumar et al. 2022). For the second fusion form, CNN with fuzzy logic can capture the relationship between the pixels and the associated medical image (Yalcinkaya and Erbas 2021). Fuzzy logic, combined with three state-of-the-art CNNs, namely Inception V3, Inception ResNet V2 and DenseNet 201, shows great efficiency in COVID-19 detection (Banerjee et al. 2022). AlexNet deep CNN with type-2 fuzzy logic is used to fuse magnetic resonance scans (T1, T2FS and STIR) when dealing with low-frequency sub-images, while high frequency sub-images are selected according to the maximum of the absolute value (Hermessi et al. 2019).

3.1.4 Fuzzy autoencoder

Autoencoder (AE) is an artificial neural network that takes the input information as the learning target and learns a representation of the input information (LeCun 1987). The level set method and the fuzzy C-means clustering algorithm are used to segment liver lesions, and a stacked sparse AE is assigned to extract high-level feature representations from pixels of the segmented images (Hassan et al. 2017). Similarly, fuzzy C-means is used to segment brain MRI, and the segmented image is adopted as input data of an AE for feature reduction (Chauhan and Choi 2021). A hybrid deep AE with a Bayesian fuzzy clustering-based segmentation method is proposed for brain tumor classification (Raja and Rani 2020). Furthermore, AEs are applied to reduce the dimension of the data, and methods such as fuzzy C-means clustering are used to determine the risk level (Dervishi 2017). After dimensionality reduction by an AE, a standard adaptive neuro-fuzzy inference system and its variants are used for classification (Shoeibi et al. 2022). In order to prevent and control COVID-19, a co-evolutionary transfer learning model is proposed to predict the demand for medical materials, in which the fuzzy deep contractive autoencoder is one of the important techniques for each prediction task (Song et al. 2022a, b).

3.1.5 Fuzzy deep belief network

Deep Belief Network (DBN) is a probabilistic generative model (Hinton et al. 2006), applied in statistical modeling and in representing abstract features or statistical distributions. Usually integrated with fuzzy C-means, fuzzy DBN plays an important role in the medical area. For example, based on CT images, DBN and fuzzy C-means are combined to cluster lung cancer patients (Zhao et al. 2020a, b), and are also used to identify the compression and ventilation waveforms so as to evaluate the quality of cardiopulmonary resuscitation (Zhang et al. 2020a, b). As for heart disease diagnosis, sparse fuzzy C-means is used to pick out key features from medical data, and the selected features are then given to a DBN, which is trained using the Taylor-based bird swarm algorithm. In the feature selection process, the incorporation of sparse fuzzy C-means benefits the interpretation of the model and can be used to deal with high-dimensional data (Alhassan and Zainon 2020).

3.1.6 Deep fuzzy system

Different from the above deep learning models based on neural network, the deep fuzzy system is developed on the foundation of a hierarchical fuzzy system, drawing inspiration from deep learning models (Wang et al. 2022a, b, c). This allows it to get rid of its dependence on deep neural networks and maintain the high interpretability of fuzzy systems. In recent years, a number of deep fuzzy systems applied in the medical field have been developed based on the Takagi–Sugeno-Kang (TSK) method. For example, the multiview TSK fuzzy system (MV-TSK-FS) used for recognition of epileptic EEG signals is developed based on a multiview collaborative learning mechanism (Jiang et al. 2017). An enhanced transductive transfer learning TSK fuzzy system (ETTL-TSK-FS) is investigated to improve the transfer learning abilities of TSK fuzzy system for epileptic EEG recognition (Deng et al. 2018). Similarly, a method leveraging a TSK fuzzy system and deep features has been proposed for the automatic identification of anxiety among college students (Meng and Zhang 2020). In addition to TSK, a deep convolutional fuzzy system (DCFS) integrated with the Wang-Mendel method is proposed specifically for addressing high-dimensional problems (Wang 2020). Moreover, a deep fuzzy model is designed for the analysis and detection of CT images in individuals infected with COVID-19 (Song et al. 2022a, b).

3.1.7 Other neuro-fuzzy models

Besides the above fuzzy deep learning models, some neuro-fuzzy approaches are widely used to handle uncertain medical data. A fuzzy deep neural network model is constructed to classify the electronic diagnosis of CT brain intracranial hemorrhage (Mansour et al. 2021). A new cat fuzzy neural model is proposed for the classification of cardiovascular diseases such as heart attack, angina, stroke, arrhythmia, and coronary heart disease (Kumar and Ramana 2021). Moreover, a new skin lesion classification model based on segmentation is proposed by combining the GrabCut algorithm with an adaptive neuro-fuzzy classifier (Sikkandar et al. 2021). An attentive hierarchical adaptive neuro-fuzzy inference system is proposed to predict clinical outcomes, where the hierarchical structure in fuzzy modeling helps to improve interpretability (Nguyen et al. 2022). Further, a breast cancer diagnosis model is developed by fusing interval type-2 possibilistic fuzzy C-means with a fuzzy neural network, which includes a segmentation model and a hierarchical fuzzy classifier, and thus achieves favorable performance and enhances interpretability (Shen et al. 2020). The deep non-iterative random vector function link neural network is used to establish the classification model with the S-membership function as the activation function. The S-membership activation function can not only map nonlinear data from input vectors to feature vectors, but also compress outliers into a membership range between 0 and 1 (Sharma et al. 2021). In addition, a survey has investigated different techniques and algorithms applied in the detection of mitosis and non-mitosis cells, such as fuzzy logic, neuro-fuzzy systems, artificial neural networks, and CNN (Malavade et al. 2018).
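The way an S-shaped membership function compresses outliers into the range between 0 and 1 can be seen in a short sketch. The code assumes Zadeh's standard S-function with crossover point halfway between the parameters a and c; whether Sharma et al. (2021) use exactly this parameterization is not stated here, so the form and the parameter values are assumptions.

```python
import numpy as np

def s_membership(x, a, c):
    """Zadeh-style S-shaped membership with crossover at b = (a + c) / 2.

    Returns 0 for x <= a, 1 for x >= c, and a smooth quadratic rise between,
    so arbitrarily large or small inputs are squashed into [0, 1].
    """
    x = np.asarray(x, dtype=float)
    b = (a + c) / 2.0
    t = (x - a) / (c - a)
    rising = 2.0 * t ** 2                  # used on a <= x <= b
    falling = 1.0 - 2.0 * (t - 1.0) ** 2   # used on b <= x <= c
    y = np.where(x <= b, rising, falling)
    return np.where(x <= a, 0.0, np.where(x >= c, 1.0, y))

# Pre-activations including two extreme outliers (assumed values).
z = np.array([-100.0, 0.25, 0.5, 0.75, 100.0])
activated = s_membership(z, a=0.0, c=1.0)
```

Used as an activation, the function behaves like a bounded, piecewise-quadratic alternative to the sigmoid whose saturation points a and c can be chosen explicitly.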

3.2 Uncertain medical data

3.2.1 Images

Medical image analysis, that is, the processing and analysis of digital medical images, plays an important role in the development of medical intelligence platforms. Common medical images can be divided into four types: X-ray imaging, computerized tomography (CT) imaging, ultrasound imaging, and magnetic resonance imaging (MRI).

X-ray imaging (Varela-Santos and Melin 2021) is a method of diagnosing disease with X-rays. It can be divided into ordinary examination, special examination and contrast examination. In the medical imaging field, an important goal is to decrease the X-ray dose absorbed by the patient while preserving image quality. Specifically, low-dose images decrease the absorbed dose compared with normal-dose images, but image quality suffers because of quantum noise. Thus, a fast-ICA combined with an adaptive type-2 fuzzy filter is proposed to filter low-dose images (Mohebbian et al. 2020).

CT imaging is a noninvasive medical examination or procedure that produces cross-sectional images of the body using specialized X-ray equipment (https://www.fda.gov/radiation-emitting-products/medical-x-ray-imaging/computed-tomography-ct#1), so that it can depict internal organs, bones, soft tissues, and blood vessels and provide more detail than traditional X-ray images such as chest X-rays. COVID-19 has attracted much attention since its outbreak, and fuzzy logic combined with three state-of-the-art CNNs, namely Inception V3, Inception ResNet V2 and DenseNet 201, as well as fuzzy C-means, is proposed to screen COVID-19 from chest X-rays and CT scans (Ali et al. 2021; Banerjee et al. 2022). CT images also play a key role in the early detection of suspicious lung pathology: a fuzzy deep learning model is presented to localize pulmonary lung lesions (Ramirez et al. 2019), and DBN along with fuzzy C-means achieves better classification than the latest methods (Zhao et al. 2020a, b). As for breast cancer CT images, the fuzzy C-means clustering algorithm is first improved and optimized for medical images, and a classification and detection model is then constructed using CNN (Wang et al. 2021).

Ultrasound imaging is used to view inside the body with the help of high-frequency sound waves (https://www.fda.gov/radiation-emitting-products/medical-imaging/ultrasound-imaging#description). Unlike X-ray imaging, no ionizing radiation exposure exists in the process of ultrasound imaging. In terms of breast ultrasound image semantic segmentation, by analyzing uncertainty using pyramid fuzzy blocks and generating new features based on connectivity, a novel deep learning model is proposed that outperforms eight state-of-the-art deep learning-based approaches (Huang et al. 2021). As for breast lesion ultrasound images classification, fuzzy enhancement and bilateral filtering techniques are used to enrich the input information of breast lesions and provide better classification results (Zhuang et al. 2020). By integrating level set method, fuzzy C-means and stacked sparse AE, an intelligent system is designed for the diagnosis of focal liver diseases based on ultrasound images, which shows great classification accuracy compared to three state-of-the-art techniques (Hassan et al. 2017).

MRI is a medical imaging procedure that uses strong magnetic fields and radio waves to image the internal structures of the body. The signal in MR images usually comes from the protons in fat and water molecules in the body, so MRI is effective at capturing anatomical information about soft tissues. To detect glioma, a type of brain tumor, CNNs are used as diagnosis aids and fuzzy C-means improves the method for preprocessing the input MRI dataset (Amaya-Rodriguez et al. 2019). Based on a heart MRI dataset, an innovative hybrid algorithm is proposed to address noisy data by combining hybrid ant colony, cat fuzzy neural model and African buffalo optimization (Doppala et al. 2020). As for aided diagnosis of soft tissue sarcomas of the extremities, a fusion framework constructed from AlexNet deep CNN and type-2 fuzzy sets fuses MR images (Hermessi et al. 2019). MRI is also a widely used technique to classify and detect prostate cancer, and a fuzzy k-nearest neighbor model is utilized for the classification process (Malibari et al. 2022).
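A fuzzy k-nearest neighbor classifier returns class memberships rather than a single hard label, which can be sketched as follows. The code assumes the common inverse-distance weighting with fuzzifier m and crisp training labels; the toy two-dimensional feature vectors are hypothetical, and the sketch is not the specific model of Malibari et al. (2022).

```python
import numpy as np

def fuzzy_knn(X_train, y_train, x, k=3, m=2.0, n_classes=2):
    """Return class memberships of query x from its k nearest neighbors.

    Each neighbor votes for its (crisp) class with weight 1 / d**(2/(m-1)),
    and the votes are normalized so that the memberships sum to 1.
    """
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] ** (2.0 / (m - 1.0)) + 1e-12)  # epsilon guards d = 0
    memberships = np.zeros(n_classes)
    for j, wj in zip(idx, w):
        memberships[y_train[j]] += wj
    return memberships / w.sum()

# Hypothetical 2-D features for two classes (0 and 1).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

memberships = fuzzy_knn(X_train, y_train, np.array([2.0, 2.0]), k=4)
predicted = int(np.argmax(memberships))
```

For this query, three of the four neighbors belong to class 0 and the single class 1 neighbor is farther away, so the membership in class 1 is small but non-zero, which conveys how uncertain the decision is.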

3.2.2 Text records

In addition to medical images, valuable information and rich features can also be extracted from medical text records by fuzzy deep learning (see Table 2). In terms of electronic health records, fuzzy-based deep learning models (Davoodi and Moradi 2018) not only predict mortality in intensive care units, but also automatically classify unstructured clinical nursing notes (Gangavarapu et al. 2019). As for traditional Chinese medicine, a fuzzy linguistic model and artificial intelligence techniques have successfully judged constitutional type (Li et al. 2021a, b, c). Natural language generation technology combined with fuzzy logic is used to generate textual interpretations of image region semantic annotations (Poli et al. 2021).

Table 2 Fuzzy deep learning for medical text records

3.2.3 Video files

Multi-source heterogeneous data is increasingly common in medical evaluation and treatment, and attracts growing interest from researchers. For medical video files, deep learning and fuzzy-based techniques have been used to recognize daily stress and mood (Lawanot et al. 2019), predict diseased or disinfected rice plants (Verma and Dubey 2021), and evaluate trainees’ laparoscopic surgery skills (Fathabadi et al. 2021). Detailed information on fuzzy deep learning for medical video files can be seen in Table 3.

Table 3 Fuzzy deep learning for medical video files

3.2.4 High-throughput data

In the past decades, high-throughput data obtained through genomics, proteomics, or metabolomics, such as gene mutations and single nucleotide polymorphisms, has gradually been accepted in clinical practice and has become an important type of medical data. Deep learning-based approaches help computational biologists integrate, predict and draw statistical inferences about biological outcomes (Sen et al. 2021). Combined with fuzzy logic, machine learning and even deep learning are applied to provide a more comprehensive and advanced mechanism for high-throughput data mining (Yang et al. 2021). Detailed information on fuzzy deep learning for medical high-throughput data is displayed in Table 4.

Table 4 Fuzzy deep learning for medical high-throughput data

3.2.5 Tabular data

Tabular data frequently encapsulates crucial pathological features and individual information of patients, facilitating a more comprehensive understanding of their medical conditions and aiding in clinical decision-making for healthcare practitioners. Approaches grounded in fuzzy deep learning effectively integrate diverse feature information to make informed judgments. These approaches have extensive application in predicting conditions such as Parkinson’s disease (Pham et al. 2019), diabetes (Gucen and Karaboga 2019), and COVID-19 (Shafiekhani et al. 2022). Besides, in the realm of classification tasks based on tabular data, fuzzy deep learning methods exhibit enhanced interpretability compared to traditional deep learning models (Nguyen et al. 2022). These methods are adept at addressing imbalanced classifications (Wang et al. 2022b) and achieve enhanced classification performance (Aversano et al. 2020; Shaji et al. 2023). Detailed information of fuzzy deep learning for medical tabular data is displayed in Table 5.

Table 5 Fuzzy deep learning for medical tabular data

3.3 Application scenarios

3.3.1 Segmentation

Image segmentation, which divides an image into its component parts, is an important and decisive procedure in medical imaging studies. It can be carried out by several automated computational procedures that extract meaningful information such as shape, volume and motion (Saxena et al. 2019). A survey focusing on soft computing approaches to image segmentation, such as fuzzy logic, artificial neural networks and genetic algorithms, has reviewed the state of the art for researchers (Chouhan et al. 2018). The fuzzy C-means algorithm and SegNet-based semantic segmentation are implemented for automatic classification of skin burn color images (Sevik et al. 2019). For semantic segmentation of breast ultrasound images, a novel deep learning structure is constructed using pyramid fuzzy blocks, and a new feature is generated based on connectivity (Huang et al. 2021). To automatically diagnose different eye diseases, a fast region-based CNN algorithm with fuzzy k-means clustering is presented for disease localization and segmentation (Nazir et al. 2020), and a contour detection-based image processing algorithm using type-2 fuzzy rules is developed to detect blood vessels in fundus images (Orujov et al. 2020). For segmentation of lung CT images, a new fuzzy C-means automated region-growing segmentation approach is designed for COVID-19 diagnosis (Ali et al. 2021).

3.3.2 Classification

Image classification is an image processing method that assigns objects to different classes according to the features reflected in the image information. For breast images, fuzzy C-means combined with a deep metric learning approach based on ResNet50 benefits the classification of breast cancer tissues (Calderaro et al. 2021). Similarly, a segmentation model and a hierarchical fuzzy classifier for breast cancer diagnosis are proposed by combining interval type-2 possibilistic fuzzy C-means with a fuzzy neural network (Shen et al. 2020). Fuzzy enhancement plays an important role in reducing blurring and speckle noise in the region of interest of breast lesions (Zhuang et al. 2020). For cardiovascular disease prediction, a hybrid algorithm based on a cat fuzzy neural model is proposed, and the results show that it performs better than existing methods in classification accuracy and error rate (Doppala et al. 2020). Fast and robust fuzzy C-means and simple linear iterative clustering superpixel algorithms are adopted for image segmentation during preprocessing (Kim et al. 2019). Similarly, the output of fuzzy C-means is used as ground-truth labels to fine-tune the DBN (Zhao et al. 2020a, b). For diabetic retinal image classification, a self-supervised fuzzy clustering network is proposed, containing a feature learning module, a reconstruction module, and a fuzzy self-supervision module (Luo et al. 2020).

3.3.3 Natural language processing

Because medical information and patients’ medical histories are stored in free-text format, natural language processing can help doctors extract key information from vast records and turn the text into usable knowledge. A deep rule-based fuzzy system is proposed to deal with big data with heterogeneous mixed categorical and numeric attributes, where the hidden layers in each unit are represented by interpretable fuzzy rules (Davoodi and Moradi 2018). A fuzzy marker-based similarity approach is proposed to aggregate the large number of clinical documents of a patient, and a modeling method based on vector space and coherence topic is presented to represent the free texts (Gangavarapu et al. 2019). To construct nine standard Traditional Chinese Medicine constitutional types as the basic sample data, fuzzy linguistic variables are represented by membership degrees in a neural network model (Li et al. 2021a, b, c). Moreover, semantic image annotation with fuzzy logic is a useful technique that can capture not only imprecisely segmented images but also vague human spatial knowledge and vocabulary (Poli et al. 2021).

3.3.4 Prediction

Prediction of clinical outcomes based on the patient’s medical data can improve prognostic accuracy and further reinforce clinical decision-making (Nguyen et al. 2022). An attentive hierarchical adaptive neuro-fuzzy inference system is proposed for clinical prediction, which combines fuzzy inference in a hierarchical structure with attention (Nguyen et al. 2022). Deep learning combined with fuzzy logic is used to recognize and predict breast cancer, and the results show that the model built with fuzzy logic not only improves the prediction accuracy, but also makes the prediction more stable (Jiang et al. 2021).

3.3.5 Fusion

Multi-modal image fusion techniques combine complementary information from multi-modal medical images into a more comprehensive and clearer description of the scene, helping medical experts make better disease diagnoses. For soft tissue sarcoma aided diagnosis, type-2 fuzzy logic is applied to fuse T1 with T2FS or STIR when dealing with low-frequency sub-images (Hermessi et al. 2019). Images segmented by a fast fuzzy C-means clustering algorithm and the Otsu threshold are fused using a Siamese neural network and an entropy-based image fusion algorithm, and experimental results on multi-modal medical images show that the proposed fusion technique performs well (Vinnarasi et al. 2021). A survey has investigated image fusion algorithms based on morphological methods, human value system operator based methods, sub-band decomposition methods, neural network based methods, and fuzzy logic methods (Kumar and Sathish 2021).

4 Performance evaluation metrics

Evaluation metrics are used to analyze algorithms and trained models based on fuzzy deep learning, and evaluation is an indispensable step in technological innovation. Evaluating the experimental results from different aspects makes it easier to understand the advantages of a proposed algorithm and the performance of the model. We summarize the existing evaluation metrics of fuzzy deep learning for medical data and divide them into three categories according to the output type: classification-based evaluation metrics, segmentation-based evaluation metrics and fusion-based evaluation metrics.

4.1 Classification-based evaluation metrics

In the medical field, disease diagnosis and prediction (Dey et al. 2021; Sharma et al. 2021; Verma and Dubey 2021), tumor identification and classification (Asuntha and Srinivasan 2020; Krithiga and Geetha 2020; Wang et al. 2021), and assessment of the severity of a patient’s disease (Davoodi and Moradi 2018; Gangavarapu et al. 2019; Iraji 2019) are all classification problems, which are among the ultimate goals of medical research. Therefore, it is necessary to analyze the classification results from all aspects.

4.1.1 Confusion matrix

The confusion matrix (Sokolova and Lapalme 2009) is a contingency table that summarizes the prediction results of fuzzy deep learning. As shown in Table 6, the rows and columns of the matrix represent the true category and the predicted category, respectively, and the values ‘a’ to ‘i’ are the numbers of samples falling into each combination of categories.

Table 6 Confusion matrix

As shown in Table 7, for classic binary classification problems, the output samples can be divided into four cases: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). The following classification-based evaluation metrics will be explained based on binary classification problems.

Table 7 Confusion matrix based on binary classification

4.1.2 Accuracy

Accuracy (Sokolova and Lapalme 2009; Gangavarapu et al. 2019) is the ratio of the number of correctly predicted samples to the total number of samples in the experiment. Its mathematical expression is:

$$ Accuracy = \frac{TP + TN}{{TP + FP + TN + FN}}.$$

However, accuracy is not suitable in all cases. When the sample categories are seriously imbalanced, accuracy is not a meaningful measure of performance. For example, when negative samples account for 99% of the data, a classifier can obtain 99% accuracy simply by predicting all samples as negative. Therefore, a more in-depth analysis of the true and predicted values of the target variable is needed.
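The imbalance problem described above can be reproduced with a few lines of code. The following is a minimal sketch, assuming binary labels encoded as 0/1; the function names and the toy data (one positive case among 100 samples) are hypothetical and are not taken from any of the reviewed studies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical data: 1 positive case among 100 samples.
y_true = [1] + [0] * 99
# Trivial classifier that predicts every sample as negative.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
print(counts, acc)
```

The trivial classifier reaches 99% accuracy although it misses the only positive case (TP = 0, FN = 1), which is why accuracy alone is not informative here.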

4.1.3 Precision and recall

A number of evaluation metrics based on TP, FP, TN and FN have been proposed to better evaluate the working performance of the model. Their names, mathematical expressions, and the relationship between the metrics are shown in Table 8.

Table 8 Evaluation metrics based on confusion matrix

The most commonly used evaluation metrics in Table 8 are Precision (Sokolova and Lapalme 2009; Zhuang et al. 2020) and Recall (Sokolova and Lapalme 2009; Zhuang et al. 2020). Precision is the proportion of samples predicted as positive that are actually positive, and Recall is the proportion of actual positive samples that are predicted correctly. The larger their values, the better the predictive ability of the model. However, Precision and Recall are conflicting measures. To improve Precision, we can predict as positive only those samples with a high predicted probability of being positive and treat the rest as negative; the number of FN then increases significantly, so Recall decreases. Similarly, to obtain higher Recall, we can also predict as positive those samples with a relatively small predicted probability of being positive; the number of FN is then greatly reduced, but the number of FP increases, which lowers Precision. In general, if Precision is high, Recall tends to be low; if Recall is high, Precision is often low.
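The trade-off can be illustrated by evaluating the same predicted probabilities at a strict and at a loose cut point. This is a minimal sketch under stated assumptions: the scores and labels are made-up toy values, `precision_recall_at` is a hypothetical helper, and returning a Precision of 1.0 when no sample is predicted positive is only one possible convention.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and Recall when samples with score >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if threshold > s and y == 1)
    # Assumed convention: Precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predicted probabilities and true labels (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

strict = precision_recall_at(scores, labels, 0.8)   # few, confident positives
loose = precision_recall_at(scores, labels, 0.25)   # many positives
print("strict:", strict)
print("loose: ", loose)
```

With the strict cut point the Precision is 1.0 and the Recall is 0.5; with the loose cut point the Recall rises to 1.0 while the Precision drops to about 0.67.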

4.1.4 \(F_{\beta }\) Score

It is known that Precision and Recall influence each other. Ideally, we want both of them to be high. In order to balance these two metrics, we can use the weighted harmonic mean of Precision and Recall to measure, that is, \(F_{\beta }\) Score (Sokolova and Lapalme 2009), which is defined as:

$$ F_{\beta } = \frac{{(1 + \beta^{2} ) \times P \times R}}{{(\beta^{2} \times P) + R}}{,}$$

where \(P\) and \(R\) represent precision and recall, respectively. \(\beta\) measures the relative importance of the recall rate to the precision rate, and \(\beta > 0\). If \(\beta > 1\), then the recall rate has a greater impact, and if \(\beta < 1\), then the precision rate has a greater impact. Generally, the value of \(\beta\) is set as 1, which is \(F_{1}\) Score (Ramya et al. 2021):

$$ F_{1} = \frac{2 \times P \times R}{{P + R}}{,} $$

where the formulas of the \(P\) and \(R\) are:

$$ P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, $$

so the \(F_{1}\) Score can also be written as:

$$ F_{1} = \frac{2TP}{{2TP + FP + FN}}{.} $$
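As a numerical check of the formulas above, the following minimal sketch computes \(F_{\beta }\) from Precision and Recall and compares the \(\beta = 1\) case with the count-based form of \(F_{1}\). The confusion-matrix counts are hypothetical, and the functions assume that TP + FP and TP + FN are non-zero.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    p = tp / (tp + fp)  # Precision
    r = tp / (tp + fn)  # Recall
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


def f1_from_counts(tp, fp, fn):
    """F_1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts: Precision = 0.8, Recall = 40/70 (about 0.57).
tp, fp, fn = 40, 10, 30

f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)      # weights Recall more heavily
f_half = f_beta(tp, fp, fn, beta=0.5)  # weights Precision more heavily
print(f1, f1_from_counts(tp, fp, fn), f2, f_half)
```

Because Recall is lower than Precision in this toy case, the recall-weighted score \(F_{2}\) is the smallest of the three and the precision-weighted score \(F_{0.5}\) is the largest.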

4.1.5 Matthews correlation coefficient

The above evaluation metrics share a common defect: they are asymmetric and focus on only some categories. For example, the formulas of Recall, Precision and \(F_{1}\) Score do not involve TN, so these metrics only reflect the performance on the positive class and ignore the performance on the negative class.

The Matthews correlation coefficient (MCC) (Chicco and Jurman 2020; Ramya et al. 2021) combines the four basic quantities of the confusion matrix, namely TP, FP, TN and FN. The MCC describes the correlation between the actual and predicted labels and takes values in [-1, 1]. If the MCC is 1, the model predicts perfectly. If the MCC is 0, the prediction is no better than random prediction. If the MCC is -1, the prediction is completely inverted and the model systematically avoids the correct answer. The formula is as follows:

$$ MCC = \frac{TP \times TN - FP \times FN}{{\sqrt {(TP + FP)(TP + FN)(TN + FP)(TN + FN)} }}{.} $$
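The three reference values of the MCC can be checked directly from the formula. The sketch below is illustrative only; the counts are hypothetical, and returning 0 when the denominator vanishes is a commonly used convention rather than part of the formula above.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an empty row or column of the matrix gives MCC = 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=50, fp=0, tn=50, fn=0)       # every prediction correct
inverted = mcc(tp=0, fp=50, tn=0, fn=50)      # every prediction wrong
chance = mcc(tp=25, fp=25, tn=25, fn=25)      # indistinguishable from guessing
all_negative = mcc(tp=0, fp=0, tn=99, fn=1)   # trivial classifier on imbalanced data
print(perfect, inverted, chance, all_negative)
```

Unlike accuracy, the MCC of the trivial all-negative classifier on the imbalanced data is 0 rather than 0.99.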

4.1.6 Receiver operating characteristic and area under curve

Precision and Recall depend on the cut point (threshold) used to divide samples into positive and negative examples, and different cut points give different values. How can the prediction of the model be reflected dynamically across different cut points? The Receiver Operating Characteristic (ROC) curve (Kim et al. 2019) gives a good answer.

ROC stands for Receiver Operating Characteristic and AUC for Area Under Curve; that is, ROC is the curve and AUC is the area under it. The x-axis of the ROC curve is the false positive rate (FPR) and the y-axis is the true positive rate (TPR), whose definitions are given in Table 8.

As shown in Fig. 9, the closer the ROC curve is to the upper left corner, the better the classification effect is; the closer the ROC curve is to the lower right corner, the worse the classification effect is. When the ROC curve of one learner completely covers the ROC curve of another learner, it can be said that the performance of the former is better than the latter. When the ROC curves of two learners intersect, we can compare the area under the ROC curve which is AUC. The larger the AUC, the better the performance of the learner.

Fig. 9 ROC curve and AUC diagram
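The construction of the ROC curve by sweeping the cut point, and of the AUC as the area under it, can be sketched as follows. This is a minimal illustration with hypothetical scores and labels; it assumes that both classes are present and uses the trapezoidal rule for the area.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the cut point over all scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by decreasing score: each step admits one more sample as positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical predicted probabilities and true labels (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
perfect_area = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
print(curve)
print(area, perfect_area)
```

For the toy data the AUC is 0.8125, which equals the fraction of positive-negative pairs in which the positive sample receives the higher score (13 of 16); a classifier that ranks all positives above all negatives reaches an AUC of 1.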

4.2 Segmentation-based evaluation metrics

Segmentation-based metrics fall into two groups: the first focuses on the similarity of the filled interior of the mask, and the second is sensitive to the boundary of the segmentation.

4.2.1 Pixel accuracy, intersection over union and dice similarity coefficient

In Fig. 10, “G” represents the ground truth, “S” represents the segmentation result, and “A” represents their overlapping part. Metrics of this kind evaluate the degree of coincidence with the ground truth and include Pixel Accuracy (PA) (Liu et al. 2020), Intersection over Union (IoU) and Dice Similarity Coefficient (DSC).

Fig. 10 Image comparison

Pixel accuracy (PA) (Liu et al. 2020) is a measure of segmentation effect through pixel-level comparisons and its mathematical expression is:

$$ PA = \frac{{\sum\nolimits_{i = 0}^{k} {p_{ii} } }}{{\sum\nolimits_{i = 0}^{k} {\sum\nolimits_{j = 0}^{k} {p_{ij} } } }}{,} $$

where \(k\) is the largest class label (so there are \(k + 1\) classes, labelled \(0\) to \(k\)), \(p_{ii}\) is the number of pixels of class \(i\) predicted as class \(i\) (the correctly classified pixels), and \(p_{ij}\) is the number of pixels of class \(i\) predicted as class \(j\), so that the denominator counts all pixels.

Intersection over Union (IoU) (Zhang et al. 2020a, b) is the ratio of the overlap of the two regions to their union, and it is a standard measure for segmentation problems. Its mathematical expression is:

$$ IoU = \frac{{\left| {G \cap S} \right|}}{{\left| {G \cup S} \right|}}{.} $$

Dice Similarity Coefficient (DSC) (Chen et al. 2021) is twice the overlap of the two regions divided by the sum of their sizes. It is also an important measure for evaluation and validation in medical image segmentation (Crum et al. 2006), and its mathematical expression is:

$$ DSC = \frac{{2\left| {G \cap S} \right|}}{\left| G \right| + \left| S \right|}{.} $$

It is worth mentioning that these three evaluation indicators mentioned above can also be explained by confusion matrix (Bai et al. 2020; Li et al. 2021a, b, c). Details can be found in Table 9:

Table 9 Evaluation metrics of PA, IoU and DSC
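For a binary segmentation task, the three overlap metrics can be computed from the pixel-wise confusion counts, using the standard identities IoU = TP / (TP + FP + FN) and DSC = 2TP / (2TP + FP + FN). The sketch below is illustrative: the 4 × 4 masks are made-up, masks are represented as nested lists of 0/1, and `overlap_metrics` is a hypothetical helper name.

```python
def overlap_metrics(gt, seg):
    """PA, IoU and DSC for two binary masks of equal shape (nested lists of 0/1)."""
    tp = fp = tn = fn = 0
    for row_g, row_s in zip(gt, seg):
        for g, s in zip(row_g, row_s):
            if s == 1 and g == 1:
                tp += 1      # pixel in both G and S (the overlap A)
            elif s == 1 and g == 0:
                fp += 1      # pixel only in S
            elif s == 0 and g == 1:
                fn += 1      # pixel only in G
            else:
                tn += 1      # background in both
    pa = (tp + tn) / (tp + fp + tn + fn)
    iou = tp / (tp + fp + fn)
    dsc = 2 * tp / (2 * tp + fp + fn)
    return pa, iou, dsc


# Hypothetical ground truth G and segmentation S.
G = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 0]]
S = [[0, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]

pa, iou, dsc = overlap_metrics(G, S)
print(pa, iou, dsc)
```

Here PA is 0.875, IoU is 0.6 and DSC is 0.75. PA is inflated by the large background, whereas IoU and DSC only consider the foreground regions; the two are related by DSC = 2 IoU / (1 + IoU).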

4.2.2 Maximum surface distance and average symmetric surface distance

Maximum Surface Distance (MSD) (He et al. 2019) is also known as the Symmetric Hausdorff Distance (as shown in Fig. 11).

Fig. 11 Maximum surface distance

It is defined as:

$$ MSD(G,S) = \max (d(G,S),d(S,G)), $$

where

$$ d(G,S) = \mathop {\max }\limits_{g \in G} \left\{ {\mathop {\min }\limits_{s \in S} \left\| {g - s} \right\|} \right\}{,} $$
$$ d(S,G) = \mathop {\max }\limits_{s \in S} \left\{ {\mathop {\min }\limits_{g \in G} \left\| {s - g} \right\|} \right\}{,} $$

\(g\) and \(s\) represent points of the ground truth (G) and the segmented image (S), respectively, and \(\left\| \cdot \right\|\) denotes the Euclidean distance. The smaller the MSD, the better the segmentation matches the ground truth. Because the MSD is determined by the single worst point, it is sensitive to outliers. The Average Symmetric Surface Distance (ASD) (Aydin et al. 2021) takes the average distance instead, which better reflects the overall situation. It is defined as:

$$ ASD(G,S) = \frac{\sum\nolimits_{g \in G} \mathop{\min}\limits_{s \in S} \left\| g - s \right\| + \sum\nolimits_{s \in S} \mathop{\min}\limits_{g \in G} \left\| s - g \right\|}{\left| G \right| + \left| S \right|}, $$

where \(\left| \cdot \right|\) denotes the number of points in the set.
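Both boundary metrics can be computed from the nearest-neighbor distances between two sets of surface points. The following is a brute-force sketch for small point sets; the coordinates are hypothetical, and a practical implementation would first extract the boundary points of each mask and would typically use a spatial index or distance transform instead of the quadratic search shown here.

```python
import math


def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def msd(g, s):
    """Maximum surface distance (symmetric Hausdorff distance)."""
    return max(max(nearest_distances(g, s)), max(nearest_distances(s, g)))


def asd(g, s):
    """Average symmetric surface distance."""
    d_gs = nearest_distances(g, s)
    d_sg = nearest_distances(s, g)
    return (sum(d_gs) + sum(d_sg)) / (len(g) + len(s))


# Hypothetical boundary points; S contains one far-away outlier at (5, 1).
G = [(0, 0), (1, 0), (2, 0)]
S = [(0, 1), (1, 1), (5, 1)]

msd_value = msd(G, S)
asd_value = asd(G, S)
print(msd_value, asd_value)
```

The single outlier in S determines the MSD (the square root of 10, about 3.16), while the ASD (about 1.43) stays close to the typical distance between the two boundaries.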

4.3 Fusion-based evaluation metrics

The existing fusion-based evaluation metrics are divided into two classes, namely with reference image and without reference image (Li et al. 2017).

4.3.1 With reference image

These metrics take an ideal image as the reference image and compare the output image with it, analyzing the degree of distortion to assess the quality of the image under evaluation. Examples include Mean Square Error (MSE) (Vinnarasi et al. 2021), Root Mean Square Error (RMSE) (Zoran 2009), Peak Signal to Noise Ratio (PSNR) (Zoran 2009), Mutual Information (MI) (Zhang et al. 2010) and Structural Similarity Index Measure (SSIM) (Wang et al. 2004). The details are as follows:

MSE is the average squared pixel difference between the output image and the reference image,

$$ MSE = \frac{1}{NM}\sum\limits_{n = 1}^{N} \sum\limits_{m = 1}^{M} \left[ O(n,m) - R(n,m) \right]^{2}, $$

where the images are of size \(M \times N\), \(O(n,m)\) is the pixel value at position \((n,m)\) in the output image \(O\), and \(R(n,m)\) is the pixel value at position \((n,m)\) in the reference image \(R\).

RMSE is the square root of MSE,

$$ RMSE = \sqrt{MSE} = \sqrt{\frac{1}{NM}\sum\limits_{n = 1}^{N} \sum\limits_{m = 1}^{M} \left[ O(n,m) - R(n,m) \right]^{2}}. $$

PSNR is a logarithmic transformation of MSE, where \(MAX\) is the maximum value of an image pixel,

$$ PSNR = 10\log_{10} \frac{MAX^{2}}{MSE} = 10\log_{10} \left\{ \frac{MAX^{2}}{\frac{1}{NM}\sum\limits_{n = 1}^{N} \sum\limits_{m = 1}^{M} \left[ O(n,m) - R(n,m) \right]^{2}} \right\}. $$
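The three error-based metrics build on one another, as the sketch below shows. It is a minimal illustration that represents grayscale images as nested lists; the 2 × 2 images are made-up, `max_value=255` assumes 8-bit images, and returning infinity for identical images is an assumed convention for the case MSE = 0.

```python
import math


def mse(output, reference):
    """Mean squared pixel difference between two images of equal size."""
    n_pixels = len(output) * len(output[0])
    total = sum((o - r) ** 2
                for row_o, row_r in zip(output, reference)
                for o, r in zip(row_o, row_r))
    return total / n_pixels


def rmse(output, reference):
    """Square root of the MSE."""
    return math.sqrt(mse(output, reference))


def psnr(output, reference, max_value=255):
    """PSNR in decibels; max_value=255 assumes 8-bit images."""
    err = mse(output, reference)
    if err == 0:
        return math.inf  # assumed convention for identical images
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical reference image R and output image O.
R = [[100, 100], [100, 100]]
O = [[110, 90], [100, 100]]

mse_value = mse(O, R)
rmse_value = rmse(O, R)
psnr_value = psnr(O, R)
print(mse_value, rmse_value, psnr_value)
```

A lower MSE or RMSE and a higher PSNR indicate that the output image is closer to the reference image; for the toy images the MSE is 50 and the PSNR is about 31.1 dB.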

MI is applied to measure the similarity of image intensity between the output images and reference images,

$$ MI = \sum\limits_{i = 0}^{MAX - 1} {\sum\limits_{j = 0}^{MAX - 1} {P_{OR} \left( {i,j} \right)} } \log_{2} \frac{{P_{OR} \left( {i,j} \right)}}{{P_{O} \left( i \right)P_{R} \left( j \right)}}{,} $$

where \(P_{O} \left( i \right)\) is the probability that the pixel value \(i\) appears in the output image \(O\), \(P_{R} \left( j \right)\) is the probability that the pixel value \(j\) appears in the reference image \(R\), and \(P_{OR} \left( {i,j} \right)\) is the joint probability that a pixel has the value \(i\) in the output image and the value \(j\) at the same position in the reference image.
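The probabilities in the MI formula can be estimated from the joint histogram of the two images. The sketch below is illustrative and makes two assumptions: images are nested lists of integer gray levels, and only value pairs that actually occur are summed, since pairs with zero joint probability contribute nothing.

```python
import math
from collections import Counter


def mutual_information(output, reference):
    """Mutual information (in bits) between two images of equal size."""
    o = [v for row in output for v in row]
    r = [v for row in reference for v in row]
    n = len(o)
    joint = Counter(zip(o, r))   # joint histogram, estimates P_OR(i, j)
    hist_o = Counter(o)          # estimates P_O(i)
    hist_r = Counter(r)          # estimates P_R(j)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log2(p_ij / ((hist_o[i] / n) * (hist_r[j] / n)))
    return mi


# Hypothetical images with two gray levels.
reference = [[0, 255], [0, 255]]
identical = [[0, 255], [0, 255]]     # same intensity pattern as the reference
unrelated = [[0, 0], [255, 255]]     # intensities statistically independent

mi_identical = mutual_information(identical, reference)
mi_unrelated = mutual_information(unrelated, reference)
print(mi_identical, mi_unrelated)
```

When the output reproduces the reference, the MI equals the entropy of the image (1 bit for two equally likely gray levels); when the intensities are statistically independent, the MI is 0.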

It is worth noting that these metrics are based on the global statistics of image pixel values. That is, they treat the image as a set of isolated pixels and ignore the visual features contained in the image content, especially the local structure information of the image. To address this issue, the structural similarity (SSIM) index (Wang et al. 2004) is proposed:

$$ SSIM = \frac{\left( 2\mu_{O} \mu_{R} + C_{1} \right)\left( 2\sigma_{OR} + C_{2} \right)}{\left( \mu_{O}^{2} + \mu_{R}^{2} + C_{1} \right)\left( \sigma_{O}^{2} + \sigma_{R}^{2} + C_{2} \right)}, $$

where \(C_{1}\) and \(C_{2}\) are constants, \(\mu_{O}\) and \(\mu_{R}\) are the pixel means of the output and reference images, \(\sigma_{O}^{2}\) and \(\sigma_{R}^{2}\) are their pixel variances, and \(\sigma_{OR}\) is their covariance. SSIM extracts the structural information in the scene adaptively and measures image similarity in terms of brightness, contrast and structure.
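The SSIM formula can be evaluated directly from the means, variances and covariance of the two images. The sketch below is a simplified, single-window version that uses the statistics of the whole image; practical implementations usually compute SSIM in local windows and average the results. The constants \(C_{1} = (0.01 \cdot MAX)^{2}\) and \(C_{2} = (0.03 \cdot MAX)^{2}\) are the commonly used defaults and are an assumption here, as are the toy images.

```python
def ssim_global(output, reference, max_value=255):
    """Single-window SSIM computed from whole-image statistics."""
    o = [v for row in output for v in row]
    r = [v for row in reference for v in row]
    n = len(o)
    mu_o = sum(o) / n
    mu_r = sum(r) / n
    var_o = sum((v - mu_o) ** 2 for v in o) / n
    var_r = sum((v - mu_r) ** 2 for v in r) / n
    cov = sum((a - mu_o) * (b - mu_r) for a, b in zip(o, r)) / n
    # Assumed default stabilizing constants.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_o * mu_r + c1) * (2 * cov + c2)
    denominator = (mu_o ** 2 + mu_r ** 2 + c1) * (var_o + var_r + c2)
    return numerator / denominator


# Hypothetical reference image and an output with reversed structure.
reference = [[50, 100], [150, 200]]
reversed_output = [[200, 150], [100, 50]]

ssim_same = ssim_global(reference, reference)
ssim_reversed = ssim_global(reversed_output, reference)
print(ssim_same, ssim_reversed)
```

Both toy images have the same mean and variance, so the brightness and contrast terms agree; the negative covariance of the reversed image alone drives the SSIM from 1 to a negative value, which is the structural information that MSE-type metrics do not isolate.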

4.3.2 Without reference image

In general, since ideal images are difficult to obtain, evaluation metrics that are completely independent of reference images are widely used, such as Entropy (EN) (Kvalseth 1987), Standard Deviation (SD) (Huang and Jing 2007) and Spatial Frequency (SF) (Yang et al. 2010). The details are as follows:

EN refers to the average amount of information in an image. It measures the amount of information in an image from the perspective of information theory,

$$ EN = - \sum\limits_{i = 0}^{MAX - 1} p_{i} \log_{2} p_{i}, $$

where \(p_{i}\) is the probability that the pixel value \(i\) appears in the image.

SD is used to measure the contrast of the output image,

$$ SD = \sqrt {\frac{1}{MN}\sum\limits_{m = 1}^{M} {\sum\limits_{n = 1}^{N} {\left[ {O\left( {n,m} \right) - \mu } \right]^{2} } } } {,} $$

where the image is of size \(M \times N\), \(O\left( {n,m} \right)\) is the pixel value at position \(\left( {n,m} \right)\), and \(\mu\) is the mean pixel value of the image.

SF reflects the rate of change of image gray level,

$$ SF = \sqrt{RF^{2} + CF^{2}}, $$
$$ RF = \sqrt{\frac{1}{MN}\sum\limits_{m = 1}^{M} \sum\limits_{n = 1}^{N - 1} \left[ O\left( n,m \right) - O\left( n + 1,m \right) \right]^{2}}, $$
$$ CF = \sqrt{\frac{1}{MN}\sum\limits_{m = 1}^{M - 1} \sum\limits_{n = 1}^{N} \left[ O\left( n,m \right) - O\left( n,m + 1 \right) \right]^{2}}. $$
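The three no-reference metrics can be sketched on a single image as follows. This is a minimal illustration with images as nested lists of integer gray levels; the function names and toy images are hypothetical, and the differences in the spatial frequency are taken only between neighboring pixels that both lie inside the image.

```python
import math
from collections import Counter


def entropy(img):
    """EN: Shannon entropy (in bits) of the gray-level histogram."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def standard_deviation(img):
    """SD: spread of the pixel values around the image mean."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    mu = sum(pixels) / n
    return math.sqrt(sum((v - mu) ** 2 for v in pixels) / n)


def spatial_frequency(img):
    """SF: combines squared differences of neighbors along both image axes."""
    rows, cols = len(img), len(img[0])
    n = rows * cols
    # Mean squared difference between horizontally adjacent pixels.
    horizontal = sum((img[i][j] - img[i][j + 1]) ** 2
                     for i in range(rows) for j in range(cols - 1)) / n
    # Mean squared difference between vertically adjacent pixels.
    vertical = sum((img[i][j] - img[i + 1][j]) ** 2
                   for i in range(rows - 1) for j in range(cols)) / n
    return math.sqrt(horizontal + vertical)


# Hypothetical images: one with a sharp vertical edge, one completely flat.
edge = [[0, 255], [0, 255]]
flat = [[128, 128], [128, 128]]

print(entropy(edge), standard_deviation(edge), spatial_frequency(edge))
print(entropy(flat), standard_deviation(flat), spatial_frequency(flat))
```

The flat image scores 0 on all three metrics, whereas the image with a sharp edge has an entropy of 1 bit, a standard deviation of 127.5 and a non-zero spatial frequency, reflecting its higher information content, contrast and gray-level activity.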

5 Critical discussions and future research direction

5.1 Advantages of using fuzzy deep learning for uncertain medical data

  1) Deep learning plays an important role in intelligent medical research. In medical imaging, the most widely used application scenario, deep learning can make a difference at three levels: the first level is to detect the lesion, that is, to identify and segment suspected lesions (Carvalho et al. 2020); the second level is to diagnose the lesion, assisting physicians in distinguishing benign from malignant diseases and in conducting classification and staging (Teng et al. 2022); and the third level is to make treatment decisions, where correlation analysis of imaging data and clinical data is expected to help clinicians make scientific and reasonable treatment decisions and prognosis expectations in the future (Tran et al. 2021). Compared with traditional medical imaging management technology, a medical imaging artificial intelligence system combined with deep learning offers high accuracy, efficiency and reliability, as well as reusability, portability and continuity, which are beyond the reach of human imaging doctors. As for electronic medical records, in addition to structured basic patient information, they also include unstructured diagnostic information, e.g., medication information, examination information and clinical records. Deep learning, with its ability to analyze large amounts of structured and unstructured data, enables faster and more effective use of information from different sources. Through feature extraction and algorithm optimization, a disease can be analyzed on a large scale to obtain comprehensive information such as etiology, incidence and medication effect. Moreover, deep learning is used to predict in-hospital mortality (Shickel et al. 2019), re-hospitalization (Jiang et al. 2018), prolonged stay and discharge diagnosis (Rajkomar et al. 2018) from large numbers of heterogeneous electronic medical records. Continuous analysis can also be performed on the same patient at different time points. Therefore, deep learning is of great significance to personalized treatment, disease prediction and clinical diagnosis.

  2) Fuzzy deep learning shows great potential in handling uncertain medical data, especially in medical imaging and electronic medical records. Conventional medical images are divided into four types: X-ray imaging, ultrasound imaging, CT imaging, and MRI. For these images, boundary point processing is one of the most important tasks in segmentation. However, in almost all circumstances, the boundary of the object in the image is vague and unclear (Lee et al. 2020). Therefore, an image edge detection algorithm based on type-2 fuzzy logic is developed to detect blood vessels in retinal images (Orujov et al. 2020), and an AlexNet deep CNN with stochastic gradient descent is trained for soft tissue sarcoma classification, where the low-frequency sub-image is processed by local energy and type-2 fuzzy entropy, and the high-frequency sub-image is selected according to the maximum absolute value (Hermessi et al. 2019). Fuzzy logic improves image segmentation for the following reasons (Huang et al. 2021): a) the pyramid fuzzy block can find uncertain pixels and reduce their weights; b) fuzzy entropy can accurately measure the uncertainty of pixels. Furthermore, inevitable data ambiguity and heavy noise easily lead to unpredictable uncertainty, which is a key problem in medical data processing. To overcome the uncertainties in the original data, fuzzy learning, including fuzzy clustering and fuzzy neural networks, is established; it achieves favorable performance in a few-shot learning manner and improves cross-dataset generalization in classification and disease grading tasks.

  3) Fuzzy deep learning improves the interpretability of the learning structure, which is crucial for medical applications. Because fuzzy mathematics is embedded, the interpretability of the whole model is further enhanced (Shen et al. 2020). In multi-modal medical image fusion and classification, fuzzy set theory can be adopted to remove uncertainty (Hermessi et al. 2019). For text information, individual medical judgment is described through fuzzy linguistic variables and then expressed as membership degrees in a neural network model (Li et al. 2021a, b, c). Furthermore, a system combining fuzzy rules with deep learning is capable of dealing with big data with heterogeneous mixed categorical and numeric attributes in electronic health records, where the hidden layers in each unit are represented by interpretable fuzzy rules (Davoodi and Moradi 2018). In a nutshell, fuzzy logic is used to reduce the uncertainties and improve the interpretability of deep learning, while hierarchical information extraction in deep learning is used to reduce noise in the raw data.

5.2 Challenges of using fuzzy deep learning for uncertain medical data

Although fuzzy deep learning is effective in handling uncertain medical data, some challenges need further consideration.

  1) The membership function in fuzzy deep learning, which is used to depict the features extracted from uncertain medical data, is often preset and obtained from the domain knowledge of experts; it is therefore highly subjective and does not suit every circumstance. For example, a designed fuzzy logic system has four input membership functions and one output membership function (Yalcinkaya and Erbas 2021), but the fuzzy logic and the membership functions are both defined in advance, and only one membership function is produced from the graphical representation of the fuzzy variable sets. Similarly, fuzzy neural networks consist of an input layer that fuzzifies input features into membership functions using premise parameters, several hidden layers representing fuzzy rules, and an output layer acting as a defuzzification operator (Shen et al. 2020). As these examples show, the membership function in fuzzy deep learning is usually determined by preset premise parameters or by the domain knowledge of experts. If the membership function could instead be learned from medical data by deep learning, it would act as a feature extracted from, and suited to, the particular data, and the fuzzy learning results would be more precise and more robust.

  2) Fuzzy deep learning increases the computational complexity compared with traditional deep learning. Deep learning already has a complex architecture, and fuzzy systems bring more parameters to be adjusted in the learning structure. Therefore, the computational cost increases because of the complicated structure and learning style. For example, membership functions lead to more complicated algorithms, such as LSFC (Guan et al. 2020). ANFIS, a typical model that integrates deep learning and fuzzy systems, also faces some challenges (Ciftcioglu et al. 2007; Salleh et al. 2017). In short, membership functions can make the computational complexity uncertain, and fuzzy deep learning may result in a complex model structure.

  3) Fuzzy deep learning cannot handle all uncertain medical data in special or extreme circumstances, such as outliers or novelties. A data item may be outlying because of noise or error, but sometimes because something genuinely unusual has happened (Liu et al. 2018; Li et al. 2021a, b, c). On the one hand, although fuzzy deep learning can extract features from massive medical data, outliers and novelties are exceptional cases in medical circumstances. Because the number of these cases is small, outliers are hard to learn by deep learning, let alone by fuzzy deep learning. On the other hand, outliers that are special medical cases or medical images may have been deleted during the learning process, so fuzzy deep learning cannot handle all uncertain medical images or text. If the amount of valid data is small, deep learning generally exhibits undesirable performance because of overfitting. In that case, the features need to be learned and extracted by models combined with few-shot learning.

5.3 Future research directions

Based on the above critical discussion of the advantages and challenges of fuzzy deep learning on uncertain medical data, some ideas and directions for future research are given below.

  1. 1)

    Develop fuzzy deep learning for drug selection and research, disease treatment decisions, and genetic data processing. As the study of particular diseases becomes more thorough and the accuracy of deep learning models gradually improves, a platform for predictive diagnostics and treatment decision-making across various diseases is expected to be established in the future, starting from the actual needs of hospitals and patients and based on integrated disease analysis and individual information. The platform can also provide useful information on how to select drugs for clinical therapy. Furthermore, genetic testing based on next-generation sequencing (NGS) technology has revolutionized clinical molecular testing, and medical data supported by big genetic data will make drug research and development more accurate (Yang et al. 2021).

  2. 2)

    Combine deep learning with complex cognition information, and apply decision-making theory based on complex cognition to intelligent medical assisted diagnosis. Decision-making theory based on complex cognition is a frontier research direction in the field of intelligent decision-making, and intelligent medical decision-making in a complex cognition environment is a vital research topic. In the process of early disease screening and diagnosis, there is a wide range of complex uncertain information, such as unstructured data in medical text and medical images, multi-source heterogeneous data, high-dimensional missing data, and blurred boundary information. As a new and feasible representation, complex cognition theory can not only depict objective uncertain information, in the form of the intuitionistic fuzzy set (Jiang, Jin et al.), hesitant fuzzy set (Wang et al. 2022a, b, c), probabilistic hesitant fuzzy set (Zhang et al. 2017), etc., but also describe the subjective fuzzy information of decision makers, in the form of the hesitant fuzzy linguistic term set (Zheng et al. 2018; Zheng et al. 2021a, b), hesitant fuzzy linguistic term set with granularity (Zheng et al. 2022a, b, c), probabilistic linguistic term set (Li et al. 2020a, b; Zheng et al. 2021a, b), etc. These representations can deal accurately and comprehensively with the uncertain and fuzzy information that arises in medical diagnosis decision-making.

  3. 3)

    Improve health management in Traditional Chinese Medicine with fuzzy deep learning. Traditional Chinese Medicine health management (Song et al. 2018) uses the core ideas of preventive treatment of disease, the holistic concept, and syndrome differentiation to provide healthy and sub-healthy populations with Traditional Chinese Medicine health consultation, guidance, diagnosis and treatment. The effect of Traditional Chinese Medicine has been scientifically proven, and it has made a great difference in the treatment of COVID-19 (Li et al. 2020a, b). Meanwhile, the National Health Commission and the State Administration of Traditional Chinese Medicine issued the “Notice on the issuance of guidance suggestions on Traditional Chinese Medicine rehabilitation in the convalescent period of COVID-19” (https://www.cn-healthcare.com/articlewm/20200225/content-1090467.html?appfrom=jkj). It is worth noting that most of the diagnostic information in Traditional Chinese Medicine health management is given as linguistic descriptions and is highly individualized. Artificial intelligence technology and fuzzy linguistic models help in judging the Traditional Chinese Medicine constitutional type (Li et al. 2021a, b, c). It can be expected that fuzzy deep learning will make a greater difference in Traditional Chinese Medicine health management.

  4. 4)

    Propose novel methods that combine the most advanced fuzzy tools with deep learning to deal with few-shot learning. For instance, regularization parameters can be set by a fuzzy logic method to improve model performance, and the segmentation and classification of CT images can be realized by automatic feature extraction. To better deal with few-shot learning settings involving low-quality images and fuzzy boundaries, it is a good idea to combine deep learning with other fuzzy tools such as hesitant fuzzy information (Mo et al. 2020), hesitant fuzzy linguistic information (Zheng et al. 2022a, b, c), probabilistic hesitant fuzzy information (Zhu and Xu 2018), and probabilistic linguistic information (Pang et al. 2016). Compared with traditional fuzzy techniques, e.g., the fuzzy set (Song et al. 2022a, b) or type-2 fuzzy set (Shen et al. 2020), the state-of-the-art fuzzy tools can not only depict the uncertainty and ambiguity of medical images using membership degrees, but also describe the uncertainty of the membership degree itself when describing fuzzy image information. Therefore, novel methods that combine state-of-the-art fuzzy tools with deep learning can obtain more accurate image analysis results, enhance the rationality, precision, and effectiveness of intelligent decision-making methods for medical auxiliary diagnosis, save decision time, and reduce the secondary damage caused by decision-making errors.
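To make the fuzzy representations named in directions 2) and 4) more concrete, the sketch below encodes three of them as small Python structures. It is an illustrative assumption rather than an implementation from any cited work: the intuitionistic fuzzy value follows the standard definition (membership, non-membership, and the remaining hesitancy), while the two score functions use the arithmetic mean and the probability-weighted mean, which are common choices in the literature but not the only ones. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IntuitionisticFuzzyValue:
    """Membership and non-membership degrees with membership + non_membership <= 1."""
    membership: float
    non_membership: float

    def __post_init__(self):
        if not (0.0 <= self.membership <= 1.0 and 0.0 <= self.non_membership <= 1.0):
            raise ValueError("degrees must lie in [0, 1]")
        if self.membership + self.non_membership > 1.0 + 1e-12:
            raise ValueError("membership + non_membership must not exceed 1")

    @property
    def hesitancy(self) -> float:
        """Part of the evidence that is neither for nor against membership."""
        return 1.0 - self.membership - self.non_membership


def hesitant_score(degrees: Sequence[float]) -> float:
    """Score of a hesitant fuzzy element: mean of its candidate membership degrees."""
    if not degrees:
        raise ValueError("a hesitant fuzzy element needs at least one degree")
    return sum(degrees) / len(degrees)


def probabilistic_hesitant_score(degrees: Sequence[float],
                                 probabilities: Sequence[float]) -> float:
    """Probability-weighted mean of the candidate degrees.

    Assumption: probabilities are normalised by their sum, so partially
    specified probability information (sum below 1) is still accepted.
    """
    if not degrees or len(degrees) != len(probabilities):
        raise ValueError("degrees and probabilities must be non-empty and equal in length")
    total = sum(probabilities)
    if total <= 0.0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(d * p for d, p in zip(degrees, probabilities)) / total


# Hypothetical example: three readers disagree on how strongly a boundary
# pixel belongs to a lesion region, so the pixel keeps all three degrees.
pixel_degrees = [0.5, 0.7, 0.9]
print(hesitant_score(pixel_degrees))
print(probabilistic_hesitant_score([0.5, 0.9], [0.25, 0.75]))
print(IntuitionisticFuzzyValue(0.6, 0.3).hesitancy)
```

In a fuzzy deep learning pipeline, such structures could replace the single membership degree attached to each pixel or text feature, which is what allows the uncertainty of the membership degree itself to be carried through to the decision step.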

6 Conclusions

This paper presents a comprehensive and critical review of fuzzy deep learning for uncertain medical data. The main contributions of the paper are as follows: 1) constructing four types of frameworks of fuzzy deep learning models used for uncertain medical data; 2) surveying fuzzy deep learning for uncertain medical data, including widely used fuzzy deep learning models, uncertain medical data and application scenarios; 3) presenting evaluation metrics for classification, segmentation and fusion; 4) providing critical discussions on the advantages, challenges and future research directions of fuzzy deep learning for uncertain medical data. Based on the analysis of recent research results, we conclude that fuzzy CNN, fuzzy DNN, fuzzy LSTM, fuzzy AE, fuzzy system, and other neuro-fuzzy approaches are widely used to handle different types of uncertain medical data, including images, text records, video files, high-throughput data, and tabular data, for segmentation, classification, natural language processing, prediction and fusion. Then, performance evaluation metrics of fuzzy deep learning models are analyzed in detail from three perspectives: classification-based evaluation metrics, segmentation-based evaluation metrics, and fusion-based evaluation metrics. Finally, some critical reviews are provided, covering the advantages, challenges and future research directions of fuzzy deep learning for uncertain medical data. We have found that fuzzy deep learning indeed improves interpretability, the handling of low-quality images, boundary point processing, the processing of imprecise text records, and multi-source heterogeneous data fusion in the medical domain. However, some challenges still exist, and future research directions are provided from the perspectives of genetic data processing, intelligent medical decision-making based on complex cognition information, health management in Traditional Chinese Medicine, and few-shot learning.

To sum up, fuzzy deep learning has been successfully applied to uncertain medical data. With the development of artificial intelligence and fuzzy systems, medical diagnosis and treatment will become more intelligent, and deeper and more extensive research on fuzzy deep learning for uncertain medical data will remain an important topic for years to come.