1. Introduction
Diabetic retinopathy (DR) is an ocular complication of diabetes that can lead to blindness. According to the WHO report on diabetes, about 415 million people suffer from diabetes mellitus, and the occurrence of this disease has doubled in the last three decades. People with diabetes between the ages of 20 and 74 may become blind due to diabetic retinopathy. Many studies have highlighted that an early diagnosis could save 90% of diabetic patients from vision loss. People with diabetes are at increased risk of developing DR [
1]. Like other body tissue, microvascular tissue receives its blood supply through the blood vasculature. The retinal tissue is supplied with blood via microscopic blood vessels, and a stable blood glucose level is needed for continuous blood flow. When glucose or carbohydrates accumulate in the blood, the microscopic blood vessels begin to break down because oxygen is inadequately distributed to the cells. Any obstruction in these vessels results in serious injury to the retina.
Consequently, the microvasculature fails to supply the retina with nutrients as normal, leading to ischemia, or reduced blood flow [
2].
Figure 1 illustrates the difference between a normal and diabetic retina.
Figure 1a depicts a normal retina free of DR signs. Meanwhile,
Figure 1b illustrates a retina with numerous DR symptoms, including hemorrhages, microaneurysms, cotton wool patches, and exudates.
In recent years, advances in computer-aided diagnosis (CAD) have led to the development of automated approaches for detecting and grading DR using fundus images [
3]. Segmentation of blood vessels (BVs), optic discs (ODs), and lesions from fundus images is among the significant problems in developing a CAD system for DR detection [
4]. Traditional machine learning (ML) approaches are only practical when the handcrafted features are carefully selected [
5]. Generalizing this feature extraction and selection process is very time-consuming and complicated. Deep learning (DL) is a powerful tool for automatically extracting features from images, and it has lately shown impressive accuracy in classifying medical images [
6]. The effectiveness of DR detection and grading systems varies with the dataset size, the deep neural network (DNN) architecture, and parameter tuning. In addition, the overall performance is determined by the efficacy of several steps, such as OD extraction, BV segmentation, and DR classification.
A new two-stage approach was developed to detect bright lesions in retinal fundus images [
7]. For DR diagnosis, an intelligent decision support model was used that applies texture and color characteristics to distinguish between exudate and non-exudate pixels. First, edge detection and modulation operations are used for OD segmentation. Second, periodic energy and color measurements are performed to gather tissue characteristics from the retinal region. Finally, a fuzzy SVM classifier is used for classification. In [
8], an automatic decision support system (DSS) is designed to detect microaneurysms and hemorrhages. The severity of DR depends on the position and number of microaneurysms and hemorrhages. The technique was tested on 98 fundus images. The experimental results showed that the proposed system reached 87.53% and 84.31% sensitivity and 95.08% and 93.63% specificity for detecting hemorrhages and microaneurysms, respectively. Numerous research projects have used OD segmentation for DR detection. In [
9], an approach for detecting DR based on OD and blood vessel segmentation is discussed. The watershed transform and a radial basis function neural network (RBFNN) were used to classify OD and DR, respectively, with a sensitivity of 87% and specificity of 93%.
In [
9], Wang introduced a deep learning framework for identifying the optic disc in DR. The author constructed a CNN structure based on the U-Net model for the explicit recognition of optic discs. In this work, CNNs independently processed the grayscale and color retinal fundus images to obtain different segmentation results. Using the U-Net framework, the author described a procedure for recognizing local image regions that are used for further segmentation. In [
10], the authors perform OD and optic cup (OC) identification by computing the segmented areas of the OD and OC. The watershed transform and morphological filtering are used to locate and identify the OD. The technique was verified on 30 color retinal fundus photographs and obtained a sensitivity and a mean predictive performance of 92.8% and 92.4%, respectively [
11]. In [
12], the authors used a multi-model to segment MRI tumor images based on ML, DL and conventional methods. Moreover, in [
13], the authors modify the basic U-Net model using residual and identity mapping for the segmentation of microaneurysms. The Harris corner detector was used to find the OD center, achieving 97.8% for the local dataset, 97.5% for the DRIVE and STARE datasets, as well as 86.75% for the DRIVE and STARE datasets, separately [
14].
In certain fundus images, the optic disc is brighter than the background; hence, inaccurate optic disc segmentation can lead to incorrect identification of bright lesions, such as microaneurysms (MA) and hemorrhages (HM). In contrast, the blood vessels are darker than the background, which can cause confusion with exudates (EX) [
15]. To overcome such limitations, a novel architecture for DR detection and classification is proposed in this study. In the first phase, data were preprocessed and augmented using Gaussian scale-space theory and other general augmentation settings. During preprocessing, we used two different U-Net models to segment the optic disc and the blood vessels. The salient features were then extracted using the hybrid CNN-SVD approach, which combines a deep CNN (DCNN) model with singular value decomposition (SVD). Finally, transfer learning (TL)-based Inception-V3, GoogLeNet, AlexNet, and ResNet models are used to classify fundus images. In total, 11,841 retinal fundus images from three publicly available datasets (Messidor-2, EyePACS-1, and DIARETDB0) have been used to train the proposed method. The model’s effectiveness is assessed using precision, F1-score, sensitivity, accuracy, specificity, and area under the curve (AUC). The major contributions of this study are as follows:
A novel two-stage classification system for diabetic retinopathy is proposed.
After preprocessing and data augmentation, we used two independent U-Net models to segment the optic disc and blood vessels.
The features from the fundus images (FIs) were extracted using a novel hybrid CNN-SVD created in this study. In total, 256 features were extracted from the processed FIs using the CNN. Then, by selecting the most significant characteristics, SVD reduces these features to 100, reducing the model complexity and enhancing model performance.
The transfer learning (TL)-based Inception-V3 model is used to classify fundus images.
A comprehensive DR categorization system is created that is accurate, dependable, and intuitive.
Three large public datasets were used to test the model’s performance: Messidor-2, EyePACS-1, and DIARETDB0.
The diagnostic capabilities of the proposed model are verified using several performance metrics: precision, F1-score, sensitivity, accuracy, specificity, and AUC.
The remainder of the paper is organized as follows: the current research status is given in
Section 2. The datasets used in this study are described in
Section 3. The proposed methodology is presented in
Section 4. The results and discussions are explained in
Section 5, and the study is concluded in
Section 6.
2. Current Research Status
Retinal image analysis has received a lot of attention in recent decades, and the automated diagnosis of diabetic retinopathy in particular has attracted significant interest [
16]. The following subsections briefly review the most popular methods for detecting the major DR characteristics (hemorrhages, microaneurysms, and exudates).
2.1. DR Detection Based on Classical Machine Learning
Over the last several years, researchers have established numerous predictive models based on machine learning to help ophthalmologists detect and classify diabetic retinopathy. Much work has been performed on the early diagnosis of diabetic retinopathy and multi-stage DR classification using handcrafted feature extraction. Microaneurysm detection is vital for early DR diagnosis. A K-nearest neighbor (KNN) classifier was used to detect microaneurysms in [
17]. Morphological operators have been introduced for MA detection in fundus images; the method achieved 81.61% sensitivity, 99.99% specificity, and 63.76% precision [
18]. Deep learnable features were extracted from retinal fundus images, and ResNet-50 with an SVM was used to detect exudates. The author summarizes different CNN methods and applies efficient ones to differentiate exudates. The reported accuracy of the method is 98% [
19].
Exudates, the bright white structures on the retina, are among the most critical indicators of DR. Based on previous research, exudate detection methods can be divided into three groups: machine learning (ML)-based, mathematical morphology-based, and pixel-based. In [
20], a matched filter was used to exclude the vessels and the optic disc before detecting exudates. A random forest algorithm was employed to locate the exudate area using the saliency map, and it was shown to be 79% efficient. To identify exudates, [
21] used mathematical morphology as well as SVM classifiers. Using a private dataset, the author assessed the classifier’s accuracy and found that it had an AUC of 95%. For exudate detection, a number of ML-based algorithms have demonstrated significant results [
22].
In [
23], the authors proposed an automated red lesion detection strategy for diabetic retinopathy, covering hemorrhages and microaneurysms, using retinal images. In this approach, Frangi-based filtering was applied to identify blood vessels. In the initial phase, the input image was decomposed into small sub-images, and filtering was applied to each sub-image. The resulting features were fed into an SVM to classify whether the input images contained lesions. The experiments were conducted separately on 143 fundus photos and achieved accuracies of 97% and 87% for microaneurysms and hemorrhages, respectively. The literature shows that combining high-level and low-level features may improve diagnostic accuracy. Moreover, in [
24], the authors use SVM voting methods to detect and classify bright and red lesions using the IDRiD dataset.
ML techniques for identifying and classifying diabetic retinopathy face three main challenges. First, the extracted handcrafted features need to be verified by ophthalmologists, which depends on the subjective judgment of the expert, takes time, and increases the workload of retinal experts. Second, baseline methods are limited in generalization and robustness, since most are trained on small datasets. Third, clinical signs of diabetic retinopathy are ambiguous, and the size of the blood vessels in the retinal image is inadequate for expert graders. As a result, extracting DR-indicating retinal biomarkers from fundus images is difficult. Deep learning models, in contrast, automate both the feature extraction and the classification process. The following section covers the details of different DL models used for DR classification.
2.2. DR Detection Based on Deep Learning
Deep learning methods are widely used to solve various medical image analysis problems while avoiding the drawbacks of conventional ML methods. In contrast to ML models, deep learning models quickly learn high-level features from retinal images without human involvement. Deep learning models were developed for lesion detection with image patch classification [
25]. In this procedure, 243 retinal images, verified by ophthalmologists, were tested. The Kaggle input dataset is split into image patches containing microaneurysms, hemorrhages, exudates, and normal retinal structure. The authors use CNNs to detect and classify lesions into five grades. An ensemble-based framework to improve microaneurysm detection has also been proposed [
26]. The findings of [
27,
28] are particularly noteworthy because they represent the culmination of a thorough systematic review and meta-analysis. The first paper used the EyePACS dataset of 35,126 images to train a VGG16 CNN with binary cross-entropy as the loss function. The authors also ran two experiments in which they combined VGG16 with a linear support vector machine (SVM) and with a softmax fully connected output layer; the SVM approach yielded the highest specificity and sensitivity, 93.0% and 85%, respectively. In the second paper, a CNN was combined not only with an SVM but also with Teaching Learning Based Optimization (TLBO). Results from a binary classification experiment using the same EyePACS dataset yielded a specificity of 90.89%, accuracy of 91.05%, and sensitivity of 89.30%.
In [
29], Lončarić S. and Prentašić P. proposed an automated exudate detection technique based on a DCNN. In the proposed exudate extraction method, a CNN was used for feature extraction, and an SVM classifier was used for classification. Grayscale morphological operations were applied in particular areas, and an active contour model was used to identify the exudate boundaries. After this, a Naïve Bayes classifier was applied for region-wise classification of exudates [
30]. In [
31], the authors proposed U-Net and transfer learning-based models to classify DR. A convolutional neural network-based methodology for retinal image quality assessment was developed [
32]. In addition, the reported method relied on saliency maps to gather unsupervised information for making decisions on retinal image quality. The saliency maps capture local as well as global information on retinal images at various scales for each pixel. Reference [
33] describes a method for detecting diabetic retinopathy utilizing shallow convolutional networks, with 85% accuracy using 35,000 images. In [
34], the authors used the evolutionary algorithm grey wolf optimization (GWO) with a CNN for the classification of DR, and later, in [
35], they improved the GWO for binary classification of DR. The authors of [
3] presented a lightweight CNN trained on 768 FIs, yielding an accuracy of 88.4%. A weighted path CNN was proposed in [
36] for binary classification employing the STRAE dataset with 60,000 images, yielding 90.84% accuracy and a 93.4% F1 score.
Deep learning models often use several image patches to perform image-wise classification, either by integrating short-distance dependencies or through ensemble approaches such as SVM or majority voting. All such approaches disregard long-distance dependencies. In addition, the other ensemble methods typically pool the final fully connected layer into a one-dimensional feature vector, which is unsuitable for patch-wise synthesis. To address these constraints, we present new techniques for addressing the shortage of annotated data, selecting a relevant area of interest, and improving the classification performance.
4. Methodology
This paper presents a novel methodology for detecting diabetic retinopathy (DR) using retinal fundus images. Three publicly available datasets, Messidor-2, EyePACS-1, and DIARETDB0, are used. The quality of the FIs is enhanced by preprocessing techniques, such as image scaling, green channel extraction (GCE), and top-bottom hat transformation. In addition, two different U-Net models are presented for extracting the OD and the BVs from the enhanced FIs during preprocessing to counteract the influence of retinal biomarkers on DR detection. The improved image is obtained after performing the preprocessing techniques and extracting the OD and the blood vessels. A novel hybrid CNN-SVD model is then applied for feature extraction and for choosing the most suitable features. Finally, the improved Inception-V3 model based on transfer learning is used to diagnose DR using the improved image dataset. The performance metrics used to evaluate the suggested approach include sensitivity, accuracy, precision, F1-score, specificity, and area under the curve (AUC).
Figure 4 depicts a flowchart of the proposed methodology.
4.1. Preprocessing and Data Augmentation
4.1.1. Preprocessing
Messidor-2, DIARETDB0, and EyePACS-1 were used to test the efficiency of the proposed technique. This research considers 10,966 retinal fundus images (Messidor-2: 1748, EyePACS-1: 9088, and DIARETDB0: 130 images).
Table 1 shows the distribution of the EyePACS-1, DIARETDB0, and Messidor-2 datasets. The performance of a deep learning model can be affected by differing FI sizes. To solve this problem, we scaled all images to the same size (256 × 256). Directly resizing FIs is difficult because significant blood vessels, as well as the optic disc, can vanish. The retinal FIs were therefore resized using bicubic interpolation while maintaining the aspect ratio. As demonstrated in
Figure 5, the green channel of the FIs carries more information than the red and blue channels, making it an ideal choice for our investigation. A top-bottom hat transformation improves the quality of the retinal image.
Figure 6 depicts the use of several preprocessing steps.
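The green channel extraction and top-bottom hat enhancement described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the flat square structuring element, its size, the edge padding, and the combination image + top-hat − bottom-hat are assumptions made for illustration, and bicubic resizing is omitted because it would require an image library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def green_channel(rgb):
    """Return the green channel of an H x W x 3 RGB image as float64."""
    return np.asarray(rgb, dtype=np.float64)[..., 1]


def _windows(img, size):
    """All size x size neighbourhoods of img (size must be odd), edge-padded."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (size, size))


def erode(img, size):
    """Grayscale erosion with a flat square structuring element."""
    return _windows(img, size).min(axis=(-2, -1))


def dilate(img, size):
    """Grayscale dilation with a flat square structuring element."""
    return _windows(img, size).max(axis=(-2, -1))


def top_bottom_hat(img, size=5):
    """Contrast enhancement (assumed form): img + top-hat - bottom-hat."""
    opened = dilate(erode(img, size), size)   # morphological opening
    closed = erode(dilate(img, size), size)   # morphological closing
    top_hat = img - opened                    # small bright details
    bottom_hat = closed - img                 # small dark details
    return np.clip(img + top_hat - bottom_hat, 0.0, 255.0)


# Synthetic stand-in for a fundus image: uniform background, one bright spot.
rgb = np.full((9, 9, 3), 100, dtype=np.uint8)
rgb[4, 4, 1] = 150
enhanced = top_bottom_hat(green_channel(rgb), size=3)
```

In this toy example the isolated bright pixel is amplified while the uniform background is left unchanged, which is the behaviour the enhancement step relies on.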
4.1.2. Data Augmentation
The size of the training dataset is one of the most critical factors in the effective training of DL models. Deep learning network training requires an extensive dataset to avoid overfitting and generalization difficulties. The dataset distribution across classes is considerably skewed, with most images coming from grade 0 (Normal). This extremely skewed dataset could lead to incorrect classification. We used data augmentation techniques to enlarge the retinal dataset at multiple scales and remove noise from fundus images. The primary data augmentation processes we performed are listed below.
Rotation: Images were rotated from 0 to 360 degrees at random.
Shearing: Images were sheared at a random angle ranging from 20 to 200 degrees.
Image flipping: Images were flipped horizontally and vertically.
Zoom: Images were randomly stretched in the (1/1.3, 1.3) range.
Cropping: Images were randomly cropped to 85–95% of their original length.
Image translation: Images were randomly moved between −25 and 25 pixels.
Several examples of postaugmentation images are shown in
Figure 7.
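The augmentation ranges listed above can be summarized as a small parameter sampler. This is a sketch only: the function name `sample_augmentation`, the dictionary keys, the uniform sampling, and the 0.5 flip probability are assumptions that the text does not specify.

```python
import random


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters from the stated ranges."""
    return {
        "rotation_deg": rng.uniform(0.0, 360.0),   # rotation: 0 to 360 degrees
        "shear_deg": rng.uniform(20.0, 200.0),     # shearing: 20 to 200 degrees
        "flip_horizontal": rng.random() < 0.5,     # assumed 50% probability
        "flip_vertical": rng.random() < 0.5,       # assumed 50% probability
        "zoom": rng.uniform(1 / 1.3, 1.3),         # zoom range (1/1.3, 1.3)
        "crop_fraction": rng.uniform(0.85, 0.95),  # crop to 85-95% of length
        "shift_x_px": rng.randint(-25, 25),        # translation in pixels
        "shift_y_px": rng.randint(-25, 25),
    }


# One parameter set per augmented copy; a seeded generator keeps it repeatable.
params = sample_augmentation(random.Random(0))
```

Each sampled dictionary would then be handed to whatever image library performs the actual geometric transforms.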
4.2. Optic Disc (OD) and Blood Vessel (BV) Segmentation
The variable intensity levels of FIs may hamper the extraction of specific biomarkers. The OD is brighter than the background, so incorrect segmentation of the OD might interfere with the detection of bright lesions, such as microaneurysms and hemorrhages. The BVs are darker than the background, which might cause confusion with exudates. To overcome the effect of these retinal biomarkers, we developed two separate U-Net models for the segmentation of the OD and the BVs. U-Net is a convolutional neural network (CNN) architecture for biomedical image segmentation. It is a distinct alternative to the standard CNN for disease detection and abnormality localization. Even with a small training database, it has shown significant effectiveness for various biomedical image segmentation tasks [
40,
41]. The U-Net design is shaped like a U and consists of two paths: a contracting path and an expansive path. Convolution, ReLU, and pooling operations are performed on the contracting path, while up-sampling is performed on the expansive path. The suggested model resizes the original image while preserving the aspect ratio to convey the exact contextual information of FIs.
Figure 8a,b show the architecture of the fundamental U-Net model for segmenting the OD and BVs, respectively.
4.3. Dimensionality Reduction Using CNN-SVD
When a dataset has many features, some of them contribute little to predicting the target variable or carry redundant data. The feature space strongly influences a classifier’s performance. This phenomenon is known as the curse of dimensionality. Dimensionality reduction methods are used to lower the complexity and time cost: they reduce the original feature space to a minimal one that retains the nonredundant data without considerable loss [
42]. Principal component analysis, linear discriminant analysis, and singular value decomposition (SVD) are a few well-known approaches for this purpose. In this study, a CNN was first used to extract features. The features were normalized once they were extracted. Finally, dimensionality reduction was achieved using SVD.
4.3.1. Feature Extraction by CNN from FIs
A basic CNN is proposed in this part to obtain the most salient features of FIs. If the essential characteristics that discriminate among the several DR stages are extracted, the classification performance of the model will improve. As a result, a simple CNN model was employed.
Figure 9 depicts the arrangement of the CNN feature extractor.
These derived characteristics can be effectively used to classify DR stages. Batch normalization and max-pooling layers have been applied to each CNN convolutional layer (CL). Batch normalization was used since it speeds up and enhances the model’s performance by re-centering as well as re-scaling the inputs of the layers [
43]. Max-pooling is used to extract the essential features from the processed FIs by selecting the largest value from each cluster of neurons [
44,
45]. Throughout the training phase, dropout prevents overfitting by randomly omitting nodes in each layer; this also speeds up the training process. Adam was chosen as the optimizer because of its excellent performance when working with large amounts of data [
46]. The final dense layer was then utilized to extract 256 discriminating features from every FI.
4.3.2. Features Reduction by SVD
This approach is based on the same basic notion as the FFT (Fast Fourier Transform). Mathematically, a matrix $D \in \mathbb{C}^{m \times n}$ can be factored into three matrices as $D = PQR^{*}$. Every matrix has such a factorization. Here, $P$ and $R$ are unitary matrices, and $R^{*}$ stands for the conjugate transpose of $R$, which reduces to the ordinary transpose for a real-valued matrix. A matrix $W$ is termed unitary if $WW^{*} = W^{*}W = I$. Let $Q$ be the diagonal matrix whose positive diagonal entries are in descending order, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_\rho > 0$, with zeros elsewhere. The number of positive diagonal entries, $\rho$, equals the rank of $D$. Put another way, a matrix’s rank indicates how many of its columns or rows are linearly independent. Like the FFT, $D$ may be stated in a series form. Let us write the three matrices as $P = [p_1, p_2, \dots, p_m]$, $Q = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_\rho, 0, \dots, 0)$, and $R = [r_1, r_2, \dots, r_n]$. Afterwards, they are multiplied:
$$PQR^{*} = [p_1, p_2, \dots, p_m]\;\mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_\rho, 0, \dots, 0)\;[r_1, r_2, \dots, r_n]^{*}$$
Hence, $PQR^{*}$ can be written as:
$$D = PQR^{*} = \sum_{i=1}^{\rho} \sigma_i \, p_i \, r_i^{*}$$
To transform $D$ into an optimal lower-rank approximation, SVD retains only the components of $Q$ whose values are larger than a given threshold.
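As a rough illustration of the reduction from 256 CNN features to 100, the following NumPy sketch keeps the 100 largest singular components of a feature matrix. The random matrix merely stands in for real CNN features, and both the function name `svd_reduce` and the choice of a fixed count k (rather than a threshold on the singular values) are assumptions.

```python
import numpy as np


def svd_reduce(D, k):
    """Project the rows of D onto the top-k right singular vectors.

    Returns the reduced features (n_samples x k) and all singular values.
    """
    # Compact SVD: D = P @ diag(q) @ Rh, with q sorted in descending order.
    P, q, Rh = np.linalg.svd(D, full_matrices=False)
    reduced = D @ Rh[:k].conj().T   # equals P[:, :k] * q[:k]
    return reduced, q


rng = np.random.default_rng(0)
features = rng.normal(size=(500, 256))   # stand-in for 256 CNN features per image
reduced, singular_values = svd_reduce(features, k=100)
print(reduced.shape)                      # (500, 100)
```

Because the singular values are returned as well, the same function could be used with a threshold by counting how many values exceed it and passing that count as k.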
4.4. Transfer Learning Models
It is usually unwise to train a CNN classifier from scratch on a small biomedical dataset. Transfer learning (TL) models are commonly used for biomedical image classification to overcome this constraint. TL-based models transfer knowledge from one task to another, similar task; examples include Inception-V3 [
47], GoogLeNet [
48], AlexNet [
49], and ResNet [
50]. These models have been pre-trained on the ImageNet [
51] dataset, which contains over fourteen million images; the subset used for pre-training covers one thousand categories. TL aims to improve the network’s performance regardless of its target dataset. TL-based models have demonstrated substantial performance in several medical image classification tasks, such as cataract identification [
52], breast cancer classification [
53], glaucoma diagnosis [
54], and diabetic retinopathy detection [
31].
Inception-V3, GoogLeNet, AlexNet, and ResNet models based on transfer learning were proposed in this study for the diagnosis of four classes of diabetic retinopathy: normal, mild, moderate, and severe.
4.4.1. Standard Classifiers
Inception-V3 is a deep neural network (DNN) that can classify 1000 different objects [
47]. The model is pre-trained on a wide variety of images and may be retrained on a smaller dataset while retaining what it has learned. The advantage of the Inception-V3 CNN model is that it eliminates the need for extensive training, resulting in improved classification accuracy and reduced processing time. A primary design objective of the Inception-V3 network is to avoid representational bottlenecks, in which the input size of a subsequent layer is reduced too sharply. Factorization of convolutions is used to lower the computational complexity of the network.
GoogLeNet [
48] is also known as the Inception-V1 topology. It was developed by Google, is built on the inception module, and takes its name from LeNet. It was the winner of the ILSVRC-2014 Challenge. GoogLeNet is a 22-layer deep neural network trained on ImageNet with one thousand object categories. The top-5 error rate of GoogLeNet is 6.67%, which is close to human-level performance (5.1%). The network comprises pooling layers, convolution layers, rectified linear unit (ReLU) layers, and fully connected layers.
AlexNet [
49], created by Alex Krizhevsky, won the ImageNet Large Scale Visual Recognition Challenge 2012 with a top-five error rate of 0.1530. It has five convolutional layers and three fully connected layers, with the ReLU activation function applied after each convolutional and fully connected layer. Dropout with a rate of 0.5 is applied before the first two fully connected layers. The network is trained on ImageNet with 1000 distinct categories (1000-way softmax).
He et al. won the ILSVRC-2015 Challenge with their Residual Neural Network (ResNet) [
50]. This design is built on batch normalization and skip connections, which resemble the gating mechanisms of existing RNN components (gated recurrent units). ResNet has 152 layers and achieves a top-5 error of 3.57%, surpassing human-level performance.
4.4.2. Experimental Configuration
The hardware environment for the proposed model includes an E5-2609 CPU, 32 GB RAM, and a Quadro K620 GPU. The model is implemented using the open-source Python package Keras with the TensorFlow backend. The Adam optimizer with a categorical cross-entropy loss function is used to train the proposed model for 100 epochs. Other settings are a momentum of 0.9, a batch size of 64, a learning rate of 0.01, and a weight decay of 0.005.
Table 2 shows the various hyperparameter setups.
The loss value was computed using the categorical cross-entropy loss function, which is calculated from the probabilities of the output-layer activations and the target class. The categorical cross-entropy loss can be expressed as:
$$L(y, \hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log\left(\hat{y}_{ij}\right)$$
This is a comparison of the predicted and actual data distributions, denoted by $L(y, \hat{y})$. $y_{ij}$ is the actual value, whereas $\hat{y}_{ij}$ is the predicted value, with N and M being the sample and label counts, respectively. Each label’s loss is computed independently, summed over all M labels, and averaged over the N samples.
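A small worked example may make the loss concrete. The sketch below assumes one-hot targets over the four DR classes, averages the per-sample loss over the N samples (a normalization that is an assumption here), and clips probabilities with a small `eps` to avoid log(0); the function name is illustrative.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over samples of -sum_j y_ij * log(p_ij)."""
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        # Per-sample loss: only the true class contributes for one-hot targets.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(truth, probs))
    return total / len(y_true)


# Two samples, four classes: normal, mild, moderate, severe.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0]]
y_pred = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
loss = categorical_cross_entropy(y_true, y_pred)
```

Here the confident, correct first prediction contributes −log(0.7), while the uninformative second prediction contributes −log(0.25), so the loss rewards probability mass placed on the true class.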
4.5. Performance Evaluation Metrics
Detecting diabetic retinopathy at an early stage from automatically acquired fundus camera images requires basic preprocessing before image processing algorithms are developed. Multiple preprocessing methods, such as contrast adjustment, average filtering, adaptive histogram equalization, homomorphic filtering, and median filtering, are applied to the retinal fundus image dataset. After applying these algorithms to the retinal images, the mean square error (MSE) and the peak signal-to-noise ratio (PSNR) were calculated to test their effectiveness. The PSNR is expressed as a logarithmic value in decibels. A higher PSNR value indicates that the processed image is of better quality with respect to the original image.
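For reference, the MSE and PSNR can be computed as in the following minimal sketch. It assumes 8-bit images (peak value 255) represented as nested lists of equal size; the function names are illustrative.

```python
import math


def mse(original, processed):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(original, processed)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels (infinite for identical images)."""
    err = mse(original, processed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


original = [[100, 100], [100, 100]]
processed = [[101, 99], [101, 99]]
quality_db = psnr(original, processed)   # MSE is 1 for this pair
```

Since the PSNR is a decreasing function of the MSE, a smaller pixel-wise error always yields a larger decibel value.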
The data used in medical care are generally classified into two kinds: the first relates to the presence of disease and the other to its absence. Sensitivity and specificity estimate the level of correctness of a method. In diabetic retinopathy analysis, as in other fields of medical science, sensitivity and specificity are computed for each digital fundus image. The true negative (TN) value indicates the non-lesion pixels, and the true positive (TP) value indicates the lesion pixels correctly identified in the fundus images. On the other hand, a false negative (FN) indicates the lesion pixels missed by the algorithm, and a false positive (FP) indicates the non-lesion pixels mistakenly detected by the algorithm [
55,
56]. The performance of the proposed methodology was measured using sensitivity, specificity, accuracy, F1-score, precision, and AUC.
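The metrics named above follow directly from the TP, TN, FP, and FN counts. The sketch below uses the standard definitions; the counts in the example are made up for illustration, and the AUC is computed with a simple pairwise (rank-based) formulation for binary labels, which is one of several equivalent ways to obtain it.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative counts only, not results from the paper.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For multi-class DR grading, the same functions would typically be applied per class in a one-vs-rest fashion and then averaged.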
5. Results and Discussion
Two separate U-Nets are employed in the preprocessing step for the segmentation of the optic disc as well as blood vessels. The improved image that is the result of the preprocessing steps is fed into the CNN models that are based on transfer learning. The suggested model is tested on three publicly available fundus image datasets: Messidor-2, EyePACS-1, and DIARETDB0. Accuracy, sensitivity, precision, specificity, F1-score, as well as Area under the Curve (AUC) are performance indicators used to assess the effectiveness of the proposed approach model.
Table 3,
Table 4 and
Table 5 list the performance of the TL-based models on Messidor-2, EyePACS-1, and DIARETDB0, respectively. According to
Table 3,
Table 4 and
Table 5, the Inception-V3 model obtained an average accuracy of 94.59%, 97.92%, and 93.52%, respectively, when tested on the Messidor-2, EyePACS-1, and DIARETDB0 datasets.
Table 3 shows the EyePACS-1 findings: Inception-V3, GoogLeNet, AlexNet, and ResNet achieve accuracies of 97.92%, 96.15%, 95.70%, and 96.90%, respectively. On the Messidor-2 dataset, Inception-V3, GoogLeNet, AlexNet, and ResNet obtained accuracies of 94.59%, 93.75%, 93.15%, and 94%, respectively, as shown in
Table 4.
According to
Table 5, the accuracy value of DIARETDB0 for Inception-V3 is 93.52%, whereas the accuracy values of GoogLeNet, AlexNet, and ResNet are 92.05%, 91.30%, and 92.45%, respectively. Because of the poor resolution of the retinal pictures and the small number of training examples, DR classification in DIARETDB0 is more difficult than in the EyePACS-1 as well as Messidor-2 datasets. The ROC curve represents the performance of the Inception-V3 on the EyePACS-1 dataset at various thresholds.
Figure 10a–c show the graphical analysis of the evaluation metrics for Messidor-2, EyePACS-1, and DIARETDB0, respectively. With accuracies of 94.59%, 97.92%, and 93.52%, Inception-V3 is the most effective of the compared models. When tested on the enhanced retinal images, Inception-V3 demonstrated the best accuracy, outperforming the other networks and variants. The proposed Inception-V3 model, combined with U-Net-based OD and BV segmentation for DR diagnosis, is compared to several state-of-the-art approaches in
Table 6 to assess its efficacy.
6. Conclusions
Early detection is essential in the treatment of diabetic retinopathy patients. This process is moving in lockstep with technological advancements. This study used AI models to classify the severity of DR in fundus images. We propose a novel two-stage DR detection system consisting of OD and BV segmentation, followed by DR classification based on transfer learning. Extraction of the green channel, uniform resizing, top-bottom hat transformation, and OD and BV segmentation were all performed during the preprocessing phase. Then, for DR classification, a transfer learning-based model, Inception-V3, is trained on the publicly available Messidor-2, EyePACS-1, and DIARETDB0 datasets. The findings of this study suggest that the proposed Inception-V3, evaluated using the EyePACS-1 dataset, has a high potential for use in clinical applications. In the future, SVD could be replaced with a gradient descent-based approach to reduce the computational expense. The proposed approach could also be used to diagnose other retinal disorders, such as cataracts and glaucoma, and the model’s classification performance could be enhanced by employing ensembles of machine learning and deep learning techniques.