Transformative Transparent Hybrid Deep Learning Framework for Accurate Cataract Detection
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Materials
3.1.1. Dataset Description
3.1.2. Data Preprocessing
Algorithm 1: Data Preprocessing
Input: Image Data
Output: Preprocessed Data
dataPreprocessing(folder_path)
    Load the dataset from the specified folder paths (train and test sets)
    Convert each image to grayscale
    Resize each image to 30 × 30 pixels
    Normalize the pixel values to the range [0, 1]
    Append each image to the image list and its corresponding label to the label list
    Encode the labels using LabelEncoder
    One-hot encode the labels for the hybrid Siamese-VGG16 model
    Return the processed dataset (X_train, X_test, y_train, y_test)
end dataPreprocessing
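For readers who want to follow Algorithm 1 in code, a minimal Python/Keras sketch is given below. The function name, folder layout, and the 30 × 30 target size are illustrative assumptions taken from the pseudocode rather than the authors' released implementation.

```python
import os
import numpy as np
from PIL import Image
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

def data_preprocessing(folder_path, target_size=(30, 30)):
    """Load, grayscale, resize, and normalize images; encode labels (sketch of Algorithm 1)."""
    images, labels = [], []
    for class_name in sorted(os.listdir(folder_path)):
        class_dir = os.path.join(folder_path, class_name)
        if not os.path.isdir(class_dir):
            continue
        for file_name in os.listdir(class_dir):
            file_path = os.path.join(class_dir, file_name)
            try:
                img = Image.open(file_path).convert("L")        # convert to grayscale
            except OSError:
                continue                                         # skip unreadable files
            img = img.resize(target_size)                        # resize to 30 x 30
            arr = np.asarray(img, dtype="float32") / 255.0       # normalize to [0, 1]
            images.append(arr)
            labels.append(class_name)                            # subfolder name as label
    X = np.stack(images)
    y_int = LabelEncoder().fit_transform(labels)                 # integer-encoded labels
    y_onehot = to_categorical(y_int)                             # one-hot labels for the hybrid model
    return X, y_int, y_onehot

# Illustrative usage (paths are hypothetical):
# X_train, y_train_int, y_train = data_preprocessing("processed_images/train")
# X_test,  y_test_int,  y_test  = data_preprocessing("processed_images/test")
```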
3.2. Methods
3.2.1. Proposed Architecture
3.2.2. The Model
- VGG16-based Siamese Network
- Feature Extraction
- Distance Calculation
- Learning Rate Scheduler
- Hybrid Deep Learning Approach
- Model Integration
Algorithm 2: Siamese Network for Cataract Detection
Input: Image dataset
Output: Trained Siamese network model for cataract detection and Grad-CAM visualizations for the interpretation of model predictions.
readImagesFromFolder(folderPath)
    imageList <- empty list
    labelList <- empty list
    subfolders <- list of directories in folderPath
    for each subfolder in subfolders
        subfolderPath <- concatenate(folderPath, subfolder)
        if subfolderPath is a directory
            images <- list of files in subfolderPath
            for each imageFile in images
                imagePath <- concatenate(subfolderPath, imageFile)
                if imagePath is a file
                    image <- open imagePath
                    image <- resize(image, mysize)
                    image <- convert image to array
                    image <- image.astype('float32')
                    image <- image / 255
                    if image is not None
                        append image to imageList
                        append subfolder to labelList
    return imageList, labelList
createPairs(images, labels, numPairs)
    pairImages <- empty list
    pairLabels <- empty list
    labelDict <- map labels to indices
    labelToIndices <- map label indices to image indices
    uniqueLabels <- list of labels in labelDict
    if numPairs is None
        numPairs <- 10,000 // default value
    for _ in 1 to numPairs
        if random() > 0.5
            label <- randomly choose from uniqueLabels
            idx1, idx2 <- randomly select two indices from labelToIndices[label]
            append (images[idx1], images[idx2]) to pairImages
            append 1 to pairLabels // similar
        else
            label1, label2 <- randomly choose two different labels from uniqueLabels
            idx1 <- randomly choose from labelToIndices[label1]
            idx2 <- randomly choose from labelToIndices[label2]
            append (images[idx1], images[idx2]) to pairImages
            append 0 to pairLabels // dissimilar
    return pairImages, pairLabels
folderPath <- 'D:\\datasets\\eyeCataract\\processed_images'
trainFolderPath <- concatenate(folderPath, 'train')
trainImages, trainLabels <- readImagesFromFolder(trainFolderPath)
testFolderPath <- concatenate(folderPath, 'test')
testImages, testLabels <- readImagesFromFolder(testFolderPath)
trainPairs, trainLabels <- createPairs(trainImages, trainLabels, numPairs = 10,000)
testPairs, testLabels <- createPairs(testImages, testLabels, numPairs = 1000)
createBaseNetwork(inputShape)
    baseModel <- load VGG16 model with ImageNet weights, exclude top layer
    for each layer in baseModel.layers
        set layer.trainable to false
    model <- new SequentialModel
    model.add(baseModel)
    model.add(Flatten())
    model.add(Dense(128, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation = 'relu'))
    return model
euclideanDistance(vectors)
    featuresA, featuresB <- vectors
    sumSquared <- reduce sum of square differences between featuresA and featuresB
    return sqrt(max(sumSquared, epsilon))
inputShape <- (64, 64, 3)
inputA <- create input layer with shape inputShape
inputB <- create input layer with shape inputShape
baseNetwork <- createBaseNetwork(inputShape)
featA <- baseNetwork(inputA)
featB <- baseNetwork(inputB)
distance <- Lambda(euclideanDistance, output_shape = (1,))([featA, featB])
output <- Dense(1, activation = "sigmoid")(distance)
siameseModel <- create model with inputs [inputA, inputB] and output output
siameseModel.compile(optimizer = "adam", loss = "binary_crossentropy", metrics = ["accuracy"])
trainX <- convert trainPairs to array
trainY <- convert trainLabels to array
reduceLR <- ReduceLROnPlateau(monitor = 'val_loss', factor = 0.2, patience = 5, min_lr = 1e-6, verbose = 1)
fit siameseModel
    inputs <- [trainX[:, 0], trainX[:, 1]]
    labels <- trainY
    epochs <- 20
    validation_data <- ([testPairs[:, 0], testPairs[:, 1]], testLabels)
    callbacks <- [reduceLR]
gradCam(inputModel, img, layerName)
    gradModel <- create model that maps the input image to the activations of layerName and the output predictions
    with GradientTape
        convOutputs, predictions <- gradModel(expand img to batch dimension)
        loss <- prediction of the predicted class
    output <- convOutputs[0]
    grads <- gradient of loss with respect to convOutputs
    weights <- reduce mean of grads along axes 0 and 1
    cam <- initialize array of ones with shape of output's first two dimensions
    for i, w in enumerate(weights)
        cam += w * output[:, :, i]
    cam <- resize cam to (64, 64)
    cam <- max(cam, 0)
    heatmap <- normalize(cam)
    return heatmap
exampleImg1 <- testPairs[0][0]
exampleImg2 <- testPairs[0][1]
heatmap1 <- gradCam(baseNetwork, exampleImg1, layerName = 'block5_conv3')
heatmap2 <- gradCam(baseNetwork, exampleImg2, layerName = 'block5_conv3')
create visualization
    create figure with size (10, 5)
    add subplot for image 1 with heatmap1
    add title 'Grad-CAM for Image 1'
    add subplot for image 2 with heatmap2
    add title 'Grad-CAM for Image 2'
show visualization
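To make Algorithm 2 concrete, the sketch below re-expresses its main steps in TensorFlow/Keras: a frozen VGG16 backbone with a 128-unit embedding head, a Euclidean-distance Lambda feeding a sigmoid similarity output, Adam with binary cross-entropy, and ReduceLROnPlateau down to 1e-6. It is a minimal illustration under two assumptions that depart slightly from the pseudocode: the base network is built with the functional API (instead of Sequential) so that block5_conv3 remains reachable for Grad-CAM, and the Grad-CAM score is the mean embedding activation, since the embedding branch exposes no class logits. Pair generation and file paths are omitted; all identifiers are illustrative.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import ReduceLROnPlateau

INPUT_SHAPE = (64, 64, 3)

def create_base_network(input_shape):
    """Frozen VGG16 backbone followed by a small embedding head."""
    vgg = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    vgg.trainable = False                                   # freeze ImageNet weights
    x = layers.Flatten()(vgg.output)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(128, activation="relu")(x)
    return Model(vgg.input, x, name="embedding")

def euclidean_distance(vectors):
    a, b = vectors
    sum_squared = tf.reduce_sum(tf.square(a - b), axis=1, keepdims=True)
    return tf.sqrt(tf.maximum(sum_squared, tf.keras.backend.epsilon()))

base_network = create_base_network(INPUT_SHAPE)
input_a = layers.Input(shape=INPUT_SHAPE)
input_b = layers.Input(shape=INPUT_SHAPE)
distance = layers.Lambda(euclidean_distance, output_shape=(1,))(
    [base_network(input_a), base_network(input_b)])
output = layers.Dense(1, activation="sigmoid")(distance)
siamese_model = Model([input_a, input_b], output)
siamese_model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])

reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.2,
                              patience=5, min_lr=1e-6, verbose=1)
# Training call, assuming train_x/test_x hold image pairs of shape (N, 2, 64, 64, 3):
# siamese_model.fit([train_x[:, 0], train_x[:, 1]], train_y, epochs=20,
#                   validation_data=([test_x[:, 0], test_x[:, 1]], test_y),
#                   callbacks=[reduce_lr])

def grad_cam(model, img, layer_name="block5_conv3"):
    """Grad-CAM over the embedding network; the mean embedding activation is
    used as the score because this branch has no class logits (an assumption)."""
    grad_model = Model(model.input,
                       [model.get_layer(layer_name).output, model.output])
    img_tensor = tf.convert_to_tensor(img[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(img_tensor)                              # track the path through frozen layers
        conv_out, embedding = grad_model(img_tensor)
        score = tf.reduce_mean(embedding)
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))         # channel-wise gradient averages
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)     # weighted sum of feature maps
    cam = tf.maximum(cam, 0)                                # keep positive evidence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)                 # normalize to [0, 1]
    return tf.image.resize(cam[..., tf.newaxis], (64, 64)).numpy().squeeze()

# heatmap1 = grad_cam(base_network, example_img1)
# heatmap2 = grad_cam(base_network, example_img2)
```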
4. Results
4.1. Performance Evaluation Metrics
4.2. Comparative Analysis
4.2.1. Ablation Analysis
4.2.2. Analysis of Grad-CAM Outputs Using KL Divergence for Model Comparison
4.2.3. Proposed Analysis with Other Schemes
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Vision Loss Expert Group of the Global Burden of Disease Study. Global estimates on the number of people blind or visually impaired by cataract: A meta-analysis from 2000 to 2020. Eye 2024, 38, 2156.
- World Health Organization (WHO). Eye Care, Vision Impairment and Blindness. Available online: https://www.who.int (accessed on 4 October 2024).
- Shang, X.; Wu, G.; Wang, W.; Zhu, Z.; Zhang, X.; Huang, Y.; Yu, H. Associations of vision impairment and eye diseases with frailty in community-dwelling older adults: A nationwide longitudinal study in China. Br. J. Ophthalmol. 2024, 108, 310–316.
- Kulbay, M.; Wu, K.Y.; Nirwal, G.K.; Bélanger, P.; Tran, S.D. Oxidative Stress and Cataract Formation: Evaluating the Efficacy of Antioxidant Therapies. Biomolecules 2024, 14, 1055.
- Liang, W.; Zhou, C.; Bai, J.; Zhang, H.; Jiang, B.; Wang, J.; Zhu, H. Current advancements in therapeutic approaches in orthopedic surgery: A review of recent trends. Front. Bioeng. Biotechnol. 2024, 12, 1328997.
- Patibandla, R.L.; Rao, B.T.; Murty, M.R. Revolutionizing Diabetic Retinopathy Diagnostics and Therapy through Artificial Intelligence: A Smart Vision Initiative. In Transformative Approaches to Patient Literacy and Healthcare Innovation; IGI Global: Hershey, PA, USA, 2024; pp. 136–155.
- Levinson, B.; Woreta, F.; Riaz, K. (Eds.) Clinical Atlas of Anterior Segment OCT: Optical Coherence Tomography; Elsevier Health Sciences: Amsterdam, The Netherlands, 2024.
- Zhang, H.; Che, W.; Cao, Y.; Guan, Z.; Zhu, C. Condition Monitoring and Fault Diagnosis of Rotating Machinery Towards Intelligent Manufacturing: Review and Prospect. Iran. J. Sci. Technol. Trans. Mech. Eng. 2024, 1–34.
- Chakraborty, S.; Misra, B.; Mridha, M.F. Enhancing Intelligent Medical Imaging to Revolutionize Healthcare. In Smart Medical Imaging for Diagnosis and Treatment Planning; Chapman and Hall/CRC: Boca Raton, FL, USA, 2025; pp. 3–20.
- Shome, A.; Mukherjee, G.; Chatterjee, A.; Tudu, B. Study of Different Regression Methods, Models and Application in Deep Learning Paradigm. In Deep Learning Concepts in Operations Research; Auerbach Publications: Boca Raton, FL, USA, 2024; pp. 130–152.
- Agustin, S.; Putri, E.N.; Ichsan, I.N. Design of A Cataract Detection System based on The Convolutional Neural Network. J. ELTIKOM J. Tek. Elektro Teknol. Inf. Komput. 2024, 8, 1–8.
- Islam, A.; Haque, A.A.; Tasnim, N.; Waliza, S. Deep Learning Based Early Glaucoma Detection. Doctoral Dissertation, Brac University, Dhaka, Bangladesh, 2024.
- Dos Santos, P.R.S.; de Carvalho Brito, V.; de Carvalho Filho, A.O.; de Araújo, F.H.D.; Rabêlo, R.D.A.L.; Mathew, M.J. A Capsule Network-based for identification of Glaucoma in retinal images. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; IEEE: Piscataway, NJ, USA; pp. 1–6.
- Ali, M.S. A Hyper-Tuned Vision Transformer Model with Explainable AI for Eye Disease Detection and Classification from Medical Images. Doctoral Dissertation, Islamic University, Kushtia, Bangladesh, 2023.
- Fayyad, M.F. Application of AlexNet, EfficientNetV2B0, and VGG19 with Explainable AI for Cataract and Glaucoma Image Classification. In Proceedings of the 2024 International Electronics Symposium (IES), Surabaya, Indonesia, 13–15 August 2024; IEEE: Piscataway, NJ, USA; pp. 406–412.
- Velpula, V.K.; Sharma, D.; Sharma, L.D.; Roy, A.; Bhuyan, M.K.; Alfarhood, S.; Safran, M. Glaucoma detection with explainable AI using convolutional neural networks-based feature extraction and machine learning classifiers. In IET Image Process; IET: Washington, DC, USA, 2024.
- Santone, A.; Cesarelli, M.; Colasuonno, E.; Bevilacqua, V.; Mercaldo, F. A Method for Ocular Disease Diagnosis through Visual Prediction Explainability. Electronics 2024, 13, 2706.
- Afreen, N.; Aluvalu, R. Glaucoma Detection Using Explainable AI and Deep Learning. EAI Endorsed Trans. Pervasive Health Technol. 2024, 10. Available online: https://publications.eai.eu/index.php/phat/article/view/5658 (accessed on 10 September 2024).
- Kher, G.; Mehra, S.M.; Bala, R.; Singh, R.P. DeB5-XNet: An Explainable Ensemble Model for Ocular Disease Classification using Transfer Learning and Grad-CAM. Authorea Prepr. 2024. Available online: https://www.authorea.com/doi/full/10.22541/au.172465028.81948688 (accessed on 10 September 2024).
- AlBalawi, T.; Aldajani, M.B.; Abbas, Q.; Daadaa, Y. IoT-Opthom-CAD: IoT-Enabled Classification System of Multiclass Retinal Eye Diseases Using Dynamic Swin Transformers and Explainable Artificial Intelligence. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 7.
- Serwaa, M.; Mensah, P.K.; Adekoya, A.F.; Ayidzoe, M.A. LBPSCN: Local Binary Pattern Scaled Capsule Network for the Recognition of Ocular Diseases. Int. J. Adv. Comput. Sci. Appl. 2024, 15.
- Alenezi, A.; Alhamad, H.; Brindhaban, A.; Amizadeh, Y.; Jodeiri, A.; Danishvar, S. Enhancing Readability and Detection of Age-Related Macular Degeneration Using Optical Coherence Tomography Imaging: An AI Approach. Bioengineering 2024, 11, 300.
- Suara, S.; Jha, A.; Sinha, P.; Sekh, A.A. Is grad-CAM explainable in medical images? In Proceedings of the International Conference on Computer Vision and Image Processing, Bhubaneswar, India, 20–22 November 2023; Springer Nature: Cham, Switzerland; pp. 124–135.
- Sharma, N.; Gupta, S.; Mohamed, H.G.; Anand, D.; Mazón, J.L.V.; Gupta, D.; Goyal, N. Siamese convolutional neural network-based twin structure model for independent offline signature verification. Sustainability 2022, 14, 11484.
- Omiotek, Z.; Kotyra, A. Flame image processing and classification using a pre-trained VGG16 model in combustion diagnosis. Sensors 2021, 21, 500.
| Model Configuration | Base VGG16 | Siamese | Explainability (Grad-CAM) | Learning Rate Scheduler | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|---|---|---|---|
| Baseline (VGG16) | ✓ | ✗ | ✗ | ✗ | 88.0 | 86.5 | 84.5 | 85.5 |
| Baseline + Learning Rate Scheduler | ✓ | ✗ | ✗ | ✓ | 91.0 | 89.5 | 88.5 | 89.0 |
| Hybrid (VGG16 + Siamese) | ✓ | ✓ | ✗ | ✗ | 92.5 | 91.0 | 90.5 | 90.7 |
| Hybrid + Learning Rate Scheduler | ✓ | ✓ | ✗ | ✓ | 100.0 | 100.0 | 100.0 | 100.0 |
| Baseline + Explainability | ✓ | ✗ | ✓ | ✗ | 88.0 | 86.5 | 84.5 | 85.5 |
| Hybrid + Explainability | ✓ | ✓ | ✓ | ✗ | 92.5 | 91.0 | 90.5 | 90.7 |
| Full Model (VGG16 + Siamese + LR Scheduler + Explainable AI) | ✓ | ✓ | ✓ | ✓ | 100.0 | 100.0 | 100.0 | 100.0 |
| Comparison | KL Divergence | Interpretation |
|---|---|---|
| Heatmap1 (VGG16 + LRS) vs. Heatmap2 (VGG16 + Siamese) | 0.1398 | Moderate similarity in focus, indicating overlapping but distinct features emphasized by the two configurations. |
| Heatmap1 (VGG16 + LRS) vs. Heatmap3 (VGG16 + Siamese + LRS) | 0.0999 | Strong similarity, suggesting both configurations focused on similar features within the images. |
| Heatmap2 (VGG16 + Siamese) vs. Heatmap1 (VGG16 + LRS) | 0.1483 | Indicates a somewhat different focus in the features emphasized by Heatmap2 compared with Heatmap1. |
| Heatmap2 (VGG16 + Siamese) vs. Heatmap3 (VGG16 + Siamese + LRS) | 0.0388 | The lowest divergence recorded, showing a close alignment in focus between the two configurations with significant feature similarity. |
| Heatmap3 (VGG16 + Siamese + LRS) vs. Heatmap1 (VGG16 + LRS) | 0.1190 | Suggests notable similarities but also distinct differences, implying that the configurations captured different aspects of the input images. |
| Heatmap3 (VGG16 + Siamese + LRS) vs. Heatmap2 (VGG16 + Siamese) | 0.0479 | Low divergence, confirming a close alignment between Heatmap3 and Heatmap2 and reinforcing that they produced similar interpretations of the input data. |
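A short note on how such values can be obtained: the sketch below is a minimal way to compute KL divergence between two Grad-CAM heatmaps by treating each as a probability distribution. The exact normalization used by the authors is not specified, so the flatten-and-normalize step and the epsilon smoothing here are assumptions; the asymmetry of KL divergence is also why the table reports both directions of each comparison (e.g., 0.1398 vs. 0.1483).

```python
import numpy as np

def kl_divergence(heatmap_p, heatmap_q, eps=1e-8):
    """KL(P || Q) between two Grad-CAM heatmaps treated as probability maps.

    Each heatmap is flattened and normalized to sum to 1; eps avoids log(0).
    KL divergence is asymmetric, so KL(P || Q) generally differs from KL(Q || P).
    """
    p = heatmap_p.astype("float64").flatten() + eps
    q = heatmap_q.astype("float64").flatten() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Illustrative usage with hypothetical heatmaps from the three configurations:
# kl_12 = kl_divergence(heatmap_vgg_lrs, heatmap_vgg_siamese)   # Heatmap1 vs. Heatmap2
# kl_21 = kl_divergence(heatmap_vgg_siamese, heatmap_vgg_lrs)   # Heatmap2 vs. Heatmap1
```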
| Author(s) | Model/Methodology | Disease Focus | Accuracy | Explainability Tool |
|---|---|---|---|---|
| Dos Santos et al. [13] | Capsule Networks (CapsNet) | Glaucoma | 90.90% | None |
| Ali [14] | Vision Transformer (ViT) with Explainable AI | Eye diseases (general) | 91.40% | Explainable AI |
| Fayyad [15] | AlexNet, EfficientNetV2B0, VGG19 with Grad-CAM | Cataract, Glaucoma | 89.77% | Grad-CAM |
| Velpula et al. [16] | CNN-based feature extraction with machine learning classifiers | Glaucoma | 98.03% | Explainable AI |
| Santone et al. [17] | CNNs for automatic analysis of ocular diseases | Cataract, Glaucoma | 95% | Explainability localization |
| Kher et al. [18] | DeB5-XNet (DenseNet121 + EfficientNetB5) with CLAHE and Grad-CAM | Cataract, Diabetic Retinopathy | 95% | Grad-CAM |
| AlBalawi et al. [19] | IoT-Opthom-CAD (Swin Transformers + LightGBM) with an IoT-enabled diagnostic system | Multi-disease detection | 96.5% | Grad-CAM |
| Serwaa et al. [20] | LBPSCN for small datasets with illumination variation | Cataract | 96.87% | None |
| Alenezi et al. [21] | Ensemble model (ResNet, EfficientNet, Attention models) for age-related macular degeneration from OCT images | AMD | 97% | None |
| Proposed Model | Hybrid deep learning (VGG16 and Siamese network with learning rate scheduler) | Cataract | 100% | Grad-CAM |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).