4.3. Results Discussion
This section analyzes the results acquired using the DDCNN model with the impact of different learning rates. It provides a detailed breakdown of the actual versus predicted classifications for each class of samples.
Figure 4 shows the classwise actual and predicted samples obtained using five different learning rates.
Figure 4e shows that the highest learning rate, i.e., 0.1, correctly predicted only the actual positive samples and misclassified all of the actual negative samples; in other words, this learning rate classified all 600 samples as cancerous. The best results were obtained using a learning rate of 0.0001 (Figure 4b), where only 12 actual negative samples out of a total of 600 were misclassified. The confusion matrices for the other learning rates are also shown with actual and predicted samples; the 0.00001 learning rate performed well but misclassified 15 cancerous and non-cancerous MRI samples (Figure 4a).
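The degenerate behavior at the 0.1 learning rate illustrates why confusion matrices matter beyond a single headline number. The following is a minimal sketch, assuming a hypothetical 300/300 class split (the paper does not state the split), of the statistics such an "all cancerous" matrix yields:

```python
# Minimal sketch: statistics of the degenerate confusion matrix at learning rate 0.1.
# The 300/300 class split below is a hypothetical assumption, not taken from the paper.
def confusion_stats(tp, tn, fp, fn):
    recall = tp / (tp + fn)           # sensitivity: fraction of cancerous samples found
    specificity = tn / (tn + fp)      # fraction of non-cancerous samples found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return recall, specificity, accuracy

# "Classified all 600 samples as cancerous": every negative becomes a false positive.
recall, spec, acc = confusion_stats(tp=300, tn=0, fp=300, fn=0)
# recall is 1.0 while specificity is 0.0 -- high recall alone is misleading here.
```

Under this assumed split, recall is perfect while specificity collapses to zero, which is exactly the failure mode visible in Figure 4e.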
Figure 5 shows the training and validation accuracy and loss of the Dual DCNN model using the best learning rate of 0.0001. Both the training and validation accuracy increase as the number of epochs (training iterations) progresses; however, around epoch 30, both curves plateau, suggesting that the model's performance has reached its optimal point. It is essential to balance high accuracy on the training data with generalization to unseen validation data. The loss initially decreases sharply and stabilizes after approximately 10 epochs, indicating that the model learns effectively during the initial epochs but does not improve significantly beyond that point. Monitoring the loss helps prevent overfitting and ensures that the model generalizes well.
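The plateau monitoring described above can be made operational with a patience-based stopping rule. This is a sketch under an assumed criterion (the paper does not state its exact stopping rule): stop once the validation loss has not improved for a fixed number of consecutive epochs.

```python
# Sketch of patience-based plateau detection on a validation-loss curve.
# The patience/min_delta criterion is an assumption for illustration, not the
# paper's stated training procedure.
def epochs_until_plateau(val_losses, patience=5, min_delta=1e-4):
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, stale = loss, 0   # meaningful improvement: reset the counter
        else:
            stale += 1              # no improvement this epoch
            if stale >= patience:
                return epoch        # training would stop here
    return len(val_losses)          # never plateaued within the run
```

For example, a loss curve that stabilizes after epoch 5 with `patience=3` would trigger a stop at epoch 8, mirroring the "stabilizes after approximately 10 epochs" behavior seen in Figure 5.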
The evaluation parameters of the different models trained at their best learning rates are shown in
Table 3. With 99% accuracy and precision, the DDCNN model performed best, demonstrating its ability to correctly classify nearly all cases with extremely few false positives. It also obtained a high recall of 98%, indicating that the majority of positive samples were successfully identified. As a result, its F1-score, which balances precision and recall, was an outstanding 99%. DenseNet121 achieved good performance, with an accuracy of 97% and correct recognition of the large majority of instances. Its precision and recall were both 97%, indicating that it detected positive occurrences reliably with few false positives.
With accuracies ranging from 95% to 96%, the InceptionV3, ResNet50, ResNet34, ResNet18, and EfficientNetB2 models performed similarly. Their recall and precision scores were likewise quite satisfactory, with only slight variations across models; for example, InceptionV3 achieved a lower F1-score than the ResNet models. SqueezeNet, VGG-16, AlexNet, and LeNet-5 performed below the previously mentioned models: although their accuracies were still fair, their precision, recall, and F1-scores were markedly worse. In particular, AlexNet and LeNet-5 performed worst, with the lowest results across all criteria.
Figure 6 shows the confusion matrices of the ten SOTA DL models used in this study. The classwise correct and incorrect predictions for the two classes allow a better understanding of the performance of each model at its best learning rate.
Figure 6a shows the confusion matrix of the DenseNet121 model, which dominated at a learning rate of 0.001 and achieved the highest evaluation parameters among the SOTA DL models.
Figure 6b–h shows the performance of InceptionV3, ResNet50, ResNet34, ResNet18, EfficientNetB2, SqueezeNet, and VGG-16, respectively; these models achieve performance metrics greater than 90%, which is considered acceptable in a binary classification problem. However,
Figure 6i,j shows the evaluation parameters of AlexNet and LeNet-5 at their best learning rates; neither performs well.
Similarly,
Figure 7 plots the accuracy, precision, recall, and F1-scores of all models used in this study, complementing
Table 3.
Figure 7a shows the accuracy, which is the ratio of correctly predicted samples to the total number of samples.
Figure 7b–d shows the precision, recall, and F1-score percentages, respectively, for all DL models used in this study.
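The four metrics reported in Figure 7 and Table 3 all derive from the binary confusion-matrix counts. A minimal sketch of their standard definitions:

```python
# Standard definitions of the four metrics, computed from binary
# confusion-matrix counts (true/false positives and negatives).
def binary_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct / all predictions
    precision = tp / (tp + fp)                   # correct positives / predicted positives
    recall = tp / (tp + fn)                      # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only (not the paper's exact confusion matrix):
m = binary_metrics(tp=98, tn=99, fp=1, fn=2)
```

With these illustrative counts, accuracy is 98.5% and recall 98%, close to the headline figures reported for the DDCNN; the actual counts in Figure 6 will differ.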
Table 4 shows the classwise performance metrics (accuracy, precision, recall, and F1-score). The DDCNN model achieved a high accuracy of 99%, precision of 99%, recall of 98%, and F1-score of 99% for non-cancer (class 0), suggesting that it correctly classified nearly all non-cancerous samples with extremely few false positives. It performed even better for cancerous cases (class 1), classifying cancer nearly perfectly with 99% accuracy, precision, recall, and F1-score. With all metrics around 97%, DenseNet121 performed well for class 0, though its accuracy, precision, recall, and F1-score were lower than those of the DDCNN model. For cancerous cases, with accuracy, precision, recall, and F1-score all around 96%, its performance remained nearly as strong, suggesting that cancer cases were classified accurately.
The InceptionV3, ResNet50, ResNet34, ResNet18, EfficientNetB2, SqueezeNet, and VGG-16 models showed broadly similar trends across both classes, with slightly varying performance metrics. Class 0 accuracy was between 93% and 97%, with slightly lower but still good precision, recall, and F1-scores. Class 1 performance was slightly lower, with accuracy ranging from 91% to 95% and modest variations in precision, recall, and F1-score. The AlexNet and LeNet-5 models underperformed compared to the other models: for class 0, accuracy ranged from 79% to 90%, with correspondingly lower precision, recall, and F1-scores, and class 1 performance decreased even further, with accuracy between 64% and 80% and precision, recall, and F1-score showing a similar downward trend.
In conclusion, the DDCNN model outperformed the others in both classes, achieving almost perfect accuracy, precision, recall, and F1-score, whereas AlexNet and LeNet-5 performed relatively worst across all evaluation parameters.
Table 5 presents the influence of five different learning rates on the DDCNN model and several SOTA DL models. The Dual DCNN's performance is sensitive to the learning rate: the best results were obtained at a learning rate of 0.0001, achieving 99%, 99%, 98%, and 99% for accuracy, precision, recall, and F1-score, respectively. Performance decreased slightly when the learning rate dropped from 0.0001 to 0.00001 but remained quite satisfactory, with an accuracy of 98% and other metrics over 97%; its lowest performance occurred at the higher learning rates, as shown in the table. Like the Dual DCNN, models such as InceptionV3, the ResNets, EfficientNetB2, DenseNet121, SqueezeNet, and VGG-16 showed varying performance across learning rates, with lower learning rates often associated with better results. For instance, DenseNet121 and ResNet18 achieved their maximum accuracies of 93% and 96% at learning rates of 0.001 and 0.01, respectively.
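The sweep summarized in Table 5 amounts to training one model per candidate rate and keeping the rate with the best validation score. A minimal sketch, where `train_and_evaluate` is a hypothetical callable standing in for a full training run:

```python
# Sketch of a learning-rate sweep: evaluate each candidate rate and select
# the one with the highest validation accuracy. `train_and_evaluate` is a
# hypothetical stand-in for an actual training-plus-evaluation run.
def best_learning_rate(rates, train_and_evaluate):
    results = {lr: train_and_evaluate(lr) for lr in rates}
    return max(results, key=results.get), results

# Illustrative (made-up) accuracies echoing the trend reported for the DDCNN:
mock_scores = {0.1: 0.50, 0.01: 0.90, 0.001: 0.96, 0.0001: 0.99, 0.00001: 0.98}
best_lr, scores = best_learning_rate(list(mock_scores), mock_scores.get)
# best_lr -> 0.0001
```

The mock scores reproduce the non-monotonic pattern the table describes: accuracy improves as the rate drops toward 0.0001, then falls off slightly at 0.00001.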
Across all learning rates, AlexNet and LeNet-5 underperformed compared to the other models in terms of accuracy, precision, recall, and F1-score. For instance, AlexNet's maximum accuracy was 85% at a learning rate of 0.0001, whereas LeNet-5's maximum accuracy was 71% at 0.001.
In conclusion, the findings highlight the crucial role of hyperparameter tuning, particularly learning rate selection, in optimizing DL model performance. Performance was generally better at lower learning rates, although the ideal learning rate varied with the particular model architecture. Additionally, some models performed consistently across a range of learning rates, whereas others showed considerable variability.