Multi-Attention Integrated Deep Learning Frameworks for Enhanced Breast Cancer Segmentation and Identification
Abstract
Breast cancer poses a profound threat to health worldwide, claiming numerous lives each year, so timely detection is crucial for early intervention and improved chances of survival. Accurately diagnosing and classifying breast tumors from ultrasound images remains a persistent challenge in medicine, demanding cutting-edge solutions for improved treatment strategies. This research introduces multi-attention-enhanced deep learning (DL) frameworks for the segmentation and classification of breast tumors in ultrasound images. For segmentation, a spatial-channel attention mechanism is integrated into a novel LinkNet DL framework with an InceptionResNet backbone. For classification, the paper proposes a deep convolutional neural network with an integrated multi-attention framework (DCNNIMAF) that classifies the segmented tumor as benign, malignant, or normal. In experiments, the segmentation model achieved an accuracy of 98.1% with a loss of 0.6%, together with Intersection over Union (IoU) and Dice coefficient scores of 96.9% and 97.2%, respectively. The classification model attained an accuracy of 99.2% with a loss of 0.31%, and F1-score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively. By offering a robust framework for early detection and accurate classification of breast cancer, this work significantly advances the field of medical image analysis, potentially improving diagnostic precision and patient outcomes.
Keywords: Breast Cancer; Deep Learning; Attention Mechanisms; Medical Imaging
1 Introduction
Breast cancer is one of the most common cancers among women worldwide, resulting in approximately 570,000 deaths in 2015 alone. Annually, over 1.5 million women, accounting for 25% of all female cancer diagnoses, are diagnosed with breast cancer globally [1][2]. Breast tumors often originate as ductal hyperproliferation and can progress to benign tumors or metastatic carcinomas when stimulated by various carcinogenic agents. The tumor microenvironment, including stromal effects and macrophages, plays a crucial role in the development and progression of breast cancer [3].
Early detection of breast carcinoma significantly increases the chances of successful treatment, so effective procedures for identifying early signs of breast cancer are crucial [4]. Mammography, ultrasound, and thermography are the primary imaging techniques used for screening and diagnosing breast cancer [5][6]. Over 75% of breast tumors are hormone-responsive, and breast cancer is primarily a postmenopausal disease. Incidence rates are at their highest between the ages of 35 and 39 and then plateau after age 80, with age and female sex being significant risk factors. This hormone dependency interacts with environmental and genetic factors to determine the incidence and progression of the disease [7].
Precise segmentation and classification of breast cancer are essential for effective treatment planning and positive patient outcomes. Traditional methods depend heavily on manual interpretation, which is time-consuming and error-prone. Advances in technology have transformed healthcare delivery. The processing power of modern hardware, primarily GPUs, makes it feasible to train deep neural networks with many layers that extract features previously out of reach. Convolutional Neural Networks (CNNs) have had a profound impact on image processing and understanding, especially in segmentation, classification, and analysis [8][9].
Deep learning models can process vast amounts of medical imaging data and detect subtle abnormalities that might elude human observers. Accurate tumor segmentation and classification improve oncologists' ability to decide whether a tumor is malignant. Conventionally, this assessment requires expert annotation and pathology reports [10], which demands considerable human effort. DL offers an efficient and promising way to automate these procedures. DL models can learn complex patterns and features from ultrasound images and mammograms, with the potential to improve classification accuracy and efficiency.
This paper proposes the Spatial-Channel Attention LinkNet Framework with an InceptionResNet backbone for breast cancer segmentation, and the DCNNIMAF framework for breast cancer classification. The segmentation framework is a novel attention-enhanced architecture with two main components. A pre-trained CNN serves as the encoder backbone to strengthen feature extraction. A coupled spatial and channel attention mechanism in the decoder refines the segmentation. The proposed classification framework, the Deep CNN with an Integrated Multi-Attention Framework (DCNNIMAF), is a novel architecture that integrates a hybrid of self-attention and spatial attention mechanisms. Segmentation results were evaluated using the Dice coefficient and IoU score together with a combined focal and Jaccard loss. Classification was evaluated using accuracy, precision, recall, and F1-score.
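For reference, the segmentation metrics and the combined loss named above can be written as a minimal NumPy sketch. The binary-mask formulation, the focal parameters alpha = 0.25 and gamma = 2, and the equal weighting of the focal and Jaccard terms are illustrative assumptions, not values stated in this paper.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Dice = 2|A and B| / (|A| + |B|) for binary masks."""
    y_true = y_true.astype(np.float64).ravel()
    y_pred = y_pred.astype(np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def iou_score(y_true, y_pred, eps=1e-7):
    """IoU (Jaccard index) = |A and B| / |A or B| for binary masks."""
    y_true = y_true.astype(np.float64).ravel()
    y_pred = y_pred.astype(np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return (intersection + eps) / (union + eps)

def focal_jaccard_loss(y_true, prob, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss plus soft Jaccard loss (equal weights assumed)."""
    y = y_true.astype(np.float64).ravel()
    p = np.clip(prob.astype(np.float64).ravel(), eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing factor
    focal = np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
    soft_inter = np.sum(y * p)
    soft_union = np.sum(y) + np.sum(p) - soft_inter
    jaccard_loss = 1.0 - (soft_inter + eps) / (soft_union + eps)
    return focal + jaccard_loss
```

The metrics operate on thresholded masks. The loss operates on predicted probabilities, which keeps it differentiable during training.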
The organization of this paper is as follows: Section 2 reviews the literature on breast cancer segmentation and classification; Section 3 describes the proposed approach; Section 4 presents experimental results; Section 5 concludes and outlines future research directions.
2 Related Works
Osareh et al. [11] used K-nearest neighbors (KNN), Support Vector Machine (SVM), and Probabilistic Neural Network (PNN) classifiers to classify tumor regions. The methodology was applied to two publicly available datasets. The first consisted of Fine Needle Aspirates of Breast Lumps (FNAB), with 457 negative and 235 positive samples. The second consisted of 295 gene microarrays, with 115 good-prognosis and 180 poor-prognosis samples. Feature extraction and feature selection were used to support the classifiers. For extraction, Principal Component Analysis (PCA), optimized with auto-covariance coefficients of the feature vectors, reduced high-dimensional features to low-dimensional ones. Feature selection followed two approaches. The filter approach used the Relief algorithm, which selects features in a pre-processing step without regard to the bias of the induction algorithm. The wrapper approach, the proposed Sequential Forward Selection (SFS) technique, yielded a set of 15 sonographic features. The features were ranked using a Signal-to-Noise Ratio (SNR) criterion to identify the most important ones. The wrapper estimates were assessed with leave-one-out cross-validation, reporting overall accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC).
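SNR-based feature ranking can be illustrated with a short NumPy sketch. It assumes one common definition of the per-feature signal-to-noise ratio for two classes, |mu_1 - mu_0| / (sigma_1 + sigma_0). The exact formula used by Osareh et al. is not given here, so treat this as an assumption rather than their implementation.

```python
import numpy as np

def snr_feature_ranking(X, y):
    """Rank features by |mu_1 - mu_0| / (sigma_1 + sigma_0), highest first.

    X: (n_samples, n_features) matrix; y: binary labels in {0, 1}.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    snr = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (
        pos.std(axis=0) + neg.std(axis=0) + 1e-12)  # epsilon avoids division by zero
    order = np.argsort(-snr)  # most discriminative features first
    return order, snr
```

Features whose class means are far apart relative to their within-class spread rank highest. The top-ranked features would then be passed to a selection or classification stage.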
Li et al. [12] introduced a patch screening method that extracts multi-size, discriminative patches from histology images, capturing both tissue-level and cell-level features. First, patches of 512x512 and 128x128 pixels are generated from the input images. Two ResNet50 models, fine-tuned for the task, then extract features. The model fed with 512x512 patches captures tissue-level features, and the model fed with 128x128 patches captures cell-level features. The patches are then screened by clustering them according to their phenotype. To speed up this step, the patch features are reduced to 1024 dimensions and then to 200 using PCA, before k-means clustering. The ResNet50 fine-tuned on 128x128 patches is used to select the clusters. P-norm pooling is then applied to extract the final image-level features. Finally, a Support Vector Machine classifies each image into one of four classes: Normal, Benign, In situ carcinoma, or Invasive carcinoma.
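The two feature-handling steps described above can be sketched in NumPy: PCA-based dimensionality reduction (for example, 1024 to 200 dimensions) and p-norm pooling of patch features into one image-level vector. The SVD-based PCA and the choice p = 3 are illustrative assumptions, not details confirmed for Li et al.'s implementation.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project patch features onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def p_norm_pool(patch_features, p=3.0):
    """Pool N patch vectors into one image vector: (mean |f|^p)^(1/p)."""
    return np.mean(np.abs(patch_features) ** p, axis=0) ** (1.0 / p)

# Usage: 300 hypothetical patch vectors of length 1024 reduced to 200 dimensions.
rng = np.random.default_rng(0)
reduced = pca_reduce(rng.normal(size=(300, 1024)), 200)
```

With p = 1 the pooling reduces to mean absolute pooling, and as p grows it approaches max pooling. p therefore controls how strongly the most activated patches dominate the image descriptor.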
Zheng et al. [13] introduced a DL-assisted Efficient AdaBoost Algorithm (DLA-EABA), in which a convolutional neural network is trained on extensive data to achieve high precision. A stacked autoencoder is used to build the deep convolutional neural network. Its encoder and decoder contain multiple non-linear transformations learned from combined representations of the input data. An efficient AdaBoost algorithm trains the classifiers, each of which finds the best threshold and parity by examining all possible combinations of the two values. The deep CNN also contains Long Short-Term Memory (LSTM) units with logistic activation functions in place of conventional artificial neurons. Finally, Softmax Regression classifies the images using the extracted features.
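The threshold-and-parity search can be made concrete with a minimal NumPy sketch of one round of standard discrete AdaBoost using one-dimensional decision stumps. The stump form "predict +1 when parity * x < parity * threshold" is the classic weak-classifier convention. Everything else is a generic illustration, not the DLA-EABA implementation.

```python
import numpy as np

def best_stump(x, y, w):
    """Exhaustively search threshold and parity for a 1-D weak classifier.

    The stump predicts +1 when parity * x < parity * threshold, else -1.
    y holds labels in {-1, +1}; w holds the current AdaBoost sample weights.
    """
    best = (np.inf, None, None)
    for threshold in np.unique(x):
        for parity in (1, -1):
            pred = np.where(parity * x < parity * threshold, 1, -1)
            err = np.sum(w[pred != y])  # weighted misclassification error
            if err < best[0]:
                best = (err, threshold, parity)
    return best

def adaboost_round(x, y, w):
    """One AdaBoost round: fit a stump, compute its vote, reweight samples."""
    err, threshold, parity = best_stump(x, y, w)
    err = np.clip(err, 1e-10, 1 - 1e-10)        # keep the log finite
    alpha = 0.5 * np.log((1 - err) / err)       # vote weight of this stump
    pred = np.where(parity * x < parity * threshold, 1, -1)
    w = w * np.exp(-alpha * y * pred)           # up-weight misclassified samples
    return (threshold, parity, alpha), w / w.sum()
```

Repeating `adaboost_round` and summing the stumps' weighted votes yields the boosted classifier.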
Lotter et al. [14] introduced a robust breast tumor classification model for mammography images. It uses bounding box annotations and extends to digital breast tomosynthesis images to identify the tumor region. The CNN is first trained to classify whether lesions are present in cropped image patches. This patch-level CNN then initializes the backbone of a detection-based model that takes the entire image as input and outputs bounding boxes with classification scores. The model's performance was evaluated on index and pre-index cancer exams, with Breast Imaging Reporting and Data System (BI-RADS) scores of 1 and 2 treated as negative interpretations.
Saber et al. [15] applied transfer learning to five models: ResNet50, VGG19, Inception-V3, Inception-V2, and VGG16. For feature extraction, the parameters trained on the source task were frozen and transferred to the target task, except for the last three layers, which were retrained. The images were preprocessed with a median filter, histogram equalization, morphological analysis, segmentation, and image resizing. The dataset was split in an 80:20 ratio, and the training set was augmented by rotating and flipping the images. The newly trained layers were combined with the existing pre-trained layers, and features were extracted using these models. For classification, the extracted features were fed into Support Vector Machine and Softmax classifiers fine-tuned with Stochastic Gradient Descent with Momentum (SGDM). SGDM damps the jitter of the gradient along high-curvature dimensions. By accumulating past gradients, momentum also helps the optimizer move past saddle points.
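The momentum argument can be made concrete with a minimal NumPy sketch of the standard heavy-ball SGDM update (v = mu * v - lr * grad, x = x + v) on a toy ill-conditioned quadratic. The learning rate, momentum value, step count, and test function are illustrative assumptions, not Saber et al.'s training configuration.

```python
import numpy as np

def sgdm_minimize(grad_fn, x0, lr=0.015, momentum=0.9, steps=200):
    """Gradient descent with momentum (the heavy-ball form used by SGDM)."""
    x = np.asarray(x0, dtype=np.float64).copy()
    velocity = np.zeros_like(x)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(x)  # accumulate past gradients
        x = x + velocity
    return x

# Ill-conditioned quadratic f(x) = 0.5 * (x1^2 + 100 * x2^2): the steep x2
# direction forces a small step size, so plain descent crawls along x1.
CURVATURE = np.array([1.0, 100.0])

def loss(x):
    return 0.5 * np.sum(CURVATURE * x ** 2)

def grad(x):
    return CURVATURE * x

x_plain = sgdm_minimize(grad, [1.0, 1.0], momentum=0.0)  # momentum 0 = plain GD
x_momentum = sgdm_minimize(grad, [1.0, 1.0], momentum=0.9)
```

With the same learning rate and number of steps, the momentum run reaches a much lower loss than plain gradient descent. The velocity keeps building along the shallow x1 direction.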
Cho et al. [16] proposed a Breast Tumor Ensemble Classification Network (BTEC-Net) that uses improved DenseNet121 and ResNet101 models as base classifiers. Each of their four blocks is connected to a Squeeze-and-Excitation (SE) block and a Global Average Pooling layer. The feature map sizes are then aligned using a fully connected layer and integrated along the channel dimension. The combined feature map is fed into a feature-level fusion module to perform binary classification. After classification, segmentation is carried out by the proposed Residual Feature Selection UNet (RFS-UNet). This encoder-decoder network links layers with the same feature map size through skip connections. The encoder consists of five blocks, each comprising a convolutional layer, an RFS module, a residual convolutional block, and a max-pooling layer. The decoder likewise consists of five blocks, each comprising a convolutional layer, an RFS module, a transposed convolutional layer, and a residual block. Each skip connection contains a spatial attention module whose inputs are the output of the transposed convolution and the output of the corresponding encoder RFS module. The module's output is concatenated with the output of the same transposed convolution layer. The segmentation ends with a sigmoid activation function that returns the segmented tumor region.
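The Squeeze-and-Excitation blocks used in BTEC-Net follow a well-known pattern, sketched below in NumPy for a single (H, W, C) feature map. The weight names `w1`, `b1`, `w2`, `b2` and the reduction ratio implied by their shapes are hypothetical. This illustrates the generic SE mechanism, not Cho et al.'s exact configuration.

```python
import numpy as np

def squeeze_excitation(feature_map, w1, b1, w2, b2):
    """SE block on an (H, W, C) feature map.

    Squeeze: global average pooling gives one descriptor per channel, shape (C,).
    Excitation: FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid.
    Scale: each channel is multiplied by its weight in (0, 1).
    """
    squeezed = feature_map.mean(axis=(0, 1))                 # (C,)
    hidden = np.maximum(0.0, squeezed @ w1 + b1)             # (C/r,)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))      # (C,)
    return feature_map * weights                             # broadcast over H, W
```

The block only rescales channels, so it can be inserted after any convolutional stage without changing the feature map's shape.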
Dayong Wang et al. [17] introduced a novel method for automatically detecting metastatic breast cancer in whole slide images of sentinel lymph node biopsies, achieving first place in the International Symposium on Biomedical Imaging (ISBI) grand challenge. Their system delivered impressive results with an AUC of 0.925 for whole slide image classification and a 0.7051 tumor localization score, surpassing an independent pathologist’s review. By integrating the DL system’s predictions with pathologist diagnoses, a notable reduction in the error rate was achieved, showcasing the profound impact of DL on enhancing the accuracy of pathological diagnoses for breast cancer metastases.
Abdelrahman Sayed Sayed et al. [18] developed a new, economical design for a 3-RRR Planar Parallel Manipulator (PPM), aiming to overcome the challenge of deriving kinematic constraint equations for manipulators with complex nonlinear behavior. Utilizing screw theory, they computed direct and inverse kinematics and then developed a Neuro-Fuzzy Inference System (NFIS) model that was optimized with Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to predict the position of the end-effector. The proposed PPM structure underwent investigation, with the development of its kinematic model and subsequent testing of a prototype in ADAMS, followed by fabrication for validation. Results showed that PSO outperformed GA in tuning the NFIS model, aligning closely with actual PPM data, indicating promise for enhanced robot capabilities and performance through further optimization and control strategies.
Luuk Balkenende et al. [19] presented a comprehensive review of deep learning techniques in breast cancer imaging. Their review highlights applications of DL across modalities such as digital mammography, ultrasound, and magnetic resonance imaging (MRI), for tasks including lesion classification, segmentation, and therapy response prediction. They also discuss two further lines of work. One diagnoses breast cancer metastasis using CNNs on whole-body scintigraphy scans. The other aids clinicians in diagnosing axillary lymph node metastasis with a 3D CNN model on PET/CT images. They emphasize the need for large-scale trials and attention to ethical considerations to fully harness the potential of deep learning in clinical breast cancer imaging.
Shen et al. [20] proposed a DL-based approach for detecting breast cancer on screening mammograms. Their "end-to-end" training algorithm efficiently uses training datasets with varying levels of annotation and outperformed previous methods. On independent test sets from diverse mammography platforms, the method achieved per-image AUCs ranging from 0.88 to 0.98, with sensitivities between 86.1% and 86.7%. The algorithm also transferred across mammography platforms with minimal additional data for fine-tuning. These results highlight the potential of deep learning to provide more accurate and efficient breast cancer screening tools for clinical applications.
Han et al. [21] introduced a novel method for breast cancer diagnosis and prognosis. Their Class Structure-based Deep CNN (CSDCNN) achieves impressive accuracy (average 93.2%) by addressing challenges in automated multi-class classification from histopathological images. Combining hierarchical feature representation and distance constraints in feature space, their methodology offers a unique solution to subtle differences among breast cancer classes. Comparative experiments highlight the superior performance of the CSDCNN compared to existing methods, positioning it as a valuable tool for clinical decision-making in breast cancer management. Their work represents a significant advancement in automated breast cancer classification, providing clinicians with a reliable diagnostic aid.
Wang et al. [22] introduced DeepGrade, a deep learning-based histological grading model aimed at improving prognostic stratification of Nottingham histological grade 2 (NHG 2) tumors. DeepGrade was developed and validated on large-scale datasets of digital whole-slide histopathology images and is trained to classify NHG 1 and NHG 3 morphological patterns. By re-stratifying NHG 2 tumors into DG2-high and DG2-low groups, it provides independent prognostic information beyond traditional risk factors. Its performance was validated internally and externally, demonstrating its ability to predict recurrence risk. An ensemble of 20 deep convolutional neural network models provides robustness and reliability in classification. Supported by high area under the receiver operating characteristic curve values, DeepGrade shows promise as a cost-effective alternative to molecular profiling. It represents an advance in histological grading for breast cancer that could improve clinical decision-making and personalized treatment. Further research should validate DeepGrade across diverse patient populations and integrate it into routine clinical practice.
Sizilio et al. [23] introduced a fuzzy logic-based approach for pre-diagnosing breast cancer from Fine Needle Aspirate (FNA) analysis. Addressing the global burden of breast cancer and the variability in FNA diagnostic accuracy (65% to 98%), this method enhances reliability through computational intelligence. The research employed the Wisconsin Diagnostic Breast Cancer Data (WDBC) and proceeded through four stages: fuzzification, rule base establishment, inference processing, and defuzzification. Validation included cross-validation and expert reviews. The method achieved a sensitivity of 98.59% and a specificity of 85.43%, demonstrating high reliability in detecting malignancies but highlighting the need for improvement in identifying benign cases. This approach shows significant potential for enhancing breast cancer diagnostic accuracy.
Sarkar et al. [24] explored the use of the K-Nearest Neighbors (KNN) algorithm for diagnosing breast cancer with the Wisconsin-Madison Breast Cancer dataset. Recognized for its straightforward and efficient implementation, KNN served as a non-parametric classifier in this study. The research showed that KNN improved classification performance by 1.17% over the best-known result for the dataset. Advantages of KNN include its simplicity, effectiveness with small training sets, and no need for retraining when new data is incorporated. However, the algorithm also has significant limitations, such as substantial storage requirements for large datasets and extensive computational demands for distance calculations between test and training data. The study noted the existence of faster KNN variants, such as those using k-d trees, which have been successful in tasks like script and speech recognition. The findings highlight KNN’s potential for various diagnostic applications, even though no single algorithm is optimal for all diagnostic problems. This research emphasizes KNN’s promise in enhancing diagnostic accuracy while acknowledging its challenges with storage and computational efficiency.
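The brute-force KNN rule discussed above can be shown in a minimal NumPy sketch with Euclidean distance and majority voting. Both the distance metric and k = 3 are illustrative choices, not settings reported by Sarkar et al. The full distance matrix it builds is exactly the storage and computation cost noted above.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=np.float64)
    X_test = np.asarray(X_test, dtype=np.float64)
    y_train = np.asarray(y_train)
    # Pairwise Euclidean distances, shape (n_test, n_train). This brute-force
    # step scales with the training set size; k-d trees can avoid it.
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([Counter(y_train[idx]).most_common(1)[0][0] for idx in nearest])
```

There is no training step: adding new labeled samples only means appending rows to `X_train` and `y_train`. This is the "no retraining" advantage described above.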
Song et al. [25] introduced an ML technique aimed at accurately annotating noncoding RNAs (ncRNAs) by searching genomes to find ncRNA genes characterized by known secondary structures. Their method involves aligning sequences optimally with a structure model, a critical step for identifying ncRNAs within genomes. Acknowledging the limitations of using a single structure model, they developed an approach that processes genome sequence segments to extract feature vectors. These vectors are then classified to differentiate between ncRNA family members and other sequences. The results showed that this method captures essential features of ncRNA families more effectively and enhances the accuracy of genome annotation compared to traditional tools. This work underscores the significant role of ML in bioinformatics, particularly in improving the precision of ncRNA gene identification.
Foster et al. [26] offered a critical commentary on the integration of ML in biomedical engineering, particularly focusing on the application of support vector machines (SVMs) beyond mere statistical tools. Their analysis highlighted the inherent challenges in developing clinically validated diagnostic techniques using SVMs, emphasizing concerns such as overfitting and the imperative for robust validation procedures. Unlike studies focused on specific diseases, their research aimed to evaluate and enhance existing ML models for broader biomedical applications. The commentary serves as a cautionary perspective for researchers, reviewers, and readers, stressing the complexities and potential pitfalls in classifier development. It advocates for an integrated approach where classifier validation forms an integral part of the experimental process. This work underscores the critical need to establish the clinical validity of diagnostic tools developed through ML in biomedical research.
Wei et al. [27] proposed a method for improving microcalcification classification in breast cancer diagnosis by combining content-based image retrieval (CBIR) with ML. CBIR retrieves mammogram cases similar to the query. Local proximity information from these retrieved cases is then incorporated into an adaptive support vector machine (SVM) classifier, raising the area under the ROC curve from 0.78 to 0.82. The method aims to give radiologists enhanced diagnostic support as a "second opinion" tool. The study acknowledges that the limited dataset size may affect generalizability. These findings underscore the potential of CBIR-assisted classification to improve the precision of breast cancer diagnostics. Further validation on larger clinical datasets is needed to establish its efficacy in real-world clinical settings.
3 Proposed Work
3.1 Methodology
The ultrasound images are first augmented to handle class imbalance. They are then passed through a sequence of preprocessing steps: gamma correction, Gaussian filtering, image resizing, and normalization. Pixel intensities in ultrasound images can respond non-linearly, especially in high- or low-intensity regions. Gamma correction helps compensate for these non-linearities, yielding more accurate and visually interpretable images. Gaussian filtering is then applied to reduce noise while preserving edge details. Resizing gives every image the same dimensions, which keeps the dataset consistent and enables batch processing in the proposed DL models. Finally, the pixel values are normalized to the range [0, 1], which helps the activation functions capture the non-linearities in the data. The preprocessed images are fed to the proposed Spatial-Channel Attention LinkNet Framework with an InceptionResNet backbone to segment the tumor region. The segmented tumor maps are then passed to the proposed DCNNIMAF classifier, which classifies the segmented mass as benign, malignant, or normal. The overall workflow of the proposed work is presented in Figure 1.
3.2 Dataset Exploration
The data utilized in this work was obtained from the Breast Ultrasound Images Dataset [28] made available by Arya Shah on Kaggle. It contains a total of 780 ultrasound images along with their corresponding segmented ground truth masks, split into three categories – benign, malignant, and normal. Figure 2 showcases a sample of ultrasound images from the dataset overlapped with their corresponding segmentation maps.
The dataset exhibits a significant class imbalance. Benign samples account for 56.5% of the data, while malignant and normal samples account for only 26.7% and 16.9%, respectively. This imbalance is shown graphically in Figure 3. To mitigate it and avoid bias when training the segmentation and classification models, augmentation was applied to the images in the 'normal' and 'malignant' classes. The techniques used were random crop, random rotation, random zoom, random shear, and random exposure.
The rationale behind this augmentation is to bring the number of 'normal' and 'malignant' samples closer to that of the larger 'benign' class. Increasing the training data for these minority classes mitigates the effects of class imbalance and lets the models learn effectively from all classes. Training on a more balanced dataset improves the models' ability to segment, identify, and characterize breast tumors across categories. The resulting well-balanced distribution of each category is shown in Figure 4.
3.3 Preprocessing
Following augmentation, the images were passed through a four-stage preprocessing pipeline: gamma correction, Gaussian filtering, resizing, and image normalization. Figure 5 shows the output of each preprocessing step on a breast ultrasound image.
3.3.1 Gamma Correction
Gamma correction serves as the initial preprocessing step tailored specifically for breast ultrasound images. It plays a pivotal role in enhancing the visibility of crucial anatomical structures and subtle details within the images. By adjusting the image’s brightness and contrast, gamma correction improves the delineation of tumor boundaries and enhances the visibility of tumor features. This step is particularly critical in breast cancer tumor segmentation, where accurate visualization of tumor margins is essential for precise delineation and subsequent analysis.
Gamma correction can be represented mathematically as follows:
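$$I_{out} = I_{max}\left(\frac{I_{in}}{I_{max}}\right)^{\gamma} \qquad (1)$$
where $I_{in}$ and $I_{out}$ are the input and corrected intensities, $I_{max}$ is the maximum intensity (255 for 8-bit images), and $\gamma$ is the correction exponent ($\gamma < 1$ brightens dark regions, $\gamma > 1$ darkens them).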
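Gamma correction can be implemented as a minimal NumPy sketch of the common power-law form, I_out = I_max * (I_in / I_max) ** gamma. The 8-bit default `max_value=255.0` is an assumption; the gamma value used in this work is not stated here.

```python
import numpy as np

def gamma_correct(image, gamma, max_value=255.0):
    """Apply I_out = I_max * (I_in / I_max) ** gamma to an intensity image."""
    scaled = np.asarray(image, dtype=np.float64) / max_value  # map to [0, 1]
    return max_value * np.power(scaled, gamma)                # back to [0, I_max]
```

The endpoints 0 and `max_value` are fixed points of the mapping. Only mid-range intensities are stretched or compressed, which is why gamma correction can reveal low-contrast tumor margins without clipping.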
3.3.2 Gaussian Filtering
Following gamma correction, Gaussian filtering is employed to mitigate speckle noise, a common artifact in ultrasound images that can obscure tumor boundaries and hinder accurate segmentation. By smoothing out noise while preserving essential structure, Gaussian filtering improves the clarity of tumor features and the fidelity of tumor delineation. The resulting cleaner images support more accurate and reliable segmentation and classification of breast tumors.
Gaussian filtering is given by:
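$$I_g(x, y) = \sum_{u=-k}^{k}\sum_{v=-k}^{k} G(u, v)\, I(x - u, y - v), \qquad G(u, v) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{u^2 + v^2}{2\sigma^2}\right) \qquad (2)$$
where $I$ is the gamma-corrected image, $I_g$ is the filtered image, $\sigma$ is the standard deviation of the Gaussian kernel, and $k$ is the kernel radius.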
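Gaussian filtering can be sketched in NumPy as a discrete convolution with a sampled Gaussian kernel. The sketch exploits the separability of the 2-D Gaussian, G(u, v) = g(u) g(v), to use two 1-D passes. The kernel radius of about 3 sigma, the renormalization of the kernel to sum to 1, and reflect padding at the borders are assumptions. The sigma used in this work is not stated here.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1-D Gaussian exp(-x^2 / (2 sigma^2)), normalized to sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # covers ~99.7% of the mass
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_filter(image, sigma=1.0):
    """Smooth a 2-D image with a separable Gaussian (rows, then columns)."""
    kernel = gaussian_kernel_1d(sigma)
    pad = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=np.float64), pad, mode="reflect")
    # Each 'valid' convolution removes the padding again along its axis.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
```

A larger sigma suppresses more speckle but also blurs tumor margins further, so sigma trades noise reduction against edge sharpness.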
3.3.3 Ultrasound Image Resizing
Once the images have undergone gamma correction and Gaussian filtering, resizing is performed to standardize image dimensions, facilitating compatibility with segmentation and classification algorithms. Standardized image dimensions are essential for ensuring consistency and comparability across different datasets and analysis pipelines. Resizing enables researchers to create a uniform framework for analysis, simplifying the processing pipeline and reducing computational complexity.
Resizing of breast cancer images can be represented mathematically as:
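$$I_r(x', y') = I\left(\frac{W}{W'}\,x',\ \frac{H}{H'}\,y'\right) \qquad (3)$$
where $I$ is the filtered image of size $H \times W$, $I_r$ is the resized image of size $H' \times W'$, and non-integer source coordinates are resolved by interpolation.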
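Resizing can be illustrated with a NumPy sketch that maps each output pixel back to source coordinates and interpolates bilinearly between the four surrounding input pixels. Bilinear interpolation and pixel-center alignment are assumptions. The interpolation method and target size used in this work are not stated here.

```python
import numpy as np

def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation (pixel-center alignment)."""
    img = np.asarray(image, dtype=np.float64)
    in_h, in_w = img.shape
    # Map output pixel centers back to input coordinates, clipped to the image.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, shape (1, out_w)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Every output value is a convex combination of input values, so resizing never introduces intensities outside the original range.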
3.3.4 Pixel Normalization
The last preprocessing stage, normalization, scales the pixel values of each image to a standardized range, typically from 0 to 1. This keeps pixel intensities consistent across images, which is important for training neural networks. It also improves comparability between images and the convergence speed of training. Normalization reduces intensity variations that arise from differences in acquisition parameters or imaging conditions. This helps the segmentation and classification models learn effectively from the data, leading to more accurate and reliable results.
The normalization process is denoted as:
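$$I_{norm}(x, y) = \frac{I(x, y) - I_{min}}{I_{max} - I_{min}} \qquad (4)$$
where $I_{min}$ and $I_{max}$ are the minimum and maximum pixel intensities of the image.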
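Min-max normalization to [0, 1] takes only a few lines of NumPy. Per-image min and max are an assumption; dividing by a fixed 255 is a common alternative for 8-bit images.

```python
import numpy as np

def min_max_normalize(image, eps=1e-8):
    """Scale intensities linearly to [0, 1]: (I - I_min) / (I_max - I_min)."""
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / max(hi - lo, eps)  # eps guards against flat images
```

The `eps` guard matters for uniform images, where the denominator would otherwise be zero.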
3.4 Dual Attention and CNN Backbone Enhanced LinkNet Framework for Breast Cancer Segmentation
This section presents the proposed framework for breast tumor segmentation, which uses a LinkNet framework with an InceptionResNet backbone and a dual spatial-channel attention mechanism. The framework takes preprocessed breast ultrasound images and their corresponding ground truth masks as input and outputs the predicted segmentation map.
The LinkNet architecture [29] is a deep learning model designed for efficient semantic segmentation. It links each encoder block to its corresponding decoder block so that spatial information lost during downsampling can be recovered. In the proposed framework, the encoder is built on the InceptionResNet CNN model [30], which captures contextual information from the entire input image. The decoder is a series of transposed convolution layers, with dual spatial-channel attention mechanisms incorporated in the decoder blocks.
3.4.1 Encoder Section
The encoder section of the segmentation architecture is designed using an InceptionResNet CNN backbone and thus consists of a stem block, three types of InceptionResNet blocks, and two types of reduction blocks.
The stem block begins with three convolution layers, followed by a max pooling layer and a convolution layer executed in parallel. Their outputs are joined by filter concatenation, and the result is split into two parallel paths: one with two convolutional layers and the other with four. Both paths are combined by filter concatenation. A parallel convolution and max pooling follow, and their outputs are joined by a final filter concatenation.
(5) |
The InceptionResNet blocks are of three types, named A, B, and C. Block A comprises three paths and a residual connection. The first path consists of a single convolution, while the second and third paths consist of three and two convolutions, respectively. The three paths are merged by a further convolution, whose output is then combined with the residual connection. Blocks B and C share the same structure and differ mainly in the size of their feature maps, since an average pooling operation downsamples the data between block B and block C. Each consists of two paths, one with three convolutions and the other with a single convolution, merged by another convolution. The result is then combined with a residual connection through a convolution operation.
(6) |
(7) |
The reduction blocks come in two variants, A and B. Block A begins with a filter concatenation operation whose output is split into three paths: the first path is a max pooling operation, the second consists of three convolution layers, and the third is a single convolution. The three paths are then combined by filter concatenation. Block B likewise contains a max pooling path and a path of three convolution operations in parallel, but unlike block A it has four parallel paths: each of the remaining two paths consists of two convolution operations. All four paths are combined by filter concatenation.
(8) |
(9) |
3.4.2 Decoder Section
The decoder section is composed of decoder blocks, spatial-channel attention blocks, convolution and transpose convolution layers, and a softmax activation function. Each decoder block applies a convolution followed by batch normalization, then a transpose convolution followed by batch normalization, and finally another convolution followed by batch normalization.
(10) |
(11) |
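Each decoder block upsamples its input through a transpose convolution, whose output size follows standard arithmetic. The pure-Python shape calculator below is a sketch only: the 8×8 bottleneck and the kernel 3, stride 2, padding 1, output padding 1 settings are hypothetical values chosen for illustration, not hyperparameters reported for the proposed framework.

```python
def conv_transpose_out(size: int, kernel: int, stride: int = 1,
                       padding: int = 0, output_padding: int = 0,
                       dilation: int = 1) -> int:
    """Output length along one spatial axis of a 2D transposed convolution.

    Same formula as given in the PyTorch ConvTranspose2d documentation.
    """
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def trace_upsampling(bottleneck: int, n_stages: int) -> list[int]:
    """Spatial size after each upsampling stage.

    Hypothetical settings (not stated in the paper): kernel 3, stride 2,
    padding 1, output_padding 1, which exactly doubles the size.
    """
    sizes = [bottleneck]
    for _ in range(n_stages):
        sizes.append(conv_transpose_out(sizes[-1], kernel=3, stride=2,
                                        padding=1, output_padding=1))
    return sizes


# Four decoder blocks plus the final transpose convolution = 5 stages.
print(trace_upsampling(8, 5))  # [8, 16, 32, 64, 128, 256]
```

Under these assumed settings, five doubling stages would take an 8×8 map to the 256×256 input resolution listed in Table 1.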
The spatial attention block begins with two parallel double-convolution operations whose output feature maps are added together. ReLU activation is then applied to introduce non-linearity, followed by another convolution. Finally, a sigmoid activation function restricts the values to the range 0 to 1.
(12) |
(13) |
(14) |
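The gating steps described above can be sketched on a single-channel toy map. This is an illustrative sketch, not the authors' implementation: the two double-convolution branches are replaced by precomputed hypothetical inputs (`branch_a`, `branch_b`), the 1×1 convolution reduces to a scalar `weight * x + bias` for one channel, and multiplying the input by the resulting map is assumed as the way the attention weights are applied.

```python
import math

Grid = list[list[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spatial_attention(feature: Grid, branch_a: Grid, branch_b: Grid,
                      weight: float = 1.0, bias: float = 0.0):
    """Toy single-channel spatial attention gate.

    branch_a and branch_b stand in for the outputs of the two parallel
    double-convolution branches (assumed to be computed elsewhere).
    Steps: element-wise addition -> ReLU -> 1x1 convolution (for one
    channel this is the scalar map weight * x + bias) -> sigmoid.
    The resulting map in (0, 1) then rescales `feature` position-wise.
    """
    rows, cols = len(feature), len(feature[0])
    attn = [[sigmoid(weight * max(0.0, branch_a[i][j] + branch_b[i][j]) + bias)
             for j in range(cols)] for i in range(rows)]
    gated = [[feature[i][j] * attn[i][j] for j in range(cols)]
             for i in range(rows)]
    return attn, gated


attn, gated = spatial_attention(
    feature=[[1.0, 1.0], [1.0, 1.0]],
    branch_a=[[2.0, -3.0], [0.0, 1.0]],
    branch_b=[[1.0, 1.0], [0.0, 1.0]],
    weight=2.0, bias=-1.0,
)
```

Positions where both branches respond strongly (top-left here) receive weights close to 1, while positions suppressed by the ReLU fall to `sigmoid(bias)`.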
This is followed by channel attention. The channel attention block applies two pooling operations (max pooling and average pooling) in parallel, and the resulting feature maps are passed to a shared multi-layer perceptron (MLP). The shared MLP is composed of a flatten layer, a Gaussian Error Linear Unit (GELU) activation function, and a dropout layer; after these three operations, flatten and dropout operations are applied once more. The decoder ends with a transpose convolution, two convolution operations, and a softmax activation function that produces the segmented output.
(15) |
(16) |
(17) |
(18) |
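A minimal pure-Python sketch of channel attention follows, assuming the CBAM-style formulation: per-channel average and max pooling, a shared two-layer MLP with GELU, addition of the two paths, and a sigmoid that rescales each channel. The weight matrices `w1` and `w2`, the addition of the two paths, and the final sigmoid are assumptions not spelled out in the text; dropout is omitted because it is inactive at inference time.

```python
import math


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), Phi = standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def channel_attention(x, w1, w2):
    """CBAM-style channel attention on a C x H x W nested list.

    w1 (hidden x C) and w2 (C x hidden) are the weights of a hypothetical
    two-layer shared MLP with GELU in between; dropout is omitted because
    it is inactive at inference time.
    """
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def shared_mlp(v):  # the same weights process both pooled descriptors
        return matvec(w2, [gelu(h) for h in matvec(w1, v)])

    logits = [a + m for a, m in zip(shared_mlp(avg), shared_mlp(mx))]
    weights = [sigmoid(z) for z in logits]
    scaled = [[[val * w for val in row] for row in ch]
              for ch, w in zip(x, weights)]
    return weights, scaled


x = [[[1.0, 3.0], [2.0, 2.0]],   # channel 0: avg 2, max 3
     [[0.0, 0.0], [0.0, 4.0]]]   # channel 1: avg 1, max 4
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, scaled = channel_attention(x, identity, identity)
```

Sharing one MLP between the pooled descriptors keeps the parameter count independent of the spatial size; each channel ends up with a single weight in (0, 1).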
3.4.3 Workflow and Execution
Initially, the image is processed by a stem block, which captures low-level features such as edges and textures. Edges outline the boundaries of potential tumors, while textures reveal the internal structure of these masses, which often differs significantly between healthy tissue and malignancies. After the stem block, the image passes through five InceptionResNet-A blocks, whose parallel paths capture features at several scales; this matters because microcalcifications require finer resolution, whereas architectural distortions span larger areas. The residual connections facilitate deeper networks by mitigating the vanishing gradient problem. This multi-scale feature capture is crucial for analyzing ultrasound images of breast tissue, where abnormalities can manifest at different scales.
Subsequently, the image encounters a reduction block, which reduces the spatial dimensions of the feature maps. This reduction allows the model to focus on higher-level features and significantly reduces computational complexity, facilitating more efficient processing. This is particularly useful for identifying broader patterns indicative of cancer, such as the overall shape and orientation of a mass.
The image then passes through two InceptionResNet-B blocks and another stem block. The B blocks refine the detection of mid-level features, such as more nuanced textural patterns and subtle edge variations, while the repeated stem block extracts additional low-level features that complement the more complex features identified in the intermediate stages, giving the model a comprehensive grasp of the ultrasound image’s content.
Next, a second reduction block again reduces the spatial dimensions of the feature maps, allowing the subsequent layers to focus on more abstract features; the finer detail required for precise delineation of tumor boundaries is recovered later in the decoder. The image then enters five InceptionResNet-C blocks and, finally, an average pooling layer. These operations extract the high-level semantic features needed to differentiate between the various types of tissue present in the image.
This compressed feature map is then fed to the LinkNet decoder, which transforms it into a spatially coherent segmentation map. In the decoder, upsampling progressively rebuilds the segmentation map from the encoded features, and the attention mechanism concentrates on the tumor region. By integrating spatial and channel attention, the model enhances feature maps by emphasizing informative spatial locations and channels. This improves the model’s ability to capture intricate tumor patterns and structures, thereby enhancing segmentation performance.
Initially, the feature map is fed to 2 convolutional blocks, followed by a spatial-channel attention block, which is repeated thrice. They perform a preliminary enhancement of the map, focusing on sharpening the details and adjusting the contrast to make the underlying structures more prominent. This ensures that the feature map contains clear and distinguishable elements that correspond to the anatomical structures within the breast ultrasound images. It is then passed to the first decoder block.
The initial decoder block is designed to capture high-level semantic features essential for segmenting larger structures within breast ultrasound images. It facilitates the reconstruction of the spatial relationships and contextual information abstracted away during the encoding process. The spatial-channel attention block that follows this decoder block scrutinizes the feature map to identify and accentuate the regions that are most likely to contain tumor structures. This is achieved by assigning higher weights to the spatial locations that exhibit characteristics typical of tumors, such as irregular shapes and unusual textural patterns. The channel attention mechanism analyses the feature map across different channels to determine the ones that carry the most relevant information for segmentation. By amplifying the signals from these informative channels, the model can better discern the unique features that differentiate tumor tissue from the surrounding healthy tissue.
Finer textures and structures within the tumors are captured as the feature map moves up to the second decoder block. The spatial-channel attention block adjusts the feature map’s weights to emphasize the spatial locations where these detailed features are most prominent, resulting in more precise segmentation of smaller tumor components. Channel attention further identifies the most relevant feature map channels for the task, focusing on the texture and shape of the tumors.
In the third decoder block, the feature map captures even more detailed features, including intricate patterns and structures within the tumors. The attention mechanism in this block focuses on the boundaries of the tumor region, which helps the model improve the quality of the produced segmentation map. This makes the output more accurate and minimizes extraneous markings.
The final decoder block captures the most detailed features, including the specific patterns and structures unique to each tumor. The attention mechanism allows the model to distinguish between benign and malignant tumors and to identify subtle variations within a single tumor type. The output of this decoder block is passed through a transpose convolution to produce a segmentation map of the required output shape, followed by convolutions that set the number of output channels. This finally transforms the abstracted feature map into a detailed and accurate segmentation map.
The model was trained by backpropagation using a custom loss function (21), defined as the sum of the focal loss (19) and the Dice loss (20).
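$$\mathcal{L}_{\mathrm{focal}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \alpha\, y_{i,c}\,(1 - p_{i,c})^{\gamma} \log p_{i,c} \tag{19}$$
$$\mathcal{L}_{\mathrm{dice}} = 1 - \frac{1}{C}\sum_{c=1}^{C} \frac{2\sum_{i=1}^{N} y_{i,c}\,p_{i,c} + \epsilon}{\sum_{i=1}^{N} y_{i,c} + \sum_{i=1}^{N} p_{i,c} + \epsilon} \tag{20}$$
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{focal}} + \mathcal{L}_{\mathrm{dice}} \tag{21}$$
where $N$ is the number of pixels, $C$ the number of output classes, $y_{i,c}$ the one-hot ground truth of pixel $i$ for class $c$, $p_{i,c}$ the predicted softmax probability, $\alpha$ and $\gamma$ the focal-loss weighting and focusing parameters, and $\epsilon$ a smoothing constant.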
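The combined objective can be sketched in pure Python on a handful of pixels. The values α = 0.25, γ = 2 (the defaults from the original focal-loss paper) and the smoothing constant 1.0 are assumptions; the section does not report the values used.

```python
import math


def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean per-pixel focal loss; y_true one-hot, y_pred softmax probs."""
    total = 0.0
    for t_pix, p_pix in zip(y_true, y_pred):
        for t, p in zip(t_pix, p_pix):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= alpha * t * (1.0 - p) ** gamma * math.log(p)
    return total / len(y_true)


def dice_loss(y_true, y_pred, smooth=1.0):
    """1 minus the class-averaged soft Dice coefficient."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        inter = sum(t[c] * p[c] for t, p in zip(y_true, y_pred))
        denom = sum(t[c] for t in y_true) + sum(p[c] for p in y_pred)
        scores.append((2.0 * inter + smooth) / (denom + smooth))
    return 1.0 - sum(scores) / n_classes


def total_loss(y_true, y_pred):
    return focal_loss(y_true, y_pred) + dice_loss(y_true, y_pred)


# Two pixels, two classes (background, tumor).
truth = [[1.0, 0.0], [0.0, 1.0]]
print(round(total_loss(truth, truth), 6))  # 0.0
print(round(total_loss(truth, [[0.5, 0.5], [0.5, 0.5]]), 3))
```

The focal term down-weights pixels that are already classified confidently, while the Dice term directly rewards overlap, which helps when tumor pixels are a small fraction of the image.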
The model specifications and parameters of the proposed Spatial-Channel Attention LinkNet Framework with InceptionResNet Backbone are shown in Table 1.
Parameter | Value |
---|---|
Total Trainable Parameters | 57,881,011 |
Learning Rate | 0.001 |
Epochs | 100 |
Image Shape | (256, 256) |
Batch Size | 16 |
3.5 Multi-Attention Integrated Deep CNN Framework for Breast Cancer Classification
This section presents the proposed breast cancer deep learning classification model, named Deep CNN with an Integrated Multi-Attention Framework (DCNNIMAF). Using multiple attention modules integrated within its architecture, the proposed approach is designed to classify breast ultrasound images as malignant, benign, or normal. The model takes preprocessed breast ultrasound images as input and outputs the predicted class of each image.
The model architecture of DCNNIMAF integrates several pivotal blocks designed to extract pertinent features from the input breast ultrasound images. These blocks include convolutional blocks, double convolutional blocks, self-attention blocks, and fully connected layers. Each block plays a crucial role in feature extraction and classification. The layer architecture diagram of the proposed DCNNIMAF model is shown in Figure 7.
3.5.1 Convolutional Block
The convolutional block within DCNNIMAF consists of a convolutional layer, followed by a batch normalization layer, and finally an activation layer. The activation function used varies between Leaky ReLU and SiLU in different convolutional blocks. The operations performed by the block on the input feature map are mathematically represented as follows:
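$$Z = W * X + b \tag{22}$$
$$\hat{Z} = \gamma\,\frac{Z - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}} + \beta \tag{23}$$
$$\mathrm{LeakyReLU}(z) = \begin{cases} z, & z \ge 0 \\ a z, & z < 0 \end{cases} \tag{24}$$
$$\mathrm{SiLU}(z) = z\,\sigma(z) = \frac{z}{1 + e^{-z}} \tag{25}$$
where $X$ is the input feature map, $W$ and $b$ are the convolution kernel and bias, $*$ denotes convolution, $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\gamma$ and $\beta$ are the learnable scale and shift, $\epsilon$ is a small constant, $a$ is the negative slope of Leaky ReLU, and $\sigma$ is the sigmoid function.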
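The operations of a convolutional block after the convolution itself can be sketched for a single channel as follows. The negative slope of 0.01 (PyTorch's default) and ε = 1e-5 are assumptions, since the text does not give them, and the input values are hypothetical convolution outputs.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalisation of one channel's activations."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)  # biased variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def leaky_relu(x, negative_slope=0.01):
    return x if x >= 0 else negative_slope * x


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x):
    return x * sigmoid(x)  # also known as Swish


conv_out = [1.0, 2.0, 3.0, 4.0]  # hypothetical outputs of the convolution
normed = batch_norm(conv_out)
print([round(leaky_relu(v), 4) for v in normed])
print([round(silu(v), 4) for v in normed])
```

Both activations pass positive values almost unchanged; they differ in how they treat negative inputs, with Leaky ReLU scaling them linearly and SiLU applying a smooth, non-monotonic curve.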
3.5.2 Double Convolutional Block
The double convolutional block comprises two consecutive convolutional layers with 256 filters, a kernel size of 3, and a padding of 1. Mathematically, the operation of this block can be represented as
(26) |
3.5.3 Self-Attention Block
The self-attention block in DCNNIMAF computes the attention weights for each pair of positions within the feature map.
(27) |
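The section does not spell out equation (27), so the sketch below assumes the standard scaled dot-product form, $\mathrm{softmax}(QK^\top/\sqrt{d})V$, with hypothetical projection matrices `wq`, `wk` and `wv`. A feature map of size H×W would be flattened into N = H·W positions, so the weight matrix holds one entry for every pair of positions.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over N positions.

    x is N x d_in (e.g. an H x W feature map flattened to N = H*W
    positions); wq, wk, wv are d_in x d projection matrices.
    Returns the N x N attention weights and the N x d output.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                        for kj in k]) for qi in q]
    return weights, matmul(weights, v)


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three positions, two channels
eye = [[1.0, 0.0], [0.0, 1.0]]
weights, out = self_attention(x, eye, eye, eye)
```

Each row of `weights` sums to 1, so every output position is a weighted average of all positions, which is how self-attention lets distant regions of the map influence one another.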
3.5.4 Workflow and Execution
The flow of information through DCNNIMAF begins with an input layer of shape (256, 256, 3). Initially, the segmentation map undergoes a convolutional block with 512 filters, a padding of 2 and a kernel size of 3. This extracts low-level features such as textures and edges from the input. Following this, the output from the first convolutional block is passed through another convolutional block with 256 filters, the same kernel size, and padding, but with SiLU activation. The introduction of SiLU activation enhances the non-linearity for higher-level feature extraction, which helps to distinguish between different breast tissue characteristics indicative of cancerous growth.
Subsequently, a double convolutional block is applied to further refine feature extraction. By employing consecutive convolutional layers with 256 filters each, this block extracts deeper and more abstract features from the input. Following this, a convolutional block with 128 filters, a kernel size of 4, and a padding of 2 is employed, with leaky ReLU activation. This operation distills the extracted features into more compact and discriminative representations, helping the model detect and interpret complex patterns within the tumor’s structure, ranging from textural anomalies to irregular shapes.
Continuing the feature refinement process, another convolutional block with 128 filters, a padding of 1, and a kernel size of 3 is applied, this time utilizing SiLU activation.
Subsequently, two convolutional blocks are utilized - the first with 128 filters, a kernel size of 4, and a padding of 2, and the second with 64 filters, a padding of 1, and a kernel size of 3. These features are then fed to a spatial attention mechanism, enhancing the model’s capacity to adjust to subtle differences between various tissue characteristics associated with malignant and benign tumors.
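The kernel and padding values listed above determine the spatial size of the feature maps. The worked example below traces the main path under two assumptions the text does not state: every convolution uses stride 1, and no pooling occurs between these blocks. Under these assumptions, padding 2 with a 3×3 kernel grows each spatial axis by 2 pixels, and a 4×4 kernel with padding 2 grows it by 1.

```python
def conv_out(size: int, kernel: int, padding: int, stride: int = 1,
             dilation: int = 1) -> int:
    """Output length along one spatial axis of a 2D convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (description, filters, kernel, padding) along the main path described above.
MAIN_PATH = [
    ("conv block", 512, 3, 2),
    ("conv block, SiLU", 256, 3, 2),
    ("double conv, layer 1", 256, 3, 1),
    ("double conv, layer 2", 256, 3, 1),
    ("conv block, leaky ReLU", 128, 4, 2),
    ("conv block, SiLU", 128, 3, 1),
    ("conv block", 128, 4, 2),
    ("conv block", 64, 3, 1),
]


def trace(size: int = 256):
    shapes = []
    for name, filters, kernel, padding in MAIN_PATH:
        size = conv_out(size, kernel, padding)  # stride 1 assumed
        shapes.append((name, size, size, filters))
    return shapes


for name, h, w, c in trace():
    print(f"{name:<24} -> ({h}, {w}, {c})")
```

Under these assumptions, the 256×256 input would reach the spatial attention stage as a 262×262×64 feature map.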
The feature map obtained from the preceding operations is then concatenated with the output of a convolution and batch normalization layer with 64 filters, a padding of 2, and a kernel size of 3. By concatenating features from different layers, the model integrates both high-level and low-level features, producing a more comprehensive representation of the input image. This allows the model to learn about the presence of microcalcifications and the density of the tumor tissue, which are strong indicators of malignancy.
This concatenated output undergoes further processing through convolutional and activation layers before being upsampled and concatenated again with intricate feature attention results. This iterative refinement ensures that the model can leverage both the global and the local contextual details present in the input segmentation map. The result is then fed through additional convolutional blocks and pooling layers before being passed to a self-attention block. The self-attention block lets the model assign more weight to features such as the distribution of cells or the presence of necrosis, while filtering out less relevant information and artifacts that could obscure the diagnosis.
Ultimately, the result from the self-attention block is flattened and subjected to dropout regularization to mitigate overfitting. Dropout prevents the model from relying on specific features or patterns within the training data that may not generalize well to unseen samples, thereby improving its robustness and generalization performance.
The feature map is then passed to a fully connected layer with 128 neurons, followed by an output layer with three neurons and softmax activation that classifies the image as malignant, benign, or normal. This final step consolidates the extracted features into a compact representation suitable for classification, enabling the model to make accurate predictions about the presence and type of breast tumor in the input ultrasound image. The model was trained by backpropagation using the categorical cross-entropy loss (28).
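$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{3} y_{n,c} \log \hat{y}_{n,c} \tag{28}$$
where $N$ is the number of images in a batch, $y_{n,c}$ is the one-hot ground-truth label of image $n$ for class $c$, and $\hat{y}_{n,c}$ is the predicted softmax probability.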
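The output layer and its loss can be illustrated with a short pure-Python sketch. The logits, labels and class order below are hypothetical; they only show how softmax probabilities and the categorical cross-entropy interact.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -sum_c y_c * log(p_c) over a batch of one-hot labels."""
    total = 0.0
    for t_vec, p_vec in zip(y_true, y_prob):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_vec, p_vec))
    return total / len(y_true)


CLASSES = ("benign", "malignant", "normal")  # hypothetical output order
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]   # two hypothetical images
labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
probs = [softmax(z) for z in logits]
print(CLASSES[probs[0].index(max(probs[0]))])  # benign
print(round(categorical_cross_entropy(labels, probs), 4))
```

Only the probability assigned to the true class enters the loss, so an uninformative (uniform) prediction over three classes costs log 3 ≈ 1.0986 per image.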
The working algorithm in classifying breast cancer as benign, malignant, or normal, is demonstrated in Algorithm 3.
The model specifications and parameters of the proposed DCNNIMAF classifier have been shown in Table 2.
Parameter | Value |
---|---|
Total Trainable Parameters | 52,427,081 |
Learning Rate | 0.001 |
Epochs | 100 |
Image Shape | (256, 256) |
Batch Size | 16 |
4 Experimental Setup and Results
This section presents the results obtained from training the proposed models. The experiments were conducted on a system with the following specifications: CPU - AMD Ryzen 7 4800H with Radeon Graphics, x86_64 architecture, running at 3GHz with 8 cores; GPU - NVIDIA GeForce RTX 3050-PCI Bus 1; and 32GB of RAM. These details are summarized in Table 3.
Component | Specification |
---|---|
CPU | AMD Ryzen 7 4800H with Radeon Graphics |
ARCHITECTURE | x86_64 |
BASE SPEED | 3GHz |
CORES | 8 |
GPU | NVIDIA GeForce RTX 3050-PCI Bus 1 |
RAM | 32GB |
4.1 Segmentation Evaluation Metrics
The proposed segmentation framework’s performance was evaluated during the training and validation phase using the following segmentation metrics:
4.1.1 Accuracy
Accuracy measures the proportion of pixels that were classified correctly in the segmentation map compared to the ground truth.
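$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{29}$$
where $TP$, $TN$, $FP$ and $FN$ are the numbers of true-positive, true-negative, false-positive and false-negative pixels.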
4.1.2 IoU Score
The IoU score, often termed the Jaccard index, divides the intersection of the ground-truth mask and the predicted segmentation mask by their union. It therefore measures how much of the combined predicted and ground-truth tumor area is shared by both masks, penalizing both missed tumor pixels and false detections.
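$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \tag{30}$$
where $A$ is the ground-truth mask and $B$ is the predicted mask.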
4.1.3 Dice Coefficient
The Dice coefficient, often recognized as the Dice similarity index, assesses the overlap between the ground truth and the predicted segmentation mask.
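$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} \tag{31}$$
where $A$ and $B$ are the ground-truth and predicted masks, respectively.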
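The three segmentation metrics can be computed from binary masks as in the sketch below. The 4×4 masks are hypothetical; the convention that two empty masks score 1.0 is an assumption made to avoid division by zero.

```python
def confusion_counts(truth, pred):
    """Pixel-level TP, FP, FN, TN for binary masks (1 = tumor)."""
    tp = fp = fn = tn = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(truth, pred):
    tp, fp, fn, tn = confusion_counts(truth, pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn                      # size of the union of the masks
    iou = tp / union if union else 1.0        # both masks empty: perfect match
    dice = 2 * tp / (tp + union) if union else 1.0  # 2TP / (2TP + FP + FN)
    return accuracy, iou, dice


truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(segmentation_metrics(truth, pred))  # (0.875, 0.6, 0.75)
```

In this example, pixel accuracy (0.875) is much higher than IoU (0.6) because background pixels dominate, which is why overlap metrics are reported alongside accuracy. For a single pair of masks, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU).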
Figures 8 and 9 depict the training and validation curves for accuracy and total loss, respectively, obtained while training the proposed segmentation framework. From the graphs, it is evident that the model has achieved a high accuracy of 98.1%, with a minimal loss of 0.06 at the end of 100 epochs. The model also achieved an impressive Dice Coefficient score of 97.2% and an IoU score of 96.9%. The training and validation curves of these metrics have been shown in Figures 10 and 11 respectively.
4.1.4 Performance Evaluation and Discussion
From the segmentation results, it can be inferred that this model has demonstrated impressive performance. The high values obtained from IoU, Dice Coefficient, and Accuracy scores, along with the minimal total loss imply that the InceptionResNet backbone managed to successfully extract important characteristics from input preprocessed images, and the dual-attention mechanism in the decoder blocks helped fine-tune the segmentation maps during segmentation.
Grad-CAM (Gradient-weighted Class Activation Mapping) is a deep learning technique for visualizing the regions of an input image that most influence the model’s decision [31]. It is particularly useful for understanding how Convolutional Neural Networks (CNNs) make their predictions, and in medical image segmentation it makes it possible to check whether the attention mechanism focuses on the intended regions.
The Grad-CAMs of the attention block in the topmost decoder block, provided in Figure 12, show how the attention mechanism focuses on specific regions of the feature map, highlighting their importance for the segmentation task. This visualization helps explain how the attention mechanism contributes to segmentation performance by emphasizing the most relevant features and their spatial locations. The Grad-CAMs show that the attention mechanism progressively shifts its focus towards the tumor region, with localization improving as the number of training epochs increases.
Segmentation Model | Dice Coefficient (%) | IoU Score (%) |
---|---|---|
U-Net [32] | 82.52 | 69.76 |
Res-U-Net [33] | 88.01 | 80.21 |
U-Net with DenseNet backbone [34] | 89.86 | 79.12 |
Multi-scale Fusion U-Net [35] | 95.35 | 91.12 |
Proposed Spatial-Channel Attention LinkNet Framework with InceptionResNet Backbone | 97.20 | 96.91 |
U-Net [32] attains a Dice coefficient of 82.52% and an IoU score of 69.76%. These scores reflect a baseline capability for segmenting tumors from breast ultrasound images, but they also highlight the model’s limitations in capturing the full extent of tumor boundaries and internal structures, particularly the nuanced textures and densities often found in breast tissue. Res-U-Net [33] improves on the original U-Net through residual connections, reaching a Dice coefficient of 88.01% and an IoU score of 80.21%; however, further refinements in its architecture and feature extraction are needed for optimal segmentation accuracy, especially given the variable echo intensities and shadowing effects common in breast ultrasound imaging. By integrating a DenseNet backbone, U-Net with DenseNet backbone [34] reaches a Dice coefficient of 89.86% and an IoU score of 79.12%, showing the benefits of dense connectivity for segmentation. However, additional strategies may be required to fully exploit the complex patterns in breast ultrasound images, such as the differentiation between cystic and solid tumor components, which is critical for accurate diagnosis. Multi-scale Fusion U-Net [35] achieves a Dice coefficient of 95.35% and an IoU score of 91.12%, a significant improvement over the earlier models, but it performs suboptimally when handling the heterogeneity of breast tissue and the dynamic nature of tumor growth observed in ultrasound sequences. The proposed Spatial-Channel Attention LinkNet Framework with InceptionResNet Backbone stands out with a Dice coefficient of 97.20% and an IoU score of 96.91%.
This performance is attributed to the integration of spatial-channel attention mechanisms and the robust InceptionResNet backbone, which together enable precise localization and delineation of tumors, including the ability to distinguish between different types of breast lesions based on their texture, shape, and boundary characteristics.
4.2 Classification Evaluation Metrics
The outcomes of the proposed DCNNIMAF model for breast cancer classification are evaluated during the training and validation phase using the following classification metrics:
4.2.1 Accuracy
Accuracy is a fundamental metric that evaluates the overall performance of a model across all classes. It measures the proportion of true classifications (both true positives and true negatives) in the total images classified, providing a comprehensive view of the model’s effectiveness in correctly classifying instances.
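$$\mathrm{Accuracy} = \frac{1}{K}\sum_{k=1}^{K} \frac{TP_k + TN_k}{TP_k + TN_k + FP_k + FN_k} \tag{32}$$
Where:
- $TP_k$ denotes the number of true positives for class $k$.
- $TN_k$ denotes the number of true negatives for class $k$.
- $FP_k$ denotes the number of false positives for class $k$.
- $FN_k$ denotes the number of false negatives for class $k$.
- $K$ denotes the total number of classes.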
4.2.2 Precision
Precision focuses on the proportion of true positive predictions among all positive predictions made by the classifier. It is particularly important in situations where false positives are costly, as it helps in minimizing the impact of false positives on the overall performance of the model.
$$\mathrm{Precision} = \frac{1}{K}\sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k} \tag{33}$$
4.2.3 Recall
Recall, also known as sensitivity, measures the ability of the classifier to identify all relevant instances within a specific class. It is crucial in situations where missing a positive instance (false negative) is more detrimental than identifying a negative instance as positive (false positive). Recall helps in ensuring that the model does not overlook any relevant instances.
$$\mathrm{Recall} = \frac{1}{K}\sum_{k=1}^{K} \frac{TP_k}{TP_k + FN_k} \tag{34}$$
4.2.4 F1-Score
F1-Score combines precision and recall into a single measure, providing a balanced view of the model’s performance. It is useful in scenarios where both false positives and false negatives are equally important, and a balance between these two metrics is desired.
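$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{35}$$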
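The class-averaged metrics in (32) to (35) can be derived from a multi-class confusion matrix, as in the sketch below. The 3×3 matrix is hypothetical and is not the paper’s data; macro averaging is assumed, matching the per-class sums above.

```python
def per_class_counts(cm):
    """cm[i][j] = number of images of true class i predicted as class j."""
    k = len(cm)
    total = sum(map(sum, cm))
    counts = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # rest of row c
        fp = sum(cm[r][c] for r in range(k)) - tp   # rest of column c
        tn = total - tp - fn - fp
        counts.append((tp, tn, fp, fn))
    return counts


def macro_metrics(cm):
    """Class-averaged accuracy, precision and recall, and F1 from them."""
    counts = per_class_counts(cm)
    k = len(counts)
    accuracy = sum((tp + tn) / (tp + tn + fp + fn)
                   for tp, tn, fp, fn in counts) / k
    precision = sum(tp / (tp + fp) if tp + fp else 0.0
                    for tp, _, fp, _ in counts) / k
    recall = sum(tp / (tp + fn) if tp + fn else 0.0
                 for tp, _, _, fn in counts) / k
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
print([round(m, 4) for m in macro_metrics(cm)])
```

For each class, the false positives come from its column and the false negatives from its row, so one matrix supplies every count the formulas need.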
The proposed DCNNIMAF classifier was trained for 100 epochs, and the evaluation metrics were recorded after each epoch. The training and validation curves for accuracy and categorical cross-entropy loss are depicted in Figures 13 and 14, respectively. The plots show that the classification model reached an accuracy of 99.2% with a loss of 0.03. Figures 15, 16, and 17 display the training and validation precision, recall, and F1-score curves, respectively. The graphs indicate that the proposed model keeps both false positives and false negatives low, achieving a precision of 99.3% and a recall of 99.1%, which together yield an F1-score of 99.1%.
4.2.5 Performance Evaluation and Discussion
The normalized confusion matrix obtained on the validation data using the trained DCNNIMAF classification model is presented in Figure 18. A normalized confusion matrix is a confusion matrix whose values are normalized to show proportions or percentages. It is useful for comparing classification performance across classes, since the values lie between 0 and 1 regardless of how many samples each class contains.
In Figure 18, the normalized confusion matrix depicts the proposed model’s classification performance across the three breast cancer classes: "benign," "normal," and "malignant." Each row corresponds to the actual class, with each column representing the predicted class. The matrix’s values show the proportion of true-class cases that were successfully classified (along the diagonal) or misclassified (off-diagonal).
The matrix shows that the model achieves high accuracy: most diagonal values are close to one, indicating that the majority of the samples were categorized correctly. For the "benign" class, the true positive rate was 0.99, meaning that 99% of benign tumors were correctly classified. For the "normal" class, the true positive rate was 0.98, meaning that 98% of normal cases were correctly identified. Similarly, for the "malignant" class, the true positive rate was 0.99, meaning that 99% of malignant tumors were correctly identified. Misclassifications were rare, with very low false positive and false negative rates.
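Row normalization, as used for Figure 18, divides each row of the count matrix by its total, so the diagonal gives the per-class true positive rate. The counts below are hypothetical, not the paper’s data, and the class order is assumed.

```python
def normalize_rows(cm):
    """Divide each row by its total, so entry (i, j) is the fraction of
    true-class-i samples predicted as class j (rows sum to 1)."""
    normalized = []
    for row in cm:
        n = sum(row)
        normalized.append([v / n if n else 0.0 for v in row])
    return normalized


def per_class_tpr(cm):
    """Diagonal of the row-normalized matrix = per-class recall (TPR)."""
    norm = normalize_rows(cm)
    return [norm[i][i] for i in range(len(norm))]


labels = ("benign", "normal", "malignant")  # assumed row/column order
cm = [[8, 1, 1],   # hypothetical counts, not the paper's data
      [0, 9, 1],
      [1, 0, 9]]
for name, tpr in zip(labels, per_class_tpr(cm)):
    print(f"{name}: {tpr:.2f}")
```

Because each row is scaled by its own total, a class with few samples is displayed on the same 0 to 1 scale as a class with many.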
The proposed DCNNIMAF model is compared with other pretrained CNNs, including EfficientNetV2 [36], MobileNetV2 [37], DenseNet121 [38], NASNetMobile [39], Xception [40], InceptionV3 [41], InceptionResNetV2 [30], MobileNet [42], VGG16 [43], and ResNet50 [44]. This comparison provides an overall assessment of the proposed model relative to baseline CNNs widely used for breast cancer classification. All models, including the proposed one, are trained on the same dataset, and the outcomes are presented in Table 5. The models are evaluated on the following metrics: Accuracy (Acc), Precision (Prec), Recall (Rec), and F1 Score (F1).
Model | Train Acc | Train Prec | Train Rec | Train F1 | Val Acc | Val Prec | Val Rec | Val F1 |
---|---|---|---|---|---|---|---|---|
EfficientNetV2 | 0.926 | 0.931 | 0.920 | 0.925 | 0.871 | 0.871 | 0.871 | 0.871 |
MobileNetV2 | 0.935 | 0.948 | 0.925 | 0.936 | 0.858 | 0.857 | 0.852 | 0.854 |
DenseNet121 | 0.928 | 0.938 | 0.925 | 0.931 | 0.906 | 0.906 | 0.906 | 0.906 |
NASNetMobile | 0.942 | 0.947 | 0.941 | 0.944 | 0.911 | 0.913 | 0.904 | 0.908 |
Xception | 0.925 | 0.926 | 0.920 | 0.923 | 0.917 | 0.917 | 0.917 | 0.917 |
InceptionV3 | 0.878 | 0.897 | 0.862 | 0.879 | 0.774 | 0.774 | 0.771 | 0.772 |
InceptionResNetV2 | 0.958 | 0.961 | 0.958 | 0.959 | 0.947 | 0.957 | 0.901 | 0.928 |
MobileNet | 0.956 | 0.957 | 0.948 | 0.952 | 0.872 | 0.874 | 0.871 | 0.873 |
VGG16 | 0.86 | 0.889 | 0.841 | 0.864 | 0.861 | 0.877 | 0.803 | 0.838 |
ResNet50 | 0.837 | 0.866 | 0.805 | 0.834 | 0.761 | 0.761 | 0.761 | 0.761 |
DCNNIMAF (Proposed) | 0.989 | 0.994 | 0.992 | 0.993 | 0.992 | 0.993 | 0.991 | 0.991 |
From Table 5, it is evident that the proposed DCNNIMAF model has outperformed all baseline CNN models in terms of performance evaluation metrics. EfficientNetV2 overfits on the data due to difficulty in generalizing the nuanced features of breast cancer like irregular margins of malignant lesions or varying degrees of echogenicity observed in ultrasound images. MobileNetV2’s lightweight architecture struggles with the detailed analysis required to detect early signs of breast cancer, such as subtle changes in echotexture or the presence of microcalcifications within lesions. While DenseNet121 benefits from dense connectivity for feature reuse, its performance in identifying specific breast cancer markers like the orientation and distribution of calcifications or the assessment of lesion vascularity is compromised. NASNetMobile, designed for mobile applications, lacks the precision needed to capture the complex interplay of features indicative of breast cancer, such as the irregular shapes of masses or variations in posterior acoustic shadowing. Xception does not fully exploit the spatial dependencies crucial for identifying specific indicators of breast cancer, such as the pattern of calcifications or the echogenicity of surrounding tissue.
InceptionV3’s design trade-off in favor of computational efficiency limits its capacity to analyze the multidimensional data characteristic of breast cancer ultrasound images, particularly in detecting subtle architectural distortions or changes in tissue echotexture. Despite its sophisticated architecture, InceptionResNetV2 does not optimally capture specific, disease-related features such as the texture and margin irregularities of masses or the presence of ductal abnormalities. MobileNet’s focus on efficiency limits the depth necessary for detailed feature extraction from breast cancer ultrasound images. VGG16’s simplicity and relative shallowness hinder the detailed analysis required to detect and classify features such as posterior acoustic enhancement, leading to lower validation accuracy. ResNet50 may not adequately learn features such as lesion margins, owing to limitations in its depth and focus. The proposed DCNNIMAF model distinguishes itself by effectively integrating multiple spatial and self-attention mechanisms, enabling precise identification of critical features such as calcifications, architectural distortions, and mass margins. These enhancements allow the model to capture the complex, heterogeneous pathology of breast cancer evident in ultrasound imagery.
Classification Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Fine Tuned VGG16 and Fine Tuned VGG19 ensemble model [45] | 95.29 | 95.46 | 95.20 | 95.29 |
CNN-based Ensemble Learner with MLP meta classifier [46] | 98.08 | 98.41 | 98.82 | 98.81 |
BCCNN [47] | 98.31 | 98.39 | 98.30 | 98.28 |
ResNet50 hybrid with SVM [48] | 97.98 | 96.51 | 97.63 | 95.97 |
Deep CNN with Fuzzy merging [49] | 98.62 | 92.31 | 94.70 | 93.53 |
Xception + SVM R [50] | 96.25 | 96.12 | 96.02 | 96.01 |
Grid-based deep feature generator + DNN classifier [51] | 97.18 | 97.45 | 96.18 | 96.79 |
InceptionV3 with residual connections [52] | 91.03 | 85.05 | 96.01 | 92.02 |
EDLCDS-BCDC [53] | 95.15 | 97.35 | 94.74 | 96.92 |
AlexNet, ResNet50 and MobileNetV2 Hybrid feature extractor + mRMR + SVM [54] | 95.60 | 95.69 | 95.61 | 95.65 |
DCNNIMAF (Proposed) | 99.20 | 99.32 | 99.14 | 99.1 |
The results presented in Table 6 show that the DCNNIMAF model proposed in this research outperforms all the compared models from existing research. The ensemble of fine-tuned VGG16 and VGG19 [45] achieves moderate performance, with accuracy and F1-score around 95%, suggesting limitations in its ability to fully capture the complexity of breast cancer pathology. The CNN-based Ensemble Learner with MLP Meta Classifier [46] performs well, with an accuracy of 98%, but struggles to identify subtle changes in the irregular shapes of masses. BCCNN [47] shows promising results, with metrics around 98%. However, its slightly lower F1-score compared to the highest performers suggests that it has difficulty balancing precision and recall, which is essential for minimizing errors in breast cancer diagnosis.
ResNet50 Hybrid with SVM [48] presents strong recall but exhibits a lower precision score. This discrepancy indicates that while the model is capable of identifying many positive cases, it struggles with accurately distinguishing between benign and malignant lesions, leading to potential false positives. The precision score of Deep CNN with Fuzzy Merging [49] drops significantly highlighting a critical issue in its ability to classify breast cancer cases precisely. This suggests that while the model captures broad patterns effectively, it overlooks finer details necessary for accurate diagnosis. Xception combined with SVM R [50] shows a balanced performance of around 96% but indicates a relative inefficiency in comparison to other models in terms of feature extraction capabilities, leading to inefficiency in real-world use. Grid-based Deep Feature Generator with DNN Classifier [51] demonstrates a high precision score, but the minor discrepancies in recall and F1-score indicate potential inefficiencies in capturing all relevant pathological features, affecting its overall efficacy.
InceptionV3 with Residual Connections [52] achieves high recall but considerably lower precision, indicating a marked imbalance in its diagnostic capability. This suggests difficulty in discriminating between similar-looking benign and malignant cases, which is crucial for reducing false positives. EDLCDS-BCDC [53] shows moderate performance, with metrics between 95% and 97%, suggesting shortcomings in identifying subtle differences between lesions. The AlexNet, ResNet50, and MobileNetV2 Hybrid Feature Extractor with mRMR and SVM [54] shows solid performance, with accuracy and F1-score around 95%. However, these scores suggest that it does not fully adapt to the complex and varied nature of breast cancer pathology, which leaves room for improvement.
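For the three-class task (benign, malignant, normal), precision, recall, and F1 can be computed per class in a one-vs-rest fashion and then averaged. This section does not state which averaging scheme underlies Table 6, so the unweighted (macro) average below is an assumption, and the labels are hypothetical. The example shows how benign images misclassified as malignant act as false positives for the malignant class. They lower that class's precision and the benign class's recall, while malignant recall stays at 1.0.

```python
from collections import Counter

CLASSES = ("benign", "malignant", "normal")


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = Counter(zip(y_true, y_pred))  # how often each (true, predicted) pair occurs
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)  # predicted c, actually another class
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)  # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall and F1 over all classes (assumed scheme)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Hypothetical labels: 2 of the 5 benign images are predicted as malignant.
y_true = ["benign"] * 5 + ["malignant"] * 4 + ["normal"] * 3
y_pred = ["benign"] * 3 + ["malignant"] * 2 + ["malignant"] * 4 + ["normal"] * 3

metrics = per_class_metrics(y_true, y_pred)
for cls, (p, r, f) in metrics.items():
    print(f"{cls:9s} P={p:.3f} R={r:.3f} F1={f:.3f}")
print("macro P/R/F1:", [round(v, 3) for v in macro_average(metrics)])
```

Here the malignant class has perfect recall but a precision of about 0.667, which mirrors the "high recall, lower precision" pattern discussed above.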
The proposed DCNNIMAF model achieves the best result on every evaluated metric in this comparison. This can be attributed to its carefully designed architecture, which combines advanced feature extraction with multiple attention mechanisms. Together, these allow the model to identify the nuanced pathological features associated with breast cancer precisely. As a result, the model achieves both high accuracy and high precision and recall, indicating its robustness and its potential reliability for clinical breast cancer classification.
5 Conclusion and Future Direction
The primary objective of this research is to detect and segment tumor regions in breast ultrasound images and then classify them as benign, malignant, or normal. The aim is an accurate and efficient system for breast cancer tumor segmentation and classification that supports diagnosis and improves treatment outcomes for patients. The proposed segmentation model uses an InceptionResNet-based LinkNet framework with a dual (spatial and channel) attention mechanism to segment the tumor region precisely. The DCNNIMAF classification framework uses spatial and self-attention mechanisms across multiple layers to classify each image as a benign tumor, a malignant tumor, or normal tissue without cancerous findings. Both proposed models outperform existing works. The segmentation model achieves an accuracy of 98.1%, an IoU score of 96.9%, and a Dice coefficient of 97.2%. The classification model achieves an accuracy of 99.2%, a precision of 99.3%, a recall of 99.1%, and an F1-score of 99.1%. Future work could extend the framework to other medical imaging modalities, enabling the detection and classification of abnormalities beyond breast ultrasound images.
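The IoU and Dice coefficient reported for segmentation both compare a predicted binary tumor mask with its ground-truth mask. A minimal pure-Python sketch follows, using small hypothetical masks. Returning 1.0 when both masks are empty (an image with no tumor region) is a convention assumed here, not one stated in the paper.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as equally sized nested lists of 0/1."""
    p = [v for row in pred for v in row]     # flatten predicted mask
    t = [v for row in target for v in row]   # flatten ground-truth mask
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels marked as tumor in both masks
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:
        # Assumed convention: two empty masks (no tumor) count as a perfect match.
        return 1.0, 1.0
    return inter / union, 2 * inter / (p_sum + t_sum)


# Hypothetical 4x4 masks: the prediction covers 4 of the 6 ground-truth tumor pixels.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]

iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")  # IoU = 0.667, Dice = 0.800
```

For a single mask pair, Dice = 2·IoU/(1 + IoU), so Dice is never lower than IoU. Scores averaged over a dataset need not satisfy this identity exactly.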
References
- [1] B. Stewart and C. Wild, World Cancer Report 2014. Geneva, Switzerland: WHO Press, 2014.
- [2] World Health Organization, “Breast cancer,” accessed: 2024-07-03. [Online]. Available: http://www.who.int/cancer/prevention/diagnosis-screening/breast-cancer/en/
- [3] Y. Sun, Z. Zhao, Z. Yang, F. Xu, H. Lu, Z. Zhu, W. Shi, J. Jiang, P. Yao, and H. Zhu, “Risk factors and preventions of breast cancer,” Int J Biol Sci, Nov 2017.
- [4] M. Tarique, F. Elzahra, A. Hateem, and M. Mohammad, “Fourier transform based early detection of breast cancer by mammogram image processing,” J Biomed Eng Med Imaging, vol. 24, p. 17, 2015.
- [5] American Cancer Society, “How is breast cancer diagnosed?” 2014, accessed: 2017-09-20. [Online]. Available: http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-diagnosis
- [6] F. Sadoughi, Z. Kazemy, F. Hamedan, L. Owji, M. Rahmanikatigari, and T. Azadboni, “Artificial intelligence methods for the diagnosis of breast cancer by image processing: a review,” Breast Cancer (Dove Med Press), vol. 10, pp. 219–230, Nov 2018.
- [7] J. Benson, I. Jatoi, M. Keisch, F. Esteva, A. Makris, and V. Jordan, “Early breast cancer,” Lancet, vol. 373, no. 9673, pp. 1463–1479, Apr 2009.
- [8] G. Litjens, T. Kooi, B. Bejnordi, A. Setio, F. Ciompi, M. Ghafoorian, J. van der Laak, B. van Ginneken, and C. Sánchez, “A survey on deep learning in medical image analysis,” Med Image Anal, vol. 42, pp. 60–88, Dec 2017.
- [9] E. Rashed and M. El Seoud, “Deep learning approach for breast cancer diagnosis,” in Proceedings of the 8th International Conference on Software and Information Engineering, Apr 2019, pp. 243–247.
- [10] S. Ramesh, S. Sasikala, S. Gomathi et al., “Segmentation and classification of breast cancer using novel deep learning architecture,” Neural Comput & Applic, vol. 34, pp. 16533–16545, 2022.
- [11] A. Osareh and B. Shadgar, “Machine learning techniques to diagnose breast cancer,” in 2010 5th International Symposium on Health Informatics and Bioinformatics, Ankara, Turkey, 2010, pp. 114–120.
- [12] Y. Li, J. Wu, and Q. Wu, “Classification of breast cancer histology images using multi-size and discriminative patches based on deep learning,” IEEE Access, vol. 7, pp. 21400–21408, 2019.
- [13] J. Zheng, D. Lin, Z. Gao, S. Wang, M. He, and J. Fan, “Deep learning assisted efficient adaboost algorithm for breast cancer detection and early diagnosis,” IEEE Access, 2020.
- [14] W. Lotter, A. Diab, B. Haslam et al., “Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach,” Nat Med, vol. 27, pp. 244–249, 2021.
- [15] A. Saber, M. Sakr, O. Abo-Seida, A. Keshk, and H. Chen, “A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique,” IEEE Access, vol. 9, pp. 71194–71209, 2021.
- [16] S. Cho, N. Baek, and K. Park, “Deep learning-based multi-stage segmentation method using ultrasound images for breast cancer diagnosis,” J King Saud Univ - Comput Inf Sci, vol. 34, no. 10, pp. 10273–10292, 2022.
- [17] D. Wang, A. Khosravi, R. Gargeya, H. Irshad, and A. Beck, “Deep learning for identifying metastatic breast cancer,” arXiv preprint arXiv:1606.05718, 2016.
- [18] G. Hamed, T. Helmy, H. Badawi, and M. Shawky, “Deep learning in breast cancer detection and classification,” in Advances in Intelligent Systems and Computing, 2020.
- [19] L. Balkenende, J. Teuwen, and R. Mann, “Application of deep learning in breast cancer imaging,” Semin Nucl Med, vol. 52, no. 5, 2022.
- [20] L. Shen, L. Margolies, J. Rothstein et al., “Deep learning to improve breast cancer detection on screening mammography,” Sci Rep, vol. 9, p. 12495, 2019.
- [21] Z. Han, B. Wei, Y. Zheng et al., “Breast cancer multi-classification from histopathological images with structured deep learning model,” Sci Rep, vol. 7, p. 4172, 2017.
- [22] Y. Wang, B. Acs, S. Robertson et al., “Improved breast cancer histological grading using deep learning,” Ann Oncol, vol. 33, no. 1, 2022.
- [23] G. Sizilio, C. Leite, A. Guerreiro, and A. Neto, “Fuzzy method for pre-diagnosis of breast cancer from the fine needle aspirate analysis,” Biomed Eng Online, vol. 11, no. 1, p. 83, 2012.
- [24] M. Sarkar and T. Leong, “Application of k-nearest neighbours algorithm on breast cancer diagnosis problem,” in Proceedings of the AMIA Symposium. American Medical Informatics Association, 2000, pp. 793–797.
- [25] Y. Song, C. Liu, and Z. Wang, “A machine learning approach for accurate annotation of noncoding RNAs,” IEEE/ACM Trans Comput Biol Bioinform, vol. 12, no. 3, pp. 551–559, 2014.
- [26] K. Foster, R. Koprowski, and J. Skufca, “Machine learning, medical diagnosis, and biomedical engineering research-commentary,” Biomed Eng Online, vol. 13, no. 1, p. 94, 2014.
- [27] L. Wei, Y. Yang, and R. Nishikawa, “Microcalcification classification assisted by content-based image retrieval for breast cancer diagnosis,” Pattern Recognition, vol. 42, no. 6, pp. 1126–1132, 2009.
- [28] A. Shah, “Breast ultrasound images dataset,” 2020, accessed: 2024-07-03. [Online]. Available: https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset
- [29] A. Chaurasia and E. Culurciello, “LinkNet: Exploiting encoder representations for efficient semantic segmentation,” in 2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 2017, pp. 1–4.
- [30] C. Szegedy, S. Ioffe, and V. Vanhoucke, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” CoRR, vol. abs/1602.07261, 2016. [Online]. Available: http://arxiv.org/abs/1602.07261
- [31] R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
- [32] R. Almajalid, J. Shan, Y. Du, and M. Zhang, “Development of a deep-learning-based method for breast ultrasound image segmentation,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 1103–1108.
- [33] W. Yue, H. Zhang, J. Zhou et al., “Deep learning-based automatic segmentation for size and volumetric measurement of breast cancer on magnetic resonance imaging,” Front Oncol, vol. 12, p. 984626, 2022.
- [34] S. Zhang, M. Liao, J. Wang et al., “Fully automatic tumor segmentation of breast ultrasound images with deep learning,” J Appl Clin Med Phys, vol. 24, p. e13863, 2023.
- [35] J. Li, L. Cheng, T. Xia, H. Ni, and J. Li, “Multi-scale fusion U-Net for the segmentation of breast lesions,” IEEE Access, vol. 9, pp. 137125–137139, 2021.
- [36] M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training,” CoRR, vol. abs/2104.00298, 2021. [Online]. Available: https://arxiv.org/abs/2104.00298
- [37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, “Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation,” CoRR, vol. abs/1801.04381, 2018. [Online]. Available: http://arxiv.org/abs/1801.04381
- [38] G. Huang, Z. Liu, and K. Weinberger, “Densely connected convolutional networks,” CoRR, vol. abs/1608.06993, 2016. [Online]. Available: http://arxiv.org/abs/1608.06993
- [39] B. Zoph, V. Vasudevan, J. Shlens, and Q. Le, “Learning transferable architectures for scalable image recognition,” CoRR, vol. abs/1707.07012, 2017. [Online]. Available: http://arxiv.org/abs/1707.07012
- [40] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” CoRR, vol. abs/1610.02357, 2016. [Online]. Available: http://arxiv.org/abs/1610.02357
- [41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” CoRR, vol. abs/1512.00567, 2015. [Online]. Available: http://arxiv.org/abs/1512.00567
- [42] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
- [43] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2015. [Online]. Available: http://arxiv.org/abs/1409.1556
- [44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
- [45] Z. Hameed, S. Zahia, B. Garcia-Zapirain, J. Javier Aguirre, and A. María Vanegas, “Breast cancer histopathology image classification using an ensemble of deep learning models,” Sensors, vol. 20, no. 16, p. 4373, 2020.
- [46] A. Das, M. Mohanty, P. Mallick, P. Tiwari, K. Muhammad, and H. Zhu, “Breast cancer detection using an ensemble deep learning method,” Biomed Signal Process Control, vol. 70, p. 103009, 2021.
- [47] B. Abunasser, M. Al-Hiealy, I. Zaqout, and S. Abu-Naser, “Convolution neural network for breast cancer detection and classification using deep learning,” Asian Pac J Cancer Prev, vol. 24, no. 2, pp. 531–544, 2023.
- [48] W. Salama, A. Elbagoury, and M. Aly, “Novel breast cancer classification framework based on deep learning,” IET Image Process, vol. 14, no. 13, pp. 3254–3259, 2020.
- [49] R. Krithiga and P. Geetha, “Deep learning based breast cancer detection and classification using fuzzy merging techniques,” Mach Vis Appl, vol. 31, p. 63, 2020.
- [50] S. Sharma and S. Kumar, “The Xception model: A potential feature extractor in breast cancer histology images classification,” ICT Express, vol. 8, no. 1, pp. 101–108, 2022.
- [51] H. Liu, G. Cui, Y. Luo, Y. Guo, L. Zhao, Y. Wang et al., “Artificial intelligence-based breast cancer diagnosis using ultrasound images and grid-based deep feature generator,” Int J Gen Med, vol. 15, pp. 2271–2282, 2022.
- [52] N. Sirjani, M. Ghelich Oghli, M. Tarzamni, M. Gity, A. Shabanzadeh, P. Ghaderi et al., “A novel deep learning model for breast lesion classification using ultrasound images: A multicenter data evaluation,” Phys Medica, vol. 107, p. 102560, 2023.
- [53] M. Ragab, A. Albukhari, J. Alyami, and R. Mansour, “Ensemble deep-learning-enabled clinical decision support system for breast cancer diagnosis and classification on ultrasound images,” Biology, vol. 11, p. 439, 2022.
- [54] Y. Eroğlu, M. Yildirim, and A. Çinar, “Convolutional neural networks based classification of breast ultrasonography images by hybrid method with respect to benign, malignant, and normal using mRMR,” Comput Biol Med, vol. 133, p. 104407, 2021.