
Received 20 December 2023, accepted 10 January 2024, date of publication 22 January 2024, date of current version 26 January 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3356602

Toward Improving Breast Cancer Classification Using an Adaptive Voting Ensemble Learning Algorithm

AMREEN BATOOL 1 AND YUNG-CHEOL BYUN 2
1 Department of Electronic Engineering, Institute of Information Science and Technology, Jeju National University, Jeju-si 63243, South Korea
2 Department of Computer Engineering, Major of Electronic Engineering, Institute of Information Science and Technology, Jeju National University, Jeju-si 63243, South Korea


Corresponding author: Yung-Cheol Byun (ycb@jejunu.ac.kr)
This work was supported in part by the Ministry of Small and Medium-Sized Enterprises (SMEs) and Startups (MSS), South Korea, under
the Regional Specialized Industry Development Plus Program (Research and Development) Supervised by the Korea Institute for
Advancement of Technology (KIAT) under Grant S3246057; in part by KIAT funded by the Korean Government [Ministry Of Trade,
Industry & Energy (MOTIE)] (The Establishment Project of Industry-University Fusion District) under Grant P0016977; and in part
by the Regional Innovation Strategy (RIS) through the National Research Foundation of Korea (NRF) funded by the Ministry
of Education (MOE).

ABSTRACT Over the past decade, breast cancer has been the most common type of cancer in women. Different methods have been proposed for breast cancer detection, most of which classify tumors as malignant or benign. Machine learning is a practical approach for breast cancer classification, and data mining and classification are effective methods for predicting and categorizing the disease. Ensemble-based classification is among the most effective strategies for detecting breast cancer (BC), since an ensemble combines multiple learners to find the best possible solution. This study used the Wisconsin Breast Cancer Diagnostic (WBCD) dataset. We created a voting ensemble classifier that combines four machine learning models: Extra Trees Classifier (ETC), Light Gradient Boosting Machine (LightGBM), Ridge Classifier (RC), and Linear Discriminant Analysis (LDA). The proposed ELRL-E approach achieved an accuracy of 97.6%, a precision of 96.4%, a recall of 100%, and an F1 score of 98.1%. Several evaluation metrics are used to assess the performance and efficiency of the proposed model and of the other classifiers. Overall, the recommended strategy performed better, and its results are directly compared with the individual classifiers and with recognized state-of-the-art classifiers. The primary objective of this study is to identify the most effective ensemble machine learning classifier for breast cancer detection and diagnosis in terms of accuracy and AUC score.

INDEX TERMS Breast cancer, classification, machine learning, voting classifier, ensemble learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION
Breast cancer is a disease that grows in the human body through abnormal cells. Men are less affected than women. According to the American Cancer Society (ACS), 287,850 breast cancer cases were recorded in women in 2022 and 2,710 in men [1]. The death rate is also much higher in women: 43,250 deaths in women compared to 530 in men. The disease mainly affects women and can be diagnosed at any stage. If diagnosed at an early stage, the chances of survival increase; in the advanced stage, the chances of survival are reduced. There are many types of breast cancer, and the type also reflects whether the cancer has spread and whether it is invasive or non-invasive. Invasive cancer spreads from the milk ducts or lobules to the lymph nodes and other breast tissues, whereas non-invasive cancer does not invade surrounding tissue. Non-invasive breast cancer tissue is called ''in situ'' and may remain dormant for an extended period, even a lifetime [2]. Moreover, breast cancer affects 40.3% of the population in Indonesia, and 16.6% of those diagnosed die [3], [4]. Excessive drinking and smoking, as well as an unhealthy diet, increase the risk of breast cancer, and it is predicted that breast cancer incidence will increase by 2% by 2030 [5]. Early diagnosis of BC can significantly improve the prognosis and survival probability by allowing patients to receive timely clinical treatment [6]. In recent literature, classification techniques such as RF, SVM, KNN, and XGBoost classifiers have been used [7]. Several researchers have studied the prediction of breast cancer using various machine learning techniques. Among these, the RF and ET strategies use decision trees as base classifiers to attain the final classification; this line of work evaluated the quality of each algorithm's data classification [8] in terms of efficiency and effectiveness. In [9], the authors proposed an ensemble learning-based voting classifier that combines logistic regression and a stochastic gradient descent classifier to detect breast cancer patients accurately.

The motivation of this study is that the ensemble classifier methods used in previous research are still limited for detecting and classifying breast cancer. One of the biggest challenges in healthcare research is the timely and accurate detection of various diseases [10], [11], [12]. Breast cancer is one of the significant causes of death for women worldwide, which has prompted a lot of interest in the health field. Detection and classification of breast cancer in its early stages is the primary objective of this study, which uses machine learning methods for accurate classification and evaluation in terms of accuracy.

Therefore, this article compares the performance of different classifiers on the breast cancer dataset. While various machine learning classifiers such as RF, SVM, and KNN have been explored, this study introduces a novel ensemble-based approach that combines ETC, LightGBM, RC, and LDA. Addressing class imbalance, the research assesses the proposed ensemble against state-of-the-art methods. The study's contributions lie in evaluating strategies, offering a novel ensemble framework, handling class imbalance effectively, and comparing the model's performance. Breast cancer detection is a crucial challenge in healthcare, as it is one of the leading causes of death for women worldwide. This study aims to use machine learning to improve early detection and refine breast cancer classification methods. The research introduces an adaptive voting ensemble algorithm and thoroughly evaluates its performance. In this way, it can contribute to better patient outcomes and advance the field of medical decision-making.

The main contributions of the proposed study are given below.
• This study evaluates machine learning approaches and algorithms to determine the best strategy for breast cancer classification.
• We propose a novel ensemble-based framework for predicting breast cancer, which includes the Extra Trees Classifier (ETC), Light Gradient Boosting Machine (LightGBM), Ridge Classifier (RC), and Linear Discriminant Analysis (LDA).
• Breast cancer data often has a class imbalance, with a higher number of benign cases than malignant cases. The proposed study demonstrates how to handle class imbalance and improve classification performance effectively.
• The proposed study compares the performance of the voting ensemble model with other state-of-the-art breast cancer classification methods.

The subsequent sections of this manuscript are structured as follows. Section I introduces the manuscript. Section II offers an overview of relevant research in breast cancer classification and ensemble learning. In Section III, we delve into the methodology underpinning our adaptive voting ensemble algorithm, highlighting its adaptive framework and distinctive features. Section IV describes the exploratory data analysis, encompassing dataset details, evaluation metrics, and reference algorithms for performance benchmarking. Section V discusses the empirical findings, shedding light on the strengths and limitations observed during the evaluation process. Finally, Section VI encapsulates our conclusions, explores the implications of our research, and outlines potential avenues for future exploration.

II. RELATED WORK
Machine learning algorithms are used to build an accurate prediction model for breast cancer, but selecting the best classifier is a critical challenge. Data scientists have produced excellent outcomes when applying different algorithms to various medical datasets [13]. Many scientists have worked on designing and assessing breast cancer detection methods, and many researchers predict breast cancer using multiple machine learning algorithms such as Decision Tree [14], NN [9], RF [15], LR [16], Naïve Bayes [17], and SVM [18]. In [19], the author employed various sorts of classifiers. The authors of [20] conducted a comparative analysis that included several classifiers and found that the SVM without rapid correlation-based streamlining provides the maximum accuracy of 97%. The author in [21] uses logistic regression for categorization. KNN, SVM, and RFE classifiers provide automatic digital data and facts for breast cancer diagnosis [22], where a linear regression algorithm and machine learning training modules are used to classify a breast cancer dataset.

Moreover, in [21], the classification accuracy is 95%, which the author achieved using texture classification and maximum perimeter. The authors of [23] and [24] presented a method for detecting and characterizing cell structure. The study [25] on breast cancer categorized as C3 and C4 on fine needle aspiration cytology aims to correlate these categories with the histopathology examination. The study in [26] compared different classification and clustering strategies; according to its findings, classification algorithms beat clustering methods.
Similarly, [27], [28], [29], [30], [31], [32] and Bala et al. [33], [34] have elaborated soft computing, data mining, and machine learning techniques for diabetes and thunderstorm classification, respectively. In [35], the authors compared the Bayesian Network, Random Forest, and Support Vector Machine algorithms and found that the Bayesian Network produced the best results. The authors of [36] created an algorithm that allows adaptive resonance theory to be used in breast cancer research. The best-performing models from previous studies using the Wisconsin Breast Cancer Dataset for breast cancer detection are listed in Table 1.

TABLE 1. Summary of the literature related to breast cancer classification.

The table provides a comprehensive overview of studies conducted in breast cancer classification using various machine learning algorithms. Each row in the table corresponds to a specific study, highlighting the year of the study, the algorithms employed, the advantages observed, and the limitations encountered. In 2021, one study evaluated the performance of Support Vector Machine (SVM) and Random Forest (RF) classifiers; the study emphasized using a limited number of features in the classification process. An investigation into ensemble methods was conducted in 2021 using a Stacking Classifier. While this approach showed promise in improving classification outcomes, it was noted that the complexity of the ensemble models was substantial. In 2021, a study explored a combination of classifiers, including Multi-Layer Perceptron (MLP), Sequential Minimal Optimization (SMO), Naïve Bayes (NB), and J48, both individually and as ensembles. While the ensembles exhibited good performance, it was observed that complexity increased significantly when using more than two ensemble classifiers for predictions. In 2023, a study focused on the Averaged Perceptron classifier and its impact on false-positive and false-negative predictions; the study highlighted the importance of threshold selection in influencing these prediction outcomes. Another investigation in 2023 emphasized the challenges posed by imbalanced datasets in logistic regression models for breast cancer classification, which could lead to biased classification results. In 2022, a study explored using K-Nearest Neighbors (KNN), Random Forest (RF), and Naïve Bayes algorithms to detect additional illnesses and provide insights into the nature of breast cancer; however, it was noted that accurately detecting breast cancer remained a challenging task. An approach using AdaBoost and the Synthetic Minority Over-sampling Technique (SMOTE) was studied in 2022 to address class imbalance. While effective in dealing with imbalanced classes, the study acknowledged problems related to classification boundary definitions. In 2023, the classification of breast cancer microarray data using Random Forest (RF), Extra Trees (ET), Support Vector Machines (SVM), and Cross-Validation (CV) was explored; this study identified a limited set of optimal features that could lead to improved classification accuracy. Finally, in the same year, a study employed Radial Basis Function (RBF) and Support Vector Machines (SVM) to extract more representative features for breast cancer classification; however, this architecture was found to be challenging when applied to multi-class classification tasks. These studies contribute insights into the strengths and limitations of various machine learning algorithms for breast cancer classification, addressing issues such as feature selection, class imbalance, and the complexities of ensemble methods.

III. MATERIALS AND METHODS
This section delineates the dataset and classification models employed to enhance classifier comparison. The outlined approach, ELRL-E, is illustrated in Figure 1. Our proposed methodology encompasses four key stages: preprocessing, training, ensemble classification, and validation. In the preprocessing stage, we perform an Exploratory Data Analysis (EDA) process that involves visually and statistically exploring and summarizing the main characteristics, patterns, and relationships within the dataset and extracting features to find correlations between features and optimized parameters. We used the grid search hyperparameter tuning technique to maximize the model's performance with the right combination of hyperparameters. In the training stage, we trained four models and combined their results into an ensemble model using the voting classifier.
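To make the tuning step concrete, the sketch below shows how a grid search over a small hyperparameter space could be wired up for one of the base models with scikit-learn. It is a minimal illustration only: the parameter grid, the 10-fold setup, and the scoring choice are assumptions for demonstration, not the exact search space used in this work.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# WBCD as bundled with scikit-learn, used here as a convenient stand-in
# for the UCI file; in practice the search is run on the training split only.
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical search space for the Extra Trees base learner.
param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    estimator=ExtraTreesClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)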
A. DATASET
The dataset is obtained from the Wisconsin Breast Cancer Diagnostic (WBCD) dataset [45]. It contains 569 patients, each characterized by 32 attributes: the first attribute is a unique identifier representing the patient ID, followed by the diagnosis label and 30 real-valued measurements. Feature selection applied to the dataset reduces this number: the attributes with considerable weight are retained, while redundant and unweighted attributes are discarded. These features provide real-valued measurements that contribute to understanding the properties of the cell nuclei. Among the chosen features are key parameters such as radius mean (f1), texture mean (f2), perimeter mean (f3), area mean (f4), smoothness mean (f5), concavity mean (f6), concave points mean (f7), symmetry mean (f8), and fractal dimension mean (f9). These features collectively contribute to a nuanced understanding of the characteristics and behavior of the cell nuclei, providing valuable insights for further analysis. Each instance in the dataset is assigned a label indicating whether the breast mass is classified as benign or malignant: 357 instances are labeled as benign, indicating non-cancerous conditions, while the remaining 212 are labeled as malignant, signifying the presence of cancer. Table 4 describes the features.

Table 2 provides detailed information about the dataset, including its features and classes. 70% of the dataset is used for training, while the remaining 30% is kept for testing. This division ensures that the classification models are evaluated fairly and comprehensively. In the training phase, the models learn the patterns and relationships within most of the data, which helps them make predictions on new, unseen data. The model's performance is then evaluated on independent data during testing to determine its effectiveness and generalizability.
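As a concrete illustration of this split, the short sketch below loads the WBCD data and holds out 30% of it for testing. The use of scikit-learn's bundled copy of the dataset, the stratified split, and the fixed random seed are assumptions made for the example; the study itself works from the UCI repository file.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# WBCD: 569 samples, 30 real-valued features; scikit-learn encodes the
# target as 0 = malignant, 1 = benign.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# 70% training / 30% testing, stratified so both splits keep the
# benign/malignant proportions of the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)   # roughly (398, 30) and (171, 30)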


FIGURE 1. Proposed model overview: a concise examination of the key components and methodologies employed in the proposed model. This overview provides a high-level understanding of the model's structure, algorithms, and intended contributions to the addressed problem.
Figure 2 shows a correlation bar plot between the diagnosis and the dataset attributes. The proposed model uses 32 attributes that correlate with each other, and the correlation is computed individually between the diagnosis outcome and every dataset attribute. Some of the features are negatively correlated: in this bar graph, smoothness_se is negatively correlated with the diagnosis, while fractal_dimension_mean and symmetry_se show only weak negative correlations. The remaining attributes are highly positively correlated.
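A correlation bar plot of this kind can be reproduced with a few lines of pandas. The sketch below is a minimal, assumed implementation: it uses the 0/1 diagnosis encoding, computes the Pearson correlation of every feature with the target, and sorts the result; the plotting call is illustrative.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# Assemble a single DataFrame with the 30 features plus the diagnosis column.
data = load_breast_cancer(as_frame=True)
df = data.data.copy()
df["diagnosis"] = data.target  # 0 = malignant, 1 = benign in scikit-learn

# Pearson correlation of every attribute with the diagnosis outcome.
corr_with_target = (
    df.corr(numeric_only=True)["diagnosis"]
      .drop("diagnosis")
      .sort_values()
)
print(corr_with_target.head())   # most negatively correlated features first

# Bar plot similar in spirit to Figure 2.
corr_with_target.plot(kind="barh", figsize=(6, 10))
plt.xlabel("Correlation with diagnosis")
plt.tight_layout()
plt.show()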


TABLE 2. Detailed description of the dataset.

FIGURE 2. Correlation bar plot with the target feature. This visual representation illustrates the correlation between the various features and the diagnosis.

Afterward, metrics such as the mean, standard error, and maximum (''worst'') value are calculated for the 10 base characteristics, resulting in 30 features. Table 3 presents the computed metrics, which represent the tumor features of the WBCD database. Further information on these features can be found in [46].

IV. EXPLORATORY DATA ANALYSIS
Exploratory data analysis (EDA) is an important step in understanding and preparing the data for breast cancer classification. Some of the key steps involved in EDA for breast cancer classification include:
• Data Understanding: This step involves understanding the structure of the data and the different types of available variables. It can include tasks such as reviewing the data dictionary and variable definitions.
• Data Cleaning: This step involves identifying and correcting errors and inconsistencies in the data. It can include filling in missing values, removing outliers, and correcting data entry errors.
• Data Visualization: This step involves creating visualizations of the data to gain insights and identify patterns. It can include creating histograms, scatter plots, and box plots to visualize the distribution of different variables.
• Data Transformation: This step involves transforming the data to make it suitable for analysis. It can include tasks such as normalizing or scaling the data and creating dummy variables for categorical variables.
• Feature Selection: This step involves selecting the most relevant features for the classification task. It can include using correlation and mutual information to identify the features most strongly associated with the outcome variable.


TABLE 3. Detailed description of the dataset features.

Initially, the distribution of each feature was examined to determine the statistics of the 32 attributes, among which 9 attributes were selected and extracted. The graph illustrates each attribute's standard deviation and range for the most significant characteristics of the real-valued dataset, and it shows the distributions of the 9 attributes together with their means; the distribution is fairly normal for most of the dataset. We create a histogram to visualize the relationships between the selected mean features; Fig. 3 displays the relationships between the full and selected attribute sets. The relationship between radius and perimeter should be linear, while the relationship between radius and area should be polynomial; the other characteristics also show roughly linear correlations. We analyze these characteristics using feature selection to investigate their relationship with the diagnosis values. Selecting 9 attributes from a pool of 32 for breast cancer classification involves careful consideration of criteria and methods to ensure the chosen features are relevant and contribute significantly to the classification task. Correlation analysis helps identify attributes with a solid connection to the target variable while avoiding high correlations among the selected features to maintain model interpretability. Information gain metrics assist in quantifying the importance of each attribute, and a variance threshold eliminates low-variance features. Additionally, recursive feature elimination (RFE) iteratively selects features based on their impact on model performance. The selection methods include filter methods and tree-based techniques. At this stage, the dataset is split into training and testing data for calculating the covariance between the models.

A. DATA PROCESSING AND PERFORMANCE METRICS
Data preprocessing is the process of preparing data for use in a machine learning model. This can include cleaning and formatting the data, filling in missing values, and normalizing the data. Performance metrics are used to evaluate the effectiveness of a machine learning model. The specific metric used depends on the task being performed and the type of model being used. For example, classification models may use metrics such as accuracy, precision, recall, F1 score, and AUC-ROC, while regression models typically use error-based metrics such as the mean squared error.

B. CLASSIFICATION ALGORITHM
The suggested design aims to enhance machine learning algorithms, establishing an initial breast cancer detection model capable of predicting cancer types as benign or malignant [47]. Recent research underscores the effectiveness of machine and deep learning as valuable methods for classifying breast cancer. Using all of the machine learning classifiers in this experiment produced promising results for predicting breast cancer. Algorithm 1 of this research is presented below for better understanding.

Algorithm 1 Working Procedure of Breast Cancer Prediction
Input: UCI Machine Learning Repository Breast Cancer dataset
Output: Predicted value, Malignant or Benign
1. Begin
2. data ← load dataset
   a. if a data value is NaN or empty
   b.     replace the NaN or missing value
3. pre-processing:
   a. if data.target is equal to M
   b.     replace M with 0
   c. else
   d.     replace B with 1
4. x ← data.drop[target]
5. y ← data.target
6. x1, x2, y1, y2 ← split_data of x and y
7. model ← train_model using x1 and y1
8. predict ← testing_model using x2 and y2
9. s_x ← scaled data of x
10. s_x ← compressed s_x data
11. apply hyperparameter tuning for each classifier
12. classifier ← train_model using s_x
13. model ← apply_voting_classifier using the classifiers
14. predict ← cross_validation with model
15. compute performance evaluation metrics
End
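A compact, runnable version of the procedure in Algorithm 1 could look like the sketch below. It is a minimal interpretation under stated assumptions: the label encoding, the scaler, the 70/30 split, and the default hyperparameters of the four base learners are placeholders rather than the exact configuration reported here, and LightGBM must be installed separately.

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 2-5: load the data and separate features from the target.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target          # target already encoded as 0/1

# Step 6: 70/30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Steps 7-12: the four base classifiers (scaling folded into each pipeline).
base_models = [
    ("et",   make_pipeline(StandardScaler(), ExtraTreesClassifier(random_state=42))),
    ("lgbm", make_pipeline(StandardScaler(), LGBMClassifier(random_state=42))),
    ("rc",   make_pipeline(StandardScaler(), RidgeClassifier())),
    ("lda",  make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())),
]

# Step 13: combine the base learners with hard majority voting.
ensemble = VotingClassifier(estimators=base_models, voting="hard")

# Step 14: cross-validate the ensemble, then fit and test it.
cv_scores = cross_val_score(ensemble, X_train, y_train, cv=10, scoring="accuracy")
ensemble.fit(X_train, y_train)
print("10-fold CV accuracy:", cv_scores.mean())
print("Hold-out accuracy:  ", ensemble.score(X_test, y_test))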



TABLE 4. Wisconsin breast cancer dataset features.

FIGURE 3. Visualizing the Distribution Histogram of Mean Features in the Dataset. This histogram provides insights into the
distribution patterns of the dataset’s mean features, offering a comprehensive overview of the central tendencies and
variations within the dataset.

1) EXTRA TREES CLASSIFIER (ET)
Extra Trees builds a large number of decision trees generated from the training data. A split rule for the root node is chosen randomly from a subset of k features, together with a partially random cut point [48]. The parent node is divided randomly into two child nodes, and the procedure is repeated for each child node until a leaf node is reached. The majority vote of the trees determines the final prediction, and the user selects the top k features used in the classification model as a final step. As shown in Figure 4, Extra Trees predicts the decision in both regression and classification settings:
• Regression: predictions are averaged over the decision trees.
• Classification: tree-based predictions are combined by majority voting.

FIGURE 4. Exploring classification with extremely randomized trees.

2) LIGHT GRADIENT BOOSTING MACHINE (LIGHTGBM)
LightGBM is a gradient-boosting algorithm based on decision trees, in which regression analysis is used to rank and classify the data. When training and splitting the data in each decision tree, two strategies can be used: one that grows the tree level by level and one that grows it leaf by leaf. A level-wise approach grows the tree while maintaining its balance, whereas a leaf-wise method keeps splitting the leaves that most reduce the loss, as shown in Figure 5. The leaf-wise growing tree structure of LightGBM selects and splits leaves in a specific branch based on their contribution to the overall loss. A growing tree-based model with a low error rate typically learns more quickly [49], and the mainly horizontal growth of the LightGBM model prevents over-learning; as a result, it produces better results on large datasets [50].

FIGURE 5. Illuminating predictive power: a closer examination of the light gradient boosting method.

3) RIDGE CLASSIFIER (RC)
Ridge classification is a machine learning technique for analyzing linear discriminant models. It is a type of regularization in which the model coefficients are penalized to prevent over-fitting. This classifier converts the target values to −1 and +1 and then treats the problem as a regression task on the training data.

4) LINEAR DISCRIMINANT ANALYSIS (LDA)
Linear discriminant analysis is used for classification and dimensionality reduction in supervised classification problems. The primary purpose of LDA is to maximize the between-class variance and minimize the within-class variance through a linear discriminant function. In other words, it searches for a linear combination of variables that best distinguishes the classes in a multi-class setting [51].

z = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_d x_d    (1)

s(\beta) = \frac{\beta^{t}\mu_1 - \beta^{t}\mu_2}{\beta^{t} C \beta}    (2)

S(\beta) = \frac{\bar{z}_1 - \bar{z}_2}{\operatorname{var}(z \text{ within the groups})}    (3)

\beta = C^{-1}(\mu_1 - \mu_2)    (4)

C = \frac{1}{N_1 + N_2}\,(N_1 C_1 + N_2 C_2)    (5)

The preceding equations estimate the linear coefficients and maximize the discriminant function score. In the equations, C represents the covariance matrix, \beta the vector of linear coefficients of the model, and \mu the mean vector of each class.
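The discriminant direction in Equations (4) and (5) can be computed directly with NumPy. The sketch below is a small illustration under the assumption of two classes with a pooled covariance matrix; the variable names follow the symbols above and are not taken from the study's code.

import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X0, X1 = X[y == 0], X[y == 1]                 # the two diagnosis classes
n0, n1 = len(X0), len(X1)

# Class mean vectors (mu_1, mu_2) and class covariance matrices (C_1, C_2).
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
C0 = np.cov(X0, rowvar=False)
C1 = np.cov(X1, rowvar=False)

# Equation (5): pooled covariance matrix C.
C = (n0 * C0 + n1 * C1) / (n0 + n1)

# Equation (4): discriminant coefficients beta = C^-1 (mu_1 - mu_2).
beta = np.linalg.solve(C, mu0 - mu1)

# Equation (1): discriminant score z for every sample.
z = X @ beta
print(beta.shape, z.shape)                    # (30,), (569,)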


5) VOTING CLASSIFIER
A voting classifier is a machine learning model that trains an ensemble of various models. The prediction of each classifier is passed into the voting classifier, which predicts the output class based on the majority of the votes. Voting ensemble techniques are used in ensemble machine learning models to combine the predictions from multiple models [52]. In our research, we applied the hard voting method, which selects the class with the highest number of votes based on the combined predictions of the individual classifiers, as shown in Figure 6. Voting ensemble classifiers are used in the context of breast cancer classification to improve the accuracy and robustness of the classification. In some breast cancer datasets, one class may have many more instances than another, which can make it difficult for a single classifier to predict both classes accurately. By combining the predictions of multiple classifiers, the voting ensemble classifier can provide a more balanced and accurate prediction. In this study, the voting ensemble model uses four base classifiers, and the Extra Trees (ET) classifier is employed as a meta-classifier. The base classifiers were initially trained on the whole training input data set, and the meta-level classifier takes the prediction from each base model as its input. The adaptive voting ensemble classifier can also improve robustness to outliers and noisy data, because multiple classifiers are trained on the same dataset and their predictions are combined in a way that minimizes the impact of outliers and noisy data [53]. Moreover, adaptive voting ensemble classifiers in breast cancer classification can result in improved accuracy, robustness, and balance in the predictions, making them a useful tool in the analysis and diagnosis of breast cancer.

FIGURE 6. A voting-based ensemble classifier: the performance of multiple classifiers is combined into one model.

Algorithm 2 Pseudocode of the Voting Ensemble Learning Algorithm
1. Input: breast cancer training data
2. Base-level classifiers = (ET, LGBM, RC, LDA)
   a. Meta-level classifier: ET
3. Output: trained ensemble classifier
4. Step 1: train the base learners by applying the classifiers to the dataset
   a. use k-fold cross-validation for the training set of classifiers
   b. for i = 1 to k do (where k = 10)
   c.     train each base classifier on the current fold
5. Step 2: collect the predictions of the base classifiers
6. end for
7. Step 3: build the training set for the meta-level classifier ET
   a. train the meta-classifier ET on the base-level predictions
8. combine the classifiers with majority voting
9. Return the trained ensemble classifier
10. END

6) ENVIRONMENT SETUP
To carry out our research accurately and efficiently, we created a specific environment for completing this research work. We provide a detailed presentation of our environment setup in Table 5, which includes all the relevant details. This approach helped us conduct a thorough exploration and analysis during the research process, enhancing the reliability and validity of our findings.

TABLE 5. Configuration of the system environment used for the proposed model.

V. RESULT AND DISCUSSION
A. PERFORMANCE METRICS
The confusion matrix is a standard method for evaluating a classification model. Observations that the model correctly predicts are true positives and true negatives, while false positives and false negatives should be minimized [54].

Accuracy. In training models, accuracy represents the degree of correctness; in other words, it is the ratio of correct predictions to all predictions.

Accuracy (success rate) = \frac{TP + TN}{TP + TN + FP + FN}

Recall. Recall relates the true positives to the false negatives. The recall equation is shown below:

Recall = \frac{TP}{TP + FN}

Precision. Precision measures the proportion of true positives among all positive predictions. The precision equation is given below:

Precision = \frac{TP}{TP + FP}

F1 measure. F1 is the harmonic mean of precision and recall; hence, it considers both false positives and false negatives. It is often more helpful than accuracy in cases where the class distribution is uneven.

F1 = 2 \times \frac{Precision(P) \times Recall(R)}{Precision(P) + Recall(R)}

AUC. The AUC provides an overall performance measure across all classification thresholds; in other words, the ROC/AUC measures a classifier's ability to distinguish between classes. Additionally, the true positive rate TPR = \frac{TP}{P}, the true negative rate TNR = \frac{TN}{N}, the false positive rate FPR = \frac{FP}{FP + TN}, and the false negative rate FNR = \frac{FN}{FN + TP} are used to examine the proposed approach.
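The quantities defined above map directly onto scikit-learn's metric functions. The following sketch is illustrative only; the classifier, the split, and the random seed are assumptions for the example, and the rates are derived from the confusion matrix.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Any fitted classifier works here; Extra Trees is used purely for illustration.
clf = ExtraTreesClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Accuracy :", accuracy_score(y_test, y_pred))    # (TP+TN)/(TP+TN+FP+FN)
print("Precision:", precision_score(y_test, y_pred))   # TP/(TP+FP)
print("Recall   :", recall_score(y_test, y_pred))      # TP/(TP+FN)
print("F1 score :", f1_score(y_test, y_pred))          # 2PR/(P+R)
print("TPR:", tp / (tp + fn), "  TNR:", tn / (tn + fp))
print("FPR:", fp / (fp + tn), "  FNR:", fn / (fn + tp))
# ROC-AUC needs a continuous score, for example class probabilities:
print("ROC AUC :", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))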


B. EXPERIMENTAL RESULTS
The experiment conducted for breast cancer diagnosis uses the Wisconsin Breast Cancer dataset (WBCD), which is split into two subsets: 70% of the data is allocated for training, while the remaining 30% is reserved for testing. The proposed classification model undergoes evaluation based on performance metrics such as accuracy, F-score, recall, and precision. The emphasis is on selecting optimal features for effective breast cancer detection.

A confusion matrix is employed to assess the accuracy of the classification model and identify potential issues. This matrix is beneficial when dealing with datasets with uneven class distributions, preventing misleading interpretations of classification accuracy. The evaluation involves analyzing Figure 7, which shows four confusion matrices for the different machine learning classifiers: Extra Trees Classifier, LGBM Classifier, Ridge Classifier, and Linear Discriminant Analysis. A confusion matrix is a table often used to describe the performance of a classification model on a set of test data for which the actual values are known. Each matrix has two rows and two columns, representing the counts of true negatives, false positives, false negatives, and true positives; these counts are used to calculate performance metrics such as accuracy, precision, recall, and F1 score. The matrices correspond to a binary classification problem with two classes (0 and 1). For instance, the Extra Trees Classifier correctly predicted 60 instances of class 0 (true negatives) and 106 instances of class 1 (true positives), while incorrectly predicting 3 instances as class 1 (false positives) and 2 instances as class 0 (false negatives). Notably, the proposed model correctly classifies 106 benign breast cancer samples, contributing significantly to the overall accuracy. Moreover, compared to the other models, it exhibits fewer errors, highlighting its effectiveness in improving the breast cancer detection process.

Classification models are used to select the best features. This reduces the number of features and allows the model to handle extensive data for a more accurate prediction of breast cancer. The evaluation analysis in Table 6 indicates that the proposed approach ELRL-E achieved 97.6% testing accuracy, 96.46% precision, 100% recall, and a 98.1% F1 score, which indicates that the proposed approach was significantly better and outperformed the existing ML and ensemble models. Table 6 also reports the evaluation results of the proposed approach against the baseline learning models, specifically Extra Trees (ET), LightGBM, Ridge Classifier (RC), and Linear Discriminant Analysis (LDA), and provides detailed metrics for the ET and LightGBM classifiers, showcasing their accuracy and F1 scores.

The Extra Trees (ET) classifier achieved an accuracy of 96.49%, indicating the percentage of correctly classified instances, and an F1 score of 97.24%. The F1 score is a metric that balances precision and recall, providing a comprehensive measure of a model's performance. Similarly, the LightGBM classifier demonstrated an accuracy of 95.99% and an F1 score of 96.86%. These metrics collectively convey each model's effectiveness in accurately classifying instances and balancing precision and recall.

To visually represent the comparison of the proposed approach with these baseline models, Figure 8 is provided. This figure depicts a graphical representation, such as a bar chart, illustrating the overall accuracy of each model, and it allows for a quick and intuitive comparison of the performance of the proposed approach against the baseline learners.

In evaluating model performance on an imbalanced dataset, ROC curves were employed due to their effectiveness in assessing the ability to detect false positives and false negatives; the ROC curve is particularly well suited for such evaluations. Figure 9 illustrates the confusion matrix and ROC (Receiver Operating Characteristic) graph for the proposed ensemble and an additional ensemble model. The confusion matrix illustrates the performance of these classifiers by depicting the counts of true positive, true negative, false positive, and false negative instances, offering a detailed breakdown of their classification accuracy. Simultaneously, the ROC graph provides a graphical portrayal of the classifiers' ability to discriminate between different classes, offering insights into their overall performance and the trade-offs between sensitivity and specificity across various classification thresholds. These visualizations comprehensively assess the ensemble classifiers' effectiveness in handling the classification task. Notably, our results indicate that the proposed model achieved the highest Area Under the Curve (AUC) value, reaching a perfect score of 1.00.
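Plots in the spirit of Figures 7 and 9 can be generated with scikit-learn's display helpers. The sketch below is a minimal, assumed example using one base classifier; the actual figures were produced for all four base models and the ensemble.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

clf = ExtraTreesClassifier(random_state=42).fit(X_train, y_train)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Confusion matrix of the hold-out predictions (cf. Figure 7).
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, ax=ax1)
# ROC curve with the AUC reported in the legend (cf. Figure 9).
RocCurveDisplay.from_estimator(clf, X_test, y_test, ax=ax2)
plt.tight_layout()
plt.show()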
C. DISCUSSION
In recent years, many researchers have explored different techniques and methodologies to analyze breast cancer. Based on our comparative analysis, demonstrated in Table 7, we propose a better method than previous research on the same WBCD dataset, in which we employ a sophisticated voting ensemble classifier, termed ELRL-E, comprising four integrated machine learning models: Extra Trees Classifier (ETC), Light Gradient Boosting Machine (LightGBM), Ridge Classifier (RC), and Linear Discriminant Analysis (LDA). Our results demonstrate the promising performance of the ELRL-E approach, achieving an accuracy of 97.6%, a precision of 96.4%, a recall of 100%, and an F1 score of 98.1%. These metrics surpass the performance of previously employed machine learning and ensemble models. Our methodology excels in feature optimization and the strategic use of relevant features, addressing a critical aspect often overlooked in prior studies. Compared to well-known classifiers, such as k-NN, NB, and SVM, evaluated by Acquisition et al. (2019) using Weka, our approach circumvents challenges in cross-language implementation and integrates seamlessly into the existing architecture. Furthermore, we contribute to the discourse on ethical implications and reliability in healthcare applications, as emphasized by Commission et al. (2019).


FIGURE 7. Exploring the performance through a detailed confusion-matrix comparison for four distinct classifiers: Extra Trees (ET), LightGBM (LGBM), Ridge Classifier (RC), and Linear Discriminant Analysis (LDA).

TABLE 6. Comparison of the evaluation performance of the proposed and baseline ML models.

TABLE 7. Comparison of baseline ML models with existing prediction models.

Our work aims for accuracy and a comprehensive evaluation of the ensemble model's efficacy in the critical context of breast cancer detection and diagnosis. While Assegie et al. [57] highlighted the significance of parameter tuning in a K-Nearest Neighbor (KNN) model, our approach builds on this foundation by integrating multiple classifiers to enhance performance. Jabbar et al. [58] achieved a remarkable accuracy of 97%, and we acknowledge their contribution; still, our study goes beyond it by providing a robust comparative analysis, showcasing the strengths of the ELRL-E approach against existing state-of-the-art classifiers. Sharma et al. [60] utilized t-SNE and snapshot ensembling, acknowledging potential limitations. Sara et al. (2023) introduce a machine learning CAD system for breast cancer classification, leveraging feature selection, PCA, and seven ML classifiers; their XGBoost model achieved high recall for the Mammographic Mass dataset, while AdaBoost with S-LR excelled for the WBCD dataset.


FIGURE 8. A comprehensive examination of the evaluation scores for the proposed approach in contrast to those of the established baseline learners. This analysis examines the numerical metrics and performance indicators of the proposed model and how it compares to the baseline models, namely Extra Trees (ET), LightGBM, Ridge Classifier (RC), and Linear Discriminant Analysis (LDA).

FIGURE 9. This evaluation incorporates a confusion matrix to assess the classification accuracy, along with an ROC graph, providing a visual representation of the ensemble classifiers' ability to discriminate between classes.

The stacking with logistic regression ensemble model demonstrated the highest accuracies. However, limitations include dataset specificity, potential challenges in clinical implementation, and the simplification of complex decision-making processes in the ML application. The comprehensive evaluation, strategic feature selection, and integration of advanced classifiers in ELRL-E substantiate its superiority, addressing these limitations and significantly advancing breast cancer classification.

VI. CONCLUSION AND FUTURE WORK
Breast cancer is one of the leading causes of death in women; thus, early identification is critical. Implementing robust machine learning classifiers can improve early breast cancer tumor identification. Predictive performance enhancement depends on a range of model factors. Ensemble learning generally outperforms a single base classifier because it combines several independent learning algorithms; consequently, it has gained popularity and proven to be an effective machine learning method. One of the most significant issues is finding a way to combine the most accurate base classifiers. To solve these issues, we propose a unique model known as the ELRL-E model. To select the most practical combination of base classifiers, ELRL-E uses various machine learning algorithms, including ET, LightGBM, RC, and LDA, to classify breast cancer tumors accurately. In addition, we used a voting classifier to analyze the significance and effectiveness of the proposed ELRL-E model.


The experiment results show that the proposed approach ELRL-E achieves the highest accuracy of 97.66%, a precision of 96.43%, a recall of 100%, and an F1 score of 98.18% compared to the other implemented ensemble models. Furthermore, the experiment results indicate that the proposed ELRL-E improved the accuracy compared to the ET, LightGBM, RC, and LDA models. Combining models can increase diagnosis quality and provides a significant advantage over previous work.

Moving forward, our future research aims to assess the applicability of the proposed model on diverse disease datasets for comprehensive validation. Acknowledging current limitations, such as the evaluation on a relatively small dataset, it is crucial to extend validation efforts to significantly more extensive datasets. Moreover, we aim to improve the model's performance by refining hyperparameters and exploring optimization algorithms, considering hyperparameters such as the learning rate, tree depth, and regularization, and addressing the challenges of tuning for robustness and scalability on larger datasets.

REFERENCES
[1] Breast Cancer Statistics, NBCC, 2022.
[2] B. He, H. Sun, M. Bao, H. Li, J. He, G. Tian, and B. Wang, "A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing," Sci. Rep., vol. 13, no. 1, p. 15356, Sep. 2023.
[3] W. Wang, R. Jiang, N. Cui, Q. Li, F. Yuan, and Z. Xiao, "Semi-supervised vision transformer with adaptive token sampling for breast cancer classification," Frontiers Pharmacol., vol. 13, Jul. 2022, Art. no. 929755.
[4] S. Chen, Y. Chen, L. Yu, and X. Hu, "Overexpression of SOCS4 inhibits proliferation and migration of cervical cancer cells by regulating JAK1/STAT3 signaling pathway," Eur. J. Gynaecol. Oncol., vol. 42, no. 3, pp. 554–560, 2021.
[5] Int. Agency for Research on Cancer, World Health Organization, Globocan 2012: Cervical Cancer-Estimated Incidence, Mortality and Prevalence Worldwide in 2012, 2018.
[6] Z.-R. Jiang, L.-H. Yang, L.-Z. Jin, L.-M. Yi, P.-P. Bing, J. Zhou, and J.-S. Yang, "Identification of novel cuproptosis-related lncRNA signatures to predict the prognosis and immune microenvironment of breast cancer patients," Frontiers Oncol., vol. 12, Sep. 2022, Art. no. 988680.
[7] N. Kumar Sinha, "Developing a web based system for breast cancer prediction using XGboost classifier," Int. J. Eng. Res., vol. V9, no. 6, pp. 852–856, Jun. 2020.
[8] M. Umer, M. Naveed, F. Alrowais, A. Ishaq, A. A. Hejaili, S. Alsubai, A. A. Eshmawi, A. Mohamed, and I. Ashraf, "Breast cancer detection using convoluted features and ensemble machine learning algorithm," Cancers, vol. 14, no. 23, p. 6015, Dec. 2022.
[9] B. He, Y. Zhang, Z. Zhou, B. Wang, Y. Liang, J. Lang, H. Lin, P. Bing, L. Yu, D. Sun, H. Luo, J. Yang, and G. Tian, "A neural network framework for predicting the tissue-of-origin of 15 common cancer types based on RNA-Seq data," Frontiers Bioeng. Biotechnol., vol. 8, p. 737, Aug. 2020.
[10] S. A. Yazdan, R. Ahmad, N. Iqbal, A. Rizwan, A. N. Khan, and D.-H. Kim, "An efficient multi-scale convolutional neural network based multi-class brain MRI classification for SaMD," Tomography, vol. 8, no. 4, pp. 1905–1927, Jul. 2022.
[11] Imran, N. Iqbal, S. Ahmad, and D. H. Kim, "Health monitoring system for elderly patients using intelligent task mapping mechanism in closed loop healthcare environment," Symmetry, vol. 13, no. 2, p. 357, Feb. 2021.
[12] J. Zhang, Q. Shen, Y. Ma, L. Liu, W. Jia, L. Chen, and J. Xie, "Calcium homeostasis in Parkinson's disease: From pathology to treatment," Neurosci. Bull., vol. 38, no. 10, pp. 1267–1270, Oct. 2022.
[13] S. Kapoor and A. Narayanan, "Leakage and the reproducibility crisis in machine-learning-based science," Patterns, vol. 4, no. 9, Sep. 2023, Art. no. 100804.
[14] Y. Li, "Performance evaluation of machine learning methods for breast cancer prediction," Appl. Comput. Math., vol. 7, no. 4, pp. 212–216, 2018.
[15] B. He, C. Dai, J. Lang, P. Bing, G. Tian, B. Wang, and J. Yang, "A machine learning framework to trace tumor tissue-of-origin of 13 types of cancer based on DNA somatic mutation," Biochim. et Biophys. Acta (BBA)-Mol. Basis of Disease, vol. 1866, no. 11, 2020, Art. no. 165916.
[16] A. Bharat, N. Pooja, and R. A. Reddy, "Using machine learning algorithms for breast cancer risk prediction and diagnosis," in Proc. 3rd Int. Conf. Circuits, Control, Commun. Comput. (I4C), Oct. 2018, pp. 1–4.
[17] K. Jain and M. S. Sharma, "Breast cancer diagnosis using machine learning techniques," Int. J. Innov. Sci., Eng. Technol., vol. 5, no. 5, 2018.
[18] A. S. Elkorany, M. Marey, K. M. Almustafa, and Z. F. Elsharkawy, "Breast cancer diagnosis using support vector machines optimized by whale optimization and dragonfly algorithms," IEEE Access, vol. 10, pp. 69688–69699, 2022.
[19] L. Liu, "Research on logistic regression algorithm of breast cancer diagnose data by machine learning," in Proc. Int. Conf. Robots Intell. Syst. (ICRIS), May 2018, pp. 157–160.
[20] V. Amudha, R. G. Babu, K. Arunkumar, and A. Karunakaran, "Machine learning-based performance comparison of breast cancer detection using support vector machine," in Proc. AIP Conf., vol. 2519, no. 1, 2022, Art. no. 050011.
[21] E. Halim, P. P. Halim, and M. Hebrard, "Artificial intelligent models for breast cancer early detection," in Proc. Int. Conf. Inf. Manage. Technol. (ICIMTech), Sep. 2018, pp. 517–521.
[22] B. S. Abunasser, M. R. J. Al-Hiealy, I. S. Zaqout, and S. S. Abu-Naser, "Breast cancer detection and classification using deep learning Xception algorithm," Breast Cancer, vol. 13, no. 7, 2022.
[23] D. Albashish, "Ensemble of adapted convolutional neural networks (CNN) methods for classifying colon histopathological images," PeerJ Comput. Sci., vol. 8, p. e1031, Jul. 2022.
[24] M. Dabass, S. Vashisth, and R. Vig, "A convolution neural network with multi-level convolutional and attention learning for classification of cancer grades and tissue structures in colon histopathological images," Comput. Biol. Med., vol. 147, Aug. 2022, Art. no. 105680.
[25] P. Goyal, S. Sehgal, S. Ghosh, D. Aggarwal, P. Shukla, A. Kumar, R. Gupta, and S. Singh, "Histopathological correlation of atypical (C3) and suspicious (C4) categories in fine needle aspiration cytology of the breast," Int. J. Breast Cancer, vol. 2013, Sep. 2013, Art. no. 965498.
[26] R. Mitchell and E. Frank, "Accelerating the XGBoost algorithm using GPU computing," PeerJ Comput. Sci., vol. 3, p. e127, Jul. 2017.
[27] D. Sharma, P. Jain, and D. K. Choubey, "A comparative study of computational intelligence for identification of breast cancer," in Proc. Int. Conf. Mach. Learn., Image Process., Netw. Secur. Data Sci. Springer, 2020, pp. 209–216.
[28] D. K. Choubey, P. Kumar, S. Tripathi, and S. Kumar, "Performance evaluation of classification methods with PCA and PSO for diabetes," Netw. Model. Anal. Health Informat. Bioinf., vol. 9, no. 1, pp. 1–30, Dec. 2020.
[29] D. K. Choubey, S. Tripathi, P. Kumar, V. Shukla, and V. K. Dhandhania, "Classification of diabetes by kernel based SVM with PSO," Recent Adv. Comput. Sci. Commun., vol. 14, no. 4, pp. 1242–1255, Jul. 2021.
[30] D. K. Choubey, M. Kumar, V. Shukla, S. Tripathi, and V. K. Dhandhania, "Comparative analysis of classification methods with PCA and LDA for diabetes," Current Diabetes Rev., vol. 16, no. 8, pp. 833–850, Sep. 2020.
[31] M. O. Adebiyi, M. O. Arowolo, M. D. Mshelia, and O. O. Olugbara, "A linear discriminant analysis and classification model for breast cancer diagnosis," Appl. Sci., vol. 12, no. 22, p. 11455, Nov. 2022.
[32] O. J. Egwom, M. Hassan, J. J. Tanimu, M. Hamada, and O. M. Ogar, "An LDA–SVM machine learning model for breast cancer classification," BioMedInformatics, vol. 2, no. 3, pp. 345–358, Jun. 2022.
[33] K. Bala, D. K. Choubey, and S. Paul, "Soft computing and data mining techniques for thunderstorms and lightning prediction: A survey," in Proc. Int. Conf. Electron., Commun. Aerosp. Technol. (ICECA), vol. 1, Apr. 2017, pp. 42–46.
[34] K. Bala, D. K. Choubey, S. Paul, and M. G. N. Lala, "Classification techniques for thunderstorms and lightning prediction: A survey," in Soft-Computing-Based Nonlinear Control Systems Design. Hershey, PA, USA: IGI Global, 2018, pp. 1–17.
[35] M. M. Islam, M. R. Haque, H. Iqbal, M. M. Hasan, M. Hasan, and M. N. Kabir, "Breast cancer prediction: A comparative study using machine learning techniques," Social Netw. Comput. Sci., vol. 1, no. 5, pp. 1–14, Sep. 2020.


[36] A. Bhardwaj, H. Bhardwaj, A. Sakalle, Z. Uddin, M. Sakalle, and W. Ibrahim, "Tree-based and machine learning algorithm analysis for breast cancer classification," Comput. Intell. Neurosci., vol. 2022, Jul. 2022, Art. no. 6715406.
[37] M. R. Basunia, I. A. Pervin, M. Al Mahmud, S. Saha, and M. Arifuzzaman, "On predicting and analyzing breast cancer using data mining approach," in Proc. IEEE Region 10 Symp. (TENSYMP), Jun. 2020, pp. 1257–1260.
[38] G. I. Salama, M. Abdelhalim, and M. A. Zeid, "Breast cancer diagnosis on three different datasets using multi-classifiers," Breast Cancer (WDBC), vol. 32, no. 569, p. 2, 2012.
[39] V. Birchha and B. Nigam, "Performance analysis of averaged perceptron machine learning classifier for breast cancer detection," Proc. Comput. Sci., vol. 218, pp. 2181–2190, 2023.
[40] V. A. M. De Barros, H. M. Paiva, and V. T. Hayashi, "Using PBL and agile to teach artificial intelligence to undergraduate computing students," IEEE Access, vol. 11, pp. 77737–77749, 2023.
[41] V. D. P. Jasti, A. S. Zamani, K. Arumugam, M. Naved, H. Pallathadka, F. Sammy, A. Raghuvanshi, and K. Kaliyaperumal, "Computational technique based on machine learning and image processing for medical image analysis of breast cancer diagnosis," Secur. Commun. Netw., vol. 2022, Mar. 2022, Art. no. 1918379.
[42] F. Budiman, I. A. Saputro, P. Purwanto, and P. N. Andono, "Optimization of classification results by minimizing class imbalance on decision tree algorithm," in Proc. Int. Seminar Mach. Learn., Optim., Data Sci. (ISMODE), Jan. 2022, pp. 6–11.
[43] N. Mohd Ali, R. Besar, and N. A. A. Aziz, "A case study of microarray breast cancer classification using machine learning algorithms with grid search cross validation," Bull. Electr. Eng. Informat., vol. 12, no. 2, pp. 1047–1054, Apr. 2023.
[44] F. Atban, E. Ekinci, and Z. Garip, "Traditional machine learning algorithms for breast cancer image classification with optimized deep features," Biomed. Signal Process. Control, vol. 81, Mar. 2023, Art. no. 104534.
[45] S. W. Wolberg and M. O. William, "Breast cancer Wisconsin (diagnostic)," UCI Machine Learning Repository, Tech. Rep., 1995.
[46] L. Dora, S. Agrawal, R. Panda, and A. Abraham, "Optimal breast cancer classification using Gauss–Newton representation based algorithm," Expert Syst. Appl., vol. 85, pp. 134–145, Nov. 2017.
[47] S. Sharma and S. Deshpande, "Breast cancer classification using machine learning algorithms," in Proc. Mach. Learn. Predictive Anal. (ICTIS). Springer, 2021, pp. 571–578.
[48] G. Alfian, M. Syafrudin, I. Fahrurrozi, N. L. Fitriyani, F. T. D. Atmaji, T. Widodo, N. Bahiyah, F. Benes, and J. Rhee, "Predicting breast cancer from risk factors using SVM and extra-trees-based feature selection method," Computers, vol. 11, no. 9, p. 136, Sep. 2022.
[49] Q. Chen, Z. Meng, and R. Su, "WERFE: A gene selection algorithm based on recursive feature elimination and ensemble strategy," Frontiers Bioeng. Biotechnol., vol. 8, p. 496, May 2020.
[50] S. Akbulut, I. Balikci Cicek, and C. Colak, "Classification of breast cancer on the strength of potential risk factors with boosting models: A public health informatics application," Med. Bull. Haseki, vol. 60, no. 3, pp. 196–203, Jun. 2022.
[51] H. Mandelkow, J. A. de Zwart, and J. H. Duyn, "Linear discriminant analysis achieves high classification accuracy for the BOLD fMRI response to naturalistic movie stimuli," Frontiers Hum. Neurosci., vol. 10, p. 128, Mar. 2016.
[52] H. Jamil, F. Qayyum, N. Iqbal, F. Jamil, and D. H. Kim, "Optimal ensemble scheme for human activity recognition and floor detection based on AutoML and weighted soft voting using smartphone sensors," IEEE Sensors J., vol. 23, no. 3, pp. 2878–2890, Feb. 2023.
[53] S. Chatterjee and Y.-C. Byun, "Voting ensemble approach for enhancing Alzheimer's disease classification," Sensors, vol. 22, no. 19, p. 7661, Oct. 2022.
[54] A. Batool and Y.-C. Byun, "An ensemble architecture based on deep learning model for click fraud detection in Pay-Per-Click advertisement campaign," IEEE Access, vol. 10, pp. 113410–113426, 2022.
[55] B. Akbugday, "Classification of breast cancer data using machine learning algorithms," in Proc. Med. Technol. Congr. (TIPTEKNO), Oct. 2019, pp. 1–4.
[56] Md. I. H. Showrov, M. T. Islam, Md. D. Hossain, and Md. S. Ahmed, "Performance comparison of three classifiers for the classification of breast cancer dataset," in Proc. 4th Int. Conf. Electr. Inf. Commun. Technol. (EICT), Dec. 2019, pp. 1–5.
[57] T. A. Assegie, "An optimized K-nearest neighbor based breast cancer detection," J. Robot. Control (JRC), vol. 2, no. 3, pp. 115–118, 2021.
[58] M. A. Jabbar, "Breast cancer data classification using ensemble machine learning," Eng. Appl. Sci. Res., vol. 48, no. 1, pp. 65–72, 2021.
[59] T. A. Assegie, R. L. Tulasi, and N. K. Kumar, "Breast cancer prediction model with decision tree and adaptive boosting," IAES Int. J. Artif. Intell., vol. 10, no. 1, p. 184, 2021.
[60] N. Sharma, K. P. Sharma, M. Mangla, and R. Rani, "Breast cancer classification using snapshot ensemble deep learning model and t-distributed stochastic neighbor embedding," Multimedia Tools Appl., vol. 82, no. 3, pp. 4011–4029, Jan. 2023.
[61] "An improved breast cancer disease prediction system using ML and PCA," Multimedia Tools and Applications, 2023.

AMREEN BATOOL received the bachelor's degree from GC University, Pakistan, the M.C.S. degree from the Virtual University of Pakistan, and the M.S. degree in computer science and technology from Tiangong University, Tianjin, China, in 2021. She is currently pursuing the Ph.D. degree with the Department of Electronic Engineering, Jeju National University, Republic of Korea. She is also a Project Coordinator with EUT Global Ltd., where her primary role is coordinating with clients and field engineers to plan project delivery. Her research interests include machine learning, deep learning, and blockchain technology.

YUNG-CHEOL BYUN received the B.S. degree from Jeju National University, in 1993, and the M.S. and Ph.D. degrees from Yonsei University, in 1995 and 2001, respectively. He was a Special Lecturer with SAMSUNG Electronics, in 2000 and 2001. From 2001 to 2003, he was a Senior Researcher with the Electronics and Telecommunications Research Institute (ETRI). He joined Jeju National University as an Assistant Professor, in 2003, where he is currently an Associate Professor with the Computer Engineering Department. His research interests include AI and machine learning, pattern recognition, blockchain and deep learning-based applications, big data and knowledge discovery, time series data analysis and prediction, image processing and medical applications, and recommendation systems.