
Multimedia Tools and Applications
https://doi.org/10.1007/s11042-021-11114-5
1211: AIOT SUPPORT AND APPLICATIONS WITH MULTIMEDIA
Predicting breast cancer biopsy outcomes from BI-RADS findings using random forests with chi-square and MI features
Sheldon Williamson, et al. [full author details at the end of the article]
Received: 15 October 2020 / Revised: 2 March 2021 / Accepted: 28 May 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
Abstract
Mammography screening is one of the best available approaches for detecting early signs of
breast cancer. Screening mammograms are the most commonly used procedure and remain the
gold standard for early breast cancer screening. However, the relatively low positive
predictive rate of breast biopsy resulting from this diagnostic technique often leads to
unneeded biopsies for abnormal findings that are ultimately proven benign. Random Forest
(RF), which evolves from Decision Trees (DTs), is one of the most practical and powerful
ensemble learning methods (or meta-estimators). The Breast Imaging Reporting and Data
System (BI-RADS) was developed as a standardized system for reporting breast mammograms;
it is used to sort unusual findings into categories. In this study, the RF classifier with
Chi-Square (χ2) and Mutual Information (MI) Feature Selection (FS) procedures was applied
to predict cancer biopsy outcomes from BI-RADS findings and the patient’s age. For
validation, the UCI Mammographic Mass dataset was used, and performance was assessed using
accuracy, AUC, and several other performance criteria through a 10-fold CV approach. The
proposed method achieved 84.70% accuracy and an AUC of 0.9023, and it also gave better
results in terms of MCC and F1-score. The results were directly compared with the RF
classifier without FS and with other state-of-the-art classification methods. This
comparative analysis indicates that the proposed model is superior, in terms of various
performance indicators, to the RF classifier and all standard models used in the study.
These findings also suggest that the χ2 and MI FS approaches correctly and efficiently
obtained the relevant and discriminating feature subset. The results further indicate that
the suggested approach is comparable to different classification models in the relevant
literature. It is an advantageous, practical, and sound method to predict cancer biopsy
outcomes.
Keywords Breast cancer · Biopsy · BI-RADS · Random forests · Chi-square test · Mutual information
1 Introduction
One of the best early detection approaches for breast cancer is a classification based on
mammography images. However, the weak positive predictive rate of breast biopsy arising
from mammogram examination contributes to needless biopsies for abnormal findings that are
ultimately proven benign in many cases. BI-RADS is a standard system for expressing and
communicating mammogram findings and outcomes. This research was carried out to design and
develop an ensemble classification scheme, using suitable FS methods and a suitable
imputation approach, that predicts cancer biopsy outcomes from BI-RADS findings and thereby
addresses the relatively low positive predictive rate of biopsy following mammography, which
leads to unneeded biopsies for abnormal findings. An RF classifier with χ2 and MI attribute
selection procedures is proposed to predict cancer biopsy outcomes from BI-RADS features and
the patient’s age.
1.1 Mammography and its limitations
A mammogram, a low-energy X-ray imaging test of the human breast, is used by a radiologist
to recognize tumors, abnormal growth, or irregularities in the breast tissue. One particular
advantage of this method is that it can be performed frequently with acceptable side effects.
According to the American Cancer Society, it is considered the gold standard for diagnosing
breast cancer. It is also regarded as the most reliable, safest, and most efficient screening
tool (modality) and the best available radiological technique for diagnosing breast cancer
early. It is therefore used as the primary tool for breast cancer testing, and it assists
doctors in deciding whether a biopsy is needed. Mammography, followed by biopsy if necessary,
is the usual practice in breast cancer diagnosis. But mammogram screening has its limits. The
relatively low positive predictive rate of biopsy outcomes arising from mammography
examination contributes to needless biopsies for abnormal findings that are ultimately proven
benign in many cases. Researchers have applied classification approaches to assist doctors in
their findings, to minimize false-positive predictions, and thus to reduce the number of
needless biopsies [21, 25, 48, 49].
1.2 Breast imaging reporting and data system
This subsection presents BI-RADS and its categories. BI-RADS is a popular system introduced by
the American College of Radiology for expressing and communicating the findings and outcomes
of mammograms (and also of MRI and ultrasound breast imaging). It is a quality assurance
scheme used to categorize the results of mammogram screening precisely and consistently. It
also helps to reduce confusion in mammogram interpretation and to improve communication
between medical professionals, patients, and radiologists by reducing variability in
terminology. The scheme files its findings into categories 0 to 6. Table 1 provides the
details [1, 11, 18, 36, 39].
1.3 Motivation
RF models are among the most straightforward and classic supervised learning algorithms.
They are useful for both classification (prediction of a discrete target class) and
regression (prediction of a continuous value) applications. An RF consists of many DT classifiers.
Table 1 Final assessment categories of BI-RADS [36]
Category     Explanation
BI-RADS 0    Imaging is incomplete, additional tests are required
BI-RADS 1    No cancer and no even benign findings
BI-RADS 2    No cancer but benign findings
BI-RADS 3    Probably normal benign findings with likelihood of cancer 2%
BI-RADS 4    Suspicious finding or abnormality with likelihood of cancer 20 to 35% (subcategories: 4A Low / 4B Moderate / 4C High)
BI-RADS 5    High suspicion of cancer with likelihood of cancer >=95%
BI-RADS 6    Known biopsy-proven
Although in many cases DTs do not match the accuracy of other traditional classifiers (e.g.,
K Nearest Neighbor (KNN), Support Vector Machine (SVM), etc.) and are prone to overfitting
[35], they serve as the fundamental building block of the robust and sophisticated RF ensemble
classifiers, which overcome the drawbacks of single DT classifiers. Due to their robustness,
RF models have also been widely applied in studies that use BI-RADS-based lesion descriptions
to predict breast cancer (e.g., on the benchmark UCI Mammographic Mass dataset [12, 14]).
Feature Selection (FS) is a procedure for selecting the attributes that contribute most to the
predicted or output variable. Its main benefits are that it helps to prevent overfitting and
reduces training time. Thus, in this study, the effectiveness of the RF algorithm with χ2 and
MI-based FS approaches, both popular FS methods, has been analyzed for predicting breast
cancer from BI-RADS-based lesion descriptions. The simulations and results of this analysis
indicate that the proposed technique is useful and advantageous for predicting cancer biopsy
outcomes from BI-RADS data.
The rest of the paper is structured as follows. Section 2 gives a detailed literature review.
Section 3 presents the proposed model along with the Random Forest, preprocessing, and Feature
Selection techniques used. The dataset, experiments, and results are presented in Section 4,
and the results are discussed in Section 5. Conclusions are presented in Section 6.
2 Literature review
Machine Learning (ML) is increasingly becoming more popular in cancer classification and
diagnosis, according to the literature. There exist many ML-based methods dealing with the
problem of breast lesion classification with the help of BI-RADS. An important benchmark
dataset available with BI-RADS features is the UCI Mammographic Mass dataset (features:
BI-RADS grades, age, and three BI-RADS characteristics) [12, 14]. In this section, some
important such methods are discussed. J. A. Baker et al. (1995), Mia K. Markey et al. (2002),
and Laila Mušić et al. (2019) have focused on Neural Network (NN) models for breast lesion
classification using the BI-RADS dataset [3, 40, 42]. The works of C. E. Floyd et al. (2000),
Wolak et al. (2001), and Wolak et al. (2002) were built on Case-Based Reasoning (CBR) [7, 8,
17]. Fischer et al. (2004) applied Bayesian networks for the breast lesion classification [16]. To
predict biopsy results using BI-RADS attributes, M. Elter et al. (2007) presented two ap-
proaches: DT learning and CBR system. They evaluated the performances using bootstrap
sampling, Receiver Operating Characteristic (ROC) curve, and analysis of variance [14].
Benaki Lairenjam et al. (2009) proposed the hybridized approach using Classification based
on Multiple Association Rule (CMAR) algorithm in NN (sigmoidal activation function and
one hidden layer). They considered confidence 50% and support 10%. With 75%–25%
training-testing samples, they reported accuracy of 84.52%, Root Mean Square Error
(RMSE) rate 0.08278, and 0.89 AUC. They compared their model with a Multilayer
Perceptron (MLP) model (MLP: 0.88 AUC, accuracy 81.61%, and RMSE rate of 0.3487)
[32]. Benaki Lairenjam et al. (2010) proposed the associative Naïve Bayes (NB) model for
classifying Mammography Mass dataset [33]. Alaa M. Elsayad (2010) proposed the different
ensemble models of Bayesian classifiers [13]. Benaki Lairenjam et al. (2010) compared four
models: classification based on association, backpropagation NN, Radial Basis Function
(RBF) NN, and CMAR on the Mammographic Mass dataset. They achieved a higher testing
accuracy of 84% with Backpropagation NN based model [34]. Simone A. Ludwig (2010)
proposed normal and distributed genetic programming methods. The normal and distributed
genetic programming methods achieved 0.859 and 0.860 ROC values, respectively [37]. J.
Novakovic et al. (2011) proposed a novel method with rotation forest and PCA [46]. Veena H.
Bhat et al. (2011) applied CART, Genetic Algorithm (GA), and self-organizing map to impute
the missing values and then used the NN model as a classifier [6]. Ismail Saritas (2012) used a
Backpropagation NN based model on the Mammographic Mass dataset. After the removal of
ambiguous data, and with 15% testing - 5% validation - 80% training dataset, the model
yielded 85.5% accuracy, 90% sensitivity, and 81.4% specificity [52]. R. Halawani et al. (2012)
studied various clustering algorithms (Expectation–Maximization (EM) algorithm, hierarchical
clustering, and k-means clustering) with DT ensembles (bagging, boosting) for breast cancer
detection [19]. Huang ML et al. (2012) applied four models: CBR-logit regression, CBR-DTs,
Particle Swarm Optimization-NN, and Adaptive Neuro-Fuzzy Inference System to UCI
Mammographic Mass dataset [23]. Shu-Ting Luo et al. (2012) applied backward and Forward
selection FS approaches and then used C4.5, SVM-Sequential Minimal Optimization (SMO),
and their ensemble classifier for classification [38]. Sahar A. Mokhtar et al. (2013) compared
the performances of DT, NN model, and SVM on Mammographic Masses dataset. With 70%–
30% training-testing samples, their SVM model gave the highest testing accuracy of 81.25%
[41]. Kuntoro Adi Nugroho et al. (2013) proposed a cascade generalization model with a loose
coupling strategy. With a 10 fold CV, they obtained a high ROC-AUC 0.903 and a high
accuracy 83.689% using NB with SMO-SVM cascade and Bayesian network with SMO-SVM
cascade (using tabu search), respectively [47]. Ceren Güzel et al. (2013) applied KNN data
imputation method and NB classifier. They reported 82.49% accuracy [27]. Venu Rathi et al.
(2014) proposed three NN models: MLP, RBF NN, and fuzzy logic with NN model [50]. G.
Ravi Kumar et al. (2014) used a GA for attribute selection and SVM as classifiers [31]. S.
Kharya et al. (2014) proposed Bayesian belief networks for mammographic decision support
[28]. Nithya et al. (2015) compared outcomes of 12 DT classifiers (best-first tree, alternating
DT, CART, reduced error pruning tree, Random Tree (RT), RF, NB tree, Logistic Model Tree,
Decision Stump (DS), alternating DT using the LogitBoost, C4.5, and functional tree). To
evaluate their models, they carried out experiments using 200 training samples and 50 testing
samples. The highest testing accuracy reported was 90% [45]. Hassim et al. (2015) success-
fully applied an integration of Artificial Bee Colony (ABC) and Firefly Algorithm (FA) as a
learning approach for a novel functional link NN. They fruitfully validated the model for
mammographic decision support. Application of the suggested learning strategy confirmed
that the model could effectively undertake the categorization process with more excellent
results on the unobserved dataset. They reported 83.45% accuracy using the proposed model
[20]. Prakash Bethapudi et al. (2015) proposed the fuzzy supervised learning in quest-DT
classifier. Using a three-fold CV, they reported 81.4% accuracy [5]. Turgay Ibrikci et al.
(2016) applied various ensemble and meta-learning techniques, for example, dagging, boot-
strap aggregating, decorate tree-based ensembles, and filtered classifiers on eight different
medical datasets, including Mammographic Mass dataset [24]. The results demonstrated the
adequacy of the rotation forest in most cases. Divyansh Kaushik et al. (2016) proposed a novel
ensemble of three constituent models: MLP, RF, and RT using the WEKA. They reported the
best AUC 0.907 and test accuracy 83.5% [26]. Zahriah Sahri et al. (2017) applied six
imputation methods (mean, class-conditional, multiple, KNN, NN, and Support-Vector-
Regression (SVR) imputation methods) to impute missing data. And then, different ML
classifiers (SVM, DS, NB, C4.5, RF, and RT) were applied to verify performance improve-
ment. Making use of accuracy as the effectiveness measurement, the empirical outcomes
demonstrated an improvement in the accuracy of many combinations of classifiers and
imputation approaches [61]. Thuy Tuong Nguyen et al. (2017) presented a classification
system with the absent data managing process based on two mechanisms: Kernel Partial Least
Squares (PLS), and kernel PLS discriminant analysis. These approaches were validated on
Mammographic Mass and the other two UCI medical datasets [43]. Mehrbakhsh Nilashi et al.
(2017) used EM clustering and PCA, then applied CART to generate fuzzy rules [44].
Analytical studies on the Mammographic Mass dataset demonstrated that their proposed
techniques notably increase the accuracy of classification. Yuan-Ting Yan proposed the
ensemble of selective NNs [60]. Debby E. Sondakh compared the performance of MLP,
RBF NN, and voted perceptron on the UCI Mammographic Mass dataset and five other
multivariate medical datasets. They reported the highest accuracy of 81.79% using the MLP
model [57]. Mino Assad Eltieb (2018) performed a comparative study of different ML
classifiers (NB, KNN, gradient boosting, AdaBoost) on the Mammographic Mass dataset
and other breast cancer datasets. With the hold-out validation method (80%–20%), they gained
the highest true prediction accuracy of 84.34% through gradient boosting using selected
features [15]. Batuhan Bakirarar et al. (2019) applied two data mining algorithms: MLP and
RF. They reported accuracies of 83.7% and 81.6% using MLP and RF, respectively [4]. Laila
Mušić et al. (2019) used the NN model with 80 hidden layer nodes for the severity prediction
of a mammographic mass. With the K-fold CV approach, they reported 82.9% accuracy [42].
3 Proposed method
According to the literature available, the low positive predictive rate of breast biopsy resulting
from mammogram analysis leads to unnecessary biopsies for abnormal findings that are
eventually proven benign in many cases. To address this issue and to implement an ensemble
classification scheme with an appropriate FS method for analyzing and predicting breast
cancer biopsy outcomes from BI-RADS findings, the RF classifier with χ2 and MI FS
procedures was proposed in this paper. The UCI Mammographic Mass dataset was
used for validation. KNN imputation was used to replace missing
attribute values. Min-Max normalization was applied to transform all values linearly to be in
the range between 0 and 1. The technique proposed is schematically presented in Fig. 1. The
key steps of the technique proposed are as given below:
• Handling missing values using KNN imputation.
• Min-Max scaling of the dataset.
• Selection of the relevant features using the χ2 and MI FS.
• Classification using RF.
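The four steps listed above could be wired together roughly as follows. This is a minimal sketch rather than the authors' code: the synthetic data from `make_classification`, the 5% missing rate, `n_neighbors=5`, `n_estimators=100`, and the use of χ2 alone in the selection step are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption): 5 features, binary target.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

pipeline = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),   # step 1: KNN imputation
    ("scale", MinMaxScaler()),               # step 2: rescale each feature to [0, 1]
    ("select", SelectKBest(chi2, k=4)),      # step 3: keep the 4 best chi-square features
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),  # step 4
])

# 10-fold CV; all preprocessing is refit inside each training fold.
scores = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f}")
```

Placing the preprocessing inside the pipeline keeps the imputer, scaler, and selector from seeing the held-out fold, which is one reasonable design choice; the source does not state whether preprocessing was done inside or outside the CV loop.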
3.1 Preprocessing
3.1.1 KNN imputation
Missing values in a data set may decrease a classifier’s power/fit or may result in a
biased classifier because the behavior of a feature and its relationship with other features
have not been adequately evaluated. Therefore, incorrect predictions may result. Handling missing values
is one of the most critical challenges in real-world data applications. Before further progress is
made, many ML methods require the imputation of those missing attribute values. Imputation
is the process of substituting missing values with replaced values. In this study, imputation
with the KNN algorithm is used to replace missing attribute values. It is an effective
nonparametric algorithm that matches the point in multidimensional space with its nearest k
neighbors. This approach can be applied for categorical, ordinal, discrete, or continuous data,
making it especially useful for handling missing values of all types. The assumption that a
point value may be approximated by the points which are nearest to it, depending on other
variables, is the basis behind the use of the KNN. The average value from the k closest
neighbors contained in the given data is used to impute the missing value of each instance [59].
This method is a simple and computationally inexpensive approach to imputing unavailable data
and is natively supported by scikit-learn [55].
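As a small illustration of the averaging rule described above, the following sketch applies scikit-learn's `KNNImputer` to a toy matrix. The data and the choice `n_neighbors=2` are assumptions for illustration only; the source does not report the value of k that was used.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with two missing entries (illustrative data, not from the paper).
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

imputer = KNNImputer(n_neighbors=2)  # k = 2 is an illustrative choice
X_imputed = imputer.fit_transform(X)

# Each missing entry becomes the mean of that feature over the 2 nearest rows,
# where distance is computed on the coordinates both rows have observed.
print(X_imputed)
```

Here the missing entry in the first row is filled with the mean of 3 and 5, and the missing entry in the third row with the mean of 3 and 8.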
Fig. 1 Proposed model: RF with χ2 and MI FS

3.1.2 Min-max scaler
Feature scaling is a method used for standardizing the range of variables or data features. It is
usually carried out during preprocessing. Perhaps the most popular and best-known
approach to scaling is min-max normalization (scikit-learn’s MinMaxScaler). The transformation
is given by the equations below:
$$X_{std} = \frac{X - X_{min}(axis=0)}{X_{max}(axis=0) - X_{min}(axis=0)} \tag{1}$$
$$X_{scaled} = X_{std}\,(max - min) + min \tag{2}$$

where feature range is [min, max] [56].
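Equations (1) and (2) can be checked numerically on a toy matrix. The sketch below applies them column-wise with NumPy and compares the result with scikit-learn's `MinMaxScaler`; the data values and the names `lo` and `hi` (standing for min and max of the feature range) are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: 3 samples, 2 features (illustrative values).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])

lo, hi = 0.0, 1.0  # target feature range [min, max]

# Eq. (1): per-column standardization to [0, 1]
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# Eq. (2): rescale to the requested range
X_scaled = X_std * (hi - lo) + lo

# Same transformation via scikit-learn, for comparison
X_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)
print(X_scaled)
```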
3.2 Random forests
Random Forests (RF) are ensemble classifiers consisting of several DT models. This ensemble
model was introduced to the literature by Leo Breiman (2001) [10]. It is flexible and one of the
most widely used ensemble classifiers, and it has shown high efficiency and strong performance
in many classification applications. It can also be employed for regression. While constructing
every individual constituent tree, RF combines two principal ideas: bootstrap aggregating of
DTs [9] and random selection of features [2, 22]. Simply put, every constituent decision tree
learner is constructed using bagging of samples (with replacement) and attribute bagging
(without replacement); therefore, this algorithm has two levels of randomness: row level and
column level. It takes the most voted class (for a classification problem) or the mean
of the individual tree predictions (for a regression problem) as the final decision. The
algorithm is given below (Algorithm 1) [2, 10, 22].
It can be explained as follows:

Suppose that the training data is given as T with n samples and F features.
• Draw n_tree bootstrap samples from T, where n_tree is the number of DTs in the RF.
• Grow an unpruned tree for every bootstrap sample, with the following modification: at every
node, randomly sample f predictors from the F predictors and select the best split from among
those f predictors.
• Predict a test sample x_new by aggregating the prediction outcomes of all n_tree trees.
3.2.1 Classification
$$C_{rf}^{n\_tree}(x_{new}) = \mathrm{majorityVote}\,\{C_i(x_{new})\}_{i=1}^{n\_tree} \tag{3}$$
where $C_i(x_{new})$ is the class label given by the $i$th tree.
3.2.2 Regression
$$\hat{f}_{rf}^{\,n\_tree}(x_{new}) = \frac{1}{n\_tree} \sum_{i=1}^{n\_tree} \hat{f}_i(x_{new}) \tag{4}$$
where $\hat{f}_i(x_{new})$ is the prediction of the $i$th tree.
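The bootstrap, per-node feature sampling, and majority-vote steps of Section 3.2 can be sketched by hand on top of scikit-learn's `DecisionTreeClassifier`. This is an illustrative reimplementation under stated assumptions, not the library's `RandomForestClassifier` and not the authors' code: the helper names `fit_forest` and `predict_majority`, the synthetic data, `n_tree=25`, and `max_features="sqrt"` are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_tree=25, seed=0):
    """Grow n_tree unpruned trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_tree):
        idx = rng.integers(0, n, size=n)  # row-level randomness: sample with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",          # column-level randomness at every node
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X):
    """Eq. (3): return the most voted class label for each sample."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_tree, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Synthetic binary data with labels 0/1 (assumption).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
trees = fit_forest(X, y, n_tree=25)
y_pred = predict_majority(trees, X)
train_acc = (y_pred == y).mean()
print(f"training accuracy: {train_acc:.3f}")
```

For regression, the vote in `predict_majority` would be replaced by the mean of the tree predictions, as in Eq. (4).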

Figure 2 demonstrates the training and testing phases of the RF.
Fig. 2 Overview of the training and testing phases of the RF

3.3 Feature selection
Feature Selection (FS), the selection of an appropriate attribute subset, is an essential
pre-processing phase for classification applications in ML and pattern recognition. It aims to
exclude irrelevant attributes. FS pre-processing not only decreases dimensionality but can also
increase decision-making accuracy. Here, two FS approaches are applied: the χ2 test and an
MI-based approach. The forward and backward FS approaches proposed by Shu-Ting Luo (2012) on
the Mammographic Mass dataset were also considered for comparison [38].
3.3.1 Chi-Square statistics
The Chi-square (χ2) test of independence is a nonparametric test that examines whether
there is a significant association between two events (i.e., whether they are independent or
related). In the χ2 FS process, the χ2 statistic between every predictor and the output
attribute is computed, and the desired number of predictors with the highest χ2 scores are
selected for model training. More formally, given the data of the two variables, χ2 measures
how much the observed counts deviate from the expected counts [54]:
$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i} \tag{5}$$
where $c$ is the degrees of freedom, $O_i$ is the observed value, and $E_i$ is the expected value.
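As a worked instance of Eq. (5), the sketch below computes the statistic for a 2×2 contingency table between a binary feature and a binary class. The counts are made up for illustration, and the expected counts are derived under the independence hypothesis from the row and column totals.

```python
import numpy as np

# Observed counts: rows = feature value (0/1), columns = class (0/1).
# Illustrative numbers, not taken from the dataset.
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)

# Expected counts if feature and class were independent
expected = row_totals @ col_totals / observed.sum()

# Eq. (5): sum over all cells of (O - E)^2 / E
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(f"chi2 = {chi2_stat:.4f}, degrees of freedom = {dof}")
```

The expected table here is [[20, 20], [30, 30]], so the statistic is 5 + 5 + 10/3 + 10/3 ≈ 16.67 with one degree of freedom. A larger statistic indicates a stronger departure from independence, which is why higher-scoring features are retained.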
3.3.2 MI statistics
Mutual Information (MI) is a concept from information theory and is used for estimating the
mutual dependence between two attributes. The MI between two discrete variables X and
Y is given as follows [30, 51]:

$$I(Y : X) = \sum_i \sum_j P(X_i, Y_j) \log_2 \frac{P(X_i, Y_j)}{P(X_i)\,P(Y_j)} \tag{6}$$
Mutual information can be applied for univariate attribute selections. For jointly continuous
random features, the double sum is substituted by a double integral. In Eq. (6), P(Yj) and P(Xi)
are marginals, and joint probability distribution function is P(Xi, Yj). The zero MI value means
that the two attributes are independent, and the larger scores indicate a larger dependency
between variables [29, 53].
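Equation (6) can be evaluated directly from a joint probability table. The following sketch does this for two extreme cases, full independence and perfect dependence. The function name `mutual_information` and the example tables are illustrative assumptions; scikit-learn's own MI estimator uses a different, sample-based procedure.

```python
import numpy as np

def mutual_information(joint):
    """Eq. (6) for a discrete joint distribution P(X_i, Y_j), in bits."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal P(X_i), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal P(Y_j), shape (1, m)
    nz = joint > 0                         # cells with P = 0 contribute 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

independent = [[0.25, 0.25],
               [0.25, 0.25]]  # P(X, Y) = P(X) P(Y)
dependent = [[0.5, 0.0],
             [0.0, 0.5]]      # Y is fully determined by X

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The independent table yields zero MI, and the fully dependent binary table yields one bit, matching the interpretation given in the text.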
4 Experiments and results
4.1 Mammographic mass dataset and data preprocessing
The UCI Mammographic Mass dataset was used in this study [12, 14]. The dataset is available
at https://archive.ics.uci.edu/ml/datasets/mammographic+mass. The dataset contains 961 samples
(benign: 516; malignant: 445) and 6 attributes (5 features and 1 output attribute). For
classification purposes, the output class labels malignant and benign were considered as the
positive class and negative class, respectively. The dataset has many missing values among
individual features (details are provided in Table 2), which were replaced using KNN
imputation (scikit-learn’s built-in KNNImputer), as stated above. Then, Min-Max normalization
(scikit-learn’s MinMaxScaler) was applied to transform all values linearly into
the range between 0 and 1 [56].
4.2 Feature ranking and selection
Scikit-learn’s built-in functionality for χ2 and MI FS was used to compute feature scores and
then rankings. The feature scores obtained by the χ2 and MI methods are given in Table 3. The
forward and backward FS approaches proposed by Shu-Ting Luo (2012) on the Mammographic Mass
dataset were also considered for comparison [38]. The feature rankings are provided in
Table 4, where every feature is ranked according to the χ2 and MI methods and the rankings of
the forward and backward FS methods are also given. The number indicates the importance rank
of the feature (1 = most important). ‘Mass Density’ is the least significant feature according
to all algorithms. Therefore, the remaining four features were selected for further processing.
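The χ2 and MI rankings in Table 4 follow from sorting the scores in Table 3 in descending order. The short sketch below reproduces that step using the scores as printed in Table 3; the helper name `rank_features` is hypothetical.

```python
# Scores copied from Table 3 (feature order: BIRADS, Age, Shape, Margin, Density).
features = ["BIRADS", "Age", "Shape", "Margin", "Density"]
chi2_scores = [0.6759, 12.8045, 86.528, 105.62, 0.08812]
mi_scores = [0.2231, 0.1091, 0.1933, 0.2079, 0.0084]

def rank_features(scores):
    """Map each feature to its rank, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return dict(zip(features, ranks))

chi2_ranks = rank_features(chi2_scores)
mi_ranks = rank_features(mi_scores)
print(chi2_ranks)
print(mi_ranks)

# Keep every feature except the one ranked last by the chi-square scores.
selected = [f for f in features if chi2_ranks[f] < len(features)]
```

Both score vectors place Density last, which is consistent with dropping it and keeping the other four features.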
4.3 Experimentation
In this subsection, a complete summary of the conducted experimentation and the performance
analysis of the outcomes (using different performance indices) have been presented. The
performance of the proposed model (hereinafter referred to as RF with FS) was compared
with the RF classifier with all features (hereinafter abbreviated as RF without FS). Both
models were validated against UCI Mammographic Mass dataset using a ten-fold CV and
have been evaluated and compared. The models were implemented in Python; the
available implementations of classification algorithms from scikit-learn (a Python ML
library) were adopted to reduce the chance of programming mistakes [58]. Split selection
was based on the decrease in Gini impurity. A grid search was applied to find the optimum
values of the following hyper-parameters: max_depth (maximum depth of a DT), max_features
(number of attributes to consider when looking for the best split), min_samples_split
(minimum number of instances needed to split an internal node), n_estimators (number of DTs),
and min_samples_leaf (minimum number of instances required at a leaf node).
The remaining hyper-parameters were left at their default settings. Table 5 shows the
comparative performance between the RF without FS and RF with FS.
Further, Fig. 3 gives a comparison between them with regard to accuracy, specificity, and
sensitivity. In addition, Fig. 4 gives a comparison between them in terms of MCC, F1-score,
and other performance indices. Figures 5 and 6 give the AUC and ROC curves of both
approaches.
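The grid search over the five hyper-parameters named in Section 4.3 might look like the sketch below. The candidate values in `param_grid`, the synthetic four-feature data, and the shuffled `StratifiedKFold` splitter are assumptions for illustration; the source does not report the grid that was searched or the optimum values found.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in with 4 features, mirroring the 4 selected features (assumption).
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# Hypothetical candidate values; kept small so the search runs quickly.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [3, None],
    "max_features": [1, 2],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1],
}

search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),  # Gini-based splits
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),  # 10-fold CV
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

All other `RandomForestClassifier` arguments are left at their defaults, in line with the description in the text.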
Table 2 Missing attribute values in the Mammographic Mass dataset [12, 14]
Attribute        BIRADS†   Density   Shape   Margin   Age   Severity∥
Missing values   2         76        31      48       5     0
Total missing values = 162; †BIRADS assessment; ∥output-attribute/goal field

Table 3 Feature scores
FS method   BIRADS†   Age       Shape    Margin   Density
χ2 FS       0.6759    12.8045   86.528   105.62   0.08812
MI FS       0.2231    0.1091    0.1933   0.2079   0.0084
†BIRADS assessment
Besides, the results of the proposed model were directly contrasted with traditional and
standard classifiers in terms of different performance indices on the Mammographic Mass
dataset. This comparative analysis is provided in tabular (Table 6) and graphical (Figs. 7,
8, and 9) format. These classifiers were implemented in Matlab. The parameter values and other
details of these established state-of-the-art classifiers are as follows:
• Coarse Tree: Maximum number of splits = 4, Split criterion = Gini diversity index
• Medium Tree: Maximum number of splits = 20, Split criterion = Gini diversity index
• Fine Tree: Maximum number of splits = 100, Split criterion = Gini diversity index
• Linear Discriminant: Covariance structure = full
• Quadratic Discriminant: Covariance structure = full
• Logistic Regression: Link function = logit, Distribution = binomial
• Linear SVM: Box constraint level = 1, Kernel = Linear
• Quadratic SVM: Box constraint level = 1, Kernel = Quadratic
• Cubic SVM: Box constraint level = 1, Kernel = Cubic
• Fine Gaussian SVM: Kernel scale = 1.4, Kernel = Gaussian
• Medium Gaussian SVM: Kernel scale = 5.5, Kernel = Gaussian
• Coarse Gaussian SVM: Kernel scale = 22, Kernel = Gaussian
• Fine KNN: Number of neighbors = 1, Distance metric = Euclidean, Distance weight = equal
• Medium KNN: Number of neighbors = 10, Distance metric = Euclidean, Distance weight = equal
• Coarse KNN: Number of neighbors = 100, Distance metric = Euclidean, Distance weight = equal
• Weighted KNN: Number of neighbors = 10, Distance metric = Euclidean, Distance weight = squared inverse
• Cosine KNN: Number of neighbors = 10, Distance metric = Cosine, Distance weight = equal
• Cubic KNN: Number of neighbors = 10, Distance metric = Minkowski, Distance weight = equal
Table 4 Selected features using different methods
FS method          BIRADS†   Margin   Age   Shape   Density
χ2 FS              4         1        3     2       5*
MI FS              1         2        4     3       5*
Forward FS [38]    4         1        2     3       5*
Backward FS [38]   4         1        3     2       5*
The omitted feature is indicated by *; †BIRADS assessment

Table 5 Performance comparison between RF without FS and RF with FS models on UCI Mammographic Mass dataset
Metric            RF without FS   RF with FS
Accuracy (%)      83.87           84.7
Sensitivity (%)   82.7            83.37
Specificity (%)   84.88           85.85
Precision         0.8251          0.8356
Recall            0.827           0.8337
FPR               0.1512          0.1415
FNR               0.173           0.1663
NPV               0.8505          0.8569
FDR               0.1749          0.1644
FOR               0.1495          0.1431
F1-score          0.826           0.8346
MCC               0.6757          0.6923
AUC               0.8967          0.9023
Bold emphasis indicates best performance
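The threshold-based metrics in Table 5 are all functions of the four confusion-matrix counts. The sketch below recomputes them for the RF with FS column. The counts TP = 371, FN = 74, FP = 73, and TN = 443 are not reported in the source; they are inferred here, as an assumption, from the class sizes (445 malignant, 516 benign) and the reported sensitivity and specificity.

```python
from math import sqrt

# Inferred counts (assumption, not reported in the source), pooled over the folds.
tp, fn = 371, 74   # malignant cases: 371 + 74 = 445
fp, tn = 73, 443   # benign cases:    73 + 443 = 516

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall, TPR
specificity = tn / (tn + fp)   # TNR
precision = tp / (tp + fp)
npv = tn / (tn + fn)
fpr = fp / (fp + tn)
fnr = fn / (fn + tp)
f1 = 2 * tp / (2 * tp + fp + fn)
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("sensitivity", sensitivity),
                    ("specificity", specificity), ("precision", precision),
                    ("NPV", npv), ("FPR", fpr), ("FNR", fnr),
                    ("F1", f1), ("MCC", mcc)]:
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the computed values agree with the RF with FS column to four decimal places, which suggests the table entries are mutually consistent.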
Since there is no universal or general assessment protocol for this UCI dataset, the previous
related works in the literature have followed varied performance evaluation protocols, which
makes it difficult to contrast and evaluate overall performance across studies. Therefore,
previous studies whose performance evaluation protocols are closest to the 10-fold CV used in
this study were considered for comparison. This comparative analysis is given in Table 7.
These findings and analyses indicate that the proposed approach demonstrated better
performance.
Fig. 3 Comparison between RF without FS and RF with FS in terms of accuracy, specificity (TNR), and
sensitivity (TPR)
Fig. 4 Comparison between RF without FS and RF with FS in terms of precision, recall, FPR, FNR, NPV, FDR,
F1-score, and MCC
5 Discussion on results
In this study, RF with χ2 and MI FS methods has been applied to cancer biopsy outcome
prediction using BI-RADS data. The dataset used contains many missing values, and the KNN
imputation method was used to replace them. ‘Mass Density’ was the least significant feature
according to the different FS algorithms. Therefore, the remaining four features were selected
for further processing. The results were directly compared with the RF without FS model,
established state-of-the-art classifiers, and previously reported findings.
Fig. 5 ROC curve and AUC: RF without FS
Fig. 6 ROC curve and AUC: RF with FS
Table 6 Performance comparison with established state-of-the-art classifiers (10-fold CV)
Classifier                   Accuracy   Sensitivity   Specificity   AUC      Precision   Recall   F-measure   MCC
Coarse Tree                  82.41      80.9          83.72         0.86     0.811       0.809    0.81        0.646
Medium Tree                  81.69      82.7          80.81         0.87     0.788       0.827    0.807       0.634
Fine Tree                    78.25      77.75         78.68         0.83     0.759       0.778    0.768       0.564
Linear Discriminant          80.44      84.72         76.74         0.87     0.759       0.847    0.8         0.613
Quadratic Discriminant       80.54      76.63         83.91         0.87     0.804       0.766    0.785       0.608
Logistic Regression          82.52      81.8          83.14         0.89     0.807       0.818    0.813       0.649
Linear SVM                   80.54      73.71         86.43         0.87     0.824       0.737    0.778       0.609
Quadratic SVM                83.45      80.45         86.05         0.89     0.833       0.804    0.818       0.667
Cubic SVM                    59.21      37.08         78.29         0.59     0.596       0.371    0.457       0.169
Fine Gaussian SVM            78.98      75.73         81.78         0.85     0.782       0.757    0.769       0.577
Medium Gaussian SVM          83.04      82.25         83.72         0.89     0.813       0.822    0.818       0.659
Coarse Gaussian SVM          81.27      84.49         78.49         0.89     0.772       0.845    0.807       0.628
Fine KNN                     73.15      71.01         75            0.73     0.71        0.71     0.71        0.46
Medium KNN                   79.6       77.08         81.78         0.88     0.785       0.771    0.778       0.589
Coarse KNN                   80.96      82.7          79.46         0.88     0.776       0.827    0.801       0.62
Cosine KNN                   79.81      78.88         80.62         0.88     0.778       0.789    0.783       0.594
Cubic KNN                    78.88      75.73         81.59         0.87     0.78        0.757    0.769       0.575
Weighted KNN                 75.86      74.16         77.33         0.82     0.738       0.742    0.74        0.515
RF without FS (this study)   83.87      82.7          84.88         0.8967   0.8251      0.827    0.826       0.6757
RF with FS (this study)      84.7       83.37         85.85         0.9023   0.8356      0.8337   0.8346      0.6923
Bold face indicates best performance

Fig. 7 Comparison of the RF with FS (and without FS) model with established state-of-the-art classifiers:
accuracy, sensitivity, and specificity
Fig. 8 Comparison of the RF with FS (and without FS) model with established state-of-the-art classifiers:
precision and recall
Fig. 9 Comparison of the RF with FS (and without FS) model with established state-of-the-art classifiers:
F-measure, MCC, and AUC
With the RF without FS model, 83.87% classification accuracy, 82.7% sensitivity, 84.88%
specificity, 0.8251 precision, 0.827 recall, 0.826 F-score, 0.6757 MCC, and 0.8967 AUC were
obtained. It outperformed all established state-of-the-art classifiers in Table 6 in terms of
accuracy, AUC, F-measure, and MCC. However, when comparing the results with older studies, it
must be pointed out that the bagging method proposed by Yan et al. (2017) [60] gave slightly
better accuracy (84%) than the RF without FS model. Yan et al. removed samples with missing
values from the dataset, whereas in this study, for both models (RF with FS and RF without FS),
KNN imputation was used to handle missing values. According to the literature, eliminating
samples with missing values can introduce bias into the classification. That is to say, unlike
the method of Yan et al., both models (RF without FS and RF with FS) avoid this source of bias.
In another study, Zahriah Sahri et al. (2017) applied six imputation methods (Class-conditional,
Mean, KNN, Multiple, NN, and SVR) to replace missing values present in the UCI Mammographic
Mass dataset and then different ML classifiers (DS, NB classifier, C4.5, RT, RF, and SVM) to
verify performance improvement [61]. Two of these approaches, NB with KNN imputation and NB
with class-conditional imputation, gave the best accuracy among them (84.29% and 84.4%,
respectively; Table 7). These two approaches also outperform the RF without FS model.
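The difference between the two ways of handling missing values discussed above can be sketched in a few lines of Python. The small table is made up for illustration, and `n_neighbors=2` is an arbitrary choice rather than the setting used in this study; only `KNNImputer` itself is taken from the scikit-learn library cited in the references.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up table: rows are cases, columns are numeric features; NaN = missing.
X = np.array([
    [4.0, 67.0, 3.0, 5.0],
    [5.0, 58.0, 4.0, np.nan],
    [4.0, 28.0, 1.0, 1.0],
    [5.0, np.nan, 4.0, 5.0],
    [3.0, 45.0, 2.0, 1.0],
    [5.0, 70.0, np.nan, 4.0],
])

# Option A: listwise deletion keeps only the fully observed rows.
complete_rows = ~np.isnan(X).any(axis=1)
X_deleted = X[complete_rows]

# Option B: KNN imputation fills each gap from the nearest rows
# (distance computed on the features both rows have observed).
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)

print("rows after deletion:  ", X_deleted.shape[0])
print("rows after imputation:", X_imputed.shape[0])
```

Deletion discards half of the rows in this toy table, whereas imputation retains every case and leaves the observed entries unchanged.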
However, with the proposed method (RF with FS model), improved outcomes were obtained (i.e.,
84.7% classification accuracy, 83.37% sensitivity, 85.85% specificity, 0.8356 precision, 0.8337
recall, 0.8346 F-score, 0.6923 MCC, and 0.9023 AUC). This comparative analysis shows that the
proposed method yields better results than the RF (without FS) method, which suggests that the
proposed χ2 and MI FS methods are a useful approach for selecting relevant features. Moreover,
the comparative analysis (Table 6) demonstrates the potential superiority of the model proposed
in this study over the established state-of-the-art classification
Table 7 Comparison with other methods in the literature on the Mammographic Mass dataset (results based on 10-fold CV)
Methodology [Ref.] Accuracy (%)
NB [61] 82.42
C4.5 [61] 75.45
DS [61] 81.79
RT [61] 78.98
RF [61] 82.2
SVM [61] 82.56
Class-conditional imputation + NB [61] 84.4
Class-conditional imputation + C4.5 [61] 75.96
Class-conditional imputation + DS [61] 82
Class-conditional imputation + RT [61] 78.26
Class-conditional imputation + RF [61] 82.73
Class-conditional imputation + SVM [61] 82.94
Mean imputation + NB [61] 82.42
Mean imputation + C4.5 [61] 74.61
Mean imputation + DS [61] 82
Mean imputation + RT [61] 79.92
Mean imputation + RF [61] 82.73
Mean imputation + SVM [61] 82.94
Multiple imputation + NB [61] 82.42
Multiple imputation + C4.5 [61] 74.61
Multiple imputation + DS [61] 82
Multiple imputation + RT [61] 79.92
Multiple imputation + RF [61] 82.1
Multiple imputation + SVM [61] 83.2
KNN imputation + NB [61] 84.29
KNN imputation + C4.5 [61] 76.17
KNN imputation + DS [61] 82
KNN imputation + RT [61] 79.3
KNN imputation + RF [61] 82.63
KNN imputation + SVM [61] 83.05
NN imputation + NB [61] 83.15
NN imputation + C4.5 [61] 79.29
NN imputation + DS [61] 81.69
NN imputation + RT [61] 81.69
NN imputation + RF [61] 82.11
NN imputation + SVM [61] 83.03
SVR imputation + NB [61] 83.04
SVR imputation + C4.5 [61] 81.48
SVR imputation + DS [61] 81.69
SVR imputation + RT [61] 80.44
SVR imputation + RF [61] 82.63
SVR imputation + SVM [61] 83.77
MLP [26] 82.9
RF [26] 82.8
RT [26] 83.1
Ensemble Classifier of MLP, RF and RT [26] 83.5
DT [38] 83.1
Bagging DT [38] 83.3
AdaboostM1 DT [38] 80.8
MultiBoosting DT [38] 82.1
SVM- SMO [38] 81.2
Bagging SVM-SMO [38] 81.8
AdaboostM1 SVM-SMO [38] 80.8
MultiBoosting SVM-SMO [38] 81.1
Forward/backward FS +DT [38] 83.1
Forward/backward FS +Bagging DT [38] 83.4

Forward/backward FS +AdaboostM1 DT [38] 81.5
Forward/backward FS +MultiBoosting DT [38] 82.2
Forward/backward FS +SVM-SMO [38] 81.2
Forward/backward FS +Bagging SVM-SMO [38] 82
Forward/backward FS +AdaboostM1 SVM-SMO [38] 81.3
Forward/backward FS +MultiBoosting SVM-SMO [38] 81.2
NB [27] 81.69
KNN [27] 80.96
NB imputation + NB [27] 81.73
NB imputation + KNN [27] 80.43
KNN imputation + NB [27] 82.49
KNN imputation + KNN [27] 80.43
MLP [4] 80.9
RF [4] 80.4
MLP-Backpropagation [20] 69.82
Functional Link NN [20] 58.5
FLNN + ABC [20] 81.46
FLNN + Modified ABC [20] 82.68
FLNN + Modified ABC- FA [20] 83.45
Selective NN ensemble [60] 83.4
Multi-granulation ensemble [60] 83.6
Neural Network ensemble [60] 80.1
Mean feature value for missing + Bagging [60] 82.9
Removing the samples with missing values + Bagging [60] 84
RF without FS (this study) 83.87
Proposed RF with FS (this study) 84.7
Bold face indicates best performance
approaches in terms of various indices. In addition, the comparison with older studies (Table 7)
shows the potential superiority of the proposed model, which achieves the highest accuracy
(84.7%) among the methods used for comparison.
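As a consistency check, the reported indices can be reproduced from a confusion matrix. The sketch below is in Python; the counts are not taken from the paper but are back-calculated from the reported percentages under the assumption that the dataset holds 445 malignant and 516 benign cases, so they should be read as a reconstruction rather than the authors' actual confusion matrices.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Compute the indices used in Table 6 from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Assumed counts (445 malignant, 516 benign), inferred from the percentages.
rf_without_fs = binary_metrics(tp=368, fn=77, tn=438, fp=78)
rf_with_fs = binary_metrics(tp=371, fn=74, tn=443, fp=73)

for label, metrics in [("RF without FS", rf_without_fs), ("RF with FS", rf_with_fs)]:
    print(label, {name: round(value, 4) for name, value in metrics.items()})
```

Under this assumption the computed values agree with the figures reported for both models to the stated precision, and the gain from FS corresponds to three additional true positives and five additional true negatives.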
6 Conclusion
Classification based on mammography images is one of the best approaches for the early
detection of breast cancer, and it is efficient and practical. Nonetheless, the low positive
predictive rate of biopsies arising from mammogram examination leads to needless biopsies for
abnormal findings that are ultimately proven benign in many cases. In this study, a predictive
model based on an RF classifier with FS was implemented on a real dataset containing the
BI-RADS assessment, age, and three other BI-RADS features, in order to predict cancer biopsy
outcomes and to minimize false-positive predictions. Because the Mammographic Mass dataset
from the UCI repository contains missing values and has a limited number of samples, the
KNN-based imputation method was employed to replace missing entries instead of eliminating the
affected samples. The performance of the classifier with and without the selected features was
evaluated using different performance indices, AUC values, and ROC charts. This comparison
shows that the prediction accuracy improves after the elimination of the ‘Mass Density’
attribute. Thus, the χ2 and MI-based FS methods helped to improve classifier accuracy by
choosing a better subset of features. The model achieved better classification accuracy than
the previously reported techniques and many established state-of-the-art classifiers. The
experimental outcomes and comparative studies indicate that the proposed approach is an
effective model for predicting the severity of breast masses using BI-RADS features. In
summary, the results suggest that this model is a practical and sound method for predicting
breast cancer biopsy outcomes and minimizing false-positive predictions. The proposed models
can serve as a second-opinion tool for healthcare experts.
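The overall evaluation protocol summarized in this section (KNN imputation, min-max scaling, FS, RF, and 10-fold CV) can be sketched with a scikit-learn pipeline in Python. The sketch is a minimal illustration under stated assumptions: the data are synthetic, only the χ2 criterion is used for selection, the folds are stratified, and all hyperparameters (`n_neighbors=5`, `n_estimators=100`, the random seeds) are arbitrary choices rather than the settings of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 300
y = rng.integers(0, 2, size=n)  # synthetic target, NOT the UCI data

# Four class-dependent columns plus one noise column, with ~5% missing entries.
X = np.hstack([rng.normal(size=(n, 4)) + 1.5 * y[:, None],
               rng.normal(size=(n, 1))])
X[rng.random(X.shape) < 0.05] = np.nan


def build_model(k):
    """Impute, scale, keep the k best features by chi2, then fit an RF."""
    return make_pipeline(
        KNNImputer(n_neighbors=5),
        MinMaxScaler(),  # chi2 needs non-negative inputs
        SelectKBest(chi2, k=k),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )


cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for label, k in [("RF without FS", 5), ("RF with FS", 4)]:
    results[label] = cross_val_score(build_model(k), X, y, cv=cv, scoring="accuracy")
    print(f"{label}: mean accuracy = {results[label].mean():.4f}")
```

Placing the imputer, scaler, and selector inside the pipeline means that each is refitted on the training folds only, so no information from the held-out fold leaks into preprocessing.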
References
1. American College of Radiology (ACR) (2003) Breast imaging reporting and data system atlas (BI-RADS atlas).
American College of Radiology, Reston, VA
2. Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Comput 9(7):
1545–1588
3. Baker JA, Kornguth PJ, Lo JY, Williford ME, Floyd CE Jr (1995) Breast cancer: prediction with artificial
neural network based on bi-rads standardized lexicon. Radiology 196(3):817–822
4. Bakirarar B, Kar İ, Gökmen D, Elhan AH, Genç V (2019) The prediction of breast biopsy outcomes using
two data mining algorithms based on parameter variations. Turkiye Klinikleri Journal of Biostatistics 11(2)
5. Bethapudi P, Reddy ES, Varma KV (2015) Classification of breast cancer using gini index based fuzzy supervised
learning in quest decision tree algorithm. International Journal of Computer Applications 975:8887
6. Bhat VH, Rao PG, Krishna S, Shenoy PD, Venugopal K, Patnaik LM (2011) An efficient framework for
prediction in healthcare data using soft computing techniques, in International Conference on Advances in
Computing and Communications. Springer, pp. 522–532.
7. Bilska-Wolak AO, Floyd Jr CE (2001) Investigating different similarity measures for a case-based
reasoning classifier to predict breast cancer, in Medical Imaging 2001: Image Processing, vol. 4322.
International Society for Optics and Photonics, pp. 1862–1866
8. Bilska-Wolak AO, Floyd CE Jr (2002) Development and evaluation of a case-based reasoning classifier for
prediction of breast biopsy outcome with bi-rads™ lexicon. Med Phys 29(9):2090–2100
9. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
10. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
11. D’Orsi C, Bassett L, Feig S et al (2018) Breast imaging reporting and data system (BI-RADS). In: Lee CI,
Lehman CD, Bassett LW (eds) Breast imaging. Oxford University Press, New York
12. Dua D, Graff C (2019) UCI machine learning repository. [Online]. Available:
http://archive.ics.uci.edu/ml
13. Elsayad AM (2010) Predicting the severity of breast masses with ensemble of bayesian classifiers. J Comput
Sci 6(5):576
14. Elter M, Schulz-Wendtland R, Wittenberg T (2007) The prediction of breast cancer biopsy outcomes using
two cad approaches that both emphasize an intelligible decision process. Med Phys 34(11):4164–4172
15. Eltieb MA et al (2018) A comparative study of machine learning algorithms to predict Brest cancer. Sudan
University of Science & Technology, Ph.D. dissertation
16. Fischer E, Lo J, Markey M (2004) Bayesian networks of BI-RADS™ descriptors for breast lesion
classification, in The 26th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society, vol. 2. IEEE, pp. 3031–3034.
17. Floyd CE Jr, Lo JY, Tourassi GD (2000) Case-based reasoning computer algorithm that uses mammo-
graphic findings for breast biopsy decisions. Am J Roentgenol 175(5):1347–1352
18. Gastounioti A, McCarthy AM, Pantalone L, Synnestvedt M, Kontos D, Conant EF (2019) Effect of
mammographic screening modality on breast density assessment: digital mammography versus digital
breast tomosynthesis. Radiology 291(2):320–327
19. Halawani S, Alhaddad M, Ahmad A (2012) A study of digital mammograms by using clustering algorithms
20. Hassim YMM, Ghazali R (2015) Improving functional link neural network learning scheme for mammo-
graphic classification, in International Workshop on Neural Networks. Springer, pp. 213–221.
21. Heine JJ, Deans SR, Cullers DK, Stauduhar R, Clarke LP (1997) Multiresolution statistical analysis of high-
resolution digital mammograms. IEEE Trans Med Imaging 16(5):503–515
22. Ho TK (1995) Random decision forests. Proceedings of 3rd international conference on document analysis
and recognition 1. IEEE:278–282
23. Huang M-L, Hung Y-H, Lee W-M, Li R, Wang T-H (2012) Usage of casebased reasoning, neural network
and adaptive neuro-fuzzy inference system classification techniques in breast cancer dataset classification
diagnosis. J Med Syst 36(2):407–414
24. Ibrikci T, Karabulut EM, Uwisengeyimana JD (2016) Meta learning on small biomedical datasets, in
Information Science and Applications (ICISA) 2016. Springer, pp. 933–939.
25. Karssemeijer N (1993) Adaptive noise equalization and recognition of microcalcification clusters in
mammograms. Int J Pattern Recognit Artif Intell 7(06):1357–1376
26. Kaushik D, Kaur K (2016) Application of data mining for high accuracy prediction of breast tissue biopsy
results, in 2016 Third International Conference on Digital Information Processing, Data Mining, and
Wireless Communications (DIPDMWC). IEEE, pp. 40–45.
27. Kaya M, Yıldız O, Bilge HS (2013) Breast cancer diagnosis based on naïve bayes machine learning
classifier with knn missing data imputation. Global Journal on Technology 4(2)
28. Kharya S, Agrawal S, Soni S (2014) Using bayesian belief networks for prognosis & diagnosis of breast
cancer. IJARCCE 3:5423–5427
29. Kozachenko L, Leonenko NN (1987) Sample estimate of the entropy of a random vector. Problemy
Peredachi Informatsii 23(2):9–16
30. Kraskov A, Stögbauer H, Grassberger P (2004) Estimating mutual information. Physical review E 69(6):
066138
31. Kumar GR, Ramachandra G, Nagamani K (2014) An efficient feature selection system to integrating svm
with genetic algorithm for large medical datasets. Int J 4(2):272–277
32. Lairenjam B, Wasan SK (2009) Neural network with classification based on multiple association rule
for classifying mammographic data, in International Conference on Intelligent Data Engineering and
Automated Learning. Springer, pp. 465–476.
33. Lairenjam B, Wasan SK (2010) Naïve bayes associative classification of mammographic data, in 2010
International Conference on Educational and Network Technology. IEEE, pp. 276–281.
34. Lairenjam B, Wasan SK (2010) A note on analysis of mammography data. Int J Open Problems Compt
Math 3(5)
35. Liberman N (2017) Decision trees and random forests, 01 2017. [Online]. Available:
https://towardsdatascience.com/decision-trees-and-random-forests-df0c3123f991
36. Liberman L, Menell JH (2002) Breast imaging reporting and data system (bi-rads). Radiologic Clinics
40(3):409–430
37. Ludwig SA (2010) Prediction of breast cancer biopsy outcomes using a distributed genetic programming
approach, in Proceedings of the 1st ACM International Health Informatics Symposium, pp. 694–699.
38. Luo S-T, Cheng B-W (2012) Diagnosing breast masses in digital mammography using feature selection and
ensemble methods. J Med Syst 36(2):569–577
39. Malmartel A, Tron A, Caulliez S (2019) Accuracy of clinical breast examination’s abnormalities for breast
cancer screening: cross-sectional study. European Journal of Obstetrics & Gynecology and Reproductive
Biology 237:1–6
40. Markey MK, Lo JY, Vargas-Voracek R, Tourassi GD, Floyd CE Jr (2002) Perceptron error surface analysis:
a case study in breast cancer diagnosis. Comput Biol Med 32(2):99–109
41. Mokhtar SA, Elsayad A et al. (2013) Predicting the severity of breast masses with data mining methods,
arXiv preprint arXiv:1305.7057
42. Mušić L, Gabeljić N (2019) Predicting the severity of a mammographic tumor using an artificial neural
network, in International Conference on Medical and Biological Engineering. Springer, pp. 775–778.
43. Nguyen TT, Tsoy Y (2017) A kernel pls based classification method with missing data handling. Stat Pap
58(1):211–225
44. Nilashi M, Ibrahim O, Ahmadi H, Shahmoradi L (2017) A knowledge-based system for breast cancer
classification using fuzzy logic method. Telematics Inform 34(4):133–144
45. Nithya R, Santhi B (2015) Decision tree classifiers for mass classification. International Journal of Signal
and Imaging Systems Engineering 8(1–2):39–45
46. Novakovic J, Veljovic A (2011) Interpretation of mammograms with rotation forest and pca, in 2011 6th
IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI). IEEE, pp.
571–575.
47. Nugroho KA, Setiawan NA, Adji TB (2013) Cascade generalization for breast cancer detection, in 2013
International Conference on Information Technology and Electrical Engineering (ICITEE). IEEE, pp. 57–61.
48. Priebe C, Lorey R, Marchette D, Solka J, Rogers G (1994) Nonparametric spatio-temporal change point
analysis for early detection in mammography
49. Rakowski W, Clark M (1998) Do groups of women aged 50 to 75 match the national average mammog-
raphy rate? Am J Prev Med 15(3):187–197
50. Rathi V, Aggarwal S (2014) Comparing the performance of ann with fnn on mammography mass data set,
in 2014 IEEE International Advance Computing Conference (IACC). IEEE, pp. 1307–1314.
51. Ross BC (2014) Mutual information between discrete and continuous data sets. PloS one 9(2)
52. Saritas I (2012) Prediction of breast cancer using artificial neural networks. J Med Syst 36(5):2901–2907
53. Sebastiani F (2002) Machine learning in automated text categorization. ACM computing surveys (CSUR)
34(1):1–47
54. sklearn.feature_selection.chi2. [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html
55. sklearn.impute.KNNImputer. [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html
56. sklearn.preprocessing.MinMaxScaler. [Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
57. Sondakh DE (2017) Data mining for healthcare data: a comparison of neural networks algorithms. Cogito
Smart Journal 3(1):10–19
58. The Python Standard Library, Python 3.9.2 documentation. [Online]. Available:
https://docs.python.org/3.9/library/
59. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001)
Missing value estimation methods for dna microarrays. Bioinformatics 17(6):520–525
60. Yan Y-T, Zhang Y-P, Zhang Y-W, Du X-Q (2017) A selective neural network ensemble classification for
incomplete data. Int J Mach Learn Cybern 8(5):1513–1524
61. Zahriah S, Fahmi A, Sharifah Sakinah Syed A, Rabiah A (2017) Imputing missing values in mammography
mass dataset: Will it increase classification performance of machine learning algorithms? in Proceeding 8th
International Conference on Agricultural, Biological, Environmental and Medical Sciences (ABEMS-2017)
Oct. 11–12, 2017 Bali (Indonesia)
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
Affiliations
Sheldon Williamson 1 & K. Vijayakumar 2 & Vinod J. Kadam 3
* Sheldon Williamson
sheldon.williamson@ieee.org
K. Vijayakumar
mkvijay@msn.com
Vinod J. Kadam
vjkadam@dbatu.ac.in
1 Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, Canada
2 Department of Computer Science and Engineering, St. Joseph’s Institute of Technology, OMR, Chennai,
India
3 Department of Information Technology, Dr. Babasaheb Ambedkar Technological University, Lonere,
Maharashtra, India
