
Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data

This document describes a study that proposes using grid search-based hyperparameter tuning to optimize the parameters of a random forest classifier for classifying microarray cancer data. The researchers tuned hyperparameters like the maximum number of features considered at each split, number of trees in the forest, maximum depth of each tree, and minimum samples required to split nodes. They tested their method on five standard microarray cancer datasets, achieving promising results across most datasets according to various performance metrics like accuracy, precision, recall, and F1 score.


2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP)

Grid Search-Based Hyperparameter Tuning and Classification of Microarray Cancer Data
B. H Shekar, Department of Computer Science, Mangalore University, India
Guesh Dagnew, Department of Computer Science, Mangalore University, India

Abstract—Cancer is a group of diseases caused by abnormal cell growth. Owing to the innovation of microarray technology, a large variety of microarray cancer datasets have been produced, opening up avenues for research across several disciplines such as statistics, computational biology, genomic studies, and other related fields. The main challenges in analyzing microarray cancer data are the curse of dimensionality, small sample size, noisy data, and class imbalance. In this work, we propose grid search-based hyperparameter tuning (GSHPT) of random forest parameters to classify microarray cancer data. A grid search is defined by a set of fixed parameter values, which are evaluated to find the combination that provides optimal accuracy on the basis of n-fold cross-validation. In our work, 10-fold cross-validation is considered. The grid search algorithm provides the best parameters, such as the number of features to consider at each split, the number of trees in the forest, the maximum depth of the tree, and the minimum number of samples required to split a node. The optimal numbers of trees are 10, 20, and 70 for the Ovarian, 3-class Leukemia, and 2-class Leukemia cancer data, respectively. In the case of MLL and SRBCT, 50 trees are generated to achieve the maximum classification accuracy. The Gini index is employed as the criterion to split the nodes, and the maximum depth of the tree is set to 2 for all datasets. Experimental results of the proposed work show an improvement over state-of-the-art methods. The performance of the proposed method is evaluated using standard metrics such as classification accuracy, precision, recall, f1-score, confusion matrix, and misclassification rate, and a comparative analysis is performed to reveal the performance of the proposed method.

Index Terms—Grid Search, Random Forest, Feature Selection, Classification, Microarray

I. INTRODUCTION
Since the introduction of microarray technology in the late 1980s, the massive volume of gene expression cancer data being produced has launched a hot research area across several disciplines, including machine learning, pattern recognition, and computational biology, aimed at the diagnosis of cancer patients and the identification and differentiation of cancer types [1]. The main constraints to be addressed in the analysis of microarray cancer data are the high curse of dimensionality, noisy data, class imbalance, and small sample size [2]. To address these challenges, research directions include feature selection, dimensionality reduction, classification, and optimization techniques. In this work, we propose grid search-based hyperparameter tuning to optimize the parameters of the random forest classifier and apply it to classify binary and multi-class microarray cancer datasets. The core challenge tackled in this work is to find the hyperparameter values that produce the optimal model. Choosing optimal hyperparameters ahead of time is challenging and demands model tuning on a trial-and-error basis. Hyperparameters are configuration variables of the learning algorithm that are fixed before training; their values are chosen by evaluating the model over several trial-and-error runs and averaging the results. To overcome the overfitting constraint of the ordinary grid search, stratified cross-validation is applied, where samples are divided into K folds at random. The GridSearchCV estimator from scikit-learn [3] is used to obtain the best parameters. The focus is to obtain four optimal parameters, namely the maximum number of features to consider when splitting a node, the number of estimators (the number of trees in the forest), the splitting criterion (Gini index), and the depth of the trees in the forest. To validate the method, five standard microarray cancer datasets are used. To confirm the validity of the proposed method, extensive experiments are carried out, and promising results are obtained across most of the test datasets. To measure the performance of the method, several standard metrics are employed, such as classification accuracy, precision, recall, f1-score, misclassification error, out-of-bag (OOB) error, and the confusion matrix.

The rest of the paper is organized as follows. Section II deals with related works. Section III and its subsections present the proposed method. Experimental results and discussion are covered in Sections IV and V, respectively. Finally, Section VI covers the concluding remarks.

II. RELATED WORKS
Analysis of microarray cancer data is becoming a hot research area across several multidisciplinary areas, including computer science, computational biology, genomic studies, machine learning, pattern recognition, statistics, and other related fields including engineering. Medjahed et al. [4] proposed a complete cancer diagnostic method through kernel-based learning. Salem et al. [5] proposed a classification of human cancer by combining Information Gain (IG) and a Standard Genetic Algorithm (SGA). Liu et al. [6] proposed a hybrid method to handle class imbalance at the feature and algorithmic levels. Ramos-González et al. [7] introduced an application of supervised machine learning for the classification of cancer via deep learning. Farid et al. [8] proposed an adaptive combination of feature selection with a dissimilarity-based representation paradigm using classifiers such as Decision Tree (DT), Naïve Bayes (NB), and KNN. Dashtban and Balafar [9] introduced an evolutionary genetic algorithm and AI concepts to identify predictive genes for cancer classification, applying

Authorized licensed use limited to: Institut Teknologi Sepuluh Nopember. Downloaded on June 26,2023 at 12:54:56 UTC from IEEE Xplore. Restrictions apply.
978-1-5386-7989-0/19/$31.00 ©2019 IEEE

feature selection followed by classification. To address the problem of ranking algorithms for a given gene, Dash and Misra [10] proposed pipelining of ranking methods. Wang et al. [11] introduced weighted feature selection using bacterial-based algorithms to handle dimensionality reduction. Aziz et al. [12] proposed feature selection by identifying the independent components of the features using independent component analysis (ICA) and fuzzy backward feature elimination (FBFE). Das et al. [13] introduced ensemble learning-based feature selection to select informative features using mutual information theory and a bi-objective genetic algorithm. Nguyen et al. [14] introduced a cancer classification approach through gene expression profiles by designing supervised learning Hidden Markov Models (HMMs). Khashei et al. [15] proposed a hybrid model that combines artificial intelligence with fuzzy logic in order to benefit from the unique advantages of both fuzzy logic and the classification power of Artificial Neural Networks (ANNs), constructing an efficient and accurate hybrid classifier for situations where little data is available. To address problems such as sparsity, imbalance, and noise in cancer data, Yu et al. [16] introduced a hybrid K-Nearest Neighbour (KNN) classifier that considers both global and local optima information. To address the problem of bias and inconsistency in gene selection, Algamal and Lee [17] proposed an adaptive penalized logistic regression (PLR).

III. PROPOSED METHOD
In this section, we present the proposed work to classify microarray cancer data. Microarray cancer data are characterized by a high curse of dimensionality, whereby most of the variables do not contribute significantly to the development of cancer. Three options for the maximum number of features are used, namely $\log_2(N_f)$; $\mathrm{auto}(N_f)$, where auto is equivalent to the square root $\sqrt{N_f}$; and $\mathrm{None}(N_f)$, which takes all $N_f$ features for each split. When the maximum number of features is log2, the model takes the base-2 logarithm of the total number of features, $\log_2(N_f)$, where $N_f$ is the total number of features. Accordingly, the number of features considered to split at each node in the case of 2-class Leukemia and 3-class Leukemia is 4.95 and 5.67, respectively. In the case of the MLL data, the None parameter is selected, so all 26 features are considered. Moreover, for the Ovarian and SRBCT datasets, the optimal numbers of features considered for a split at each node are 4.80 and 7.81, respectively, as shown in Table I.

Given the features of microarray cancer data, normalization is carried out to bring all values into a limited range, in this case [0, 1], so as to fit the classifier properly, as shown in Equation 1, where $X$ is the original feature value and $\hat{X}$ is the normalized value.

$$\hat{X} = \frac{X - \min(X)}{\max(X) - \min(X)} \qquad (1)$$

We split the data into training and test samples before defining the random forest classifier and training it on the basis of 5-fold cross-validation. The best parameters found by the grid search are extracted and used to predict the test samples, as shown in Figure 1.

A. Grid-based Hyperparameter Tuning and Optimization
In our work, the model is optimized by grid-based hyperparameter tuning. The main parameters optimized are the number of trees in the forest, the maximum number of features considered when splitting a node, the depth of each decision tree, and the criterion used for splitting. Table II shows the optimal parameters of the random forest on the five microarray cancer datasets: the Gini index is used as the splitting criterion, and the depth of each decision tree is set to 2 in all datasets. Moreover, the number of features considered for splitting a given node is log2 for the 2-class and 3-class Leukemia datasets and None for the MLL dataset. Finally, the number of features for the Ovarian and SRBCT datasets is set to auto.

B. Dataset Description
To validate our work, we use five standard microarray cancer datasets, including both binary and multi-class data, namely 2-class Leukemia, 3-class Leukemia, MLL, Ovarian, and SRBCT. A detailed description of the datasets is shown in Table III.

1) Evaluation Methods:
To validate the optimal number of trees (the one leading to minimum error) generated by the grid search in the random forest, the out-of-bag (OOB) error is computed for the parameters sqrt, log2, and None for each dataset, as shown in Figures 2, 3, 4, 5, and 6. For each sample $z_i = (x_i, y_i)$, the OOB prediction $\hat{y}_i$ is the mean prediction of only those trees in the random forest whose bootstrap samples exclude $z_i$; Equation 2 averages the resulting errors over the $n$ samples.

$$\mathrm{OOB} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\left(y_i \neq \hat{y}_i\right) \qquad (2)$$

The absolute classification error, which is the difference between the predicted class and the actual test class label, is computed based on Equation 3, where $y_A$ is the actual label of the test data and $y_P$ is the predicted label.

$$E_{\mathrm{absolute}} = \left|y_A - y_P\right| \qquad (3)$$

To compute the mean absolute percentage error (MAPE), Equation 4 is used: the absolute difference between the actual and predicted class labels is divided by the actual class label and averaged over the $n$ test samples to get the MAPE for each dataset.

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left|y_{A,i} - y_{P,i}\right|}{\left|y_{A,i}\right|} \qquad (4)$$

As shown in Table IV, the MAPE error scored on three datasets, namely 2-class Leukemia, Ovarian, and SRBCT, is 0, and the accuracy scored is 100%. However, for the 3-class Leukemia and MLL datasets, a MAPE error of 1.149 and an accuracy of 98.85% are registered.
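Equations 3 and 4 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names and toy labels are hypothetical, and the sketch assumes class labels are encoded as non-zero integers (for example 1 to K), because Equation 4 divides by $|y_A|$ and is undefined for a label of 0. scikit-learn also provides `sklearn.metrics.mean_absolute_percentage_error`, which returns a fraction rather than a percentage and guards the division with a small epsilon.

```python
from typing import List, Sequence


def absolute_errors(y_actual: Sequence[int], y_pred: Sequence[int]) -> List[int]:
    """Per-sample absolute classification error |y_A - y_P| (Equation 3)."""
    if len(y_actual) != len(y_pred):
        raise ValueError("y_actual and y_pred must have the same length")
    return [abs(a - p) for a, p in zip(y_actual, y_pred)]


def mape(y_actual: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean absolute percentage error (Equation 4), returned as a fraction of 1.

    Assumes non-zero integer labels, because Equation 4 divides by |y_A|.
    """
    if not y_actual:
        raise ValueError("need at least one sample")
    if any(a == 0 for a in y_actual):
        raise ValueError("Equation 4 is undefined when an actual label is 0")
    errors = absolute_errors(y_actual, y_pred)
    return sum(e / abs(a) for e, a in zip(errors, y_actual)) / len(y_actual)


if __name__ == "__main__":
    y_a = [1, 2, 3, 3]  # hypothetical actual labels (classes encoded 1..3)
    y_p = [1, 2, 3, 2]  # hypothetical predictions with one error
    print(absolute_errors(y_a, y_p))       # [0, 0, 0, 1]
    print(round(100 * mape(y_a, y_p), 2))  # 8.33
```

Multiplying by 100 gives the percentage form; note that, unlike accuracy, the value depends on how the classes are numbered, since the same misclassification costs less when the actual label is larger.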


Fig. 1: Full Experimental setup of the method
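The experimental setup described in Section III (min-max normalization, a train/test split, and a 5-fold stratified grid search over random forest parameters) can be approximated with scikit-learn [3] as below. This is a sketch under stated assumptions, not the authors' implementation: the data are synthetic stand-ins from `make_classification` (shaped like the 2-class Leukemia subset in Table III), the grid values are assumptions because the paper reports only the selected values (Table II), and the scaler sits inside a `Pipeline` so that Equation 1 is fitted on the training folds only, which may differ from the authors' ordering. Recent scikit-learn releases no longer accept `max_features="auto"`; for classifiers it meant `"sqrt"`, which is used here instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a microarray subset after feature selection:
# 72 samples x 31 features, 2 classes (shape from Table III; values are random).
X, y = make_classification(n_samples=72, n_features=31, n_informative=8,
                           n_classes=2, random_state=0)

# Hold out 29 test samples (43 remain for training), stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=29, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", MinMaxScaler()),                       # Equation 1: rescale to [0, 1]
    ("rf", RandomForestClassifier(random_state=0)),
])

# Assumed search space covering the four tuned parameters.
param_grid = {
    "rf__max_features": ["sqrt", "log2", None],
    "rf__n_estimators": [10, 20, 50, 70],
    "rf__max_depth": [2, None],
    "rf__criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
# With refit=True (the default), the best pipeline is retrained on all 43 samples.
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean 5-fold CV accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_test, y_test), 3))
```

Passing `StratifiedKFold(n_splits=10, ...)` instead gives the 10-fold variant mentioned in the abstract.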

TABLE I: Optimized Parameters of Random Forest tree on all datasets


Dataset            #Features   Parameter   Maximum # features in a node
2-class Leukemia   31          log2        log2(31) = 4.95
3-class Leukemia   51          log2        log2(51) = 5.67
MLL                26          None        Features = 26
Ovarian            23          sqrt        √23 = 4.80
SRBCT              61          sqrt        √61 = 7.81
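The per-node feature counts in Table I follow directly from the `log2`, `sqrt`, and `None` options; the short sketch below recomputes them. As an aside, it also shows that scikit-learn does not use the fractional values: for `"sqrt"` and `"log2"` it truncates to an integer with a minimum of 1. The dictionary and function names are hypothetical, and `auto` is treated as `sqrt`, which is how scikit-learn defined it for classifiers.

```python
import math
from typing import Optional

# (selected feature count from Table III, max_features option from Table II);
# "auto" is mapped to "sqrt", its meaning for scikit-learn classifiers.
DATASETS = {
    "2-class Leukemia": (31, "log2"),
    "3-class Leukemia": (51, "log2"),
    "MLL": (26, None),
    "Ovarian": (23, "sqrt"),
    "SRBCT": (61, "sqrt"),
}


def features_per_split(n_features: int, option: Optional[str]) -> float:
    """Real-valued number of candidate features per split, as listed in Table I."""
    if option == "log2":
        return math.log2(n_features)
    if option == "sqrt":
        return math.sqrt(n_features)
    if option is None:
        return float(n_features)  # None: every feature is a split candidate
    raise ValueError(f"unsupported max_features option: {option!r}")


def sklearn_features_per_split(n_features: int, option: Optional[str]) -> int:
    """Integer count scikit-learn actually samples: truncated, at least 1."""
    return max(1, int(features_per_split(n_features, option)))


if __name__ == "__main__":
    for name, (n_f, opt) in DATASETS.items():
        print(f"{name:<17} {str(opt):<5} {features_per_split(n_f, opt):6.2f}"
              f"  -> {sklearn_features_per_split(n_f, opt)}")
```

Run as a script, it prints 4.95, 5.67, 26.00, 4.80, and 7.81, alongside the integer counts 4, 5, 26, 4, and 7.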

TABLE II: Best Parameters of the grid search for all data sets
Dataset            Maximum # of features   Criterion   Maximum depth   # of trees
3-class Leukemia   log2                    Gini        2               20
2-class Leukemia   log2                    Gini        2               70
MLL                None                    Gini        2               50
Ovarian            auto                    Gini        2               10
SRBCT              auto                    Gini        2               50
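The OOB curves in Figures 2 to 6 (Equation 2) can be approximated by growing one forest incrementally and reading `1 - oob_score_` after each step, using scikit-learn's `warm_start` option. In this hedged sketch, the data are synthetic (shaped like the 3-class Leukemia subset in Table III), the sweep of 10 to 70 trees is an assumption, and depth 2 with the Gini criterion mirrors Table II. With very few trees some samples may never be out of bag, in which case scikit-learn emits a warning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in shaped like the 3-class Leukemia subset (72 x 51, 3 classes).
X, y = make_classification(n_samples=72, n_features=51, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

tree_counts = list(range(10, 80, 10))  # assumed sweep: 10, 20, ..., 70 trees
oob_curves = {}
for option in ["sqrt", "log2", None]:
    forest = RandomForestClassifier(
        max_features=option,
        max_depth=2,
        criterion="gini",
        bootstrap=True,
        oob_score=True,   # score each sample only with trees that did not see it
        warm_start=True,  # keep existing trees and add new ones on each fit()
        random_state=0,
    )
    errors = []
    for n_trees in tree_counts:
        forest.set_params(n_estimators=n_trees)
        forest.fit(X, y)
        errors.append(1.0 - forest.oob_score_)  # Equation 2: OOB error rate
    oob_curves[str(option)] = errors

for option, errors in oob_curves.items():
    best = tree_counts[errors.index(min(errors))]
    print(f"max_features={option}: lowest OOB error {min(errors):.3f} at {best} trees")
```

Growing a single forest with `warm_start` means each curve reuses the same trees as the count increases, which makes adjacent points directly comparable and avoids refitting from scratch.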

TABLE III: Dataset Description


Dataset            Sample size   # of features   # of classes   Selected # features   Training size   Test size
2-class Leukemia   72            7129            2              31                    43              29
3-class Leukemia   72            7129            3              51                    43              29
MLL                72            12600           3              26                    43              29
Ovarian            253           15154           2              23                    151             102
SRBCT              83            2308            4              61                    58              25

To validate our method, we employ several evaluation metrics, such as classification accuracy, precision, recall, f-score, misclassification error, and OOB error.

Moreover, the misclassification rate is employed as an evaluation metric that measures the proportion of misclassified samples in the test set. It is computed as the ratio of false positives plus false negatives to the total size of the test set, as shown in Equation 5. It is computed at every 10% of the training size, and the average of the 10 errors is taken as the final error. Even when the classification rate is 100%, there is still a misclassification error, which indicates the penalty of overconfidence during correct and wrong predictions.

$$E = \frac{FP + FN}{TP + TN + FP + FN} \qquad (5)$$

IV. EXPERIMENTAL RESULTS
To show the soundness of the proposed method, an extensive number of experiments are carried out, and results are provided using several standard metrics such as classification accuracy, precision, recall, and F-score, which are presented in Table V. The training accuracy of the proposed method is 100% for all datasets. The proposed


TABLE IV: MAPE Error and related accuracy on all datasets


Dataset            Absolute MAPE Error   Avg MAPE Error   Accuracy (%)
2-class Leukemia   0.00                  0.00             100.00
3-class Leukemia   1.149                 0.03             98.85
MLL                1.149                 0.03             98.85
Ovarian            0.00                  0.00             100.00
SRBCT              0.00                  0.00             100.00

Fig. 2: OOB error and number of trees for 2-class Leukemia
Fig. 3: OOB error and number of trees for 3-class Leukemia
Fig. 4: OOB error and number of trees for MLL
Fig. 5: OOB error and number of trees for Ovarian
Fig. 6: OOB error and number of trees for SRBCT
Fig. 7: Accuracy of 2-class Leukemia cancer data

method achieves an accuracy of 100% on three datasets, namely 2-class Leukemia, Ovarian, and SRBCT, and a test accuracy of 0.97 is achieved for both MLL and 3-class Leukemia.

Moreover, the misclassification rate is employed as an evaluation metric that measures the proportion of misclassified samples in the test set. The misclassification rate is computed as the ratio of false positives plus false negatives to the size of the test set, as shown in Equation 5. It is computed at every 10% of the training size, and the average of the 10 errors is computed to


TABLE V: Experimental results of the proposed method in Training and Test Accuracy, Precision, Recall and F-score
Dataset            Training accuracy   Test accuracy   Precision   Recall   F-score
2-class Leukemia   1.00                1.00            1.00        1.00     1.00
3-class Leukemia   1.00                0.97            0.97        0.97     0.96
MLL                1.00                0.97            0.97        0.97     0.97
Ovarian            1.00                1.00            1.00        1.00     1.00
SRBCT              1.00                1.00            1.00        1.00     1.00

Fig. 8: Accuracy of 3-class Leukemia cancer data
Fig. 9: Accuracy of 3-class MLL cancer data
Fig. 10: Accuracy of 2-class Ovarian cancer data
Fig. 11: Accuracy of 4-class SRBCT cancer data
Fig. 12: Confusion Matrix of 2-class Leukemia
Fig. 13: Confusion Matrix of 3-class Leukemia

get the final error. Even when the classification accuracy is 100%, there is still a misclassification error, which indicates the penalty of overconfidence during correct and wrong predictions. Figures 17, 18, 19, 20, and 21 show the misclassification error rates of the 2-class Leukemia, 3-class Leukemia, MLL, Ovarian, and SRBCT cancer data.

$$E = \frac{FP + FN}{TP + TN + FP + FN} \qquad (6)$$
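Equations 5 and 7 to 10 all derive from the four confusion-matrix counts. The plain-Python sketch below computes them for a binary problem; the helper names and toy labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention. For the multi-class datasets, the paper does not state how precision, recall, and F-score are averaged; `sklearn.metrics.precision_recall_fscore_support` with its `average` argument is one standard way to make that choice explicit.

```python
from typing import Dict, Hashable, Sequence, Tuple


def binary_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                  positive: Hashable = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with respect to the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Misclassification rate, accuracy, precision, recall, and F-score."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed 0.0 on empty denominator
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "misclassification": (fp + fn) / total,  # Equation 5 (repeated as 6)
        "accuracy": (tp + tn) / total,           # Equation 7
        "precision": precision,                  # Equation 8
        "recall": recall,                        # Equation 9
        "f_score": f_score,                      # Equation 10
    }


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    counts = binary_counts(y_true, y_pred)
    print(counts)  # (2, 2, 1, 1)
    for name, value in metrics_from_counts(*counts).items():
        print(f"{name}: {value:.3f}")
```

In this toy case the misclassification rate and the accuracy sum to 1, as Equations 5 and 7 imply.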


Fig. 14: Confusion Matrix of MLL
Fig. 15: Confusion Matrix of Ovarian
Fig. 16: Confusion Matrix of SRBCT
Fig. 17: Misclassification error of 2-class Leukemia cancer data
Fig. 18: Misclassification error of 3-class Leukemia cancer data
Fig. 19: Misclassification error of 3-class MLL cancer data
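The cross-validated ROC analysis discussed in the following paragraph (Figures 22 and 23) can be sketched with scikit-learn [3]: each fold's ROC curve is interpolated onto a common false-positive-rate grid, the curves are averaged, and the spread of the per-fold AUCs gives the standard deviation. The synthetic binary data (shaped like the Ovarian subset in Table III) and the forest settings borrowed from the Ovarian row of Table II, with `auto` written as `"sqrt"`, are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold

# Synthetic binary stand-in shaped like the Ovarian subset (253 x 23).
X, y = make_classification(n_samples=253, n_features=23, n_informative=6,
                           random_state=0)

mean_fpr = np.linspace(0.0, 1.0, 101)  # common FPR grid for averaging folds
tprs, aucs = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y):
    model = RandomForestClassifier(n_estimators=10, max_features="sqrt",
                                   max_depth=2, criterion="gini", random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[val_idx])[:, 1]  # estimated P(class 1)
    fpr, tpr, _ = roc_curve(y[val_idx], scores)
    interp_tpr = np.interp(mean_fpr, fpr, tpr)      # resample onto the common grid
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(auc(fpr, tpr))

mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
print("per-fold AUC:", np.round(aucs, 3))
print(f"mean ROC AUC = {auc(mean_fpr, mean_tpr):.3f} +/- {np.std(aucs):.3f}")
```

Interpolating onto a fixed grid is what makes averaging possible, since `roc_curve` returns a different set of thresholds, and hence of points, for each fold.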

We explore the quality of the method by applying the ROC curve using cross-validation. The ideal point of the ROC curve is the top-left corner, which indicates the maximum true positive rate and the minimum false positive rate. Since 5-fold cross-validation is employed to train the method, the ROC curve for each fold is computed to see how the model performs with different training subsets. The mean ROC over the 5 folds, along with its standard deviation, is computed. Figures 22 and 23 show the ROC curves for the 2-class Leukemia and Ovarian cancer data.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (9)$$

$$F_{\mathrm{score}} = 2 \cdot \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (10)$$

V. DISCUSSION
In this work, two binary and three multi-class microarray cancer datasets are classified using grid search-based optimized
parameters of the RF algorithm. The classification accuracy obtained on 2-class Leukemia, Ovarian, and SRBCT is 100%, and the result for 3-class Leukemia and MLL is 0.97, as shown in Table V. Moreover, Figures 7, 8, 9, 10, and 11 show the test accuracy obtained on the 2-class Leukemia, 3-class Leukemia, MLL, Ovarian, and SRBCT datasets, respectively.

To validate our work, we employ the confusion matrix, which indicates the correctly and wrongly predicted samples in the test data, in Figures 12, 13, 14, 15, and 16 for the 2-class Leukemia, 3-class Leukemia, MLL, Ovarian, and SRBCT cancer data.

Fig. 20: Misclassification error of 2-class Ovarian cancer data
Fig. 21: Misclassification error of 4-class SRBCT cancer data
Fig. 22: 5-fold CV ROC of 2-class Leukemia
Fig. 23: 5-fold CV ROC of Ovarian

A. Comparison of the proposed method with state-of-the-art works on accuracy
We provide a comparative analysis with state-of-the-art works in terms of classification accuracy. We noticed that many works focus on feature selection and parameter tuning to optimize classification accuracy [18], [19], [4], [20], [9]. Our approach performs better than many of the latest works on most of the datasets, as presented in Table VI. Fields marked with a hyphen indicate that the corresponding dataset was not used by the particular method.

VI. CONCLUSION
In this work, we propose microarray cancer classification using the optimized hyperparameters of a random forest, obtained with a grid search approach. The random forest algorithm is optimized to obtain the best parameters, which are then applied to validate the method. The proposed method provides the best parameters: the maximum number of features to consider when splitting a node, the number of decision trees in the forest, the depth of the trees, and the criterion used to split a given node into child nodes. To check which parameters lead to maximum classification accuracy and minimum error, the out-of-bag error is employed. In the proposed approach, we used five standard microarray medical datasets. Performance measures such as classification accuracy, precision, recall, f-score, misclassification error, and the confusion matrix are employed to confirm the validity of the method. The experimental results of the proposed method exhibit perfect classification on three datasets, namely 2-class Leukemia, Ovarian, and SRBCT, with a score of 100%, and a test accuracy of 0.97 is achieved on two datasets, namely 3-class Leukemia and MLL.

REFERENCES
[1] V. García and J. S. Sánchez, “Mapping microarray gene expression data into dissimilarity spaces for tumor classification,” Information Sciences, vol. 294, pp. 362–375, 2015.
[2] K.-H. Chen, K.-J. Wang, K.-M. Wang, and M.-A. Angelia, “Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data,” Applied Soft Computing, vol. 24, pp. 773–780, 2014.
[3] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.


TABLE VI: Comparison of the proposed method with state-of-the-art methods


Method                                                 2-class Leukemia   3-class Leukemia   MLL    Ovarian   SRBCT
MOBBA-LS [18]                                          0.97               -                  -      -         0.85
MOEDA [19]                                             1.00               -                  -      -         0.91
Kernel-based [4]                                       0.96               -                  -      0.98      -
Game theory-based optimization [20]                    0.97               0.98               0.95   -         0.98
Bacterial-based algorithms [11]                        1.00               -                  -      -         1.00
Black Hole Algorithm (BHA) [21]                        0.97               -                  -      0.99      -
Evolutionary method based on genetic algorithms [9]    0.97               -                  -      -         1.00
Evolutionary method based on genetic algorithms [22]   0.97               -                  0.93   -         0.95
Proposed method (GSHPT)                                1.00               0.97               0.97   1.00      1.00

[4] S. A. Medjahed, T. A. Saadi, A. Benyettou, and M. Ouali, “Kernel-based learning and feature selection analysis for cancer diagnosis,” Applied Soft Computing, vol. 51, pp. 39–48, 2017.
[5] H. Salem, G. Attiya, and N. El-Fishawy, “Classification of human cancer diseases by gene expression profiles,” Applied Soft Computing, vol. 50, pp. 124–134, 2017.
[6] Z. Liu, D. Tang, Y. Cai, R. Wang, and F. Chen, “A hybrid method based on ensemble WELM for handling multi class imbalance in cancer microarray data,” Neurocomputing, vol. 266, pp. 641–650, 2017.
[7] J. Ramos-González, D. López-Sánchez, J. A. Castellanos-Garzón, J. F. de Paz, and J. M. Corchado, “A CBR framework with gradient boosting based feature selection for lung cancer subtype classification,” Computers in Biology and Medicine, vol. 86, pp. 98–106, 2017.
[8] D. M. Farid, M. A. Al-Mamun, B. Manderick, and A. Nowe, “An adaptive rule-based classifier for mining big biological data,” Expert Systems with Applications, vol. 64, pp. 305–316, 2016.
[9] M. Dashtban and M. Balafar, “Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts,” Genomics, vol. 109, no. 2, pp. 91–107, 2017.
[10] R. Dash and B. B. Misra, “Pipelining the ranking techniques for microarray data classification: A case study,” Applied Soft Computing, vol. 48, pp. 298–316, 2016.
[11] H. Wang, X. Jing, and B. Niu, “A discrete bacterial algorithm for feature selection in classification of microarray gene expression cancer data,” Knowledge-Based Systems, vol. 126, pp. 8–19, 2017.
[12] R. Aziz, C. Verma, and N. Srivastava, “A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data,” Genomics Data, vol. 8, pp. 4–15, 2016.
[13] A. K. Das, S. Das, and A. Ghosh, “Ensemble feature selection using bi-objective genetic algorithm,” Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.
[14] T. Nguyen, A. Khosravi, D. Creighton, and S. Nahavandi, “Hidden Markov models for cancer classification using gene expression profiles,” Information Sciences, vol. 316, pp. 293–307, 2015.
[15] M. Khashei, A. Z. Hamadani, and M. Bijari, “A fuzzy intelligent approach to the classification problem in gene expression data analysis,” Knowledge-Based Systems, vol. 27, pp. 465–474, 2012.
[16] Z. Yu, H. Chen, J. Liu, J. You, H. Leung, and G. Han, “Hybrid k-nearest neighbor classifier,” IEEE Transactions on Cybernetics, vol. 46, no. 6, pp. 1263–1275, 2016.
[17] Z. Y. Algamal and M. H. Lee, “Penalized logistic regression with the adaptive lasso for gene selection in high-dimensional cancer classification,” Expert Systems with Applications, vol. 42, no. 23, pp. 9326–9332, 2015.
[18] M. Dashtban, M. Balafar, and P. Suravajhala, “Gene selection for tumor classification using a novel bio-inspired multi-objective approach,” Genomics, vol. 110, no. 1, pp. 10–17, 2018.
[19] J. Lv, Q. Peng, X. Chen, and Z. Sun, “A multi-objective heuristic algorithm for gene expression microarray data classification,” Expert Systems with Applications, vol. 59, pp. 13–19, 2016.
[20] S. Sasikala, S. A. alias Balamurugan, and S. Geetha, “A novel adaptive feature selector for supervised classification,” Information Processing Letters, vol. 117, pp. 25–34, 2017.
[21] E. Pashaei and N. Aydin, “Binary black hole algorithm for feature selection and classification on biological data,” Applied Soft Computing, vol. 56, pp. 94–106, 2017.
[22] S. Kar, K. D. Sharma, and M. Maitra, “Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive k-nearest neighborhood technique,” Expert Systems with Applications, vol. 42, no. 1, pp. 612–627, 2015.
