A Deep Learning Method With Filter Based Feature Engineering For Wireless Intrusion Detection System
April 5, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2905633
ABSTRACT In recent years, the increased use of wireless networks for the transmission of large volumes of information has generated a myriad of security threats and privacy concerns; consequently, a number of preventive and protective measures have been developed, including intrusion detection systems (IDS). Intrusion detection mechanisms play a pivotal role in securing computer and network systems; however, for various IDS, performance remains a major issue. Moreover, the accuracy of existing machine learning methodologies for IDS is heavily affected when the feature space grows. In this paper, we propose an IDS based on deep learning using feed forward deep neural networks (FFDNNs) coupled with a filter-based feature selection algorithm. The FFDNN-IDS is evaluated using the well-known NSL-knowledge discovery and data mining (NSL-KDD) dataset and it is compared to the following existing machine learning methods: support vector machines, decision tree, K-Nearest Neighbor, and Naïve Bayes. The experimental results show that the FFDNN-IDS achieves higher accuracy than the other methods.
INDEX TERMS Deep learning, feature extraction, intrusion detection, machine learning, wireless networks.
2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019
S. M. Kasongo, Y. Sun: Deep Learning Method With Filter-Based Feature Engineering
The most popular ML approaches to intrusion detection include K-Nearest Neighbors (KNN) [10], Decision Tree (DT) [11], Support Vector Machines (SVM) [12], Random Forest (RF) [13], Naive Bayes (NB) [14] and Multi-Layer Perceptrons (MLP) associated with all Deep Learning (DL) methodologies [15]–[17]. An IDS generally handles large amounts of data, which causes ML techniques such as the ones in [10]–[14] to perform poorly; it is therefore imperative to devise appropriate strategies and classification approaches to overcome this under-performance. This paper focuses on DL to try to improve on the shortcomings of existing systems.

DL was first proposed by Professor Hinton [18] and it is an advanced sub-field of ML that simplifies the modeling of various complex concepts and relationships using multiple levels of representation [19]. DL has achieved a great amount of success in fields such as language identification, image processing and pharmaceutical research [20]–[22]. This has prompted researchers to explore the application of DL theory to the intrusion detection classification problem.

The major characteristic that distinguishes DL from traditional ML methods is the improved performance of DL as the amount of data increases. DL algorithms are not well suited for problems involving small volumes of data because these algorithms require a considerable amount of data to learn efficiently [9]. Although DL can handle a high throughput in terms of data, the questions of accuracy improvement and lowering of the false-positive alarm rate still remain due to the ever-growing size of datasets used for IDS research. Moreover, as datasets grow in volume, the input space and the attack classification dimension also expand. Consequently, instances of misclassification are prevalent, which in turn increases the false-positive alarm rate and negatively impacts the overall system performance. Therefore, it is crucial to implement solutions that are capable of selecting only the needed features to perform an optimal classification operation.

Feature engineering (FE) has become a key topic in many ML research domains [23]–[26]. As part of FE, feature selection algorithms fall into the following categories: filter model, wrapper model and hybrid model. The filter model bases itself on the intrinsic nature of the data and it is independent of the classifier used. The wrapper method evaluates the performance of the classification algorithm used on a candidate feature subset, whereas the hybrid method is a combination of the wrapper and filter algorithms [27]. The methodology proposed in this paper focuses on a filter-based approach as the two latter techniques are computationally expensive [28].

The major contributions of this paper are outlined as follows:
• A Feature Extraction Unit (FEU) is introduced. By using filter-based algorithms, the FEU generates optimal subsets of features with minimum redundancy.
• We scrutinize the performance of the following existing classification algorithms applied to IDS without the FEU by using the NSL-KDD dataset: k-nearest neighbor (KNN), support vector machine (SVM), Decision Tree (DT), Random Forest (RF) and Naive Bayes (NB). Moreover, we study the performance of those algorithms coupled with the FEU.
• A feed-forward deep neural network (FFDNN) is introduced. We study its performance using the FEU and the NSL-KDD dataset. After the comparison to KNN, SVM, DT, RF and NB, the FEU-FFDNN proves to be very appropriate for intrusion detection systems. Furthermore, experimental results demonstrate that the depth and the number of neurons (nodes) used for an FFDNN classifier have a direct impact on its accuracy.

The rest of this paper is organized as follows: Section II provides a background on wireless networks. Section III gives an account of similar research with a focus on ML based IDS as well as various methods for feature selection. Section IV details a background on traditional machine learning classifiers that are also explored in this work. Section V provides an architecture of the proposed method for wireless intrusion detection. Section VI details the experimental setup used in this research as well as the tools used to design, implement, evaluate and test the following classifiers: SVM, DT, RF, NB, KNN and FFDNN, and the results are discussed. Section VII concludes the paper.

II. BACKGROUND: WIRELESS NETWORKS
In recent years, the growth of wireless networks has outpaced that of wired ones. Wireless communication is attractive because it does not require any additional wired infrastructure for the communication medium. Today, the most popular form of wireless network is the Wireless Local Area Network (WLAN). WLANs form part of the IEEE 802.11 family and are intensively used as an effective alternative to wired communication in various areas such as industrial communication and in-building communication. A myriad of security mechanisms including Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access (WPA, WPA2) have been mainly used to secure and protect WLANs; however, they have shown many flaws when it comes to threats such as Denial of Service (DoS) attacks, network discovery attacks, brute force attacks, etc. [2], [41], [43]. In order to reinforce WLAN security against those vulnerabilities, IDSs are generally implemented. In this research, we focus on an IDS for WLANs using a DL approach. Furthermore, since wired and wireless IDS research go hand in hand, this work reviews strategies used in both wired and wireless IDS research using ML and DL.

III. RELATED WORK
This section provides an account of previous studies on feature selection methods in general as well as intrusion detection systems using ML and DL techniques.
The research conducted in [19] presented a deep learning based intrusion detection system that made use of a non-symmetric deep auto-encoder (NDAE) for feature learning and a classification methodology using stacked NDAEs. An NDAE is an auto-encoder made of non-symmetrical multiple hidden layers. In simple terms, it is a deep neural network composed of many non-symmetrical hidden layers. The evaluation of the IDS scheme was made using two datasets: the KDDCup 99 and the NSL-KDD. The multiclass classification experiment yielded an accuracy of 85.42% over the NSL-KDD dataset and an accuracy of 97.85% on the KDDCup 99 dataset.

In [26], the researchers gave an account of a multi-objective algorithm for feature selection labeled MOMI. This approach is centered on Mutual Information (MI) and considers feature redundancy and relevancy during the feature evaluation and selection process. The experiments carried out to evaluate MOMI's performance were conducted using the WEKA tool [35] with three separate datasets. Two classifiers, namely Naive Bayes (NB) and support vector machine (SVM), were used. The results of this research suggested that MOMI was able to select only the features needed for the best performance.

Chakraborty and Pal [29] presented a feature selection (FS) algorithm using a multilayer perceptron (MLP) framework with a controlled redundancy (CoR). This approach is labelled as FSMLP-CoR. An MLP is a neural network with an input layer, multiple hidden layers and an output layer [30] and it is generally used for approximation, classification, regression, and prediction in many domains [31]–[34]. In this case, an MLP was used to identify and drop those features that are not relevant in resolving the problem at hand. The FSMLP-CoR was tested using 23 datasets and the results led the researchers to conclude that it was effective in selecting important features.

In [36], an ant colony optimization (ACO) technique was applied for feature selection on the KDDCup 99 dataset for intrusion detection. The KDDCup 99 dataset has 41 features. ACO was inspired by how ants use pheromones in a bid to remember their path. ACO has different variations. In this research, the authors used the ant colony system (ACS) with a two-level pheromone update. The proposed solution was evaluated using the binary SVM classifier library in WEKA (LibSVM) [35]. The results revealed that a higher accuracy is obtained with an optimal feature subset of 14 inputs.

The research in [37] proposed a wrapper based feature selection algorithm for intrusion detection using the genetic algorithm (GA) as a heuristic search method and Logistic Regression (LR) as the evaluating learning algorithm. The whole approach is labeled as GA-LR. GA originates from the natural selection process and it falls under the category of evolutionary based algorithms [38]. GA has the following building blocks: an initial population, a fitness function, a genetic operator (variation, crossover and selection) and a stopping criterion. The experiments conducted to evaluate the GA-LR were done using the KDDCup 99 dataset and the UNSW-NB15 dataset. Decision Tree classifiers were applied to candidate feature subsets and the results suggested that GA-LR is an efficient method.

Wang et al. [39] took a different direction in terms of the feature engineering approach by using a feature augmentation (FA) algorithm rather than a feature reduction one. The classifier used in this research was the SVM and the FA algorithm used was the logarithm marginal density ratio transformation. The goal was to obtain newly improved features that would ultimately lead to a higher detection accuracy. The evaluation of the proposed scheme was conducted using the NSL-KDD dataset and the outcomes of the empirical experiments suggested that the FA coupled with the SVM yielded a robust and improved overall intrusion detection performance.

In [40], an intrusion detection system (IDS) was designed and modelled based on DL using Recurrent Neural Networks (RNNs). RNNs are neural networks whereby the hidden layers act as the information storage units. The benchmark dataset used in this research was the NSL-KDD. The RNN-IDS was compared to the following commonly used classification methods: J.48, Random Forest and SVM. The accuracy (AC) was mainly used as the performance indicator during the experiments and the results suggested that RNN-IDS presented an improved accuracy of intrusion detection compared to traditional machine learning classification methods. These results reinforced the assumption that DL based intrusion detection systems are superior to classic ML algorithms. In the binary classification scheme, a model with 80 hidden nodes and a learning rate of 0.1 achieved an accuracy of 83.28%, whereas in the multiclass classification using 5 classes, a model with 80 hidden neurons and a learning rate of 0.5 achieved an accuracy of 81.29%.

The approach proposed in [41] used deep learning for intrusion detection in IEEE 802.11 wireless networks using stacked auto encoders (SAE). An SAE is a neural network created by stacking together multiple layers of sparse auto encoders. The experiments undertaken in this research were made using the Aegean Wireless Intrusion Dataset (AWID), which is comprised of 155 attributes, with the last attribute representing the class that can take the following values: injection, flooding, impersonation and normal. According to Thing [41], this was the first work that proposed a deep learning approach applied to IEEE 802.11 networks for classification. The overall accuracy achieved in this work was 98.6688%.

Ding and Wang [42] investigated the use of DL for intrusion detection technology using the KDDCup 99 dataset. The architecture used for the neural network model consisted of five fully connected (dense feed-forward) hidden layers of 10, 20, 20, 40 and 64 units. The activation function used in this research was the ReLU (Rectified Linear Unit) and the model was trained by back-propagation with the Adam optimizer (Ad-op). The Ad-op was used in a bid to increase the training speed and to prevent overfitting. Although this research yielded some advancements, it equally
showed no significant improvement in detecting rare attack types (U2R and R2L) present in the dataset.

In [43], an ML approach to detect flooding Denial of Service (DoS) attacks in IEEE 802.11 networks was proposed. The dataset used in this research was generated by the authors in a computer laboratory. The setup was made of 40 computers, of which seven were designated as attackers to launch the flooding DoS, and each legitimate node was connected to any of the five available Access Points (APs). The obtained dataset was segmented into the following two portions: 66% for ML training and 34% for ML testing. Using the WEKA tool [35], six ML classification algorithms were applied consecutively, namely: SVM, Naive Bayes, Naive Bayes Net, Ripple-DOwn Rule Learner (RIDOR), Alternating Decision Tree and Adaptive Boosting (AdaBoost). The empirical results based on the accuracy and the recall numbers suggested that AdaBoost was more efficient than the other algorithms.

In [44], a performance comparison of SVM, Extreme Learning Machine (ELM) and Random Forest (RF) for intrusion detection was investigated using the NSL-KDD as the benchmark dataset. Each of the ML algorithms used in this investigation was evaluated using the following performance metrics: Accuracy, Precision and Recall. The outcome of the experiments showed that ELM outperformed RF and SVM; consequently, the authors concluded that ELM is a viable option when designing and implementing intrusion detection systems.

IV. BACKGROUND ON TRADITIONAL MACHINE LEARNING CLASSIFIERS
A. SUPPORT VECTOR MACHINE
Support Vector Machines (SVM) is one of the most popular ML techniques applied to Big Data and used in ML research. SVM is a supervised machine learning method that is used to classify different categories of data. SVM is able to solve both linear and non-linear problems. SVM works by generating a hyperplane or several hyperplanes within a high-dimensional space to separate data, and the ones that optimally split the data per class type are selected as the best [44].

B. K-NEAREST NEIGHBOR
K-Nearest Neighbor (KNN) is another ML method used to classify data. The KNN algorithm bases itself on the standard Euclidean distance between instances in a space and can be defined as follows [45]: let $x$ and $y$ be instances in space $P$; the distance between $x$ and $y$, $d(x, y)$, is given by the following expression:

$$ d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} \tag{1} $$

where $n$ represents the number of features (dimensions) of each instance. The KNN method classifies an instance $x_0$ within a space by calculating the Euclidean distance between $x_0$ and the samples within the training set; $x_0$ then takes the majority label of its $k$ closest (most similar) neighbors [46].

C. NAIVE BAYES
Naive Bayes (NB) classifiers are simple classification algorithms based on Bayes' Theorem [47]. Given a dataset, an NB classifier assumes a "naive" independence between the features. Let $X$ be an instance with $n$ features to be classified, represented by the vector $X = (x_1, \ldots, x_n)$. In order to determine the class $C_k$ for $X$, NB computes the following:

$$ p(C_k \mid X) = \frac{p(X \mid C_k)\, p(C_k)}{p(X)} \tag{2} $$

The class for $X$ is then assigned using the following expression:

$$ y = \underset{k \in \{1, \ldots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \tag{3} $$

where $y$ is the predicted label.

D. DECISION TREE AND RANDOM FOREST
The Decision Tree (DT) algorithm is widely used in data mining and ML. Given a dataset with labeled instances (training), the DT algorithm generates a predictive model in the shape of a tree capable of predicting the class of unknown records [14]. A DT has three main components: a root node, internal nodes and category nodes. The classification process happens in a top-down manner and an optimal decision is reached when the correct category of leaf node is found. A Random Forest classifier, on the other hand, applies multiple DTs on a given dataset for classification.

V. PROPOSED METHOD FOR WIRELESS INTRUSION DETECTION
A. FEED FORWARD DEEP NEURAL NETWORKS
Deep neural networks (DNNs) are widely used in ML and DL to solve complex problems. The most basic element of a DNN is an artificial neuron (AN), which is inspired by biological neurons within the human brain. An AN computes and forwards the sum of information received at its input side. Due to the non-linearity of real-life problems and in a bid to enhance learnability and approximation, each AN applies an activation function before generating an output [48]. This activation function can be a Sigmoid, $\sigma(t) = \frac{1}{1 + e^{-t}}$; a Rectified Linear Unit (ReLU), $f(y) = \max(0, y)$; or a hyperbolic tangent, shown in expression (4).

$$ \tanh(y) = \frac{1 - e^{-2y}}{1 + e^{-2y}} \tag{4} $$

The above-mentioned activation functions have advantages and drawbacks; moreover, their optimal performance is problem specific. Traditionally, artificial neural networks (ANNs) have an input layer, one to three hidden layers and an output layer as shown in Fig. 1; whereas DNNs may contain three to tens or hundreds of hidden layers [49]. There is no general rule for determining whether an ANN is deep or not. For the
sake of our research, we will consider a DNN to be a neural network with two or more hidden layers. In a Feed Forward DNN, the flow of information goes in one direction only: from the input layer via the hidden layers to the output layer. Neurons within the same layer do not communicate. Each AN in the current layer is fully connected to all neurons in the next layer as depicted in Fig. 1.

FIGURE 1. Feed forward neural network architecture.

B. DATASET
In the proposed research, the NSL-Knowledge Discovery and Data mining (NSL-KDD) dataset, which is an improved version of the KDDCup 99 [19], is used to train, evaluate and test the designed system shown in Fig. 2. The NSL-KDD is considered a benchmark dataset in IDS research and it is used for both wired and wireless systems [19], [39], [40], [44]. The NSL-KDD comprises one class label categorized in the following major groups: Normal, Probe, Denial of Service (DoS), User to Root (U2R) and Remote to User (R2L). Furthermore, the NSL-KDD is made of 41 features, of which three are nonnumeric and 38 are numeric, as depicted in Table 1.

The NSL-KDD comes with two sets of data: the training set (KDDTrain+ full) and the test sets (KDDTest+ full and KDDTest-21). In this research, we use the KDDTrain+ and the KDDTest+. KDDTrain+ is divided into two partitions: the KDDTrain+75, which is 75% of the KDDTrain+ and is used for training, and the KDDEvaluation, which is 25% of the KDDTrain+ and is used for evaluation after the training process. Table 2 provides a breakdown of the components in each dataset.
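The 75%/25% partition of KDDTrain+ described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the text does not say whether the rows are shuffled or stratified before splitting, so the shuffle, the `seed`, the function name `split_train_evaluation` and the toy rows are all hypothetical and stand in for the real dataset.

```python
import random


def split_train_evaluation(records, train_fraction=0.75, seed=0):
    """Shuffle records and split them into (training, evaluation) partitions."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for KDDTrain+ rows (hypothetical data, not the real dataset).
toy_rows = [(i, "normal" if i % 2 else "attack") for i in range(100)]
train_75, evaluation_25 = split_train_evaluation(toy_rows)
print(len(train_75), len(evaluation_25))  # 75 25
```

The evaluation partition is held out from training, so accuracy measured on it gives an estimate of generalization before the model is tested on KDDTest+.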
C. FEATURE ENGINEERING
In a dataset, features may take different forms such as numeric
and nonnumeric. DNN models can only process numeric
Algorithm 3 Forward and Back-Propagation Algorithm
Input: $W$, $b$
Output: updated $W$, $b$
1: Forward propagate $x_i$ through layers $l = L_2, L_3, \ldots, L_{n_l}$ ($n_l$ is the subscript of the last layer) using $z^{l+1} = W^l a^l + b^l$ and $a^{l+1} = f(z^{l+1})$, with $f$ a rectified linear unit (ReLU) of the form $f(z) = \max(0, z)$
2: Compute the error term $\xi$ for each output unit $i$ as follows:
$$ \xi_i^{n_l} = \frac{\partial}{\partial z_i^{n_l}} \frac{1}{2} \lVert y - \text{output} \rVert^2 = -(y_i - a_i^{n_l}) \cdot f'(z_i^{n_l}) $$
3: For each hidden unit in $l = n_l - 1, n_l - 2, \ldots, 2$, compute the following for each node $i$ in $l$:
$$ \xi_i^l = \Big( \sum_{j=1}^{s_{l+1}} W_{ji}^l \, \xi_j^{l+1} \Big) \cdot f'(z_i^l) $$
4: Calculate the required partial derivatives with respect to weights and biases for each training example as follows:
$$ \frac{\partial}{\partial W_{ij}^l} C(W, b; x, y) = a_j^l \, \xi_i^{l+1} $$
$$ \frac{\partial}{\partial b_i^l} C(W, b; x, y) = \xi_i^{l+1} $$
5: Update the weights and biases as follows:
$$ W_{ij}^l = W_{ij}^l - \eta \, a_j^l \, \xi_i^{l+1} $$
$$ b_i^l = b_i^l - \eta \, \xi_i^{l+1} $$

major classes. In this research, the following rule applies: a classifier performs better than another one when it yields a higher accuracy on previously unseen data that can be found in the KDDTest+ set.

A. PHASE 1: BINARY CLASSIFICATION WITH 41 FEATURES
This phase uses all 41 features for binary classification. We only apply Algorithm 1 to transform the inputs. In order to select the best FFDNN, we ran models with 41 units at the input layer, two nodes at the output layer and the following numbers of hidden nodes: 30, 40, 60, 80 and 150. These numbers were selected by trial and error. Moreover, we also varied the number of hidden layers as well as the learning rate. The details are presented in Table 4. For better performance analysis and for the purpose of comparison, we also performed classification using the following classifiers: SVM, KNN, RF, DT and NB. The obtained results suggested that for binary classification, a model with a learning rate of 0.05 and 30 neurons spread over 3 hidden layers achieved an accuracy of 99.69% on the KDDEvaluation set and 86.76% on the KDDTest+. Fig. 3 shows a comparison of this model with other classification algorithms. The Random Forest classifier, with an accuracy of 85.18% on the KDDTest+, came in second position after the FFDNN model, and the SVM classifier produced an accuracy of 84.41% on the same test set.

TABLE 4. Accuracy during training of FFDNN - binary classification.
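The forward and back-propagation steps of Algorithm 3 can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the uniform weight initialization, the single-example update and the layer sizes `[41, 10, 10, 10, 2]` (one possible reading of "30 neurons spread over 3 hidden layers" with 41 inputs and two outputs) are assumptions made for illustration. ReLU is applied at every layer, as in step 1 of the algorithm, and the learning rate of 0.05 is the one reported for Phase 1.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def relu_prime(z):
    return 1.0 if z > 0.0 else 0.0


def init_network(sizes, seed=0):
    """W[l][i][j] connects unit j of layer l to unit i of layer l+1 (assumed init)."""
    rng = random.Random(seed)
    W = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
         for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    b = [[0.0] * n_out for n_out in sizes[1:]]
    return W, b


def forward(W, b, x):
    """Step 1: z^{l+1} = W^l a^l + b^l and a^{l+1} = f(z^{l+1})."""
    activations, zs = [x], []
    for Wl, bl in zip(W, b):
        a = activations[-1]
        z = [sum(w * ai for w, ai in zip(row, a)) + bi for row, bi in zip(Wl, bl)]
        zs.append(z)
        activations.append([relu(v) for v in z])
    return activations, zs


def train_step(W, b, x, y, eta):
    """Steps 2-5 for one training example (x, y); updates W and b in place."""
    activations, zs = forward(W, b, x)
    # Step 2: error term of each output unit.
    xi = [-(yi - ai) * relu_prime(zi)
          for yi, ai, zi in zip(y, activations[-1], zs[-1])]
    for l in range(len(W) - 1, -1, -1):
        a_prev = activations[l]
        xi_prev = None
        if l > 0:
            # Step 3: propagate the error term backwards using the old weights.
            xi_prev = [
                sum(W[l][j][i] * xi[j] for j in range(len(xi))) * relu_prime(zs[l - 1][i])
                for i in range(len(a_prev))
            ]
        # Steps 4-5: gradients a_j * xi_i (weights) and xi_i (biases), then update.
        for i in range(len(xi)):
            for j in range(len(a_prev)):
                W[l][i][j] -= eta * a_prev[j] * xi[i]
            b[l][i] -= eta * xi[i]
        if xi_prev is not None:
            xi = xi_prev


def loss(W, b, x, y):
    """Cost 1/2 * ||y - output||^2 for one example."""
    out = forward(W, b, x)[0][-1]
    return 0.5 * sum((yi - oi) ** 2 for yi, oi in zip(y, out))


# Hypothetical usage: one random 41-feature input with a two-unit target.
W, b = init_network([41, 10, 10, 10, 2])
rng = random.Random(1)
x = [rng.random() for _ in range(41)]
y = [1.0, 0.0]
print("loss before:", loss(W, b, x, y))
for _ in range(20):
    train_step(W, b, x, y, eta=0.05)
print("loss after:", loss(W, b, x, y))
```

In practice a deep learning framework would handle mini-batches, initialization and optimizers; the loops above only mirror the indices of the algorithm so that each step can be traced by hand.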
FIGURE 3. Binary classification accuracy comparison.
FIGURE 4. Multiclass classification accuracy comparison.
TABLE 7. Accuracy during training of FEU-FFDNN - binary classification.
TABLE 8. Accuracy during training of FEU-FFDNN - multiclass classification.
[15] A. Shenfield, D. Day, and A. Ayesh, “Intelligent intrusion detection systems using artificial neural networks,” ICT Express, vol. 4, no. 2, pp. 95–99, Jun. 2018.
[16] L. van Efferen and A. M. Ali-Eldin, “A multi-layer perceptron approach for flow-based anomaly detection,” in Proc. Int. Symp. Netw., Comput. Commun. ISNCC, May 2017, pp. 1–6.
[17] Z. Chiba, N. Abghour, K. Moussaid, A. El Omri, and M. Rida, “A novel architecture combined with optimal parameters for back propagation neural networks applied to anomaly network intrusion detection,” Comput. Secur., vol. 75, pp. 36–58, Jun. 2018.
[18] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[19] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[20] I. Lopez-Moreno, J. Gonzalez-Dominguez, D. Martinez, O. Plchot, and P. J. Moreno, “On the use of deep feedforward neural networks for automatic language identification,” Comput. Speech Lang., vol. 40, pp. 46–59, Nov. 2016.
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1026–1034.
[22] S. Agatonovic-Kustrin and R. Beresford, “Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research,” J. Pharmaceutical Biomed. Anal., vol. 22, no. 5, pp. 717–727, 2000.
[23] H. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, Apr. 2005.
[24] S. S. Kannan and N. Ramaraj, “A novel hybrid feature selection via symmetrical uncertainty ranking based local memetic search algorithm,” Knowl.-Based Syst., vol. 23, no. 6, pp. 580–585, Aug. 2010.
[25] A. Taherkhani, G. Cosma, and T. M. McGinnity, “Deep-FS: A feature selection algorithm for deep boltzmann machines,” Neurocomputing, vol. 322, pp. 22–37, Dec. 2018.
[26] M. Labani, P. Moradi, M. Jalili, and X. Yu, “An evolutionary based multi-objective filter approach for feature selection,” in Proc. World Congr. Comput. Commun. Tech. (WCCCT), Feb. 2017, pp. 1510–154.
[27] P. S. Tang, X. L. Tang, Z. Y. Tao, and J. P. Li, “Research on feature selection algorithm based on mutual information and genetic algorithm,” in Proc. 11th Int. Comput. Conf. Wavelet Active Media Tech. Inf. Process. (ICCWAMTIP), Dec. 2014, pp. 403–406.
[28] C. Liu, W. Wang, Q. Zhao, X. Shen, and M. Konan, “A new feature selection method based on a validity index of feature subset,” Pattern Recognit. Lett., vol. 92, pp. 1–8, Jun. 2017.
[29] R. Chakraborty and N. R. Pal, “Feature selection using a neural framework with controlled redundancy,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 1, pp. 35–50, Jan. 2015.
[30] L. Vanneschi and M. Castelli, “Multilayer perceptrons,” Encyclopedia Bioinf. Comput. Biol., vol. 1, pp. 612–620, Jun. 2019.
[31] F. Murtagh, “Multilayer perceptrons for classification and regression,” Neurocomputing, vol. 2, nos. 5–6, pp. 183–197, Jul. 1991.
[32] J. George and S. G. Raj, “Leaf recognition using multi-layer perceptron,” in Proc. Int. Conf. Energy Commun. Data Analytics Soft Comput. (ICECDS), Aug. 2017, pp. 2216–2221.
[33] H. Amakdouf, M. E. Mallahi, A. Zouhri, A. Tahiri, and H. Qjidaa, “Classification and recognition of 3D image of charlier moments using a multilayer perceptron architecture,” Procedia Comput. Sci., vol. 127, pp. 226–235, Aug. 2018.
[34] A. Mondal, A. Ghosh, and S. Ghosh, “Scaled and oriented object tracking using ensemble of multilayer perceptrons,” Appl. Soft Comput., vol. 73, pp. 1081–1094, Dec. 2018.
[35] I. H. Witten, M. A. Hall, E. Frank, and C. J. Pal, “The WEKA workbench,” in Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Burlington, MA, USA: Appendix, 2017, pp. 553–571.
[36] T. Mehmod and H. B. M. Rais, “Ant colony optimization and feature selection for intrusion detection,” in Advances in Machine Learning and Signal Processing, vol. 387. New York, NY, USA: Springer, 2016, pp. 305–312.
[37] C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,” Comput. Secur., vol. 70, pp. 255–277, Sep. 2017.
[38] J. McCall, “Genetic algorithms for modelling and optimisation,” J. Comput. Appl. Math., vol. 184, no. 1, pp. 205–222, Dec. 2005.
[39] H. Wang, J. Gu, and S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowl.-Based Syst., vol. 136, pp. 130–139, Nov. 2017.
[40] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[41] V. L. Thing, “IEEE 802.11 network anomaly detection and attack classification: A deep learning approach,” in Proc. Wireless Commun. Netw. Conf. (WCNC), May 2017, pp. 1–6.
[42] S. Ding and G. Wang, “Research on intrusion detection technology based on deep learning,” in Proc. Int. Conf. Comput. Commun. (ICCC), Dec. 2017, pp. 1474–1478.
[43] M. Agarwal, D. Pasumarthi, S. Biswas, and S. Nandi, “Machine learning approach for detection of flooding DoS attacks in 802.11 networks and attacker localization,” Int. J. Mach. Learn. Cybern., vol. 7, no. 6, pp. 1035–1051, Dec. 2016.
[44] I. Ahmad, M. Basheri, M. J. Iqbal, and A. Raheem, “Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection,” IEEE Access, vol. 6, pp. 33789–33795, 2018.
[45] B. Trstenjak, S. Mikac, and D. Donko, “KNN with TF-IDF based framework for text categorization,” Procedia Eng., vol. 69, pp. 1356–1364, May 2014.
[46] S. Tan, “An effective refinement strategy for KNN text classifier,” Expert Syst. Appl., vol. 3, no. 2, pp. 290–298, 2006.
[47] M. O. Mughal and S. Kim, “Signal classification and jamming detection in wide-band radios using Naïve bayes classifier,” IEEE Commun. Lett., vol. 22, no. 7, pp. 1398–1401, Jul. 2018.
[48] W. Mo, C. L. Gutterman, Y. Li, S. Zhu, G. Zussman, and D. C. Kilper, “Deep-neural-network-based wavelength selection and switching in ROADM systems,” IEEE/OSA J. Opt. Commun. Netw., vol. 10, no. 10, pp. D1–D11, 2018.
[49] F. Farahnakian and J. Heikkonen, “A deep auto-encoder based approach for intrusion detection system,” in Proc. Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, pp. 178–183.
[50] R. Garreta and G. Moncecchi, Learning Scikit-Learn: Machine Learning in Python. Birmingham, U.K.: Packt Publishing Ltd, 2013.
[51] Z. Gao, Y. Xu, F. Meng, F. Qi, and Z. Lin, “Improved information gain-based feature selection for text categorization,” in Proc. Int. Conf. Wireless Commun. Veh. Technol. Inf. Theory Aerosp. Electron. Sys. (VITAE), Aug. 2014, pp. 1–5.
[52] C. E. Shannon, “A mathematical theory of communication,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, 2001.
[53] H. Zhou, Z. Deng, Y. Xia, and M. Fu, “A new sampling method in particle filter based on Pearson correlation coefficient,” Neurocomputing, vol. 216, pp. 208–215, May 2016.

SYDNEY MAMBWE KASONGO received the master's (M.Tech.) degree in computer systems from the Tshwane University of Technology, in 2017. He is currently pursuing the Ph.D. degree in electrical and electronic engineering with the University of Johannesburg. His current research interests include machine learning, deep learning, computer networks security, wireless networks, and data science.

YANXIA SUN received the joint D.Tech. degree in electrical engineering from the Tshwane University of Technology, South Africa, and the Ph.D. degree in computer science from University Paris-EST, France, in 2012. She is currently an Associate Professor or the Head of the Department of Electrical and Electronic Engineering Science, University of Johannesburg, South Africa. She has 15 years teaching and research experience. She has lectured five courses in the universities. She has supervised or co-supervised five postgraduate projects to completion. She is currently supervising four master's students and six Ph.D. students. She published 42 papers including 14 ISI master indexed journal papers. She is the Investigator or Co-Investigator for six research projects. She is the member of the South African Young Academy of Science. Her research interests include renewable energy, evolutionary optimization, neural networks, nonlinear dynamics, and control systems.