

Received March 6, 2019, accepted March 12, 2019, date of publication March 18, 2019, date of current version April 5, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2905633

A Deep Learning Method With Filter Based Feature Engineering for Wireless Intrusion Detection System
SYDNEY MAMBWE KASONGO AND YANXIA SUN
Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
Corresponding author: Sydney Mambwe Kasongo (sydneybleuops@gmail.com)
This work was supported in part by the South African National Research Foundation under Grant 112108 and Grant 112142, in part by the South African National Research Foundation Incentive Grant 95687, in part by the Eskom Tertiary Education Support Programme Grant, and in part by the Research Grant from URC of the University of Johannesburg.

ABSTRACT In recent years, the increased use of wireless networks for the transmission of large volumes of information has generated a myriad of security threats and privacy concerns; consequently, a number of preventive and protective measures have been developed, including intrusion detection systems (IDS). Intrusion detection mechanisms play a pivotal role in securing computer and network systems; however, for various IDS, performance remains a major issue. Moreover, the accuracy of existing IDS methodologies based on machine learning is heavily affected when the feature space grows. In this paper, we propose an IDS based on deep learning using feed forward deep neural networks (FFDNNs) coupled with a filter-based feature selection algorithm. The FFDNN-IDS is evaluated using the well-known NSL-knowledge discovery and data mining (NSL-KDD) dataset and it is compared to the following existing machine learning methods: support vector machines, decision tree, K-Nearest Neighbor, and Naïve Bayes. The experimental results show that the FFDNN-IDS achieves an increase in accuracy in comparison to the other methods.

INDEX TERMS Deep learning, feature extraction, intrusion detection, machine learning, wireless networks.

I. INTRODUCTION
Computer networks, and wireless networks in particular, are subject to a myriad of security threats and attacks. The security challenges that have to be solved originate from the open nature, the flexibility and the mobility of the wireless communication medium [1], [2]. In an effort to secure these networks, various preventive and protective mechanisms such as intrusion detection systems (IDS) were developed [3]. Primarily, IDS can be classified as host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) [4]. Furthermore, both HIDS and NIDS can be categorized into signature-based IDS, anomaly-based IDS and hybrid IDS [5], [6]. An anomaly-based IDS models the network under normal circumstances and flags any deviation as an intrusion. A signature-based IDS relies on a predefined database of known intrusions to pinpoint an intrusion; in this case, the database is updated manually by the system administrators.
In terms of performance, an IDS is considered effective or accurate in detecting intrusions when it concurrently achieves low false alarm rates and high classification accuracy [7]; therefore, decreasing the false alarm rate while increasing the detection accuracy should be one of the crucial tasks when designing an IDS. In this paper, the terms wireless intrusion detection system (WIDS) and intrusion detection system (IDS) will be used interchangeably.
In a bid to build efficient IDS, Machine Learning (ML) approaches are used to identify various types of attacks. ML is the scientific study of procedures, algorithms and statistical models used by computer systems to solve complex problems, and it is considered a subset of Artificial Intelligence (AI) [8]. Since intrusion detection is a classification problem, it can be modeled using ML techniques. It has been shown that developing IDS using ML methods can produce high levels of accuracy [5]; however, [9] showed that the most accurate and effective IDS has not yet been discovered and that each IDS solution presents its own advantages and handicaps under various conditions.
The associate editor coordinating the review of this manuscript and approving it for publication was Shagufta Henna.

2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019    38597
S. M. Kasongo, Y. Sun: Deep Learning Method With Filter-Based Feature Engineering

The most popular ML approaches to intrusion detection include K-Nearest Neighbors (KNN) [10], Decision Tree (DT) [11], Support Vector Machines (SVM) [12], Random Forest (RF) [13], Naive Bayes (NB) [14] and Multi-Layer Perceptrons (MLP) associated with Deep Learning (DL) methodologies [15]–[17]. An IDS generally processes large amounts of data, which can cause ML techniques such as those in [10]–[14] to perform poorly; it is therefore imperative to devise appropriate strategies and classification approaches to overcome this under-performance. This paper focuses on DL to try to improve on the shortcomings of existing systems.
DL was first proposed by Professor Hinton [18] and it is an advanced sub-field of ML that simplifies the modeling of various complex concepts and relationships using multiple levels of representation [19]. DL has achieved a great amount of success in fields such as language identification, image processing and pharmaceutical research [20]–[22]. This has prompted researchers to explore the application of DL theory to the intrusion detection classification problem.
The major characteristic that distinguishes DL from traditional ML methods is the improved performance of DL as the amount of data increases. DL algorithms are not well suited to problems involving small volumes of data because these algorithms require a considerable amount of data to learn efficiently [9]. Although DL can handle a high data throughput, the questions of improving accuracy and lowering the false-positive alarm rate remain because of the ever-growing size of the datasets used for IDS research. Moreover, as datasets grow in volume, the input space and the attack classification dimension also expand. Consequently, misclassifications are prevalent, which in turn increase the false positive alarm rate and negatively impact the overall system performance. Therefore, it is crucial to implement solutions that are capable of selecting only the features needed to perform an optimal classification.
Feature engineering (FE) has become a key topic in many ML research domains [23]–[26]. As part of FE, feature selection algorithms fall into the following categories: filter model, wrapper model and hybrid model. The filter model relies on the intrinsic nature of the data and is independent of the classifier used. The wrapper method evaluates the performance of the classification algorithm on a candidate feature subset, whereas the hybrid method is a combination of the wrapper and filter algorithms [27]. The methodology proposed in this paper focuses on a filter-based approach because the two latter techniques are computationally expensive [28].
The major contributions of this paper are outlined as follows:
• A Feature Extraction Unit (FEU) is introduced. By using filter-based algorithms, the FEU generates optimal subsets of features with minimum redundancy.
• We scrutinize the performance of the following existing classification algorithms applied to IDS without the FEU by using the NSL-KDD dataset: k-nearest neighbor (KNN), support vector machine (SVM), Decision Tree (DT), Random Forest (RF) and Naive Bayes (NB). Moreover, we study the performance of those algorithms coupled with the FEU.
• A feed-forward deep neural network (FFDNN) is introduced. We study its performance using the FEU and the NSL-KDD dataset. Compared with KNN, SVM, DT, RF and NB, the FEU-FFDNN proves to be well suited to intrusion detection systems. Furthermore, experimental results demonstrate that the depth and the number of neurons (nodes) used in an FFDNN classifier have a direct impact on its accuracy.
The rest of this paper is organized as follows: Section II provides a background on wireless networks. Section III gives an account of similar research with a focus on ML-based IDS as well as various methods for feature selection. Section IV provides a background on the traditional machine learning classifiers that are also explored in this work. Section V presents the architecture of the proposed method for wireless intrusion detection. Section VI details the experimental setup used in this research as well as the tools used to design, implement, evaluate and test the following classifiers: SVM, DT, RF, NB, KNN and FFDNN, and discusses the results. Section VII concludes the paper.
II. BACKGROUND: WIRELESS NETWORKS
In recent years, wireless networks have grown far more predominant than wired ones. Wireless communication is attractive because it does not require any additional wired infrastructure for the communication medium. Today, the most popular form of wireless network is the Wireless Local Area Network (WLAN). WLANs are based on the IEEE 802.11 family of standards and are extensively used as an effective alternative to wired communication in various areas such as industrial communication and in-building communication. A myriad of security mechanisms, including Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access (WPA, WPA2), have mainly been used to secure and protect WLANs; however, they have shown many flaws when it comes to threats such as Denial of Service (DoS) attacks, network discovery attacks and brute force attacks [2], [41], [43]. In order to reinforce WLAN security against those vulnerabilities, IDSs are generally implemented. In this research, we focus on an IDS for WLANs using a DL approach. Furthermore, since wired and wireless IDS research go hand in hand, this work reviews strategies used in both wired and wireless IDS research using ML and DL.
III. RELATED WORK
This section provides an account of previous studies on feature selection methods in general as well as intrusion detection systems using ML and DL techniques.
The research conducted in [19] presented a deep learning based intrusion detection system that made use of a non-symmetric deep auto-encoder (NDAE) for feature learning and a classification methodology using stacked NDAEs. An NDAE is an auto-encoder made of multiple non-symmetrical hidden layers, that is, a deep neural network whose hidden layers are non-symmetrical. The IDS scheme was evaluated using two datasets: the KDDCup 99 and the NSL-KDD. The multiclass classification experiment yielded an accuracy of 85.42% on the NSL-KDD dataset and an accuracy of 97.85% on the KDDCup 99 dataset.
In [26], the researchers gave an account of a multi-objective algorithm for feature selection labeled MOMI. This approach is centered on Mutual Information (MI) and considers feature redundancy and relevancy during the feature evaluation and selection process. The experiments carried out to evaluate MOMI's performance were conducted using the WEKA tool [35] with three separate datasets. Two classifiers, namely Naive Bayes (NB) and support vector machine (SVM), were used. The results of this research suggested that MOMI was able to select only the features needed for the best performance.
Chakraborty and Pal [29] presented a feature selection (FS) algorithm using a multilayer perceptron (MLP) framework with a controlled redundancy (CoR). This approach is labelled FSMLP-CoR. An MLP is a neural network with an input layer, multiple hidden layers and an output layer [30], and it is generally used for approximation, classification, regression, and prediction in many domains [31]–[34]. In this case, an MLP was used to identify and drop the features that are not relevant to the problem at hand. The FSMLP-CoR was tested using 23 datasets and the results led the researchers to conclude that it was effective in selecting important features.
In [36], an ant colony optimization (ACO) technique was applied for feature selection on the KDDCup 99 dataset for intrusion detection. The KDDCup 99 dataset has 41 features. ACO is inspired by the way ants use pheromones to remember their path, and it has different variations. In this research, the authors used the ant colony system (ACS) with a two-level pheromone update. The proposed solution was evaluated using the binary SVM classifier library in WEKA (LibSVM) [35]. The results revealed that a higher accuracy is obtained with an optimal feature subset of 14 inputs.
The research in [37] proposed a wrapper-based feature selection algorithm for intrusion detection using the genetic algorithm (GA) as a heuristic search method and Logistic Regression (LR) as the evaluating learning algorithm. The whole approach is labeled GA-LR. GA originates from the natural selection process and belongs to the category of evolutionary algorithms [38]. GA has the following building blocks: an initial population, a fitness function, genetic operators (variation, crossover and selection) and a stopping criterion. The experiments conducted to evaluate GA-LR were done using the KDD Cup 99 Dataset and the UNSW-NB15 Dataset. Decision Tree classifiers were applied to candidate feature subsets and the results suggested that GA-LR is an efficient method.
Wang et al. [39] took a different direction in terms of the feature engineering approach by using a feature augmentation (FA) algorithm rather than a feature reduction one. The classifier used in this research was the SVM and the FA algorithm used was the logarithm marginal density ratio transformation. The goal was to obtain newly improved features that would ultimately lead to a higher detection accuracy. The proposed scheme was evaluated using the NSL-KDD dataset, and the outcomes of the empirical experiments suggested that the FA coupled with the SVM yielded a robust and improved overall intrusion detection performance.
In [40], an intrusion detection system (IDS) was designed and modelled based on DL using Recurrent Neural Networks (RNNs). RNNs are neural networks whereby the hidden layers act as information storage units. The benchmark dataset used in this research was the NSL-KDD. The RNN-IDS was compared to the following commonly used classification methods: J48, Random Forest and SVM. The accuracy (AC) was mainly used as the performance indicator during the experiments and the results suggested that the RNN-IDS achieved an improved intrusion detection accuracy compared to traditional machine learning classification methods. These results reinforced the assumption that DL based intrusion detection systems are superior to classic ML algorithms. In the binary classification scheme, a model with 80 hidden nodes and a learning rate of 0.1 achieved an accuracy of 83.28%, whereas in the multiclass classification using 5 classes, a model with 80 hidden neurons and a learning rate of 0.5 achieved an accuracy of 81.29%.
The approach proposed in [41] applied deep learning to intrusion detection for IEEE 802.11 wireless networks using stacked auto-encoders (SAE). An SAE is a neural network created by stacking together multiple layers of sparse auto-encoders. The experiments undertaken in this research were made using the Aegean Wireless Intrusion Dataset (AWID), which comprises 155 attributes, the last attribute representing the class, which can take the following values: injection, flooding, impersonation and normal. According to Thing [41], this was the first work that proposed a deep learning approach applied to IEEE 802.11 networks for classification. The overall accuracy achieved in this work was 98.6688%.
Ding and Wang [42] investigated the use of DL for intrusion detection using the KDDCup 99 Dataset. The neural network model consisted of five dense (fully connected) feed-forward hidden layers of 10, 20, 20, 40 and 64 units. The activation function used in this research was the ReLU (Rectified Linear Unit) and the model was trained by back-propagation with the Adam optimizer (Ad-op). The Ad-op was used in a bid to increase the training speed and to prevent overfitting. Although this research yielded some advancements, it equally
showed no significant improvement in detecting rare attack types (U2R and R2L) present in the dataset.
In [43], an ML approach to detect flooding Denial of Service (DoS) attacks in IEEE 802.11 networks was proposed. The dataset used in this research was generated by the authors in a computer laboratory. The setup was made of 40 computers, of which seven were designated as attackers to launch the flooding DoS, and each legitimate node was connected to one of the five available Access Points (APs). The obtained dataset was split into two portions: 66% for ML training and 34% for ML testing. Using the WEKA tool [35], six ML classification algorithms were applied consecutively, namely: SVM, Naive Bayes, Naive Bayes Net, Ripple-Down Rule Learner (RIDOR), Alternating Decision Tree and Adaptive Boosting (AdaBoost). The empirical results, based on accuracy and recall, suggested that AdaBoost was more efficient than the other algorithms.
In [44], the performance of SVM, Extreme Learning Machine (ELM) and Random Forest (RF) for intrusion detection was compared using the NSL-KDD as the benchmark dataset. Each of the ML algorithms used in this investigation was evaluated using the following performance metrics: Accuracy, Precision and Recall. The outcome of the experiments showed that ELM outperformed RF and SVM; consequently, the authors concluded that ELM is a viable option when designing and implementing intrusion detection systems.
IV. BACKGROUND ON TRADITIONAL MACHINE LEARNING CLASSIFIERS
A. SUPPORT VECTOR MACHINE
Support Vector Machines (SVM) is one of the most popular ML techniques applied to Big Data and used in ML research. SVM is a supervised machine learning method that is used to classify different categories of data, and it can handle both linear and non-linear problems (the latter through kernel functions). SVM works by generating a hyperplane or several hyperplanes within a high-dimensional space to separate the data, and the hyperplanes that optimally split the data per class type are selected as the best [44].
B. K-NEAREST NEIGHBOR
K-Nearest Neighbor (KNN) is another ML method used to classify data. The KNN algorithm relies on the standard Euclidean distance between instances in a space, which can be defined as follows [45]: let $x$ and $y$ be instances in a space $P$; the distance between $x$ and $y$, $d(x, y)$, is given by the following expression:
$$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} \tag{1}$$
where $n$ represents the total number of features (dimensions) of an instance. The KNN method classifies an instance $x_0$ by calculating the Euclidean distance between $x_0$ and the samples in the training set; $x_0$ then takes the majority label of its $k$ closest (most similar) neighbors [46].
C. NAIVE BAYES
Naive Bayes (NB) classifiers are simple classification algorithms based on Bayes' Theorem [47]. Given a dataset, an NB classifier makes the "naive" assumption that the features are independent of each other given the class. Let $X$ be an instance with $n$ features to be classified, represented by the vector $X = (x_1, \dots, x_n)$. In order to determine the class $C_k$ for $X$, NB computes:
$$p(C_k \mid X) = \frac{p(X \mid C_k)\, p(C_k)}{p(X)} \tag{2}$$
Since $p(X)$ is the same for every class, and the independence assumption gives $p(X \mid C_k) = \prod_{i=1}^{n} p(x_i \mid C_k)$, the class for $X$ is assigned using the following expression:
$$y = \underset{k \in \{1, \dots, K\}}{\arg\max}\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \tag{3}$$
where $y$ is the predicted label.
D. DECISION TREE AND RANDOM FOREST
The Decision Tree (DT) algorithm is widely used in data mining and ML. Given a dataset with labeled instances (training), the DT algorithm generates a predictive model in the shape of a tree capable of predicting the class of unknown records [14]. A DT has three main components: a root node, internal nodes and category (leaf) nodes. The classification process happens in a top-down manner, and a decision is reached when a leaf node is found; the category of that leaf is the predicted class. A Random Forest classifier, on the other hand, applies multiple DTs to a given dataset and combines their outputs for classification.
V. PROPOSED METHOD FOR WIRELESS INTRUSION DETECTION
A. FEED FORWARD DEEP NEURAL NETWORKS
Deep neural networks (DNNs) are widely used in ML and DL to solve complex problems. The most basic element of a DNN is an artificial neuron (AN), which is inspired by biological neurons within the human brain. An AN computes and forwards the weighted sum of the information received at its input side. Due to the non-linearity of real-life problems, and in a bid to enhance learnability and approximation, each AN applies an activation function before generating an output [48]. This activation function can be a Sigmoid, $\sigma(t) = \frac{1}{1 + e^{-t}}$; a Rectified Linear Unit (ReLU), $f(y) = \max(0, y)$; or a hyperbolic tangent, shown in expression (4):
$$\tanh(y) = \frac{1 - e^{-2y}}{1 + e^{-2y}} \tag{4}$$
The above-mentioned activation functions have advantages and drawbacks; moreover, their optimal performance is problem specific. Traditionally, artificial neural networks (ANNs) have an input layer, one to three hidden layers and an output layer, as shown in Fig. 1, whereas DNNs may contain from three to tens or hundreds of hidden layers [49]. There is no general rule for determining whether an ANN is deep or not. For the
sake of our research, we will consider a DNN to be a neural network with two or more hidden layers. In a Feed Forward DNN, the flow of information goes in one direction only: from the input layer via the hidden layers to the output layer. Neurons within the same layer do not communicate. Each AN in the current layer is fully connected to all neurons in the next layer, as depicted in Fig. 1.
FIGURE 1. Feed forward neural network architecture.
FIGURE 2. Proposed FFDNN architecture.
The block diagram in Fig. 2 presents the architecture of the proposed Feed Forward Deep Neural Network (FFDNN) IDS. In this architecture, the first step consists of the separation of the raw data. It is crucial to split the main training set into two sets: the reduced training set and the evaluation set. The evaluation dataset is used to validate the training process. The test set has a different data distribution from the training and evaluation (validation) sets. The second step involves a feature transformation process and a two-step normalization process of the raw data, as well as a feature extraction (selection) procedure based on Information Gain. It is important to transform and normalize the data because the features within a dataset come in different formats, which can be numerical or nonnumerical. The last steps of the architecture are the training and testing of the models using the FFDNN and the FEU-FFDNN. Since the training and the validation data originate from the same data distribution, it is important to ensure that during the training process the selected model does not train on the validation data: validating a model on data it has already seen gives a misleading estimate of its performance and may cause the final model to perform poorly. The next sections explain in detail the role of each of the components in Fig. 2.
B. DATASET
In the proposed research, the NSL-Knowledge Discovery and Data mining (NSL-KDD) dataset, which is an improved version of the KDDCup 99 [19], is used to train, evaluate and test the designed system shown in Fig. 2. The NSL-KDD is considered a benchmark dataset in IDS research and it is used for both wired and wireless systems [19], [39], [40], [44]. The NSL-KDD comprises one class label categorized into the following major groups: Normal, Probe, Denial of Service (DoS), User to Root (U2R) and Remote to Local (R2L). Furthermore, the NSL-KDD is made of 41 features, of which three are nonnumeric and 38 are numeric, as depicted in Table 1.
The NSL-KDD comes with two sets of data: the training set (KDDTrain+ full) and the test sets (KDDTest+ full and KDDTest-21). In this research, we use the KDDTrain+ and the KDDTest+. KDDTrain+ is divided into two partitions: the KDDTrain+75, which is 75% of the KDDTrain+ and is used for training, and the KDDEvaluation, which is 25% of the KDDTrain+ and is used for evaluation after the training process. Table 2 provides a breakdown of the components in each dataset.
TABLE 1. NSL-KDD Features List.
TABLE 2. Datasets breakdown.
C. FEATURE ENGINEERING
In a dataset, features may take different forms such as numeric and nonnumeric. DNN models can only process numeric
values; therefore it is crucial to transform all nonnumeric or Algorithm 1 Normalization Algorithm


symbolic features into a numerical counterpart. Within the Input: F(f1 , . . . , fn ), 1 < n < T
NSL-KDD, f 2 ‘protocol_type’, f 3 ‘service’ and f 4 ‘flag’ Output: Ftransformed (f1t , . . . , fnt ):
are symbolic features. We apply a mapping process in Scikit for i from 1 to n do
Learn [50] whereby all symbols are mapped to a unique if (fi a symbolic feature) then
numerical value. Moreover, it is important to transform and apply sckikit learn mapping
normalize features as they may have an uneven distribution. Step 1 normalize using log(fi + 1)
fi −min(fi )
For instance, taking a look at f 5 which represents ‘src_bytes’ Step 2 normalize using (b − a) max(f i )−min(fi )
in Table 1: f 5 has values such as 12983 and values like end if
20; consequently, normalization is required to keep values Step 1 normalize using log(fi + 1)
fi −min(fi )
within the same range for optimal processing. In this research, Step 2 normalize using (b − a) max(f i )−min(fi )
we apply a two-step normalization procedure. We first apply a end for
logarithmic normalization shown in expression (5) to all the
features so that we keep them within acceptable range and Algorithm 2 Features IG Ranking Algorithm
secondly, we linearly cap the values to be in this range [0, 5] Input: Ftransformed (f1t , . . . , fnt )
using equation in (6). Output: Franked
xnormalized = log(xi + 1) (5) for i from 1 to n do
xi − min(xi ) compute IG: IGi (fi |C) = H (fi ) − H (fi |C)
xnormalized = (b − a) (6) if (IGi >= IGtreshold ) then
max(xi ) − min(xi )
load IGi into Franked
where b = 5 and a = 0. end if
After the two step normalization process, the Feature end for
Extraction Unit (FEU) has the role to rank the features using
an algorithm based on Information Gain (IG) [51] which orig-
inates from Information Theory [52]. We will compute the IG E. ALGORITHM FOR TRAINING FFDNNs
of each feature with respect to the class. Unlike the standard Training feed forward neural networks consists of the follow-
correlation algorithms such as Pearson Linear Correlation ing three major steps:
Coefficient [53] that is only capable of establishing linear 1) Forward propagation.
correlations between features, IG is capable of discovering 2) Back-propagation of the computed error.
nonlinear connection as well. In information theory, the mea- 3) Updating the weights and biases.
sure of uncertainty of a variable X is called entropy, H (X ),
The algorithm used to train the FFDNNs is explained
and it is calculated as follow:
in Algorithm 3. Given a set of m training sample
{(x1 , y1 ), . . . , (xm , ym )} and η the learning rate. We train
X
H (X ) = − P(x)log2 (x) (7)
x∈X FFDNNs presented in this research using the back propaga-
And the conditional entropy of two random variables X and tion algorithm backed by a stochastic gradient descent (SDG)
Y is determined using the following expression: for the weights and biases update. Additionally, the cost
X X function used to calculate the difference between the target
H (X |Y ) = − P(x) P(x|y)log2 (P(x|y)) (8) and the obtained output is shown in this expression:
x∈X y∈Y 1
C(W , b; x, y) = ky−outputk2 (10)
where P is the probability. IG is derived from the expressions 2
in (7) and (8) as follow:
VI. EXPERIMENTAL SETTING, RESULTS AND
IG(X |Y ) = H (X ) − H (X |Y ) (9)
DISCUSSIONS
Therefore, a given feature Y possesses a stronger correla- For the purpose of this research, we have used a Python based
tion to feature X than feature V if IG(X |Y ) > IG(V |Y ). library, Scikit-Learn [50] which is widely used in machine
learning and deep learning research. Our simulations were
D. ALGORITHMS FOR FEATURE ENGINEERING executed on an ASUS laptop with the following specifica-
Given a feature vector F(f1 , . . . , fn ) with 1 < n < T , where tions: Intel Core i-3-3217U CPU @1.80GHz and 4.00G of
T is the total number of features and C the class label in RAM. The metrics used to evaluate the performance of the
the dataset, the Transform Features module in Fig 2. applies FFDNNs presented in this research are the accuracy in (11),
Algorithm 1 as follow: the precision in (12) and the recall in (13). These indicators
After the execution of Algorithm 1, we obtained a trans- are derived from the confusion matrix shown in Table 3 and
formed feature vector, Ftransformed (f1t , . . . , fnt ), that is fed into they are defined as follow:
Algorithm 2 to generate a vector, Franked , with features that • True positive (TP): Intrusions that are successfully
are ranked by IG with respect to C. detected by the proposed IDS.

38602 VOLUME 7, 2019


S. M. Kasongo, Y. Sun: Deep Learning Method With Filter-Based Feature Engineering

Algorithm 3 Forward and Back-Propagation Algorithm
Input: $W$, $b$
Output: updated $W$, $b$
1: Forward propagate $x_i$ through layers $l = L_2, L_3, \ldots, L_{n_l}$ ($n_l$ is the subscript of the last layer) using $z^{l+1} = W^l a^l + b^l$ and $a^{l+1} = f(z^{l+1})$, where $f$ is a rectified linear unit (ReLU) of the form $f(z) = \max(0, z)$.
2: Compute the error term $\xi$ for each output unit $i$ as follows:
$$\xi_i^{n_l} = \frac{\partial}{\partial z_i^{n_l}} \frac{1}{2} \lVert y - \text{output} \rVert^2 = -\left(y_i - a_i^{n_l}\right) f'\left(z_i^{n_l}\right)$$
3: For each hidden layer $l = n_l - 1, n_l - 2, \ldots, 2$, compute the following for each node $i$ in layer $l$, where $s_{l+1}$ is the number of nodes in layer $l+1$:
$$\xi_i^l = \left( \sum_{j=1}^{s_{l+1}} W_{ji}^l \, \xi_j^{l+1} \right) f'\left(z_i^l\right)$$
4: Calculate the required partial derivatives with respect to the weights and biases for each training example as follows:
$$\frac{\partial}{\partial W_{ij}^l} C(W, b; x, y) = a_j^l \, \xi_i^{l+1}, \qquad \frac{\partial}{\partial b_i^l} C(W, b; x, y) = \xi_i^{l+1}$$
5: Update the weights and biases as follows, where $\eta$ is the learning rate:
$$W_{ij}^l = W_{ij}^l - \eta \, a_j^l \, \xi_i^{l+1}, \qquad b_i^l = b_i^l - \eta \, \xi_i^{l+1}$$

TABLE 3. Confusion matrix.

• False positive (FP): Normal / non-intrusive behaviour that is wrongly classified as intrusive by the IDS.
• True negative (TN): Normal / non-intrusive behaviour that is correctly labelled as normal / non-intrusive by the IDS.
• False negative (FN): Intrusions that are missed by the IDS and classified as normal / non-intrusive.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

The experiments were carried out in multiple phases using the NSL-KDD dataset described in Section V. The NSL-KDD dataset has the following classes: Normal, DoS, Probe, U2R and R2L. For binary classification, we map the DoS, Probe, U2R and R2L classes to one class called “attack”; for multiclass classification, we use the dataset with its original five major classes. In this research, the following rule applies: a classifier performs better than another one when it yields a higher accuracy on previously unseen data, namely the KDDTest+ set.

A. PHASE 1: BINARY CLASSIFICATION WITH 41 FEATURES
This phase uses all 41 features for binary classification. We apply only Algorithm 1 to transform the inputs. In order to select the best FFDNN, we ran models with 41 units at the input layer, two nodes at the output layer and the following numbers of hidden nodes: 30, 40, 60, 80 and 150. These numbers were selected by trial and error. We also varied the number of hidden layers as well as the learning rate. The details are presented in Table 4. For performance analysis and comparison, we also performed classification using the following classifiers: SVM, KNN, RF, DT and NB. The results suggest that, for binary classification, a model with a learning rate of 0.05 and 30 neurons spread over 3 hidden layers achieved an accuracy of 99.69% on the KDDEvaluation set and 86.76% on KDDTest+. Fig. 3 compares this model with the other classification algorithms. The Random Forest classifier, with an accuracy of 85.18% on KDDTest+, came second after the FFDNN model, and the SVM classifier produced an accuracy of 84.41% on the same test set.

TABLE 4. Accuracy during training of FFDNN - binary classification.

B. PHASE 2: MULTICLASS CLASSIFICATION WITH 41 FEATURES
We conducted multiclass classification in this phase by using the five classes of the NSL-KDD dataset with all 41 features. As described in Table 5, the FFDNN model with 60 nodes spread through three hidden layers and a learning rate of 0.05 achieved an accuracy of 86.62%, which is better than that of the other FFDNN model settings. In order to put this experiment in perspective, we also


FIGURE 3. Binary classification accuracy comparison.
FIGURE 4. Multiclass classification accuracy comparison.

TABLE 5. Accuracy during training of FFDNN - multiclass classification.


TABLE 6. Ranked Features.

conducted a multiclass classification using SVM, KNN, RF, DT and NB classifiers. As depicted in Fig. 4, the comparison shows that the FFDNN outperformed all other classifiers on the test data; the RF classifier nevertheless performed relatively well, with an accuracy of 86.35% on the test data, and the SVM model achieved an accuracy of 83.83%.

C. PHASE 3: FEATURE EXTRACTION
We applied Algorithm 1 and Algorithm 2 in the FEU to the full KDDTrain+ dataset in order to extract a reduced vector of features. The goal of this step was to select the features with sufficient information gain (IG) with respect to the class. The filtering process generated the 21 features listed in Table 6.

In the next two phases, we repeat the experiments of Phase 1 and Phase 2; however, in these instances, a reduced feature vector $F_{\text{ranked}}$ of 21 ranked features is used.

D. PHASE 4: BINARY CLASSIFICATION WITH A REDUCED FEATURE VECTOR
Table 7 shows multiple FEU-FFDNN models that all have 21 inputs and two outputs. The best-performing model, with 30 hidden nodes, 3 hidden layers and a learning rate of 0.05, achieved an accuracy of 99.37% on the evaluation set and 87.74% on the test data. This is an improvement over the best model using 41 inputs in Phase 1 (86.76% on KDDTest+). Additionally, Fig. 5 compares the accuracy of the SVM, KNN, RF, DT, NB and FEU-FFDNN classifiers; the FEU-FFDNN outperformed the other methods.
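The IG-based filtering of Phase 3 can be illustrated with a short Python sketch. It is not the authors' Algorithm 2: it assumes that every feature has already been discretized (continuous NSL-KDD attributes would need binning first, which is not shown), measures entropy in bits, and keeps the top k features as a stand-in for the selection criterion, which this section does not specify. The names `entropy`, `information_gain` and `rank_features`, and the toy feature values, are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(C), in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(C, F) = H(C) - H(C | F) for one discrete feature F."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Class labels of the samples where the feature takes this value.
        subset = [c for f, c in zip(feature_values, labels) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels, k=None):
    """Return (name, IG) pairs sorted by decreasing IG, optionally keeping the top k.

    columns: dict mapping a (hypothetical) feature name to its list of discrete values.
    """
    scores = [(name, information_gain(values, labels)) for name, values in columns.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores if k is None else scores[:k]


if __name__ == "__main__":
    # Toy data: "flag" separates the classes perfectly, "proto" carries no information.
    labels = ["normal", "normal", "attack", "attack"]
    columns = {
        "proto": ["tcp", "udp", "tcp", "udp"],
        "flag": ["SF", "SF", "S0", "S0"],
    }
    for name, ig in rank_features(columns, labels):
        print(f"{name}: {ig:.3f}")
```

On the toy data, `flag` receives an IG of 1 bit and `proto` receives 0, so the sketch prints `flag: 1.000` followed by `proto: 0.000`. Calling `rank_features(columns, labels, k=21)` on real, discretized data would mirror the size of the reduced vector used in Phases 4 and 5.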


TABLE 7. Accuracy during training of FEU-FFDNN - binary classification.
TABLE 8. Accuracy during training of FEU-FFDNN - multiclass classification.
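Algorithm 3 can likewise be sketched in NumPy for a single training example. The sketch follows the algorithm's equations as written, with ReLU in every layer (including the output) and the squared-error cost $\frac{1}{2}\lVert y - a^{n_l}\rVert^2$; the layer sizes, the initialization and the function names are illustrative assumptions, not the authors' implementation. Lists are indexed from 0, so `weights[l]` plays the role of $W^l$, mapping the activations of one layer to the pre-activations of the next.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def relu_grad(z):
    # Derivative of ReLU, taking f'(0) = 0.
    return (z > 0).astype(float)


def forward(weights, biases, x):
    """Step 1: z^{l+1} = W^l a^l + b^l and a^{l+1} = f(z^{l+1})."""
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(relu(zs[-1]))
    return activations, zs


def cost(weights, biases, x, y):
    """C(W, b; x, y) = 1/2 * ||y - output||^2."""
    output = forward(weights, biases, x)[0][-1]
    return 0.5 * np.sum((y - output) ** 2)


def backprop_step(weights, biases, x, y, eta=0.05):
    """Steps 2-5 of Algorithm 3 for one training example (x, y)."""
    activations, zs = forward(weights, biases, x)
    # Step 2: output error term xi^{n_l} = -(y - a^{n_l}) * f'(z^{n_l}).
    xi = -(y - activations[-1]) * relu_grad(zs[-1])
    grads_W, grads_b = [None] * len(weights), [None] * len(biases)
    for l in reversed(range(len(weights))):
        # Step 4: dC/dW = xi^{l+1} (a^l)^T and dC/db = xi^{l+1}.
        grads_W[l] = np.outer(xi, activations[l])
        grads_b[l] = xi
        if l > 0:
            # Step 3: propagate the error term back through W and f'.
            xi = (weights[l].T @ xi) * relu_grad(zs[l - 1])
    # Step 5: gradient-descent update with learning rate eta.
    new_weights = [W - eta * gW for W, gW in zip(weights, grads_W)]
    new_biases = [b - eta * gb for b, gb in zip(biases, grads_b)]
    return new_weights, new_biases, grads_W, grads_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [4, 5, 3, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
    weights = [rng.normal(scale=0.5, size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.full(n_out, 0.1) for n_out in sizes[1:]]
    x, y = rng.normal(size=4), np.array([1.0, 0.0])
    new_w, new_b, _, _ = backprop_step(weights, biases, x, y)
    print(cost(weights, biases, x, y), cost(new_w, new_b, x, y))
```

Comparing the two printed costs shows the effect of one update. A finite-difference check of `grads_W` against `cost` is a simple way to confirm that steps 2 to 4 are consistent with each other.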

FIGURE 5. Binary classification accuracy comparison with reduced feature set.
FIGURE 6. Multiclass classification accuracy comparison with reduced feature set.
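Equations (11) to (13), together with the binary mapping of the DoS, Probe, U2R and R2L classes to “attack”, can be computed as in the following sketch. It assumes the labels have already been grouped into the five NSL-KDD category names used in this paper, with “attack” as the positive class for binary classification; the helper names are hypothetical, and returning 0 when a denominator is empty is a convention chosen here.

```python
# The four NSL-KDD attack categories that are merged for binary classification.
ATTACK_CLASSES = {"DoS", "Probe", "U2R", "R2L"}


def to_binary(label):
    """Map DoS, Probe, U2R and R2L to a single 'attack' class; keep other labels."""
    return "attack" if label in ATTACK_CLASSES else label


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the intrusion class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif p == positive:  # true label is not positive
            fp += 1
        else:  # positive sample predicted as something else
            fn += 1
    return tp, tn, fp, fn


def metrics(y_true, y_pred, positive="attack"):
    """Accuracy (11), precision (12) and recall (13)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    y_true = [to_binary(c) for c in ["normal", "DoS", "Probe", "normal", "R2L"]]
    y_pred = ["normal", "attack", "normal", "attack", "attack"]
    print(metrics(y_true, y_pred))  # → (0.6, 0.6666666666666666, 0.6666666666666666)
```

For the five-class models, calling `metrics(y_true, y_pred, positive="R2L")` on the unmapped labels gives one-vs-rest precision and recall for R2L; the accuracy returned in that case is also one-vs-rest and differs from the overall multiclass accuracy reported in Phases 2 and 5.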

E. PHASE 5: MULTICLASS CLASSIFICATION WITH A REDUCED FEATURE VECTOR
In this stage of the experiments, we ran several FEU-FFDNN models using all the class groups present in the NSL-KDD dataset. The best-performing model has 150 neurons and a learning rate of 0.05; it yielded an accuracy of 99.54% on the validation data and 86.19% on the test data. In comparison with the results of Phase 2, this model needed more neurons (150 rather than 60) once the feature vector dimension was reduced by the filtering process. Additionally, Fig. 6 compares this model with existing ML models, and the results show that the FEU-FFDNN-based model outperformed all of them.

Moreover, for the best-performing model (150 neurons, three hidden layers, learning rate = 0.02), we plotted the precision-recall curve over the test dataset, as seen in Fig. 7, where class 0 = ‘normal’, class 1 = ‘R2L’, class 2 = ‘U2R’, class 3 = ‘Probe’ and class 4 = ‘DoS’. This curve gives more detail on how our model performed for the different classes.

F. DISCUSSIONS
Our research explores in detail the application of FFDNNs to wireless intrusion detection using the NSL-KDD dataset. Experiments were carried out for both binary and multiclass classification. In the first two phases of the experimental process, the models were trained and tested using the entire feature vector. The results suggest that, in both phases, FFDNNs outperformed the other ML models. For binary classification, FFDNNs required fewer neurons than for multiclass classification. In Phase 1, only 30 nodes spread


over three hidden layers were needed for the generalization of our model, whereas 60 neurons in three hidden layers were necessary for better approximation in the multiclass problem. Although the depth (number of hidden layers) was unchanged, this suggests that the more attack classes there are, the more neurons are needed to handle the complexity of the intrusion detection classification problem.

In Phase 3, a feature transformation and extraction procedure based on IG was executed, and a feature vector with 21 ranked features was generated.

In Phase 4 and Phase 5, the experiments carried out in Phase 1 and Phase 2, respectively, were repeated using the reduced-dimension feature vector obtained in Phase 3. The results achieved in Phase 4 showed that, with the same number of neurons and the same learning rate, the accuracy of the FEU-FFDNN model on KDDTest+ increased from 86.76% to 87.74%. For multiclass classification using the FEU in Phase 5, we obtained an overall accuracy of 86.19% with a depth of three hidden layers and 150 neurons. Here as well, the FEU-FFDNN outperformed the other methods, as shown in Fig. 6. Moreover, we studied the per-class performance of the classifier in Phase 5 by plotting the precision-recall curve depicted in Fig. 7. Based on the areas under the curves, class 1 (R2L) and class 2 (U2R) were the classes with the most misclassified instances, because they appear infrequently in both the training and test datasets.

Additionally, compared with other deep learning methodologies that use 41 features for multiclass classification, such as the stacked non-symmetric deep auto-encoders (S-NDAE) used in [19], which reached 85.42%, and the recurrent neural networks (RNN) used in [40], which achieved an overall accuracy of 81.29%, the FFDNN in our research produced an accuracy of 86.62% on the test set, which is superior to both the S-NDAE and RNN models.

FIGURE 7. Precision-Recall curve.

VII. CONCLUSION
This paper presented the design, implementation and testing of a DL-based intrusion detection system using FFDNNs. A literature review of ML and DL methods was conducted, and it was found that the most efficient approach to intrusion detection has yet to be found. The FFDNN models used in this research were coupled to a FEU using IG in a bid to reduce the input dimension while increasing the accuracy of the classifier. The dataset used in this work is NSL-KDD. For both the binary and the multiclass classification problems, the FFDNN models, with both the full and the FEU-reduced feature space, achieved a performance superior to that of SVM, RF, NB, DT and KNN. In future work, we aim to find a strategy to increase the detection rates of R2L and U2R attacks in the NSL-KDD dataset. Moreover, we will apply the FEU and the FFDNNs to the AWID dataset in order to further investigate the superiority of DL-based methods for IDS over other ML approaches.

VIII. ACKNOWLEDGMENT
This research is partially supported by the South African National Research Foundation (Nos. 112108, 112142); the South African National Research Foundation Incentive Grant (No. 95687); the Eskom Tertiary Education Support Programme Grant; and a research grant from the URC of the University of Johannesburg.

REFERENCES
[1] M. E. Aminanto, R. Choi, H. C. Tanuwidjaja, P. D. Yoo, and K. Kim, “Deep abstraction and weighted feature selection for Wi-Fi impersonation detection,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 3, pp. 621–636, Mar. 2018.
[2] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, “Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 184–208, 1st Quart., 2016.
[3] R. Mitchell and I.-R. Chen, “A survey of intrusion detection in wireless network applications,” Comput. Commun., vol. 42, no. 3, pp. 1–23, Apr. 2014. doi: 10.1016/j.comcom.2014.01.012.
[4] J. Hu, X. Yu, D. Qiu, and H. H. Chen, “A simple and efficient hidden Markov model scheme for host-based anomaly intrusion detection,” IEEE Netw., vol. 23, no. 1, pp. 42–47, Jan. 2009.
[5] D. A. Effendy, K. Kusrini, and S. Sudarmawan, “Classification of intrusion detection system (IDS) based on computer network,” in Proc. Int. Conf. Inf. Tech., Inf. Sys. Elec. Eng., Nov. 2017, pp. 90–94.
[6] E. Viegas, A. O. Santin, A. França, R. Jasinski, V. A. Pedroni, and L. S. Oliveira, “Towards an energy-efficient anomaly-based intrusion detection engine for embedded systems,” IEEE Trans. Comput., vol. 66, no. 1, pp. 163–177, Jan. 2017.
[7] S. M. H. Bamakan, B. Amiri, M. Mirzabagheri, and Y. Shi, “A new intrusion detection approach using PSO based multiple criteria linear programming,” Procedia Comput. Sci., vol. 55, pp. 231–237, Aug. 2015.
[8] P. Louridas and C. Ebert, “Machine learning,” IEEE Softw., vol. 33, no. 5, pp. 110–115, May 2016.
[9] Y. Xin et al., “Machine learning and deep learning methods for cybersecurity,” IEEE Access, vol. 6, pp. 35365–35381, 2018.
[10] Y. Y. Aung and M. M. Min, “Hybrid intrusion detection system using K-means and K-nearest neighbors algorithms,” in Proc. IEEE/ACIS 17th Int. Conf. Comput. Inf. Sc., Jun. 2018, pp. 34–38.
[11] P. Arumugam and P. Jose, “Efficient decision tree based data selection and support vector machine classification,” Mater. Today Proc., vol. 5, no. 1, pp. 1679–1685, 2018.
[12] A. Dastanpour, S. Ibrahim, R. Mashinchi, and A. Selamat, “Comparison of genetic algorithm optimization on artificial neural network and support vector machine in intrusion detection system,” in Proc. IEEE Conf. Open Syst. (ICOS), Oct. 2014, pp. 72–77.
[13] N. Farnaaz and M. A. Jabbar, “Random forest modeling for network intrusion detection system,” Procedia Comput. Sci., vol. 89, pp. 213–217, May 2016.
[14] F. Tian, X. Cheng, G. Meng, and Y. Xu, “Research on flight phase division based on decision tree classifier,” in Proc. Int. Conf. Comput. Intell. Appl. (ICCIA), Sep. 2017, pp. 372–375.


[15] A. Shenfield, D. Day, and A. Ayesh, “Intelligent intrusion detection systems using artificial neural networks,” ICT Express, vol. 4, no. 2, pp. 95–99, Jun. 2018.
[16] L. van Efferen and A. M. Ali-Eldin, “A multi-layer perceptron approach for flow-based anomaly detection,” in Proc. Int. Symp. Netw., Comput. Commun. (ISNCC), May 2017, pp. 1–6.
[17] Z. Chiba, N. Abghour, K. Moussaid, A. El Omri, and M. Rida, “A novel architecture combined with optimal parameters for back propagation neural networks applied to anomaly network intrusion detection,” Comput. Secur., vol. 75, pp. 36–58, Jun. 2018.
[18] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[19] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[20] I. Lopez-Moreno, J. Gonzalez-Dominguez, D. Martinez, O. Plchot, and P. J. Moreno, “On the use of deep feedforward neural networks for automatic language identification,” Comput. Speech Lang., vol. 40, pp. 46–59, Nov. 2016.
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1026–1034.
[22] S. Agatonovic-Kustrin and R. Beresford, “Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research,” J. Pharmaceutical Biomed. Anal., vol. 22, no. 5, pp. 717–727, 2000.
[23] H. Liu and L. Yu, “Toward integrating feature selection algorithms for classification and clustering,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, Apr. 2005.
[24] S. S. Kannan and N. Ramaraj, “A novel hybrid feature selection via symmetrical uncertainty ranking based local memetic search algorithm,” Knowl.-Based Syst., vol. 23, no. 6, pp. 580–585, Aug. 2010.
[25] A. Taherkhani, G. Cosma, and T. M. McGinnity, “Deep-FS: A feature selection algorithm for deep Boltzmann machines,” Neurocomputing, vol. 322, pp. 22–37, Dec. 2018.
[26] M. Labani, P. Moradi, M. Jalili, and X. Yu, “An evolutionary based multi-objective filter approach for feature selection,” in Proc. World Congr. Comput. Commun. Tech. (WCCCT), Feb. 2017, pp. 1510–154.
[27] P. S. Tang, X. L. Tang, Z. Y. Tao, and J. P. Li, “Research on feature selection algorithm based on mutual information and genetic algorithm,” in Proc. 11th Int. Comput. Conf. Wavelet Active Media Tech. Inf. Process. (ICCWAMTIP), Dec. 2014, pp. 403–406.
[28] C. Liu, W. Wang, Q. Zhao, X. Shen, and M. Konan, “A new feature selection method based on a validity index of feature subset,” Pattern Recognit. Lett., vol. 92, pp. 1–8, Jun. 2017.
[29] R. Chakraborty and N. R. Pal, “Feature selection using a neural framework with controlled redundancy,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 1, pp. 35–50, Jan. 2015.
[30] L. Vanneschi and M. Castelli, “Multilayer perceptrons,” Encyclopedia Bioinf. Comput. Biol., vol. 1, pp. 612–620, Jun. 2019.
[31] F. Murtagh, “Multilayer perceptrons for classification and regression,” Neurocomputing, vol. 2, nos. 5–6, pp. 183–197, Jul. 1991.
[32] J. George and S. G. Raj, “Leaf recognition using multi-layer perceptron,” in Proc. Int. Conf. Energy Commun. Data Analytics Soft Comput. (ICECDS), Aug. 2017, pp. 2216–2221.
[33] H. Amakdouf, M. E. Mallahi, A. Zouhri, A. Tahiri, and H. Qjidaa, “Classification and recognition of 3D image of charlier moments using a multilayer perceptron architecture,” Procedia Comput. Sci., vol. 127, pp. 226–235, Aug. 2018.
[34] A. Mondal, A. Ghosh, and S. Ghosh, “Scaled and oriented object tracking using ensemble of multilayer perceptrons,” Appl. Soft Comput., vol. 73, pp. 1081–1094, Dec. 2018.
[35] I. H. Witten, M. A. Hall, E. Frank, and C. J. Pal, “The WEKA workbench,” in Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Burlington, MA, USA: Appendix, 2017, pp. 553–571.
[36] T. Mehmod and H. B. M. Rais, “Ant colony optimization and feature selection for intrusion detection,” in Advances in Machine Learning and Signal Processing, vol. 387. New York, NY, USA: Springer, 2016, pp. 305–312.
[37] C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,” Comput. Secur., vol. 70, pp. 255–277, Sep. 2017.
[38] J. McCall, “Genetic algorithms for modelling and optimisation,” J. Comput. Appl. Math., vol. 184, no. 1, pp. 205–222, Dec. 2005.
[39] H. Wang, J. Gu, and S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowl.-Based Syst., vol. 136, pp. 130–139, Nov. 2017.
[40] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[41] V. L. Thing, “IEEE 802.11 network anomaly detection and attack classification: A deep learning approach,” in Proc. Wireless Commun. Netw. Conf. (WCNC), May 2017, pp. 1–6.
[42] S. Ding and G. Wang, “Research on intrusion detection technology based on deep learning,” in Proc. Int. Conf. Comput. Commun. (ICCC), Dec. 2017, pp. 1474–1478.
[43] M. Agarwal, D. Pasumarthi, S. Biswas, and S. Nandi, “Machine learning approach for detection of flooding DoS attacks in 802.11 networks and attacker localization,” Int. J. Mach. Learn. Cybern., vol. 7, no. 6, pp. 1035–1051, Dec. 2016.
[44] I. Ahmad, M. Basheri, M. J. Iqbal, and A. Raheem, “Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection,” IEEE Access, vol. 6, pp. 33789–33795, 2018.
[45] B. Trstenjak, S. Mikac, and D. Donko, “KNN with TF-IDF based framework for text categorization,” Procedia Eng., vol. 69, pp. 1356–1364, May 2014.
[46] S. Tan, “An effective refinement strategy for KNN text classifier,” Expert Syst. Appl., vol. 3, no. 2, pp. 290–298, 2006.
[47] M. O. Mughal and S. Kim, “Signal classification and jamming detection in wide-band radios using Naïve Bayes classifier,” IEEE Commun. Lett., vol. 22, no. 7, pp. 1398–1401, Jul. 2018.
[48] W. Mo, C. L. Gutterman, Y. Li, S. Zhu, G. Zussman, and D. C. Kilper, “Deep-neural-network-based wavelength selection and switching in ROADM systems,” IEEE/OSA J. Opt. Commun. Netw., vol. 10, no. 10, pp. D1–D11, 2018.
[49] F. Farahnakian and J. Heikkonen, “A deep auto-encoder based approach for intrusion detection system,” in Proc. Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, pp. 178–183.
[50] R. Garreta and G. Moncecchi, Learning Scikit-Learn: Machine Learning in Python. Birmingham, U.K.: Packt Publishing Ltd, 2013.
[51] Z. Gao, Y. Xu, F. Meng, F. Qi, and Z. Lin, “Improved information gain-based feature selection for text categorization,” in Proc. Int. Conf. Wireless Commun. Veh. Technol. Inf. Theory Aerosp. Electron. Sys. (VITAE), Aug. 2014, pp. 1–5.
[52] C. E. Shannon, “A mathematical theory of communication,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, 2001.
[53] H. Zhou, Z. Deng, Y. Xia, and M. Fu, “A new sampling method in particle filter based on Pearson correlation coefficient,” Neurocomputing, vol. 216, pp. 208–215, May 2016.

SYDNEY MAMBWE KASONGO received the master’s (M.Tech.) degree in computer systems from the Tshwane University of Technology in 2017. He is currently pursuing the Ph.D. degree in electrical and electronic engineering with the University of Johannesburg. His current research interests include machine learning, deep learning, computer network security, wireless networks, and data science.

YANXIA SUN received the joint D.Tech. degree in electrical engineering from the Tshwane University of Technology, South Africa, and the Ph.D. degree in computer science from University Paris-EST, France, in 2012. She is currently an Associate Professor and the Head of the Department of Electrical and Electronic Engineering Science, University of Johannesburg, South Africa. She has 15 years of teaching and research experience. She has lectured five courses at universities. She has supervised or co-supervised five postgraduate projects to completion. She is currently supervising four master’s students and six Ph.D. students. She has published 42 papers, including 14 ISI master indexed journal papers. She is the Investigator or Co-Investigator for six research projects. She is a member of the South African Young Academy of Science. Her research interests include renewable energy, evolutionary optimization, neural networks, nonlinear dynamics, and control systems.
