A Novel Algorithm For Network Anomaly Detection Using Adaptive Machine Learning
All content following this page was uploaded by Dr Venugopalan S R on 15 December 2017.
1 Department of Computer Science, Govt. Arts College, Thiruchirapalli, Tamilnadu, India
2 Aeronautical Development Agency (Ministry of Defence, GoI), Bangalore - 560017, India.
1 akudaiyar@yahoo.com, 2 venu_srv@yahoo.com
Abstract. Threats on the Internet pose a high risk to the security of information, and
network anomaly detection has become an important area of information security. Data
mining algorithms are used to find patterns and characteristic rules in huge data, and
they are widely used in Network Anomaly Detection Systems (NADS). Network traffic has
several attributes of a qualitative and quantitative nature, which need to be
treated/normalized differently. In general, a model is built from the existing data, the
system is trained with the model, and it is then used to detect intrusions. The major
issue with such NADS is that network traffic changes over time; in such cases the system
should be retrained, preferably automatically. This paper presents an adaptive algorithm
that gets trained according to the network traffic. The presented algorithm is tested
with Kyoto University’s 2006+ benchmark dataset. It can be observed that the results of
the proposed algorithm outperform all the known/commonly used classifiers and are well
suited to network anomaly detection.
1 INTRODUCTION
The Internet has brought huge potential for business, but it also poses many risks for
business. The Internet is a global public network [12]. Intrusion is a deliberate,
unauthorized, illegal attempt to access, manipulate or take possession of an information
system to render it unreliable or unusable. Intrusion detection is the process of
identifying various events occurring in a system/network and analyzing them for the
possible presence of intrusion. Intrusion Detection Systems (IDS) can be classified into
three types based on the method by which intrusions are detected, namely signature-based,
anomaly-based and hybrid. Statistical methods and clustering are used for anomaly
detection systems [12]. With the availability of higher bandwidth and sophisticated
hardware and software, detecting intrusions in real time and adapting the detection
algorithm to the ever-changing traffic pattern is a big challenge. An IDS should adapt to
traffic behavior and learn automatically. In this paper, an algorithm is proposed for
network anomaly detection. The results, i.e. the performance metrics of the experiment,
are encouraging. The proposed algorithm can detect new/unknown attacks and can learn and
adapt automatically based on the network traffic.
The paper is organized as follows: Section 2 gives the background and the literature
surrounding IDS with the necessary performance metrics. The problem description and the
algorithm development are discussed in Section 3. Section 4 discusses the dataset used in
this study, data pre-processing, data normalization and the generation of the training
and test datasets. The experiment and the results are discussed in Section 5. Conclusions
and future work are given in Section 6.
Panda et al. proposed Naïve Bayes for network intrusion detection and found that Naïve
Bayes performs better in terms of false positive rate, cost and computational time on the
KDD ’99 dataset when compared with a back-propagation neural network approach [20]. Jain
et al. combined Information Gain with Naïve Bayes to improve attack detection and
observed a higher detection rate and fewer false alarms [21]. Muda et al. used K-means to
cluster the data and a Naïve Bayes classifier to classify the KDD Cup 99 [3] data, and
achieved better performance than the Naïve Bayes classifier alone [22]. They achieved
99.7% accuracy, a detection rate of 99.8% and a false alarm rate of 0.5.
The FVBRM model is proposed by the authors of [13] for feature selection; they compared
it with other selection methods by reducing the features of the dataset and then
classifying with a Naïve Bayes classifier. There is no mention of how the qualitative and
quantitative attributes are treated. The authors of [14] compared the results of the
Naïve Bayes algorithm with a decision tree and concluded that, from the performance point
of view, Naïve Bayes provides competitive results for the KDD 99 [5] dataset. The K-means
clustering algorithm was applied to intrusion detection, with the conclusion that the
K-means method is very efficient in partitioning huge datasets and has better global
search ability [15, 1]. K-means clustering is a good unsupervised algorithm for finding
structured patterns in data, but its computational complexity is high for application in
intrusion detection. A novel density-based K-means clustering algorithm was proposed for
signature-based intrusion detection [16], where the results show improved accuracy and
detection rate with a reduced false positive rate. It is not very clear which
normalization technique was used and how the discrete and continuous data were treated.
Sharma et al. [17] proposed K-means clustering via Naïve Bayes for the KDD Cup ’99
dataset. This approach outperforms Naïve Bayes in terms of detection rate but produces
more false positives, which is a concern.
S. M. Hussein et al. compared the performance of Naïve Bayes, Bayes Net and J48graft and
recorded that Naïve Bayes performs better in terms of detection rate and time to build
the model, whereas J48 was better in terms of false alarm rate [19]. The earlier works
reviewed in this section tried to achieve higher performance with the help of
pre-processing/feature reduction and have achieved performance improvements. The study of
the existing literature reveals the need for a novel algorithm to detect unknown attacks,
because the earlier works have not considered the following points:
a) The ever-changing network traffic/speed, new attacks, and the need for the algorithm
to adapt itself and learn/get trained automatically from the changing traffic.
b) The ability of the algorithms/methods described in the literature to perform well on
datasets other than the tested ones. The algorithms were tested with only one dataset.
c) Either attack or normal data is used for training, and not both.
d) Network traffic data contains features of a qualitative or quantitative nature, which
have to be treated differently using different pre-processing/normalization techniques.
e) Earlier works have measured only accuracy, detection rate and false alarm rate as
performance measures, which may not be sufficient; measures such as F-Score and
sensitivity are required for evaluating an algorithm/method.
F-Score
The harmonic mean of precision (P) and recall (R) is called the F-Score or F-measure. The
F-Score is considered a measure of the accuracy of a test. Good IDS performance is
achieved by improving both precision and recall, and both enter the F-Score. An F-Score
of 1 is considered the best and 0 the worst.
$$F\text{-}Score = \frac{2 \cdot P \cdot R}{P + R} \qquad (1)$$
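As a small illustration of Equation 1, the sketch below computes the F-Score from precision and recall. The helper `precision_recall` and its confusion-matrix arguments (`tp`, `fp`, `fn`) are illustrative names that do not appear in the paper; they use the standard definitions of precision and recall.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall, as in Equation 1."""
    if precision + recall == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    p, r = precision_recall(tp=8, fp=2, fn=2)
    print(p, r, f_score(p, r))
```

Because the harmonic mean is dominated by the smaller of its two arguments, a detector cannot reach a high F-Score by improving precision alone or recall alone.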
Training Phase: The training dataset is grouped by label into attack and normal sessions.
5000 attack records and 5000 normal records are used for training. The centroid of the
attack class and of the normal class is then found: for numerical attributes the mean
(average) is calculated, and for categorical attributes the median is calculated. Each
centroid is therefore a set of values, one per attribute.
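The centroid computation of the training phase can be sketched as follows. This is a minimal sketch, assuming each record is a list of already-numeric values with the numerical attributes first and the (encoded) categorical attributes last; the function name `class_centroid` and the parameter `n_numeric` are hypothetical.

```python
from statistics import mean, median


def class_centroid(records, n_numeric):
    """Return one value per attribute: mean for the first n_numeric
    columns, median for the remaining (categorical) columns."""
    columns = list(zip(*records))  # transpose rows into columns
    numeric_part = [mean(col) for col in columns[:n_numeric]]
    categorical_part = [median(col) for col in columns[n_numeric:]]
    return numeric_part + categorical_part


if __name__ == "__main__":
    toy_attack = [[1, 2, 10], [3, 4, 20], [5, 6, 90]]
    print(class_centroid(toy_attack, n_numeric=2))
```

The same function would be called once on the attack records and once on the normal records, giving the two centroids used in the testing phase.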
Testing Phase: For each record in the testing data, the following steps are followed.
BEGIN
1. Initialize the
2. Read the attack and normal traffic data. // attack data is referred to as a[5000][14]
and normal data as n[5000][14]
3. Evaluate the mean for the first 12 attributes and the median for the next 2 attributes
for both attack and normal data. // ma is the mean of the training attack data and mn
is the mean of the training normal data.
4. Read the test data. // test data is referred to as t[5000][14]; the 15th column is the
actual label and the 16th column will be used for the computed label
5. Compute the distance between the test data and the centroid of the attack/normal
dataset using the 0.8-norm as given in Equation 2.
$$|X| = \sqrt[0.8]{\sum_{k=1}^{n} |a_k - t_k|^{0.8}} \qquad (2)$$
6. If the test data is closer to the normal centroid than to the attack centroid, and the
distance between the test data and the normal centroid is less than 1.5 times the
distance between the normal and attack centroids, then it is labelled as normal;
otherwise it is labelled as an attack.
7. After labelling the test data, a decision has to be made whether the test data should
replace a record in the training data.
8. Depending on whether the new test data is labelled attack or normal, the decision is
made whether it replaces a record of the attack or normal training data. This is done
by calculating two distances: the distance between the test data and the attack/normal
centroid, and the distance between the ith row (i is the counter used for replacement)
of the attack/normal training data and the same centroid. Both distances are
calculated using the 0.8-norm as given in Equation 2. If the new test data is closer to
the centroid than the ith row, then the ith row is replaced with the new data.
9. Repeat the above steps for all the test data. The algorithm is given in the next
section 3.1.
10. Calculate the TP, TN, FP, FN, sensitivity, specificity, FAR, Accuracy, detection
rate, F-Score etc.
END //end of algorithm.
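The steps above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, records are assumed to be normalized numeric lists of 14 attributes, and two details that the text leaves open are filled with labelled assumptions (the centroid is recomputed after every replacement, and the replacement counter i advances cyclically after each test record).

```python
from statistics import mean, median

P = 0.8         # exponent of the 0.8-norm (Equation 2)
FACTOR = 1.5    # threshold factor from step 6
N_NUMERIC = 12  # first 12 attributes use the mean, the last 2 the median


def centroid(rows):
    cols = list(zip(*rows))
    return ([mean(c) for c in cols[:N_NUMERIC]]
            + [median(c) for c in cols[N_NUMERIC:]])


def distance(a, t):
    """0.8-norm distance between two attribute vectors (Equation 2)."""
    return sum(abs(x - y) ** P for x, y in zip(a, t)) ** (1.0 / P)


def classify(t, ma, mn):
    """Step 6: normal only if closer to the normal centroid AND within
    1.5 times the distance between the two centroids."""
    d_normal = distance(mn, t)
    d_attack = distance(ma, t)
    if d_normal < d_attack and d_normal < FACTOR * distance(mn, ma):
        return "normal"
    return "attack"


def run_anada(attack, normal, tests):
    """Label each test record and adapt the training sets in place."""
    train = {"attack": attack, "normal": normal}
    cent = {name: centroid(rows) for name, rows in train.items()}
    counter = {"attack": 0, "normal": 0}
    labels = []
    for t in tests:
        label = classify(t, cent["attack"], cent["normal"])
        labels.append(label)
        rows, i = train[label], counter[label]
        # Steps 7-8: replace the ith training row if the new record
        # lies closer to the class centroid than that row does.
        if distance(cent[label], t) < distance(cent[label], rows[i]):
            rows[i] = list(t)
            cent[label] = centroid(rows)      # assumption: refresh centroid
        counter[label] = (i + 1) % len(rows)  # assumption: cyclic counter
    return labels
```

Note that a record far from both centroids is labelled an attack even when the normal centroid is the nearer one, which is how the 1.5 factor lets the rule flag previously unseen traffic.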
In this paper, the publicly available Kyoto 2006+ dataset is used for experimentation.
The mean range (for quantitative attributes) and the probability function (for
qualitative attributes) are chosen because these normalization techniques yield better
results in terms of time and classification rate [2, 8]. There are 2 qualitative
attributes, i.e. flag and service, and the other 12 attributes are quantitative. Mean
range normalization is applied to the quantitative attributes and the probability
function is used for the qualitative attributes.
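The two normalizations might look like the sketch below. The formulas are not given in this text, so both are assumptions: mean range normalization is taken to be scaling to [0, 1] by the attribute's minimum and maximum, and the probability function is taken to be the relative frequency of each category in the data. The function names are hypothetical.

```python
from collections import Counter


def mean_range_normalize(values):
    """Assumed form: (v - min) / (max - min), mapping values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def probability_encode(values):
    """Assumed form: replace each category by its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


if __name__ == "__main__":
    print(mean_range_normalize([2, 4, 6]))
    print(probability_encode(["http", "http", "ssh", "dns"]))
```

Encoding the qualitative attributes as numbers in this way is what makes a median over flag and service well defined when the class centroids are computed.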
The reason for choosing the above configuration was that in general, the number
of attacks will not be more than 20% of the records.
5
A Naïve Bayes model was built using the same training set with 5000 attack and 5000
normal vectors. All four test cases were re-evaluated with this model and the results are
tabulated. In addition, the test cases were evaluated using Naïve Bayes (NB) 10-fold
cross-validation. Cross-validation is a process of repeatedly carrying out the experiment
10 times so that each subset is used as the test set at least once. It is used to
estimate the accuracy and has been found to be effective when there is sufficient data.
The results of NB Train & Test, NB 10-fold cross-validation and ANADA are given in
Table 3 and the same are depicted as graphs in Fig. 1.
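The fold construction behind 10-fold cross-validation can be sketched as below. This is a generic illustration and not the tool used for the experiments; the function name `k_fold_indices` is hypothetical, and the records are split in their given order without shuffling.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_indices, test_indices) for each of the k folds.
    Every record index appears in exactly one test fold."""
    indices = list(range(n_records))
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), test_idx)
```

A classifier would be trained on `train_idx` and scored on `test_idx` in each round, and the ten scores averaged to estimate accuracy.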
Table 3. IDS Performance Comparison of ANADA with Naïve Bayes (Kyoto 2006+)

KYOTO DATASET    METHOD             DETECTION RATE   ACCURACY   FALSE ALARM RATE   F-SCORE
TEST CASE - 1    NB Train & Test    0.7229           0.9616     0.0426             0.8388
                 NB 10 Fold         0.7223           0.9615     0.0423             0.8380
                 ANADA              0.8861           0.9773     0.0127             0.8866
TEST CASE - 2    NB Train & Test    0.8499           0.9646     0.0441             0.9187
                 NB 10 Fold         0.8512           0.9642     0.0435             0.9175
                 ANADA              0.9402           0.9750     0.0149             0.9373
TEST CASE - 3    NB Train & Test    0.7244           0.9619     0.0422             0.8398
                 NB 10 Fold         0.7266           0.9621     0.0417             0.8404
                 ANADA              0.8085           0.9727     0.0251             0.8744
TEST CASE - 4    NB Train & Test    0.8484           0.9641     0.0446             0.9176
                 NB 10 Fold         0.8525           0.9644     0.0430             0.9178
                 ANADA              0.9336           0.9666     0.0159             0.9148

Fig. 1. Performance Comparison of ANADA with NB and NB 10 Fold (Kyoto 2006+ dataset)
From the above table it can be clearly observed that the DR and accuracy of ANADA are
higher in all the cases, and the F-Score of ANADA is also higher in all the cases except
test case 4, where it is marginally lower. The False Alarm Rate (FAR) of ANADA is lower
than that of NB Train & Test and NB 10-fold cross-validation in all the cases, which
supports the usability of the algorithm.
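The comparison can be checked mechanically against the numbers in Table 3. The sketch below copies the table values and lists, for each metric, the test cases in which ANADA beats both Naïve Bayes baselines; the dictionary layout and the function name are illustrative only.

```python
# (detection rate, accuracy, false alarm rate, F-score), copied from Table 3
TABLE3 = {
    1: {"NB": (0.7229, 0.9616, 0.0426, 0.8388),
        "NB10": (0.7223, 0.9615, 0.0423, 0.8380),
        "ANADA": (0.8861, 0.9773, 0.0127, 0.8866)},
    2: {"NB": (0.8499, 0.9646, 0.0441, 0.9187),
        "NB10": (0.8512, 0.9642, 0.0435, 0.9175),
        "ANADA": (0.9402, 0.9750, 0.0149, 0.9373)},
    3: {"NB": (0.7244, 0.9619, 0.0422, 0.8398),
        "NB10": (0.7266, 0.9621, 0.0417, 0.8404),
        "ANADA": (0.8085, 0.9727, 0.0251, 0.8744)},
    4: {"NB": (0.8484, 0.9641, 0.0446, 0.9176),
        "NB10": (0.8525, 0.9644, 0.0430, 0.9178),
        "ANADA": (0.9336, 0.9666, 0.0159, 0.9148)},
}
METRICS = {"DR": 0, "Accuracy": 1, "FAR": 2, "F-Score": 3}


def cases_where_anada_is_better(metric):
    """Test cases in which ANADA beats both NB baselines on a metric.
    Lower is better for FAR, higher is better for the other metrics."""
    idx = METRICS[metric]
    better = []
    for case, rows in sorted(TABLE3.items()):
        anada = rows["ANADA"][idx]
        baselines = (rows["NB"][idx], rows["NB10"][idx])
        if metric == "FAR":
            wins = all(anada < b for b in baselines)
        else:
            wins = all(anada > b for b in baselines)
        if wins:
            better.append(case)
    return better


if __name__ == "__main__":
    for name in METRICS:
        print(name, cases_where_anada_is_better(name))
```

In test case 4 the F-Score gap is 0.9148 against 0.9176 and 0.9178, a difference of about 0.003.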
6 CONCLUSIONS & FUTURE WORK
In this study a novel adaptive algorithm has been proposed. The proposed method uses a
labeled dataset for training but can adapt/learn by itself and can detect new attacks.
The performance measures of the algorithm can still be improved by combining it with
feature weights. The algorithm has good potential to be parallelized. Future work shall
focus on parallelizing the algorithm using GPGPU processors to achieve performance, as
energy efficiency has become the prime concern for the computer industry. Different
sensors for different protocol types can be used for performance improvements. The
authors are working on improving the algorithm and modifying it for flow-based anomaly
detection.
References
1. Münz, G., Li, S., & Carle, G. (2007, September). Traffic anomaly detection using
K-Means clustering. In GI/ITG Workshop MMBnet.
2. Ihsan Z, Idris MY, Abdullah AH. Attribute Normalization Techniques and Performance of
Intrusion Classifiers: A Comparative Analysis. Life Science Journal. 2013;10(4), 2568-
2576.
3. The UCI KDD Archive: KDD Cup 1999 Data, Information and Computer Science,
University of California, Irvine,
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (1999). Accessed 2 February
2014.
4. Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D. and Nakao, K. (2011) Statistical
Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation.
Proceedings of the 1st Workshop on Building Analysis Datasets and Gathering Experience
Returns for Security, Salzburg, 10-13 April 2011, 29-36. ACM 2011.
http://dx.doi.org/10.1145/1978672.1978676.
5. MIT Lincoln Lab., Information Systems Technology Group (1998) The 1998 Intrusion de-
tection off-Line Evaluation Plan. http://www.ll.mit.edu/ideval/files/id98-eval-ll.txt
6. Ammar, A., (2015) Comparison of Feature Reduction Techniques for Binominal Classifica-
tion of Network Traffic, Journal of Data Analysis and Information Processing.
http://dx.doi.org/10.4236/jdaip.2015.32002.
7. Adrian R. Chavez, Jason Hamlet, Erik Lee, Mitchell Martin and William Stout (2015), Net-
work Randomization and Dynamic defence for Critical Infrastructure Systems, Sandia Na-
tional Laboratories, New Mexico. SAN2015-3324.
8. Wang W, Zhang X, Gombault S, Knapskog SJ. Attribute normalization in network intrusion
detection. In Pervasive systems, algorithms, and networks (ISPAN), 2009 10th international
symposium on 2009 Dec 14 (pp. 448-453). IEEE.
9. Ciza Thomas: Performance Enhancement of Intrusion Detection Systems using Advances in
Sensor Fusion (Phd Thesis. Supercomputer Education and Research Center, Indian Institute
of Science Bangalore, India 2009)
10. Gaffney Jr, John E., and Jacob W. Ulvila. "Evaluation of intrusion detectors: A decision
theory approach." In Security and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Sym-
posium on, pp. 50-61. IEEE, 2001.
11. Laskov P, Düssel P, Schäfer C, Rieck K. Learning intrusion detection: supervised or
unsupervised? In Image Analysis and Processing–ICIAP 2005 (pp. 50-57). Springer
Berlin Heidelberg.
12. https://www.sans.org/reading-room/whitepapers/detection/intruion-detection-systems-
definition-chaallenges-343. accessed on 06-01-2016
13. S. Mukherjee and N. Sharma, "Intrusion detection using naive Bayes classifier with feature
reduction," Procedia Technology, vol. 4, pp. 119-128, 2012.
14. N. B. Amor, S. Benferhat, and Z. Elouedi, "Naive bayes vs decision trees in intrusion detec-
tion systems," in Proceedings of the 2004 ACM symposium on Applied computing, 2004,
pp. 420-424.
15. M. Jianliang, S. Haikun, and B. Ling, "The application on intrusion detection based on k-
means cluster algorithm," in Information Technology and Applications, 2009. IFITA'09. In-
ternational Forum on, 2009, pp. 150-152.
16. Randeep B., Neeaj Sharma, “A Novel Density Based K-Means Clustering Algorithm for
Intrusion Detection”, Journal of Network Communications and Emerging Technologies,
2015 3(3), pp. 17-22.
17. Sharma S. K., Pandey P., Tiwari S. K., Sisodia M. S., “An Improved Network Intrusion
Detection Technique based on K-means Clustering via Naïve Bayes Classification”, Advances
in Engineering, Science and Management (ICAESM), 2012 International Conference on
[proceedings]: date, 30-31 March 2012. Piscataway, NJ: IEEE, 2012.
18. Mokarian, Asieh, Ahmad Faraahi, and Arash Ghorbannia Delavar. "False Positives
Reduction Techniques in Intrusion Detection Systems-A Review." International Journal of
Computer Science and Network Security (IJCSNS) 13.10 (2013): 128.
19. Hussein, Safwan Mawlood, Fakariah Hani Mohd Ali, and Zolidah Kasiran. "Evaluation ef-
fectiveness of hybrid IDs using snort with naive Bayes to detect attacks." Digital Informa-
tion and Communication Technology and it's Applications (DICTAP), 2012 Second Interna-
tional Conference on. IEEE, 2012.
20. Panda, Mrutyunjaya, and Manas Ranjan Patra. "Network intrusion detection using naive
bayes." International journal of computer science and network security 7.12 (2007): 258-
263.
21. Jain M, Richariya V. An Improved Techniques Based on Naïve Bayesian for Attack
Detection. International Journal of Emerging Technology and Advanced Engineering, Vol. 2,
Issue 1, pp. 324-331 (2012).
22. Muda, Zaiton, Warusia Yassin, M. N. Sulaiman, and Nur Izura Udzir. "A K-Means and
Naive Bayes learning approach for better intrusion detection." Information Technology
Journal 10, no. 3 (2011): 648-655.