A Novel Algorithm For Network Anomaly Detection Using Adaptive Machine Learning

Conference Paper · December 2016

D. Ashok Kumar¹, S. R. Venugopalan²
¹ Department of Computer Science, Govt. Arts College, Thiruchirapalli, Tamilnadu, India
² Aeronautical Development Agency (Ministry of Defence, GoI), Bangalore - 560017, India
¹ akudaiyar@yahoo.com, ² venu_srv@yahoo.com
Abstract. Threats on the Internet pose a high risk to information security, and network anomaly detection has become an important area of information security. Data mining algorithms are used to find patterns and characteristic rules in huge volumes of data, and they are widely used in Network Anomaly Detection Systems (NADS). Network traffic has several attributes of qualitative and quantitative nature, which need to be normalized differently. In general, a model is built from existing data, the system is trained with this model, and it is then used to detect intrusions. A major issue with such NADS is that network traffic changes over time, in which case the system should be retrained, preferably automatically. This paper presents an adaptive algorithm that trains itself according to the network traffic. The proposed algorithm is tested with Kyoto University's 2006+ benchmark dataset. The results show that the proposed algorithm outperforms the commonly used Naïve Bayes classifier and is well suited for network anomaly detection.
Keywords: Intrusion, anomaly, network traffic, normalization, performance metrics, adaptive algorithm, KYOTO 2006+, Naïve Bayes classification
1 INTRODUCTION
The Internet has brought huge potential for business, but it also poses considerable risk to businesses. The Internet is a global public network [12]. An intrusion is a deliberate, unauthorized, illegal attempt to access, manipulate or take possession of an information system in order to render it unreliable or unusable. Intrusion detection is the process of identifying the events occurring in a system or network and analyzing them for the possible presence of intrusions. Based on the method by which intrusions are detected, Intrusion Detection Systems (IDS) can be classified into three types: signature-based, anomaly-based and hybrid. Statistical methods and clustering are used for anomaly detection systems [12]. Given the availability of higher bandwidth and sophisticated hardware and software, detecting intrusions in real time and adapting the detection algorithm to ever-changing traffic patterns is a big challenge. An IDS should adapt to traffic behavior and learn automatically. In this paper, an algorithm is proposed for network anomaly detection. The results, i.e. the performance metrics of the experiment, are encouraging. The proposed algorithm can detect new/unknown attacks and can learn and adapt automatically based on the network traffic.
The organization of the paper is as follows. Section 2 gives the background and the literature surrounding IDS, together with the necessary performance metrics. The problem description and the algorithm development are discussed in Section 3. Section 4 discusses the dataset used in this study, data pre-processing and normalization, and the generation of the training and test datasets. The experiment and the results are discussed in Section 5. Conclusions and future work are given in Section 6.
2 BACKGROUND AND RELATED WORK
Panda et al. [20] proposed Naïve Bayes for network intrusion detection. They found that, on the KDD '99 dataset, Naïve Bayes performs better than a back-propagation neural network approach in terms of false positive rate, cost and computational time. Jain et al. [21] combined Information Gain with Naïve Bayes to improve attack detection and observed a higher detection rate and fewer false alarms. Muda et al. [22] used K-means to cluster the KDD Cup 99 [3] data and a Naïve Bayes classifier to classify it. They achieved better performance than the Naïve Bayes classifier alone: 99.7% accuracy, a detection rate of 99.8% and a false alarm rate of 0.5.
The authors of [13] proposed the FVBRM model for feature selection. They compared it with other selection methods by reducing the features of the dataset and then classifying with a Naïve Bayes classifier, but they do not mention how the qualitative and quantitative attributes are treated. The authors of [14] compared Naïve Bayes with decision trees and concluded that, from the performance point of view, Naïve Bayes provides competitive results on the KDD 99 [5] dataset. The K-means clustering algorithm has been applied to intrusion detection, with the conclusion that K-means is very efficient in partitioning huge datasets and has better global search ability [15, 1]. K-means is a good unsupervised algorithm for finding structured patterns in data, but its computational complexity is high for intrusion detection. A novel density-based K-means clustering algorithm was proposed for signature-based intrusion detection [16], with results showing improved accuracy and detection rate and a reduced false positive rate. However, it is not clear which normalization technique was used or how the discrete and continuous data were treated. Sharma et al. [17] proposed K-means clustering via Naïve Bayes for the KDD Cup '99 dataset. This approach outperforms Naïve Bayes in terms of detection rate but produces more false positives, which is a concern.
Hussein et al. [19] compared the performance of Naïve Bayes, BayesNet and J48graft. They reported that Naïve Bayes performs better in terms of detection rate and model-building time, whereas J48 is better in terms of false alarm rate. The earlier works reviewed in this section aimed at higher performance through pre-processing/feature reduction and achieved performance improvements. The study of the existing literature reveals the need for a novel algorithm to detect unknown attacks, because earlier works have not considered the following points:
a) Ever-changing network traffic/speed and new attacks, and the need for the algorithm to adapt itself and learn (get trained) automatically from the changing traffic.
b) The ability of the algorithms/methods described in the literature to perform well on datasets other than the tested one. The algorithms were tested with only one dataset.
c) Training uses either attack or normal data, but not both.
d) Network traffic data contains features of qualitative or quantitative nature, which have to be treated differently using different pre-processing/normalization techniques.
e) Earlier works measured only accuracy, detection rate and false alarm rate. These may not be sufficient, and measures such as F-Score and sensitivity are required to evaluate an algorithm/method.
2.1 Metrics for Intrusion Detection Performance

Choosing an NADS for a particular environment is a general problem, which can be framed precisely as the evaluation of intrusion detection systems [9]. For an anomaly detection system, the False Alarm Rate (FAR) and the Detection Rate (DR) are the basic factors, and their trade-off can be analyzed with a Receiver Operating Characteristic (ROC) curve. However, FAR and DR alone are not sufficient to evaluate the performance of an IDS [10]. The evaluation of an IDS should therefore take into account:
• the environment in which it is deployed;
• its maintenance costs and operating environment;
• the likelihood of attacks;
• the costs of false alarms and missed detections [9].
The following section explains the performance metrics that need to be considered when deploying or choosing an IDS/anomaly detection system, and these measures are used to evaluate the proposed algorithm. Attacks correctly detected as attacks are referred to as True Positives (TP), and normal connections detected as normal are True Negatives (TN). Table 1 shows the general confusion matrix used in intrusion detection evaluation, and the values in the matrix represent the performance of the prediction algorithm. The TP rate determines the security requirement, and the number of FPs determines the usability of the IDS. There is always a trade-off between precision and recall. For an IDS to be effective, the FP and FN rates should be minimized and the accuracy and the TP and TN rates maximized [18].
Table 1. Confusion Matrix.
Actual \ Predicted | Attack | Normal
Attack | True Positives (TP) | False Negatives (FN)
Normal | False Positives (FP) | True Negatives (TN)
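To illustrate Table 1, the four cells can be tallied from paired actual and predicted labels. The following is a minimal Python sketch, not the authors' code. The function name `confusion_counts` and the label strings "attack"/"normal" are assumptions made for the example, and attack is treated as the positive class.

```python
from typing import Dict, Sequence

ATTACK = "attack"  # assumed label string for the positive class; any other value counts as normal


def confusion_counts(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, int]:
    """Tally the Table 1 cells, treating 'attack' as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == ATTACK:
            # actual attack: detected -> TP, missed -> FN
            counts["TP" if p == ATTACK else "FN"] += 1
        else:
            # actual normal: flagged as attack -> FP, passed as normal -> TN
            counts["FP" if p == ATTACK else "TN"] += 1
    return counts


print(confusion_counts(["attack", "attack", "normal", "normal", "normal"],
                       ["attack", "normal", "attack", "normal", "normal"]))
# {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 2}
```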
Table 2 gives the details of the performance measures used to evaluate an IDS.
Table 2. Performance Measures used to evaluate IDS.
S. No. | Performance Metric | Description | Formula
1. | Detection Rate / Positive Predictive Value / Precision | Proportion of the predicted positives that are actual positives, i.e. the fraction of test data detected as attack that actually is an attack. | TP / (TP + FP)
2. | Accuracy | Measure of overall accuracy: the percentage of correct predictions over the whole dataset. | (TP + TN) / (TP + FP + FN + TN)
3. | False Alarm Rate | The false positive rate (FPR), also known as the false alarm rate (FAR): the proportion of normal packets falsely detected as malicious. | FP / (FP + TN)
4. | Sensitivity / True Positive Rate / Recall | The fraction of the attack class that is correctly detected, i.e. the proportion of actual positives that are predicted as positives. | TP / (TP + FN)
F-Score
The harmonic mean of precision and recall is called the F-Score (F-measure). The F-Score is considered a measure of the accuracy of a test. Good IDS performance is achieved by improving both precision and recall, and both are taken into account when computing the F-Score. An F-Score of 1 is the best and 0 the worst.
$$F\text{-}Score = \frac{2 \cdot P \cdot R}{P + R} \qquad (1)$$
where P is the precision and R is the recall defined in Table 2.
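The measures of Table 2 and Equation (1) can be computed directly from the four confusion-matrix counts, as this short Python sketch shows. The function names are illustrative, and returning 0.0 for an empty denominator is an assumption of the sketch. The paper does not specify that case. Note also that Table 2 uses "detection rate" for TP/(TP+FP), whereas some of the IDS literature uses the same term for recall, TP/(TP+FN).

```python
def _ratio(num: float, den: float) -> float:
    # ASSUMPTION: return 0.0 rather than raising when a denominator is zero
    return num / den if den else 0.0


def ids_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the Table 2 measures and the F-Score of Equation (1)."""
    precision = _ratio(tp, tp + fp)                 # row 1: detection rate / precision
    recall = _ratio(tp, tp + fn)                    # row 4: sensitivity / TPR / recall
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)   # row 2: accuracy
    far = _ratio(fp, fp + tn)                       # row 3: false alarm rate
    f_score = _ratio(2 * precision * recall, precision + recall)  # Equation (1)
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "far": far, "f_score": f_score}


m = ids_metrics(tp=90, fp=10, fn=10, tn=890)
print({k: round(v, 4) for k, v in m.items()})
# {'precision': 0.9, 'recall': 0.9, 'accuracy': 0.98, 'far': 0.0111, 'f_score': 0.9}
```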
3 Problem Description and Approach/Algorithm development
Supervised algorithms significantly outperform unsupervised algorithms in detecting known attacks. For problems where the test data is drawn from different distributions, semi-supervised learning methods are promising [11]. The dramatic increase in network speeds has made existing policies and network anomaly detection systems more vulnerable to intrusion than ever before. Existing IDS are therefore useless unless they adapt to new trends, i.e. to ever-changing network traffic, and learn automatically. The ADAPTIVE NETWORK ANOMALY DETECTION ALGORITHM (ANADA) proposed in this study uses a labeled dataset for initial learning and adapts itself to changing traffic patterns. ANADA uses simple statistical measures: the mean, the median and a norm (distance measure). The algorithm operates on normalized data, and the normalization is described in the data pre-processing section (Section 4.2). The distinguishing features of the algorithm are:
• The algorithm uses both attack and normal data for training.
• The algorithm adapts itself to new traffic by replacing records of the training dataset with records from the test dataset.
• For each test instance, the algorithm decides whether the test record is worth including in the training data in place of an existing training instance.
• The algorithm is very simple and can easily be parallelized for performance improvements.
• The algorithm uses a new distance measure, the 0.8-norm (given in Equation 2).
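The 0.8-norm distance of Equation 2 is a Minkowski-style sum with exponent p = 0.8. Strictly speaking, for p < 1 the expression is not a true norm, because the triangle inequality fails. The Python sketch below demonstrates this with a two-dimensional example, and also checks that p = 2 recovers the Euclidean distance. The function name `p_distance` is an illustrative choice, not taken from the paper.

```python
import math
from typing import Sequence


def p_distance(x: Sequence[float], y: Sequence[float], p: float = 0.8) -> float:
    """(sum_k |x_k - y_k|**p) ** (1/p); p = 0.8 gives the measure of Equation (2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


origin, corner, side = [0.0, 0.0], [1.0, 1.0], [1.0, 0.0]
direct = p_distance(origin, corner)                              # 2 ** 1.25, about 2.378
detour = p_distance(origin, side) + p_distance(side, corner)     # 1 + 1 = 2
print(direct > detour)                                           # True: triangle inequality fails
print(math.isclose(p_distance(origin, corner, p=2.0), math.sqrt(2)))  # True: Euclidean case
```

Because p < 1 weights many small coordinate differences more heavily than a Euclidean distance would, the choice of p changes which centroid a record is "closer" to. The paper fixes p = 0.8.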
3.1 Adaptive Network Anomaly Detection Algorithm (ANADA)

Input: Training and testing datasets (a: attack training dataset, n: normal training dataset, t: testing dataset)
Output: Anomaly detection performance metrics such as detection rate, FAR, sensitivity and F-Score.
Generate the initial population (training dataset) with an equal number (5000) of attack and normal traffic records.
Training Phase: The training dataset is grouped by label into attack and normal sessions, and 5000 attack records and 5000 normal records are used for training. The centroids of the attack class and of the normal class are then computed, using the mean for numerical (quantitative) attributes and the median for categorical (qualitative) attributes. Each centroid is thus a vector of attribute values.
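The training phase reduces each class to a single centroid vector, as this minimal Python sketch shows. Step 3 of the testing phase takes the mean of the first 12 attributes and the median of the next 2. The sketch therefore assumes the columns are arranged with the 12 quantitative attributes first and the 2 qualitative ones (flag, service) last. `centroid` and `N_QUANT` are illustrative names, not the authors' identifiers.

```python
from statistics import mean, median
from typing import List, Sequence

N_QUANT = 12  # ASSUMPTION: 12 quantitative columns first, then the 2 qualitative ones


def centroid(rows: Sequence[Sequence[float]], n_quant: int = N_QUANT) -> List[float]:
    """Column-wise mean for quantitative attributes, median for qualitative ones."""
    columns = list(zip(*rows))  # transpose: one tuple per attribute
    return [mean(col) if j < n_quant else median(col) for j, col in enumerate(columns)]


# toy example: 2 quantitative columns and 1 qualitative (already normalized) column
attack_rows = [[0.2, 0.4, 0.25], [0.4, 0.6, 0.50], [0.6, 0.8, 0.25]]
print(centroid(attack_rows, n_quant=2))  # roughly [0.4, 0.6, 0.25]
```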
Testing Phase: For each record in the testing data, the following steps are followed.
BEGIN
1. Initialize the
2. Read the attack and normal training data. // attack data is referred to as a[5000][14] and normal data as n[5000][14]
3. Evaluate the mean of the first 12 attributes and the median of the next 2 attributes for both attack and normal data. // ma denotes the centroid of the training attack data and mn the centroid of the training normal data
4. Read the test data. // test data is referred to as t[5000][14]; the 15th column is the actual label and the 16th column holds the computed label
5. Compute the distance between the test record and the centroids of the attack and normal data using the 0.8-norm of Equation 2. Here a_k is the k-th attribute of the centroid, t_k is the k-th attribute of the test record and n is the number of attributes:
$$\lVert X \rVert = \left( \sum_{k=1}^{n} \lvert a_k - t_k \rvert^{0.8} \right)^{1/0.8} \qquad (2)$$
6. Label the test record normal if both conditions hold:
   • it is closer to the normal centroid than to the attack centroid;
   • its distance to the normal centroid is less than 1.5 times the distance between the normal and attack centroids.
   Otherwise label it attack.
7. After labelling the test record, decide whether it should replace a record in the training data.
8. If the new test record is labelled attack (normal), decide whether it should replace a record in the attack (normal) training data. Compare the distance between the test record and the attack (normal) centroid with the distance between the i-th row of the attack (normal) training data and that centroid, where i is the counter used for replacement. Both distances are computed with the 0.8-norm of Equation 2. If the new test record is closer to the centroid than the i-th row, replace the i-th row with the new record.
9. Repeat the above steps for all test records.
10. Calculate TP, TN, FP, FN, sensitivity, specificity, FAR, accuracy, detection rate, F-Score, etc.
END // end of algorithm
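The testing phase (steps 3 and 5 to 9) can be sketched in Python as follows. This is a hypothetical reconstruction for illustration, not the authors' MATLAB implementation. The text leaves two details open, and the sketch makes these assumptions, marked in the comments:
• the replacement counter i advances cyclically after every labelled record;
• the class centroid is recomputed whenever a training row is replaced.
Recomputing the full centroid costs O(rows × attributes) per replacement. A running mean would be cheaper for the quantitative columns, but the medians would still need recomputation.

```python
from statistics import mean, median
from typing import List, Sequence

Row = List[float]


def p_distance(x: Sequence[float], y: Sequence[float], p: float = 0.8) -> float:
    # Equation (2): (sum_k |x_k - y_k|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def centroid(rows: Sequence[Sequence[float]], n_quant: int) -> Row:
    # step 3: mean for the first n_quant columns, median for the rest
    return [mean(c) if j < n_quant else median(c) for j, c in enumerate(zip(*rows))]


def anada_sketch(attack: Sequence[Sequence[float]], normal: Sequence[Sequence[float]],
                 tests: Sequence[Sequence[float]], n_quant: int = 12,
                 ratio: float = 1.5) -> List[str]:
    """Label each test record and adaptively refresh the training pools (steps 5-9)."""
    pools = {"attack": [list(r) for r in attack],   # copies, so the caller's data is untouched
             "normal": [list(r) for r in normal]}
    cents = {k: centroid(v, n_quant) for k, v in pools.items()}
    counter = {"attack": 0, "normal": 0}            # replacement counter i for each class
    labels = []
    for t in tests:
        d_norm = p_distance(t, cents["normal"])
        d_att = p_distance(t, cents["attack"])
        gap = p_distance(cents["normal"], cents["attack"])
        # step 6: normal only if nearer the normal centroid AND within ratio * gap
        label = "normal" if d_norm < d_att and d_norm < ratio * gap else "attack"
        labels.append(label)
        # step 8: replace row i of that class if the test record is closer to its centroid
        pool, i = pools[label], counter[label]
        if p_distance(t, cents[label]) < p_distance(pool[i], cents[label]):
            pool[i] = list(t)
            cents[label] = centroid(pool, n_quant)  # ASSUMPTION: recompute after replacement
        counter[label] = (i + 1) % len(pool)        # ASSUMPTION: i advances cyclically
    return labels


attack = [[0.9, 0.9], [1.0, 1.0], [0.8, 0.8]]
normal = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
print(anada_sketch(attack, normal, [[0.05, 0.05], [0.95, 0.95]], n_quant=1))
# ['normal', 'attack']
```

In the toy run, the first record replaces row 0 of the normal pool because it lies closer to the normal centroid than that row does. The second record is labelled attack but leaves the attack pool unchanged, since row 0 already coincides with the attack centroid.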
4 DATASETS FOR EXPERIMENTATION
In this paper, the publicly available Kyoto 2006+ dataset is used for experimentation.
4.1 KYOTO 2006+ DATASET

Kyoto 2006+ [4] is a network intrusion detection evaluation dataset collected from various honeypots from November 2006 to August 2009, and it contains real network traffic traces. The data has 24 statistical features: fourteen features that were also present in the KDD Cup '99 dataset, plus ten additional features for effective investigation. This study uses the data of 31 August 2009, specifically the first 14 (conventional) features and the label indicating whether the record is an attack or normal. As the study does not distinguish between known and unknown attacks, both are represented as attack. Unknown attacks are very rare in this dataset, which is another reason for not distinguishing known and unknown attacks.
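The record selection just described can be sketched in Python: keep the 14 conventional features and collapse known and unknown attacks into a single attack label. The label coding is our reading of the Kyoto 2006+ documentation: 1 for normal, -1 for a known attack, -2 for an unknown attack. Treat it as an assumption and verify it against the dataset's own description. The position of the label column is left as a hypothetical parameter, `label_index`, rather than hard-coded.

```python
from typing import List

N_CONVENTIONAL = 14  # the first 14 (conventional) features used in this study


def to_binary_label(raw: str) -> str:
    """Collapse a Kyoto 2006+ label into 'normal' or 'attack'.

    ASSUMPTION (our reading of the dataset documentation):
    1 = normal, -1 = known attack, -2 = unknown attack.
    """
    value = int(raw)
    if value == 1:
        return "normal"
    if value in (-1, -2):
        return "attack"  # known and unknown attacks are not distinguished
    raise ValueError(f"unexpected label value: {raw!r}")


def select_record(fields: List[str], label_index: int) -> List[str]:
    """Keep the conventional features plus the binary label.

    label_index is a HYPOTHETICAL parameter: check the column layout of your copy of the data.
    """
    return fields[:N_CONVENTIONAL] + [to_binary_label(fields[label_index])]


line = "\t".join([str(k) for k in range(14)] + ["0", "0", "0", "-2"])
print(select_record(line.split("\t"), label_index=17)[-1])  # attack
```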
4.2 Data Pre-Processing

Raw data needs to be pre-processed before being fed into any learning model, and the most commonly used technique is normalization [6]. Network traffic data contains features of qualitative or quantitative nature, which have to be treated differently. Attributes with large values can dominate the results over attributes with small values [2]. This dominance can be reduced by normalization, i.e. by scaling the values into a certain range. Quantitative attributes can be normalized by various techniques:
1) Mean-range Normalization
2) Frequency Normalization
3) Maximize Normalization
4) Rational Normalization
5) Ordinal Normalization
6) Softmax Scaling [7]
7) Statistical Normalization
Applying these techniques to qualitative data, however, is not meaningful. For qualitative data, the general approach is to replace the values with numerical values. Although simple, this ignores the semantics of the qualitative attributes. In this study, the following probability function is used to normalize the qualitative data:
$$f_X(x) = \Pr(X = x) = \Pr(\{ s \in S : X(s) = x \}) \qquad (3)$$ [2]

Using this equation, the qualitative data are converted to quantitative data in the range [0, 1]. For the quantitative attributes, mean-range normalization is used in this study:
$$x_i = \frac{v_i - \min(v_i)}{\max(v_i) - \min(v_i)} \qquad (4)$$ [8]
Mean-range normalization (for quantitative attributes) and the probability function (for qualitative attributes) were chosen because these techniques yield better results in terms of time and classification rate [2, 8]. There are 2 qualitative attributes, flag and service, and the other 12 attributes are quantitative. Mean-range normalization is applied to the quantitative attributes and the probability function to the qualitative attributes.
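Both normalizations can be sketched in a few lines of Python. In the probability function of Equation (3), each categorical value is replaced by its relative frequency. Equation (4) rescales a numeric column onto [0, 1]. The function names and the flag values in the example are illustrative. Mapping a constant column to 0 is an assumption of this sketch. The paper does not say whether the frequencies and min/max were taken from the training data only; in practice one would fit them on training data and reuse them for test records.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def probability_encode(values: Sequence[Hashable]) -> List[float]:
    """Equation (3): replace each categorical value x by its relative frequency Pr(X = x)."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def mean_range(values: Sequence[float]) -> List[float]:
    """Equation (4): (v - min) / (max - min), mapping the column onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # ASSUMPTION: a constant column maps to 0
    return [(v - lo) / (hi - lo) for v in values]


print(probability_encode(["SF", "S0", "SF", "REJ"]))  # [0.5, 0.25, 0.5, 0.25]
print(mean_range([0, 50, 100]))                       # [0.0, 0.5, 1.0]
```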
4.3 Dataset generation for training and testing

The framework used in this study uses both normal and malicious (attack) data for training, whereas in general a system is trained using either normal or attack data. This is one of the unique characteristics of the algorithm. It makes the algorithm suitable for adaptive learning, i.e. the system is trained automatically from the testing/network traffic data. The Kyoto 2006+ data for 31-08-2009 is used in this study. It has 134665 records, of which 44257 (32.9%) are normal and 90408 (67.1%) are attack records. Many records (42.2%) were duplicates and were removed before the experimentation. These statistics show two things: the attack data dominates the dataset, which is not the general case, and there are many duplicates.
The procedure for selecting the testing and training data was therefore devised so that these observations do not dominate the detection procedure, and the same procedure can be used for other datasets. In this study, the training dataset consists of 5000 attack and 5000 normal records. Four sets of testing records were generated for the Kyoto 2006+ dataset as follows. The records were chosen at random using SPSS Statistics V20 after removing the duplicates.
Dataset 1 (Test Case 1) consists of 10000 records, of which 10% are attack and the remaining 90% are normal records.
This configuration was chosen because, in general, attacks do not make up more than 20% of the records.
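The authors drew their samples with SPSS. The Python sketch below substitutes the standard `random` module to show the same idea: remove exact duplicates, then draw a test set with a fixed attack share (10% for Test Case 1). The function names and the seed are illustrative assumptions. The paper does not state whether test records were kept disjoint from the training records, and this sketch does not enforce it.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def dedupe(records: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for r in records:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def make_test_set(attack: Sequence[T], normal: Sequence[T], size: int = 10000,
                  attack_frac: float = 0.10, seed: int = 0) -> List[T]:
    """Random test set with a fixed attack share (10% for Test Case 1)."""
    rng = random.Random(seed)            # seeded for reproducibility
    n_attack = round(size * attack_frac)
    sample = rng.sample(list(attack), n_attack) + rng.sample(list(normal), size - n_attack)
    rng.shuffle(sample)
    return sample


attack = dedupe([[i, 1] for i in range(50)] * 2)   # duplicates removed: 50 records
normal = dedupe([[i, 0] for i in range(200)])
test = make_test_set(attack, normal, size=100)
print(len(test), sum(r[1] for r in test))         # 100 10
```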
5 EXPERIMENTAL RESULTS & DISCUSSIONS
The Adaptive Network Anomaly Detection Algorithm (ANADA) described earlier in this study is implemented in MATLAB Version 7.12.0.635 (R2011a). The experiments were carried out on a system with an Intel Core i3 2.53 GHz CPU and 4 GB of memory running the Windows 8 Professional 64-bit operating system. Microsoft Office Professional Plus 2010 and SPSS Statistics V20 were used for data pre-processing.
The Kyoto 2006+ dataset is pre-processed as described above, and the training data is fed to the algorithm for learning. There are four test cases, test case 1 to test case 4, which are fed to the algorithm one by one and the results recorded. The results are given in Table 3 and Figure 1, and Table 3 lists the anomaly detection performance measures of ANADA on the Kyoto 2006+ dataset. The results need to be compared with other techniques. Naïve Bayes classification was chosen for this because it is a simple classification scheme that gives good results in terms of detection rate and FAR. Naïve Bayes is a supervised algorithm based on Bayes' theorem with the "naïve" assumption that the features are conditionally independent given the class. Mathematically, this is given in Equation 5.
5
A Naïve Bayes model was built using the same training set of 5000 attack and 5000 normal vectors. All four test cases were evaluated with this model and the results tabulated. In addition, the test cases were evaluated using Naïve Bayes (NB) with 10-fold cross-validation. Cross-validation repeats the experiment 10 times so that each of the 10 subsets is used as the test set exactly once. It is used to estimate accuracy and has been found to be effective when there is sufficient data. The results of NB train & test, NB 10-fold cross-validation and ANADA are given in Table 3 and depicted as graphs in Fig. 1.
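The fold construction behind 10-fold cross-validation can be illustrated with a short Python sketch. It shows only how the data is partitioned so that every record is tested exactly once; it does not reproduce the Naïve Bayes model or the tool the authors used. Contiguous, unshuffled folds are a simplification of this sketch, and in practice records are usually shuffled or stratified by class first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each index lands in exactly one test fold."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(23, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```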
Table 3. IDS Performance Comparison of ANADA with Naïve Bayes (Kyoto 2006+)
Test Case | Method | Detection Rate | Accuracy | False Alarm Rate | F-Score
1 | NB Train & Test | 0.7229 | 0.9616 | 0.0426 | 0.8388
1 | NB 10-Fold | 0.7223 | 0.9615 | 0.0423 | 0.8380
1 | ANADA | 0.8861 | 0.9773 | 0.0127 | 0.8866
2 | NB Train & Test | 0.8499 | 0.9646 | 0.0441 | 0.9187
2 | NB 10-Fold | 0.8512 | 0.9642 | 0.0435 | 0.9175
2 | ANADA | 0.9402 | 0.9750 | 0.0149 | 0.9373
3 | NB Train & Test | 0.7244 | 0.9619 | 0.0422 | 0.8398
3 | NB 10-Fold | 0.7266 | 0.9621 | 0.0417 | 0.8404
3 | ANADA | 0.8085 | 0.9727 | 0.0251 | 0.8744
4 | NB Train & Test | 0.8484 | 0.9641 | 0.0446 | 0.9176
4 | NB 10-Fold | 0.8525 | 0.9644 | 0.0430 | 0.9178
4 | ANADA | 0.9336 | 0.9666 | 0.0159 | 0.9148
Fig. 1. Performance Comparison of ANADA with NB and NB 10-Fold (Kyoto 2006+ dataset)
Table 3 shows that the DR and accuracy of ANADA are higher in all cases. The F-Score of ANADA is also higher in all cases except test case 4, where it is marginally lower. The False Alarm Rate (FAR) of ANADA is lower than that of NB train & test and NB 10-fold cross-validation in all cases, which supports the usability of the algorithm.
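The comparison above can be checked mechanically against the Table 3 values. The short Python script below copies the numbers from the table. For each test case it reports the measures on which ANADA does not beat both NB variants, treating higher as better for DR, accuracy and F-Score and lower as better for FAR. The data layout and function name are illustrative choices.

```python
# Values copied from Table 3: (detection rate, accuracy, false alarm rate, F-score)
TABLE3 = {
    1: {"NB": (0.7229, 0.9616, 0.0426, 0.8388), "NB10": (0.7223, 0.9615, 0.0423, 0.8380),
        "ANADA": (0.8861, 0.9773, 0.0127, 0.8866)},
    2: {"NB": (0.8499, 0.9646, 0.0441, 0.9187), "NB10": (0.8512, 0.9642, 0.0435, 0.9175),
        "ANADA": (0.9402, 0.9750, 0.0149, 0.9373)},
    3: {"NB": (0.7244, 0.9619, 0.0422, 0.8398), "NB10": (0.7266, 0.9621, 0.0417, 0.8404),
        "ANADA": (0.8085, 0.9727, 0.0251, 0.8744)},
    4: {"NB": (0.8484, 0.9641, 0.0446, 0.9176), "NB10": (0.8525, 0.9644, 0.0430, 0.9178),
        "ANADA": (0.9336, 0.9666, 0.0159, 0.9148)},
}
NAMES = ("DR", "Accuracy", "FAR", "F-Score")
HIGHER_IS_BETTER = (True, True, False, True)  # FAR: lower is better


def anada_wins(case: int) -> dict:
    """For one test case, does ANADA beat both NB variants on each measure?"""
    rows = TABLE3[case]
    result = {}
    for j, name in enumerate(NAMES):
        nb_vals = (rows["NB"][j], rows["NB10"][j])
        a = rows["ANADA"][j]
        result[name] = a > max(nb_vals) if HIGHER_IS_BETTER[j] else a < min(nb_vals)
    return result


for case in TABLE3:
    losses = [m for m, won in anada_wins(case).items() if not won]
    print(case, losses or "better on all four")
# 1 better on all four
# 2 better on all four
# 3 better on all four
# 4 ['F-Score']
```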
6 CONCLUSIONS & FUTURE WORK
In this study, a novel adaptive algorithm has been proposed. The proposed method uses a labeled dataset for training but can adapt and learn by itself and can detect new attacks. The performance of the algorithm can be improved further by combining it with feature weights. The algorithm also has good potential for parallelization. Future work will focus on parallelizing it on GPGPU processors to achieve performance gains, as energy efficiency has become a prime concern for the computer industry. Different sensors for different protocol types can be used for further performance improvements. The authors are working on improving the algorithm and adapting it for flow-based anomaly detection.
References
1. Münz, G., Li, S., Carle, G.: Traffic Anomaly Detection Using K-Means Clustering. In: GI/ITG Workshop MMBnet (September 2007)
2. Ihsan, Z., Idris, M.Y., Abdullah, A.H.: Attribute Normalization Techniques and Performance of Intrusion Classifiers: A Comparative Analysis. Life Science Journal 10(4), 2568-2576 (2013)
3. The UCI KDD Archive: KDD Cup 1999 Data. Information and Computer Science, University of California, Irvine (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 2 February 2014
4. Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation. In: Proceedings of the 1st Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, Salzburg, 10-13 April 2011, pp. 29-36. ACM (2011). http://dx.doi.org/10.1145/1978672.1978676
5. MIT Lincoln Laboratory, Information Systems Technology Group: The 1998 Intrusion Detection Off-Line Evaluation Plan (1998). http://www.ll.mit.edu/ideval/files/id98-eval-ll.txt
6. Ammar, A.: Comparison of Feature Reduction Techniques for Binominal Classification of Network Traffic. Journal of Data Analysis and Information Processing (2015). http://dx.doi.org/10.4236/jdaip.2015.32002
7. Chavez, A.R., Hamlet, J., Lee, E., Martin, M., Stout, W.: Network Randomization and Dynamic Defence for Critical Infrastructure Systems. Sandia National Laboratories, New Mexico, SAN2015-3324 (2015)
8. Wang, W., Zhang, X., Gombault, S., Knapskog, S.J.: Attribute Normalization in Network Intrusion Detection. In: 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN), 14 December 2009, pp. 448-453. IEEE (2009)
9. Thomas, C.: Performance Enhancement of Intrusion Detection Systems using Advances in Sensor Fusion. PhD Thesis, Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India (2009)
10. Gaffney Jr., J.E., Ulvila, J.W.: Evaluation of Intrusion Detectors: A Decision Theory Approach. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy (S&P 2001), pp. 50-61. IEEE (2001)
11. Laskov, P., Düssel, P., Schäfer, C., Rieck, K.: Learning Intrusion Detection: Supervised or Unsupervised? In: Image Analysis and Processing - ICIAP 2005, pp. 50-57. Springer, Berlin Heidelberg (2005)
12. https://www.sans.org/reading-room/whitepapers/detection/intruion-detection-systems-definition-chaallenges-343. Accessed 06-01-2016
13. Mukherjee, S., Sharma, N.: Intrusion Detection Using Naive Bayes Classifier with Feature Reduction. Procedia Technology 4, 119-128 (2012)
14. Amor, N.B., Benferhat, S., Elouedi, Z.: Naive Bayes vs Decision Trees in Intrusion Detection Systems. In: Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 420-424 (2004)
15. Jianliang, M., Haikun, S., Ling, B.: The Application on Intrusion Detection Based on K-means Cluster Algorithm. In: International Forum on Information Technology and Applications (IFITA'09), pp. 150-152 (2009)
16. Randeep, B., Sharma, N.: A Novel Density Based K-Means Clustering Algorithm for Intrusion Detection. Journal of Network Communications and Emerging Technologies 3(3), 17-22 (2015)
17. Sharma, S.K., Pandey, P., Tiwari, S.K., Sisodia, M.S.: An Improved Network Intrusion Detection Technique based on K-means Clustering via Naïve Bayes Classification. In: 2012 International Conference on Advances in Engineering, Science and Management (ICAESM), 30-31 March 2012. IEEE, Piscataway, NJ (2012)
18. Mokarian, A., Faraahi, A., Ghorbannia Delavar, A.: False Positives Reduction Techniques in Intrusion Detection Systems - A Review. International Journal of Computer Science and Network Security (IJCSNS) 13(10), 128 (2013)
19. Hussein, S.M., Mohd Ali, F.H., Kasiran, Z.: Evaluation Effectiveness of Hybrid IDS Using Snort with Naive Bayes to Detect Attacks. In: 2012 Second International Conference on Digital Information and Communication Technology and it's Applications (DICTAP). IEEE (2012)
20. Panda, M., Patra, M.R.: Network Intrusion Detection Using Naive Bayes. International Journal of Computer Science and Network Security 7(12), 258-263 (2007)
21. Jain, M., Richariya, V.: An Improved Techniques Based on Naïve Bayesian for Attack Detection. International Journal of Emerging Technology and Advanced Engineering 2(1), 324-331 (2012)
22. Muda, Z., Yassin, W., Sulaiman, M.N., Udzir, N.I.: A K-Means and Naive Bayes Learning Approach for Better Intrusion Detection. Information Technology Journal 10(3), 648-655 (2011)