Performance Comparison and Implementation of Bayesian Variants For Network Intrusion Detection
Abstract—Bayesian classifiers perform well when each of the features is completely independent of the others, which is not always valid in real-world applications. The aim of this study is to implement and compare the performance of each variant of the Bayesian classifier (Multinomial, Bernoulli, and Gaussian) on anomaly detection in network intrusion, and to investigate whether there is any association between each variant's assumption and its performance. Our investigation showed that each variant of the Bayesian algorithm blindly follows its assumption regardless of feature properties, and that this assumption is the single most important factor influencing its accuracy. Experimental results show that Bernoulli has an accuracy of 69.9% on test data (71% on training data), Multinomial has an accuracy of 31.2% on test data (31.2% on training data), while Gaussian has an accuracy of 81.69% on test data (82.84% on training data). Going deeper, we found that each variant's performance and accuracy are largely due to its assumption: the Gaussian classifier performed best on anomaly detection because it assumes that features follow continuous normal distributions, while the Multinomial classifier performed dismally because it simply assumes a discrete, multinomial distribution.

Keywords—anomaly detection, multinomial bayes, bernoulli bayes, gaussian bayes, Bayesian classifier, intrusion detection
I. INTRODUCTION

Security is indispensable and crucial in the modern information technology framework [2], [4], [5], [16], [18], and so we have had to grapple with the fact that there is no perfect system: no matter how sophisticated or state of the art a system may be, it can be attacked and compromised. With hackers constantly coming up with ever-changing, innovative, and highly sophisticated ways to compromise systems, focus has shifted to making state-of-the-art systems extremely complicated and tedious to compromise, since there cannot be a perfect system. Before any system can be compromised, there must be an intrusion for any damage to occur [1], [9], [12], [15]. It is one thing for a system to be intruded upon; it is another for the intrusion to be immediately detected and dealt with before any compromise is made. An intrusion that lasts about fifteen (15) milliseconds before being dealt with by a combination of machine learning (to accurately detect the actual intrusion) and game theory (changing parameters and configurations to prevent further attack) gives an insight into system perfection.

Naïve Bayesian algorithms are some of the most important classifiers used for prediction. Bayesian classifiers are based on probability, with the general assumption that all features are independent of each other, which does not usually hold in the real world; this assumption accounts for why the Naïve Bayes algorithm performs poorly on certain classification tasks. In addition to this general assumption, each variant of the Naïve Bayes classifier, namely:

i. Multinomial Naïve Bayes
ii. Bernoulli Naïve Bayes
iii. Gaussian Naïve Bayes

has its own assumption, which impacts its efficiency and accuracy on certain tasks. Virtually all existing comparisons and evaluations of Bayesian classifiers against other algorithms are made without acknowledging that each variant works on a different assumption, which affects its efficiency and accuracy depending on the type of classification. Since each Bayesian variant performs differently, it is interesting to understand how each algorithm performs on an intrusion dataset, and to understand why some intrusions are not detected by a model until the system is compromised when the wrong Bayesian variant is adopted.

The main contributions of our research are stated below.

• We showed that the Gaussian Naïve Bayes algorithm performed best among the three variants of the Bayesian algorithm on anomaly detection in network intrusion, in terms of efficiency and accuracy on the KDD dataset, followed by Bernoulli with 69.9% test accuracy, while Multinomial had an abysmal performance with 31.2% accuracy.

• Our investigation also shows that each Bayesian algorithm works based on its assumption regardless of the data. Gaussian Naïve Bayes performed better on anomaly detection in network intrusion because of its assumption that features follow continuous normal distributions, so we are sure that the algorithm factored in all target categories.
II. RELATED WORK

a. MULTINOMIAL NAÏVE BAYES

In multinomial Naïve Bayes, features are assumed to come from a multinomial distribution [3], [6], [17], which gives the probability of observing counts across a number of categories [13], [14]; this makes it very effective for prediction when the feature(s) are discrete and not continuous. The likelihood of an observation x given the class parameters θk is the product of the term probabilities θki, each raised to the power of the corresponding count xi:

P(x | θk) ∝ ∏i θki^xi (1)

The probability of a document d being in class c is then computed as

P(c | d) ∝ P(c) ∏1≤k≤nd P(tk | c) (2)

P(c) and P(tk | c) are estimated from the training set, as we will see in a moment. Since conditional probabilities, one for each position 1 ≤ k ≤ nd, are multiplied together, the computation can result in floating-point underflow. It is therefore better to add logarithms of probabilities instead of multiplying the actual probabilities: since log(xy) = log(x) + log(y), the class with the highest log-probability score is still the most probable, which is

cmap = argmax c∈C [log P(c) + Σ1≤k≤nd log P(tk | c)] (3)
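As a rough illustration of the log-probability decision rule in Eq. (3), the following is a minimal from-scratch sketch; the function names, Laplace smoothing choice, and toy count vectors are our own illustration, not the paper's implementation:

```python
import math

def train_multinomial_nb(docs, labels, vocab_size):
    """Estimate priors P(c) and term probabilities P(t|c) from count vectors."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    term_probs = {}
    for c in classes:
        counts = [1] * vocab_size  # Laplace smoothing avoids log(0) later
        for doc, y in zip(docs, labels):
            if y == c:
                for t, n in enumerate(doc):
                    counts[t] += n
        total = sum(counts)
        term_probs[c] = [k / total for k in counts]
    return priors, term_probs

def predict(doc, priors, term_probs):
    # Eq. (3): cmap = argmax_c [log P(c) + sum_k n_k * log P(t_k|c)]
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior) + sum(
            n * math.log(term_probs[c][t]) for t, n in enumerate(doc) if n
        )
        if score > best_score:
            best, best_score = c, score
    return best
```

Summing logarithms here, rather than multiplying raw probabilities, is exactly the underflow workaround described above: the argmax is unchanged because log is monotonic.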
This is the actual maximization done in the implementation of Naïve Bayes: maximizing the log probability to get the correct class for the input. Multinomial Naïve Bayes is good for training a model when we have discrete variables and the distribution is multinomial in nature. The multinomial assumption, coupled with the additional assumption of independence among the features, becomes a drawback when the two assumptions are not valid in the test or training data.

c. GAUSSIAN NAÏVE BAYES

As a typical Bayesian classifier, which assumes the value of each feature to be completely independent of the others, it assumes that each value in continuous data is distributed according to a Gaussian distribution [7], [8]. Hence, the probability of an individual feature is assumed to be

P(xi | y) = (1 / √(2πσy²)) exp(−(xi − μy)² / (2σy²)) (4)

in which all parameters are independent of each other. One of the simplest ways to approach this is to assume that the data has a Gaussian distribution without any covariance. All we have to do is find the mean and standard deviation of each feature for each label to form our model, with the knowledge that our variable is normally and continuously distributed over −∞ < x < +∞ and that the total area under the model curve is 1.

Figure 1. Probability under the Gaussian distribution
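The Gaussian variant's fitting procedure, a mean and variance per feature and class scored with the normal density of Eq. (4), can be sketched as below; the data, class names, and variance floor are made up for illustration, not taken from the paper:

```python
import math

def fit(X, y):
    """Per class: prior P(c) plus (mean, variance) of each continuous feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for i in range(len(X[0])):
            vals = [r[i] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9  # floor
            stats.append((mu, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def gaussian_log_pdf(x, mu, var):
    # log of Eq. (4): (1/sqrt(2*pi*var)) * exp(-(x - mu)^2 / (2*var))
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x, model):
    scores = {
        c: math.log(prior) + sum(gaussian_log_pdf(xi, mu, var)
                                 for xi, (mu, var) in zip(x, stats))
        for c, (prior, stats) in model.items()
    }
    return max(scores, key=scores.get)
```

Because Eq. (4) is a density over continuous values, no discretization of the features is needed, which is one plausible reading of why this variant suits continuous intrusion features.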
b. BERNOULLI NAÏVE BAYES

Bernoulli Naïve Bayes [5], [10], [11] assumes the data is discrete and the distribution is Bernoulli. Its main feature is the acceptance of binary values such as yes or no, true or false, 0 or 1, or success or failure as input. Its assumed discrete Bernoulli distribution is

P(x) = P[X = x] = { p if x = 1; 1 − p if x = 0 } (5)

where x can be either 0 or 1 but nothing else, which makes it suitable for binary classification; its classification rule is

P(xi | y) = p(i | y) xi + (1 − p(i | y))(1 − xi) (6)

Its computation is based on binary occurrence information (Figure 2) and so neglects the number of occurrences, or frequency; this makes Bernoulli unsuitable for certain tasks such as document classification.
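The Bernoulli rule of Eqs. (5)-(6) can be sketched in a minimal from-scratch form; the smoothing constant, function names, and toy binary data below are our own assumptions, not the paper's code:

```python
import math

def train_bernoulli_nb(X, y):
    """Estimate priors P(c) and per-feature Bernoulli parameters p(i | y=c)."""
    classes = sorted(set(y))
    n_features = len(X[0])
    priors, params = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        # Laplace-smoothed occurrence frequency of feature i within class c
        params[c] = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                     for i in range(n_features)]
    return priors, params

def predict(x, priors, params):
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for xi, p in zip(x, params[c]):
            # Eq. (6): P(xi | y) = p(i|y)*xi + (1 - p(i|y))*(1 - xi)
            score += math.log(p * xi + (1 - p) * (1 - xi))
        if score > best_score:
            best, best_score = c, score
    return best
```

Note that each input feature is only 0 or 1: a feature that occurs ten times scores the same as one that occurs once, which is the frequency-blindness discussed above.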