Analysis and Design For Intrusion Detection System IDS Using Data Mining IEEE 2010
[Figure: system block diagram. Labels: sensors (Sensor-1, Sensor-2, ..., Sensor-m and Sensor-1, Sensor-2, ..., Sensor-n), Host Sensor, Network Sensor Manager, Host Sensor Manager, Analysis Engine (Misuse Detector, Anomaly Detector, Pattern Mining, Mining Algorithm Library), Alarm Message, and Alarm System (Alarm Manager, Alarm Strategy, System Protection Strategy, Intruder Tracing, Archive Information).]
C. Analysis Engine
The Analysis Engine consists of three parts: the Network/Host
Sensor Manager, the Misuse and Anomaly Detector, and the Mining
Algorithm Library with Pattern Mining.
1) The Sensor Manager receives data from the sensors,
analyses the data, translates them into database records,
and stores them in the data warehouse.
2) The Misuse and Anomaly Detector detects intrusions by
matching against the patterns stored in the data warehouse.
Traditional IDSs fall into two separate types: misuse
detection and anomaly detection. Anomaly detection, also known
as behavior-based detection, builds behavioral models of users
under normal circumstances during a learning phase, then
compares the current user behavior with these models and
reports an intrusion if the deviation exceeds a credibility
threshold. The basic principle is that any behavior that is not
consistent with the known normal behaviors is treated as an
intrusion.
Misuse detection, also called knowledge-based intrusion
detection, builds intrusion patterns for the known intrusions,
then matches the current user behaviors and system status
against these intrusion patterns. The basic principle is that
any behavior that is consistent with the known intrusion
behaviors is treated as an intrusion.
We integrate these two models into the hybrid IDS and thus
formulate new basic principles of intrusion detection: a
behavior is normal if it is consistent with the normal
behavior model; a behavior is an intrusion if it is
consistent with the anomaly behavior model; and any other
behavior is passed to the Pattern Mining module, which uses the
Mining Algorithm Library to generate a new detection model and
adds it to the detection models in the data warehouse. When
comparing an unknown behavior with the normal or anomaly
behavior model, the detectors decide whether the behavior is
normal or anomalous by comparing the calculated support and
confidence with a given minimum support and minimum confidence.
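The three-way decision described above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the model representation (a mapping from an item pattern to its support and confidence), the names `matches` and `classify_behavior`, and the rule that a behavior matches a model when it contains one of the model's patterns are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: pattern -> (support, confidence)
Model = Dict[FrozenSet[str], Tuple[float, float]]


def matches(behavior: FrozenSet[str], model: Model,
            min_support: float, min_confidence: float) -> bool:
    """True if the behavior contains a pattern whose support and
    confidence both reach the given minimums (assumed matching rule)."""
    return any(
        pattern <= behavior and s >= min_support and c >= min_confidence
        for pattern, (s, c) in model.items()
    )


def classify_behavior(behavior: FrozenSet[str], normal_model: Model,
                      anomaly_model: Model, min_support: float = 0.05,
                      min_confidence: float = 1.0) -> str:
    """Return 'normal', 'intrusion', or 'unknown'.

    'unknown' behaviors are the ones that would be handed to the
    Pattern Mining module to build a new detection model.
    """
    if matches(behavior, normal_model, min_support, min_confidence):
        return "normal"
    if matches(behavior, anomaly_model, min_support, min_confidence):
        return "intrusion"
    return "unknown"
```

Checking the normal model first is a design choice of this sketch; the text does not state an order of evaluation.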
3) The Mining Algorithm Library and Pattern Mining are used
to mine unknown intrusions.
From the point of view of the data warehouse, data mining can be
regarded as an advanced stage of online analytical processing
(OLAP). We apply data mining technology to the IDS, using its
association analysis and sequential pattern analysis algorithms
to extract security-related features, generate classification
models based on them, and identify security incidents
automatically. The analytical methods of data mining can be
divided into three parts:
a) Association analysis
Its purpose is to uncover hidden relationships among the
data. Based on the correlation among a set of items, association
analysis can be used to identify correlations between
intrusion behaviors.
The basic definitions of association analysis are as follows:
Let $I=\{i_1, i_2, \ldots, i_m\}$ be a set of binary attributes, whose
elements are referred to as items. Let D be a collection
of transactions T, where each transaction T is a set of items and $T \subseteq I$.
Let X be a set of items in I; if $X \subseteq T$, then
transaction T contains X.
An associational rule is an implication of the form $X \Rightarrow Y$,
where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The support of rule $X \Rightarrow Y$ in
the transaction set D is the ratio of the number of transactions
that contain both X and Y to the number of all
transactions, denoted by Support($X \Rightarrow Y$), that is:
$\mathrm{Support}(X \Rightarrow Y) = |\{T : X \cup Y \subseteq T, T \in D\}| / |D|$
The confidence of rule $X \Rightarrow Y$ in the transaction set D is
the ratio of the number of transactions that contain both X and Y
to the number of transactions that contain X,
denoted by Confidence($X \Rightarrow Y$), that is:
$\mathrm{Confidence}(X \Rightarrow Y) = |\{T : X \cup Y \subseteq T, T \in D\}| / |\{T : X \subseteq T, T \in D\}|$
Given a transaction set D, the task of association analysis
is to generate the associational rules whose support and
confidence are at least the minimum support (minsupp) and the
minimum confidence (minconf), respectively, as specified by the
user.
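The support and confidence definitions can be checked on a toy transaction set. The Python sketch below is illustrative only: the function names and the sample connection records are invented for the example and are not the paper's data.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], x: Transaction, y: Transaction) -> float:
    """Fraction of transactions that contain every item of X and of Y."""
    xy = x | y
    return sum(1 for t in transactions if xy <= t) / len(transactions)


def confidence(transactions: List[Transaction], x: Transaction, y: Transaction) -> float:
    """Among transactions that contain X, the fraction that also contain Y."""
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    return n_xy / n_x if n_x else 0.0


# Toy connection records (made up for illustration)
toy = [
    frozenset({"192.168.7.13", "80", "sf", "normal"}),
    frozenset({"192.168.7.13", "80", "sf", "normal"}),
    frozenset({"192.168.4.16", "25", "sf", "normal"}),
    frozenset({"192.168.7.13", "80", "rej", "attack"}),
]
left = frozenset({"192.168.7.13", "80", "sf"})
right = frozenset({"normal"})
print(support(toy, left, right), confidence(toy, left, right))
```

A rule would be kept only if both values reach the user-supplied minsupp and minconf.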
Agrawal et al. designed a basic algorithm (Apriori) in 1993.
In recent years, the algorithm has been improved
considerably. This project applies the latest algorithms
for pattern mining.
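For orientation, the level-wise idea behind Apriori can be sketched in a few lines of Python. This is a textbook-style sketch of basic frequent-itemset mining, not the improved algorithms the project applied; the function name `apriori` and the toy data are assumptions for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Count the candidates and keep those that reach min_support.
        level = {}
        for cand in current:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


toy = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
frequent_sets = apriori(toy, min_support=0.5)
```

Rules $X \Rightarrow Y$ are then generated from each frequent itemset and filtered by minimum confidence.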
b) Sequence pattern analysis
Like association analysis, its purpose is to uncover
relationships among the data, but it focuses on the
temporal context (ordering) of the data. Many hacker intrusions
have such context, in that some actions must occur after others. For
example, a hacker generally scans the system ports before attacking.
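The ordering idea can be illustrated with a small Python check that tests whether an event stream contains a pattern as an ordered (not necessarily contiguous) subsequence. The event names and the function `contains_sequence` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Sequence


def contains_sequence(events: Iterable[str], pattern: Sequence[str]) -> bool:
    """True if the pattern's steps all occur in events, in the same order."""
    it = iter(events)
    # `step in it` advances the iterator, so each step must be found
    # after the position where the previous step was found.
    return all(step in it for step in pattern)


session = ["login", "port_scan", "idle", "exploit"]
scan_then_attack = ["port_scan", "exploit"]
print(contains_sequence(session, scan_then_attack))
```

Counting the sessions for which such a check succeeds gives the support of a sequential pattern, in the same way that support is defined for associational rules.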
c) Classification analysis
Given a collection of records and a set of tags, where each tag
denotes a category with distinct characteristics, we assign a tag
to each record, that is, we classify the records by tag. We then
examine the tagged records and describe their characteristics. For
example, intrusions can be divided into three categories according
to how harmful the attack is: the fatal intrusion, the general
intrusion, and the weak intrusion. Classification analysis
examines previous attacks, classifies each by risk level, and
then describes each category according to the classification
standards.
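One way to realize this kind of classification is a naive Bayes classifier over categorical connection-record attributes. The Python sketch below is a minimal illustration under stated assumptions: priors are estimated as the fraction of training records in each category, per-attribute likelihoods as relative frequencies within the category, and no smoothing is applied. The function names and the toy records are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def train_naive_bayes(records: List[Sequence[str]], tags: List[str]):
    """Count records per category and attribute values per category."""
    class_counts = Counter(tags)                    # records per category
    value_counts: Dict[str, Counter] = defaultdict(Counter)
    for x, c in zip(records, tags):
        for k, v in enumerate(x):
            value_counts[c][(k, v)] += 1            # records with value v at attribute k
    return class_counts, value_counts


def classify_record(x: Sequence[str], class_counts: Counter,
                    value_counts: Dict[str, Counter]) -> str:
    """Return the category that maximizes P(X|C) * P(C)."""
    total = sum(class_counts.values())
    best_tag, best_score = None, -1.0
    for c, s_i in class_counts.items():
        score = s_i / total                          # prior P(C)
        for k, v in enumerate(x):
            score *= value_counts[c][(k, v)] / s_i   # likelihood P(x_k | C)
        if score > best_score:
            best_tag, best_score = c, score
    return best_tag


# Toy training data (made up): protocol, port, flag
records = [("tcp", "80", "sf"), ("tcp", "80", "sf"), ("tcp", "25", "sf"),
           ("tcp", "80", "rej"), ("udp", "53", "rej")]
tags = ["normal", "normal", "normal", "intrusion", "intrusion"]
model = train_naive_bayes(records, tags)
```

Without smoothing, a single unseen attribute value drives a category's score to zero; a practical classifier would usually add Laplace smoothing.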
The Bayesian classification algorithm is as follows:
Each connection record is described by an n-dimensional
feature vector $X=(x_1, x_2, \ldots, x_n)$, where the n attributes
describe the characteristics of the connection record.
Assume that there are m categories $C_1, C_2, \ldots, C_m$. Given
an unknown connection record X (that is, one with no tag), the
classifier predicts that X belongs to the category with the highest
posterior probability; namely, the Bayesian classifier assigns the
unknown connection record X to the category $C_i$ if and only if
$P(C_i|X) > P(C_j|X)$ for $1 \le j \le m$, $j \ne i$. According to Bayes' theorem,
$P(C_i|X) = P(X|C_i)P(C_i)/P(X)$.
Since P(X) is the same constant for every category, it suffices to
find the greatest value of $P(X|C_i)P(C_i)$. The prior probability of a
category is $P(C_i)=s_i/s$, where $s_i$ is the number of connection
records in the category $C_i$ and $s$ is the total number of
connection records.
To reduce the overhead of calculating $P(X|C_i)$, the attributes are
assumed to be independent given the category, so that
$P(X|C_i)=\prod_{k=1}^{n} P(x_k|C_i)$, where $P(x_k|C_i)=s_{ik}/s_i$, $s_{ik}$ is the
number of connection records in the category $C_i$ that have the
value $x_k$ for the k-th attribute, and $s_i$ is the number of
connection records in the category $C_i$.
To classify an unknown connection record X, we calculate
$P(X|C_i)P(C_i)$ for each category $C_i$ and assign X to the category
$C_i$ if and only if $P(X|C_i)P(C_i) > P(X|C_j)P(C_j)$ for $1 \le j \le m$, $j \ne i$.
Although each of these algorithms suits different scenarios, we
use them in combination in this project to improve the
performance of the IDS.
D. Alarm system
The main function of the alarm system is to take emergency
measures based on the alarm strategies, such as appropriate
system protection, archiving, and intrusion tracing when
necessary. The details are omitted here.
III. THE EXPERIMENT FOR ASSOCIATION ANALYSIS
A. The design of associational rule detector
Association analysis in data mining is divided into two
parts: the learning phase of the rules and the detecting phase,
in which the learnt rules are applied.
1) In the learning phase, the Analysis Engine applies
association analysis to connection records from the Network
Sensor Manager to mine the associations between the values of
data items under the normal state of the network, and obtains
an associational rule set, which is then filtered by some
manually defined rules so that it is suitable as a detection
rule set for detecting intrusions.
2) In the detecting phase, the Analysis Engine gets the
connection records from the Network Sensor Manager and
matches them against the detection rule set to determine whether
an intrusion has taken place. This matching against the detection
rules is the association analysis of the detecting phase.
The detection rule set produced in the learning phase is the core
of the Analysis Engine. Its validity has a direct impact on
the accuracy of the detection results.
B. The experimental results for associational rules
Our experimental data come from a file of network packets
recorded with the TCPdump tool.
We convert the network packets into connection records and
save them as a text file, in which the variables are separated
by a space.
Association analysis builds the rule sets from the connection
records, with the minimum support set to 5% and the minimum
confidence set to 100%. However, the rule sets contain a large
number of useless rules, which do not express meaningful
associations between the values of connection attributes. If we
used them as a standard for monitoring network intrusions, the
system would produce misdetections. Therefore, we have to
remove the useless rules, as follows:
First, filter out the rules whose right items do not
contain the categories;
Then, filter out the rules whose left items do
not contain the IP and ports.
After these filtering steps, the final associational rules are
closer to ideal. A few of the associational rules
are shown in Table I.
TABLE I. ASSOCIATIONAL RULES
Left items of rule               Right items of rule   Support (%)   Confidence (%)
192.168.7.13 80 sf               normal                7.8           100.0
192.168.4.16 25 passive exter    normal                8.3           100.0
192.168.2.10 80 active           normal                23.2          100.0
192.168.4.18 tcp 25 sf           normal                8.5           100.0
192.168.7.23 80 active           normal                19.4          100.0
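The two filtering steps and the detecting phase described in this section can be sketched in Python as follows. This is an illustration under explicit assumptions, not the authors' code: the `Rule` type, the category set `CATEGORIES`, the way an IP address and a port are recognized among the left items, and the convention that a record matching no normal rule is flagged as a possible intrusion are all hypothetical.

```python
import re
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    left: FrozenSet[str]
    right: FrozenSet[str]
    support: float
    confidence: float


CATEGORIES = {"normal"}                       # assumed category labels
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def has_ip_and_port(items: FrozenSet[str]) -> bool:
    """Assumed test: one item looks like an IPv4 address, one like a port."""
    has_ip = any(IP_RE.match(i) for i in items)
    has_port = any(i.isdigit() for i in items)
    return has_ip and has_port


def filter_rules(rules: List[Rule]) -> List[Rule]:
    """Keep rules whose right side names a category and whose left side
    contains both an IP address and a port."""
    return [r for r in rules
            if (r.right & CATEGORIES) and has_ip_and_port(r.left)]


def is_possible_intrusion(record: FrozenSet[str], detection_rules: List[Rule]) -> bool:
    """Assumed detecting-phase rule: flag a record that matches no normal rule."""
    return not any(r.left <= record for r in detection_rules)


mined = [
    Rule(frozenset({"192.168.7.13", "80", "sf"}), frozenset({"normal"}), 0.078, 1.0),
    Rule(frozenset({"tcp", "sf"}), frozenset({"normal"}), 0.50, 1.0),
    Rule(frozenset({"192.168.2.10", "80"}), frozenset({"active"}), 0.20, 1.0),
]
detection_rules = filter_rules(mined)
```

In this toy run only the first rule survives filtering: the second has no IP address or port on its left side, and the third has no category on its right side.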
IV. CONCLUSIONS
The hybrid IDS can detect both known and unknown intrusions
efficiently. Intrusion detection based on data mining is an
active research topic at home and abroad. There are still a
series of theoretical and practical problems to be resolved,
and a number of key technologies require further in-depth
study. The experiment shows that designing and implementing an
efficient and accurate IDS based on data mining is a large,
complex project.
When applying data mining algorithms to the original
connection records, the key question is how to obtain the
corresponding frequent patterns effectively. In the future, we
will focus on how to select appropriate and representative
original data and how to filter useless rules precisely.
ACKNOWLEDGMENT
The work has been supported by the Natural Science
Foundation of Zhejiang Province, China (No. Y1080343).