Autonomic Fault Management based on
Cognitive Control Loops
Sung-Su Kim¹, Sin-seok Seo¹, Joon-Myung Kang², and James Won-Ki Hong¹,³
¹ Dept. of Computer Science and Engineering, POSTECH, {kiss, sesise, jwkhong}@postech.ac.kr
² Department of Electrical and Computer Engineering, University of Toronto, joonmyung.kang@utoronto.ca
³ Division of IT Convergence and Engineering, POSTECH
Abstract— This paper presents an efficient fault management
approach based on cognitive control loops in order to support
autonomic network management for the Future Internet. The
cognitive control loops determine the urgency of network alarms,
process urgent alarms more quickly, and then infer the root causes
of problems based on learning and reasoning. We show that
correlation reduces the number of alarms and that alarm
priorities can be determined using a policy-based ontology model.
Index Terms— Autonomic Fault Management, Cognitive
Management, Alarm Correlation, Association Rule Mining
I. INTRODUCTION
The Internet is a very successful modern technology. Despite
that success, fundamental architectural and business
problems exist in its design. So far, incremental patches have
been added to address these problems. However, inherent
problems such as the shortage of IP addresses, security, and
manageability are difficult to solve incrementally.
There are two approaches to designing the Future Internet:
revolutionary and evolutionary [1], [2]. In either design,
management of the Future Internet is one of the important
topics. However, we do not yet have a clear picture of the Future
Internet, and many emerging technologies are being investigated
for it. For example, Content Centric Networking (CCN) [3] is
one of the most active research topics, and network
virtualization and autonomic networking are expected to be key
technologies for the Future Internet.
Even if a new Internet architecture replaces the current one,
the basic paradigm of network management will not change:
understand the current status of the network and take the
appropriate actions. To understand the network status, we need
to monitor network devices, links, and servers. Network
administrators have to deal with a large number of network
events and alarms; enterprise networks generate millions of
alarms per day. In cloud computing or virtualized network
environments, there will be even more events and alarms to
analyze, because alarms related to virtualized resources are
generated in addition to those from physical entities.
Existing rule-based and case-based alarm correlation
approaches require manually defined rules and cases, based on
the assumption that the managed network is stable. However,
dependencies between alarms may be missing, and manual
modification is necessary whenever the managed network
changes. For example, if the topology of the managed network
changes, the rules that depend on the topology must be changed
manually. Therefore, the dependency model needs to be updated
through learning. In addition, alarms carry information about the
serious status of network resources such as links, routers, and
switches, but this fragmentary information does not reveal the
impact of a given problem. Serious and urgent alarms need to be
detected and processed more quickly than normal alarms.
We propose an efficient fault management approach based
on a cognitive control loop, which is part of the new FOCALE
model. The cognitive control loop determines the priorities of
network alarms, processes alarms through three different control
loops, and then infers the root causes of problems based on
learning and reasoning. To evaluate our approach, we
synthetically generate alarms, then correlate and analyze them to
find root causes. In addition, we propose an ontology for
determining the priorities of alarms. Urgent cases are treated
immediately with specified actions; otherwise, possible sets of
actions are examined and the most appropriate one is selected.
In our experiment, 16 different alarms are reduced to four
clusters by using learned rules and our clustering algorithm,
which means that the effort and time required of higher-level
network managers can be reduced.
The organization of this paper is as follows. Section 2 covers
related work on the FOCALE autonomic architecture [4] for the
Future Internet and on alarm correlation. Section 3 presents the
concept of a cognitive control loop. Section 4 describes our
detailed approach for processing network alarms. Section 5
presents a case study that validates our concept and algorithm.
Finally, Section 6 presents conclusions and future work.
II. RELATED WORK
In this section, we present the FOCALE autonomic
architecture and existing alarm correlation approaches.
A. FOCALE
FOCALE [5] is an autonomic networking architecture. The
acronym FOCALE stands for Foundation – Observation –
Compare – Act – Learn – rEason, which describes its novel
control loops. Note that other autonomic approaches, such as
Since there are at least two fundamentally different operations
that the control loop is responsible for (monitoring vs.
(re)configuration), this overloads the semantics of the control
structure, because these two operations have nothing in common.
Indeed, a fault received from one managed entity might have
nothing to do with the root cause of the problem; hence, the
(re)configuration loop will affect different entities.
978-1-4673-0269-2/12/$31.00 ©2012 IEEE
FOCALE uses the DEN-ng information model [7] and the
DENON-ng ontologies [8] to translate disparate sensed data
into a common networking lingua franca. The DEN-ng
information model is currently being standardized in the
Autonomic Communications Forum (ACF); its previous
versions have already been standardized in the
TeleManagement Forum and in the ITU-T. DEN-ng is used to
represent the static characteristics and behaviors of entities;
DENON-ng is then used to augment this model with
consensual meaning and definitions, so that vendor-specific
concepts can be mapped into a common terminology. This
enables facts extracted from sensor input data to be reasoned
about using ontology-based inferencing.
B. Alarm Correlation Approaches
There are four main alarm correlation approaches: rule-based
alarm correlation [9], codebook-based alarm correlation [10],
case-based alarm correlation, and mining-based alarm
correlation [11]. However, rule-based, codebook-based, and
case-based approaches depend heavily on the expert knowledge
of skilled operators. In particular, they cannot easily reflect
dynamically changing network conditions, such as those in
wireless or overlay environments, because rules or dependency
models are built manually under the assumption that the
network is mostly stable. Mining-based alarm correlation can
detect cause-and-effect relationships between alarms [11], [12].
However, it is hard to detect relationships within a short period
of time because of its long processing time. Our method
combines both approaches: efficiency is taken from the
rule-based approach, and dynamically changing relationships
are detected by the mining-based approach.
Fig. 1 Simplified Version of the FOCALE Autonomic Architecture
III. COGNITIVE CONTROL LOOPS
FOCALE [5] control loops are self-governing, in that the system
senses changes in itself and its environment and determines the
effect of these changes on the currently active set of business
policies. As shown in Fig. 1, the FOCALE control loops [13]
operate as follows. Sensor data is retrieved from the managed
resource (e.g., a router) and fed to a model-based translation
process, which translates vendor- and device-specific data into
a normalized form in XML using the DEN-ng information
model and ontologies as reference data. This normalized data is
then analyzed to determine the current state of the managed
entity, and the current state is compared to the desired state.
To strengthen self-awareness, the new FOCALE cognition
model employs a model of human intelligence built from simple
processes that interact according to three layers, called reactive,
deliberative, and reflective [14], [15]. The cognitive processes of
the new FOCALE cognition model are shown in Fig. 2.
Since all processes use the Finite State Machine (FSM) and the
reasoner, the system can recognize when an event or a set of
events has been encountered before. Such results are stored in
short-term memory. This reactive mechanism enables many of
the computationally intensive portions of the control loop to be
bypassed, producing two “shortcuts” labeled “high priority” and
“urgent”. The deliberative process is embodied in the set of bold
arrows, which take the Observe-Normalize-Compare-Plan-
Decide-Act path. This process uses long-term memory to store
how goals are met on a context-specific basis. The reflective
process examines the different conclusions made by the set of
deliberative processes being used, and tries to predict the best
set of actions that will maximize the goals being addressed by
the system. This process uses semantic analysis to understand
why a particular context was entered and why a context change
occurred, which helps predict how to change contexts more
easily and efficiently in the future. These results are also stored
in long-term memory, so that the system better understands
contextual changes and its own reasoning, which also aids
debugging.
Fig. 2 FOCALE Cognitive Model
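The reactive shortcut can be illustrated with a small memoization sketch. It assumes, purely for illustration, that short-term memory is a dictionary keyed by the set of observed events and that the deliberative Plan-Decide path is a callback; `ReactiveShortcut`, `deliberate`, and `short_term` are hypothetical names, and FOCALE's FSM and reasoner are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

# Hypothetical: an event set is identified by the names of its events.
EventSet = FrozenSet[str]


@dataclass
class ReactiveShortcut:
    # Hypothetical stand-in for the slow deliberative (Plan-Decide) path.
    deliberate: Callable[[EventSet], List[str]]
    # Short-term memory: event sets seen before -> actions chosen for them.
    short_term: Dict[EventSet, List[str]] = field(default_factory=dict)
    # Number of times the slow path actually ran.
    deliberations: int = 0

    def handle(self, events: EventSet) -> List[str]:
        if events in self.short_term:
            # Reactive layer: the event set is recognized, so deliberation is bypassed.
            return self.short_term[events]
        self.deliberations += 1
        actions = self.deliberate(events)
        self.short_term[events] = actions  # remember the outcome for next time
        return actions


loop = ReactiveShortcut(deliberate=lambda ev: [f"diagnose {e}" for e in sorted(ev)])
first = loop.handle(frozenset({"R4 router down"}))
second = loop.handle(frozenset({"R4 router down"}))
print(first, loop.deliberations)
```

Only the first call runs the slow path; the repeated event set is answered directly from short-term memory.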
IV. AUTONOMIC FAULT MANAGEMENT
In this section, we describe how alarms are processed in a
cognitive control loop. Cognitive control loops can adapt to
changing environments through reasoning and learning. In
addition, alarms are classified by urgency so that important
problems are solved more quickly.
2012 IEEE/IFIP 4th Workshop on Management of the Future Internet (ManFI)
A. Multiple Control Flows based on the Priorities of Alarms
The cognitive control loops process network events and
alarms. At the same time, relationships between alarms are
learned in order to adapt to changing environmental conditions.
As shown in Fig. 3, multiple control loops are available
depending on the priority of an alarm.
Network alarms are correlated in the normalize phase of the new
FOCALE control loops. First, alarm information is extracted into
the form shown in Fig. 4. Typically, a single failure affects other
services and devices. Therefore, if a single failure occurs
somewhere in the network, many alarms related to the failure
are generated. Moreover, once a fault occurs, many identical
alarms are generated to report the fault until it is fixed. These
identical alarms are generalized as shown in Fig. 4, which
reduces the number of alarms. The generalized alarms are then
correlated to further reduce the number of alarms and to find
root cause alarms.
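A minimal sketch of alarm generalization, assuming an alarm is reduced to the source, destination, and type fields mentioned for Fig. 4 plus a timestamp, and that identical alarms are collapsed within fixed, non-overlapping time windows. The class and function names and the windowing scheme are our assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    src: str         # node that raised the alarm (e.g., a polling node such as N1)
    dest: str        # resource the alarm is about (e.g., WS2)
    alarm_type: str  # e.g., "HTTP unavailable"
    time: float      # seconds from the start of the trace


@dataclass
class GeneralizedAlarm:
    src: str
    dest: str
    alarm_type: str
    count: int        # how many identical alarms were collapsed
    first_seen: float
    last_seen: float


def generalize(alarms: List[Alarm], window: float) -> List[GeneralizedAlarm]:
    """Collapse identical alarms (same src, dest, and type) inside fixed time windows."""
    groups: Dict[Tuple[str, str, str, int], GeneralizedAlarm] = {}
    for a in sorted(alarms, key=lambda x: x.time):
        # Assumption: windows are fixed, non-overlapping buckets of `window` seconds.
        key = (a.src, a.dest, a.alarm_type, int(a.time // window))
        g = groups.get(key)
        if g is None:
            groups[key] = GeneralizedAlarm(a.src, a.dest, a.alarm_type, 1, a.time, a.time)
        else:
            g.count += 1
            g.last_seen = a.time
    return list(groups.values())


# Example mirroring the evaluation: N1 reports each failure five times.
raw = [Alarm("N1", "WS2", "HTTP unavailable", float(t)) for t in range(5)]
raw += [Alarm("N1", "FS2", "FTP unavailable", float(t)) for t in range(5)]
for g in generalize(raw, window=10.0):
    print(g.dest, g.alarm_type, g.count)
```

Fixed windows are the simplest choice but can split one burst across a window boundary; a sliding window keyed on the last occurrence would avoid that at the cost of extra bookkeeping.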
Fig. 4 Example of Alarm Generalization
Fig. 3 Multiple Control Flows based on a Priority
These control flows are mapped onto the new FOCALE
cognition model, as shown in Fig. 3. In the observe phase, data is
retrieved from the managed resource (e.g., by SNMP polling or
traps). Vendor-specific data is translated into a normalized form
based on the DEN-ng information model. Network alarms are
filtered and correlated in order to find root cause alarms
efficiently; in this phase, a dependency model is used to
correlate alarms. At the same time, the normalized data is fed to
the learning phase. Changing environmental conditions are
captured by learning; in particular, relationships between alarms
are detected to update the dependency model. After the alarms
are correlated in the normalize phase, the priority of each alarm
is determined by classifying it. An alarm is classified as urgent if
it causes serious performance degradation of network resources
or services. Alarm priorities are determined based on a policy. If
an alarm is urgent, a set of actions is sent to the network devices
without passing through the plan and decide phases. This is what
distinguishes the new control loops from the previous version of
the FOCALE control loops.
If an alarm has high priority, the plan phase is skipped so that
actions can be taken immediately. For a low priority alarm, the
plan phase takes a high-level behavioral specification from
humans and controls the system behavior so as to satisfy the
specification; that is, the plan phase computes all the possible
sets of actions that change the current state to a desired state.
The decide phase chooses the set of actions that maximizes a
goal. Finally, the act phase sends commands for the chosen
actions to the target network devices. Model-based translation
converts device-neutral actions into device-specific commands.
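The three control flows can be summarized as a dispatch on the alarm class. This is a hypothetical sketch that lists only the phases following classification in the normalize phase; `Priority` and `control_path` are illustrative names.

```python
from enum import Enum
from typing import List


class Priority(Enum):
    URGENT = "urgent"
    HIGH = "high"
    LOW = "low"


def control_path(priority: Priority) -> List[str]:
    """Phases an alarm passes through after it has been classified in the normalize phase."""
    if priority is Priority.URGENT:
        return ["act"]                # plan and decide are both bypassed
    if priority is Priority.HIGH:
        return ["decide", "act"]      # only the plan phase is skipped
    return ["plan", "decide", "act"]  # full deliberative path for low priority alarms


for p in Priority:
    print(p.value, "->", " -> ".join(control_path(p)))
```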
Fig. 5 An Example of a Dependency Model
Alarm correlation relies on a basic dependency model, to which
the association rules detected in the learning process are added.
As mentioned above, the learning process learns relationships
between alarms. Fig. 5 shows a basic, manually defined
dependency model based on the TCP/IP model, in which a lower
layer problem affects higher layers. For example, if a server link
is down, the IP layer is also unavailable. Initially, the manually
defined dependency model is used; additional rules learned from
association rule mining are then added to it to adapt to changing
environmental conditions.
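A hedged sketch of a dependency model that starts from manually defined layer dependencies and is then extended with learned rules. The alarm names and the `DependencyModel` class are hypothetical; the model in Fig. 5 may use different entities.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class DependencyModel:
    """Directed cause -> effect graph between alarms (hypothetical sketch)."""

    def __init__(self) -> None:
        self.effects: Dict[str, Set[str]] = defaultdict(set)

    def add_rule(self, cause: str, effect: str) -> None:
        self.effects[cause].add(effect)

    def add_learned_rules(self, rules: Iterable[Tuple[str, str]]) -> None:
        # Rules mined from alarm history become extra edges of the same graph.
        for cause, effect in rules:
            self.add_rule(cause, effect)

    def implied_by(self, alarm: str) -> Set[str]:
        """Every alarm reachable from `alarm`, i.e. transitively explained by it."""
        seen: Set[str] = set()
        stack = [alarm]
        while stack:
            node = stack.pop()
            for nxt in self.effects.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


model = DependencyModel()
# Manually defined base model following the TCP/IP layers (hypothetical alarm names).
model.add_rule("WS1 link down", "WS1 IP down")
model.add_rule("WS1 IP down", "WS1 port down")
model.add_rule("WS1 port down", "WS1 HTTP unavailable")
# A learned rule (A3 -> A4 in Table 2) adds a dependency the base model lacked.
model.add_learned_rules([("R0-R3 link down", "N4->WS1 HTTP unavailable")])
print(sorted(model.implied_by("WS1 link down")))
```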
Algorithm 1 describes how to build a set of clusters based on
the association rules. Initially, alarms are generalized and
grouped by alarm ID, which means that each alarm forms a
single cluster by itself in the initial phase. Then, all the
association rules are examined one by one and the
corresponding clusters are merged. In this way, each cluster
contains both alarms and relationship information, so root
cause alarms can be analyzed easily. Based on the relationships
between alarms belonging to the same cluster, root causes can
be inferred.
Algorithm 1. Alarm Clustering
Input: A set E of alarms (a1, a2, …)
       A set R of association rules (r1, r2, …)
Output: A set C of clusters (c1, c2, …)
1: C = group the set E by identical alarm ID
2: n = count(R)
3: for i = 1 to n
4:    rule ri is of the form aj → ak
5:    find cluster cl including alarm aj
6:    find cluster cm including alarm ak
7:    merge cl and cm into cl
8:    put the association rule ri into cl
9: end for
10: return C
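Algorithm 1 can be written as runnable Python under two assumptions of ours: alarms are identified by alarm ID strings, and a rule whose alarms were not observed is skipped. The `root_causes` helper adds a simple heuristic that is not part of Algorithm 1: within a cluster, alarms that appear as a rule antecedent but never as a consequent are reported as root cause candidates.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

Rule = Tuple[str, str]  # (a_j, a_k) stands for the association rule a_j -> a_k


@dataclass
class Cluster:
    alarms: Set[str]
    rules: List[Rule] = field(default_factory=list)


def cluster_alarms(alarm_ids: Iterable[str], rules: Iterable[Rule]) -> List[Cluster]:
    # Line 1: identical alarm IDs are grouped, so each distinct ID starts as its own cluster.
    slots: List[Optional[Cluster]] = []
    owner: Dict[str, int] = {}  # alarm ID -> index of the cluster that currently holds it
    for aid in alarm_ids:
        if aid not in owner:
            owner[aid] = len(slots)
            slots.append(Cluster({aid}))
    # Lines 3-8: examine the rules one by one and merge the clusters they connect.
    for aj, ak in rules:
        if aj not in owner or ak not in owner:
            continue  # assumption: rules about alarms that were not observed are skipped
        left, right = owner[aj], owner[ak]
        cl = slots[left]
        if left != right:
            cm = slots[right]
            cl.alarms |= cm.alarms  # merge c_m into c_l
            cl.rules.extend(cm.rules)
            for aid in cm.alarms:
                owner[aid] = left
            slots[right] = None
        cl.rules.append((aj, ak))  # line 8: keep the rule with the cluster
    return [c for c in slots if c is not None]


def root_causes(cluster: Cluster) -> Set[str]:
    # Heuristic (not part of Algorithm 1): antecedents that are never consequents.
    effects = {ak for _, ak in cluster.rules}
    causes = {aj for aj, _ in cluster.rules if aj not in effects}
    return causes or set(cluster.alarms)


clusters = cluster_alarms(["A3", "A4", "A4", "A5", "A6", "A7"],
                          [("A3", "A4"), ("A3", "A6"), ("A3", "A7")])
for c in clusters:
    print(sorted(c.alarms), sorted(root_causes(c)))
```

With three of the rules from Table 2, A3, A4, A6, and A7 end up in one cluster whose root cause candidate is A3, while A5 stays in a cluster of its own.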
B. Association Rule Mining
We can use various machine learning techniques for
inference. For efficient alarm correlation, it is extremely
important to find relationships between alarms. In this paper,
association rule mining is used to find cause-and-effect
relationships between alarms.
Table 1. Alarm transaction data sets
TID   Transaction item sets
1     A1, A2, A4, A5
2     A1, A4, A5
3     A2, A3, A4, A5
4     A1, A2, A4
5     A1, A3, A5
The transaction database shown in Table 1 is built from the alarm
data of the managed network after preprocessing. Each
transaction in the database has a unique transaction ID and
contains a subset of the items. A rule is defined as an
implication of the form X → Y, where X, Y ⊆ I (the set of all
items) and X ∩ Y = ∅. The Apriori association rule algorithm
basically has two steps: the first finds all frequent item sets in a
data set by applying min_sup (the minimum support threshold),
and the second generates association rules from the frequent
item sets. For an item set X, the support sup(X) is defined as the
proportion of transactions in the data set that contain X, as given
in Equation (1). In Table 1, the support of {A1} is 4/5 (80%).
We assume that the default value of min_sup is 10%; since the
support of {A1} is greater than min_sup, rules related to A1
should be found.
$$\mathrm{sup}(X) = \frac{\mathrm{count}(X)}{\text{the number of total transactions}} \times 100\ (\%) \tag{1}$$
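To make the two Apriori steps concrete, the following plain-Python sketch computes support as in Equation (1), finds frequent item sets level by level, and generates rules using the standard confidence measure conf(X → Y) = sup(X ∪ Y) / sup(X). It runs on the transactions of Table 1 and uses exact fractions instead of percentages. It is an illustration only: the experiments used Weka, and the function names here are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Transactions of Table 1, one set of alarm IDs per TID.
TRANSACTIONS: List[Set[str]] = [
    {"A1", "A2", "A4", "A5"},
    {"A1", "A4", "A5"},
    {"A2", "A3", "A4", "A5"},
    {"A1", "A2", "A4"},
    {"A1", "A3", "A5"},
]


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> Fraction:
    """Equation (1) as an exact fraction (multiply by 100 for a percentage)."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def frequent_itemsets(transactions: List[Set[str]],
                      min_sup: float) -> Dict[FrozenSet[str], Fraction]:
    """Apriori step 1: level-wise search for every item set with support >= min_sup."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support([i], transactions) >= min_sup]
    result = {s: support(s, transactions) for s in level}
    k = 2
    while level:
        prev = set(level)
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must already be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c, transactions) >= min_sup]
        result.update({c: support(c, transactions) for c in level})
        k += 1
    return result


def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> Fraction:
    """conf(X -> Y) = sup(X u Y) / sup(X)."""
    return support(x | y, transactions) / support(x, transactions)


def mine_rules(frequent: Dict[FrozenSet[str], Fraction], transactions: List[Set[str]],
               min_conf: float) -> List[Tuple[FrozenSet[str], FrozenSet[str], Fraction]]:
    """Apriori step 2: split each frequent item set into X -> Y with conf >= min_conf."""
    found = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                c = confidence(x, y, transactions)
                if c >= min_conf:
                    found.append((x, y, c))
    return found


print(support({"A1"}, TRANSACTIONS))             # 4/5
print(confidence({"A1"}, {"A4"}, TRANSACTIONS))  # 3/4
frequent = frequent_itemsets(TRANSACTIONS, min_sup=0.5)
print(len(frequent), len(mine_rules(frequent, TRANSACTIONS, min_conf=0.75)))
```

With min_sup = 0.5 and a minimum confidence of 0.75, this yields eight frequent item sets and eight rules from Table 1, including A1 → A4 with confidence 3/4, matching the 75% value stated in the text.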
Confidence of the rule X → Y is defined in Equation (2). In
Table 1, conf(A1 → A4) = sup(A1 ∪ A4) / sup(A1) = 60% / 80% = 75%.
Frequent item sets and the minimum confidence constraint are
used to form rules.
$$\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{sup}(X \cup Y)}{\mathrm{sup}(X)} \times 100\ (\%) \tag{2}$$
C. Determination of Alarm Priorities
One of the most important features of the cognitive control
loops is that alarms are handled differently based on their
priorities, so that urgent alarms can be processed faster than
normal alarms. Classifying urgent alarms depends on the goals
and policy of a network. We defined an ontology based on the
DEN-ng information model to provide effective semantic
representations of network elements, alarms, and their
priorities.
Fig. 6 describes the concepts of network elements and alarms
used for determining their state and priorities. An element is a
network resource that has its own state, such as CPU utilization
or link throughput. An element provides services and notifies a
network administrator of its state. A notification can be an
alarm or an event. An alarm has a destination, a source, and a
type, as described in Fig. 4. Alarms are classified into three
classes: urgent, high priority, and low priority. Services are
likewise divided into three classes: gold, silver, and bronze,
where a gold service is the most important.
The three alarm classes are defined based on the policy of the
managed network. For example, a network administrator can
define that an alarm is classified as urgent if it may cause a
Service Level Agreement (SLA) violation of a gold service. We
use the Semantic Web Rule Language (SWRL) [18] to express
such conditional rules in the ontology.
We assume that alarms related to a gold service are urgent
and that a gold service is provided by the server WS2 in Fig. 7;
hence, alarms related to WS2 are classified as urgent. For
example, “WS2 HTTP unavailable”, “WS2 IP down”, and “WS2
port down” are urgent alarms that need to be fixed as soon as
possible. The following SWRL rules classify alarms based on
this assumption: an alarm is determined to be urgent if its
destination or its source is an element that provides a gold
service.
• Alarm(?a) ∧ hasAlarmDest(?a, ?dest) ∧ Element(?dest) ∧ providesService(?dest, ?s) ∧ GoldService(?s) → UrgentAlarm(?a)
• Alarm(?a) ∧ hasAlarmSrc(?a, ?dest) ∧ Element(?dest) ∧ providesService(?dest, ?s) ∧ GoldService(?s) → UrgentAlarm(?a)
Fig. 6 Ontological Model for Alarms and Priority
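The effect of the two SWRL rules can be mimicked in plain Python for a quick check. This sketch is not an OWL/SWRL reasoner: the facts are hypothetical dictionaries (`provides_service`, `gold_services`), the service names are invented, and every element named in `provides_service` is assumed to be an instance of Element.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    src: str   # object of hasAlarmSrc
    dest: str  # object of hasAlarmDest


def urgent_alarms(alarms: Iterable[Alarm],
                  provides_service: Dict[str, Set[str]],
                  gold_services: Set[str]) -> Set[str]:
    """Evaluate both rules: an alarm is urgent if its destination (rule 1)
    or its source (rule 2) is an element that provides a gold service."""
    def provides_gold(element: str) -> bool:
        return bool(provides_service.get(element, set()) & gold_services)
    return {a.alarm_id for a in alarms if provides_gold(a.dest) or provides_gold(a.src)}


# Hypothetical facts following the evaluation: WS2 provides the gold service.
provides = {"WS1": {"web-silver"}, "WS2": {"web-gold"}, "FS2": {"ftp-bronze"}}
gold = {"web-gold"}
alarms = [
    Alarm("A4", src="N4", dest="WS1"),  # N4->WS1 HTTP unavailable
    Alarm("A6", src="N4", dest="FS2"),  # N4->FS2 FTP unavailable
    Alarm("A7", src="N4", dest="WS2"),  # N4->WS2 HTTP unavailable
]
print(urgent_alarms(alarms, provides, gold))
```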
V. EVALUATION AND RESULTS
In this section, we describe the evaluation that validates our
proposed approach and its results. We implemented the alarm
correlation algorithm in Java and used the Weka [16] library for
association rule mining. We generated synthetic alarm data sets
for an experiment that correlates alarms and finds root causes.
Fig. 7 shows the experimental topology, which is composed of
22 nodes. Four kinds of critical alarms, “IP down”, “Link
down”, “Port down”, and “Router down”, occur randomly at
designated nodes as shown in Fig. 8. The default route from R6
to R1 is through R3-R0-R1; therefore, if the link between R3
and R0 is down, N4 cannot connect to WS1 and FS2. We
generated two synthetic alarm data sets, one for training and one
for validation. For example, the link of router R0 is down from
5 to 15 seconds, and the port of WS1 is down from 20 to 30
seconds. If “WS1 port down” occurs, N1 and N8 generate the
alarm “WS1 HTTP unavailable”. We assumed that N1 and N8
periodically poll the states of all the servers in the network.
Fig. 7 Experimental Topology
Fig. 8 Synthetically Generated Alarms from Each Device
Based on the generated synthetic alarm data set, cause-and-effect
relationships between alarms are detected. Table 2 shows some
of the alarms, their alarm IDs, and the detected association rules.
A3→A4 and A3→A6 mean that the services on WS1 and FS2
become unavailable if the R0-R3 link goes down. Based on these
rules, when A3, A4, and A6 alarms are generated, A3 is
identified as the root cause alarm. There are thousands of
alarms in enterprise networks [17], so a large number of rules
can be detected.
Table 2. Alarms and Association Rules Detected
Alarm ID   Alarm                        Rules
A3         R0-R3 link down              A3→A4
A4         N4→WS1 HTTP unavailable      A3→A6
A5         N1→WS1 HTTP unavailable      A8→A6
A6         N4→FS2 FTP unavailable       A9→A6
A7         N4→WS2 HTTP unavailable      A3→A7
A8         N1→FS2 FTP unavailable
A9         FS2 IP down
Then, all the critical alarms described in Fig. 8 are generated
simultaneously; the types and numbers of the generated alarms
are shown in Fig. 8. Node N1 generates “WS2 HTTP
unavailable” and “FS2 FTP unavailable” alarms. However,
because the fault is not fixed within the time window, N1
generates these alarms multiple times, producing redundant and
similar alarms. For example, N1 generates five “WS2 HTTP
unavailable” and five “FS2 FTP unavailable” alarms. These
identical alarms are generalized as explained in Section 4, so a
higher-level manager receives a reduced number of alarms. In
our experiment, the generalization process reduces 100 alarms
to 22 alarms. However, alarms generated by nodes N6 and N8
are not received because of the failure of R4.
Fig. 9 shows the output of our clustering algorithm. Even if
the total number of alarms is still large, a root cause alarm can
be found easily in each cluster. For example, cluster 1 consists
of “R0 link down”, “WS1 HTTP unavailable”, and “FS2 FTP
unavailable” alarms. Based on the association rules in Table 2,
we can infer that the root cause alarm is “R0 link down”.
Cluster 4 consists of “WS1 port down” and “WS1 HTTP
unavailable” alarms. Therefore, network administrators can
focus only on the root cause alarms. Fig. 9 shows the clusters
and the number of alarms in each cluster. The root cause alarm
of cluster 1 is “R0-R3 link down”, and cluster 1 includes the
other alarms caused by it, such as “N4→WS1 HTTP
unavailable”, “N4→WS2 HTTP unavailable”, and “N4→FS2
FTP unavailable”. The root cause alarm of cluster 2 is “R4
router down”, and “R4 neighbor loss” and “R4 IP down” are
related alarms. The root cause alarms of clusters 3 and 4 are
“FS2 IP down” and “WS1 port down”, respectively. Thus, 14
different alarms are reduced to four clusters.
Based on the ontological model shown in Fig. 6, we made SWRL
rules for determining the priority of an alarm. The alarms in the
four clusters are examined and classified by the SWRL rules
shown in Section 4 to determine their priorities. The alarm
“N4→WS2 HTTP unavailable” is determined to be urgent,
because WS2 provides a gold service, which has the highest
priority among the service classes. This means that the problem
is significant and should be solved immediately. By analyzing
cluster 1, the root cause of this alarm is identified as “R0-R3 link
down”. Therefore, a set of commands for recovering the R0-R3
link is sent to R0 and R3 without passing through the plan and
decide phases.
Ontology and SWRL rules enable us to analyze the services
affected by an alarm as well as the state of network devices.
Based on this analysis, we can determine whether an alarm is
urgent, which enables us to solve urgent problems quickly.
Without the cognitive control loop, alarms are processed one by
one: even an urgent alarm is treated the same as normal alarms,
so critical alarms cannot be examined while other alarms are
being processed. Avoiding this delay is the strength of the
cognitive control loop.
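The difference from one-by-one processing can be sketched with a priority queue. This is a hypothetical batch example rather than the paper's implementation: alarms are ranked urgent, then high, then low, and an arrival counter keeps first-in, first-out order among alarms of the same class.

```python
import heapq
from itertools import count
from typing import List, Tuple

# Lower rank is served first (assumed ordering of the three alarm classes).
RANK = {"urgent": 0, "high": 1, "low": 2}


def processing_order(arrivals: List[Tuple[str, str]]) -> List[str]:
    """arrivals: (alarm_id, priority) in arrival order; returns the handling order."""
    heap: List[Tuple[int, int, str]] = []
    seq = count()  # arrival counter keeps FIFO order among alarms of the same class
    for alarm_id, priority in arrivals:
        heapq.heappush(heap, (RANK[priority], next(seq), alarm_id))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order


arrivals = [("A4", "low"), ("A6", "low"), ("A7", "urgent"), ("A9", "high")]
print(processing_order(arrivals))
```

A plain FIFO queue would handle A4 and A6 before the urgent A7.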
Fig. 9 Clustered Set of Alarms
VI. CONCLUSIONS
In this paper, we have proposed an efficient fault management
approach based on cognitive control loops. We have presented a
case study that validates our concept using synthetically
generated alarm data sets. Initially, a manually defined
dependency model is used; missing and changing dependencies
are then detected in the learning phase of the control loops.
Ontology and SWRL rules are used to represent the relationships
among network resources, services, and alarm priorities. The
evaluation showed that our approach reduces the number of
alarms, processes alarms in different orders according to their
priorities, and finds root causes easily by using association
rules.
Our future work is to evaluate the performance of the control
loops. We plan to show that our approach processes urgent
alarms more quickly than the existing control loops.
ACKNOWLEDGMENTS
This research was supported by the WCU (World Class
University) program through the National Research Foundation
of Korea funded by the Ministry of Education, Science and
Technology (R31-2010-000-10100-0) and by the KCC (Korea
Communications Commission), Korea, under the “Novel Study
on Highly Manageable Network and Service Architecture for
New Generation” support program supervised by the KCA
(Korea Communications Agency) (KCA-2011-10921-05003).
REFERENCES
[1] A. Feldmann, “Internet clean-slate design: what and why?”, ACM SIGCOMM Computer Communication Review, vol. 37, no. 3, Jul. 2007, pp. 59-64.
[2] M. Blumenthal and D. Clark, “Rethinking the design of the Internet: The end to end arguments vs. the brave new world”, ACM Transactions on Internet Technology, vol. 1, no. 1, Aug. 2001, pp. 70-109.
[3] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, “Networking Named Content”, in CoNEXT '09, Rome, Italy, Dec. 2009.
[4] J. Strassner, “Autonomic Networking – Theory and Practice”, 20th Network Operations and Management Symposium (NOMS) 2008 Tutorial, Brazil, Apr. 7, 2008.
[5] J. Strassner, N. Agoulmine, and E. Lehtihet, “FOCALE – A Novel Autonomic Networking Architecture”, ITSSA Journal, vol. 3, no. 1, May 2007, pp. 64-79.
[6] IBM, “An Architectural Blueprint for Autonomic Computing, v7”, http://www-03.ibm.com/autonomic/pdfs/AC%20Blueprint%20White%20Paper%20V7.pdf.
[7] J. Strassner, “Introduction to DEN-ng”, Tutorial for FP7 PanLab II Project, 2009.
[8] M. Serrano, J. Serrat, J. Strassner, and M. Ó Foghlú, “Management and Context Integration Based on Ontologies, Behind the Interoperability in Autonomic Communications”, extended journal publication of the SIWN International Conference on Complex Open Distributed Systems, Chengdu, China, vol. 1, no. 4, Jul. 2007.
[9] D. Banerjee, V. R. Madduri, and M. Srivatsa, “A Framework for Distributed Monitoring and Root Cause Analysis for Large IP Networks”, 28th IEEE International Symposium on Reliable Distributed Systems, Sep. 2009, pp. 246-255.
[10] White Paper, “Automating Root-Cause Analysis: EMC Ionix Codebook Correlation Technology vs. Rules-based Analysis”, Nov. 2009.
[11] O. Jukic and M. Kunstic, “Logical Inventory Database Integration into Network Problems Frequency Detection Process”, ConTEL 2009, Jun. 2009, pp. 361-365.
[12] R. Vaarandi, “A Data Clustering Algorithm for Mining Patterns from Event Logs”, IP Operations and Management (IPOM 2003), Oct. 2003, pp. 119-126.
[13] J. Strassner, J. W. K. Hong, and S. van der Meer, “The Design of an Autonomic Element for Managing Emerging Networks and Services”, International Conference on Ultra Modern Telecommunications (ICUMT 2009), St. Petersburg, Russia, Oct. 12-14, 2009.
[14] M. Minsky, “The Society of Mind”, Simon and Schuster, New York, 1988, ISBN 0671657135.
[15] J. Famaey, S. Latre, J. Strassner, and F. De Turck, “A Hierarchical Approach to Autonomic Network Management”, 2010 IEEE/IFIP Network Operations and Management Symposium Workshops, Osaka, Japan, Apr. 19, 2010.
[16] I. H. Witten and E. Frank, “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation”, Morgan Kaufmann Publishers, ISBN 1-55860-552-5.
[17] X. Chen, Y. Mao, Z. M. Mao, and J. Van der Merwe, “KnowOps: Towards an Embedded Knowledge Base for Network Management and Operations”, in Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (Hot-ICE'11), Berkeley, CA, USA, pp. 7-7.
[18] “SWRL: A Semantic Web Rule Language Combining OWL and RuleML”, W3C Member Submission, 21 May 2004.