Correlating IDS Alerts With System Logs by Means of A Network-Centric SIEM Solution
Andreas Bråthen
Master’s Thesis
Master of Science in Information Security
Department of Computer Science and Media Technology
Gjøvik University College, 2011
Avdeling for
informatikk og medieteknikk
Høgskolen i Gjøvik
Postboks 191
2802 Gjøvik
This thesis concerns the need for a network-centric Security Information and Event Management
(SIEM) solution that correlates data based on network topology and traffic flow, and which takes
into account the continuous change in such networks. The research question arises from
the fact that current SIEM solutions are device-centric, with minimal understanding of the
causal relationship between log events. Furthermore, the approaches in use are suboptimal in
correlating data collected from scattered security systems (e.g. IDS, firewall). Security
personnel must therefore analyze large data sets with a potentially high false-positive rate, rather
than having the incidents validated, prioritized, and presented in a unified view.
In this thesis we propose a conceptual model based on a network-centric approach,
and perform a case study of this model using Cisco NetFlow. We observe the model through a
series of attacks, and analyze whether the model is a more viable approach to dealing with incidents
than current approaches, and whether the approach makes it possible to reduce the
number of alerts requiring follow-up and to prioritize incidents more accurately. The study
identifies several network characteristics that may influence the practical implementation of such
a model and proposes a set of requirements that a network-centric model should fulfill.
The idea for this thesis emerged while I was working at the European Organization for Nuclear
Research (CERN) a few years ago, and took concrete form through my MSc studies at Gjøvik University
College. Although the study has been interesting and meaningful in itself, it has been so particularly
because of all the people who have been involved in it and contributed to it. I am astonished by
all the time many people have been willing to spend, for which I am forever thankful.
My supervisor, Slobodan Petrović, deserves thanks for his great feedback and guidance. He
has given me numerous hours of advice and been steadfast in his role as a supervisor.
My fellow students deserve to be thanked for all the good times and companionship we
have had during the studies. Without them the studies would not have been as interesting and
enjoyable as they have been.
I would like to thank my family and friends for their understanding and support. They have
motivated me in times of need, and always encouraged me to look ahead.
I would also like to express my utmost gratitude to my girlfriend, Stine Andresen, for her
love, sacrifice and support through my BSc and MSc studies. She has always believed in me and
maintained unwavering faith in my abilities.
There are so many others whom I may have inadvertently left out and I sincerely thank all of
them for their help.
1 Introduction
1.1 Topic Covered by the Thesis
1.2 Keywords
1.3 Problem Description
1.4 Justification, Motivation and Benefits
1.5 Research Questions
1.6 Summary of Contributions
2 Related Work
2.1 Event Correlation
2.1.1 Correlation Techniques
2.1.2 Reducing the Correlation Data Set
2.1.3 Context and Situational Awareness
2.2 Determining Network Topology
2.2.1 Establishing Network Flow
3 Attack Classifications and Detection Capabilities
3.1 Attack Classifications
3.1.1 Classification Overview
3.2 The Igure and Williams Taxonomy
3.2.1 Confidentiality
3.2.2 Integrity
3.2.3 Availability
4 Network Event Correlation Model
4.1 Network Routing
4.2 Protection Domains
4.3 Proposed Network-Centric Model
4.3.1 Step 1: Detecting Intrusions
4.3.2 Step 2: Tracking Flow Direction
4.3.3 Step 3: Evaluating Protection Gate Decisions
4.3.4 Step 4: Estimating Intrusion Impact
5 Experiment
1 Introduction
1.2 Keywords
Keywords covered by this thesis, in accordance with the taxonomy provided by the IEEE Computer
Society: 3.2.0.{Data communication, Network-level security and protection}, 3.2.3.{Network
monitoring}, 11.6.5.{Unauthorized access}.
attack did succeed, whether the attack was blocked by another security system or whether the
attack had any impact on the destination at all. A SIEM, on the other hand, has a much better
idea of what is going on, as it interprets events that have taken place on the devices through
their log data. The problem with SIEMs is exactly the opposite: because they have no understanding
of details and do not focus on the network data being transmitted, they do not know the root cause
behind log entries, the causal relationship between log entries, or the attack vector resulting in
a log entry.
Because a SIEM operates at a higher abstraction layer, it trusts other devices (i.e. sources) to
send information (i.e. log data) that the devices themselves, or the applications generating
events, consider relevant. When events are not generated, the SIEM will be unable to detect
attacks, as it is neither aware of an activity taking place nor able to reconstruct or fill the gap
to correlate successfully 1 . Examples of this could be when the source purposely does not record
events or when events occur at a layer below or outside the application's working sphere. The
SIEM is then exposed to circumvention, where the state of the source is manipulated in such a
way that no trace can be found. This can be the case for network traffic (e.g. Denial of Service
(DoS), protocol anomaly, encrypted payload), for low-level system alterations, or after a device
has been compromised.
Traditional SIEM approaches, however, do not emphasize any particular type of log and
may therefore be considered to operate on a flat structure, treating logs from systems, network
devices and applications on an equal level. SIEM systems do try to integrate some network
awareness through the use of vulnerability management, asset databases, and network change
and configuration control systems (NCCM). Some systems are also supplemented with contextual
information such as network environment and threats [7]. Despite this, SIEMs remain static
systems that do not take into account the continuous changes in a network infrastructure and lack
the network details necessary to detect new incidents or to analyse incidents sufficiently. Current
solutions are therefore unable to correlate events that require a holistic view and understanding
of network dynamics, including context, topology, packet payload, protocols, session flows and more.
In this project we will try to determine which requirements a SIEM system must meet in order to
have a holistic view of the network.
1 In some domains, techniques in the pre-processing phase have been proposed such as path completion. Path com-
pletion refers to inclusion of important page access records that are missing in the access log due to browser and proxy
server caching [9]. Other studies in the same domain have proposed a network monitor system instead [10].
wall, Web server, IDS, end-devices and so on [12, 13]. Furthermore, correlating information
found in multiple logs allows IDSs to improve the effectiveness of alerts [13]. By studying alerts
generated by IDSs and analysing what log data is relevant for those alerts, we would be able to
make more educated decisions when dealing with incidents. This would also allow us to have
a holistic view of what happened with a minimum of information. With complete information,
security personnel can conduct more careful analysis and will have more time to do so, as false
positives and real attacks that are stopped would never reach the view of the analyst in real
2 Related Work
This chapter reviews current research in the fields of event correlation, topology discovery and
network flow analysis, and draws parallels to context integration, in order to build a foundation
for the research.
Before event correlation takes place, three processes must be completed [16, 18]: collection,
filtering and normalization. The first process collects data in its raw form. This is followed
by the filtering process, whose objective is to reduce the number of events and disregard those
that are not related to an attack [16]. It performs four types of tasks to achieve its objective [17]:
compression, counting, suppression and generalization. Finally, the normalization process translates
data into a standardized format that is understood by all components in the correlation
process [15].
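The three pre-processing stages can be illustrated with a minimal sketch. The raw log format, the field names and the NOISE list below are assumptions made purely for illustration and are not taken from [15-18]; of the four filtering tasks, only suppression and counting are shown.

```python
from collections import Counter

# Hypothetical raw events in an assumed "<timestamp> <source> <message>" format.
RAW = [
    "1300000000 fw1 DROP tcp 10.0.0.5:4444",
    "1300000001 fw1 DROP tcp 10.0.0.5:4444",
    "1300000002 web1 GET /index.html 200",
    "1300000003 ids1 ALERT sql-injection 10.0.0.5",
]

# Assumed suppression list: message prefixes considered unrelated to attacks.
NOISE = ("GET /index.html",)


def collect(lines):
    """Collecting: split each raw line into (timestamp, source, message)."""
    for line in lines:
        ts, source, message = line.split(" ", 2)
        yield int(ts), source, message


def filter_events(events):
    """Filtering: suppress noise, then count repeated (source, message) pairs."""
    kept = [e for e in events if not e[2].startswith(NOISE)]
    counts = Counter((src, msg) for _, src, msg in kept)
    first_seen = {}
    for ts, src, msg in kept:
        first_seen.setdefault((src, msg), ts)  # keep the earliest timestamp
    return [(first_seen[k], k[0], k[1], n) for k, n in counts.items()]


def normalize(events):
    """Normalization: translate every event into one common dictionary format."""
    return [
        {"time": ts, "source": src, "message": msg, "count": n}
        for ts, src, msg, n in events
    ]


normalized = normalize(filter_events(collect(RAW)))
```

In this sketch the two identical firewall events collapse into a single record with a count of two, and the benign Web request is suppressed before it reaches the correlation step.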
The event correlation process is subject to several problems inherited from the preceding
processes, related to the raw data set and the manipulation of it. In particular, it has
to deal with issues related to logging (e.g. formats and vast amounts of recorded events) [16]
and data that are inadequate (e.g. ambiguous, incomplete, inconsistent) [17]. In this respect,
the lack of standardized log formats and of an agreed protocol for generating events may be considered
major causes of these problems. Whenever the event correlation engine measures the
strength of the relationship between variables, it has to deal with the aforementioned issues.
2.1.1 Correlation Techniques
The correlation process is described by [16] as follows:
Events are correlated by assigning relationships between multiple events related directly or
indirectly with the system violation. The events related with the attack are generated by dif-
ferent devices and applications, and are written in different log files. The correlation process
then, links a series of events in order to recreate the attack sequence. [Ed. Depending on the
outcome of the correlation, an alert may be generated. The alert can be assigned a priority
value according to the severity, impact or probability.]
The correlation engine uses rules to interpret incoming events. Constructing and maintaining
correlation rules is problematic, as it requires continuous effort to identify problem patterns,
which is time-consuming and error-prone [18]. The correlation techniques used by the correlation
engine may fall into the following four categories [1, 19]:
Rule-based correlation. The relationships between alerts are specified in rules, which stipulate
the pre- and post-conditions that need to occur for a correlation to take place. The correlation
method is based on predefined sequences of events that relate to known patterns and behaviors,
which define an attack [16].
Scenario-based correlation. Causality relationships between alerts are specified in terms of
scenarios. A successful correlation (i.e. match) occurs when a combination of alerts forms a
predefined attack scenario.
Statistical correlation. The statistical relationship between alerts falls within a predefined
threshold (i.e. statistically related). The threshold value is based on estimates of what is
known-good or known-bad behaviour.
Temporal correlation. Correlation takes place according to the temporal relationship between
alerts or events (i.e. based on time series).
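As an illustration of how two of these categories can be combined, the following minimal sketch pairs a rule (a pre-condition alert followed by a post-condition alert against the same target) with a temporal window. The alert types, the rule itself and the 60-second window are hypothetical and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time: int      # seconds since epoch
    kind: str      # alert type, e.g. "port-scan"
    target: str    # IP address of the attacked host


# Hypothetical rule: the pre-condition alert must be followed by the
# post-condition alert against the same target within WINDOW seconds.
RULES = [("port-scan", "exploit-attempt")]
WINDOW = 60


def correlate(alerts):
    """Return (earlier, later) alert pairs that satisfy a rule and the time window."""
    ordered = sorted(alerts, key=lambda a: a.time)
    matches = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.time - first.time > WINDOW:
                break  # later alerts are even further away in time
            if second.target == first.target and (first.kind, second.kind) in RULES:
                matches.append((first, second))
    return matches


alerts = [
    Alert(100, "port-scan", "10.0.0.7"),
    Alert(130, "exploit-attempt", "10.0.0.7"),
    Alert(400, "exploit-attempt", "10.0.0.7"),
    Alert(110, "port-scan", "10.0.0.9"),
]
pairs = correlate(alerts)
```

Only the scan at time 100 and the exploit attempt at time 130 are linked here; the exploit attempt at time 400 falls outside the window, and the scan against 10.0.0.9 has no matching post-condition.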
Another type of correlation technique involves an impact analysis of the attacked system [20].
The idea of impact analysis is to determine what impact the threat has on the system
in question. This can be done with the following two correlation methods: (1) local correlation,
in which the impact of the attack is verified by a local agent running on the victim, which checks
whether the attack succeeded or not; and (2) Operating System (OS) correlation, in which the attack
class is compared to the types of services running on the host (including the OS), and the attack
is deemed harmless if the host is not exposed to the particular attack class.
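OS correlation can be sketched as a lookup against an asset inventory. The inventory, the attack classes and the function name `is_exposed` below are hypothetical; the sketch also assumes that an alert about an unknown host or attack class should not be dismissed.

```python
# Hypothetical asset inventory: the OS and services known to run on each host.
INVENTORY = {
    "10.0.0.7": {"os": "linux", "services": {"ssh", "http"}},
    "10.0.0.9": {"os": "windows", "services": {"smb"}},
}

# Hypothetical mapping from attack class to the platform it requires.
ATTACK_REQUIRES = {
    "smb-exploit": {"os": "windows", "service": "smb"},
    "ssh-bruteforce": {"os": None, "service": "ssh"},  # None means any OS
}


def is_exposed(host, attack_class):
    """OS correlation: True if the host runs what the attack class targets."""
    asset = INVENTORY.get(host)
    need = ATTACK_REQUIRES.get(attack_class)
    if asset is None or need is None:
        return True  # unknown host or attack class: do not dismiss the alert
    os_ok = need["os"] is None or need["os"] == asset["os"]
    return os_ok and need["service"] in asset["services"]
```

An alert for `smb-exploit` against the Linux host would be deemed harmless under these assumptions, whereas the same alert against the Windows host would be kept for follow-up.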
2.1.2 Reducing the Correlation Data Set
In an attempt to reduce the volume of logs relevant to an incident and to improve the correlation
process, [21] studied how distinct types of attacks were related to various types of log files.
The study showed that three of the categories, Denial of Service (DoS), User-to-Root (U2R) and
Remote-to-Local (R2L), constituting 44 attacks, had large similarities in terms of log traces, and
that for some attacks the relevant log sources could be reduced to as few as 5 of the initial 15.
Accordingly, by using this log-to-alert mapping, they were able to improve IDS accuracy and
effectiveness by correlating log data.
In a similar study [13], the researchers looked at the relationship between attacks and log
types using a top-down approach. The idea was to study some known attacks to infer which logs
would contain traces of them. They concluded that some attack classes had common behavior
and that it was possible to identify logs that would be more likely to store useful information
related to particular attack classes. The attacks they studied were categorized into two classes:
Remote-to-User (R2U) and DoS. Of the 15 attacks, 8 belonged to the former class and the
remaining 7 to the latter. When studying these classes, the researchers observed that the two most
important logs were syslog and NetFlow 1 . Accordingly, correlation based on log content could
improve IDS performance.
In relation to our study, these results are interesting in two respects in particular. First, they
support our assertion that log correlation used together with an IDS may be advantageous in terms
of reducing the number of alerts requiring follow-up and in investigating intrusions. Second, both
studies showed a high dependence on logs originating from network devices, such as NetFlow and
routing information. In the first study, 38 of the 44 attacks (about 86%) had traces in NetFlow. In the
second study, the figure was 12 of 15 attacks (80%). This supports our assertion that network flow
and topology information have some importance in detecting or investigating attacks.
2.1.3 Context and Situational Awareness
To further improve the quality of the correlation techniques, several researchers have proposed
integrating additional information into the process 2 . In [22], the researchers observe that the
wealth of information available to the security analyst has the potential to contribute to
detecting incidents and to gaining confidence in the credibility of incident alarms. They propose
a framework in which alerts are combined with vulnerabilities and target topology, and in which
alerts are ranked based on the interest of the asset owner. This is similar to a technique known as
vulnerability correlation [23], where data from vulnerability scanners are compared to the observed alert.
In [24], the researchers propose integrating system monitoring or vulnerability scanning tools
in order to increase the confidence in alerts. There is also a technique called susceptibility
correlation [25], where the probability of an asset's exposure is calculated by using all available
information about that asset, such as which services are running, which ports are open, and the type
of OS used on the machine.
1 NetFlow is a Cisco-developed network protocol to report IP traffic data, which is the de facto standard for monitoring
In study [13], the researchers correlate log data based on traces left by the Yara-virus and
complement the correlation process by using an IDS. Their idea is that correlating heterogeneous
logs while simultaneously enabling an IDS to identify attacks makes it possible to reduce the number
of false positives and to validate whether an attack has taken place. A shortcoming of their
study is that it covers only a single virus, which is known to leave distinct log traces and which
uses methods easily detectable by an IDS.
In study [26], the researchers discuss the combination of system logs and IDS alerts from the
reverse perspective. First, alerts from IDSs are correlated, and then system events are integrated
into the process. They state, as does [13], that the support of more detailed and precise information
from event logs enables IDS alert correlation to achieve higher accuracy. The system
information being integrated in this case is known as OS-level dependency tracking, which is a
method to track process forks 3 and file operations from event logs based on specific objects. The
conclusion of the study is that the discussed integration greatly improves the correctness of the
correlation process and helps in making hypotheses about possible missed attacks.
A study similar to [26] is conducted in [27], combining OS-level dependency tracking with
IDS alert correlation using an event-processing engine called Coral8. The researchers make two
statements when describing their approach. First, most attacks involve operations on specific
OS-level objects. Second, if an attack prepares for another attack, the later attack's corresponding
operations would be dependent on the earlier ones. The study concludes that the technique can
significantly reduce false correlations.
According to [28], any practical solution for discovering physical IP topology needs to deal
with three fundamental difficulties: (1) limited local information: because of the difficulty of
inferring a device's physical neighbors (layer-2 devices in particular), an algorithm should make
minimal assumptions and utilize information stored locally; (2) transparency of elements across
protocol layers: because layer-2 devices are completely transparent to the layer-3 routers directing
traffic between subnets, the algorithm should establish interconnections between network elements
operating at different layers of the OSI model; and (3) heterogeneity of network elements:
because a network often comprises equipment from different vendors, the algorithm should be able
to gather topology information correctly from heterogeneous sources.
Another difficulty in this respect is determining the types of devices that packets flow
through in the topology. Although existing techniques for gaining information about hosts can be
used (e.g. reconnaissance techniques performed by scanning tools), these techniques encounter
a problem similar to that of the techniques used for topology discovery: they increase the load
on the network and hosts by generating probing traffic. Although we will not address
issues related to discovering and building a network topology in this study, we will look into
determining host types from log data when the topology is known. For a more
comprehensive overview of current network discovery techniques and their limitations, see
the survey in [5].
2.2.1 Establishing Network Flow
In any large-scale network, event and alarm-producing systems are distributed across the entire
network, comprising some (and possibly all) of the computing and infrastructure systems in
the network [30]. These systems produce large amounts of information that easily overwhelm
a security analyst as well as log management systems. As stated, two major problems having
ramifications for event correlation are related to the volume of data and the lack of standardized
log formats [16].
In terms of event sources, we can divide them into three categories: 1) events that
influence how packets traverse the topology (i.e. generated by devices that manipulate
the layer-2 and layer-3 content of network packets); 2) events that, combined, reflect
the attack process; and 3) events that provide context and impact analysis of the attack.
Another view of event sources is their classification into distinct types of domains [1], as
depicted in Figure 2. In answering our research question, we primarily need to look at the network
domain of event sources.
In terms of log formats and network flow technologies, Cisco's NetFlow is the de facto standard
used by network manufacturers today. The newest version, version 9, was published in 2004 [31]
and may be used in equipment such as switches, routers and firewalls. A flow in Cisco's NetFlow
is identified by a seven-tuple: source IP, destination IP, source port, destination port, IP
protocol, ingress interface and IP type of service. NetFlow is being superseded by a new format
called IP Flow Information Export (IPFIX) [32]. Whereas NetFlow was developed by Cisco,
IPFIX is developed by the Internet Engineering Task Force (IETF) and is expected to become the
new de facto standard. Both NetFlow and IPFIX consider a flow to be a set of packets that pass
through a device within a specific time slot and share a number of characteristics.
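The seven key fields can be sketched as a small data structure that groups packets into flows. The field names, the `aggregate` helper and the example packets are hypothetical and do not reproduce the NetFlow export record layout.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven key fields that identify a flow in this sketch."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # IP protocol number, e.g. 6 for TCP, 17 for UDP
    ingress_if: int     # ingress interface index
    tos: int            # IP type of service


def aggregate(packets):
    """Group packets by flow key and accumulate packet and byte counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical packets given as (key, size-in-bytes) pairs.
web = FlowKey("10.0.0.5", "192.0.2.10", 40000, 80, 6, 1, 0)
dns = FlowKey("10.0.0.5", "192.0.2.53", 53000, 53, 17, 1, 0)
flows = aggregate([(web, 60), (web, 1500), (dns, 80)])
```

Packets that agree on all seven fields are counted in the same flow, so the two Web packets above end up in one entry and the DNS packet in another.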
Flows are maintained in the cache of the monitoring device, and each set of characteristics
not seen earlier is inserted into this cache. Each entry in the cache is given a unique flow ID.
A flow is considered complete, and the flow entry is flushed from the cache (or exported, which is
the terminology used by the RFC standards), when one of the following criteria has been met [31]:
1. TCP flags indicate a completed flow (FIN or RST).
2. X seconds after the last packet matching a specific flow ID has been seen. Time is configurable.
3. X minutes after the flow has been created. This is to avoid staleness. Time is configurable.
4. When the exporter encounters internal constraints, such as when the memory is full or when
counters wrap around, causing the flow cache to rotate.
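The export criteria above can be sketched as a single decision function. The timer values, the cache capacity and the names `FlowEntry` and `export_reason` are assumptions for illustration; the text only states that the timers are configurable, and counter wrap-around is not modeled here.

```python
from dataclasses import dataclass

# Assumed values; the text only states that the timers are configurable.
INACTIVE_TIMEOUT = 15      # seconds without a matching packet (criterion 2)
ACTIVE_TIMEOUT = 30 * 60   # seconds since the flow was created (criterion 3)
MAX_ENTRIES = 4096         # assumed cache capacity (criterion 4)

TCP_FIN, TCP_RST = 0x01, 0x04  # TCP header flag bits


@dataclass
class FlowEntry:
    created: float      # time the flow entry was inserted into the cache
    last_seen: float    # time of the most recent matching packet
    tcp_flags: int = 0  # bitwise OR of all TCP flags seen in the flow


def export_reason(entry, now, cache_size):
    """Return why the entry should be exported, or None to keep it cached."""
    if entry.tcp_flags & (TCP_FIN | TCP_RST):
        return "tcp-finished"
    if now - entry.last_seen >= INACTIVE_TIMEOUT:
        return "inactive-timeout"
    if now - entry.created >= ACTIVE_TIMEOUT:
        return "active-timeout"
    if cache_size >= MAX_ENTRIES:
        return "cache-full"
    return None
```

A long-lived flow that keeps receiving packets is still exported once the active timeout is reached, which is what prevents stale entries from accumulating in the cache.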
Flows are said to play a vital role in network security to detect DoS attacks, network-propagating
worms, and other undesirable network events [33, 34]. There exist several commercial products,
and Free and Open-Source Software (FOSS) that revolve around Cisco NetFlow, such as Cisco
CS-Mars, IBM Aurora, NetQoS, Arbor Networks. There also exist alternatives to Cisco’s NetFlow
such as HP’s sFlow or Juniper’s jFlow. The latter two are, however, flow sampling technologies
that by specification are problematic when dealing with the research question raised in this the-
sis. Sampling is described as follows in RFC 3917 [35]:
Sampling describes the systematic or random selection of a subset of elements (the sample)
out of a set of elements (the parent population). Usually the purpose of applying sampling
techniques is to estimate a parameter of the parent population by using only the elements of
the subset. Sampling techniques can be applied for instance to select a subset of packets out of
all packets of a flow or to select a subset of flows out of all flows on a link.
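One plausible consequence of sampling for alert correlation is that short flows may leave no trace at all in the sampled data. The following minimal sketch of systematic 1-in-N packet sampling illustrates this; the packet stream and the sampling rate are hypothetical.

```python
def systematic_sample(packets, n):
    """Systematic sampling: keep every n-th packet of the parent population."""
    return packets[n - 1::n]


# Hypothetical packet stream in which each packet is tagged with its flow.
# Flow "A" is a long transfer; flow "B" is a two-packet exchange.
stream = ["A"] * 4 + ["B"] + ["A"] * 6 + ["B"] + ["A"] * 8

sampled = systematic_sample(stream, 10)
flows_seen = set(sampled)
```

With a sampling rate of 1 in 10, only packets of flow "A" are selected in this example, so an IDS alert that refers to flow "B" could not be matched against any sampled flow record.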
This chapter reviews current research on attack taxonomies, and applies the Igure and Williams
taxonomy [3] with particular emphasis on traceability and detection capabilities.
Conducting security reviews and assessments: An attack taxonomy may be beneficial in the
process of reviewing security posture and assessing relevant data.
Measuring detection capabilities: An attack taxonomy may contribute to estimating a security
control's detection capabilities and to determining strong and weak characteristics of such
controls.
Improving detection capabilities: An attack taxonomy may be necessary to identify which
attack classes should be tested against a security control and to understand with respect to which
characteristics of an attack the security control needs to be improved.
Evaluating the impact of an attack on a system or service: An attack taxonomy may enable
an organization to estimate the risk and probability of a particular attack class and to
prioritize resources accordingly.
Avoiding common design flaws in product development: An attack taxonomy may enable
software developers to focus their effort on areas where a product should be particularly
strong and robust.
Attack taxonomies are used in this study to understand which types of attacks are most relevant
to the research, and to select a broad range of unrelated attack types. This further allows us to
gain a better understanding of what types of components the network should consist of, which is
driven by the components' relevance to a particular attack class. This helps us determine what
detection capabilities are needed, what log sources may be relevant for investigating attacks, and
what signatures are needed to detect them.
Figure 3: Hierarchical level of features found in attacks [2].
In terms of IDS/IPS 1 testing, the testing platforms rarely conform to the entire set of
characteristics discussed earlier. The KDD Cup 99 data set, which has been used extensively in academia
for this purpose, is one example of this. It consists of approximately 4,900,000 single connection
vectors, each of which contains 41 features and is labeled as either normal or an attack, with exactly one
specific attack type [40]. The attacks fall into one of the following categories: DoS, User-to-Root
(U2R), Remote-to-Local (R2L) or probing. There are several limitations to using
this data set in IDS/IPS testing [40, 41], and its use in the network intrusion detection
domain has been strongly discouraged [42].
Perhaps the most comprehensive testing platform today is the one used by NSS Labs. It is widely
acknowledged, as it includes a large set of attacks. Their tests in 2010, using their methodology
v6.0 [43, 44], for instance, include 1179 exploits across several different classifications. The
tests do not justify the choice of classifications, but are nevertheless used as a standard means
of comparing IDS/IPS products. The classifications include the following categories: threat vectors
(who initiated the attack), target type (e.g. Web server, JavaScript), coverage by result (e.g. code injection,
buffer overflow), coverage by vendor (e.g. Adobe), types of fragmentation (e.g. packet, stream),
types of obfuscation (e.g. URL, HTML) and evasion techniques (e.g. FTP). The tests also include
performance measurements and claim to use simulated real-world traffic.
A weakness of such industry-based testing platforms is that they do not measure detection
capabilities or accuracy in evaluating events, but only whether a particular exploit was detected or not
(i.e. the hit rate). An IDS vendor with a significant amount of resources to develop signatures, or
an IDS with a more aggressive default signature set, will benefit in such tests. An
IDS focusing on a particular service or OSI layer may be superior in its area, but weaker overall.
If the attacks were less generic, for example by targeting specific features of an attack or branches
of a taxonomy hierarchy, the test would be better suited to measuring a security control's detection
capabilities, as stated initially.
Level 1 - attack impact: The immediate impact of an attack in terms of the basic security prop-
erty it violates. Immediate means in this case the first security property being violated.
Level 2 - system specific attack types: The classes of impact from an attack that fall under a
1 IPS is often considered the equivalent of an IDS with added blocking capabilities.
A challenge of using this taxonomy is deciding how granular one should be for each attack.
By using a top-down approach one could narrow down to the very core of an attack, or one
could stop at a higher level that makes more sense when testing completeness. This ensures that
one selects attacks from a wide perspective that are more suitable for making generalizations.
A potential problem with this kind of taxonomy is that complete attacks may not be isolated.
Attack vectors often have dependencies and run sequentially, which means that a single attack
may fall into multiple categories, thus complicating the goal of generalizing.
A problem with taxonomies based on basic security properties, as discussed by [45], is that it is
not obvious what should be considered the immediate result of an exploit. The authors exemplify
this with a password-guessing attempt. A password-guessing attempt may reveal the password of
a user account, but the intention behind guessing the password would be up to the
attacker. The attacker could go on to breach any of the three basic security properties, and any
of these could be the primary objective behind the actions. However, as the first consequence is
that the attacker learns the password, the attack is considered a breach of confidentiality.
A similar pragmatic example is given in the same study [45]. The researchers explain
that by planting a trojan horse on an infected computer, the attacker has succeeded in executing
unanticipated processes. The intention behind planting such a backdoor may be to violate any
of the three basic security properties. Using the simplified approach again, the trojan has primarily
inserted data, which may be considered an integrity breach. The attack vector used to plant the
trojan, which is concealed from the user, relies on deliberately tricking the user into installing
what is believed to be a trustworthy application.
Another point that the taxonomy does not discuss is the origin of the attack or exploit. The
attacker may target a specific host to attempt to breach any of the three security properties, or the
client may accidentally perform actions that may be considered a successful attack without the
attacker knowing. The latter may be explained by the client becoming infected or attacked based
on its behavior. Visiting a malicious website, for instance, may lead to the client being infected
without the attacker directly targeting that client in particular. Regardless of who originates the
attack, the intentions behind the attack may just as well be the same.
The taxonomy discussed here will be used in this thesis, and is depicted in Figure 4. The
Figure shows the first and second layer mentioned earlier, which is considered adequate depth
for the case study. We will discuss all of the layers in the sections that follow.
To adjust to the problems mentioned earlier, we will extend the use of the basic security principles
as suggested by Meadows [46]. First and foremost, the word authorization may be read as
approval. Approval is the acceptance that some action may be performed within
the boundaries of that approval. The term confidentiality will not only include unauthorized access
but also unauthorized use of a system. This may involve reconnaissance with the intent to
gather information that is not considered public information. The term integrity will denote the
change of a system without the authorization to do so. Approval may in this case be considered
the act of a user deliberately installing an application with clear intentions. Breach of availability
is the intent to influence a service in such a way that the quality of that service is significantly
reduced or access to the service itself denied.
Figure 4: Attacks classified by intent based on the Igure and Williams taxonomy [3].
3.2.1 Confidentiality
The goal of confidentiality is to protect information from being disclosed or revealed to entities
not authorized to have that information [47]. Protection is ensured by using security controls
that authorize access to data according to some criteria, and/or isolate the service from the
data source in such a way that confidentiality may not be breached through the service alone. An
attacker’s goal may then be accomplished by direct or indirect means. Manipulating databases
or encodings to retrieve data are examples of direct attacks, whereas gaining shell access or
increasing privileges through the service to perform additional attacks is a secondary type of
attack. This thesis is only concerned with direct attacks, as the attacker may perform any type of
attack once security has been breached [3].
Confidentiality has in this thesis three level-two branches. Disclose is about gaining access
to information being protected by security controls, by circumventing them in such a way that the
integrity of the information has not been breached. Typical attacks may be SQL-injections, directory
traversals, manipulating encodings, semantic URL attacks and replay attacks. This branch of attacks
is primarily on the application level and may require application-aware signatures.
The second branch, profiling (i.e. reconnaissance), is concerned with the act of collecting
information about a target before attacking it. Profiling is often the legitimate use of a service
with the sole purpose of collecting as much information about that service as possible. The
actions do not need to be directed at a particular service; they may also include port-scans and
sweeps, vulnerability scanning and more. Profiling causes no harm to the systems 2 , but often
generates a lot of noise that may be detected by an IDS.
The third branch, inference, is concerned with the legitimate use of services to extract data
that combined may form a new meaning [48]. For instance, imagine a database allowing SQL-views
to get the average salary per department but not per employee. If it was possible
to create arbitrary queries with a reduced set of people, one could over a set of queries be able
to extract information about a particular employee. Another class of attacks we place in this
branch is brute-force. A brute-force attack is about repeatedly trying various combinations to find
the correct key. The inference class may be very noisy due to its nature of repeated attempts.
2 Unless discussing SCADA or PLC-systems, which are not considered in this study.
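The salary example can be made concrete with a short sketch. The table contents, names and the
minimum query-set size are assumptions made purely for illustration; the point is only that two
permitted aggregate queries, differing by one person, reveal that person's value.

```python
# Hypothetical salary table; names and figures are invented for illustration.
SALARIES = {"alice": 52000, "bob": 48000, "carol": 61000}


def average_salary(names):
    """Stand-in for an aggregate-only view: returns a mean, never a single row."""
    if len(names) < 2:
        raise ValueError("query set too small")  # assumed safeguard of the view
    return sum(SALARIES[n] for n in names) / len(names)


def infer_salary(target, others):
    """Recover one salary from two aggregate queries that differ by one person."""
    with_target = average_salary(others + [target])
    without_target = average_salary(others)
    # Turn both means back into sums and subtract them.
    return with_target * (len(others) + 1) - without_target * len(others)


# Both queries are individually legitimate, yet together they leak one value.
leaked = infer_salary("carol", ["alice", "bob"])
```

Each query on its own looks like normal use of the service, which is why this branch is hard to
separate from legitimate traffic without looking at the sequence of requests.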
3.2.2 Integrity
Integrity refers to the trustworthiness of data or resources, and is usually phrased in terms of
preventing improper or unauthorized change [48]. Integrity is verified by creating mathematical
hashes of data and comparing them before and after an event has taken place. This makes sense
when transferring data over a network to ensure that data is complete and not altered, but
is impossible to impose in services where data is expected to change and the change cannot
be predicted. An attacker’s goal is then to alter data in such services either by circumventing
some security control, or claiming an authorized identity, which is allowed to perform direct
We have divided integrity into three branches in this thesis. Manipulating refers to the actual
change of data on the system. This may include services, file systems, registry or even directly
into memory. Manipulating attacks may refer to legitimate use of a service in a non-authorized
way, or it may refer to attacks that change the processes and action-flows in such a way that
an unexpected outcome occurs. Examples of the latter category are buffer overflows and, in general,
the use of exploits that take advantage of improper input validation. Manipulating attacks in
terms of buffer overflows are typically seen as strings of repeated characters, which need to be
provided in order to hit the return address on a stack.
The second branch is about destroying, removing, scrambling or even hiding data in such
a way that it is not present when expected. As with manipulation attacks, it may involve data
located on the application level or directly on the storage medium itself. Examples of attacks
in this branch may be typical database queries such as drop tables, or direct shell
commands that perform deletions.
The third branch is about insertion attacks. It is not an attack in the typical sense, but is
about inserting data into a data-storage with the purpose of either degrading the overall value
of other data or adding unauthorized scripts and programs into the execution flow. A trojan is
an example of the latter as the program code is inserted into the system, without approval, and
which circumvents the human "security control" by hiding in, or masquerading as an authorized
3.2.3 Availability
Availability refers to the ability to use information or resources as desired [48]. Availability is
basically both about being able to access a particular resource and being able to use it in the
way it is intended. Large deviations in performance may lead to breach of availability. Security
controls protecting availability are often not primarily about security aspects, but are service-
management issues in the sense that traffic needs to be sent where expected, and the necessary
resources need to be allocated for each connection. Breach of availability may often come from
legitimate use of a service where the allocated resources do not meet the popularity (i.e. amount
of connection attempts) of that service.
We have divided availability attacks into two branches, which follow the categorization for
DDoS attacks in particular [49]. Degrade attacks are about consuming a significant portion of the
victim’s resources, thus seriously degrading the service to legitimate users [49]. The
goal for the attacker could be, for instance, to cause the victim to lose some percentage of its
customers because they cannot get access, or to lead the victim to falsely believe that additional
expensive investment in resources is required to cope with the perceived low performance.
Degrading attacks may target the full spectrum of the OSI-model [50]. DoS of applications in particular,
is about tying the resources in terms of CPU-cycles or memory allocation to false requests. These
requests may be large (i.e. network-intensive) queries for data, or requests that target system
components requiring intensive processing.
The second branch is about disrupting the service in such a way that the service is denied to a
majority of legitimate users [49]. As with degradation attacks, this may target the full spectrum
of the OSI-model. Attacks that are disruptive are rarely as sophisticated as degrade attacks as
they are about brute-force and flooding the victim with data that cannot be processed within
reasonable time. Detecting attacks that are disruptive is then comparatively simple because they
create a lot of noise on the network, pushing bandwidth utilization significantly above the normal
baseline. The challenge with DDoS attacks is stopping the large amounts of seemingly legitimate
traffic from many sources, but this is not discussed in this thesis.
This chapter discusses the underlying principles for the proposed network event correlation
model, and describes the model in a multi-step approach from alert detection to impact anal-
• The cost may be influenced by several criteria (e.g. bandwidth, jitter, latency, QoS).
• The cost between vertices changes according to external strategic factors (e.g. preferred
route, load balancing 2 ).
• The complete graph is not always known (e.g. crossing network domains, continuous recon-
figuration and updates).
• There may co-exist multiple algorithms for calculating optimal path in the same network.
• The graph may expand or contract ad-hoc (e.g. link failure, equipment replacement, mainte-
Within a single network domain (i.e. intra-domain), Interior Gateway Protocols (IGP) are used,
such as: Interior Gateway Routing Protocol (IGRP), Enhanced IGRP (EIGRP), Open Shortest
Path First (OSPF), Routing Information Protocol (RIP) and Intermediate System to Intermediate
System (IS-IS).
All of the mentioned protocols use algorithms that consider a different set of network char-
acteristics in their calculations, which makes them difficult to compare. One protocol may for
instance be less computationally complex and have smaller message overhead, making it more
suitable in smaller environments. Interior Gateway Protocols are divided into two classifications.
Link-state protocols, such as OSPF and IS-IS, often use the Dijkstra algorithm to calculate paths,
where each node constructs a complete map of the network and calculates paths independently.
Distance-vector protocols, such as RIP and IGRP, use the Bellman-Ford algorithm, where each
node informs its neighbors periodically, in addition to when changes occur on the network.
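As a minimal sketch of the link-state calculation described above, the following computes
lowest-cost paths with Dijkstra's algorithm over a small topology. The router names and link costs
are hypothetical and not taken from the experiment; real protocols derive the cost from their own
metrics.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm over a dict of {node: {neighbor: cost}}."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return dist, prev


def path_to(prev, source, target):
    """Walk the predecessor map backwards to list the hops of the chosen path."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: the direct link R1-R2 is more expensive than going via R3.
TOPOLOGY = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R4": 1},
    "R3": {"R2": 1, "R4": 20},
    "R4": {},
}
dist, prev = shortest_paths(TOPOLOGY, "R1")
best = path_to(prev, "R1", "R4")
```

Note that this yields the path a router would prefer, not necessarily the path a given packet
actually took, which is the distinction the model relies on.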
Routing protocols have to relate to cost in the sense that cost is a vector of parameters where
each may be weighted differently. EIGRP, for instance, considers the following six parameters:
bandwidth, path load, path delay, reliability, Maximum Transmission Unit (MTU) and hop count.
Other more lightweight protocols, such as RIP, relate to cost as a single value, the hop count
between source and destination. A network using a distance-vector protocol may then be simpler to
deal with as it is more predictable in a network with fewer topology changes, but it may be
problematic when used in large-scale networks [53].
Routing protocols are relevant in our thesis because their calculations determine how
packets are routed throughout a network. While the goal in the shortest path problem is to
determine the optimal route between two vertices, our goal is to identify what exact route was
used and what devices mediated along that path. Furthermore, traffic passes through barriers
that treat packets differently, according to the barriers’ decision criteria and rules. It is necessary
to collect decisions through log data to determine what actions were taken on those packets and
to identify relevant logs that enable the correlation model to raise educated and precise alerts.
Protection domains are influenced by layer 2 and layer 3 of the OSI-model, and form a logical
representation of how a network is isolated in terms of user-interactions and resource-access.
Protection domains may be viewed as access to information from an attacker’s perspective as
shown in Table 1 [54] 3 .
The table shows the type of information that an attacker can access, or learn, depending
on the attacker’s location in relation to a given network infrastructure. It shows, for instance,
that if an attacker only has a remote computer, that attacker can only access some information
about the network infrastructure in most situations. If, however, the attacker has access to a
device located within the network infrastructure, that attacker would be able to directly access
other resources on that network. The distinction of access to resources, depending on how much
access the attacker has, is important as there are guards protecting the access to resources, which
influences the correlation process and impact evaluation, based on intrusion alerts.
A modern network infrastructure often follows a hierarchical solution, or segmentation prac-
tice, as depicted in Figure 5 [48, 55]. The protection barriers or guards discussed until now are
synonymous with the protection walls seen in the figure.
Figure 5 consists of four protection domains, which are located in front of, behind or between
the protection barriers. The first protection domain is located in front of the outer protection
barrier, and is basically the representation of the Internet. Exterior Gateway Protocols (EGP),
which are not considered in this study, reside here. The distinction is made because of the ingress-
and egress-filtering performed by each barrier. Behind the outer protection barrier, three
intra-domains may be found where the aforementioned IGPs reside: one protection domain between
the outer, inner and demilitarized zone (DMZ) 4 protection barriers, another behind the DMZ
protection barrier, and the last behind the inner protection barrier. The latter domain most often
consists of numerous additional domains, which are not represented here as they vary significantly
between organizations [48].
The figure shows a typical hierarchical model (c.f. Table 1), where the inner protection domain
inherits the weakest protection barrier that allows traffic to be reached from the previous
protection domain 5 . This granular model is beneficial as it requires the attacker to breach
multiple barriers before reaching the traditionally most sacred data owned by the organization.
A linear model ensures that the first guard blocks network-traffic from entering the next domain,
and then the following guard blocks traffic from entering the next and so on.
3 NI is the abbreviation for Network Infrastructure.
4 A DMZ is a subnetwork within an organization that is considered less trustworthy than internal networks. Resources
exposed to external networks normally reside here.
5 For ingress traffic, and vice versa for egress traffic.
The linear model is, as stated, beneficial because it supports a multi-security scheme (i.e.
defense-in-depth). It is strong for protecting the organization, but may be less supportive when
investigating incidents such as when an internal host has been compromised. Imagine having
an IDS in front of the outer protection barrier and assuming that an attacker successfully
compromised an internal resource located on the DMZ network. Investigating the incident requires
that the victim be identified, that some state information can be obtained from it, and sometimes
that it be determined whether the attack was blocked by the DMZ protection gate. An IDS with no
knowledge of the internal network may not be able to evaluate the impact alone, and even multiple
IDSs located in different protection domains, where alert-correlation takes place, may face
challenges on their own [56, 57].
which primarily are routing and switching equipment using routing protocols and technology as
discussed in Section 2.2. This equipment generates traffic flow data, which is sent to a central
collector on the network, being the SIEM solution. The traffic flow data is assembled to determine
the exact path that traffic has taken between the gates, and to map traffic when source-
and destination addresses change. Assembling traffic flow data is asserted to be critical in order to
understand what logs are relevant for evaluating incidents, and in understanding what decisions
were made on the different protection gates that analyzed the traffic.
The protection domains in the model are evaluated independently as the domains have no
direct relationship. Traffic routed within one domain does not affect the routing in the next
domain and is assembled by direction (i.e. ingress or egress). Packet content (i.e. payload), even
if encrypted, is irrelevant as only the headers from OSI-layers discussed earlier are needed 6 . The
purpose of assembling network traffic in the domain is not to find out what is sent, but rather to
determine where it was sent, how it was sent and whether some part of the transport-layers was
changed during transmission.
The protection gate inspects the packets and determines whether the packets are allowed into
the next protection domain or not. The decision made by a protection gate is the most
important factor in deciding whether an incident should be followed up or not. If the gate blocked
the traffic that raised the intrusion alert, the alert is considered a low priority, or even discarded.
If the gate allowed the traffic that raised an intrusion alert, the alert needs to be investigated
further according to some priority (e.g. probability of success, impact of attack, or even formal
risk-models). Note that investigating further may mean that the traffic entered a new protection
domain, and that the process needs to be repeated within that domain.
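The gate-by-gate reasoning above can be sketched as a small triage function. The decision labels
("allowed", "blocked") and the returned priority names are assumptions for illustration; the model
itself only requires that each gate's decision can be reduced to a binary answer.

```python
def triage(alert_id, gate_decisions):
    """Walk the gates in the order the traffic met them and stop at the first block.

    gate_decisions is an ordered list of 'allowed' / 'blocked' strings, one per
    protection gate on the path (hypothetical labels).
    """
    for hop, decision in enumerate(gate_decisions, start=1):
        if decision == "blocked":
            # Blocked traffic cannot have reached the target behind this gate.
            return {"alert": alert_id, "priority": "low", "stopped_at_gate": hop}
    # Every gate let the traffic through, so the alert needs further investigation.
    return {"alert": alert_id, "priority": "investigate", "stopped_at_gate": None}


blocked = triage("alert-1", ["allowed", "blocked", "allowed"])
passed = triage("alert-2", ["allowed", "allowed"])
```

Repeating the loop per domain mirrors the note above that the process is re-run each time the
traffic enters a new protection domain.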
The types of log data used in this model may be classified into four distinct categories. Each
of these categories is used in various steps of the model. The numbers denote the classes:
1. Detection logs. Generated at each detection control located in front of or behind a protection
gate. The approach used in this study is an IDS, but may also include controls such as proxies,
gateways, firewalls or Data Loss Prevention (DLP) systems.
2. Network traffic logs. Generated by all equipment within each protection domain. Typical
examples are switches and routers, and even firewalls.
3. Protection gate logs. Generated on the border of each protection domain. Refers to both
ingress- and egress-filtering decisions made by firewalls, proxies, gateways, DLP or others.
4. Device logs. Generated at each device being targeted by the attack. Includes resources such
as user-clients and servers. Basically the final destination of the packet.
The logs are categorized into different groups based on a session-oriented approach, including
devices as depicted in Figure 6. All logs are then assigned to a group relevant to a
particular session, which includes network traffic logs, protection gate logs and detection logs.
When analysing a particular event, the specific session is investigated in order to determine whether
the session was allowed through the gates, and to determine the destination device. When the
destination device has been determined, and the session logs show that the session was allowed
through all gates, the investigation process can begin.
6 Encrypted packets would cause problems particularly in two situations. First, the IDS would be unable to inspect
the packets’ content to detect attacks. Secondly, data may be tunneled so that what is sent to a destination is re-sent to
a different host, thus escaping detection mechanisms (and even circumventing the impact analysis). A general best
practice is, however, that encrypted traffic is either denied or inspected through the use of SSL-termination equipment.
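A minimal sketch of the session-oriented grouping is given below. The record fields ("src",
"sport", "dst", "dport", "category", "action") and the category names are hypothetical; actual log
formats differ per vendor and would first have to be normalized into such records.

```python
from collections import defaultdict


def group_by_session(records):
    """Group normalized log records per session, then per log category."""
    sessions = defaultdict(lambda: defaultdict(list))
    for rec in records:
        key = (rec["src"], rec["sport"], rec["dst"], rec["dport"])
        sessions[key][rec["category"]].append(rec)
    return sessions


def ready_for_investigation(session):
    """True only if gate logs exist and every gate allowed the session."""
    gates = session.get("gate", [])
    return bool(gates) and all(g["action"] == "allow" for g in gates)


# Hypothetical records for one session seen by an IDS, a router and two gates.
base = {"src": "198.51.100.7", "sport": 40122, "dst": "192.0.2.10", "dport": 80}
records = [
    dict(base, category="detection", signature="example-signature"),
    dict(base, category="traffic", exporter="rtr1"),
    dict(base, category="gate", action="allow"),
    dict(base, category="gate", action="allow"),
]
sessions = group_by_session(records)
session = sessions[("198.51.100.7", 40122, "192.0.2.10", 80)]
```

Sessions for which `ready_for_investigation` is false never reached the destination device, so
device logs do not need to be consulted for them.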
The goal of the investigation process is to determine what impact the session had on the victim,
which is done by correlating logs obtained from the device with the raised intrusion alert.
The outcome of this investigation is the re-evaluation of the severity of the incident. For instance,
if the intrusion detection system reports that the intrusion-severity is high, the investigation
process based on logs may determine that the actual severity is much lower, and even that the
probability of success is low. Each investigation may then be a risk evaluation by itself, which is
a more precise and all-encompassing evaluation than the individual intrusion detection system
can provide. The evaluation process may be an impact analysis as studied by [20] or any other
research technique discussed in Section 2.1.3.
The model is reactive in nature as it deals with traffic that has taken place and analyses alerts
in a larger context for the security analyst. A potential issue with this is that an alert may be
raised on sessions that have not completed, which could be the case when large amounts of
data are transferred in a single session, or when a session does not terminate properly (a
time-out will occur). Although sessions may not have completed in some situations, they may be
complete enough to be analyzed in the context the intrusion alert was raised in.
4.3.2 Step 2: Tracking Flow Direction
The second step in the model determines the path of the aggregated packet flow (i.e. session)
throughout the network domain. It does so by taking all flow events and assembling these to determine
the path from the source (i.e. from a protection gate) to the destination (i.e. next protection gate,
server, host). Each piece of network equipment generates flow events, which are collected and
combined. The flow tuples source and destination IP, and source and destination
port are combined for this (c.f. Section 2.2.1 for NetFlow tuples). The latter tuple will only be
present when the traffic uses a transport protocol with port numbers (e.g. TCP or UDP). Figure 8
illustrates the area of focus for step 2.
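The assembling of flow events into a path may be sketched as follows. The field names
("exporter", "first_seen") are hypothetical stand-ins for the exporting device and the flow start
time, and the sketch assumes that the clocks of the exporting devices are synchronized well enough
for the ordering to be meaningful.

```python
def session_path(flow_events, session_key):
    """Order the devices that reported a session by when they first saw it."""
    hops = [
        e for e in flow_events
        if (e["src"], e["sport"], e["dst"], e["dport"]) == session_key
    ]
    hops.sort(key=lambda e: e["first_seen"])
    path = []
    for event in hops:
        # A device may export several records for one session; list it once.
        if not path or path[-1] != event["exporter"]:
            path.append(event["exporter"])
    return path


KEY = ("198.51.100.7", 40122, "192.0.2.10", 80)
flow = {"src": KEY[0], "sport": KEY[1], "dst": KEY[2], "dport": KEY[3]}
events = [
    dict(flow, exporter="sw1", first_seen=2.0),
    dict(flow, exporter="gate-outer", first_seen=1.0),
    dict(flow, exporter="rtr1", first_seen=3.0),
    dict(flow, exporter="rtr1", first_seen=3.5),
    {"src": "203.0.113.9", "sport": 5000, "dst": "192.0.2.10", "dport": 25,
     "exporter": "sw1", "first_seen": 1.5},
]
path = session_path(events, KEY)
```

Matching on the four-tuple alone only works within one protection domain; once a gate rewrites
addresses, the key has to be mapped before the next domain can be traced.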
When determining the path for a session, we may encounter situations where the session
is fragmented into multiple packets. Each of these fragments may take a different route, and
will normally be reassembled at some central location (e.g. intermediate network equipment) 7 . In
case the packets are reassembled after a protection gate, the model needs to interpret how each
of these packets was treated at the gate in step 3.
7 Some protection gates and other types of network equipment require the complete packet to be reassembled in order
to make a decision on how to proceed.
This step is repeated for each protection domain identified to be a part of the traced session,
and will only correlate logs that are related within that particular domain.
4.3.3 Step 3: Evaluating Protection Gate Decisions
The third step of the model will evaluate the decision made by the protection gate on the
particular packet (or session) passing through it. The gate’s decision influences whether the threat
may materialize (i.e. the alert may impact) or if it is blocked before reaching its target. This is
illustrated in Figure 9.
The protection gate may take many different actions on the packet, and generate log entries
on its decisions. The actual evaluation process made by the protection gate is not of particular
interest in the model as we are only interested in its decision in binary form: was it authorized
or not? This requires the model to have some understanding of the actual type of device the
gate is, and the ability to interpret log data collected from it. This is not straightforward for
heterogeneous networks, which encompass numerous different vendors, models, systems and
versions, but should not be a complicating factor once the log data are supported (i.e. structure
Another important area the model needs to account for is whether the gate modifies the
layer-3 address of the packet. This influences the subsequent correlation as, for instance, the
IP-address seen before the gate would have no relationship to the IP-address seen after it, unless
the two addresses are mapped. This address-translation is performed by the gates (and sometimes
routers and switches) that support Network Address Translation (NAT), and is common practice from
external to internal network and vice versa. Once the binary answer about the packet has been
determined, the model needs to determine the outgoing address of the packet. Intuitively, if a packet
passes through a gate with the same header and/or identifiers, the packet was allowed through by the
protection gate.
The decision made by protection gates is evaluated in terms of ingress- or egress-filtering. If
traffic is blocked ingress, it may not need to be addressed by the incident handler as the client was
neither compromised nor put at risk. If traffic is blocked egress, it needs to be addressed as the
cause may be a deliberate policy breach, an attempt to circumvent policy restrictions, or a compromised
client sending call-back traffic 8 . This is an important distinction in the model because a block
made by egress-filtering has higher severity. The model could then correlate device logs based on
the gate’s decision to block the session. This is, however, in contrast with our idea of having
a model that only needs to understand the binary decision.
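The ingress/egress distinction can be expressed as a very small decision rule. The labels
("ingress", "egress", "allow", "block") are assumptions for illustration and do not correspond to
the log format of any particular gate.

```python
def needs_follow_up(direction, action):
    """Decide whether a gate decision on alerted traffic requires follow-up.

    direction: 'ingress' or 'egress' (hypothetical labels)
    action:    'allow' or 'block'    (the gate's binary decision)
    """
    if action == "allow":
        # Allowed traffic that raised an alert is evaluated further in step 4.
        return True
    # Blocked ingress traffic never reached the client; blocked egress traffic
    # suggests a policy breach or a compromised client and is followed up.
    return direction == "egress"
```

Only the combination of blocked and ingress results in the alert being set aside without further
handling.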
4.3.4 Step 4: Estimating Intrusion Impact
The final step in the model is related to the actual impact evaluation of an intrusion alert. The
intrusion alert is related to some part of the targeted host, and the model should first understand
what parts of the host are exposed and what logs are relevant. This correlation process requires
some understanding of the intrusion alerts. This step is illustrated in Figure 10.
The impact evaluation attempts to determine whether the host was compromised, and if its
state was in any way altered because of the event that triggered the intrusion alert. Depending
on the outcome, the alert is prioritized accordingly. The model is not necessarily concerned with
the severity level, but rather with determining whether those alerts constitute a real threat and
with enabling security personnel to better prioritize which alerts to investigate. Nor is the model
concerned with whether an intrusion alert is a false positive. As far as the model is concerned, all
alerts are real. Its primary objective is to determine whether the alerts constitute an actual
threat, and to what degree these require follow-up.
8 Call-back refers to malware that has succeeded in infecting a host, attempting to send information to its owner (c.f.
backdoor, control-center).
The alerts will be based on the severity level presented by the intrusion detection system,
as the model does not relate to the actual context the alerts were raised in. What is important,
however, is to classify alerts based on their probability of success. The model will operate with
the following categories:
True: The host was negatively affected by the traffic alerted by the intrusion detection system.
False: The host was not negatively affected by the traffic alerted by the intrusion detection
system.
Unknown: It cannot be verified or refuted that the traffic affected the host.
With these categories in mind, the incident handler would first deal with alerts located in the
true category, then the unknown. The false category would not be considered, as these alerts
have no impact on the target. The categories fall into a two-dimensional queue, where alerts are
sorted by their respective severity level. The severity level is based on the evaluation made by
the intrusion detection system, in combination with the evaluation process, as stated.
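The two-dimensional queue can be sketched as a sort on category first and severity second. The
field names ("impact", "severity") and the numeric severity scale, where a higher number means
more severe, are assumptions made for the example.

```python
# Handling order of the impact categories; 'false' is deliberately absent.
CATEGORY_ORDER = {"true": 0, "unknown": 1}


def prioritize(alerts):
    """Drop 'false' alerts, then sort by category and by descending severity."""
    relevant = [a for a in alerts if a["impact"] in CATEGORY_ORDER]
    return sorted(
        relevant,
        key=lambda a: (CATEGORY_ORDER[a["impact"]], -a["severity"]),
    )


alerts = [
    {"id": 1, "impact": "unknown", "severity": 3},
    {"id": 2, "impact": "true", "severity": 1},
    {"id": 3, "impact": "false", "severity": 3},
    {"id": 4, "impact": "true", "severity": 2},
]
queue = [a["id"] for a in prioritize(alerts)]
```

In this ordering a confirmed low-severity alert is handled before an unconfirmed high-severity one,
which follows from the category taking precedence over the severity level.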
A potential problem is that multiple alerts related to the same attack may fall under different
categories. As mentioned earlier, alert-correlation should also be considered to reduce the chance
of ambiguity and avoid separate alerts that are related to the same incident. Having alerts
related to the same attack grouped together is also beneficial as it increases the confidence level
that the alerts are handled correctly.
5 Experiment
The first part of this chapter reviews the network topology that the experiment is based on and
discusses various aspects of it such as composition, traffic flow, policy decisions, followed by a
discussion about how to interpret data. The last part of the chapter is the actual experiment,
where data is presented and observations discussed.
(host or not (src net 192.168.1 and dst net 192.168.1)) and not
Our second problem was that we observed our physical NICs transmitting duplicate traffic.
It turned out to be a known issue for virtualized hosts configured in bridged
mode in VMware [60]. There were several proposals on how to deal with the issue on Windows,
but none applied to our environment, and there was no fix available from VMware. Bridge mode
was used by the protection gates located on the segment, as they needed to communicate
with each other through the router and with the virtualized hosts configured as host-only.
We were able to identify where this occurred (NICs connected to the purple domain), and
addressed the issue by halving some of the flow tuples in the algorithm (Packets, Bytes and
1 This is not entirely true as ver. 9 introduced the concept of templates and field type definitions. Our Cisco router did
not support this, however, due to its old Cisco Internetwork Operating System (IOS).
Figure 11: A link-layer overview of the experiment’s network composition.
Red: Completely open policy, constituting the most untrustworthy domain where the organization
has neither insight into nor control of what takes place. The protection gate in front of
the red domain is primarily for blocking "Internet noise", non-targeted attacks, and general
reconnaissance/probing traffic. It is expected that some traffic originating from the Internet
is allowed through, as the organization is hosting publicly accessible services located on the
internal protection domain.
Purple: Is the "roundabout" of the organization. All traffic entering this domain is routed to
the other respective domains (with the exception of the log server). The Cisco router ensures
that external hosts cannot access internal resources without circumventing its access control
and ensures that internal traffic is properly routed.
Orange: Enforces a semi-open policy, where only services located within that domain can be
accessed. Only traffic going to and from these services is allowed through the gate. The
domain constitutes a DMZ-zone where services cannot be fully trusted.
Green: The most restrictive policy of the domains, where all the internal clients are located and
valuable information is stored. It cannot be accessed directly from the external domain without
the clients first initiating the communication channel 2 , and hosts on the inside have
been configured to always send traffic through the proxy located on the DMZ. This is done
through an OS-configuration and may easily be circumvented.
Each of the domains has been hardened according to their policy, based on best practices for
firewall-configurations. The domains may be viewed as a linear model, where each domain down
the chain is more secure (i.e. restrictive) than the previous. The further down the chain, the less
the resources in that domain are allowed to do. In terms of the traffic flow, all outgoing traffic is
allowed for the following directions: 1) purple towards the red domain; 2) orange towards the purple
domain; and 3) green through the purple and towards the orange domain, via proxy. Notice that
resources located in the green domain should not access the red domain without passing through
the orange domain (enforced through OS-configuration as stated). The orange domain contains
security controls such as a proxy-server (Squid ver. 3.0 stable 19) and antivirus-filter (ClamAV
ver. 0.97 with most recent signature base) for both Web- and SMTP-traffic.
Every protection domain is protected by a protection gate that performs Source NAT (SNAT).
SNAT refers to the process of masquerading IP addresses behind a protection guard by replacing
them with new addresses when traffic passes through the guard, and vice versa.
A table of all ongoing connections is maintained in the guard, where the port number is the
identifier for each session. SNAT is typically used in large-scale networks, and has the drawback
that one needs to acquire log data or look up connections in the table to determine what was
accessed by a host in front of or behind the device performing SNAT.
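The lookup problem described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory translation table keyed by the translated source port. The names `Session` and `SnatTable`, the starting port and the example addresses are illustrative and not taken from the experiment or from any specific firewall implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Session:
    """One tracked connection behind the gate (hypothetical record layout)."""
    internal_ip: str
    internal_port: int
    dest_ip: str
    dest_port: int


class SnatTable:
    """Toy SNAT table: the translated source port identifies each session."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._by_port: Dict[int, Session] = {}

    def translate(self, session: Session) -> Tuple[str, int]:
        """Register an outgoing session and return the masqueraded socket."""
        port = self._next_port
        self._next_port += 1
        self._by_port[port] = session
        return self.public_ip, port

    def lookup(self, translated_port: int) -> Optional[Session]:
        """Map a port seen outside the gate back to the real internal host."""
        return self._by_port.get(translated_port)


table = SnatTable("203.0.113.1")
outside = table.translate(Session("10.0.0.5", 1053, "198.51.100.7", 80))
original = table.lookup(outside[1])
```

An observer outside the gate only sees the socket in `outside`. Recovering `original` requires access to the table, or to logs derived from it, which is the drawback noted above.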
5.1.2 Network Services
As stated, traffic originating from internal resources is only transmitted out of the network if its
connection attempts are authorized by some security control. Our default policy on the network
is to allow all traffic towards known services and hosts, unless the traffic pattern is considered a
violation of policy. In the orange protection domain, we have prepared three servers and four services
typically found in organisations.
Proxy: The proxy is the main gateway, primarily for Web traffic through ports 80 and 443, but
it is also used for FTP sessions and the like. It receives requests from hosts located internally and
externally, and evaluates each request according to a set of criteria. The proxy maintains a
state with the requestor, it denies traffic based on port or address, it caches requests, and it
performs antivirus scanning on all content that it mediates. The proxy uses port 3128 (Squid
default) for all requests.
Mail: The e-mail server acts as a Mail Transfer Agent (MTA) as well as an IMAP and SMTP server for
clients on the internal network. It can retrieve e-mails from the outside, and has
spam and phishing control enabled in addition to running antivirus checks on every e-mail.
DNS: The Domain Name System (DNS) is basically an electronic phone-book for IP-to-name
mappings. For every request that the proxy receives containing names, a DNS-query is sent to
the DNS-server. The DNS-server replies with the translated IP-address (discussed later). The
DNS also plays a vital part in the security scheme as it is integrated with an IP-and-domain-name
reputation list. The DNS responds with either typical answers such as Canonical Name
(CNAME), A-records, and pointers to reverse IP-addresses (PTR), or it replies with a negative
answer if the queried name is present in the reputation list.
NTP: The Network Time Protocol (NTP) server is used to ensure accurate time throughout the entire
organisation. All hosts on the network are configured to periodically synchronize with this
server. Time synchronization is particularly important in log analysis, as correct timestamps on log
records are necessary to investigate incidents properly and to place events in the correct timeline.
The first three services maintain logs of each and every traffic request being made. These log
sources are used in the correlation process and when investigating intrusions, and in general
constitute important sources for evaluating whether an intrusion alert poses a real threat. The
services are able to stop or influence traffic related to their field of expertise if their security
mechanisms consider this necessary.
and 2) The logs that determine how the receiving traffic is treated. Logs may then show whether
the client is at risk based on the particular IDS alert that triggered.
Our algorithm for evaluating IDS alerts is as follows:
1. Determine whether the traffic originates from an internal or external client.
2. If the traffic originates from an internal client, trace back to the source of the traffic.
- Investigate logs along that route.
- Evaluate logs based on the behavior that was observed.
3. If the client retrieves traffic from the outside, follow the traffic to its destination.
- Investigate logs along that route.
- Evaluate the impact of the traffic and any subsequent change in client’s behavior.
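The steps above can be sketched as a small driver function. This is a simplified illustration, not the implementation used in the experiment. The address range in `INTERNAL_NETS`, the alert field names, and the `trace_route` and `fetch_logs` callbacks are all hypothetical stand-ins for the flow tracing and log collection.

```python
import ipaddress
from typing import Callable, Dict, List

# Assumption: the addresses the IDS sees for the organisation are known.
INTERNAL_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def is_internal(ip: str) -> bool:
    """Step 1: decide whether an address belongs to the organisation."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def evaluate_alert(alert: Dict[str, str],
                   trace_route: Callable[[str, str], List[str]],
                   fetch_logs: Callable[[str], List[str]]) -> Dict[str, object]:
    """Classify the direction, walk the route and gather logs along it."""
    outgoing = is_internal(alert["src_ip"])
    if outgoing:
        # Step 2: trace back towards the internal source of the traffic.
        route = trace_route(alert["dst_ip"], alert["src_ip"])
    else:
        # Step 3: follow the traffic towards its internal destination.
        route = trace_route(alert["src_ip"], alert["dst_ip"])
    logs = {device: fetch_logs(device) for device in route}
    return {"direction": "outgoing" if outgoing else "incoming",
            "route": route, "logs": logs}


def demo_route(start: str, end: str) -> List[str]:
    """Stub: a fixed chain of devices ending at the traced host."""
    return ["gate A", "proxy", "gate B", end]


def demo_logs(device: str) -> List[str]:
    """Stub: one placeholder log line per device."""
    return [f"log from {device}"]


result = evaluate_alert({"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7"},
                        demo_route, demo_logs)
```

The evaluation of the collected logs (the second sub-step of each branch) is deliberately left out, since it depends on the log sources involved.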
Each IDS alert, which constitutes an incident that needs to be looked into, contains minimal
information about what took place, and requires human intervention in most circumstances. An
IDS alert in Snort format is composed of the signature header, which includes information such
as ID and message, followed by the source and destination network sockets of the traffic that
triggered the alert. We will use this socket information in the algorithm to automate the tracing
of network traffic, as well as to narrow down the needed logs along the way, as mentioned.
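As an illustration of how the socket information might be extracted, the following sketch parses a single alert line. It assumes Snort's one-line "fast" alert layout. The sample line, its message text, its classification and its addresses are made up for the example, and `parse_alert` is a hypothetical helper rather than part of Snort.

```python
import re
from typing import Dict, Optional

# Assumed layout: [gid:sid:rev] message [**] ... {PROTO} src[:port] -> dst[:port]
ALERT_RE = re.compile(
    r"\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+(?P<msg>.*?)\s+\[\*\*\]"
    r".*?\{(?P<proto>[\w-]+)\}\s+"
    r"(?P<src_ip>\d+(?:\.\d+){3})(?::(?P<src_port>\d+))?\s+->\s+"
    r"(?P<dst_ip>\d+(?:\.\d+){3})(?::(?P<dst_port>\d+))?"
)


def parse_alert(line: str) -> Optional[Dict[str, object]]:
    """Return signature and socket fields, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    fields: Dict[str, object] = dict(match.groupdict())
    # Ports are absent for port-less protocols such as ICMP and stay None.
    for key in ("gid", "sid", "rev", "src_port", "dst_port"):
        if fields[key] is not None:
            fields[key] = int(str(fields[key]))
    return fields


sample = ("05/03-21:18:53.301462  [**] [1:2011938:3] Example signature message [**] "
          "[Classification: Misc activity] [Priority: 3] {TCP} "
          "192.0.2.10:1053 -> 198.51.100.7:80")
parsed = parse_alert(sample)
```

The resulting source and destination sockets are the keys that the trace uses to find matching flows.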
5 The reason for choosing this time duration is to make sure that as much data as possible is collected during the
5.2.1 Results
During the experiment, a total of 9465 log entries (excluding flows) were collected throughout
the network. Most of these were related to the clients and the proxy server, which is expected
considering our topology and choice of log sources. Logs from the firewalls were only collected
when traffic was blocked, which turned out to be a particularly large number for the DoS-exposed guard.
Because our topology was configured with an open policy in terms of outgoing traffic, few of
these log entries were related to this. Table 2 outlines the number of collected logs, sorted by
In addition to the log entries collected from the internal network, we have the IDS located on
the external network. It raised 972 alerts during the experiment. This includes all alerts raised
when malicious files were downloaded to the client, when the infected machines attempted
to connect to the outside, and when we attempted to DoS an internal resource. The top 10 alarms,
accounting for 72% of the triggered alerts, are seen in Table 3.
Signature            Number of alerts
1:15334384:5         233
1:2011938:3          106
3:15454:4            91
129:15:1             59
1:2010882:3          52
1:12592:3            50
1:17668:1            48
122:3:1              20
1:17543:1            19
1:17517:2            16
Total (all alerts)   972
The top three IDS alerts from Table 3 are shown below.
When alerts were raised, we correlated them with NetFlow and log data, and evaluated the
risk based on what had been raised. First, we used NetFlow to trace the traffic back to its source at
the client, and to trace the traffic flowing towards the client. Each time a flow passed through a
protection gate, we checked whether there were any log entries stating that it had been blocked, and
why. Once the entry and exit flows had been identified, we could evaluate what timing
information was relevant for the logs, based on what had been detected by our IDS. This process
was automated in an implementation of the algorithm, where each IDS alert provided the output
as shown in Table 4.
Based on NetFlow, we were able to determine whether traffic that had raised an IDS alert
was successfully received by the client. The traffic flow further enabled us to establish a precise
timeline for the entire session, allowing us to narrow down the number of log records for each
passing device to an absolute minimum. For outgoing traffic, we acquired all the log
records on the client within a given threshold, up until the first flow was seen. For incoming traffic,
we could look into the log data beginning from the last flow reaching the client, based on
the timestamp.
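A minimal sketch of this narrowing step follows. The 60-second threshold and the record layout are assumptions made for illustration, as is the choice to bound the incoming window by the same threshold, since the text only specifies where that window starts.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LogRecord = Tuple[datetime, str]  # (timestamp, message)


def narrow_logs(logs: List[LogRecord],
                flow_times: List[datetime],
                outgoing: bool,
                threshold: timedelta = timedelta(seconds=60)) -> List[LogRecord]:
    """Keep only the client log records relevant to one traced session."""
    if outgoing:
        # Outgoing: records within the threshold, up until the first flow.
        end = min(flow_times)
        start = end - threshold
    else:
        # Incoming: records from the last flow reaching the client onwards.
        # Assumption: the same threshold bounds the window at the far end.
        start = max(flow_times)
        end = start + threshold
    return [record for record in logs if start <= record[0] <= end]


t0 = datetime(2011, 5, 3, 19, 18, 53)
logs = [(t0 - timedelta(minutes=5), "old"),
        (t0 - timedelta(seconds=10), "before first flow"),
        (t0 + timedelta(seconds=5), "after last flow")]
flows = [t0, t0 + timedelta(seconds=1)]
before = narrow_logs(logs, flows, outgoing=True)
after = narrow_logs(logs, flows, outgoing=False)
```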
The results show that of the 972 IDS alerts that triggered, all except the ones not based on
UDP/TCP could be traced accurately back to the client and protection guards. 367 of the IDS
alerts could be mapped to log records that showed the sessions being blocked by the protection
guard. 334 of the IDS alerts could be traced back to security mechanisms such as the proxy and
antivirus solutions that blocked the session. The high number of IDS alerts is caused by some
malware that resulted in multiple IDS alerts, all of which could be traced back to exactly the same
proxy entry due to identical network sockets.
################## ALERT START ##################
Time: May 3 21:17:45 1304450333.351: 05/03-21:18:53.301462
Date flow start Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows Location
<Client start>
2011-05-03 19:18:53.259 0.249 TCP -> 5 504 1
2011-05-03 19:18:53.259 0.249 TCP -> 5 504 1
2011-05-03 19:18:53.710 0.249 TCP -> 5 504 1
2011-05-03 19:18:53.711 0.249 TCP -> 5 504 1
2011-05-03 19:18:53.791 0.021 TCP -> 12 976 1
2011-05-03 19:18:53.791 0.026 TCP -> 12 976 1
2011-05-03 19:18:54.240 0.026 TCP -> 12 976 1
2011-05-03 19:18:54.240 0.026 TCP -> 12 976 1
2011-05-03 19:18:54.241 0.026 TCP -> 12 12360 1
(proxy): May 3 21:17:45 1304450265.297: pport:3128 cip: cport:1053 dip: stat:403
user:- reqply:963 resply:295 reqdata: GET
1.1 size:TCP_MISS offset:DIRECT type:text/html tr:115
################## ALERT END ##################
Figure 12: Session request mediated by the proxy on behalf of the client.
• By opening and closing fewer TCP connections, CPU time is saved in routers and hosts
(clients, servers, proxies, gateways, tunnels, or caches), and memory used for TCP protocol
control blocks can be saved in hosts.
• Network congestion is reduced by reducing the number of packets caused by TCP opens, and
by allowing TCP sufficient time to determine the congestion state of the network.
• Latency on subsequent requests is reduced since there is no time spent in TCP’s connection
opening handshake.
Whether a connection is persistent depends on how the end-points set up the connection. In
HTTP 1.0, the line "Connection: Keep-Alive" can be seen in the packet header. In HTTP 1.1, this is
no longer needed as persistence is enabled by default and must be explicitly turned off. As we were using
SNAT, persistent connections prevented us from breaking down incoming and outgoing connections
and correlating them with precision to IDS alerts. We resolved the issue by forcing both the proxy
and the attacker’s service to use non-persistent connections.
We also encountered the problem of pipelining, which is the concept of allowing a client to
make multiple requests on one connection without waiting for each response. Single TCP connections
are then used more efficiently, but this complicates the correlation process as a single session cannot
be separated in terms of flows. By disabling persistent connections, we also prevented pipelining
from taking place.
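The persistence rule described above can be expressed as a short check. This is a sketch only: `is_persistent` is a hypothetical helper that looks at the protocol version and the `Connection` header of a single message, and it ignores proxy-specific header variants.

```python
from typing import Dict


def is_persistent(http_version: str, headers: Dict[str, str]) -> bool:
    """Return True if the message asks for the connection to stay open."""
    connection = ""
    for name, value in headers.items():
        if name.lower() == "connection":  # header names are case-insensitive
            connection = value
    tokens = {token.strip().lower() for token in connection.split(",") if token.strip()}
    if http_version.upper() == "HTTP/1.1":
        # Persistent by default; must be turned off explicitly.
        return "close" not in tokens
    # HTTP/1.0: persistent only when explicitly requested.
    return "keep-alive" in tokens
```

For example, `is_persistent("HTTP/1.1", {})` is true, while `is_persistent("HTTP/1.1", {"Connection": "close"})` is false. The latter corresponds to the non-persistent behaviour forced in the experiment.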
5.2.4 Network Socket Reuse
Port numbers are limited to 16 bits; they are assigned to sessions and released when TCP connections
have completed their TIME_WAIT state. Because of this, port numbers are bound to be repeated
over some time period. Windows in particular incorporates a 120-second delay in its algorithm
to avoid confusion with duplicated TCP segments of the old connection [63]. On low-traffic
clients, as in our experiment, it takes longer than on high-load hosts before the same port
numbers are repeated. Port numbers were our unique identifier as mentioned, and
were considered an essential component for combining flows.
When running through our algorithm in real-time, we encountered no problems because
of the low traffic load. However, when running the algorithm on data that was gathered after
the experiment had ended, as is likely to be the case with historical data, we observed several
conflicts between end devices. One IDS alert resulted in two different network flows where
two different clients were identified. The problem could be solved by looking at all the flows
in conflict, and determining the minimum amount of time between them. It turned out that
with our network traffic, 15 minutes was sufficient. The problem took place on the client with
IP-address and the proxy server in particular.
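One way to apply such a time bound when matching on historical data is sketched below. The `Flow` record and the `match_flow` helper are hypothetical. The default window of 15 minutes mirrors the value reported above, while picking the flow closest in time to the alert is an assumption about how remaining ties would be broken.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    start: datetime
    client: str
    port: int


def match_flow(alert_time: datetime,
               port: int,
               flows: List[Flow],
               window: timedelta = timedelta(minutes=15)) -> Optional[Flow]:
    """Pick the flow using `port` that lies closest to the alert in time."""
    candidates = [flow for flow in flows
                  if flow.port == port and abs(flow.start - alert_time) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda flow: abs(flow.start - alert_time))


alert_time = datetime(2011, 5, 3, 19, 18, 53)
history = [
    Flow(alert_time - timedelta(hours=2), "client A", 1053),   # earlier reuse of the port
    Flow(alert_time - timedelta(seconds=1), "client B", 1053),
]
matched = match_flow(alert_time, 1053, history)
```

Without the window, both clients would match the alert's port, which is the kind of conflict observed on the historical data.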
We attempted to DoS one of the servers on the DMZ, which was allowed by protection gate A
but blocked by protection gate B. This explains the high number of log entries from this source
compared to the other sources. In NetFlow, we saw entries such as the following (cf. Table 4 for
field description):
ICMP packets, which belong to layer 3 of the OSI stack, do not have any
port numbers (introduced at layer 4)⁶. Because we were unable to relate flows to IDS alerts,
we could neither determine the risk and impact of the DoS attacks nor trace the traffic to
its source or destination. NetFlow would therefore be unsuitable for availability attacks as described
here, as IDS alerts could not be related to the flow. In NetFlow, flows that are port-less are
denoted with a zero.
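A correlation tool can at least separate such flows up front, so that they are reported as untraceable rather than silently dropped. The sketch below assumes flows are available as dictionaries with a `proto` field. The field name and the helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Only these protocols carry the port numbers used as session identifiers.
PORT_BASED = {"TCP", "UDP"}


def split_by_correlatability(
        flows: List[Dict[str, object]]
) -> Tuple[List[Dict[str, object]], List[Dict[str, object]]]:
    """Split flows into socket-matchable ones and port-less ones."""
    matchable, portless = [], []
    for flow in flows:
        if str(flow["proto"]).upper() in PORT_BASED:
            matchable.append(flow)
        else:
            portless.append(flow)
    return matchable, portless


sample_flows = [{"proto": "TCP", "src_port": 1053, "dst_port": 3128},
                {"proto": "ICMP", "src_port": 0, "dst_port": 0}]
matchable, portless = split_by_correlatability(sample_flows)
```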
6 Discussion
This chapter contains a discussion about the experiment and our observations when conducting
it. We will focus on the issues we encountered and discuss these in relation to our research ques-
tions. We start by discussing the model and its challenges, followed by some sections discussing
various issues. The last section proposes a set of requirements that we argue a network-centric
SIEM solution should fulfill.
Prioritizing Alerts
Prioritizing alerts is the concept of presenting alerts, or an evaluated incident summary, to an
incident handler so that the alerts with the highest criticality, based on the organisation’s assets,
are addressed first. The algorithm used in the experiment did not have any sophisticated
alert-prioritization scheme, as the log correlation algorithm was not advanced enough. The experiment
showed that we could significantly reduce the number of alerts by being network centric, and
provide the incident handler with the view needed to analyze incidents. A model as proposed
should be extended to include a summary after the SIEM correlation process, where incidents
can be dealt with according to a stack-based approach. One could, for instance, rank incidents
according to a numerical value calculated from various attributes, such as:
Traffic direction: In which direction was the alert raised? Outgoing traffic has a higher weighted
value as the intrusion takes place behind the security perimeter (cf. Table 1), and given that
security policies are traditionally less restrictive for egress traffic. Incoming traffic, on the
other hand, is subject to Internet noise and has a higher false positive rate.
Protection gate action: What was the decision taken by the protection gates on the network
traffic? Traffic passing through a protection gate is more severe than blocked traffic.
Outcome of correlation process: What was the result of the SIEM correlation process? An alert,
or cluster of alerts, should have precedence over other alerts with the same severity if the
alerts are verified to be real threats (i.e. higher confidence level). The outcome of the correlation
phase does not necessarily need to be conclusive; it could also be a probability measurement
based on asset characteristics (e.g. OS, services, versions).
Severity rating: How severe is the attack considered? Attacks with higher severity should be
prioritized. The severity level should not be dictated by attack vector alone, but also based
on the criticality of services and hosts (i.e. integrating business assets).
Attack categorization: What type of attack was observed? A policy violation, for instance,
should in general receive lower attention than trojan activity. Business goals may also dictate
that confidentiality attacks as discussed in Section 3.2.1 have precedence over availability
attacks (e.g. stock brokers vs. Internet banking).
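A ranking of this kind could be sketched as a weighted sum over the attributes listed above. The weights, the normalisation of each attribute to the range 0 to 1, and the example incidents are all hypothetical. The experiment did not implement such a scheme.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; a real deployment would tune these to its assets.
WEIGHTS = {"direction": 3.0, "gate": 2.0, "confidence": 3.0,
           "severity": 2.0, "category": 1.0}


@dataclass
class Incident:
    name: str
    outgoing: bool      # traffic direction: outgoing weighs more
    passed_gate: bool   # protection gate action: passed weighs more than blocked
    confidence: float   # outcome of the correlation process, 0..1
    severity: float     # severity rating including asset criticality, 0..1
    category: float     # attack categorization weight, 0..1


def score(incident: Incident) -> float:
    """Combine the attributes into one numerical priority value."""
    return (WEIGHTS["direction"] * float(incident.outgoing)
            + WEIGHTS["gate"] * float(incident.passed_gate)
            + WEIGHTS["confidence"] * incident.confidence
            + WEIGHTS["severity"] * incident.severity
            + WEIGHTS["category"] * incident.category)


def prioritize(incidents: List[Incident]) -> List[Incident]:
    """Order incidents so that the highest score is handled first."""
    return sorted(incidents, key=score, reverse=True)


queue = prioritize([
    Incident("policy violation", False, False, 0.2, 0.3, 0.2),
    Incident("trojan activity", True, True, 0.9, 0.8, 1.0),
])
```

A linear combination is only one option. It keeps each attribute's contribution easy to explain to an incident handler, at the cost of ignoring interactions between attributes.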
the authors performed flow level traffic measurement to detect such attacks. One could consider
areas where the model included traffic flow measurement to track this type of traffic.
Coverage of attacks was also, in our experiment in particular, heavily dependent on the IDS’
ability to detect attacks or irregularities in the network traffic. If the IDS did not detect an attack,
the attack would bypass the model. To extend the attack coverage, the model should include
complementary correlation methods, such as those provided by today’s SIEM solutions. The connection
tracking could then be used afterwards to provide the necessary context of the detected intrusion.
Other types of intrusion detection sources can include network flow anomalies, events blocked by
protection gates, or application-level events such as viruses detected by an antivirus solution.
7 Future Work
This chapter contains proposals for future work related to the proposed model, the
method used, and building a network-centric SIEM solution in general.
• An important aspect that needs to be addressed is the relationship between the intrusion
alerts raised by the central IDS and the actual log entries on the end clients. One approach
is to look at the actual mapping between IDS alerts and specific log entries. Another is to
understand what types of log entries are relevant for various types of attack classifications.
Section 2.1.2 discussed some work in the area.
• In terms of log sources, we need to acquire a better overview of how application logs influence
the model, and how application logs may be mapped to flows, or assigned identifiers that
allow dynamic mapping to take place. A particular problem we encountered in this respect,
was the lack of common identifiers between various levels of logs, making it difficult to relate
them. Proxies for instance, acting as mediators, use two independent flows per session which
cannot be related without identifiers.
• Other methods that may replace Cisco NetFlow ver. 5 should be reviewed in order to
understand the characteristics of such methods in relation to the model (disregarding sampling
technologies as discussed in Section 2.2.1). IPFIX, also known as NetFlow ver. 10, should be reviewed
in relation to the model as it provides functionality that may prove to be more suitable. The
NetFlow Secure Event Logging (NSEL)[2] used by Cisco ASA 5580, also looks promising as it
maintains state of flows between ingress and egress, and triggers on events that cause state
change in the flows (i.e. event-driven).
• The proposed model was applied with a signature-based IDS, allowing 1:1 mapping between
flows and IDS alerts to take place. It would be interesting to study the behavior of anomaly-
based IDS in relation to the model, where multiple flows are extracted from a single anomaly.
This may also include flow level traffic measurement to complement the signature-based
attack coverage.
• Our network environment did not include routing protocols as the network environment was
considered too small. It will be necessary to see how routing protocols and routing tables
affect the model, both in terms of dynamic and static layout. We would for instance assume
that static routing tables would replace the need for flows in the protection domains to which
this applies.
• We discussed the problem of network traffic only consisting of OSI level-3 and below, and
how it was not possible to relate the traffic directly to the application or stream. We think
methods should be reviewed to relate low-layered connections with streams.
• There are also several topics that have not been considered in this study, and which deserve
attention. This includes load-balanced networks, IP-fragmented networks where fragments
are not reassembled by intermediate routers, and the correlation of flows where either cen-
tral data is missing (which may be the case when the amount of NetFlow overwhelms the
receiver [66]) or where load-balancing has separated the traffic.
8 Conclusions
This thesis is concerned with the study of a network-centric log correlation model, whose purpose
is to answer the research question: Can we improve SIEM by making it network centric? To
substantiate our research, we proposed a conceptual model and then studied that model in a
common network environment by using Cisco NetFlow. Our answer to the research question is
largely influenced by this chosen method, in addition to the network composition and the performed
network attacks.
The proposed model has many common denominators with today’s practice for a segmented
network topology, and its focus on the dynamic aspect of a network supports the research
question, which was also observed in the experiment. By applying this model in a simulated real-
life environment, we were able to reduce the amount of log data to a more manageable size. We
believe that this may be beneficial both in terms of efficiency for a SIEM solution and the level of
work required by a security analyst to investigate a particular incident. Furthermore, it allowed
us to establish an accurate timeline in terms of network traffic for a given event, providing a
reliable trace of all network devices involved in the incident, as well as providing a context on
the incident itself. We were also able to significantly reduce the number of IDS alerts that were
raised, which consequently reduces the number of alerts requiring human intervention. Research
concerning CIDS or alert correlation may reduce this even further [15, 56, 59].
Despite the promising prospects of the model itself, observations showed that the actual
method we used, Cisco NetFlow, may be a suboptimal approach for tracking network flow within
a security context as discussed in this thesis. The method had several drawbacks in relation to
a network-centric approach, which are primarily based on the characteristics of the protocol and
its applicability to a wide set of attack classifications. With the former, we refer to the method
for generating flows, variables that influence flows (e.g. timeout values), and the number of
traffic identifiers constituting a flow (e.g. collisions). With attack classifications, we refer to the
issue that flows do not provide enough characteristics for attacks taking place on OSI layer 3 and
below, and that some protocols, such as HTTP and FTP, support techniques such as persistency
and pipelining, making it difficult to precisely identify a particular attack stream.
We also encountered transparency issues between high-level logs and flows, which required
us to facilitate mapping between application logs and network logs by providing a shared key.
The inconsistency between log formats with different focus is expected and implies that the
model needs to consider alternative approaches, or facilitate such mapping, on the central hosts concerned.
Proxies, DNS servers and antivirus solutions are typical examples of this.
In conclusion, we believe that our work shows that it is possible to improve SIEM by making it
network centric in relation to our conceptual model, but that the applicability of Cisco NetFlow
ver. 5 in this matter may be problematic. To move the field forward, we have proposed several
aspects of the model that require attention as further work and presented a set of requirements
that the model needs to fulfill.
