FYP Report

Sara Elizabeth Bury
Sentinel Aberrant Network Behaviour Indication and Analysis
BSc. Computer Science
22nd March 2007
I certify that the material contained in this dissertation is my own work, and does not contain significant portions of unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project, and all associated documentation. Date: 22nd March 2007 Signed:
Abstract
The aim of this project is to explore the use of aberrance detection techniques for network monitoring in a large network environment. This is an important area for research and development as today's networks are expected to function twenty-four hours a day, seven days a week; something which is impossible to guarantee relying only on the vigilance and investigative skill of network operators. This project can be broken down into three main areas: research into current aberrant network detection methods and assessment of their suitability; eliciting the requirements of a large network operator; and the production of a prototype system to illustrate the advantages of an aberrance detection system within a network operations environment. The result would be a system which indicates instances of aberrant behaviour as they occur and provides further information for network operators to aid their workflow and allow them to make an initial classification of the event.
Contents
1 Introduction
  1.1 Project Aims
  1.2 Motivation
    1.2.1 DANTE and GÉANT2
  1.3 Report Overview
2 Background and Related Work
  2.1 Related Work
  2.2 Sources of Network Data
    2.2.1 Measurement of Metrics
    2.2.2 Individual Packet Capture
    2.2.3 SNMP
    2.2.4 Network Flow
  2.3 Existing Network Monitoring Solutions
    2.3.1 TCPDump
    2.3.2 Snort
    2.3.3 RRDtool
    2.3.4 Cacti
    2.3.5 Flow-Tools
    2.3.6 NfDump
    2.3.7 NfSen
    2.3.8 Overview
  2.4 NfSen-HW
    2.4.1 Architecture and Organisation
    2.4.2 Holt-Winters Forecasting
3 Design
  3.1 Requirements
    3.1.1 Network Operators Workflow
    3.1.2 Requirements list
  3.2 Design Decisions
    3.2.1 NfSen-HW
    3.2.2 Java
    3.2.3 MySQL and PHP
    3.2.4 Debian GNU/Linux
  3.3 System Architecture
    3.3.1 NfSen-HW and NfDump
    3.3.2 runSentinel.sh
    3.3.3 Sentinel.jar
    3.3.4 Sentinel Database
    3.3.5 Sentinel Web Interface
4 Implementation
  4.1 Method of Implementation
  4.2 NfSen-HW
  4.3 runSentinel.sh
  4.4 Sentinel.jar
    4.4.1 Implementation Overview
    4.4.2 Problems with XML Parsing
    4.4.3 Database Insertion
  4.5 Sentinel Database
  4.6 Sentinel Web Interface
    4.6.1 Live Update
    4.6.2 Details
    4.6.3 Review
5 System Operation
  5.1 Usage Scenario
    5.1.1 Examining Live Update for Aberrant Behaviour
    5.1.2 Filtering the results
    5.1.3 Viewing further Details
    5.1.4 Analysis and editing event details
    5.1.5 Summary
6 Testing and evaluation
  6.1 Testing
    6.1.1 Defect and Component Testing
    6.1.2 Functional and Integration Testing
  6.2 User Interface Evaluation
  6.3 Evaluation
    6.3.1 Requirements List Review
    6.3.2 Summary and Feedback from DANTE
7 Conclusion
  7.1 Overview
  7.2 Further work
A Acknowledgements
B Project Proposal
C JavaDoc
D NfDump(1) Manpage
E Holt-Winters Forecasting Examples
List of Figures
1.1 GÉANT2 Network Topology
1.2 GÉANT2 Global Connectivity
2.1 Section of an RRD exported to XML format
2.2 NfSen Architecture
2.3 NfSen-HW Architecture
3.1 Use Case diagram depicting the diagnosis of a network anomaly
3.2 Overview of Proposed System Architecture
3.3 Sentinel Java UML Diagram
3.4 Sentinel Database Entity Relationship Diagram
3.5 Simple foreign key linking example
3.6 Proposed Live Update Web Interface
3.7 Proposed Details Web Interface
3.8 Proposed Review Web Interface
4.1 Sentinel Java UML Class Diagram
4.2 Sentinel Database UML Diagram
4.3 Aberrant Marking Example
4.4 Subtracting 40 Minutes Example
5.1 Investigation Process Step 1
5.2 Investigation Process Step 2
5.3 Investigation Process Step 3
5.4 Investigation Process Step 4 - Editing
5.5 Investigation Process Step 4 - Inserting
5.6 Sequence Diagram of System Operation
6.1 General Defect Testing Model
6.2 Functional Testing Model
E.1 Aberrant Marking Example
E.2 Subtracting 40 Minutes Example 1
E.3 Subtracting 40 Minutes Example 2
E.4 Subtracting 40 Minutes Example 3
List of Tables
2.1 Consolidation functions within RRDtool for aberrant behaviour detection
3.1 Derived Requirements List for High Level Requirement A
3.2 Derived Requirements List for High Level Requirement B
3.3 Derived Requirements List for High Level Requirement C
3.4 Derived Requirements List for High Level Requirement D
3.5 Derived Requirements List for High Level Requirement E
3.6 Derived Requirements List for High Level Requirement F
3.7 Derived Requirements List for High Level Requirement G
3.8 Sentinel Database Tables
6.1 Sentinel.jar Testing - XML Parsing
6.2 Sentinel.jar Testing - Source and Profile Detection
6.3 Sentinel.jar Testing - Database Connectivity
6.4 runSentinel.sh Testing
6.5 Sentinel UI Functional Testing - Live Update
6.6 Sentinel UI Functional Testing - Details
6.7 Sentinel UI Functional Testing - Review
1 Introduction
1.1 Project Aims
The rationale behind this project was to gain an understanding of current research work surrounding aberrant network behaviour detection and then investigate the challenges faced when creating a system which would detect aberrant behaviour and provide a classification of its type. Leading on from this, the aim is to produce an application which illustrates how the work of a network operator could be aided by indicating instances of aberrant behaviour, providing any relevant information, and performing some kind of classification of the type of anomaly. Such an application should ease a network operator's workflow when diagnosing and fixing network problems by providing necessary details with considerably less manual intervention than might currently be required. It should also provide the facility for instances of aberrant behaviour to be recorded, giving a historical perspective on any future anomalies detected, which should further aid the network operator in their work.
1.2 Motivation
Computer networks play an increasingly important role in today's technological age. The transfer of information between computers has become necessary for many day-to-day activities, and this is especially true of the education and research sector. Universities and research institutes rely on networks as communication links between scholars and students across the globe. In many cases the research being done also relates directly to the networks themselves: computing and communications research requires high-speed, reliable links between sites in order to accurately test new protocols and technologies. It is important that these networks are monitored carefully to ensure that potential issues are caught and resolved. People charged with the task of maintaining computer networks face a constant battle to ensure that they do not fail, but failure is not a black-and-white issue. Whilst one problem faced might be a network break removing a connection between machines, it is more than likely that regular problems would be less obvious and require investigation to solve. Services on the network may become unusually busy, or slow to respond; users might notice lag between transfers being sent and acknowledged. Other issues, namely security problems, might affect network traffic but remain unseen. Users might not notice if their data is being tampered with or observed, but it is up to a network administrator to try to prevent attacks of that kind, and to ensure they are rectified if they occur.
1.2.1 DANTE and GÉANT2
DANTE, standing for Delivery of Advanced Network Technology to Europe, is an organisation part-owned by each of the European National Research and Education Networks (NRENs) which has worked to plan, build and operate pan-European computer networks for advanced research and education since it was established in 1993 [DANTE, 2007]. DANTE has played an important part in the previous four generations of pan-European research network, and was responsible for the initial construction and subsequently the maintenance and management of its current incarnation, GÉANT2. This network connects 30 NRENs serving 34 countries, providing network facilities for approximately 30 million research and education users [GÉANT2, 2007].
Figure 1.1: GÉANT2 Network Topology
Figure 1.2: GÉANT2 Global Connectivity
DANTE and its network operations team are responsible for the day-to-day business of running GÉANT2, ensuring the network is operating smoothly and that each of its end users is happy with its performance. As Figure 1.1 and Figure 1.2 show, the GÉANT2 network is exceptionally large and interacts with multiple research networks around the world. Monitoring a network of this size is a very difficult prospect: it is a balance between wanting to know about every network event in order to be sure the network is operating correctly, and having only the time to deal with problems which are being specifically reported by end users. This results in a situation where network anomalies not causing immediate problems for network users are often missed, which potentially causes problems further down the line as the cause of the network event is not dealt with in the first instance. This problem is compounded by the sheer amount of data being dealt with: any metrics created for monitoring purposes are excessively large and cannot be kept for an indefinite amount of time. This leads to circumstances where a network problem has occurred but the data pertaining to it has been deleted simply because the data for that time period has expired.
A network operator in this situation does not have time to spend actively monitoring the network for aberrant behaviour by hand. Most widely used network monitoring solutions will provide an overview of network activity in a graphed format and this can be used to visually identify anomalies. Unfortunately this is not an automated process and requires human interaction to view the graphs at the correct time. Also, on a network the size of GÉANT2, an anomaly would have to be quite large in order to show up on a graphed view. Due to this, network events could be missed both through being too small to create a visible footprint on the graphs and by occurring at a time during which a network operator has not checked the monitoring software. In such a case there is not normally any record of the behaviour other than in the graphs; no other historical record is kept of network anomalies and their type. In some cases it is possible to go back and examine the graphs for a given time period, but the data used to create them might have expired, losing the potential for closer analysis, and the graphs themselves often become averaged over time, causing smaller anomalies to be evened out into the normal flow of data.
DANTE and their work with GÉANT2 provide an excellent example of why automated aberrant network detection is a necessary area of research. This project aims to use this scenario and, through discussions and liaison with network operators at DANTE, provide a concept network monitoring system which will attempt to provide a solution for the issues highlighted above.
1.3 Report Overview
The remaining sections of this report are as follows. Chapter two provides background information pertaining to this area of research and examines existing work and applications which could aid the design and implementation of the project. Chapter three gives a breakdown and explanation of the major design decisions made and describes the system architecture, interface designs and communication structure. It also provides a list of documented requirements to be met by the finished product. Chapter four describes how the application was implemented and lists important sections of code. Chapter five shows the application in operation and how it would be used by a network operator in their daily work, with a walkthrough of typical usage. Chapter six gives an overview of the testing undertaken, and how successfully the application meets the specified requirements. Chapter seven draws conclusions based on comparisons between the finished project and the initial aims and objectives. It analyses the overall success of the project and indicates where further work could be undertaken.
2 Background and Related Work

2.1 Related Work
There is a lot of research surrounding the area of network anomaly detection and in every case the first thing that must be defined is what specifically constitutes anomalous or aberrant behaviour on a network. Some researchers have chosen to define aberrant events as any network traffic which has been caused by some malicious intent [Kim et al. 2004], others simply as any large scale event on the network [Wagner & Plattner, 2005]. These definitions seem simultaneously too broad and too specific: malicious network traffic may make up a large part of anomalous traffic on a network, but there are other contributing factors, such as network configuration issues, which are not covered by this, whereas some large scale network events may be planned, or occur as part of general network usage. A more reasonable definition is circumstances when network operations deviate from normal network behaviour [Thottan & Ji, 2003, p2192], in essence when witnessed traffic on the network differs from what might be expected according to prior knowledge of how the network operates. This obviously requires an in-depth knowledge of the network and how it is used. One way of obtaining this is to create a picture of normal network traffic and actively use that for comparison purposes to judge which traffic is abnormal. Jake Brutlag develops this idea further by stating that if you have an accurate statistical model for a given time series of network traffic data, then you can define aberrant behaviour as behaviour that does not conform to this model [2000, p140]. In these cases aberrant or anomalous behaviour is not necessarily of any given type or size, but merely something which would not have normally occurred, and it remains appropriate in relation to the earlier definitions as it can be used to specifically identify malicious traffic or network-wide events. Overall, it means that identification is not restricted to network events which have been witnessed and identified before, allowing new, undefined network problems to be flagged up.
Of course, for this to be a useful definition there must first exist an exact specification of what constitutes normal behaviour, and this is a big focus of much research in this area. Almost all the research papers covered were in agreement that what is required is a statistical analysis of network traffic data [Barford et al. 2002; Thottan & Ji, 2003; Kim et al. 2004; Brutlag, 2000]. The question then is how that statistical analysis is performed, and from that, how aberrant or non-normal behaviour is identified. Barford et al. [2002] perform a signal analysis of network traffic data known as wavelet analysis, which organises the data over time into a hierarchy of separate levels known as strata. The differing levels of strata produce information of varying types, from sophisticated aggregations of the original data at the lower levels to fine grained details at the higher end [p74]. From this they show that these separations of results can indicate different characteristics, for example, lower level strata capturing patterns over a long period of time, middle ranges of strata producing information about daily
variations, and high levels indicating very short term variation which is, in their opinion, not useful to network anomaly analysis. They make the point that there can never be one single method for detecting network anomalies from this information due to the differing definitions of what a network anomaly is, but they suggest a method for automating the process of identifying irregularities in the measured data [p75] which they call a deviation score. The results illustrated that a wavelet analysis of network data is quite effective at showing the details of network traffic, both during normal network operation and during anomalous events. The network data used throughout this analysis was both SNMP data from their network devices, mostly activity counts (i.e. numbers of packets transmitted per node), and Network Flow data including more specific protocol level information about end-to-end packet flows, which together provide a reasonably solid measurement foundation [p71]. A comparison was made between the details elicited from Network Flow and SNMP and they found that it is possible to expose anomalies effectively using both. There will be further discussion of Network Flow data and SNMP data later in this chapter.
Thottan & Ji [2003] provide an overview of what they consider to be the most popular network anomaly detection methods: rule based techniques, finite state machines, pattern matching and finally their main focus, statistical analysis, which can be used to continuously track the behaviour of the network [p2194], unlike the other approaches which often require recalibration over time. The statistical analysis is performed using SNMP data collected from network routers, and the method of statistical analysis has been developed based upon the theory of change detection: having defined a network anomaly as correlated abrupt changes in network data [p2195], detecting such changes in the data also indicates network anomalies.
They define an abrupt change as any change in the parameters of a time series that occurs on the order of the sampling period of the measurement [p2195]. It is the correlated nature of the changes which distinguishes them from the ordinary variability of normal network operation, but due to the nature of SNMP data from various devices, even data of the same type from separate devices cannot be treated the same way. Each source of data must be tested independently and correlations between devices found. To give a very general overview, abrupt changes are detected by comparing the variation of statistics between two contiguous windows of data using an auto-regressive model. They found that the use of fine-grained network data greatly improved the time taken for detection; a cause for concern was the possibility of time synchronisation being out between the various network devices being polled. Also, SNMP runs over the UDP transport protocol, meaning that there is no guarantee of queries and responses reaching their desired target.
Kim et al. [2004] propose a method for abnormal traffic detection based entirely on Network Flow analysis. They divide the analytical process into two sections, flow header detection and traffic pattern data generation. As a packet is received by their algorithm its header is checked and the transport protocol determined. From this further checks can be made on such information as destination/source port number, or the packet/flow size. The traffic patterns can be used to detect further aberrant behaviour, for example,
a scanning attack would result in a large flow count per host, but small flow and packet sizes. This is not strictly a statistical analysis of network data, more a record of previous network traffic from a specific host/network in order to produce better knowledge of their use of the network. It suffers from the same pitfalls as most rule-based analysis, for example, a regular need for reconfiguring and a lack of ability to detect new and undocumented aberrant events, but it does produce some interesting information regarding particular network anomalies and how they appear as part of Network Flow data. It is also of note that their system suffered problems with false alarms due to the similarities (according to their model) between attack traffic data and normal peer-to-peer communication data which, according to their paper, is the nature of as much as 50% of current Internet traffic.
Jake Brutlag [2000] describes the statistical model from which aberrant behaviour is determined as having to take into account a number of factors, mostly surrounding seasonal cycles or variations that are considered normal network behaviour, for example, network usage during the day being higher than at night, and higher still Monday-Friday compared to weekends. The model should be able to take this into account, and not mistakenly judge such trends as aberrant instances. It should also be capable of evolving over time with the network as the cycles and trends gradually adapt to new conditions [p140]. His emphasis is on the use of such a model in a real-time monitoring context: complicated statistical modelling is not likely to be understood by the network operators and may have issues performing at an adequate speed. The model is broken down into three sections [p140]:
- An algorithm for predicting the values of a time series one time step into the future.
- A measure of deviation between the predicted values and the observed values.
- A mechanism to decide if and when an observed value or sequence of values is too deviant from the predicted value(s).
His solution is an extension to the Holt-Winters forecasting algorithm, which builds upon exponential smoothing. Exponential smoothing is a simple algorithm for predicting the next value in a time series which works on the premise that the most useful value for predicting the next value is the current value, and that the usefulness of earlier values decays exponentially. Aberrant behaviour is then detected through confidence bands, a measure of how much deviation is allowed for a specific time within a seasonal cycle. There will be a fuller explanation of the Holt-Winters forecasting algorithm and how it works later in this chapter. Jake Brutlag included this implementation in RRDtool, a data logging and graphing application, and illustrates its use within a web based network monitoring solution called Cricket [Brutlag, 2000b; RRDtool, 2007; Cricket, 2007]. His conclusions are that whilst not an optimal solution, it is flexible, efficient and does effectively detect aberrant behaviour. This solution appears to be the most complete, if not the most formally specified. The
technique used is already at a production level and is being used. The fact that it is incorporated into RRDtool, one of the most commonly used logging and graphing tools available, makes it a very attractive option. There will be a closer examination of this RRDtool/Cricket solution later in this chapter.
Whilst most of these methods of anomaly detection and analysis have involved basic counting metrics, there has been some investigation into the use of different methods of analysis to create models of traffic flow. One such approach involves estimating the entropy content within traffic data and using that information to decide whether traffic is anomalous or aberrant [Wagner & Plattner, 2005]. Entropy is defined as a measure of how random a data-set is [p172], and the process they use to determine entropy for the network traffic data first involves representing the data in a purely binary format and then performing data compression. The resultant size of the compressed data then corresponds directly to the level of entropy present. Their results found many interesting entropy patterns in normal and attack traffic; for example, in regular network traffic the entropy of source and destination port fields is almost identical, whereas in attack traffic many of the answering flows do not exist, hence source port entropy increases while destination port entropy decreases. They also found that this method of analysis is not greatly affected by the use of sampled network traffic data.
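The compression-based idea described by Wagner & Plattner can be illustrated with a short sketch. This is not their implementation: the use of zlib, the packing of port numbers as 16-bit integers, the sample data, and the names pack_ports and compressed_size_ratio are all assumptions made purely for illustration. It only shows that a field with few distinct values compresses far better than one whose values are widely scattered, so compressed size can act as a rough indicator of entropy.

```python
import random
import struct
import zlib


def pack_ports(ports):
    """Represent a sequence of port numbers as raw 16-bit big-endian binary."""
    return struct.pack("!%dH" % len(ports), *ports)


def compressed_size_ratio(data):
    """Return compressed size divided by original size (lower = less random)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical samples: one field dominated by a single value,
# and one field whose values are scattered across a wide range.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
concentrated_ports = [80] * 1000
scattered_ports = [rng.randrange(1024, 65536) for _ in range(1000)]

low = compressed_size_ratio(pack_ports(concentrated_ports))
high = compressed_size_ratio(pack_ports(scattered_ports))
print("concentrated field ratio: %.3f" % low)
print("scattered field ratio:    %.3f" % high)
```

Tracking such a ratio per field over successive time intervals, rather than as a single value, would be one way to look for the diverging source and destination port patterns mentioned above.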
2.2 Sources of Network Data
Before looking at how to detect aberrant or anomalous network behaviour it is important to examine possible sources of network traffic data and their strengths and weaknesses. There are four main types of network data available to use and in this section each will be examined.
2.2.1 Measurement of Metrics
This refers to the method of obtaining network data by measuring certain metrics regarding network performance. An example might be the measurement of packet round trip times and packet loss. This is not necessarily automated; there are command line tools such as ping or traceroute which can give results of this nature. Findings obtained in this manner would not normally be incorporated into a network monitoring solution but are useful as a secondary source of information during the investigation of a potential network problem. They present useful information about the state of the network at a given time and how well it is currently operating, but cannot give any indication of the type or nature of network traffic.
2.2.2 Individual Packet Capture
This method involves capturing each individual packet as it passes through the network and processing it to find out useful information. Due to its invasive nature it provides highly detailed information about the type and even content of data traversing the network, thanks to its ability to look into the application layer of network packets. Such in-depth network traffic data creates the potential for incredibly accurate and specific analysis of network operation, not just based upon protocols used or source/destination, but also based upon the program or application the packet is being used to update. In a lot of cases this would be the ideal for network traffic analysis and would mean that all kinds of network anomalies could be identified and very accurately classified. Unfortunately such high detail comes at a price. Capturing individual packets as they pass through network devices is an incredibly intensive process given the sheer number of packets traversing even a medium sized network. In a scenario such as that at DANTE, individual packet capture would be far too heavy a load for any available server. Whilst the information would be highly desirable, it would result in such a performance hit on the network itself that it is inappropriate for a passive network monitoring application.
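To make the level of detail concrete, the sketch below decodes the fixed 20-byte IPv4 header of a single captured packet and exposes the bytes that follow it. It assumes the raw bytes have already been captured by some other tool and begin at the IP header; the function name parse_ipv4_packet and the hand-built sample packet are hypothetical. Everything after the header, including transport and application data, is available for inspection, which is what makes this source so detailed and, repeated for every packet, so expensive.

```python
import socket
import struct


def parse_ipv4_packet(raw):
    """Decode the fixed IPv4 header and return its fields plus the payload."""
    if len(raw) < 20:
        raise ValueError("too short to contain an IPv4 header")
    (version_ihl, tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    header_length = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": version_ihl >> 4,
        "tos": tos,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # e.g. 6 = TCP, 17 = UDP
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "payload": raw[header_length:],  # transport header and application data
    }


# Hypothetical packet built by hand; the checksum is left as zero for brevity
# and the payload bytes are arbitrary rather than a real transport segment.
payload = b"example payload"
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0, 64, 17, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"))
packet = parse_ipv4_packet(header + payload)
print(packet["src_ip"], "->", packet["dst_ip"], "protocol", packet["protocol"])
```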
2.2.3 SNMP
SNMP stands for Simple Network Management Protocol [SNMP, 2007] and is an IETF-declared Internet standard in the application layer of the TCP/IP five layer model. It is used by network monitoring systems to monitor and manage network connected devices using Management Information Base (MIB) queries. Devices can be polled for numerous different types of information. The first is regarding the state of the device itself. This gives information about load and operational readiness, for example, how heavily loaded the processor within a router is, which could indicate potential problems with the capability of that specific device, or the possibility of unpredicted network load in that area. It can also produce statistical information about the network data the device is passing, such as numbers of packets transmitted in a certain period of time, which gives another indication of bandwidth and network load. Another capability is providing network management systems with alerts when certain events occur on the device, for example a large number of failed login attempts to its management interface. One other use of this protocol is to remotely manage the network devices, reconfiguring them for different circumstances, for example, blocking particular ports or dropping network interfaces. This isn't something which is specifically connected to network monitoring and aberrant behaviour detection, but such capability would allow a network engineer to react to aberrant events which might have been detected and perhaps find a solution. Information gathered via this method is quite coarse; there are few specifics about types of packets or regularity of their throughput other than plain statistical counts and aggregations. This data could be very useful alongside a more in-depth source of network data,
16
but probably is not granular enough to be the soul data source in an aberrant behaviour detection system.
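As an illustration of how the plain statistical counts returned by SNMP translate into an indication of load, the following minimal sketch converts two successive readings of an interface counter into an average rate. It assumes a 32-bit counter that wraps back to zero (as with the IF-MIB ifInOctets object) and at most one wrap between polls. The function names are hypothetical and no SNMP library is involved; the readings are simply passed in as integers.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 value wraps to zero after 2^32 - 1


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int,
                    interval_seconds: float) -> float:
    """Average rate over the polling interval, in counter units per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

With a five minute polling interval, readings of 1000 and 4000 octets give an average of 10 octets per second. The result is only an aggregate over the whole interval, which is exactly the coarseness described above.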
2.2.4 Network Flow
A Network Flow is a record of a unidirectional sequence of packets between two endpoints over a defined period of time that contains certain information with which the flow can be identified. This information consists of seven key fields: source IP address, destination IP address, source port number, destination port number, protocol type, service type and router input interface. After receiving a packet, a flow capable router will examine it for the information to fill these seven fields and, based upon the results, decide whether the packet is part of a pre-existing flow record or is something new. If it is part of an existing flow record, the traffic statistics of that flow record are increased accordingly; otherwise a new flow record is created with statistics that include the initially received packet. A few standards exist for flow data, the most common being NetFlow, developed by Cisco [NetFlow, 2007] and generally accepted as the industry standard; another is sFlow, an alternative produced more recently. Both produce fairly similar data for analysis, and for the purposes of this definition the focus will be on NetFlow as it is currently the more commonly supported. A flow record does not contain any information pertaining to the application layer; it is merely a traffic profiling tool. Flow level data is not as specific as full packet analysis but holds the advantage in large scale, heavily used networks due to its high speed nature. NetFlow recording is nowhere near as intensive as individual packet capture and produces a much smaller dataset for a given series of packets due to the way it aggregates packets into related flows. This can have a big impact on heavily used networks, where the sheer amount of data created by each data source and the processing power required to perform analysis can be prohibitive to producing any kind of useful network usage report, especially when working in real-time.
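The lookup-and-update behaviour described above can be sketched as a small in-memory flow cache. This is an illustration only, not how any particular router implements it: the class and field names are hypothetical, and the expiry and export of finished flow records are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """The seven key fields that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    service_type: int
    input_interface: int


@dataclass
class FlowStats:
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Aggregates packets into unidirectional flow records."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, FlowStats] = {}

    def observe(self, key: FlowKey, packet_length: int) -> bool:
        """Account for one packet; return True if it started a new record."""
        is_new = key not in self.flows
        stats = self.flows.setdefault(key, FlowStats())
        stats.packets += 1
        stats.octets += packet_length
        return is_new
```

Because a flow is unidirectional, the request and reply halves of one conversation produce two separate records: swapping the source and destination fields yields a different key.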
Even with this reduction in the amount of data, without some kind of presentation application NetFlow data can still be difficult to manage, and so in organisations where NetFlow is used it will most likely be sent to a network monitoring system to produce clear reports about the analysis carried out. NetFlow data can be used to gain an overview of traffic traversing the entire network at a point in time. It holds enough detail to analyse and produce reports of trends in port usage and of bandwidth on a packets per second, flows per second or bits per second basis, as well as giving indications of interesting network behaviour. A network can be analysed using NetFlow data and characterised according to how it is normally used; from this it can be seen when network usage is different. This is all based on flow level information but is usually enough to indicate areas of interest. There are some potential problems with NetFlow as a main source of network data, the first being the common use of packet sampling in order to create the flow records. Even though recording Network Flow is much less intensive and quicker than individual packet capture, it is still too much of an overhead for very large networks, such as in the case of DANTE and their operation of GÉANT2. The problem is twofold: firstly not wanting to impact network performance with the analysis load, and secondly creating datasets too large to deal with sensibly. In such cases there is nothing to be done but enforce some scheme of packet sampling upon the NetFlow enabled router. By this process not all packets are examined to record flow data; a given ratio can be set, usually somewhere between the extremes of 1 in 15 and 1 in 1000. A recent study by Brauckhoff et al. [2006] examined the impact of packet sampling on anomaly detection metrics. The investigation used a record of flow data from the outbreak of the Blaster worm in 2003, where the characteristics were well known and the anomaly detection could be replayed at various levels of sampling to produce results which could be scientifically compared. Firstly they found that packet counts are barely disturbed by packet sampling even at rates as high as 1 in 1000, whereas flow counts are heavily disturbed, causing many identifiable trends to simply vanish. They attribute this to the fact that flows containing only a few packets are sampled with a lower probability than flows containing many packets, hence in a lot of cases the smaller flows disappear. Secondly they examined how volume and feature entropy metrics are affected by packet sampling. Their conclusion was that "though we see that packet sampling disturbs entropy metrics (the unsampled value cannot easily be computed from the sampled value as for byte and packet counts), the main traffic pattern is still visible in the sampled trace" [p161]. This is something which would have to be taken into account when deciding upon analysis techniques using NetFlow data.
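The reason small flows vanish while packet counts survive can be seen with a short calculation. The sketch below assumes independent random sampling, where each packet is selected with probability 1 in N; this is an assumption made for illustration (routers may instead sample systematically), and the function names are hypothetical.

```python
def detection_probability(packets_in_flow: int, sampling_rate: int) -> float:
    """Chance that at least one packet of a flow is sampled.

    Assumes every packet is picked independently with probability
    1 / sampling_rate (an illustrative model of 1-in-N sampling).
    """
    if packets_in_flow < 0 or sampling_rate < 1:
        raise ValueError("need packets_in_flow >= 0 and sampling_rate >= 1")
    return 1.0 - (1.0 - 1.0 / sampling_rate) ** packets_in_flow


def estimate_total_packets(sampled_packets: int, sampling_rate: int) -> int:
    """Scale a sampled packet count back up to an estimate of the true count."""
    return sampled_packets * sampling_rate
```

Under this model a single-packet flow sampled at 1 in 1000 is recorded only about once in a thousand times, whereas a flow of 5000 packets is almost always seen. Packet totals, by contrast, can be recovered simply by multiplying by the sampling rate, which is consistent with the finding that packet counts are barely disturbed.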
2.3 Existing Network Monitoring Solutions

2.3.1 TCPdump
TCPdump [TCPdump, 2007] is a commonly used network debugging tool which enables the user to intercept and view individually captured TCP/IP packets that are being transmitted over a network. It is built upon the libpcap [libpcap, 2007] packet capture library and has the capability of writing out the data obtained from captured packets to a formatted text file. This can then be interpreted by a statistical analysis program to produce reports of trends in network usage and to give further information about the traffic traversing the network in question. The program itself contains no form of alert or notification regarding network events, but this can be achieved with the application of some network monitoring solution and an in-depth analysis of the data recorded. This solution provides an overwhelming amount of data regarding usage of the network and can be very useful in diagnosing network faults. However, as mentioned previously, individual packet capture is a very intensive process, and on a network of any great size there would simply not be the resources available to capture, process and store every packet, or even a sampled proportion of packets, in this fashion. This is a very useful tool to have when actively working to solve an identified issue, but is not something which should necessarily be used in a passive network monitoring context.
2.3.2 Snort
Snort [Roesch, 1999] is described by its creator as a Lightweight Intrusion Detection System which operates in a passive fashion, providing administrators with enough data to make informed decisions [p229]. Like TCPdump it is based upon the libpcap [libpcap, 2007] packet capture library, but it analyses individually captured packets with the capability to examine the payloads of packets in the application layer, which TCPdump lacks. Due to the nature of its operation it does not scale successfully to larger networks, and its creator states that it is intended to be used on "small, lightly utilized networks" [p229]. Its method of network traffic analysis is rule-based, and the rules are created by individual network administrators and tailored to their network. If Snort witnesses some traffic trend which is defined as aberrant according to those rules it will perform a set action, most commonly sending an email to the administrators to alert them to the possibility of some network problem. It does contain some form of alert system but, as mentioned previously, rule based systems are not ideal for such analysis as it is very difficult to predict new network trends, either naturally evolving ones or ones caused by new network threats. Such a system might create large numbers of false positives or, in the case of a new style of anomaly, may miss the problem altogether.
2.3.3 RRDtool
RRDtool [RRDtool, 2007] stands for Round Robin Database tool and describes itself as the industry standard data logging and graphing application. It provides a series of tools capable of creating, updating and manipulating databases of time series data with which to produce graphs for visualising results. Its data storage uses round robin database principles, which means that the database files will never grow larger than a size set at creation. This is achieved by constantly averaging and generalising the data held within over a set amount of time. This has two results: firstly, the space needed to record the data for a particular source will always be constant; secondly, over time the older results held will lose their granularity as smaller differences are averaged out. This makes it a good choice for a situation such as that at DANTE, as the data size is a known quantity from the outset and storing data in this compact format minimizes potential I/O constraints. The loss of data granularity can be organised such that only data so old that it is of no direct use is affected.
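The round robin principle can be illustrated with a fixed-size archive that averages a set number of incoming values into each stored row and overwrites its oldest row once full. This is a simplified sketch of the idea only: it is not RRDtool's on-disk format or API, the class and method names are hypothetical, and averaging is just one possible way of generalising the data.

```python
from typing import List, Optional


class RoundRobinArchive:
    """Fixed-size store; each row is the average of `steps_per_row` values."""

    def __init__(self, rows: int, steps_per_row: int = 1) -> None:
        self.rows: List[Optional[float]] = [None] * rows
        self.steps_per_row = steps_per_row
        self._pending: List[float] = []
        self._next = 0  # index of the row that will be overwritten next

    def update(self, value: float) -> None:
        """Add one primary value, writing a row once enough have arrived."""
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            self.rows[self._next] = sum(self._pending) / len(self._pending)
            self._next = (self._next + 1) % len(self.rows)
            self._pending.clear()

    def values_oldest_first(self) -> List[float]:
        """Stored rows in chronological order, skipping unfilled slots."""
        ordered = self.rows[self._next:] + self.rows[:self._next]
        return [v for v in ordered if v is not None]
```

An archive created with three rows never holds more than three values however many updates it receives, while one created with `steps_per_row=2` covers twice the time span in the same space at half the resolution. This is the trade between constant size and lost granularity described above.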
Capabilities
RRDtool provides a way of storing data in a logical and easily readable and updateable format. It provides the facility to generate graphs based upon the data values held within the RRD databases and to hold data in different resolutions depending on user configurable settings, and also contains a form of aberrant behaviour detection based on Holt-Winters forecasting. The result of its aberrant behaviour detection is a boolean result, yes or no, for a particular time period. There is no built in functionality for alerts or available interface to more information; the behaviour is merely flagged when seen. It does not provide any front end interface to these commands, nor is it specifically tailored to data sources such as SNMP queries or Network Flow. The data must be organised and processed into the required format before being input to RRDtool for archiving and graphing. There are quite a few front-ends and extensions available for RRDtool; most notable are Cacti and NfSen, which are discussed later in this chapter.
<rrd>
  <version>0003</version>
  <step>300</step>
  <lastupdate>1169397600</lastupdate>
  <ds>
    <name>flows</name>
  </ds>
  <rra>
    <cf>FAILURES</cf>
    <pdp_per_row>1</pdp_per_row>
    <database>
      <row>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>1.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
      </row>
    </database>
  </rra>
</rrd>
Figure 2.1: Section of an RRD exported to XML format
Architecture

This is a brief overview of the architecture of an RRDtool database and how it operates, including the aberrant behaviour detection capability added by Jake Brutlag. Firstly, an RRDtool database (from this point on referred to as an RRD) is stored on disk as a binary file specific to the architecture of the machine used to compile the version of RRDtool that created it. This binary format minimizes the time taken for reads and writes performed by the application itself. An RRD can be exported to and imported from an XML format, in which it is easy to see the constituent sections [see Figure 2.1], but more importantly this allows it to be ported between machines with RRDtool compiled for different architectures. RRDtool performs an operation on the RRD known as consolidation, which is essentially a form of archiving based upon user specified rules. Consolidation occurs with every RRD update: as new data is added, older data is consolidated such that the archive maintains a specific size and the resulting data is in the form the user has defined; older data can be reduced to an average, a minimum, a maximum and so on. There can be different consolidation functions per RRD, and internally data using the same consolidation rules is held in its own Round Robin Archive (RRA), where the required amount of space is set aside ready for data values to fill it. The example given in the RRDtool documentation is that of a need to store 1000 values at 5 minute intervals. Within the RRA, space for 1000 values will be allocated plus a header of a set size. As data values are updated they are added to the allocated space in a round robin fashion, so newer values would appear to knock older values off the end of the 1000 recorded instances. This is when the consolidation function is used to keep track of the previous data in the way the user has specified. The aberrant behaviour detection functionality within RRDtool is implemented using the Holt-Winters forecasting method, which will be examined more closely later in this chapter. The information is stored through the addition of five consolidation functions, as shown in Table 2.1 [Brutlag, 2000a, p143].
HWPREDICT      An array of forecasts computed by the Holt-Winters algorithm, one per primary data point.
SEASONAL       An array of seasonal coefficients with length equal to the seasonal period. For each primary data point the seasonal coefficient that matches the index in the seasonal cycle is updated.
DEVPREDICT     An array of deviation predictions. Essentially copied from the DEVSEASONAL array to preserve a history; it does no processing of its own.
DEVSEASONAL    An array of seasonal deviations. For each primary data point the seasonal deviation that matches the index in the seasonal cycle is updated.
FAILURES       An array of boolean indicators, 1 indicating a failure. Each update removes the oldest value and inserts the new observation. On each update the number of violations is recomputed.

Table 2.1: Consolidation functions within RRDtool for aberrant behaviour detection
When the calculations have been performed and the specific RRAs updated, the FAILURES section is where the actual aberrant behaviour is indicated.
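The behaviour of the FAILURES array described in Table 2.1 can be sketched as a sliding window of violations. In this illustration a violation is an observation outside the supplied limits, and a failure is flagged when the number of violations in the window reaches a threshold. The class name, the window length, the threshold and the "at least" comparison are all assumptions of this sketch rather than statements about RRDtool's source code.

```python
from collections import deque
from typing import Deque


class FailureWindow:
    """Counts limit violations over the most recent observations."""

    def __init__(self, window_length: int, threshold: int) -> None:
        if not 0 < threshold <= window_length:
            raise ValueError("need 0 < threshold <= window_length")
        self.threshold = threshold
        # A bounded deque drops the oldest entry when a new one is appended.
        self._violations: Deque[bool] = deque(maxlen=window_length)

    def update(self, observed: float, lower: float, upper: float) -> int:
        """Record one observation; return 1 for a failure, otherwise 0."""
        self._violations.append(observed < lower or observed > upper)
        return 1 if sum(self._violations) >= self.threshold else 0
```

Requiring several violations within the window, rather than reacting to a single one, means an isolated spike does not immediately raise a failure. The returned 0 or 1 corresponds to the boolean values visible in the FAILURES section of Figure 2.1.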
2.3.4 Cacti
Cacti [Cacti, 2007] is described as being the complete front end to RRDtool, providing a web based framework for aggregating data sources and displaying graphs dependent on user configuration. A lot of the functionality is provided via RRDtool; what Cacti offers above RRDtool alone is, in essence, exactly how it is described: a more easily configurable interface which can be altered to a network administrator's preference to provide a coherent front-end display of available network usage information. It contains support for graphing based upon SNMP queries, and graphs can be drawn from any data source that can be made to utilise the create and update functionality in RRDtool. It does not have any inherent ability for analysing and processing network traffic data other than that within RRDtool itself; it merely allows the data to be displayed in a coherent fashion so visual analysis can be carried out. This means that aberrant behaviour detection can be performed using RRDtool's Holt-Winters implementation, but the network traffic data must first be processed to present it to RRDtool in a compatible format. Having done this, there is no provided interface to the aberrant behaviour detection results other than a presentation of the data as a graph. This only provides a visible indication of the time at which the aberrant event occurred, and would require a network administrator to conduct further separate research using alternative tools to diagnose the perceived problem.
2.3.5 Flow-Tools
Flow-Tools is a software package for collecting and processing NetFlow data created by Mark Fullmer [Flow-Tools, 2007]. It can be used to collect raw Network Flow data from routers or servers and then process it to create reports on network activity. NetFlow data that is collected using the flow-capture command is written to disk in compressed files, each covering a user configurable time period. The files can be configured to expire, either after a set amount of time has passed or when a certain amount of disk space has been used. A rotation process occurs so that older files are expired first. The interface to the flow-tools files for querying and analysis purposes is largely at the command line. There are commands to process the data in a way completely configurable by the user, but there are also inbuilt commands to aid searching for set kinds of aberrant behaviour, such as scanning traffic on the network. Flow-Tools can be configured to produce reports and graphs (through RRDtool) about behaviour it witnesses on the network, but any aberrant or anomalous behaviour detection via this method would be rule based. It could be configured to send the appropriate information to RRDtool to make use of the in-built aberrant behaviour detection, but this is not available as standard. A statistical analysis could also be applied to flow-tools archived files, but again this is something which must be configured by the network administrators individually; there are no in-built capabilities for this.
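The expiry policy described above (expire by age or by disk usage, oldest files first) can be sketched as a pure function over file metadata. This is an illustration of the policy only and makes no claim about how flow-capture implements it; the names `CaptureFile` and `files_to_expire` and both limit parameters are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class CaptureFile(NamedTuple):
    name: str
    created: float  # creation time in seconds since the epoch
    size: int       # size on disk in bytes


def files_to_expire(files: Iterable[CaptureFile], now: float,
                    max_age_seconds: Optional[float] = None,
                    max_total_bytes: Optional[int] = None) -> List[str]:
    """Names of files to remove, oldest first, under either limit."""
    oldest_first = sorted(files, key=lambda f: f.created)
    remaining_bytes = sum(f.size for f in oldest_first)
    expired: List[str] = []
    for f in oldest_first:
        too_old = (max_age_seconds is not None
                   and now - f.created > max_age_seconds)
        over_budget = (max_total_bytes is not None
                       and remaining_bytes > max_total_bytes)
        if not (too_old or over_budget):
            break  # every later file is newer, so nothing more can expire
        expired.append(f.name)
        remaining_bytes -= f.size
    return expired
```

Because the files are visited oldest first, the loop can stop at the first file that satisfies both limits.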
2.3.6 NfDump
NfDump is a command line application written principally by Peter Haag to collect, process and produce analytical reports of Network Flow data [NfDump, 2007]. It is a key part of the wider NfSen project, which is discussed later in this section. It has a built in NetFlow capture daemon, nfcapd, which runs as a background system process collecting the NetFlow data as it is exported from the router. The data is then stored in 5 minute long timeslices in a proprietary format which can be accessed using other NfDump command line tools. It contains the facility for viewing archived NetFlow data corresponding to defined filters; a simplified example taken from the nfdump manpage [Haag, 2005b] would be:
nfdump -r nfcapd.200407110845 'inet6 and tcp and (src port > 1024 and dst port 80)'
This displays all IPv6 connections on port 80 to any webserver that occurred within the timeframe that the specified nfcapd file covers. The same tools can examine a range of timestamped nfcapd files and produce detailed statistics very quickly, for example the top 20 statistics over the period spanned by the two given timeslices, in the regular format [Haag, 2005b]:
nfdump -R nfcapd.200407110845:nfcapd.200407110945 -S -n 20
An example of the output format:

Date flow start          Duration  Proto  Src IP Addr:Port         Dst IP Addr:Port     Pkts  Bytes  Flows
2004-07-11 08:59:52.338     0.001  UDP    36.249.80.226:3040  ->   92.98.219.116:1431      1    404      1
2004-07-11 09:15:03.422     5.301  TCP    36.249.80.226:4314  ->   92.98.219.116:1222     45   2340      2
NfDump provides a highly flexible and quick interface at the command line to view specific information pertaining to network events. It can be configured to produce graphs via its sister application NfSen, which uses RRDtool as a back end. There is no command line facility for detecting aberrant behaviour other than examining the statistical information by hand or processing the stored data files using some external statistical analysis program; as with flow-tools, this is something that a network administrator would have to create and configure using their personal knowledge of the network. One extra possibility provided by NfDump is the use of the command line tool nfprofile to apply stored filters (known as profiles within NfDump) to process specified traffic into either an ASCII formatted human readable report, or a binary formatted data file which can be analysed again using the NfDump command line tools. This can be configured to occur as the files are stored, or when the administrator initiates analysis. This could allow data of a particular type, to a particular subnet or from a particular IP address to be stored separately from normally stored data to ease analysis. Overall, NfDump provides a very good solution for processing and organising collected NetFlow data into a format which can be analysed for aberrant behaviour, but it does not provide any kind of aberrant behaviour detection itself. The only restriction on the amount of data which can be held is disk space, and stored data is not compressed. Even so, because files are rotated every 5 minutes and named with the timestamp they cover, in cases such as DANTE where the amount of NetFlow data being stored is too much for the machine to keep for longer than two weeks, the older files could easily be deleted automatically.
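Since each file name carries its own timestamp, selecting files for automatic deletion needs nothing more than the names. The sketch below assumes names of the form nfcapd.YYYYMMDDhhmm, as in the examples above, and assumes that the timestamp marks the start of the five minute slice. The function names and the two week default are illustrative, not part of NfDump.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

PREFIX = "nfcapd."
STAMP_FORMAT = "%Y%m%d%H%M"


def slice_name(moment: datetime) -> str:
    """Name of the five minute file whose slice contains `moment`."""
    start = moment.replace(minute=moment.minute - moment.minute % 5,
                           second=0, microsecond=0)
    return PREFIX + start.strftime(STAMP_FORMAT)


def slice_start(name: str) -> datetime:
    """Start time encoded in a capture file name."""
    if not name.startswith(PREFIX):
        raise ValueError("not a capture file name: " + name)
    return datetime.strptime(name[len(PREFIX):], STAMP_FORMAT)


def expired_files(names: Iterable[str], now: datetime,
                  keep: timedelta = timedelta(days=14)) -> List[str]:
    """Names of capture files whose slice started more than `keep` ago."""
    return sorted(n for n in names if now - slice_start(n) > keep)
```

A scheduled job could pass the contents of the capture directory to `expired_files` and remove whatever is returned, leaving the most recent two weeks of data in place.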
2.3.7 NfSen
NfSen, or NetFlow Sensor, is described as a "graphical web based front end for the nfdump netflow tools" [NfDump, 2007]. Combined with NfDump it makes up what the author Peter Haag refers to as the NfSen Project. It provides an interface to the NfDump command line tools, as well as illustrating network usage via graphs using data processed and stored using the nfcapd NetFlow capture daemon. Figure 2.2 illustrates how NfDump and NfSen interact [Kiss & Mohácsi, 2006].
Figure 2.2: NfSen Architecture

By default it will produce graphs based on the live NetFlow data being captured to display current network traffic behaviour over various timeframes. It also offers the ability to define further profiles, specifying particular data to be graphed separately from the live display; this makes use of the profiling feature within NfDump. The details of these can be configured via the web front end, including the amount of disk space the RRD for that profile will take up. This size per profile configuration means that a reliable estimate of disk space can be obtained before any data is captured, as well as allowing more detailed profiles to use more space and hence hold their granularity for a longer period. NfSen uses NfDump as its back end for capturing and processing the data, which means it is only capable of monitoring network traffic based upon Network Flow. Due to this focus on one source of data, however, the analysis that it is capable of is perhaps more detailed than other similar monitoring solutions, and the presentation of the results is specifically tailored to this kind of information. It also uses RRDtool as its back end for producing a graphical display, which means that it would be possible to harness RRDtool's built in Holt-Winters aberrant behaviour detection, though NfSen does not have any inherent solution for displaying such information. As with any system utilising RRDtool for its data storage, disk space is a known quantity though, as mentioned previously, NfDump does not have any compression facility for its own data storage. There is a modular framework for adding plugins to the system; one popular example is PortTracker, which monitors the connections to various ports in a graphical way. There is also the facility for automatic alerting via email according to given rules, but this has to be configured by a network administrator with rules specific to the network in question and its uses.
2.3.8 Overview
There are quite a number of network monitoring solutions available but few, if any, support aberrant behaviour detection and indication. TCPdump and Snort are far too intensive to use in any kind of passive monitoring environment, and the data they provide is very specific to the rules used. The analysis they perform is not adaptive and in most cases requires a very good knowledge of the network environment which is being monitored. A side factor is the legal implications of the data that is produced; for example, with Snort it is possible to view data held in the application layer of packets traversing the network. A network administrator using such a tool to analyse network traffic may inadvertently find traffic which contains illegal material, and in such a case the law is not entirely clear regarding the administrator's position in having viewed it. Whilst it is important to ensure there are rules and restrictions regarding network use in place, it is not necessarily the place of a network administrator to enforce such legislation, and so, whilst diagnosing network faults, the potential for such inadvertent discoveries might be something to be avoided. This is especially true for administrators in a situation like that of DANTE, whose responsibility is simply for the links between separate service providers; it is the place of those institutions, rather than DANTE, to enforce such network usage restrictions.
RRDtool is the most promising service, offering the capability of aberrant behaviour detection using the Holt-Winters algorithm based upon supplied data. It is based upon Round Robin principles and hence uses a static amount of disk space to store its data, as well as having the ability to produce graphical representations of any kind of information, provided it is input in the correct format. Unfortunately this is where RRDtool is not enough: it has no capability for analysing or processing data, merely dealing with pre-processed values submitted with the correct flags. Further applications are necessary to give RRDtool its full potential. Cacti provides a fully functional interface to RRDtool, allowing graphs of various data sources to be created and viewed, and even including the facility to carry out SNMP queries. Whilst this solves some of the initial interface problems faced when using RRDtool alone, it still leaves a need for some form of pre-processing of any network data other than SNMP before it can be input to RRDtool. Cacti also does not have any interface specifically designed for the results that might be produced by RRDtool's aberrant behaviour detection, so this would also need to be created. Flow-Tools might be a solution to the need for preparatory processing and analysis; it automatically captures and stores NetFlow data in a compressed format and can be configured to output to RRDtool depending on the data presentation required. This still leaves a need for a front end presentation of the data, both regular and aberrant behaviour related. Finally there are NfDump and NfSen, two applications which are closely linked. One supplies a NetFlow capturing and storing facility, with processing and profiling capability; the other a web based interface to RRDtool. This is the most complete package overall, within the context of this project.
It is lacking in a number of areas however: there is no inherent ability for NfSen to provide any aberrant behaviour detection or indication, and the web interface only allows the creation of basic data profiles, not the ability to set up or view data being processed by the Holt-Winters algorithm. There is no pre-designed facility for indicating network events when they occur other than by viewing the specific graphs at the right time. This would appear to be the best on offer, but requires more development to be an ideal solution.
2.4 NfSen-HW
NfSen-HW is an extension to NfSen currently being developed as an attempt to make full use of the Holt-Winters aberrant behaviour detection capabilities within RRDtool [NfSen-HW, 2007]. It was initially presented to JRA2, the GÉANT2 security team, in September 2006 as a project being undertaken by network administrators at HUNGARNET, the Hungarian research and education network [Kiss & Mohácsi, 2006]. The aim was to aid the work of the Computer Security Incident Response Teams (CSIRT) in their usual work process: find abnormal behaviour, report and coordinate incidents, the goal being to help visually detect abnormal behaviour [Kiss & Mohácsi, 2006, Slide 2].
Whilst it is at the very cutting edge of development, it does provide a combination of everything previously listed as being a requirement; NfDump for the underlying data processing and proling, and a customised NfSen interface to create and view data sources, including instances of aberrant behaviour detected using the Holt-Winters functionality within RRDtool.
2.4.1 Architecture and Organisation
The architecture of NfSen-HW is much like the architecture of NfSen, the main differences being the extra processing done as part of RRDtool and the redesign of the front end. As can be seen from a comparison of Figures 2.2 and 2.3, there are no alterations to the actual framework of NfDump and NfSen, merely the addition of the Holt-Winters forecasting within RRDtool [Kiss & Mohácsi, 2006].
Figure 2.3: NfSen-HW Architecture

The forecasting algorithm reads from and updates the individual RRD files, adding data into the Holt-Winters specific RRAs. When an observed value is considered to deviate too far from its forecast, it is marked within the RRD files such that it is displayed on the web front end after the next scheduled processing event. The plugin architecture within NfSen is such that Perl modules of a particular format can be included as processing scheduled to be run every time an update occurs, that is, every five minutes. These plugins are held within a rigid framework and can provide information to a front end plugin, which is simply a PHP page included in the front end plugin directory. Using this method, certain extra processing can be performed tailored to a specific network or need. In the case of NfSen-HW, the plugin architecture has not been used to implement the extra processing and changes required to update RRDtool for Holt-Winters forecasting correctly. Gabor Kiss said this is due to the organisation of NfSen's modular structure: in order to achieve what he has within a plugin, he would have had to repeat large pieces of the underlying code base within the plugin itself. Because of this he chose to simply modify the source code, and has submitted suggestions to Peter Haag as to how the modular framework could be improved [1].

In conclusion, this provides a very useful platform for detecting aberrant behaviour, but it does not fulfil all of the criteria laid down for use within DANTE. In their case the amount of network data available is extremely large, and even in graphical form it can be too much to take in visually. With NfSen-HW there is no immediate way of indicating network anomalies without an administrator examining the correct graph at the right time. This might not seem like much of an issue initially, but due to the size of the NetFlow data being captured per day they can only keep a certain amount of NetFlow data, and consequently there would not be the space to hold RRD files of unlimited size for profiles. If an RRD can only ever be a certain size, that size might only cover one day's worth of aberrant behaviour indications, and hence after 24 hours the indication of aberrant behaviour for that profile is lost.
2.4.2 Holt-Winters Forecasting
There are three separate sections which explain the mathematical process which constitutes Holt-Winters Forecasting.

Single Exponential Smoothing

This is a simple algorithm for predicting the next data value in a time series, and can only be used for predictions in time series where there are no trends in results. A weighted average is taken of all previous time series values, weighted such that the most recently recorded values are worth the most, because logically the most recent values are the most relevant to any further values. This is achieved by assigning geometrically declining weights to previous values, which decrease by a constant ratio the further back they go. The forecast can be updated using only two pieces of information: the latest observed value and the previously calculated forecast. For this to work successfully it is important to choose the smoothing constant carefully; high values (0.8 or 0.9) will place a heavy emphasis on the newest values in the time series, whereas low values (0.1 or 0.2) will spread the weight further, giving more prominence to values in the past. A smoothing constant value of 1 would result in the forecasted value being equal to the most recently observed value.

[1] This explanation occurred during a telephone conference on 24th January 2007 involving myself, Maurizio Molina (Network Engineer, DANTE), János Mohácsi and Gabor Kiss (NfSen-HW Developers).

Holt's Method

The second section is what is known as Holt's Method, which introduces the possibility of some trend in the values of a time series and takes this into consideration when forecasting the next result. This is done by creating another variable, the slope variable, which keeps track of the direction in which the trend is heading. This variable is also updated using exponential smoothing, hence there are two smoothing constants to choose values for. In the initial case these must be given values, usually in the region of 0.02 < a0, a1 < 0.2, where a0 and a1 are the two smoothing constants.

Holt-Winters Forecasting

The third and most important section is the actual forecasting algorithm. This is an extension to Holt's method which not only takes into account the possibility of some trend in time series values, but also the potential for seasonal variation over different time periods, for example daily, monthly or yearly seasonal traits. The observed time series is broken down into three components, each of which can be calculated to forecast further values:

- The Baseline (or Intercept)
- The Linear Trend (or Slope as it was referred to previously)
- The Seasonal Trend

The results are still calculated using exponential smoothing, but different weighting is applied dependent on which component is involved. In the case of the seasonal trend, since the current point within the season is known, the last known value for the same point in the season can be referenced and given most relevance in calculating a prediction. Aberrant Behaviour Detection is then performed using confidence bands. Since a reasonably accurate prediction can be made regarding the next value in a series, it is also possible to define limits between which the value can confidently be expected to fall.
In other words, in the case that the actual next value is not exactly the same as the predicted next value, to what limits are we condent that it still follows the current trend and seasonal variations. If the actual value is beyond these limits, either higher or lower, then depending on the magnitude by which the prediction is incorrect, the actual value can be classied as aberrant, compared to previous known values.
This is a very simplified explanation of the Holt-Winters forecasting process with a reduced emphasis on the mathematical formulae involved. It is based upon information given by Jake Brutlag and on Chatfield and Yar's investigation into the practical issues of Holt-Winters forecasting; more detailed explanations of the algorithm can be found in these sources [Brutlag, 2000a; Chatfield & Yar, 1988].
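To make the three components and the confidence band more concrete, the following is a minimal Java sketch of additive Holt-Winters forecasting with a deviation-based band, in the spirit of the formulation described by Brutlag. It is an illustration only and not the RRDtool implementation: the class name, the parameter names (alpha, beta, gamma, delta), the zero initialisation of the seasonal values and the warm-up rule are all assumptions made for this example. RRDtool additionally requires several violations within a window before recording a failure, which is omitted here.

```java
/** Illustrative additive Holt-Winters forecaster with a confidence band (not RRDtool's code). */
class HoltWintersSketch {
    private final double alpha;   // smoothing constant for the baseline
    private final double beta;    // smoothing constant for the slope
    private final double gamma;   // smoothing constant for seasonal values and deviations
    private final double delta;   // width of the confidence band, in deviations
    private final int period;     // number of observations in one season
    private final double[] seasonal;   // one seasonal coefficient per point in the season
    private final double[] deviation;  // smoothed absolute error per point in the season
    private double baseline;
    private double slope = 0.0;
    private int t = 0;                 // number of observations seen so far

    HoltWintersSketch(double alpha, double beta, double gamma, double delta,
                      int period, double initialBaseline) {
        this.alpha = alpha;
        this.beta = beta;
        this.gamma = gamma;
        this.delta = delta;
        this.period = period;
        this.seasonal = new double[period];   // assumed to start at zero
        this.deviation = new double[period];  // assumed to start at zero
        this.baseline = initialBaseline;
    }

    /** Forecast for the next observation: baseline + slope + seasonal value for that point. */
    double predictNext() {
        return baseline + slope + seasonal[t % period];
    }

    /** Feeds one observation and returns true if it fell outside the confidence band. */
    boolean observe(double y) {
        int s = t % period;
        double predicted = predictNext();
        double band = delta * deviation[s];
        // The first full season is treated as warm-up and never flagged (an assumption).
        boolean violation = t >= period && Math.abs(y - predicted) > band;

        double previousBaseline = baseline;
        baseline = alpha * (y - seasonal[s]) + (1 - alpha) * (previousBaseline + slope);
        slope = beta * (baseline - previousBaseline) + (1 - beta) * slope;
        seasonal[s] = gamma * (y - baseline) + (1 - gamma) * seasonal[s];
        deviation[s] = gamma * Math.abs(y - predicted) + (1 - gamma) * deviation[s];
        t++;
        return violation;
    }
}
```

Each call to observe first compares the new value against the band around the forecast and only then updates the baseline, slope, seasonal value and deviation, so an aberrant value is judged against what was known before it arrived.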
3 Design
3.1 Requirements
3.1.1 Network Operator's Workflow
The creation of requirements for this project requires an understanding of the situation in which the system will be used. A network operator has a considerable number of day-to-day responsibilities other than identifying and rectifying network problems. In some cases detecting issues on the network will be less a proactive feature of their work and more something triggered by a user's report of a specific issue they are facing. The result is that network issues will quite often go unnoticed and unattended until they become enough of a problem for an end user to complain. In a situation like that of DANTE, the size of the network being monitored and the amount of data that traverses it mean that, even with a network monitoring application showing graphs and trends of network activity, it is very easy to miss a network event which only affects a small portion of the network, or a small number of sources of traffic data.

Figure 3.1 gives an example of the actions taken by a network operator when a problem is detected or reported. With their current infrastructure, the majority of that process involves tracking down the problem and then using separate applications and tools to gain a better understanding. There is no facility for easily seeing other affected sites without going through the same process multiple times. In order to discover whether other network operators have previously investigated or dealt with the problems identified, they must access a separate ticketing system and specifically identify the sources and time periods in question. This means that if a problem has already been analysed and explained, a second operator may have to go through the same process again.

Finally, as mentioned previously, due to the amount of data being monitored network problems could be missed. Where someone reports a problem which has been ongoing for longer than a certain period, the original NetFlow data covering the time that the event began will probably have been deleted; in DANTE's network monitoring setup the length of that window is two weeks. This would result in all analysis and diagnosis being performed on the data held in the RRD files and graphs which, due to the Round Robin nature of RRDtool, will become less accurate as time passes.
3.1.2 Requirements list
Based upon this understanding of the situation, a list of requirements has been derived, each of which should be met for the solution to be considered a success. This list was completed after a series of discussions with a network operator at DANTE.

Figure 3.1: Use Case diagram depicting the diagnosis of a network anomaly

Overall Outcome

There is a simple overall outcome to be achieved by meeting each of the individual requirements, which was the starting point for the derivation of more specific needs:

To assist a network operator in the identification and diagnosis of network problems and illustrate how the inclusion of automated aberrant behaviour detection could improve large network monitoring.
High Level Requirements

Working from this overall end aim has produced a short list of high level, slightly more focussed requirements:

A  Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.
B  Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.
C  Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.
D  Indicate possible links between indicated aberrant network behaviour instances.
E  Keep a historical record of aberrant network behaviour instances and basic analytical details.
F  Provide a flexible interface to past aberrant network behaviour information.
G  Provide a means of indicating that aberrant network behaviour instances have been investigated.

Fully Derived Requirements List

Finally, based upon the high level requirements, a fully derived requirements list can be created. The requirements are broken down into separate tables, one for each high level requirement they satisfy. These numbered requirements will be reviewed at the end of the project as part of the Testing and Evaluation chapter of this report.
A    Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.
A.1  Aberrant network behaviour instances should be displayed together on one page organised by the time they occurred.
A.2  Only the most relevant information for each aberrant behaviour instance should be displayed.
A.3  Aberrant network behaviour instances should be aggregated to display one event per continuously flagged period.
A.4  This display should automatically update as new aberrant behaviour is detected on the network.
A.5  The display should be accessible from machines other than the machine it is installed on.
A.6  Each aberrant network behaviour event should be displayed in an identical style so quick comparisons of information can be made.
Table 3.1: Derived Requirements List for High Level Requirement A
B    Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.
B.1  The information displayed as part of the live update can be filtered to show only instances which match particular conditions.
B.2  The default update should contain information the network operator believes to be the most relevant in the first instance.
Table 3.2: Derived Requirements List for High Level Requirement B
C    Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.
C.1  Further information should be available for each aberrant network behaviour instance on request.
C.2  This information should include, at the least, a graph of the time frame in question and a brief statistical synopsis for the given period and traffic type.
C.3  This information should be persistent beyond deletion of the actual NetFlow records for that aberrant network behaviour event.
C.4  It should be made obvious if a particular aberrant network behaviour event has been flagged as a false positive when examining further details.
Table 3.3: Derived Requirements List for High Level Requirement C
D    Indicate possible links between indicated aberrant network behaviour instances.
D.1  If further information about an aberrant network behaviour event is requested then a display should also be provided of possible associated events.
D.2  Further information pertaining to these associated aberrant network behaviour events should be available on request.
Table 3.4: Derived Requirements List for High Level Requirement D
E    Keep a historical record of aberrant network behaviour instances and basic analytical details.
E.1  Detected aberrant network behaviour events should be recorded in some form of persistent database.
E.2  The database should be reliable, quick to query, and scale well to holding potentially very large data sets.
Table 3.5: Derived Requirements List for High Level Requirement E
F    Provide a flexible interface to past aberrant network behaviour information.
F.1  It should be possible to view past aberrant network behaviour event details based upon a number of criteria:
F.2  Exact Start time and End time.
F.3  Start time somewhere between two given dates and times.
F.4  End time somewhere between two given dates and times.
F.5  Alongside queries based upon the start and end times, results should be chosen according to further specific information: type/source/profile etc.
F.6  When results have been found it should be possible to view further information about an event in the same way it would be possible for a live event.
Table 3.6: Derived Requirements List for High Level Requirement F
G    Provide a means of indicating that aberrant network behaviour instances have been investigated.
G.1  Aberrant network behaviour events stored in the system should be able to be flagged as acknowledged when they have been dealt with.
G.2  Aberrant network behaviour events stored in the system should be able to be flagged as a false positive if they have been identified as such.
G.3  Operators who have dealt with a particular aberrant network behaviour event should be able to leave some comment regarding their findings for the benefit of later users.
Table 3.7: Derived Requirements List for High Level Requirement G
3.2 Design Decisions
This section gives a brief justification of the tools and systems used within the Sentinel system design.
3.2.1 NfSen-HW
This system has been chosen to provide a basis for the network traffic data analysis and for the aberrant behaviour detection, for two reasons. Firstly, it is the most complete package available in this area of network monitoring. It provides a reliable, mathematically proven platform for detecting network anomalies, packaged such that installation and configuration are not an arduous task. Secondly, the Network Operators at DANTE already have good working experience of NfSen, the version of this software without aberrant behaviour detection. Because of this, the exchange of GÉANT2 data for Sentinel development and testing should be more straightforward, as the flows can be transferred as files in an already organised, compatible format.
3.2.2 Java
Java 1.5 will be used to process the RRD files produced by NfSen-HW for aberrant behaviour marks. This was intended to be done using a Java RRD library, allowing Java to interface directly with the RRD files; some examples of such libraries are compared in the JRA1 Perfsonar wiki [RRD Java Libraries]. Unfortunately, due to the version of RRDtool required for use with NfSen-HW, the libraries will not read the RRD files that it produces. The most complete library, JRobin, required the use of a converter before RRDtool files could be read, and whilst JRobin itself would produce the results I required, the converter did not support the version of the RRD files being used and so could not convert them [JRobin, 2006].

Instead, a tool within RRDtool will be used: rrdtool dump. This was mentioned earlier in the Background and Related Work section and produces a full XML representation of the contents of the RRD files. Java by default contains very flexible XML parsing libraries, so once the RRD files have been exported to an XML format it should be possible to read in the appropriate aberrant behaviour results. Java also contains methods and functions for connecting to, querying and altering SQL-compliant databases, and these will be used to insert the collated aberrant behaviour events into the database to be used by the front end.

Using Java in this fashion should mean that the finished application is completely portable to any system upon which NfSen-HW has been installed, regardless of architecture, unlike RRDtool. Java is portable to any system or architecture providing it has been installed, and this should mean that the end application will run in any NfSen-HW environment.
3.2.3 MySQL and PHP
The database will be stored using MySQL 5.0 and the web front end written using PHP5 [MySQL, 2007; PHP, 2007]. MySQL is an open source database implementation very widely used in web based applications, and PHP is a server side embedded scripting language which allows processing to be applied and the results displayed on a webpage. These are two highly flexible and frequently integrated pieces of software which should provide an excellent platform for the aberrant indication history and presentation. They provide all the necessary tools and functions to complete the project in the easiest way possible, allowing complex database queries as well as functionality within PHP for service and IP address lookups.
3.2.4 Debian GNU/Linux
Debian GNU/Linux will be the operating system platform for Sentinel. This is first and foremost because the installation of NfSen-HW requires a well maintained and compliant Linux distribution, but secondly because of my familiarity with Debian's system architecture and knowledge of Debian's excellent package management system, apt [Debian GNU/Linux, 2007]. This should mean that the installation of certain necessary software, such as Java and PHP, will be a simple process, leaving more time for development and testing.

Linux distributions also generally come with a number of useful applications which will be required for this project, the most important being Bash (the Bourne-Again SHell) and Cron. Bash is the command line interpreter which comes as standard with GNU operating systems [Bash, 2007]. It provides a text based user interface to execute commands but also allows files containing commands to be created. These Bash scripts are what will be used to initiate the Sentinel Java process on specific RRD files. Cron, or more specifically Vixie Cron, is a background process or daemon which exists to execute scheduled commands at specific times. Using formatted configuration files known as crontabs, Cron can be set to run a particular command or script at a set point every minute/hour/day/month. Cron will be used to ensure that the runSentinel.sh script is executed every five minutes to correspond with NfSen updates.
3.3 System Architecture
As described within the Design Decisions section, the system is made up of many smaller components which interact with NfSen-HW, NfDump and RRDtool.

Figure 3.2: Overview of Proposed System Architecture

Figure 3.2 illustrates how the various sections and systems interface with each other. As can be seen, NfSen-HW is an important part of the back end; Sentinel bases its aberrance detection upon the events identified by RRDtool's Holt-Winters forecasting. The rest of this section gives a more detailed description of what each of the individual components does.
3.3.1 NfSen-HW and NfDump
The operation of these two applications has already been covered by previous chapters of this report, but here is an overview of their use within the wider Sentinel indication system. The Network Flow data from all sources is captured, as shown in Figure 3.2, by individual instances of the nfcapd capture daemon. This is then analysed and organised according to specified profile filters by NfDump. This information is then processed by the front end system, NfSen-HW, and the specific parameters are passed to RRDtool for the creation of RRD files for each source/profile. The Holt-Winters forecasting occurs as part of this process within RRDtool itself, and the resultant aberrance indication data is written to each individual RRD in RRA sections specific to aberrant behaviour detection. Once this information has been stored within the RRD files, NfSen-HW plays no further part in the aberrance indication process. This occurs once every 5 minutes, as the nfcapd NetFlow data files are rotated, allowing new network traffic data to be analysed.
3.3.2 runSentinel.sh
runSentinel.sh is a Bash script which is executed once every five minutes by Cron. It traverses the directory structure that holds the NfSen-HW RRD files, uses the rrdtool dump command to export them to XML and runs Sentinel.jar on each file to pull out the aberrant behaviour. This is merely a method of ensuring that the aberrant event database is updated every five minutes, the same as the RRD files themselves, which should ensure that no aberrant events are missed.
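The traversal and export steps can be illustrated as follows. The real component is a Bash script; this sketch expresses the same steps in Java purely for consistency with the other examples in this chapter. The class and method names are hypothetical, and it is assumed that rrdtool is on the PATH and that rrdtool dump writes its XML to standard output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Java rendering of the traversal and export steps (the real system uses Bash). */
class RrdDumpSketch {

    /** Finds every *.rrd file below the given directory, in a stable sorted order. */
    static List<Path> findRrdFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".rrd"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Name of the temporary XML file written next to an RRD file. */
    static Path xmlPathFor(Path rrd) {
        String name = rrd.getFileName().toString();
        String base = name.endsWith(".rrd") ? name.substring(0, name.length() - 4) : name;
        return rrd.resolveSibling(base + ".xml");
    }

    /** Runs "rrdtool dump <file>" and redirects its standard output to the XML file. */
    static void dumpToXml(Path rrd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("rrdtool", "dump", rrd.toString())
            .redirectOutput(xmlPathFor(rrd).toFile())
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        if (process.waitFor() != 0) {
            throw new IOException("rrdtool dump failed for " + rrd);
        }
    }
}
```

A caller would loop over findRrdFiles, call dumpToXml on each file, hand the resulting XML to the parser and then delete the XML file.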
3.3.3 Sentinel.jar
This is the Java file which is responsible for interpreting the contents of the RRD files and then for inserting that information into the Sentinel database. This is done by parsing the XML version of each RRD file, created by runSentinel.sh, using Java's inbuilt SAX XML parsing libraries. The default XML handler provided by the library is extended to create an XML handler which only looks for the specific sections of XML that are required to retrieve the aberrant behaviour data. Information about each flagged event is pulled out and placed in an AberrantBehaviour object and, once the XML file has been parsed to pick up every instance of aberrant behaviour, the collection of AberrantBehaviour objects is inserted into the Sentinel database. It makes use of the JDBC libraries within Java for connecting to and manipulating data within databases, in this case using the MySQL connector.
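As an illustration of extending the default SAX handler in this way, the sketch below collects failure marks from a dumped RRD. It is not the Sentinel source: the class name and the Mark type are hypothetical, and it assumes that the dump contains an rra element whose cf is FAILURES, that each row holds one v element per data source, and that each row is preceded by an XML comment ending in "/ <unix timestamp>". Because plain SAX handlers do not see comments, the sketch uses DefaultHandler2 registered as a lexical handler.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ext.DefaultHandler2;

/** Sketch of a SAX handler that only looks at the FAILURES archive of a dumped RRD. */
class FailureMarkParserSketch extends DefaultHandler2 {

    /** One failure mark: the timestamp of the row and the data source column it was in. */
    static final class Mark {
        final long timestamp;
        final int column;
        Mark(long timestamp, int column) {
            this.timestamp = timestamp;
            this.column = column;
        }
    }

    private final List<Mark> marks = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inFailuresRra = false;
    private long rowTimestamp = -1;
    private int column = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                              // start collecting this element's text
        if (qName.equals("rra")) inFailuresRra = false; // decided once <cf> has been read
        if (qName.equals("row")) column = 0;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        // Assumed row comment layout: " 2007-03-04 10:00:00 GMT / 1173002400 "
        String comment = new String(ch, start, length);
        int slash = comment.lastIndexOf('/');
        if (slash < 0) return;
        try {
            rowTimestamp = Long.parseLong(comment.substring(slash + 1).trim());
        } catch (NumberFormatException ignored) {
            // Not a row timestamp comment; keep the previous value.
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("cf")) {
            inFailuresRra = value.equals("FAILURES");
        } else if (qName.equals("v") && inFailuresRra) {
            if (Double.parseDouble(value) == 1.0) {     // NaN and 0 are not failures
                marks.add(new Mark(rowTimestamp, column));
            }
            column++;
        }
    }

    /** Parses a dumped RRD and returns every failure mark it contains. */
    static List<Mark> parse(InputStream xml) throws Exception {
        FailureMarkParserSketch handler = new FailureMarkParserSketch();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(xml, handler);
        return handler.marks;
    }
}
```

Everything outside the FAILURES archive is skipped, which keeps the handler small even though the dumped file describes every archive in the RRD.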
Figure 3.3 depicts a high level UML diagram of the classes within the Sentinel Java component. As you can see from the diagram, most of the complexity is within the RRDDatabase; the XML parser simply pulls out the relevant information. It should be noted that there are two forms of parse available within Sentinel. The first is the default, a scan for any aberrant behaviour which has been indicated in the five minutes previous to the last updated time. This is the form that will be run by the runSentinel.sh script every 5 minutes, and ensures that only the latest information is pulled into the database as it is updated. The second is a full scan, triggered by a command line argument, which will go through and parse an XML file for every single aberrant event that it contains. This is designed to be run the first time the system is put into operation, to retrieve the backlog of aberrant events into the database for historical purposes.
Figure 3.3: Sentinel Java UML Diagram
3.3.4 Sentinel Database
This MySQL database stores all information about aberrant network events, including their type, source, profile and a basic amount of statistical information. Here is an entity relationship diagram for the database schema:
Figure 3.4: Sentinel Database Entity Relationship Diagram

As you can see, an aberrant event can have one type, profile and source, but each of those could be applicable to many events. Here is an overview of the contents and responsibilities of each table within the database schema.

events    Holds information relevant to an aberrant network event, including the type, source and profile via foreign key links to other tables. A start time and end time is held per event, as well as a comment and a marker indicating acknowledged and false positive status. Also brief statistics are held, taken from nfdump and a lookup of port/hostname.
types     A simple table containing all possible types of network data and an id number for linking purposes.
sources   Contains all the sources seen so far with an id number for linking and a description field to store further brief information about each source.
profiles  Contains all the profiles seen so far with an id number for linking and a description field to store further brief information about each profile.
Table 3.8: Sentinel Database Tables

This final diagram illustrates how the tables will link together and the connections that will take place using foreign keys.
Figure 3.5: Simple foreign key linking example

These diagrams give a precise description of the contents of the tables and the relationships between them, but it is also important to understand how the data within the tables will be used by the other sections of the Sentinel system.

Firstly, and most importantly, there is the events table, which links together all the relevant information about a particular aberrant event. The table contains a unique event id as its primary key; using an integer and separating this necessity from the actual held data should mean that indexing of the table is a lot quicker and lookup times are improved. Second to that are two columns relating to the time that the event took place. The way that aberrant behaviour detection is implemented within NfSen-HW and RRDtool means that one particular network event will be flagged within the RRD as a continuing series of 5 minute long segments. Since it is quite obvious from viewing the produced graphs that each individual 5 minute long segment is not an aberrant event in its own right, this design holds single events by storing the start time and the end time of each event; from the RRD these would be the first 5 minute segment in which the aberrant behaviour was indicated and the last 5 minute segment in which it was indicated. In the case of a live updated page, the end time would be the last time that the aberrant event was seen as active since, without seeing the next segment in time, we cannot predict when a series of aberrant markers is going to end. The events table then holds three foreign keys, linking to tables containing information about the type, source and profile of an aberrant event. Next is a comment field where network operators can comment on an event, leaving messages about any research they have undertaken to solve a problem.
Following that there are two boolean flags: firstly an acknowledged field, where network operators may mark events as having been dealt with, and secondly a false positive field. The latter can be used if NfSen-HW has incorrectly identified a period of time as an aberrant event. These fields are present merely for filtering purposes; when using the system a network operator does not want to be presented with falsely identified events once they have been identified as such. The final two fields within this table are simply text fields containing more detailed information about the flows which were occurring during the identified time frame for the particular network protocol. While the system is being used as a live update, this information will most likely be retrieved from NfDump directly, but once an event has been marked as ended and time has passed without it becoming active, this information will be stored in the database, for two reasons. Firstly, this will speed up the front end considerably: once an event has finished there will be no new flow data added to it and the information which can be garnered from flow statistics and hostname lookups is not going to change, so removing the need to requery the stored flows should save time. Secondly, in cases such as at DANTE where the Network Flow data is only held for a restricted amount of time, this will keep at least some basic level of information connected to an event where it can be examined at a later date. If this were not done, at a point in the future when information about a past event was retrieved, the lookup from NfDump could not be performed because the NetFlow data would no longer be present on the system.

The other tables in the database schema are quite similar. The types table contains a numeric primary key and a corresponding network traffic type. NfSen-HW chooses to specify 15 types of network traffic data which do not change throughout the rest of the system; these correspond to flows, packets and finally traffic. For each of these there are 5 subcategories: firstly all traffic within that classification, then all TCP traffic, all UDP traffic, all ICMP traffic and finally other, which catches all other kinds of network traffic protocol (for example, PIM or OSPF).
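The count of 15 is simply three measures multiplied by five protocol subcategories, which the following short Java sketch enumerates. The underscore naming scheme (for example flows_tcp) and the class name are assumptions made for illustration and may not match the names stored in the types table.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates the 3 x 5 traffic types described above (names are illustrative). */
class TrafficTypesSketch {
    static final String[] MEASURES = {"flows", "packets", "traffic"};
    // The empty entry stands for "all traffic within that classification".
    static final String[] PROTOCOLS = {"", "tcp", "udp", "icmp", "other"};

    static List<String> allTypes() {
        List<String> types = new ArrayList<>();
        for (String measure : MEASURES) {
            for (String protocol : PROTOCOLS) {
                types.add(protocol.isEmpty() ? measure : measure + "_" + protocol);
            }
        }
        return types;
    }
}
```

Because the list never changes, it could be loaded into the types table once and referenced by id from every event.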
The profiles and sources tables are practically identical other than in content: one contains information regarding the data sources being used, the other information regarding the profiles that have been configured. They both contain a numeric primary key and a name for the source/profile being stored. The final, optional field is a description field, a place for further information about a source or profile. This might be used to clarify what a certain source or profile refers to, something which might not be immediately apparent from the short name.
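The way the events table stores one start and end time for a run of flagged 5 minute segments, as described earlier in this section, can be sketched as below. This is an illustrative Java example rather than the Sentinel implementation; the class and method names are hypothetical, and it assumes the timestamps all belong to one type/source/profile combination and that segments are exactly 300 seconds apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Merges consecutive flagged five minute segments into single events. */
class EventAggregatorSketch {
    static final long STEP_SECONDS = 300;

    /** A single aberrant event: the first and last flagged segment of one run. */
    static final class Event {
        final long start;
        final long end;
        Event(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Timestamps no more than one step apart are treated as the same event. */
    static List<Event> aggregate(long[] flaggedTimestamps) {
        long[] sorted = flaggedTimestamps.clone();
        Arrays.sort(sorted);
        List<Event> events = new ArrayList<>();
        if (sorted.length == 0) return events;

        long start = sorted[0];
        long end = sorted[0];
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] - end <= STEP_SECONDS) {
                end = sorted[i];                    // the run continues
            } else {
                events.add(new Event(start, end));  // gap found: close the run
                start = sorted[i];
                end = sorted[i];
            }
        }
        events.add(new Event(start, end));          // close the final run
        return events;
    }
}
```

For a live event the end value simply keeps moving forward on each five minute update until a gap appears.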
3.3.5 Sentinel Web Interface
The Sentinel web interface should provide three different views on the same data. The first view is a live update screen showing all the aberrant behaviour which has been identified by the system during a configured amount of time, for example, the last 24 hours. The second is a more detailed view of a specific aberrant event with further information and details to help identify the source of the problem. The third is an interface to search the database of stored aberrant events based on when they occurred, what kind of traffic was involved and which sources were affected. Each of these interfaces will be discussed in turn with a prototype of the end design.
Live Update
Figure 3.6: Proposed Live Update Web Interface
This interface is designed to be simple and easy to view at a glance. The aberrant events which have occurred within the specified time frame are displayed in a tabular format, containing just the information immediately necessary to gain an initial understanding of what has happened. They are ordered by end time; in other words, the events which were active most recently are near the top. It is possible to further filter the aberrant events displayed, perhaps to group together events affecting a particular data source or traffic type. This is done using the filter interface at the top of the screen; on clicking submit the page would be refreshed showing only the data relevant to the options selected. By default on this display any events which have been marked as a false positive or as acknowledged will not be displayed. This stems from understanding a network operator's workflow: in most cases if an event has been dealt with or is being dealt with then it should not be listed as an event in the Live Update requiring attention. It could be necessary for an operator to compare a currently unacknowledged event with other previous events regardless of acknowledged or false positive status; in this case the filters can be temporarily altered through the filter interface to display all events within the given timeframe, regardless of the flags applied.

Details
Figure 3.7: Proposed Details Web Interface
The Details interface is designed to contain as much information about the event as possible in one place. The stored information about the event is first presented in text form at the top of the page; this is the longer form of the information containing comments, flags and descriptions. The information specific to this event can be edited from this page: further towards the bottom there is a small entry form. This will be automatically filled in with the information currently stored about that event so that it may be edited or removed as appropriate. The graph covers a time period relevant to the event, and underneath is a brief synopsis of statistical analysis from NfDump based upon the start and end times and the classification of traffic that was indicated as aberrant. There is also a presentation of the top few flows with the port numbers and hostnames looked up. This is to present the operator with as much information as possible in one place so they aren't required to use separate applications to perform the necessary analysis. Finally on the page is a display of events that have been identified as associated with this one, primarily by when the events occurred. If two events are entered as starting and ending at identical times, then logically they are going to be related in some way. This gives a network operator a better feeling for how widespread the problem is.

Review
Figure 3.8: Proposed Review Web Interface
The Review section is primarily for accessing data about events that have fallen outside of the Live Update time period. Events can be searched for initially based upon their start and end date/time and secondly against filters like those on the Live Update page. Further details of events which have been found will be displayed through the same Details page as mentioned previously. Similarly to the Live Update section, the Review page is concerned with presenting the essential pieces of information clearly and concisely; if an operator is interested in a particular event then further information can be found by clicking on it.
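The default filtering and ordering rules described for the Live Update page, which the Review page reuses, can be summarised in a few lines of logic. In the system as designed this selection is performed against the MySQL database from PHP; the Java sketch below only illustrates the rules, and its class, field and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrates the default Live Update selection: recent, unflagged, newest first. */
class LiveUpdateFilterSketch {

    static final class Event {
        final long start;
        final long end;
        final boolean acknowledged;
        final boolean falsePositive;
        Event(long start, long end, boolean acknowledged, boolean falsePositive) {
            this.start = start;
            this.end = end;
            this.acknowledged = acknowledged;
            this.falsePositive = falsePositive;
        }
    }

    /**
     * Returns the events active within the last windowSeconds, ordered by end time
     * with the most recent first. Acknowledged and false positive events are hidden
     * unless includeFlagged is true.
     */
    static List<Event> liveView(List<Event> all, long now, long windowSeconds,
                                boolean includeFlagged) {
        List<Event> shown = new ArrayList<>();
        for (Event e : all) {
            boolean recent = e.end >= now - windowSeconds;
            boolean hidden = !includeFlagged && (e.acknowledged || e.falsePositive);
            if (recent && !hidden) {
                shown.add(e);
            }
        }
        shown.sort(Comparator.comparingLong((Event e) -> e.end).reversed());
        return shown;
    }
}
```

Passing includeFlagged as true corresponds to the operator temporarily altering the filters to compare a new event with ones already acknowledged or marked as false positives.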
4 Implementation
4.1 Method of Implementation
The system was implemented over a number of weeks. Initially the focus was on gaining a full understanding of NfSen-HW, its operation alongside RRDtool and how to use it. Once this had been achieved the focus moved to the creation of the Java package to parse the XML, using small RRD files dumped to XML format for testing as it was produced. A small difficulty was encountered with the parsing due to the organisation and content of the XML file, and this will be discussed in more depth in the Sentinel.jar section of this chapter. Once the package was operating correctly the database tables were created and development in Java continued to ensure the correct insertion of data. Next the Bash script runSentinel.sh was created and then the back end of the system could be put into proper operation. Lastly the web interface was produced, taking data from the Sentinel Database. The implementation of each of these sections will be discussed in more detail under the headings which follow.
4.2 NfSen-HW
This was not technically implemented as part of the system, but its installation and use did cause some initial problems for development. NfSen-HW is based on the 20060412 snapshot of NfSen, a non-stable version which appears to have some bugs. The biggest problem was with the creation of NfSen-HW instances with previous data; the application would work perfectly if, on creation, each source was specified and there was no earlier data to be imported. Unfortunately, due to the way data was transferred from DANTE, the only data available for use was technically past data from a large number of sources, which needed to be imported before graphs or aberrant detection could be displayed. After a large amount of experimentation it became apparent that if all past data is present at the very moment the NfSen-HW instance is created for the first time, on the initial start up it will go through every stored NetFlow file and create appropriately dated RRD files. If past data is added at a later time, even initiating a rebuild of the RRD files will not allow the creation of graphs based on this new data. This is because when NfSen-HW creates the RRD files it has to give a starting time for the data they contain; any later addition of previous data failed to change the starting date and so any earlier data was ignored.

Another problem was the inability to add new sources of data once an NfSen-HW instance had been initialised. The configuration options can be altered, but on rebuild, even though the correct RRD files would be constructed, no data was added to them. This resulted in graphs claiming to contain data from new sources but never actually displaying any content. The data received from DANTE contained over twenty separate sources of NetFlow data, and was not delivered to any installation of NfSen-HW live, that is to say, as it was produced by the routers. What was received was the backlog of archived nfcapd files accumulated since the last update, fetched from my machine using rsync over SSH. The only solution to this and the previously mentioned problem was, with every fresh instalment of DANTE NetFlow data, to reinstall NfSen-HW and rebuild the RRD files, which was somewhat time consuming. Due to the amount of data being received, a rebuild of the RRD files after a reinstall could take over three hours. The size of the data was approximately 20 GB per week, with an initial download of 83 GB; now, nearing the end of the project, the space required to hold the NetFlow data has surpassed 400 GB.

The implications of this were a little further reaching. As the data received from DANTE was never live, it was impossible to test the system using that data in a live situation with aberrant network data events being updated at five minute intervals. Initially I thought that having the RRDs based on the old data would allow me to do a full scan and collect the aberrant network behaviour events which occurred throughout the period the data covered. Unfortunately it appeared that the RRD files only hold the aberrant behaviour markers for 24 hours, after which point they are removed. This caused me to think about how RRDs work: they archive information based upon a number of averages, so over time the results lose their granularity and trends become more vague. In the case of Holt-Winters Forecast results, they are held within the RRD structure as a binary 1 or 0 marker.
Binary data like this cannot be averaged; a 1 or 0 result makes no sense if it becomes translated into 0.8 at some point in the future, and so it became obvious that RRD files must only hold their Holt-Winters marks for a set period of time. Through discussions with a network operator at DANTE and a telephone conference with Gabor Kiss and János Mohácsi, the developers of NfSen-HW, it appeared that Gabor, when creating the system, had never set it up personally without having old RRD files which he wanted to reimport. He then ran a Perl script called Holt Winters Reapply to take data from the RRD files, run Holt-Winters forecasting on it, and create new HW capable RRDs for use with his system. Whilst in my case there was old data being imported into NfSen-HW, it was not in RRD format but in fact NfDump archived NetFlow data. This meant that the entire process of creating RRD files was done via NfSen-HW, and the default time period parameters were hard coded. The solution was to run the Holt Winters Reapply script once any NetFlow data had been incorporated into a new NfSen-HW install, which took more time upon each new NetFlow instalment arriving. Even having done this, the RRD files will only hold their Holt-Winters marks for two weeks. This meant that the historical perspective functionality of my system became all the more critical. In order to ensure the Sentinel system was working correctly when collecting its information from a live data source, another installation of NfSen-HW was performed, this time running on data exported from a router based in my home on a small scale testing network. Whilst the data size is not nearly as large as that from DANTE, it worked as expected, detecting aberrant network behaviour events of various kinds. It is using this installation that the majority of the development work was performed.
4.3 runSentinel.sh
The Bash script holds everything together by navigating the directory structures and converting each RRD file to its XML equivalent. It then runs the Java XML parser over the XML file with the correct parameters and finally deletes the XML file so as not to interfere with further conversions. To understand the way the script works, an understanding of the directory structure used by NfSen-HW is required.
/home/nfsen-hw/profiles/
This is the root directory in which any RRD data is held. RRD files created using no filters or profiles are stored within a profile known as live. It can generally be accepted that there will be a live profile as part of every installation of NfSen-HW which actively captures NetFlow data, but runSentinel.sh does not make that assumption.
sara@fairlop: profiles$ ls
live/ profile1/ profile2/
Asking for a directory listing of the profiles directory would yield results similar to this, where each of the entries listed is actually a directory containing all data related to that profile name. Looking inside a profile directory shows the actual images which RRDtool creates.
sara@fairlop: live$ ls
DataSource1/     flows-week.gif     packets-week.gif  traffic-month.gif
DataSource1.rrd  flows-year.gif     packets-year.gif  traffic-week.gif
flows-day.gif    packets-day.gif    profile.dat       traffic-year.gif
flows-month.gif  packets-month.gif  traffic-day.gif
The two important listings here are DataSource1.rrd, the RRD file containing all the data for this profile and data source, and the directory DataSource1/, which contains all of the nfcapd archived NetFlow files. This is repeated for every named profile directory within /home/nfsen-hw/profiles. The Bash script therefore works by changing directory into /home/nfsen-hw/profiles and reading in the directory listing as a list of files. For every entry found in profiles, it changes to that directory and reads in a list of all files that end in .rrd. This should give a list of all RRD files, and hence data sources, for that profile. Knowing this, and its current working directory, it then executes the command rrdtool dump on each RRD in turn, creating the XML formatted file, and then runs Sentinel.jar, passing in the correct directory paths as parameters to read the XML file.
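The traversal described above can be sketched as follows. This is an illustrative Java rendering and not the Bash script itself: the class name ProfileWalker, its methods and the convention of writing the XML file next to the RRD file are assumptions made for the example. It only builds the rrdtool dump command lines, much as the modified script does in test 201, and does not execute them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ProfileWalker {

    /** Builds the dump command for one RRD file; the .xml name is an assumed convention. */
    static String commandFor(Path rrd) {
        String xml = rrd.toString().replaceAll("\\.rrd$", ".xml");
        return "rrdtool dump " + rrd + " > " + xml;
    }

    /** Visits every profile directory under root and collects one command per *.rrd file. */
    static List<String> dumpCommands(Path root) {
        List<String> commands = new ArrayList<>();
        try (DirectoryStream<Path> profiles = Files.newDirectoryStream(root)) {
            for (Path profile : profiles) {
                if (!Files.isDirectory(profile)) {
                    continue; // only directories are treated as profiles
                }
                try (DirectoryStream<Path> rrds = Files.newDirectoryStream(profile, "*.rrd")) {
                    for (Path rrd : rrds) {
                        commands.add(commandFor(rrd));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(commands); // directory order is unspecified, so sort for stable output
        return commands;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : "/home/nfsen-hw/profiles");
        for (String command : dumpCommands(root)) {
            System.out.println(command);
        }
    }
}
```

Because nothing is hard coded beyond the root directory, a sketch like this makes no assumption that a live profile exists, which mirrors the behaviour described for runSentinel.sh.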
4.4 Sentinel.jar
The Java XML parser was completed mostly to the specification given in the Design chapter. Here is a more specific UML diagram of the component classes.
Figure 4.1: Sentinel Java UML Class Diagram
There is one additional class which was not present in the original design, the AberrantMark class. This is due to some unforeseen problems with parsing the XML files, which are described in more detail later in this section.
4.4.1 Implementation Overview
Sentinel.jar is executed via a call from the runSentinel.sh Bash script and is passed the appropriate parameters to know the location of the RRD file that is to be processed. When passing the parameters it is important that the full path to the chosen RRD file is given; this is because the RRD files themselves contain no reference to the profile or source they correspond to. Such information can only be retrieved from the directory structure and filename. When Sentinel.jar is run, the first thing that happens is that the passed in parameter is broken down into its component parts and the source and profile names are stored. An instance of RRDDatabase is created and the profile and source are set within it. This is where the information about the RRD file being parsed will be saved, including a list of AberrantBehaviour objects, one for each aberrant network behaviour event that is retrieved. When the XML parsing has finished, the Main driver class gets the Vector of AberrantBehaviours from the RRDDatabase using the getAberrantBehaviour() method. This Vector is then iterated through and, based upon the start and end time of each entry, the information is inserted into the database.
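As an illustration of how the profile and source can be recovered from the path alone, the following minimal sketch splits a full path of the form /home/nfsen-hw/profiles/&lt;profile&gt;/&lt;source&gt;.rrd. The class name RrdLocation and its fields are hypothetical and are not taken from the Sentinel source; the sketch also assumes that the file name minus its extension is the source name, as the directory listing in section 4.3 suggests.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RrdLocation {
    final String profile;
    final String source;

    RrdLocation(String profile, String source) {
        this.profile = profile;
        this.source = source;
    }

    /** Derives profile and source from a path ending in <profile>/<source>.rrd. */
    static RrdLocation fromPath(String fullPath) {
        Path path = Paths.get(fullPath);
        Path file = path.getFileName();
        Path parent = path.getParent();
        if (file == null || parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException("Expected <profile>/<source>.rrd: " + fullPath);
        }
        String fileName = file.toString();
        int dot = fileName.lastIndexOf('.');
        // Strip the extension so that DataSource1.rrd becomes DataSource1.
        String source = dot > 0 ? fileName.substring(0, dot) : fileName;
        return new RrdLocation(parent.getFileName().toString(), source);
    }

    public static void main(String[] args) {
        RrdLocation loc = fromPath("/home/nfsen-hw/profiles/live/DataSource1.rrd");
        System.out.println(loc.profile + " " + loc.source);
    }
}
```

A bare file name with no parent directory is rejected, which reflects the requirement above that the full path must be supplied.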
4.4.2 Problems with XML Parsing
Originally I had assumed that parsing the XML file would be as straightforward as looking for the tags within the FAILURES section which were marked as 1.0000000000e+00 rather than 0.0000000000e+00 and retrieving the timestamp for that FAILURES entry. Unfortunately, on closer examination of the RRD structure, the individual entries in the FAILURES section do not contain a timestamp. The only timestamp available within the file is the one marking the instant that it was last updated. In order to solve this problem in a generic and portable way, every aberrant network behaviour marker required its time to be calculated based upon its place in the file, working backwards from the last entry, which logically is equal to the last update time. An AberrantMark class was written, an instance of which is created whenever an aberrant mark is found. As the file is parsed, every entry which would have occurred with a time update is counted, and when an aberrant marker is located, the number of the row it was found in is stored within the AberrantMark object. When the file has been fully processed, the exact number of entries is known and the timestamp for each event can be worked out using simple arithmetic. Secondary to this, when an AberrantMark is located, it is important to note which field or fields the mark occurred in. For each update there are multiple types of traffic being graphed, and the type of traffic the aberrant behaviour occurs in determines which field the mark occurs in. This field number was stored inside the AberrantMark instance for each event and translated back into a human readable name in the Main driver class.
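The arithmetic involved can be sketched as below. The class and method names are hypothetical, and the sketch takes the premise stated above at face value, namely that the last row corresponds to the last update time; it also assumes zero-based row numbers and a fixed step between rows, here the five minute (300 second) interval used elsewhere in this report.

```java
final class MarkTimes {

    /**
     * Timestamp of a row counted from the top of the FAILURES section,
     * working backwards from the last row, which is taken to be lastUpdate.
     */
    static long timestampOf(long lastUpdate, long stepSeconds, int totalRows, int rowIndex) {
        if (rowIndex < 0 || rowIndex >= totalRows) {
            throw new IllegalArgumentException("rowIndex out of range: " + rowIndex);
        }
        long rowsFromEnd = totalRows - 1 - rowIndex;
        return lastUpdate - rowsFromEnd * stepSeconds;
    }

    public static void main(String[] args) {
        // A mark two rows before the last of 288 rows, with a 300 second step.
        System.out.println(timestampOf(1174560000L, 300L, 288, 285)); // prints 1174559400
    }
}
```

The total number of rows is only known once the whole file has been read, which is why the row number has to be stored in each AberrantMark and converted afterwards.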
4.4.3 Database Insertion
Sentinel.jar uses the JDBC libraries for connection to the Sentinel MySQL database, but there are some checks made before the data is inserted. The database design is such that each aberrant instance cannot just be inserted into the events table, as any number of aberrant network behaviour marks may be aggregated into one event if they are part of a series with the same type, source and profile. The first information to be updated is the profile and source data; this is so that their identifiers can be retrieved and inserted as foreign keys in the events table. A check is made to see whether the same profile and source are already present in the database; if not, they are stored and the id numbers saved. Once this has taken place, the actual aberrant network behaviour information can be checked. For every event, the type id is retrieved, and then a query is performed to see whether an entry exists with exactly the same information apart from an end timestamp 5 minutes earlier. If this is the case then the end time in the database is updated to the end time of the new aberrant instance, and the rest of the information is left as it was. If there are no prior entries in the table fitting that description then a new entry is created with that information, and so the process continues until there are no more AberrantBehaviour objects left.
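The insert-or-extend rule can be illustrated without a database. In the sketch below an in-memory list stands in for the events table, so the loop plays the part of the SELECT and the two branches the parts of the UPDATE and INSERT; the class, field and method names are hypothetical, and matching on profile, source and type with an end time exactly one five minute step earlier is my reading of the description above.

```java
import java.util.ArrayList;
import java.util.List;

final class EventAggregator {
    static final long STEP_SECONDS = 300; // five minute update interval

    static final class Event {
        final String profile;
        final String source;
        final String type;
        final long start;
        long end;

        Event(String profile, String source, String type, long start, long end) {
            this.profile = profile;
            this.source = source;
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    final List<Event> events = new ArrayList<>();

    /** Extends a matching event that ended one step earlier, otherwise adds a new event. */
    void record(String profile, String source, String type, long markTime) {
        for (Event e : events) {
            boolean sameSeries = e.profile.equals(profile)
                    && e.source.equals(source)
                    && e.type.equals(type);
            if (sameSeries && e.end == markTime - STEP_SECONDS) {
                e.end = markTime; // the "update end time" branch
                return;
            }
        }
        events.add(new Event(profile, source, type, markTime, markTime)); // the "insert" branch
    }

    public static void main(String[] args) {
        EventAggregator agg = new EventAggregator();
        agg.record("live", "DataSource1", "tcp", 1000L);
        agg.record("live", "DataSource1", "tcp", 1300L);
        agg.record("live", "DataSource1", "tcp", 2500L);
        System.out.println(agg.events.size()); // prints 2
    }
}
```

The first two marks are one step apart and collapse into a single event, while the third starts a new one because of the gap.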
4.5 Sentinel Database
The creation of the database was exactly as laid out in the design. Here is a more detailed UML diagram of the datatypes and interactions between tables.
Figure 4.2: Sentinel Database UML Diagram
4.6 Sentinel Web Interface
The web interface was also implemented as illustrated in the Design chapter, written in PHP and divided into three separate sections. Here is a brief overview of each page and how it was implemented.
4.6.1 Live Update
The Live Update page operates initially using a default SQL query. It makes a connection to the Sentinel database and retrieves all events whose end timestamp was within the last 24 hours. As part of the default view, it filters the results so as not to show events which have been acknowledged or marked as a false positive. In addition there are three smaller queries which get a current list of all profiles, sources and types being used within the system. Along with the option to show acknowledged and false positive events, this information is used to create the filter functionality. Operators can choose the information they would like to see by ticking checkboxes. When the submit button is clicked, the values that have been selected are submitted back to the same page; the page detects that selections have been made, and the choices are retrieved from the POST array and assembled into appropriate SQL queries. Here it was important to ensure that the SQL logic was correct, using brackets to separate the parts of each query. The assembled queries are performed and the results displayed in the same style as those of the default query. Alongside this, the page auto-refreshes every five minutes to ensure the displayed results are as up to date as possible.
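The need for brackets comes from operator precedence: in SQL, AND binds more tightly than OR, so a condition such as a OR b AND c is read as a OR (b AND c). A minimal sketch of assembling a bracketed filter is given below. It is written in Java for consistency with the other examples, whereas the actual page is PHP, and the table name events and the column names end_time, source_id, profile_id, type_id, acknowledged and false_positive are assumptions, not the real Sentinel schema. Values are left as ? placeholders to be bound separately.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterQuery {

    /** Builds "(column = ? OR column = ? ...)" with one placeholder per selected checkbox. */
    static String orGroup(String column, int count) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(column).append(" = ?");
        }
        return sb.append(")").toString();
    }

    /** Joins the bracketed groups with AND so each OR list stays self-contained. */
    static String build(int sources, int profiles, int types, boolean showAck, boolean showFalsePos) {
        List<String> parts = new ArrayList<>();
        parts.add("end_time >= ?"); // the 24 hour window
        if (sources > 0) {
            parts.add(orGroup("source_id", sources));
        }
        if (profiles > 0) {
            parts.add(orGroup("profile_id", profiles));
        }
        if (types > 0) {
            parts.add(orGroup("type_id", types));
        }
        if (!showAck) {
            parts.add("acknowledged = 0");
        }
        if (!showFalsePos) {
            parts.add("false_positive = 0");
        }
        return "SELECT * FROM events WHERE " + String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        System.out.println(build(2, 0, 1, false, false));
    }
}
```

With this arrangement, ticking only the acknowledged option drops the acknowledged = 0 condition but keeps false_positive = 0, so an event flagged as both stays hidden until both options are ticked.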
4.6.2 Details
The Details page is reached by a user clicking on an event for more information; this passes the event id via GET to the Details page. For security reasons it is important when using the GET method in this situation to validate the value that has been passed. In Sentinel's case this checks that the value passed is numeric, which removes the ability for malicious users to perform SQL commands upon the Sentinel database. The graph is drawn by sending the appropriate values as part of the GET request to rrdgraph.php, a part of NfSen-HW. The appropriate values are:
profile name
':' separated list of sources
proto type: any, TCP, UDP, ICMP, other
flows, packets, traffic
profile start time - UNIX format
start time - UNIX format
end time - UNIX format
left time of marker - UNIX format; 0 is no marker
right time of marker - UNIX format; 0 is no marker
width of graph
height of graph
light version (small graphs) - no title or footer
linear or log y-Axis
linear or log y-Axis
Using this it is possible to draw a graph for any period, with specific markers depending on the options chosen. The Details page draws graphs starting a number of hours before the actual start time of the aberrant network event. An amount of time is also added to the end time, to give a better view of what happened at that time to remove the aberrant marker. This is done by either adding one hour or showing all traffic up until the current last update time, whichever is smaller. Having done this, the time period covered by the event itself is marked and appears on the graph highlighted in green. The next stage displays some statistical analysis of the flows during that time period, either by retrieving the already performed nfdump query from the database or, if the event is still ongoing, by directly querying nfdump via PHP exec() and displaying the results back to the webpage. This was not as straightforward as passing in the start and end times, due to the way Holt-Winters forecasting detects aberrant results. An aberrant mark is set when the next value in a time series deviates too far from what was expected. Because of this, an assessment of aberrance cannot occur at the exact time the network event begins to happen; it is only realised a set amount of time afterwards and marked from then onwards. I found that if analysis was performed based upon the exact start and end times, then the results would quite often not cover the period of aberrance. To get around this I conducted some experimentation into the average amount of time that passes between the aberrant event starting and the aberrant mark being set. Figure 4.3 illustrates the difference between the start of the aberrant event and the initial marker being placed.
I found that in most cases, unless the aberrant event was exceptionally out of the ordinary, if 40 minutes was subtracted from the aberrant event marker's start time then the start of the actual network activity was included in the statistics. Figure 4.4 gives an example of this, and further examples are available in the appendix, section E.
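The two time adjustments described above amount to a pair of small calculations on UNIX timestamps, sketched below. The class and method names are hypothetical; the 40 minute lead-in and the one hour extension capped at the last update time are the values given in the text.

```java
final class AnalysisWindow {
    static final long LEAD_IN_SECONDS = 40 * 60;   // 40 minutes before the first marker
    static final long TRAILING_SECONDS = 60 * 60;  // one hour after the last marker

    /** Start of the period handed to the statistics query. */
    static long statsStart(long markerStart) {
        return markerStart - LEAD_IN_SECONDS;
    }

    /** End of the graphed period: one hour later, or the last update time if that is sooner. */
    static long graphEnd(long markerEnd, long lastUpdate) {
        return Math.min(markerEnd + TRAILING_SECONDS, lastUpdate);
    }

    public static void main(String[] args) {
        System.out.println(statsStart(10000L));        // prints 7600
        System.out.println(graphEnd(10000L, 11000L));  // prints 11000
    }
}
```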
Figure 4.3: Aberrant Marking Example
Figure 4.4: Subtracting 40 Minutes Example
Depending on the aberrant network event, different filters are applied to the nfdump query to produce the most appropriate results, for example, only showing TCP traffic. A second version of the query is also performed to retrieve only the top four flow statistics of that kind, and the result is requested in machine readable format. This produces a similar result, but there are no human readable labels and each entry is separated by the pipe symbol. From this I pulled out the source and destination IP addresses and port numbers, and these are looked up using two PHP functions, getservbyport() and gethostbyaddr(). The results are then displayed on the page in a similar format to the nfdump output. The last two sections of the Details page deal with editing information about a particular stored event and showing potential links between it and other stored events. The details which can be edited are those which are specific to that event, namely the comment and the acknowledged/false positive flags. Source and profile descriptions are not specific to an event, so they are edited elsewhere. The method of implementation is a simple form which sends the details filled in via the POST array to another page, where they are inserted into the database. Associated events are displayed in a similar style to Live Update, but only if they match the strict similarity criteria: starting and ending at the same time as the currently viewed event. Details of these events can be viewed in the same way as from the Live Update page.
4.6.3 Review
The Review page is quite straightforward in comparison. Past data can be queried using a form allowing searching by exact start and end time, start time between two dates and end time between two dates. Other filters can be applied, such as a specific source, profile or type. Acknowledged and false positive events can be excluded or included, and the results are displayed, again, similarly to the Live Update page, complete with a link to view further details.
5 System Operation
To illustrate the system's operation, this section contains a walk through of the workflow as experienced by a network operator and concludes with a comparison of this and the previous process defined in the Design chapter.
5.1 Usage Scenario
A Network Operator wishes to know if there are any current problems with external connectivity (i.e. connections to the wider Internet) on the network. There is already a profile set up in NfSen-HW, known as External, to monitor traffic passing outward to and inward from the Internet. The process of investigation follows these steps:
1. Examining Live Update for indications of Aberrant Behaviour.
2. Filtering the results to only show relevant information.
3. Viewing further details of a specific event.
4. Analysing the results and editing the event details to reflect the results.
5.1.1 Examining Live Update for Aberrant Behaviour
Figure 5.1: Investigation Process Step 1
This page indicates all currently or recently active aberrant events as detected by the system. As can be seen, it contains data from both the live profile and the External profile. The view can be tailored to see only the External sources.
5.1.2 Filtering the results
Only aberrant behaviour involving data being received from or passed to external hosts is now shown in the summary. From this, more information can be requested.
5.1.3 Viewing further Details
The Details page contains more information about the event, including a graph of an appropriate time period with the actual time period of the event marked in green, the top flows ordered by the traffic type (in the case of the screenshot this is any traffic type), and a lookup of the most relevant hostnames and port numbers. From this display an operator could easily identify that the activity here is nothing of concern and then, using the edit section, a comment could be added to this effect.
5.1.4 Analysis and editing event details
Adding conclusions of the findings is very simple, and this information is then stored in the database for other operators to see.
Figure 5.4: Investigation Process Step 4 - Editing
Figure 5.5: Investigation Process Step 4 - Inserting
5.1.5 Summary
A comparison between this network operator's workflow and the original example given in the Design section shows a number of improvements. Firstly, there is one single location for finding aberrant behaviour instances; the operator does not need to view individual graphs of profiles and sources, as the database picks up all relevant information. This information can be displayed in the manner the operator chooses, so initial assessments of problem scope can be made. Leading on from this, the biggest improvement over the previous process is the ability to see a large amount of relevant information in one place. The Details section provides basic information about the duration and location of the problem as well as suggesting possible causes via NetFlow statistics, and finally indicates likely explanations of the issue by showing the hostnames and services in use. This is information that previously the operator would have had to find out by hand. The Details section also provides an assessment of possibly associated events, which can be viewed in more detail. This reduces the amount of time the operator might have required to find out other areas and services affected by the event. Figure 5.6 shows a final sequence diagram depicting the system in operation during the preceding usage scenario.
Figure 5.6: Sequence Diagram of System Operation
6 Testing and evaluation

6.1 Testing
In order to thoroughly test the system I used a number of different testing strategies, each of which will be covered in detail during this chapter.
6.1.1 Defect and Component Testing
According to Ian Sommerville, the goal of defect testing is to expose defects in a software system before the system is delivered [2004 p442]. He provides a graphical example of a general model of the defect testing process [2004 p443].
Figure 6.1: General Defect Testing Model
His suggestion for testing system usage and operational features is to meet the following criteria [2004 p443].
1. All system functions that are accessed through menus should be tested.
2. Combinations of functions that are accessed through the same menu should be tested.
3. Where user input is provided, all functions must be tested with both correct and incorrect input.
Test cases have been identified in order to meet these criteria. For this stage of the testing cycle I have separated out the components to test their individual correctness. Once this testing has been completed there will be further testing to ensure that the integrated system works as it should. The conclusions reached by carrying out this testing will be discussed afterwards.
Firstly, some tests to ensure that Sentinel.jar is parsing the XML file and inserting the data into the database correctly. For these tests the Java application will be treated as a separate component reading from the XML output of a small RRD file. The results will be displayed at the command line rather than inserted into the database, apart from the test cases which involve testing database connectivity and correctness. Each test case will be defined by a number, a description of the test, the expected outcome and the result. The sections are broken into separate tables for ease of viewing.

Table 6.1: Sentinel.jar Testing - XML Parsing
No. | Test Description | Expected Outcome | Result
100 | Parse XML file for the last update time. | The correct last update time is printed to screen. | PASS
101 | Parse XML file for Aberrant Marks. | Seven Aberrant Marks are found and printed to screen. | PASS
102 | Correctly specify the times that the Aberrant Marks occurred. | The times the Aberrant Marks occurred are correct. | PASS
103 | Parse the XML file for the traffic types in use. | The correct list of traffic types is printed to screen. | PASS
104 | Correctly identify the traffic type in use for each Aberrant Mark. | The traffic type of each Aberrant Mark is correct. | PASS

Table 6.2: Sentinel.jar Testing - Source and Profile Detection
No. | Test Description | Expected Outcome | Result
105 | The path to the RRD file specified as a command line argument can be read in by the system. | The correct path as specified at the command line is printed to screen. | PASS
106 | The path can be broken down into the correct profile and source. | The correct path and source is found and printed to screen. | PASS
From the tests specified in Tables 6.1, 6.2 and 6.3 it can be seen that the Java portion of the Sentinel system is working correctly, both in its XML parsing and in its database connectivity. It is able to retrieve the source and profile names from the path it is supplied. It can also retrieve all the necessary information from the RRD files in XML form, including the correct date and time per Aberrant Mark, which was a concern at the Implementation stage. It is capable of querying the database for results already held, and based upon that knowledge can insert or update currently held event information.
Table 6.3: Sentinel.jar Testing - Database Connectivity
No. | Test Description | Expected Outcome | Result
107 | Check for the presence of the traffic types in the database. | All of the data types are present in the database. | PASS
108 | Retrieve the ID numbers of each traffic type from the database. | The correct ID numbers and traffic types are printed to screen. | PASS
109 | Check for the presence of the found data sources in the database. | Of the two detected data sources, one is present in the database and one is not. | PASS
110 | Check for the presence of the found data profiles in the database. | Of the two detected data profiles, one is present in the database and one is not. | PASS
111 | Insert the data source not already present into the database. | The data source not present should be inserted into the database. | PASS
112 | Insert the data profile not already present into the database. | The data profile not present should be inserted into the database. | PASS
113 | Retrieve the ID numbers for each of the data sources from the database. | The correct ID number and source should be printed to screen. | PASS
114 | Retrieve the ID numbers for each of the data profiles from the database. | The correct ID number and profile should be printed to screen. | PASS
115 | Check for the existence of an Aberrant Event in the database with the same start time, profile, type and source as each Aberrant Mark found, but with an end time five minutes earlier. | Two of the detected Aberrant Marks have equivalent entries in the database, the rest do not. | PASS
116 | Insert the full details of each detected Aberrant Event which does not have an equivalent Aberrant Event already present in the database. | The five Aberrant Marks without equivalent entries should be inserted into the database. | PASS
117 | Update the end time of each of the equivalent Aberrant Events in the database to be the same as the found Aberrant Mark it matches. | The two Aberrant Events in the database should have their end time updated to be the same as the two Aberrant Marks. | PASS
Next, runSentinel.sh must be tested to ensure its correct operation. This will be carried out using a mock directory structure containing two profiles and two data sources. The script will be modified, first of all to not delete the XML output it creates, and secondly to print the command it would use to execute Sentinel.jar to the screen instead of running it. This will ensure that it is behaving correctly; further testing will ensure that the two components work together in the proper way.

Table 6.4: runSentinel.sh Testing
No. | Test Description | Expected Outcome | Result
200 | XML files are created for every valid RRD file within the directory structure. | After the script has been run appropriate XML files should exist. | PASS
201 | For every XML file created it should create a valid set of arguments to run Sentinel.jar correctly. | The correct command to run Sentinel.jar should be printed to screen as the script meets each XML file to be parsed. | PASS

As these tests show, runSentinel.sh passes in every case that it is possible to test while each component is being dealt with independently.
6.1.2 Functional and Integration Testing
Functional testing, sometimes known as Black Box testing, is, according to Ian Sommerville, an approach to testing where the tests are derived from the program or component specification; the system is a Black Box and its behaviour can only be determined by studying its inputs and the related outputs [2004 p443].
Figure 6.2: Functional Testing Model
In the case of Sentinel, this is also a form of integration testing, as all the individual components have to work together in order to meet the specification goals. Sommerville provides a graphical example of functional testing which illustrates how to view the system when conducting the tests; this is shown in Figure 6.2. Testing in this area is mostly centred around the use of the User Interface, in Sentinel's case the web front end. Inputs will be chosen as test cases and outputs recorded. This section uses a database which contains a certain amount of test data, including two data sources, two profiles and 25 events of multiple traffic types. Some of the events are older than 24 hours, which in the test case is the amount of data to be shown in the Live Update. One of the events is marked as a False Positive, one as Acknowledged, and one as both; all three have a comment stored. Three of the events share the same start and end times. During the test I simulated some aberrant behaviour by pinging a machine on the network at a very high rate for approximately 15 minutes. The tables of test cases are shown over the next three pages. From the results of the test cases it can be seen that the Sentinel system has integrated successfully and works as it was intended.
Table 6.5: Sentinel UI Functional Testing - Live Update
No. | Test Description | Expected Outcome | Result
301 | Opening a web browser and loading the Sentinel Live Update page. | Page should display only a set period of Aberrant Events and a complete list of types, sources and profiles excluding those flagged as Acknowledged or False Positive. | PASS
302 | Filter Aberrant Events for only one data source. Should be tried with every data source listed. | In every case, only Aberrant Events involving that data source should be shown. | PASS
303 | Filter Aberrant Events for only one traffic type. Should be tried with every traffic type listed. | In every case, only Aberrant Events involving that traffic type should be shown. | PASS
304 | Filter to display events which have been flagged as Acknowledged. | The event flagged as Acknowledged should be shown alongside the normal results but not the event marked as Acknowledged and False Positive. | PASS
305 | Filter to display events which have been flagged as False Positive. | The event flagged as False Positive should be shown alongside the normal results but not the event marked as Acknowledged and False Positive. | PASS
306 | Filter to display events which have been flagged as False Positive or Acknowledged. | The events flagged as False Positive or Acknowledged should be shown alongside normal results, as well as the events flagged as both Acknowledged and False Positive. | PASS
307 | Display every Aberrant Event within the set timeframe by selecting every kind of filter possible at once. | The default list of Aberrant Events should be displayed, plus the Events which had been flagged as Acknowledged or False Positive. | PASS
308 | Leave the Live Update page open for approximately an hour. (During this time, aberrant network traffic will be created.) | Every five minutes the table should refresh. At some point during the hour, the new Aberrant Behaviour should be detected and displayed. | PASS
309 | Click on the Details link of a particular Aberrant Event. | The Details page should be displayed with information pertaining to that event. | PASS
Table 6.6: Sentinel UI Functional Testing - Details
No. | Test Description | Expected Outcome | Result
310 | Leading on from test 309, load the Details page by clicking on the Details link from an Aberrant Event. | The Details page should be loaded with information about that event. A graph should be shown which covers the related time period, and the exact times of the event should be highlighted in green. Statistical details based on the flows should be shown, along with a lookup of the top IP addresses and port numbers. | PASS
311 | Check for associated Aberrant Events. | These should be displayed at the bottom of the page. | PASS
312 | Click on the Details link from one of the associated Aberrant Events. | A similar Details page should be loaded with details relevant to the new event. | PASS
313 | Edit the details of an event by changing the acknowledgement status and false positive status to yes and adding/altering a comment. | The UI should redirect to a different page indicating the success of the alteration. The new details should be inserted into the database. | PASS
314 | Return to the Live Update page and filter the results to show events which have been marked as false positive and acknowledged. | The newly edited event should not have been displayed initially but should appear when the filter is applied. | PASS
Table 6.7: Sentinel UI Functional Testing - Review Sentinel UI - Review No. Test Description Expected Outcome Result 315 Load the Review page of the web No events should be shown, interface. but a lter interface for searching. PASS 316 Search for an event with a specic Using a specied start and start time and end time. No other end time known to be in the lters. database, four events should be found; one from each prole and source. PASS 317 Search for the same start and end The two results as shown times as test 316 but lter to previously connected to that only show sourceA. source should be displayed. PASS 317 Search for the same start and end The two results as shown times as test 316 but lter to previously connected to that only show proleA. prole should be displayed. PASS 318 Search for events starting between All the events which start in two dates. that period should be displayed and no others. PASS 319 Search for events ending between All the events which end in two dates. that period should be displayed and no others. PASS 320 Search for events ending between the In each case, only the events same dates as test 319, but only those which concern that network of each listed type individually. trac type should be shown. If there are none then there should be nothing displayed. PASS 321 Search for events starting between the In each case, only the events same dates as test 318, but only those which concern that network of each listed type individually. trac type should be shown. If there are none then there should be nothing displayed. PASS 321 Search for events starting and ending In each case, only the events at the same times as test 316 but which concern that network only those of each listed type trac type should be shown. individually. If there are none then there should be nothing displayed. PASS
6.2 User Interface Evaluation
The user interfaces within Sentinel are intended to be simple but functional, and their use should be fairly straightforward. One of the key things kept in mind when designing these interfaces was the situation in which they would be used. When diagnosing a problem, a network operator does not want the information to be spread across multiple pages; it should be presented in a clear, coherent way in as little time as possible. For this evaluation I shall assess each section of the user interface in turn.

The idea behind the Live Update page was twofold: it should be functional such that an operator could use it on their personal machine, but it should also be usable in an office environment as a network monitoring tool. The page, if displayed on a larger screen, would give anyone concerned an instant overview of any strange behaviour on the network, which they could then investigate more thoroughly using the same interface at their personal machine. The colour scheme is very basic; colour is not very important as long as the information is clear. The filters are very clear and simple; it should be fairly obvious how they are intended to be used, but the aim was to provide a quick way of narrowing down on a particular problem. On a network the size of GÉANT2 there could be a large amount of aberrant behaviour occurring at any one time, and it is important that the operators should be able to see exactly what they require.

The Details page was designed with similar aspirations. The page has no use as an office-wide monitoring solution, so there is no requirement for the information to be so stripped down. This page is to provide more information to the network operator so that they can perform some analysis and hopefully make an initial suggestion as to the cause of the anomaly. There are three pieces of information which are very important; the first is the graph of the time period.
This gives an instant view of what was happening on the network as this aberrant event was triggered. As mentioned previously, it displays information about a longer period of time than the actual event lasted. This is to give a better overview: operators can see what was happening in the build-up to an event and, in the case of events which have ended, what happened at the end to cause the event to finish. The second important section is the statistical analysis. This shows in detail what the graph gives an indication of, broken down into what was causing the most traffic (be that flows, packets or bytes) and what protocols it was using. The IP addresses and port numbers are then looked up to provide an extra level of information. The final important section shows potentially associated events. The aim of this information is to give the operator a better indication of the scope of the problem, and to provide easy links to investigate other problematic areas. It is displayed in the same style as the Live Update page; full details are not required unless the operator is specifically interested in them, in which case the details link can be clicked.

The Details page also provides the ability to edit the details of an event via a form. The details of the event are automatically filled into the form so if an
operator chooses to change something, they know what the current values are before they start. It is fairly basic, but again was designed with the aim of being quick and simple. An operator does not want to be delayed when performing his work by a complicated interface design.

The Review page is simply an interface to the historical information, a way of building queries for the database. The most important factor was to provide a flexible filter system: operators can search for events either by knowing the exact start and end times, or just by requesting any events which started or finished between two dates. This can then be filtered further in a similar style to the Live Update page, by selecting different pieces of information which should be present in the results. The style is common across all sections of the interface so that once the technique of filtering is understood there is no further knowledge required. Results are again presented in the same style as on the Live Update page, with further details available on request.

Overall the user interface serves its purpose: it is clean, clear and simple, which are the most important factors for how it is intended to be used. The design could have been more polished, and the filters organised in a more flexible way, but due to the nature of the data it can't be known before the system is run how many sources and profiles there are. They should be organised into blocks of five per line in case more than five are listed, which keeps them together in a sensible way that shouldn't overfill the webpage. Other than that, the design is a successful interpretation of the requirements and needs of a large network support office.
6.3 Evaluation
This is broken down into two sections: a comparison of the original derived requirements list with the finished system, and then an overview containing some feedback from the network operator at DANTE who has been my liaison for the project.
6.3.1 Requirements List Review
In order to evaluate the success of the system I am going to go back over each of the derived requirements from the Design chapter and assess how well each requirement has been met.

A.1 Aberrant network behaviour instances should be displayed together on one page organised by the time they occurred.

This is fully realised in the Live Update section. Aberrant network events are displayed in a tabular format organised by the end time, so the events more recently active are
displayed nearest the top of the list. The reasoning behind this was so the Live Update page could be used as an office-wide network monitoring screen, where every event could be seen easily. I feel this has been achieved successfully.

A.2 Only the most relevant information for each aberrant behaviour instance should be displayed.

The interface design was created so that the Live Update page would be merely a list of events, with very basic information. I decided that the most important information was the start and end times, since this should be what the events are ordered by. Next come the traffic type, profile and source, as these identify where on the network the behaviour is occurring. Then the two flags are listed: whether the events are acknowledged or marked as false positive. This is not necessarily vital information about the event, but it aids understanding of the interface, as a filter is applied removing events marked with those flags at the start. The last entry per event is a simple link to the Details page where more information can be found. I believe this requirement has been met successfully.

A.3 Aberrant network behaviour instances should be aggregated to display one event per continuously flagged period.

This functionality is provided via the database and the Java component. As an event is added to the database, the database is checked to see if there is an existing event with identical details, other than the end time. If so, only the end time is updated rather than a new entry made. This was of crucial importance to the design as it made the potentially large amounts of held data much more manageable and created the possibility for associated events to be identified very easily.

A.4 This display should automatically update as new aberrant behaviour is detected on the network.

This has also been achieved: the Live Update page automatically refreshes and retrieves any new aberrant network event data when it does so.
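The aggregation described under A.3 can be illustrated with a short sketch. This is not the Java component itself: the table layout, the column names, the `record_flag` helper and the use of SQLite in place of MySQL are all assumptions made so the example is self-contained, and the continuity test (an event counts as still open if it ended exactly one update interval earlier) is only one plausible reading of "continuously flagged".

```python
import sqlite3

# Hypothetical schema: the report does not list Sentinel's real table layout.
SCHEMA = """
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    profile TEXT NOT NULL,
    traffic_type TEXT NOT NULL,
    start_time INTEGER NOT NULL,
    end_time INTEGER NOT NULL
)
"""

STEP = 300  # assumed seconds between detection updates


def record_flag(conn, source, profile, traffic_type, timestamp):
    """Extend the open event for this source/profile/type, or start a new one."""
    row = conn.execute(
        "SELECT id FROM event WHERE source = ? AND profile = ? "
        "AND traffic_type = ? AND end_time = ?",
        (source, profile, traffic_type, timestamp - STEP),
    ).fetchone()
    if row is not None:
        # Same details and flagged in the previous interval: only move the end time.
        conn.execute("UPDATE event SET end_time = ? WHERE id = ?", (timestamp, row[0]))
        return row[0]
    # No open event: create a new one that starts and ends at this timestamp.
    cur = conn.execute(
        "INSERT INTO event (source, profile, traffic_type, start_time, end_time) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, profile, traffic_type, timestamp, timestamp),
    )
    return cur.lastrowid
```

With this approach two consecutive flags for the same source, profile and traffic type share one row, while a flag arriving after a gap opens a new row.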
A.5 The display should be accessible from machines other than the machine it is installed on.

This requirement is met as the interface is entirely web based and the communication with the database occurs over the network, so provided the server can be accessed, the web interface can be too.
A.6 Each aberrant network behaviour event should be displayed in an identical style so quick comparisons of information can be made.

The same information is retrieved for each aberrant network event, and this is displayed as a list in a table. The table is organised by time, so it should be easy to scan the list for the events you are looking for. This style is carried throughout the system and is used on other pages, so when an operator gets used to the layout and information presentation it will aid his work.

B.1 The information displayed as part of the live update can be filtered to show only instances which match particular conditions.

All of the results displayed on the Live Update page can be filtered to show only specific information. This is flexible, so any number of filters for things to be included can be added. The filtered display is only temporary, however; functionality to persist filters across aberrant event display updates might have been useful. The implemented system does achieve what was stated but could perhaps have been more usable with a slightly different implementation.

B.2 The default update should contain information the network operator believes to be the most relevant in the first instance.

This is connected to the previous requirement. Sentinel is implemented so that events of any traffic type, source and profile are displayed as a normal Live Update, but the display is filtered by default to not show any events which have been marked as acknowledged or as a false positive. This was implemented after discussions with both the network operator at DANTE and a network specialist based at Lancaster University, and the reasoning behind it is that if an event has been dealt with then it no longer needs to be shown as a current event. If there is another event which is connected to an acknowledged one, then it should be shown as associated from within the Details section.
This is simply for speed of viewing on the main update page and has been implemented successfully.

C.1 Further information should be available for each aberrant network behaviour instance on request.

This requirement is met via the Details section: every event displayed as part of the Live Update also supplies a link to view further details if required.
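The default behaviour described under B.1 and B.2 can be sketched as follows. The `Event` record, its field names and the `live_update_view` function are hypothetical and only illustrate the rule that acknowledged and false positive events are hidden unless the operator asks for them; the real interface is implemented in PHP against the database.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical record; field names are illustrative, not Sentinel's schema.
    traffic_type: str
    source: str
    profile: str
    acknowledged: bool = False
    false_positive: bool = False


def live_update_view(events, show_acknowledged=False, show_false_positive=False,
                     traffic_types=None, sources=None, profiles=None):
    """Return the events to list; a filter left as None means 'no restriction'."""
    visible = []
    for e in events:
        # Default view: hide anything already dealt with or judged inaccurate.
        if e.acknowledged and not show_acknowledged:
            continue
        if e.false_positive and not show_false_positive:
            continue
        if traffic_types is not None and e.traffic_type not in traffic_types:
            continue
        if sources is not None and e.source not in sources:
            continue
        if profiles is not None and e.profile not in profiles:
            continue
        visible.append(e)
    return visible
```

Calling `live_update_view(events)` with no other arguments gives the default display, and passing `show_acknowledged=True` or `show_false_positive=True` brings the hidden events back.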
C.2 This information should include, at the least, a graph of the time frame in question and a brief statistical synopsis for the given period and traffic type.

The Details page shows all the required information and also gives further details by performing a hostname lookup on the top four IP addresses, and a service lookup on their respective source and destination ports.

C.3 This information should be persistent beyond deletion of the actual NetFlow records for that aberrant network behaviour event.

The statistical analysis and service/hostname lookup information is held as a text record as part of each event in the database. The graph, however, is only held for as long as the RRD files are scheduled to last, and over time will lose its accuracy. The statistical analysis is nevertheless more information than would normally have been available in such a situation, and so I feel this requirement has been met.

C.4 It should be made obvious if a particular aberrant network behaviour event has been flagged as a false positive when examining further details.

On the Details page, if the specified event has been marked as a false positive, this is displayed in large writing above the graph. This is so a network operator does not waste time re-analysing something which has already been assessed and found to be an error.

D.1 If further information about an aberrant network behaviour event is requested then a display should also be provided of possible associated events.

The Details page provides this as a table in a similar style to the Live Update display. The results present in this table do not follow the same restrictions as the Live Update page, and this list includes events which have been marked as acknowledged or false positive. This is because the events are related regardless of whether they have been dealt with or determined to be inaccurate.

D.2 Further information pertaining to these associated aberrant network behaviour events should be available on request.
In the same way as the Live Update page, the associated events which have been identified provide a link back to the Details page for more information regarding themselves.

E.1 Detected aberrant network behaviour events should be recorded in some form of persistent database.

This is one of the most basic requirements and it has been achieved; without the existence of a database the system would not function.

E.2 The database should be reliable, quick to query, and scale well to holding potentially very large data sets.

The database server used is MySQL, which is commonly used in industry for much more time critical applications than the regular five minute updates that Sentinel works from. The database has been implemented so there is no data repetition, and most searching and queries are done based upon ID numbers, which are the easiest thing to index and generally the quickest information to query upon. Because the database has been designed to hold a continuous run of aberrant behaviour marks as one entry rather than as a series, the data sets involved will be considerably smaller, and searches based on date/time are easier to perform.

F.1 It should be possible to view past aberrant network behaviour event details based upon a number of criteria:

This requirement is met via the Review page; the next few requirements specify more detail.

F.2 Exact Start time and End time.

This is possible using the query form on the Review page. Only events which start and end at the exact times specified will be displayed.

F.3 Start time somewhere between two given dates and times.

This is also possible using a different section of the query form on the Review page.
F.4 End time somewhere between two given dates and times.

Lastly, this is also possible, queried in a similar way to F.3.
F.5 Alongside queries based upon the starting and end times, results should be chosen according to further specific information; type/source/profile etc.

This functionality is also provided alongside the date based filters: other options can be selected to show only events which match those details. Ideally it might have been useful to be able to specify queries like "everything except profileX" more easily than by ticking every profile apart from profileX, but there is still the capability for doing queries of that kind, so the requirement is adequately met.

F.6 When results have been found it should be possible to view further information about an event in the same way it would be possible for a live event.

The results displayed on the Review page are of a similar style to the Live Update. Each one has a basic amount of information, but there is a link to access much more detailed results via the Details page.

G.1 Aberrant network behaviour events stored in the system should be able to be flagged as acknowledged when they have been dealt with.

This feature is built into the database and is possible via the Details interface.

G.2 Aberrant network behaviour events stored in the system should be able to be flagged as a false positive if they have been identified as such.

This is also provided in the database design and again is accessed via the Details interface.

G.3 Operators who have dealt with a particular aberrant network behaviour event should be able to leave some comment regarding their findings for the benefit of later users.

This is provided with a comment field in the database, per event. The Details page offers an interface to leave a comment, or alter a comment that someone else has made. This information is not displayed as part of the Live Update screen, but is shown when more details are requested about a specific event.
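The Review searches covered by requirements F.2 to F.5 amount to building a database query from whichever criteria the operator fills in. The sketch below shows one way this could be done with a parameterised query. The `event` table, its column names and the `build_review_query` function are assumptions for illustration; the report does not reproduce the PHP query code.

```python
def build_review_query(exact_start=None, exact_end=None,
                       start_between=None, end_between=None,
                       traffic_types=(), sources=(), profiles=()):
    """Return (sql, params) for a parameterised search of stored events."""
    clauses, params = [], []
    if exact_start is not None:
        clauses.append("start_time = ?")
        params.append(exact_start)
    if exact_end is not None:
        clauses.append("end_time = ?")
        params.append(exact_end)
    if start_between is not None:
        clauses.append("start_time BETWEEN ? AND ?")
        params.extend(start_between)
    if end_between is not None:
        clauses.append("end_time BETWEEN ? AND ?")
        params.extend(end_between)
    # Column names come from this fixed tuple, never from user input.
    for column, values in (("traffic_type", traffic_types),
                           ("source", sources),
                           ("profile", profiles)):
        if values:
            marks = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({marks})")
            params.extend(values)
    sql = "SELECT * FROM event"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY end_time DESC", params
```

Keeping the user-supplied values in `params` rather than in the SQL text means the same function serves the exact-time search, both date-range searches and the type/source/profile filters.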
6.3.2 Summary and Feedback from DANTE
The overall aim of this project was: To assist a network operator in the identification and diagnosis of network problems and illustrate how the inclusion of automated aberrant behaviour detection could improve large network monitoring. From this, certain lower level requirements were derived, all of which I have shown to have been successfully met. However, I want to look at the statement again and assess how far this project has gone towards achieving those aims.

The Sentinel system is functional and does what it set out to do: it assists a network operator by aiding his workflow, providing all the right information in one place. That, however, is a feature of many network monitoring systems. Where Sentinel is different is the automated aberrance detection, and it is that which makes it an interesting prospect.

When I had finished the implementation I contacted Maurizio Molina, the network operator at DANTE with whom I have been liaising throughout the project. He very kindly offered to review my project and provide me with feedback about how well it met his requirements. Overall he was very pleased: it is a functional, self-contained project which achieves what it set out to do. The initial requirements were created based upon conversations with him about his workflow and the way he utilised the NfSen data as part of his work. His initial hopes had been that I would take NfSen-HW and use it to examine whether it was useful in detecting aberrance, rather than providing a fully functional solution for viewing the aberrant events it detected, so in some ways my project was beyond his expectations. His only criticism was that he would have appreciated more research into NfSen-HW and how successful its aberrance detection was based upon the GÉANT2 data. Nevertheless, what I have implemented indicates that NfSen-HW does indeed do as it claims and that Holt-Winters Forecasting is generally successful, within its limits.
More research would have been carried out based upon the data from GÉANT2 if it had not been for the data source issues present in NfSen-HW.

Most importantly, I believe that the project does indicate the usefulness of aberrant behaviour detection as part of a wider network monitoring strategy for large network providers. It is another source of information when diagnosing problems, and one which hasn't really been taken advantage of. The prototype I have created could be extremely informative, perhaps with a little more development regarding anomaly detection methods; this is something I will discuss in my conclusion.
7 Conclusion
7.1 Overview
Looking back over the project as a whole, I am extremely happy with the outcome. Firstly, I have gained quite an in-depth understanding of network monitoring techniques, aberrance detection methods, a selection of Linux tools and configuration options, and the experience of dealing with a real world network operations centre in DANTE. This is all knowledge I have developed whilst working on the Sentinel system, and it has been a very worthwhile learning experience. Secondly, the system that has been developed works as it was intended, and it provides an interesting angle on network monitoring which, according to my research, has not been widely utilised. The feedback I received from Maurizio helped prove to me that the project has been a success and that this is an area where there is still much work to be done. This is something I'll come back to as part of the Further Work section.

There are some areas where I think further development could have been undertaken. The database was more or less exactly what was required based on the needs of the data, so if I were to redevelop from scratch, I think I would keep the same database design. One area I would look into changing is the method of picking up data from the RRD files. Whilst Java provides a functional, working application, it probably isn't the language most suited to the task. Unfortunately I spent so much time researching NfSen, NfDump, NfSen-HW and RRDtool that when I discovered the Java RRD libraries were not compatible, there was not enough time to change the development plan. Hence the XML parsing solution was implemented, and successfully, but with hindsight I would have looked into developing that part of the project using a lower level language, perhaps Perl, where there is a substantial chance of an RRD interface library that works. The second thing which would be reconsidered is the development of the web interface as an independent entity.
It might have been possible, having used a Perl RRD interface, to tie the web interface into NfSen-HW as a plugin. I investigated the possibility of implementing the web interface as a stand-alone front end plugin, that is, one without a back end plugin to go alongside it, but unfortunately it seems that the NfSen-HW snapshot does not recognise front end plugins without a corresponding Perl module back end. I don't think that the Sentinel web interface lost anything by being an independent system, however; it just might have been a nice touch to tie everything together.

I also believe that the web interface could have been made more sophisticated. I am not a web developer by choice and the interface, whilst meeting all of the desired criteria, was very plain and simple. Perhaps using some other development language it might have been easier to implement, but PHP met all the needs and satisfied all of the requirements. This is again something which I would examine if I were to either continue the project or start again. Something that would need to be added in order to move the system from being a prototype to becoming a live service would be user authentication and permissions. This would require changes to the database and front end, but
shouldn't be too difficult. It's not something which causes problems for the prototype, and security has been considered in its development, but it would be a required feature in any real world network operations centre.

One of the biggest problems was in fact the software I was interfacing with, NfSen-HW. It is a project very much in beta development, and it is based upon an old version of an application which at the time of the snapshot was still having major bugs ironed out of it. The problems cannot be rectified in NfSen-HW as it stands due to the amount that has been added to the default codebase, so the development team from Hungarnet would have to start from scratch with the latest version. If I had not encountered so much difficulty with adding the GÉANT2 data into NfSen-HW, there would have been more research into the accuracy of the results and the implications for fine tuning. This would have been a highly useful addition to the report; sadly it was just not possible with the time available.
7.2 Further work
There is a lot of scope for this project to be taken further. What I have produced is an indication of how useful aberrance detection could be in real world network monitoring environments, but it bases all of its aberrance detection on one method and one technology. As I identified within the Background and Related Work section, there are many different aberrance detection methods being researched, the most interesting of which I believe to be the use of entropy to detect changes in network use. If this project were to be continued I would like to see an investigation into the use of entropy to detect anomalous events, and a comparison of those results with further study regarding the accuracy of NfSen-HW, as I mentioned in the previous section.

Whilst there is a great amount of further research to be done surrounding this topic, I feel that in order for NfSen-HW to be regarded as a decent platform for development, some time should be given to bring the project up to speed. The latest versions of NfSen use an entirely different RRD structure compared to the snapshot NfSen-HW is built on, which means that their results are incompatible. The RRD structure in the newer version is more distributed and logical: each traffic type is divided out into its own RRD file rather than everything being stored in one source RRD. Allowing time for development could also mean that Peter Haag has had a chance to implement the suggestions Gabor Kiss made regarding the plugin functionality, and that could result in NfSen-HW being implemented simply as a plugin for NfSen. This would be the ideal, as it would allow the version of NfSen to always be the most up to date and least likely to cause problems in development.
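To make the entropy idea concrete, the following is a minimal sketch of the Shannon entropy of a traffic feature, for example the destination ports seen in one measurement interval. It is not part of Sentinel or NfSen-HW; the function name and the sample port lists are purely illustrative, and a real detector would also need a baseline and a threshold for deciding when a change in entropy is aberrant.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed relative frequencies p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Illustrative samples: traffic concentrated on one port versus spread evenly.
concentrated = [80, 80, 80, 80]
dispersed = [80, 443, 22, 25]
print(shannon_entropy(concentrated), shannon_entropy(dispersed))
```

Traffic concentrated on a single value gives an entropy of zero, while traffic spread evenly over four values gives two bits, so a sudden rise or fall in this figure between intervals is the kind of change such a method would look for.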
A Acknowledgements
The following individuals helped in the development and design of this project:

Maurizio Molina, DANTE Network Operator
For large amounts of help and advice regarding the process and systems in place at DANTE, and for some very delicious Italian food.

Gabor Kiss and Janos Mohácsi
For taking the time to answer my questions regarding NfSen-HW and RRDtool.
B Project Proposal
See the pages following.
C JavaDoc
See the pages following.
D NfDump(1) Manpage
nfdump(1)                                                          nfdump(1)

NAME
    nfdump - netflow display and analyze program

SYNOPSIS
    nfdump [options] [filter]

DESCRIPTION
    nfdump is the netflow display and analyzing program of the nfdump tool set. It reads the netflow data from files stored by nfcapd and processes the flows according to the options given. The filter syntax is comparable to tcpdump and extended for netflow data. Nfdump can also display many different top N flow and flow element statistics.

OPTIONS
    -r inputfile
        Read input data from inputfile. Default is read from stdin.

    -R expr
        Read input from a sequence of files in the same directory. expr may be one of:
        /any/dir            Read all files in directory dir.
        /dir/file           Read all files beginning with file.
        /dir/file1:file2    Read all files from file1 to file2.

    -M expr
        Read input from multiple directories. expr looks like: /any/path/to/dir1:dir2:dir3 etc. and will be expanded to the directories: /any/path/to/dir1, /any/path/to/dir2 and /any/path/to/dir3. Any number of colon separated directories may be given. The files to read are specified by -r or -R and are expected to exist in all the given directories. The options -r and -R must not contain any directory part when used in conjunction with -M.

    -m
        Sort the netflow records according to the date first seen. This option is usually only useful in conjunction with -M, when netflow records are read from different sources, which are not necessarily sorted.

    -w outputfile
        If specified writes binary netflow records to outputfile ready to be processed again with nfdump. The default output is ASCII on stdout.

    -f filterfile
        Reads the filter syntax from filterfile. Note: Any filter specified directly on the command line takes precedence over -f.

    -t timewin
        Process only flows which fall in the time window timewin, where timewin is YYYY/MM/dd.hh:mm:ss[-YYYY/MM/dd.hh:mm:ss]. Any parts of the time spec may be omitted, e.g. YYYY/MM/dd expands to YYYY/MM/dd.00:00:00-YYYY/MM/dd.23:59:59 and processes all flows from a given day. The time window may also be specified as +/- n. In this case it is relative to the beginning or end of all flows. +10 means the first 10 seconds of all flows, -10 means the last 10 seconds of all flows.

    -c num
        Limit number of records to process to the first num flows.

    -a
        Aggregate netflow data. Aggregation is done at connection level.

    -A fields[/netmask]
        Aggregate netflow data using the specified fields, where fields is a ',' separated list out of srcip dstip srcport dstport. The default is using all fields: srcip,dstip,srcport,dstport. An additional netmask may be given. In that case flows from the same subnets are aggregated. In order to do proper aggregation, the IP version is important, for which the mask applies. Therefore the IP protocol version must be given in the form of: srcip4/24 for IPv4 or srcip6/64 for IPv6 address aggregation. Apply the protocol version for dstip respectively. Only flows of the same IP protocol tcp, udp, icmp etc. are aggregated.

    -I
        Print flow statistics from file specified by -r, or timeslot specified by -R/-M. The printed information corresponds to pre nfdump 1.5 nfcapd stat files.

    -S
        Compatibility option with pre 1.4 nfdump. Is equal to -s record/packets/bytes.
    -s statistic[:p][/orderby]
        Generate the Top N flow or flow element statistic. statistic can be:
        record     Statistic about aggregated netflow records.
        srcip      Statistic about source IP addresses
        dstip      Statistic about destination IP addresses
        ip         Statistic about any (source or destination) IP addresses
        srcport    Statistic about source ports
        dstport    Statistic about destination ports
        port       Statistic about any (source or destination) ports
        srcas      Statistic about source AS numbers
        dstas      Statistic about destination AS numbers
        as         Statistic about any (source or destination) AS numbers
        inif       Statistic about input interface
        outif      Statistic about output interface
        proto      Statistic about IP protocols
        By adding :p to the statistic name, the resulting statistic is split up into transport layer protocols. Default is transport protocol independent statistics. orderby is optional and specifies the order by which the statistics is ordered and can be flows, packets, bytes, pps, bps or bpp. You may specify more than one orderby which results in the same statistic but ordered differently. If no orderby is given, statistics are ordered by flows. You can specify as many -s flow element statistics on the command line for the same run.
        Example: -s srcip -s ip/flows -s dstport/pps/packets/bytes -s record/bytes

    -O orderby
        Specifies the default orderby for flow element statistics -s, which applies when no orderby is given at -s. orderby can be flows, packets, bytes, pps, bps or bpp. Defaults to flows.

    -l [+/-]packet_num
        Limit statistics output to those records above or below the packet_num limit. packet_num accepts positive or negative numbers followed by 'K', 'M' or 'G' 10E3, 10E6 or 10E9 flows respectively. See also note at -L.

    -L [+/-]byte_num
        Limit statistics output to those records above or below the byte_num limit. byte_num accepts positive or negative numbers followed by 'K', 'M' or 'G' 10E3, 10E6 or 10E9 bytes respectively. Note: These limits only apply to the statistics and aggregated outputs generated with -a -s or -S. To filter netflow records by packets and bytes, use the filter syntax packets and bytes described below.

    -n num
        Define the number for the Top N statistics. Defaults to 10. If 0 is specified the number is unlimited.

    -o format
        Selects the output format to print flows or flow record statistics (-s record). The following formats are available:
        raw           Print each file flow record on multiple lines.
        line          Print each flow on one line. Default format.
        long          Print each flow on one line with more details
        extended      Print each flow on one line with even more details.
        pipe          Machine readable format: Print all fields '|' separated.
        fmt:format    User defined output format.
        For each defined output format except -o fmt:<format> an IPv6 long output format exists. line6, long6 and extended6. See output formats below for more information.
    -K key
        Anonymize all IP addresses using the CryptoPAn (Cryptography-based Prefix-preserving Anonymization) module. The key is used to initialize the Rijndael cipher. key is either a 32 character string, or a 64 hex digit string starting with 0x. Anonymizing takes place after applying the flow filter, but before printing the flow or writing the flow to a file. See http://www.cc.gatech.edu/computing/Telecomm/cryptopan/ for more information about CryptoPAn.

    -q
        Suppress the header line and the statistics at the bottom.

    -z
        Zero flows. Do not dump flows into the output file, but only the statistics record.
    -Z
        Check filter syntax and exit. Sets the return value accordingly.

    -X
        Compiles the filter syntax and dumps the filter engine table to stdout. This is for debugging purposes only.

    -V
        Print nfdump version and exit.

    -h
        Print help text on stdout with all options and exit.

RETURN VALUE
    Returns
    0      No error.
    255    Initialization failed.
    254    Error in filter syntax.
    250    Internal error.

OUTPUT FORMATS
The output format raw prints each flow record on multiple lines, including all information available in the record. This is the most detailed view on a flow.

Other output formats print each flow on a single line. Predefined output formats are line, long and extended.

The output format line is the default output format when no format is specified. It limits the information to the connection details as well as number of packets, bytes and flows.

The output format long is identical to the format line, and includes additional information such as TCP flags and Type of Service.

The output format extended is identical to the format long, and includes additional computed information such as pps, bps and bpp.

Fields:
    Date flow start: Start time flow first seen. ISO 8601 format including milliseconds.
    Duration: Duration of the flow in seconds and milliseconds. If flows are aggregated, duration is the time span over the entire period of time from first seen to last seen.
    Proto: Protocol used in the connection.
    Src IP Addr:Port: Source IP address and source port.
    Dst IP Addr:Port: Destination IP address and destination port.
    Flags: TCP flags ORed of the connection.
    Tos: Type of service.
    Packets: The number of packets in this flow. If flows are aggregated, the packets are summed up.
    Bytes: The number of bytes in this flow. If flows are aggregated, the bytes are summed up.
    pps: The calculated packets per second: number of packets / duration. If flows are aggregated this results in the average pps during this period of time.
    bps: The calculated bits per second: 8 * number of bytes / duration. If flows are aggregated this results in the average bps during this period of time.
    Bpp: The calculated bytes per packet: number of bytes / number of packets. If flows are aggregated this results in the average bpp during this period of time.
    Flows: Number of flows. If flows are listed only, this number is always 1. If flows are aggregated, this shows the number of flows aggregated into one record.

Numbers larger than 1048576 (1024*1024) are scaled to 4 digits and one decimal digit including the scaling factor M, G or T for cleaner output, e.g. 923.4 M.

To make the output more readable, IPv6 addresses are shortened to 16 characters. The seven most and seven least significant digits, connected with two dots (..), are displayed in any normal output format. To display the full IPv6 address, use the appropriate long format, which is the format name followed by a 6.
    Example: -o line displays an IPv6 address as 2001:23..80:d01e, whereas the format -o line6 displays the IPv6 address in full length: 2001:234:aabb::211:24ff:fe80:d01e. The combination of -o line -6 is equivalent to -o line6.

The pipe output format is intended to be read by another program for further processing. Values are separated by a |. IP addresses are printed as 4 consecutive 32bit numbers.

Output sequence:
    Address family    PF_INET or PF_INET6
    Time first seen   UNIX time seconds
    msec first seen   Milliseconds first seen
    Time last seen    UNIX time seconds
    msec last seen    Milliseconds last seen
    Protocol          Protocol
    Src address       Src address as 4 consecutive 32bit numbers
    Src port          Src port
    Dst address       Dst address as 4 consecutive 32bit numbers
    Dst port          Dst port
    Src AS            Src AS number
    Dst AS            Dst AS number
    Input IF          Input Interface
    Output IF         Output Interface
    TCP Flags         TCP Flags:
                          000001 FIN
                          000010 SYN
                          000100 RESET
                          001000 PUSH
                          010000 ACK
                          100000 URGENT
                      e.g. 6 => SYN + RESET
    Tos               Type of Service
    Packets           Packets
    Bytes             Bytes

For IPv4 addresses only the last 32bit integer is used. All others are set to zero.
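
The pipe record layout and the derived fields described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not part of nfdump: the function names and the sample record are hypothetical, and the address family is assumed to be printed as the numeric value of PF_INET (2) for IPv4.

```python
import ipaddress
import socket

# Bit values of the TCP flags field, as listed in the output sequence table.
TCP_FLAGS = [(1, "FIN"), (2, "SYN"), (4, "RESET"),
             (8, "PUSH"), (16, "ACK"), (32, "URGENT")]

# Hypothetical sample record with 24 '|' separated integer fields.
SAMPLE = ("2|1089535500|100|1089535502|100|6|0|0|0|2886734098|1025|"
          "0|0|0|2886734099|80|0|0|5|7|18|0|10|4000")


def decode_flags(value):
    """Return the names of all flag bits set in value."""
    return [name for bit, name in TCP_FLAGS if value & bit]


def words_to_ip(words, is_v4):
    """Turn 4 consecutive 32bit numbers into a printable address."""
    if is_v4:
        # For IPv4 only the last 32bit integer is used.
        return str(ipaddress.IPv4Address(words[3]))
    packed = 0
    for word in words:
        packed = (packed << 32) | word
    return str(ipaddress.IPv6Address(packed))


def shorten_ipv6(text):
    """Keep the first and last seven characters, joined by two dots."""
    return text if len(text) <= 16 else text[:7] + ".." + text[-7:]


def parse_pipe_record(line):
    f = [int(x) for x in line.strip().split("|")]
    if len(f) != 24:
        raise ValueError("expected 24 fields, got %d" % len(f))
    # Assumption: the family field is numeric and 2 means PF_INET.
    is_v4 = f[0] == socket.AF_INET
    rec = {
        "proto": f[5],
        "src_ip": words_to_ip(f[6:10], is_v4), "src_port": f[10],
        "dst_ip": words_to_ip(f[11:15], is_v4), "dst_port": f[15],
        "src_as": f[16], "dst_as": f[17],
        "in_if": f[18], "out_if": f[19],
        "flags": decode_flags(f[20]), "tos": f[21],
        "packets": f[22], "bytes": f[23],
    }
    # Duration in milliseconds from the first seen / last seen pairs.
    duration_ms = (f[3] - f[1]) * 1000 + (f[4] - f[2])
    rec["duration_ms"] = duration_ms
    if duration_ms > 0:
        rec["pps"] = rec["packets"] * 1000 / duration_ms
        rec["bps"] = 8 * rec["bytes"] * 1000 / duration_ms
    if rec["packets"] > 0:
        rec["bpp"] = rec["bytes"] / rec["packets"]
    return rec


if __name__ == "__main__":
    print(parse_pipe_record(SAMPLE))
    print(shorten_ipv6("2001:234:aabb::211:24ff:fe80:d01e"))
```

Working with the duration in whole milliseconds avoids rounding problems when the two timestamps are subtracted, and pps and bps are simply left out when a flow has zero duration.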

The output format fmt:<format> allows you to define your own output format. A format description consists of a single line containing arbitrary strings and format specifiers as described below:

    %ts   Start Time - first seen
    %te   End Time - last seen
    %td   Duration
    %pr   Protocol
    %sa   Source Address
    %da   Destination Address
    %sap  Source Address:Port
    %dap  Destination Address:Port
    %sp   Source Port
    %dp   Destination Port
    %sas  Source AS
    %das  Destination AS
    %in   Input Interface num
    %out  Output Interface num
    %pkt  Packets
    %byt  Bytes
    %fl   Flows
    %flg  TCP Flags
    %tos  Tos
    %bps  bps - bits per second
    %pps  pps - packets per second
    %bpp  bpp - bytes per packet

For example the standard output format long can be created as

    -o "fmt:%ts %td %pr %sap -> %dap %flg %tos %pkt %byt %fl"

You may also define your own output format and have it compiled into nfdump. See nfdump.c around line 100 for more details.
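
How a format description expands can be illustrated with a short Python sketch. It is a hypothetical stand-in, not nfdump's own code: the specifier names are taken from the table in this section, while the record dictionary, its keys and the render function are assumptions made for the example. It also does not reproduce the column padding of the real output.

```python
import re

# Specifier -> key in a flow record dict. The dict keys are hypothetical
# names chosen for this sketch only.
FIELDS = {
    "ts": "start", "te": "end", "td": "duration", "pr": "proto",
    "sa": "src_ip", "da": "dst_ip", "sp": "src_port", "dp": "dst_port",
    "sas": "src_as", "das": "dst_as", "in": "in_if", "out": "out_if",
    "pkt": "packets", "byt": "bytes", "fl": "flows", "flg": "flags",
    "tos": "tos", "bps": "bps", "pps": "pps", "bpp": "bpp",
}

# Hypothetical flow record used to demonstrate the expansion.
FLOW = {
    "proto": "TCP", "src_ip": "172.16.17.18", "src_port": 1025,
    "dst_ip": "172.16.17.19", "dst_port": 80,
    "packets": 10, "bytes": 4000, "flows": 1,
}


def render(fmt, rec):
    """Replace every known %specifier in fmt with the value from rec."""
    def expand(match):
        name = match.group(1)
        # The combined address:port specifiers are built from two fields.
        if name == "sap":
            return "{}:{}".format(rec["src_ip"], rec["src_port"])
        if name == "dap":
            return "{}:{}".format(rec["dst_ip"], rec["dst_port"])
        if name in FIELDS:
            return str(rec[FIELDS[name]])
        return match.group(0)  # unknown specifier: leave it untouched

    # Matching the whole run of letters keeps %sa, %sap and %sas apart,
    # provided each specifier is followed by a non-letter.
    return re.sub(r"%([a-z]+)", expand, fmt)


if __name__ == "__main__":
    print(render("%pr %sap -> %dap %pkt %byt %fl", FLOW))
```

Arbitrary strings such as the arrow between the two addresses pass through unchanged, which mirrors the way the format description mixes literal text with specifiers.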
FILTER
The filter syntax is similar to the well known pcap library used by tcpdump. The filter can be either specified on the command line after all options or in a separate file. It can span several lines. Anything after a # is treated as a comment and ignored to the end of the line. There is virtually no limit on the length of the filter expression. All keywords are case independent.

Any filter consists of one or more expressions expr. Any number of expr can be linked together:
    expr and expr, expr or expr, not expr and ( expr ).

Expr can be one of the following filter primitives:

protocol version
    inet for IPv4 and inet6 for IPv6

protocol
    proto <protocol>
    where protocol can be any known protocol such as TCP, UDP, ICMP, ICMP6, GRE, ESP, AH, or a valid protocol number.

IP address
    [SourceDestination] IP <ipaddr> or
    [SourceDestination] HOST <ipaddr>
    with <ipaddr> as any valid IPv4 or IPv6 address. SourceDestination may be omitted.

SourceDestination
    defines the IP address to be selected and can be SRC, DST or any combination of SRC and|or DST. Omitting SourceDestination is equivalent to SRC or DST.

inout
    defines the interface to be selected and can be IN or OUT.

network
    [SourceDestination] NET a.b.c.d m.n.r.s
    for IPv4 with m.n.r.s as netmask.
    [SourceDestination] NET <net> / num
    with <net> as a valid IPv4 or IPv6 network and num as maskbits. The number of mask bits must match the appropriate address family, IPv4 or IPv6. Networks may be abbreviated, such as 172.16/16, if they are unambiguous.

Port
    [SourceDestination] PORT [comp] num
    with num as a valid port number. If comp is omitted, = is assumed.

Interface
    [inout] IF num
    with num as an interface number.

Flags
    flags tcpflags
    with tcpflags as a combination of:
        A  ACK
        S  SYN
        F  FIN
        R  Reset
        P  Push
        U  Urgent
        X  All flags on
    The ordering of the flags is not relevant. Flags not mentioned are treated as don't care. In order to get those flows with only the SYN flag set, use the syntax flags S and not flags AFRPU.

TOS
    Type of service: tos value
    with value 0..255.

Packets
    packets [comp] num [scale]
    to specify the packet count in the netflow record.

Bytes
    bytes [comp] num [scale]
    to specify the byte count in the netflow record.

Packets per second: Calculated value.
    pps [comp] num [scale]
    to specify the pps of the flow.

Duration: Calculated value.
    duration [comp] num
    to specify the duration in milliseconds of the flow.

Bits per second: Calculated value.
    bps [comp] num [scale]
    to specify the bps of the flow.

Bytes per packet: Calculated value.
    bpp [comp] num [scale]
    to specify the bpp of the flow.

AS
    [SourceDestination] AS num
    with num as a valid AS number.

scale
    Scaling factor. May be k, m or g. The factor is 1024.

comp
    The following comparators are supported: =, ==, >, <, EQ, LT, GT. If comp is omitted, = is assumed.

EXAMPLES
nfdump -r /and/dir/nfcapd.200407110845 -c 100 'tcp and ( src ip 172.16.17.18 or dst ip 172.16.17.19 )'
    Dumps the first 100 netflow records which match the given filter.

nfdump -R /and/dir/nfcapd.200407110845:nfcapd.200407110945 'host 192.168.1.2'
    Dumps all netflow records of host 192.168.1.2 from July 11 08:45 - 09:45.

nfdump -M /to/and/dir1:dir2 -R nfcapd.200407110845:nfcapd.200407110945 -S -n 20
    Generates the Top 20 statistics from 08:45 to 09:45 from 3 sources.

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 -o extended
    Generates the Top 20 statistics, extended output format.

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 'in if 5 and bps > 10k'
    Generates the Top 20 statistics from flows coming from interface 5.

nfdump -r /and/dir/nfcapd.200407110845 'inet6 and tcp and ( src port > 1024 and dst port 80 )'
    Dumps all port 80 IPv6 connections to any web server.

NOTES
Generating the statistics for data files of a few hundred MB is no problem. However, be careful if you want to create statistics of several GB of data. This may consume a lot of memory and can take a while. Also, anonymizing IP addresses is time consuming and uses a lot of CPU power, which reduces the number of flows processed per second. Therefore anonymizing takes place only when flow records are printed or written to files. Any internal flow processing takes place using the original IP addresses.

SEE ALSO
nfcapd(1), nfprofile(1), nfreplay(1)

BUGS
There is still the famous last bug. Please report them - all the last bugs - back to me.
2005-08-19
nfdump(1)
E Holt-Winters Forecasting Examples
Figure E.1: Aberrant Marking Example
Figure E.2: Subtracting 40 Minutes Example 1
Bibliography
[Working Documents] Available from: http://www.lancs.ac.uk/burys1/fyp
[Barford & Plonka 2001] Barford, P. & Plonka, D. (2001) Characteristics of Network Traffic Flow Anomalies. In: IMW '01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, San Francisco, California, USA. ACM Press, New York, NY, USA. pp69-73.
[Barford et al. 2002] Barford, P., Kline, J., Plonka, D. & Ron, A. (2002) A Signal Analysis of Network Traffic Anomalies. In: IMW '02: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, Marseille, France. ACM Press, New York, NY, USA. pp71-82.
[Bash, 2007] Bash (2007) The Bash Reference Manual [Internet]. Available from: <http://www.gnu.org/software/bash/manual/bashref.html> [Accessed 20th February 2007].
[Braukhoff et al. 2006] Braukhoff, D., Tellenbach, B., Wagner, A., May, M. & Lakhina, A. (2006) Impact of Packet Sampling on Anomaly Detection Metrics. In: IMC '06: Proceedings of the 6th ACM SIGCOMM on Internet measurement, Rio de Janeiro, Brazil. ACM Press, New York, NY, USA. pp159-164.
[Brutlag, 2000a] Brutlag, J. (2000) Aberrant Behaviour Detection in Time Series for Network Monitoring. In: LISA '00: Proceedings of the 14th USENIX conference on System Administration, New Orleans, Louisiana. USENIX Association, Berkeley, CA, USA. pp139-146.
[Brutlag, 2000b] Brutlag, J. (2000) Notes on RRDTOOL implementation of Aberrant Behavior Detection [Internet], Microsoft WebTV, Mountain View, California, USA. Available from: <http://cricket.sourceforge.net/aberrant/rrd hw.htm/> [Accessed 20th February 2007].
[Cacti, 2007] Cacti (2007) Cacti - The complete rrd based graphing solution [Internet]. Available from: <http://cacti.net/features.php/> [Accessed 25th February 2007].
[Chatfield & Yar, 1988] Chatfield, C. & Yar, M. (1988) Holt-Winters Forecasting: Some Practical Issues. The Statistician, Vol. 37, No. 2, Special Issue: Statistical Forecasting and Decision-Making. 1988, pp. 129-140.
[Cricket, 2007] Cricket (2007) Cricket [Internet]. Available from: <http://cricket.sourceforge.net/> [Accessed 20th February 2007].
[DANTE, 2007] DANTE (2007) Delivery of Advanced Network Technology to Europe [Internet], Cambridge, UK. Available from: <http://www.dante.net/> [Accessed 20th February 2007].
[Debian GNU/Linux, 2007] Debian GNU/Linux (2007) Debian GNU/Linux [Internet]. Available from: <http://www.debian.org/> [Accessed 21st February 2007].
[Flow-Tools, 2007] Flow-Tools (2007) Flow-Tools - A toolset for working with NetFlow data [Internet]. Available from: <http://www.splintered.net/sw/flow-tools/> [Accessed 23rd February 2007].
[GÉANT2, 2007] GÉANT2 (2007) GÉANT2 [Internet], Cambridge, UK. Available from: <http://www.geant2.net/> [Accessed 20th February 2007].
[Haag, 2005a] Haag, P. (2005) Watch your flows with NfSen and NfDump [Internet], Presented at 50th RIPE Meeting, Stockholm, Sweden, May 3rd 2005. Available from: <http://www.ripe.net/ripe/meetings/ripe-50/presentations/ripe50-plenarytue-nfsen-nfdump.pdf> [Accessed 10th March 2007].
[Haag, 2005b] Haag, P. (2005) NfDump(1) Manpage. Installed with the NfDump application. Available as an appendix of this report [Appendix C].
[JRobin, 2006] JRobin (2006) JRobin - A Java port of RRDtool by Sasa Markovic [Internet]. Available from: <http://www.jrobin.org/index.php/Main Page> [Accessed 10th March 2007].
[Kim et al. 2004] Kim, M.-S., Kang, H.-J., Hung, S.-C., Chung, S.-H. & Hong, J. W. (2004) A Flow-based Method for Abnormal Network Traffic Detection. In: Proceedings of the IEEE/IFIP Network Operations and Management Symposium, Seoul, April 2004.
[Kiss & Mohácsi, 2006] Kiss, G. & Mohácsi, J. (2006) Anomaly detection for NFSen/nfdump netflow engine - with Holt-Winters algorithm. Presented at 19th TF-CSIRT Meeting, Espoo, Finland, 21st September 2006. Available from: <http://bakacsin.ki.iif.hu/kissg/project/nfsen-hw/JRA2-meeting-atEspoo slides.pdf> [Accessed 10th March 2007].
[Korzyk, 1998] Korzyk, A. D. Sr (1998) A Forecasting Model for Internet Security Attacks. In: NISSC '98: Proceedings of the National Information System Security Conference, Crystal City, Virginia, USA, October 6th-9th 1998.
[libpcap, 2007] libpcap (2007) libpcap - Packet Capture Library [Internet]. Available from: <http://www.tcpdump.org/> [Accessed 21st February 2007].
[MySQL, 2007] MySQL (2007) MySQL - The world's most popular open source database [Internet]. Available from: <http://www.mysql.org/> [Accessed 21st February 2007].
[NetFlow, 2007] NetFlow (2007) Cisco IOS NetFlow [Internet]. Available from: <http://www.cisco.com/go/netflow/> [Accessed 21st February 2007].
[NfDump, 2007] NfDump (2007) NfDump - NetFlow Dump [Internet]. Available from: <http://nfdump.sourceforge.net/> [Accessed 10th March 2007].
[NfSen, 2007] NfSen (2007) NfSen - NetFlow Sensor [Internet]. Available from: <http://nfsen.sourceforge.net/> [Accessed 10th March 2007].
[NfSen-HW, 2007] NfSen-HW (2007) NfSen - Holt Winters [Internet]. Available from: <http://bakacsin.ki.iif.hu/ kissg/project/nfsen-hw/> [Accessed 10th March 2007].
[PHPL, 2007] PHP (2007) PHP: Hypertext Preprocessor [Internet]. Available from: <http://www.php.net> [Accessed 21st February 2007].
[Roesch, 1999] Roesch, M. (1999) Snort - Lightweight Intrusion Detection for Networks. In: LISA '99: Proceedings of the 13th USENIX conference on System Administration, Seattle, Washington, USA. USENIX Association, Berkeley, CA, USA. pp229-238.
[RRDtool, 2007] RRDtool (2007) RRDtool - logging and graphing [Internet]. Available from: <http://oss.oetiker.ch/rrdtool/> [Accessed 21st February 2007].
[RRD Java Libraries] RRD Java Libraries (2007) RRD Libraries for Java [Internet]. Available from: <http://monstera.man.poznan.pl/wiki/index.php/RRD Java libraries> [Accessed 10th March 2007].
[sFlow, 2007] sFlow (2007) sFlow End User Forum [Internet]. Available from: <http://www.sflow.org/index.php> [Accessed 22nd February 2007].
[SNMP, 2007] SNMP (2007) Information about Simple Network Management Protocol and Management Information Base [Internet]. Available from: <http://www.snmplink.org/> [Accessed 22nd February 2007].
[Sommerville, 2004] Sommerville, I. (2004) Software Engineering. Seventh Ed. Harlow, Pearson Education Limited.
[TCPdump, 2007] TCPdump (2007) TCPdump - Network debugging tool [Internet]. Available from: <http://www.tcpdump.org/> [Accessed 21st February 2007].
[Thottan & Ji, 2003] Thottan, M. & Ji, C. (2003) Anomaly Detection in IP Networks. IEEE Transactions On Signal Processing. Vol. 51, No. 8, August 2003. pp2191-2204.
[Wagner & Plattner, 2005] Wagner, A. & Plattner, B. (2005) Entropy Based Worm and Anomaly Detection in Fast IP Networks. In: WETICE '05: Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, Linköping University, Sweden, June 13-15 2005. IEEE Computer Society, Washington, DC, USA. pp172-177.
